Python: How to find duplicated names/documents in MongoDB?

I want to find the duplicated documents in my MongoDB collection based on name. I have the following code:

    from pymongo import MongoClient

    def Check_BFA_DB(options):
        issue_list = []
        client = MongoClient(options.host, int(options.port))
        db = client[options.db]
        collection = db[options.collection]
        # Project only the name field of every document
        names = [{'$project': {'name': '$name'}}]
        name_cursor = collection.aggregate(names, cursor={})
        for name in name_cursor:
            issue_list.append(name)
            print(name)

It prints all names. How can I print only the duplicated ones?

Any help is appreciated!

      python mongodb pymongo







asked Nov 16 '18 at 1:26
Jan Tamm

1 Answer

The following query will show only duplicates:

    db['collection_name'].aggregate([
        {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
        {'$match': {'count': {'$gt': 1}}}
    ])

How it works:

Step 1:
Go over the whole collection and group the documents by the field called name, counting for each name how many documents use it.

Step 2:
Filter (using the $match stage) to keep only the groups whose count is greater than 1 (the $gt operator).
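To make the two steps concrete, here is a minimal in-memory sketch of the same logic in plain Python. It is only an analogue of the aggregation, not the MongoDB query itself: the function name `find_duplicate_names` and the sample list are made up for illustration, and the documents are assumed to be plain dicts.

```python
from collections import Counter


def find_duplicate_names(documents):
    """Return a {name: count} dict for names used by more than one document."""
    # Step 1 analogue ($group with $sum: 1): count documents per name.
    counts = Counter(doc.get("name") for doc in documents)
    # Step 2 analogue ($match with $gt: 1): keep only counts above 1.
    return {name: count for name, count in counts.items() if count > 1}


sample = [{"name": "name1"}, {"name": "name1"}, {"name": "name2"}]
print(find_duplicate_names(sample))  # {'name1': 2}
```

For a real collection the aggregation is usually preferable, since the grouping runs on the server and only the duplicated names are sent back.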



An example (written for the mongo shell, but it can easily be adapted for Python):

    db.a.insert({name: "name1"})
    db.a.insert({name: "name1"})
    db.a.insert({name: "name2"})
    db.a.aggregate([{"$group": {_id: "$name", count: {"$sum": 1}}}, {$match: {count: {"$gt": 1}}}])

Result is { "_id" : "name1", "count" : 2 }

So your code should look something like this:

    from pymongo import MongoClient

    def Check_BFA_DB(options):
        issue_list = []
        client = MongoClient(options.host, int(options.port))
        db = client[options.db]
        name_cursor = db[options.collection].aggregate([
            # Count how many documents share each name
            {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
            # Keep only the names that occur more than once
            {'$match': {'count': {'$gt': 1}}}
        ])

        for document in name_cursor:
            name = document['_id']
            issue_list.append(name)
            print(name)

BTW (not related to the question), the Python naming convention for function names is lowercase with underscores, so you might want to call it check_bfa_db()
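If you also need to know which documents share a name, and not just the name itself, one possible variation is to collect the document ids in the $group stage with the $push accumulator. The sketch below only builds the pipeline, so it runs without a database; the helper name `build_duplicate_pipeline` and the output field `ids` are hypothetical choices, not part of the original answer.

```python
def build_duplicate_pipeline(field="name"):
    """Build an aggregation pipeline that lists duplicates of `field`."""
    return [
        {
            "$group": {
                "_id": "$" + field,
                "count": {"$sum": 1},
                # Collect the _id of every document in the group.
                "ids": {"$push": "$_id"},
            }
        },
        # Keep only the groups that contain more than one document.
        {"$match": {"count": {"$gt": 1}}},
    ]


pipeline = build_duplicate_pipeline("name")
print(pipeline)
# With a PyMongo collection this would be used as:
# for group in collection.aggregate(pipeline):
#     print(group["_id"], group["count"], group["ids"])
```

Each result would then carry the duplicated name, its count, and the list of matching `_id` values, which is handy if the goal is to inspect or clean up the duplicates afterwards.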






edited Nov 16 '18 at 8:20
answered Nov 16 '18 at 8:13
Hagai
