Python: How to find duplicated names/documents in MongoDB?
I want to find the duplicated documents in my MongoDB collection based on name. I have the following code:
    def Check_BFA_DB(options):
        issue_list = []
        client = MongoClient(options.host, int(options.port))
        db = client[options.db]
        collection = db[options.collection]
        # Project only the name field of every document
        names = [{'$project': {'name': '$name'}}]
        name_cursor = collection.aggregate(names, cursor={})
        for name in name_cursor:
            issue_list.append(name)
            print(name)
It prints all names. How can I print only the duplicated ones?
Any help is appreciated!
python mongodb pymongo
asked Nov 16 '18 at 1:26
Jan Tamm
1 Answer
The following query will show only duplicates:
    db['collection_name'].aggregate([{'$group': {'_id': '$name', 'count': {'$sum': 1}}}, {'$match': {'count': {'$gt': 1}}}])
How it works:
Step 1: Go over the whole collection and group the documents by the property called name, and for each name count how many times it is used in the collection.
Step 2: Filter (using the $match stage) only the documents in which the count is greater than 1 (the $gt operator).
An example (written for mongo shell, but can be easily adapted for python):
    db.a.insert({name: "name1"})
    db.a.insert({name: "name1"})
    db.a.insert({name: "name2"})
    db.a.aggregate([{"$group": {_id: "$name", count: {"$sum": 1}}}, {$match: {count: {"$gt": 1}}}])
Result is { "_id" : "name1", "count" : 2 }
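To make the two steps concrete without a running server, here is a minimal in-memory sketch of what the pipeline computes. It is only an emulation under stated assumptions: the function name find_duplicate_names and the sample list are hypothetical, and plain dicts stand in for the documents of the example collection.

```python
from collections import Counter


def find_duplicate_names(documents):
    """Emulate the $group + $match pipeline on a plain list of dicts (illustration only)."""
    # Step 1 (like $group with $sum: 1): count how often each name occurs.
    # doc.get() puts documents without a name under None, similar to how
    # $group collects a missing field under null.
    counts = Counter(doc.get("name") for doc in documents)
    # Step 2 (like $match with $gt: 1): keep only names seen more than once.
    return {name: count for name, count in counts.items() if count > 1}


# Same three documents as in the shell example
sample = [{"name": "name1"}, {"name": "name1"}, {"name": "name2"}]
duplicates = find_duplicate_names(sample)
print(duplicates)  # {'name1': 2}
```

For real data the aggregation should still run on the server, since this sketch needs every document loaded into memory first.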
So your code should look something like this:
    from pymongo import MongoClient

    def Check_BFA_DB(options):
        issue_list = []
        client = MongoClient(options.host, int(options.port))
        db = client[options.db]
        # Group documents by name, count each group, keep groups with more than one document
        name_cursor = db[options.collection].aggregate([
            {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
            {'$match': {'count': {'$gt': 1}}}
        ])
        for document in name_cursor:
            name = document['_id']  # the grouped name
            issue_list.append(name)
            print(name)
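If you also need to know which documents share a name, one possible variation is to collect their _id values with the $push accumulator while grouping. This is only a sketch: the helper build_duplicates_pipeline, its field parameter, and the output key ids are hypothetical names chosen for illustration, not part of the code above.

```python
def build_duplicates_pipeline(field="name"):
    """Build an aggregation pipeline that lists values of `field` used more than once."""
    return [
        # Group by the field, count the documents, and remember their _id values.
        {"$group": {
            "_id": "$" + field,
            "count": {"$sum": 1},
            "ids": {"$push": "$_id"},  # "ids" is an assumed output key
        }},
        # Keep only the groups that contain more than one document.
        {"$match": {"count": {"$gt": 1}}},
    ]


pipeline = build_duplicates_pipeline("name")

# Hypothetical usage with the collection from the function above:
# for group in db[options.collection].aggregate(pipeline):
#     print(group["_id"], group["count"], group["ids"])
```

Building the pipeline in a separate function keeps it easy to inspect and reuse for other fields.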
BTW (not related to the question), the Python naming convention for function names is lowercase with underscores, so you might want to call it check_bfa_db().
edited Nov 16 '18 at 8:20
answered Nov 16 '18 at 8:13
Hagai