Lazy sorted function, timsort
Having a list with N (large number) elements:
from random import randint
eles = [randint(0, 10) for i in range(3000000)]
I'm trying to implement the best way (performance/resources spent) this function below:
def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results
This function returns the most common element, number of occurrences, and the indexes where it was found.
Here is the benchmark with 300000 eles.
But I think I could improve, one of the reasons being python3 sorted
function (timsort), if it returned a generator I didn't have to loop through the list twice right?
My questions are:
Is there any way for this code to be optimized? How?
With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort
python python-3.x performance sorting
add a comment |
Having a list with N (large number) elements:
from random import randint
eles = [randint(0, 10) for i in range(3000000)]
I'm trying to implement the best way (performance/resources spent) this function below:
def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results
This function returns the most common element, number of occurrences, and the indexes where it was found.
Here is the benchmark with 300000 eles.
But I think I could improve, one of the reasons being python3 sorted
function (timsort), if it returned a generator I didn't have to loop through the list twice right?
My questions are:
Is there any way for this code to be optimized? How?
With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort
python python-3.x performance sorting
Try if you can use generators in this place.
– Devendra Bhat
Nov 16 '18 at 9:37
What is that function trying to do exactly?
– SilverSlash
Nov 16 '18 at 9:39
relevant stackoverflow.com/a/4154954/6260170
– Chris_Rands
Nov 16 '18 at 9:41
you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so
– Chris_Rands
Nov 16 '18 at 9:43
add a comment |
Having a list with N (large number) elements:
from random import randint
eles = [randint(0, 10) for i in range(3000000)]
I'm trying to implement the best way (performance/resources spent) this function below:
def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results
This function returns the most common element, number of occurrences, and the indexes where it was found.
Here is the benchmark with 300000 eles.
But I think I could improve, one of the reasons being python3 sorted
function (timsort), if it returned a generator I didn't have to loop through the list twice right?
My questions are:
Is there any way for this code to be optimized? How?
With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort
python python-3.x performance sorting
Having a list with N (large number) elements:
from random import randint
eles = [randint(0, 10) for i in range(3000000)]
I'm trying to implement the best way (performance/resources spent) this function below:
def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results
This function returns the most common element, number of occurrences, and the indexes where it was found.
Here is the benchmark with 300000 eles.
But I think I could improve, one of the reasons being python3 sorted
function (timsort), if it returned a generator I didn't have to loop through the list twice right?
My questions are:
Is there any way for this code to be optimized? How?
With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort
python python-3.x performance sorting
python python-3.x performance sorting
asked Nov 16 '18 at 9:33
Hula HulaHula Hula
154110
154110
Try if you can use generators in this place.
– Devendra Bhat
Nov 16 '18 at 9:37
What is that function trying to do exactly?
– SilverSlash
Nov 16 '18 at 9:39
relevant stackoverflow.com/a/4154954/6260170
– Chris_Rands
Nov 16 '18 at 9:41
you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so
– Chris_Rands
Nov 16 '18 at 9:43
add a comment |
Try if you can use generators in this place.
– Devendra Bhat
Nov 16 '18 at 9:37
What is that function trying to do exactly?
– SilverSlash
Nov 16 '18 at 9:39
relevant stackoverflow.com/a/4154954/6260170
– Chris_Rands
Nov 16 '18 at 9:41
you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so
– Chris_Rands
Nov 16 '18 at 9:43
Try if you can use generators in this place.
– Devendra Bhat
Nov 16 '18 at 9:37
Try if you can use generators in this place.
– Devendra Bhat
Nov 16 '18 at 9:37
What is that function trying to do exactly?
– SilverSlash
Nov 16 '18 at 9:39
What is that function trying to do exactly?
– SilverSlash
Nov 16 '18 at 9:39
relevant stackoverflow.com/a/4154954/6260170
– Chris_Rands
Nov 16 '18 at 9:41
relevant stackoverflow.com/a/4154954/6260170
– Chris_Rands
Nov 16 '18 at 9:41
you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so
– Chris_Rands
Nov 16 '18 at 9:43
you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so
– Chris_Rands
Nov 16 '18 at 9:43
add a comment |
2 Answers
2
active
oldest
votes
did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
and the indices (the second iteration) can be found lazily in a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
if you need to care about multiple elements being the most common, this might work for you:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common
for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)
print(most_common)
print(indices)
you could of course still make the indices
lazy the same way as above.
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
you fail to handle joint top elements e.g.eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
add a comment |
If you are willing to use numpy, then you can do something like this:
arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
HTH.
add a comment |
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53334994%2flazy-sorted-function-timsort%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
and the indices (the second iteration) can be found lazily in a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
if you need to care about multiple elements being the most common, this might work for you:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common
for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)
print(most_common)
print(indices)
you could of course still make the indices
lazy the same way as above.
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
you fail to handle joint top elements e.g.eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
add a comment |
did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
and the indices (the second iteration) can be found lazily in a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
if you need to care about multiple elements being the most common, this might work for you:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common
for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)
print(most_common)
print(indices)
you could of course still make the indices
lazy the same way as above.
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
you fail to handle joint top elements e.g.eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
add a comment |
did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
and the indices (the second iteration) can be found lazily in a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
if you need to care about multiple elements being the most common, this might work for you:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common
for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)
print(most_common)
print(indices)
you could of course still make the indices
lazy the same way as above.
did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):
from collections import Counter
from random import randint
eles = [randint(0, 10) for i in range(30)]
counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]
print(most_common_element, number_of_occurrences, indices)
and the indices (the second iteration) can be found lazily in a generator expression:
indices = (i for i, x in enumerate(eles) if x == most_common_element)
if you need to care about multiple elements being the most common, this might work for you:
from collections import Counter
from itertools import groupby
from operator import itemgetter
eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common
for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)
print(most_common)
print(indices)
you could of course still make the indices
lazy the same way as above.
edited Nov 16 '18 at 14:06
answered Nov 16 '18 at 9:39
hiro protagonisthiro protagonist
20.1k74162
20.1k74162
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
you fail to handle joint top elements e.g.eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
add a comment |
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
you fail to handle joint top elements e.g.eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
@Chris_Rands was not quite there yet... sorry. updated now.
– hiro protagonist
Nov 16 '18 at 9:50
you fail to handle joint top elements e.g.
eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
you fail to handle joint top elements e.g.
eles = [1,1,2,2]
– Chris_Rands
Nov 16 '18 at 9:52
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...
– hiro protagonist
Nov 16 '18 at 9:55
add a comment |
If you are willing to use numpy, then you can do something like this:
arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
HTH.
add a comment |
If you are willing to use numpy, then you can do something like this:
arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
HTH.
add a comment |
If you are willing to use numpy, then you can do something like this:
arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
HTH.
If you are willing to use numpy, then you can do something like this:
arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)
HTH.
edited Nov 18 '18 at 11:51
answered Nov 16 '18 at 10:12
Deepak SainiDeepak Saini
1,629816
1,629816
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53334994%2flazy-sorted-function-timsort%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Try if you can use generators in this place.
– Devendra Bhat
Nov 16 '18 at 9:37
What is that function trying to do exactly?
– SilverSlash
Nov 16 '18 at 9:39
relevant stackoverflow.com/a/4154954/6260170
– Chris_Rands
Nov 16 '18 at 9:41
you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so
– Chris_Rands
Nov 16 '18 at 9:43