Lazy sorted function, timsort

Having a list with N (large number) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I'm trying to implement the best way (performance/resources spent) this function below:

def mosty(lst):
 sort = sorted((v, k) for k, v in enumerate(lst))
 count, maxi, last_ele, idxs = 0, 0, None, 
 for ele, idx in sort:
 if(last_ele != ele):
 count = 1
 idxs = 
 idxs.append(idx)
 if(last_ele == ele):
 count += 1
 if(maxi < count):
 results = (ele, count, idxs)
 maxi = count
 last_ele = ele
 return results

This function returns the most common element, number of occurrences, and the indexes where it was found.

Here is the benchmark with 300000 eles.

But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?

My questions are:

Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort

asked Nov 16 '18 at 9:33

Hula Hula

154110

Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37

What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39

relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41

you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43

add a comment |

Having a list with N (large number) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I'm trying to implement the best way (performance/resources spent) this function below:

def mosty(lst):
 sort = sorted((v, k) for k, v in enumerate(lst))
 count, maxi, last_ele, idxs = 0, 0, None, 
 for ele, idx in sort:
 if(last_ele != ele):
 count = 1
 idxs = 
 idxs.append(idx)
 if(last_ele == ele):
 count += 1
 if(maxi < count):
 results = (ele, count, idxs)
 maxi = count
 last_ele = ele
 return results

This function returns the most common element, number of occurrences, and the indexes where it was found.

Here is the benchmark with 300000 eles.

But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?

My questions are:

Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort

asked Nov 16 '18 at 9:33

Hula Hula

154110

Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37

What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39

relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41

you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43

add a comment |

Having a list with N (large number) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I'm trying to implement the best way (performance/resources spent) this function below:

def mosty(lst):
 sort = sorted((v, k) for k, v in enumerate(lst))
 count, maxi, last_ele, idxs = 0, 0, None, 
 for ele, idx in sort:
 if(last_ele != ele):
 count = 1
 idxs = 
 idxs.append(idx)
 if(last_ele == ele):
 count += 1
 if(maxi < count):
 results = (ele, count, idxs)
 maxi = count
 last_ele = ele
 return results

This function returns the most common element, number of occurrences, and the indexes where it was found.

Here is the benchmark with 300000 eles.

But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?

My questions are:

Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort

asked Nov 16 '18 at 9:33

Hula Hula

154110

Having a list with N (large number) elements:

from random import randint

eles = [randint(0, 10) for i in range(3000000)]

I'm trying to implement the best way (performance/resources spent) this function below:

def mosty(lst):
 sort = sorted((v, k) for k, v in enumerate(lst))
 count, maxi, last_ele, idxs = 0, 0, None, 
 for ele, idx in sort:
 if(last_ele != ele):
 count = 1
 idxs = 
 idxs.append(idx)
 if(last_ele == ele):
 count += 1
 if(maxi < count):
 results = (ele, count, idxs)
 maxi = count
 last_ele = ele
 return results

This function returns the most common element, number of occurrences, and the indexes where it was found.

Here is the benchmark with 300000 eles.

But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?

My questions are:

Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort

python python-3.x performance sorting

asked Nov 16 '18 at 9:33

Hula Hula

154110

asked Nov 16 '18 at 9:33

Hula Hula

154110

asked Nov 16 '18 at 9:33

Hula Hula

154110

asked Nov 16 '18 at 9:33

Hula Hula

154110

asked Nov 16 '18 at 9:33

Hula Hula

154110

Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37

What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39

relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41

you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43

add a comment |

Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37

What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39

relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41

you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43

Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37

What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39

relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41

you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43

add a comment |

2 Answers
2

active

oldest

votes

did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

and the indices (the second iteration) can be found lazily in a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)

if you need to care about multiple elements being the most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common

for i, x in enumerate(eles):
 if x in indices:
 indices[x].append(i)

print(most_common)
print(indices)

you could of course still make the indices lazy the same way as above.

edited Nov 16 '18 at 14:06

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

@Chris_Rands was not quite there yet... sorry. updated now.

– hiro protagonist
Nov 16 '18 at 9:50

you fail to handle joint top elements e.g. eles = [1,1,2,2]

– Chris_Rands
Nov 16 '18 at 9:52

that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

– hiro protagonist
Nov 16 '18 at 9:55

add a comment |

If you are willing to use numpy, then you can do something like this:

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)

HTH.

edited Nov 18 '18 at 11:51

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

add a comment |

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53334994%2flazy-sorted-function-timsort%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

and the indices (the second iteration) can be found lazily in a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)

if you need to care about multiple elements being the most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common

for i, x in enumerate(eles):
 if x in indices:
 indices[x].append(i)

print(most_common)
print(indices)

you could of course still make the indices lazy the same way as above.

edited Nov 16 '18 at 14:06

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

@Chris_Rands was not quite there yet... sorry. updated now.

– hiro protagonist
Nov 16 '18 at 9:50

you fail to handle joint top elements e.g. eles = [1,1,2,2]

– Chris_Rands
Nov 16 '18 at 9:52

that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

– hiro protagonist
Nov 16 '18 at 9:55

add a comment |

did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

and the indices (the second iteration) can be found lazily in a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)

if you need to care about multiple elements being the most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common

for i, x in enumerate(eles):
 if x in indices:
 indices[x].append(i)

print(most_common)
print(indices)

you could of course still make the indices lazy the same way as above.

edited Nov 16 '18 at 14:06

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

@Chris_Rands was not quite there yet... sorry. updated now.

– hiro protagonist
Nov 16 '18 at 9:50

you fail to handle joint top elements e.g. eles = [1,1,2,2]

– Chris_Rands
Nov 16 '18 at 9:52

that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

– hiro protagonist
Nov 16 '18 at 9:55

add a comment |

did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

and the indices (the second iteration) can be found lazily in a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)

if you need to care about multiple elements being the most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common

for i, x in enumerate(eles):
 if x in indices:
 indices[x].append(i)

print(most_common)
print(indices)

you could of course still make the indices lazy the same way as above.

edited Nov 16 '18 at 14:06

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):

from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)

and the indices (the second iteration) can be found lazily in a generator expression:

indices = (i for i, x in enumerate(eles) if x == most_common_element)

if you need to care about multiple elements being the most common, this might work for you:

from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common

for i, x in enumerate(eles):
 if x in indices:
 indices[x].append(i)

print(most_common)
print(indices)

you could of course still make the indices lazy the same way as above.

edited Nov 16 '18 at 14:06

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

edited Nov 16 '18 at 14:06

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

answered Nov 16 '18 at 9:39

hiro protagonist

20.1k74162

@Chris_Rands was not quite there yet... sorry. updated now.

– hiro protagonist
Nov 16 '18 at 9:50

you fail to handle joint top elements e.g. eles = [1,1,2,2]

– Chris_Rands
Nov 16 '18 at 9:52

that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

– hiro protagonist
Nov 16 '18 at 9:55

add a comment |

@Chris_Rands was not quite there yet... sorry. updated now.

– hiro protagonist
Nov 16 '18 at 9:50

you fail to handle joint top elements e.g. eles = [1,1,2,2]

– Chris_Rands
Nov 16 '18 at 9:52

that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

– hiro protagonist
Nov 16 '18 at 9:55

@Chris_Rands was not quite there yet... sorry. updated now.

– hiro protagonist
Nov 16 '18 at 9:50

you fail to handle joint top elements e.g. eles = [1,1,2,2]

– Chris_Rands
Nov 16 '18 at 9:52

that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

– hiro protagonist
Nov 16 '18 at 9:55

add a comment |

If you are willing to use numpy, then you can do something like this:

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)

HTH.

edited Nov 18 '18 at 11:51

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

add a comment |

If you are willing to use numpy, then you can do something like this:

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)

HTH.

edited Nov 18 '18 at 11:51

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

add a comment |

If you are willing to use numpy, then you can do something like this:

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)

HTH.

edited Nov 18 '18 at 11:51

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

If you are willing to use numpy, then you can do something like this:

arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)

HTH.

edited Nov 18 '18 at 11:51

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

edited Nov 18 '18 at 11:51

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

answered Nov 16 '18 at 10:12

Deepak Saini

1,629816

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Myujth