Lazy sorted function, timsort










0















Having a list with N (large number) elements:



from random import randint

eles = [randint(0, 10) for i in range(3000000)]


I'm trying to implement the best way (performance/resources spent) this function below:



def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results


This function returns the most common element, number of occurrences, and the indexes where it was found.



Here is the benchmark with 300000 eles.



But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?



My questions are:



Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort










share|improve this question






















  • Try if you can use generators in this place.

    – Devendra Bhat
    Nov 16 '18 at 9:37











  • What is that function trying to do exactly?

    – SilverSlash
    Nov 16 '18 at 9:39











  • relevant stackoverflow.com/a/4154954/6260170

    – Chris_Rands
    Nov 16 '18 at 9:41











  • you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

    – Chris_Rands
    Nov 16 '18 at 9:43















0















Having a list with N (large number) elements:



from random import randint

eles = [randint(0, 10) for i in range(3000000)]


I'm trying to implement the best way (performance/resources spent) this function below:



def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results


This function returns the most common element, number of occurrences, and the indexes where it was found.



Here is the benchmark with 300000 eles.



But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?



My questions are:



Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort










share|improve this question






















  • Try if you can use generators in this place.

    – Devendra Bhat
    Nov 16 '18 at 9:37











  • What is that function trying to do exactly?

    – SilverSlash
    Nov 16 '18 at 9:39











  • relevant stackoverflow.com/a/4154954/6260170

    – Chris_Rands
    Nov 16 '18 at 9:41











  • you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

    – Chris_Rands
    Nov 16 '18 at 9:43













0












0








0


0






Having a list with N (large number) elements:



from random import randint

eles = [randint(0, 10) for i in range(3000000)]


I'm trying to implement the best way (performance/resources spent) this function below:



def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results


This function returns the most common element, number of occurrences, and the indexes where it was found.



Here is the benchmark with 300000 eles.



But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?



My questions are:



Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort










share|improve this question














Having a list with N (large number) elements:



from random import randint

eles = [randint(0, 10) for i in range(3000000)]


I'm trying to implement the best way (performance/resources spent) this function below:



def mosty(lst):
sort = sorted((v, k) for k, v in enumerate(lst))
count, maxi, last_ele, idxs = 0, 0, None,
for ele, idx in sort:
if(last_ele != ele):
count = 1
idxs =
idxs.append(idx)
if(last_ele == ele):
count += 1
if(maxi < count):
results = (ele, count, idxs)
maxi = count
last_ele = ele
return results


This function returns the most common element, number of occurrences, and the indexes where it was found.



Here is the benchmark with 300000 eles.



But I think I could improve, one of the reasons being python3 sorted function (timsort), if it returned a generator I didn't have to loop through the list twice right?



My questions are:



Is there any way for this code to be optimized? How?

With a lazy sorting I sure it would be, am I right? How can I implement lazy timsort







python python-3.x performance sorting






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 16 '18 at 9:33









Hula HulaHula Hula

154110




154110












  • Try if you can use generators in this place.

    – Devendra Bhat
    Nov 16 '18 at 9:37











  • What is that function trying to do exactly?

    – SilverSlash
    Nov 16 '18 at 9:39











  • relevant stackoverflow.com/a/4154954/6260170

    – Chris_Rands
    Nov 16 '18 at 9:41











  • you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

    – Chris_Rands
    Nov 16 '18 at 9:43

















  • Try if you can use generators in this place.

    – Devendra Bhat
    Nov 16 '18 at 9:37











  • What is that function trying to do exactly?

    – SilverSlash
    Nov 16 '18 at 9:39











  • relevant stackoverflow.com/a/4154954/6260170

    – Chris_Rands
    Nov 16 '18 at 9:41











  • you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

    – Chris_Rands
    Nov 16 '18 at 9:43
















Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37





Try if you can use generators in this place.

– Devendra Bhat
Nov 16 '18 at 9:37













What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39





What is that function trying to do exactly?

– SilverSlash
Nov 16 '18 at 9:39













relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41





relevant stackoverflow.com/a/4154954/6260170

– Chris_Rands
Nov 16 '18 at 9:41













you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43





you can certainly improve your implementation, but I think the lazy sorting idea is not the right way to do so

– Chris_Rands
Nov 16 '18 at 9:43












2 Answers
2






active

oldest

votes


















2














did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):



from collections import Counter
from random import randint

eles = [randint(0, 10) for i in range(30)]

counter = Counter(eles)
most_common_element, number_of_occurrences = counter.most_common(1)[0]
indices = [i for i, x in enumerate(eles) if x == most_common_element]

print(most_common_element, number_of_occurrences, indices)


and the indices (the second iteration) can be found lazily in a generator expression:



indices = (i for i, x in enumerate(eles) if x == most_common_element)



if you need to care about multiple elements being the most common, this might work for you:



from collections import Counter
from itertools import groupby
from operator import itemgetter

eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counter = Counter(eles)
_key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
most_common = dict(group)
indices = key: for key in most_common

for i, x in enumerate(eles):
if x in indices:
indices[x].append(i)

print(most_common)
print(indices)


you could of course still make the indices lazy the same way as above.






share|improve this answer

























  • @Chris_Rands was not quite there yet... sorry. updated now.

    – hiro protagonist
    Nov 16 '18 at 9:50











  • you fail to handle joint top elements e.g. eles = [1,1,2,2]

    – Chris_Rands
    Nov 16 '18 at 9:52











  • that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

    – hiro protagonist
    Nov 16 '18 at 9:55


















2














If you are willing to use numpy, then you can do something like this:



arr = np.array(eles)
values, counts = np.unique(arr, return_counts=True)
ind = np.argmax(counts)
most_common_elem, its_count = values[ind], counts[ind]
indices = np.where(arr == most_common_elem)


HTH.






share|improve this answer

























    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53334994%2flazy-sorted-function-timsort%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):



    from collections import Counter
    from random import randint

    eles = [randint(0, 10) for i in range(30)]

    counter = Counter(eles)
    most_common_element, number_of_occurrences = counter.most_common(1)[0]
    indices = [i for i, x in enumerate(eles) if x == most_common_element]

    print(most_common_element, number_of_occurrences, indices)


    and the indices (the second iteration) can be found lazily in a generator expression:



    indices = (i for i, x in enumerate(eles) if x == most_common_element)



    if you need to care about multiple elements being the most common, this might work for you:



    from collections import Counter
    from itertools import groupby
    from operator import itemgetter

    eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

    counter = Counter(eles)
    _key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
    most_common = dict(group)
    indices = key: for key in most_common

    for i, x in enumerate(eles):
    if x in indices:
    indices[x].append(i)

    print(most_common)
    print(indices)


    you could of course still make the indices lazy the same way as above.






    share|improve this answer

























    • @Chris_Rands was not quite there yet... sorry. updated now.

      – hiro protagonist
      Nov 16 '18 at 9:50











    • you fail to handle joint top elements e.g. eles = [1,1,2,2]

      – Chris_Rands
      Nov 16 '18 at 9:52











    • that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

      – hiro protagonist
      Nov 16 '18 at 9:55















    2














    did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):



    from collections import Counter
    from random import randint

    eles = [randint(0, 10) for i in range(30)]

    counter = Counter(eles)
    most_common_element, number_of_occurrences = counter.most_common(1)[0]
    indices = [i for i, x in enumerate(eles) if x == most_common_element]

    print(most_common_element, number_of_occurrences, indices)


    and the indices (the second iteration) can be found lazily in a generator expression:



    indices = (i for i, x in enumerate(eles) if x == most_common_element)



    if you need to care about multiple elements being the most common, this might work for you:



    from collections import Counter
    from itertools import groupby
    from operator import itemgetter

    eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

    counter = Counter(eles)
    _key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
    most_common = dict(group)
    indices = key: for key in most_common

    for i, x in enumerate(eles):
    if x in indices:
    indices[x].append(i)

    print(most_common)
    print(indices)


    you could of course still make the indices lazy the same way as above.






    share|improve this answer

























    • @Chris_Rands was not quite there yet... sorry. updated now.

      – hiro protagonist
      Nov 16 '18 at 9:50











    • you fail to handle joint top elements e.g. eles = [1,1,2,2]

      – Chris_Rands
      Nov 16 '18 at 9:52











    • that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

      – hiro protagonist
      Nov 16 '18 at 9:55













    2












    2








    2







    did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):



    from collections import Counter
    from random import randint

    eles = [randint(0, 10) for i in range(30)]

    counter = Counter(eles)
    most_common_element, number_of_occurrences = counter.most_common(1)[0]
    indices = [i for i, x in enumerate(eles) if x == most_common_element]

    print(most_common_element, number_of_occurrences, indices)


    and the indices (the second iteration) can be found lazily in a generator expression:



    indices = (i for i, x in enumerate(eles) if x == most_common_element)



    if you need to care about multiple elements being the most common, this might work for you:



    from collections import Counter
    from itertools import groupby
    from operator import itemgetter

    eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

    counter = Counter(eles)
    _key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
    most_common = dict(group)
    indices = key: for key in most_common

    for i, x in enumerate(eles):
    if x in indices:
    indices[x].append(i)

    print(most_common)
    print(indices)


    you could of course still make the indices lazy the same way as above.






    share|improve this answer















    did not do any benchmarks, but that should not perform that badly (even though it iterates twice over the list):



    from collections import Counter
    from random import randint

    eles = [randint(0, 10) for i in range(30)]

    counter = Counter(eles)
    most_common_element, number_of_occurrences = counter.most_common(1)[0]
    indices = [i for i, x in enumerate(eles) if x == most_common_element]

    print(most_common_element, number_of_occurrences, indices)


    and the indices (the second iteration) can be found lazily in a generator expression:



    indices = (i for i, x in enumerate(eles) if x == most_common_element)



    if you need to care about multiple elements being the most common, this might work for you:



    from collections import Counter
    from itertools import groupby
    from operator import itemgetter

    eles = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]

    counter = Counter(eles)
    _key, group = next(groupby(counter.most_common(), key=itemgetter(1)))
    most_common = dict(group)
    indices = key: for key in most_common

    for i, x in enumerate(eles):
    if x in indices:
    indices[x].append(i)

    print(most_common)
    print(indices)


    you could of course still make the indices lazy the same way as above.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 16 '18 at 14:06

























    answered Nov 16 '18 at 9:39









    hiro protagonisthiro protagonist

    20.1k74162




    20.1k74162












    • @Chris_Rands was not quite there yet... sorry. updated now.

      – hiro protagonist
      Nov 16 '18 at 9:50











    • you fail to handle joint top elements e.g. eles = [1,1,2,2]

      – Chris_Rands
      Nov 16 '18 at 9:52











    • that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

      – hiro protagonist
      Nov 16 '18 at 9:55

















    • @Chris_Rands was not quite there yet... sorry. updated now.

      – hiro protagonist
      Nov 16 '18 at 9:50











    • you fail to handle joint top elements e.g. eles = [1,1,2,2]

      – Chris_Rands
      Nov 16 '18 at 9:52











    • that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

      – hiro protagonist
      Nov 16 '18 at 9:55
















    @Chris_Rands was not quite there yet... sorry. updated now.

    – hiro protagonist
    Nov 16 '18 at 9:50





    @Chris_Rands was not quite there yet... sorry. updated now.

    – hiro protagonist
    Nov 16 '18 at 9:50













    you fail to handle joint top elements e.g. eles = [1,1,2,2]

    – Chris_Rands
    Nov 16 '18 at 9:52





    you fail to handle joint top elements e.g. eles = [1,1,2,2]

    – Chris_Rands
    Nov 16 '18 at 9:52













    that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

    – hiro protagonist
    Nov 16 '18 at 9:55





    that is correct. thanks for pointing out. that could be easily fixed though. not looking into that as long as i'm unsure the OP is interested in this variant...

    – hiro protagonist
    Nov 16 '18 at 9:55













    2














    If you are willing to use numpy, then you can do something like this:



    arr = np.array(eles)
    values, counts = np.unique(arr, return_counts=True)
    ind = np.argmax(counts)
    most_common_elem, its_count = values[ind], counts[ind]
    indices = np.where(arr == most_common_elem)


    HTH.






    share|improve this answer





























      2














      If you are willing to use numpy, then you can do something like this:



      arr = np.array(eles)
      values, counts = np.unique(arr, return_counts=True)
      ind = np.argmax(counts)
      most_common_elem, its_count = values[ind], counts[ind]
      indices = np.where(arr == most_common_elem)


      HTH.






      share|improve this answer



























        2












        2








        2







        If you are willing to use numpy, then you can do something like this:



        arr = np.array(eles)
        values, counts = np.unique(arr, return_counts=True)
        ind = np.argmax(counts)
        most_common_elem, its_count = values[ind], counts[ind]
        indices = np.where(arr == most_common_elem)


        HTH.






        share|improve this answer















        If you are willing to use numpy, then you can do something like this:



        arr = np.array(eles)
        values, counts = np.unique(arr, return_counts=True)
        ind = np.argmax(counts)
        most_common_elem, its_count = values[ind], counts[ind]
        indices = np.where(arr == most_common_elem)


        HTH.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 18 '18 at 11:51

























        answered Nov 16 '18 at 10:12









        Deepak SainiDeepak Saini

        1,629816




        1,629816



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53334994%2flazy-sorted-function-timsort%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Top Tejano songwriter Luis Silva dead of heart attack at 64

            政党

            天津地下鉄3号線