How to calculate columns that have circular dependency in pandas dataframe?


















I have a pandas dataframe like this:



                Tstamp   Token     LTP  Cum_bsdiffs  Cum_ltpdiffs  counts  Entry  Correl  Exit  ltpchange  ltpcumchange  ltppercumchange
0  2018-10-29 11:40:33  415745  138.40          NaN           NaN       0      0     NaN     0          0             0                0
1  2018-10-29 11:40:34  415745  138.40       -200.0          0.00       1      0     NaN     0          0             0                0
2  2018-10-29 11:40:34  415745  138.35      -1437.0         -0.05       2      0     NaN     0          0             0                0
3  2018-10-29 11:40:36  415745  138.35      -1337.0         -0.05       3      0     NaN     0          0             0                0


The columns Entry, Exit, ltpchange and ltpcumchange are interdependent as follows:





  1. Entry becomes "Buy" or "Sell" based on a condition depending on other columns. Otherwise it remains 0.

  2. As soon as Entry becomes non-zero, ltpchange starts recording the changes in subsequent values of LTP. Otherwise it remains 0.

  3. ltpcumchange is the cumulative sum of ltpchange.

  4. As soon as ltpcumchange reaches a target value (in either direction), Exit becomes 1.

  5. Entry keeps the "Buy" or "Sell" value of its previous row until Exit becomes 1, after which it reverts to 0.




I have used iterrows() to implement this logic, but it is very slow. My dataframe contains more than 2 million rows, and the loop processes only about 5 rows per second.



I tried using dataframe column logic but failed to get the desired result. Can anyone help me out here?
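Because each row depends on the state left by the previous row, the rules above form a small state machine rather than a pure column expression. A minimal sketch of that state machine as a single pass over plain arrays follows. It assumes the raw entry condition has already been computed into a hypothetical `signal` sequence (0, "Buy" or "Sell" per row) and that `target` is a positive number; neither name comes from the question. It also assumes that ltpchange and ltpcumchange go back to 0 between trades and that the exit row itself still shows "Buy" or "Sell". Looping over arrays avoids building a Series for every row as iterrows() does, so it is usually much faster, though it is still an interpreted loop.

```python
import numpy as np


def simulate(ltp, signal, target):
    """Single pass over plain arrays; signal[i] is 0, "Buy" or "Sell" (assumed input)."""
    n = len(ltp)
    entry = np.zeros(n, dtype=object)    # 0, "Buy" or "Sell"
    exit_ = np.zeros(n, dtype=np.int64)  # 1 on the row where the target is reached
    change = np.zeros(n)                 # ltpchange
    cum = np.zeros(n)                    # ltpcumchange

    side = 0       # current position; 0 means no open trade
    running = 0.0  # cumulative LTP change since the entry row
    for i in range(n):
        if side == 0:
            # No open trade: take the precomputed signal for this row, if any.
            side = signal[i]
            running = 0.0
        else:
            # Open trade: record the LTP change relative to the previous row.
            change[i] = ltp[i] - ltp[i - 1]
            running += change[i]
            cum[i] = running
        entry[i] = side
        if side != 0 and abs(running) >= target:
            # Target reached in either direction: flag the exit, then go flat.
            exit_[i] = 1
            side = 0
    return entry, exit_, change, cum


# Made-up prices and signals, chosen so the sums are exact in floating point.
ltp = np.array([100.0, 100.0, 100.5, 101.0, 101.0, 100.0, 99.0])
signal = [0, "Buy", 0, 0, "Sell", 0, 0]
entry, exit_, change, cum = simulate(ltp, signal, target=1.0)
print(exit_.tolist())  # [0, 0, 0, 1, 0, 1, 0]
```

With a real dataframe the inputs would come from something like `df["LTP"].to_numpy()`, and the four returned arrays can be assigned back to the Entry, Exit, ltpchange and ltpcumchange columns in one step each.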
































  • It seems that your dataframe changes based only on those 4 columns, so you can exclude the rest. Also, given the above example, what would the desired output look like? Should you need help editing your question, take a look at how to create a Minimal, Complete, and Verifiable example.

    – zipa
    Nov 16 '18 at 12:04






  • I guess you want to create a trading robot and you don't want to share your strategy. But if you do not share your code, we cannot optimize it. iterrows() is slow; there might be a way to avoid using it.

    – Charles R
    Nov 16 '18 at 12:43











  • @CharlesR Yes, you are right, but I have mentioned all the logic that I want to be 'vectorized'.

    – Sagar Upadhyay
    Nov 16 '18 at 13:20











  • Maybe try to use .loc as much as possible to filter the rows you want to modify at each step of your code. NumPy is also good for vectorizing your code; try np.where, np.select or np.choose.

    – Charles R
    Nov 16 '18 at 13:38
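The part of the problem that does not depend on earlier rows, the raw entry condition, is where np.select fits. The question never states that condition, so the thresholds on Cum_bsdiffs below are purely hypothetical placeholders, and the integer codes (1 for "Buy", -1 for "Sell", 0 for no signal) are an illustrative choice that avoids mixing strings and numbers in one array. A rough sketch:

```python
import numpy as np
import pandas as pd

# The four sample rows from the question, reduced to the columns used here.
df = pd.DataFrame({
    "LTP": [138.40, 138.40, 138.35, 138.35],
    "Cum_bsdiffs": [np.nan, -200.0, -1437.0, -1337.0],
})

# Hypothetical entry rule, NOT taken from the question:
# buy on a strongly positive Cum_bsdiffs, sell on a strongly negative one.
conditions = [df["Cum_bsdiffs"] > 1000, df["Cum_bsdiffs"] < -1000]
codes = [1, -1]  # 1 = "Buy", -1 = "Sell"

# np.select takes the first matching code per row; rows matching no condition
# (including NaN rows, whose comparisons are False) get the default of 0.
df["signal"] = np.select(conditions, codes, default=0)
print(df["signal"].tolist())  # [0, 0, -1, -1]
```

This only yields the per-row signal. Holding the position until the exit target is reached still needs state carried from one row to the next, which np.select alone cannot express.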











  • iterrows() will indeed be slow. I recommend using numpy as well and modifying the elements of the array as you walk along its length.

    – kon_u
    Nov 16 '18 at 14:34
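One way to push the array idea further is to loop once per trade instead of once per row. Within a trade, the cumulative sum of row-to-row LTP changes telescopes to `ltp[j] - ltp[entry_row]` (up to floating-point rounding), so the exit row can be located with a vectorized comparison. The sketch below assumes a hypothetical `signal` array of integer codes (non-zero where the entry condition holds) and a positive `target`; both are illustrative names, not from the question.

```python
import numpy as np


def find_trades(ltp, signal, target):
    """Return (entry_row, exit_row) pairs; exit_row is None if the target is never hit."""
    ltp = np.asarray(ltp, dtype=float)
    candidates = np.flatnonzero(np.asarray(signal) != 0)  # rows with a raw signal
    trades = []
    pos = 0  # first row at which a new trade may start
    while True:
        # Index of the first signal row at or after pos.
        k = np.searchsorted(candidates, pos)
        if k == len(candidates):
            break
        start = int(candidates[k])
        # Cumulative change since entry, for every later row at once.
        hit = np.abs(ltp[start + 1:] - ltp[start]) >= target
        if not hit.any():
            trades.append((start, None))  # trade still open at the end of the data
            break
        stop = start + 1 + int(np.argmax(hit))  # first row that reaches the target
        trades.append((start, stop))
        pos = stop + 1  # signals during an open trade are skipped
    return trades


# Made-up prices and signal codes, chosen so the differences are exact.
ltp = [100.0, 100.0, 100.5, 101.0, 101.0, 100.0, 99.0]
signal = [0, 1, 0, 0, -1, 0, 0]
trades = find_trades(ltp, signal, target=1.0)
print(trades)  # [(1, 3), (4, 5)]
```

Each iteration scans the remainder of the array, so with very many short trades a bounded look-ahead window would probably be needed to keep the total cost down. Once the (entry, exit) pairs are known, the Entry, Exit, ltpchange and ltpcumchange columns can be filled slice by slice.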

















python pandas dataframe
















asked Nov 16 '18 at 11:49









Sagar Upadhyay























