Manipulating Pandas Dataframe

I have a DataFrame A with one column, location_ms. I want to split it by ; and : to get DataFrame B.

DataFrame A (Beginning):

[image: Beginning]

DataFrame B (Final):

[image: Final]



My code below seems very roundabout, and I would love to see a better implementation. By doing the splits, I create a Series in which each element is a list of lists. Then I flatten that list of lists to create the final DataFrame.



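import pandas as pd

def locpapersrc_table(df):
    # Split on ';', then split each piece on ':' -> one list of lists per row
    toflatten = df['location_ms'].str.split(';').apply(lambda x: [c.split(':') for c in x]).values.tolist()
    # Flatten the per-row lists into a single list of pieces
    singlelistoflist = [item for sublist in toflatten for item in sublist]
    tmp = pd.DataFrame(singlelistoflist)
    return tmp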
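The split-then-flatten approach above can also be written with `itertools.chain.from_iterable`, which flattens exactly one level of nesting without the nested list comprehension. This is a minimal sketch: the function name `split_flatten` is hypothetical, and the `location_ms` values are made up, since the original DataFrame A was only posted as an image.

```python
import itertools

import pandas as pd


def split_flatten(df: pd.DataFrame) -> pd.DataFrame:
    # One list of [left, right] pieces per row, e.g. [['A', '1'], ['B', '2']]
    nested = df['location_ms'].str.split(';').apply(lambda parts: [p.split(':') for p in parts])
    # chain.from_iterable flattens one level: rows of pieces -> one sequence of pieces
    flat = list(itertools.chain.from_iterable(nested))
    return pd.DataFrame(flat)


# Hypothetical input; the real DataFrame A was only shown as an image
A = pd.DataFrame({'location_ms': ['A:1;B:2', 'C:3']})
B = split_flatten(A)
print(B)
```

Each `;`-piece becomes its own row and each `:`-part its own column, which is the same layout the question's function builds.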


This version2 is slower than locpapersrc_table, but it is another method that is also very roundabout.



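def version2(df):
    # Split on ';' into columns, then transpose so each original row becomes a column
    xx = df["location_ms"].str.split(';', expand=True).T
    # Melt to long form, drop the padding for shorter rows, then split each piece on ':'
    tmp = pd.melt(xx).dropna().drop(['variable'], axis=1)['value'].str.split(':', expand=True)
    return tmp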
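A more direct route to the same one-row-per-`;`-piece layout is `Series.explode` (available since pandas 0.25), which turns each list element into its own row. This is a sketch of a swapped-in technique, not the asker's code; `split_explode` and the sample values are hypothetical.

```python
import pandas as pd


def split_explode(df: pd.DataFrame) -> pd.DataFrame:
    # ';' -> one list per row; explode() emits one row per list element,
    # repeating the original row's index label
    pieces = df['location_ms'].str.split(';').explode()
    # ':' -> one column per part of each piece
    out = pieces.str.split(':', expand=True)
    # Discard the repeated labels to get a 0..n-1 index like pd.DataFrame(list_of_lists)
    return out.reset_index(drop=True)


A = pd.DataFrame({'location_ms': ['A:1;B:2', 'C:3']})  # hypothetical data
B = split_explode(A)
print(B)
```

Calling `reset_index()` without `drop=True` instead keeps the original row number as a column, which can be handy for joining the pieces back to DataFrame A.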


Thank You!

  • Please do not post code or DataFrames as images; make them text, please.

    – U9-Forward
    Nov 16 '18 at 1:40

python pandas

edited Nov 16 '18 at 21:33
kradja

asked Nov 16 '18 at 1:34
kradja

1 Answer

Try something like this.



split_df = df['location_ms'].str.split(pat=";", expand=True)
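To see what `expand=True` produces, here is the same call on a small hypothetical frame (the values are made up). It returns a wide DataFrame with one column per `;`-piece, and rows with fewer pieces are padded with a missing value, so this step alone still keeps one row per original row.

```python
import pandas as pd

df = pd.DataFrame({'location_ms': ['A:1;B:2', 'C:3']})  # hypothetical data
split_df = df['location_ms'].str.split(pat=';', expand=True)
# One column per ';'-piece; the shorter second row is padded with a missing value
print(split_df)
```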


Throw in something like this if you want to merge it back into the original dataframe.



df = df.merge(split_df, left_index=True, right_index=True)
# drop() removes index labels (rows) by default; columns= targets the original column
df = df.drop(columns='location_ms')
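A variant of the merge-back step uses `DataFrame.join`, which aligns on the index like `merge(left_index=True, right_index=True)` but keeps every row of the left frame, together with `add_prefix` so the new columns get string names instead of `0, 1, ...`. The `part_` prefix is an arbitrary choice and the data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'location_ms': ['A:1;B:2', 'C:3']})  # hypothetical data
# add_prefix turns the integer column labels 0, 1 into 'part_0', 'part_1'
split_df = df['location_ms'].str.split(pat=';', expand=True).add_prefix('part_')
# join aligns on the index and keeps all rows of df; then drop the source column
df = df.join(split_df).drop(columns='location_ms')
print(df)
```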


For your new problem (splitting by ; and :):



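split_df = df['location_ms'].str.split(pat=";", expand=True)
# Split every ';'-piece column on ':' and lay the results side by side.
# pd.concat(axis=1) replaces the repeated merges, which add _x/_y suffixes to the
# clashing 0/1 column names and can produce duplicate names that newer pandas rejects
pieces = [split_df[col].str.split(pat=":", expand=True) for col in split_df.columns]
subsplit_df = pd.concat(pieces, axis=1)
subsplit_df.columns = range(subsplit_df.shape[1])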


You can merge it back in as above if you want.
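The asker's follow-up comments ask for something more direct than iterating over columns. One loop-free option, a different technique from this answer, is a regular expression with `Series.str.extractall`, which returns one row per match, giving the same one-row-per-`;`-piece layout as the question's own functions. This sketch assumes every piece looks like `left:right` with no `;` inside; pieces without a `:` are skipped, and a piece with several `:` keeps everything after the first one in the second column, so it is not a drop-in replacement for splitting. The pattern and sample data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'location_ms': ['A:1;B:2', 'C:3']})  # hypothetical data
# Each "left:right" piece between ';' separators is one regex match;
# the two capture groups become columns 0 and 1
pattern = r'([^;:]+):([^;]*)'
# extractall indexes matches by (original row, match number); drop that MultiIndex
long_df = df['location_ms'].str.extractall(pattern).reset_index(drop=True)
print(long_df)
```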

  • You need to split by both delimiters, the ";" and the ":"

    – kradja
    Nov 16 '18 at 19:04











  • This does not work, since delimiting by both characters gives you a list of lists, which then has to be manipulated into the format of the final DataFrame.

    – kradja
    Nov 16 '18 at 19:11











  • "I want to split by ; and ; to get DataFrame B" is a direct quote from your problem. I've edited the answer to match your new criteria.

    – CJR
    Nov 16 '18 at 19:41











  • Oops, sorry about that typo! If you look at the initial DataFrame and code, that sentence would not make sense. Sorry about the mistake! This is more roundabout than the code that I have. You should be using apply instead of iterating through the DataFrame.

    – kradja
    Nov 16 '18 at 21:30

edited Nov 16 '18 at 19:46

answered Nov 16 '18 at 1:54
CJR