Pandas Dataframe interpreting columns as float instead of String










4















I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.



 ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


I want to read this column as String, but even if I specifiy it with



df=pd.read_csv(filename,dtype='ID': str)



I get



 ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00


Is there an easy way get the ID as a string without decimal like '12345'without having to edit the Strings after importing the table?










share|improve this question
























  • Is possible empty values in numeric columns?

    – jezrael
    Nov 13 '18 at 12:34











  • If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types.

    – jpp
    Nov 13 '18 at 13:18












  • I think you can upgrade your pandas version and all working nice.

    – jezrael
    Nov 13 '18 at 13:18











  • I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.

    – Georg B
    Nov 13 '18 at 13:24















4















I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.



 ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


I want to read this column as String, but even if I specifiy it with



df=pd.read_csv(filename,dtype='ID': str)



I get



 ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00


Is there an easy way get the ID as a string without decimal like '12345'without having to edit the Strings after importing the table?










share|improve this question
























  • Is possible empty values in numeric columns?

    – jezrael
    Nov 13 '18 at 12:34











  • If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types.

    – jpp
    Nov 13 '18 at 13:18












  • I think you can upgrade your pandas version and all working nice.

    – jezrael
    Nov 13 '18 at 13:18











  • I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.

    – Georg B
    Nov 13 '18 at 13:24













4












4








4








I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.



 ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


I want to read this column as String, but even if I specifiy it with



df=pd.read_csv(filename,dtype='ID': str)



I get



 ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00


Is there an easy way get the ID as a string without decimal like '12345'without having to edit the Strings after importing the table?










share|improve this question
















I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.



 ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


I want to read this column as String, but even if I specifiy it with



df=pd.read_csv(filename,dtype='ID': str)



I get



 ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00


Is there an easy way get the ID as a string without decimal like '12345'without having to edit the Strings after importing the table?







python python-3.x pandas dataframe types






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 13 '18 at 13:16









Malik Asad

283110




283110










asked Nov 13 '18 at 12:03









Georg BGeorg B

264




264












  • Is possible empty values in numeric columns?

    – jezrael
    Nov 13 '18 at 12:34











  • If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types.

    – jpp
    Nov 13 '18 at 13:18












  • I think you can upgrade your pandas version and all working nice.

    – jezrael
    Nov 13 '18 at 13:18











  • I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.

    – Georg B
    Nov 13 '18 at 13:24

















  • Is possible empty values in numeric columns?

    – jezrael
    Nov 13 '18 at 12:34











  • If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types.

    – jpp
    Nov 13 '18 at 13:18












  • I think you can upgrade your pandas version and all working nice.

    – jezrael
    Nov 13 '18 at 13:18











  • I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.

    – Georg B
    Nov 13 '18 at 13:24
















Is possible empty values in numeric columns?

– jezrael
Nov 13 '18 at 12:34





Is possible empty values in numeric columns?

– jezrael
Nov 13 '18 at 12:34













If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types.

– jpp
Nov 13 '18 at 13:18






If your concern is output format, then fix this when you export the data (e.g. to_csv, to_string), not by changing your underlying data (which looks fine) to awkward types.

– jpp
Nov 13 '18 at 13:18














I think you can upgrade your pandas version and all working nice.

– jezrael
Nov 13 '18 at 13:18





I think you can upgrade your pandas version and all working nice.

– jezrael
Nov 13 '18 at 13:18













I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.

– Georg B
Nov 13 '18 at 13:24





I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.

– Georg B
Nov 13 '18 at 13:24












3 Answers
3






active

oldest

votes


















1














A solution could be this, but after you have imported the df:



df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)


Or since there are NaN with:



df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))





share|improve this answer




















  • 1





    Doesn't work, because I have empty cells, and NaN values can't be converted to int

    – Georg B
    Nov 13 '18 at 12:26











  • @GeorgB I've updated the answer

    – Joe
    Nov 13 '18 at 12:47






  • 1





    That worked thank you. Was trying smthng similar but yours works way better

    – Georg B
    Nov 13 '18 at 12:54


















1














Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


EDIT:



For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00





share|improve this answer

























  • It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

    – Georg B
    Nov 13 '18 at 12:46











  • @GeorgB - what is expected output in ID column instead empty string?

    – jezrael
    Nov 13 '18 at 12:50











  • the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

    – Georg B
    Nov 13 '18 at 12:59











  • @GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

    – jezrael
    Nov 13 '18 at 13:04











  • @GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

    – jezrael
    Nov 13 '18 at 13:09


















0














Specify float format when writing to csv



Since your underlying problem is output format when exporting data, no manipulation is required. Just use:



df.to_csv('file.csv', float_format='%.0f')


Since you want only specific columns to have this formatting you can use to_string:



def format_int(x):
return f'x:.0f' if x==x else ''

with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))


Keep numeric data numeric




There is a column with IDs, which consist of only numbers




If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.



Float NaN prompts upcasting



Your issue is NaN values can't coexist with integers in a numeric series. Since NaN is a float, Pandas forces upcasting. This is natural, because the object dtype alternative is inefficient and not recommended.



If viable, you can use a sentinel value, e.g. -1 to indicate nulls:



df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)

print(df)

ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00





share|improve this answer

























  • If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

    – jezrael
    Nov 13 '18 at 12:40











  • @jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

    – jpp
    Nov 13 '18 at 12:40







  • 1





    OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

    – jezrael
    Nov 13 '18 at 12:46






  • 1





    I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

    – Georg B
    Nov 13 '18 at 12:49






  • 1





    I donvote becasue don't convert to strings! is wrong statement.

    – jezrael
    Nov 13 '18 at 13:16











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53280650%2fpandas-dataframe-interpreting-columns-as-float-instead-of-string%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














A solution could be this, but after you have imported the df:



df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)


Or since there are NaN with:



df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))





share|improve this answer




















  • 1





    Doesn't work, because I have empty cells, and NaN values can't be converted to int

    – Georg B
    Nov 13 '18 at 12:26











  • @GeorgB I've updated the answer

    – Joe
    Nov 13 '18 at 12:47






  • 1





    That worked thank you. Was trying smthng similar but yours works way better

    – Georg B
    Nov 13 '18 at 12:54















1














A solution could be this, but after you have imported the df:



df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)


Or since there are NaN with:



df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))





share|improve this answer




















  • 1





    Doesn't work, because I have empty cells, and NaN values can't be converted to int

    – Georg B
    Nov 13 '18 at 12:26











  • @GeorgB I've updated the answer

    – Joe
    Nov 13 '18 at 12:47






  • 1





    That worked thank you. Was trying smthng similar but yours works way better

    – Georg B
    Nov 13 '18 at 12:54













1












1








1







A solution could be this, but after you have imported the df:



df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)


Or since there are NaN with:



df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))





share|improve this answer















A solution could be this, but after you have imported the df:



df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)


Or since there are NaN with:



df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 13 '18 at 12:46

























answered Nov 13 '18 at 12:23









JoeJoe

5,89621129




5,89621129







  • 1





    Doesn't work, because I have empty cells, and NaN values can't be converted to int

    – Georg B
    Nov 13 '18 at 12:26











  • @GeorgB I've updated the answer

    – Joe
    Nov 13 '18 at 12:47






  • 1





    That worked thank you. Was trying smthng similar but yours works way better

    – Georg B
    Nov 13 '18 at 12:54












  • 1





    Doesn't work, because I have empty cells, and NaN values can't be converted to int

    – Georg B
    Nov 13 '18 at 12:26











  • @GeorgB I've updated the answer

    – Joe
    Nov 13 '18 at 12:47






  • 1





    That worked thank you. Was trying smthng similar but yours works way better

    – Georg B
    Nov 13 '18 at 12:54







1




1





Doesn't work, because I have empty cells, and NaN values can't be converted to int

– Georg B
Nov 13 '18 at 12:26





Doesn't work, because I have empty cells, and NaN values can't be converted to int

– Georg B
Nov 13 '18 at 12:26













@GeorgB I've updated the answer

– Joe
Nov 13 '18 at 12:47





@GeorgB I've updated the answer

– Joe
Nov 13 '18 at 12:47




1




1





That worked thank you. Was trying smthng similar but yours works way better

– Georg B
Nov 13 '18 at 12:54





That worked thank you. Was trying smthng similar but yours works way better

– Georg B
Nov 13 '18 at 12:54













1














Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


EDIT:



For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00





share|improve this answer

























  • It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

    – Georg B
    Nov 13 '18 at 12:46











  • @GeorgB - what is expected output in ID column instead empty string?

    – jezrael
    Nov 13 '18 at 12:50











  • the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

    – Georg B
    Nov 13 '18 at 12:59











  • @GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

    – jezrael
    Nov 13 '18 at 13:04











  • @GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

    – jezrael
    Nov 13 '18 at 13:09















1














Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


EDIT:



For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00





share|improve this answer

























  • It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

    – Georg B
    Nov 13 '18 at 12:46











  • @GeorgB - what is expected output in ID column instead empty string?

    – jezrael
    Nov 13 '18 at 12:50











  • the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

    – Georg B
    Nov 13 '18 at 12:59











  • @GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

    – jezrael
    Nov 13 '18 at 13:04











  • @GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

    – jezrael
    Nov 13 '18 at 13:09













1












1








1







Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


EDIT:



For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00





share|improve this answer















Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00


EDIT:



For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:



import pandas as pd

temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 13 '18 at 13:08

























answered Nov 13 '18 at 12:30









jezraeljezrael

324k22266342




324k22266342












  • It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

    – Georg B
    Nov 13 '18 at 12:46











  • @GeorgB - what is expected output in ID column instead empty string?

    – jezrael
    Nov 13 '18 at 12:50











  • the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

    – Georg B
    Nov 13 '18 at 12:59











  • @GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

    – jezrael
    Nov 13 '18 at 13:04











  • @GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

    – jezrael
    Nov 13 '18 at 13:09

















  • It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

    – Georg B
    Nov 13 '18 at 12:46











  • @GeorgB - what is expected output in ID column instead empty string?

    – jezrael
    Nov 13 '18 at 12:50











  • the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

    – Georg B
    Nov 13 '18 at 12:59











  • @GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

    – jezrael
    Nov 13 '18 at 13:04











  • @GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

    – jezrael
    Nov 13 '18 at 13:09
















It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

– Georg B
Nov 13 '18 at 12:46





It works for your example, but not my csv file. Only difference to previous result is that NaN became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.

– Georg B
Nov 13 '18 at 12:46













@GeorgB - what is expected output in ID column instead empty string?

– jezrael
Nov 13 '18 at 12:50





@GeorgB - what is expected output in ID column instead empty string?

– jezrael
Nov 13 '18 at 12:50













the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

– Georg B
Nov 13 '18 at 12:59





the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.

– Georg B
Nov 13 '18 at 12:59













@GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

– jezrael
Nov 13 '18 at 13:04





@GeorgB - df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x))) is your solution?

– jezrael
Nov 13 '18 at 13:04













@GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

– jezrael
Nov 13 '18 at 13:09





@GeorgB - What is your pandas version? Because in pandas 0.23.4 it working nice like you need.

– jezrael
Nov 13 '18 at 13:09











0














Specify float format when writing to csv



Since your underlying problem is output format when exporting data, no manipulation is required. Just use:



df.to_csv('file.csv', float_format='%.0f')


Since you want only specific columns to have this formatting you can use to_string:



def format_int(x):
return f'x:.0f' if x==x else ''

with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))


Keep numeric data numeric




There is a column with IDs, which consist of only numbers




If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.



Float NaN prompts upcasting



Your issue is NaN values can't coexist with integers in a numeric series. Since NaN is a float, Pandas forces upcasting. This is natural, because the object dtype alternative is inefficient and not recommended.



If viable, you can use a sentinel value, e.g. -1 to indicate nulls:



df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)

print(df)

ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00





share|improve this answer

























  • If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

    – jezrael
    Nov 13 '18 at 12:40











  • @jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

    – jpp
    Nov 13 '18 at 12:40







  • 1





    OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

    – jezrael
    Nov 13 '18 at 12:46






  • 1





    I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

    – Georg B
    Nov 13 '18 at 12:49






  • 1





    I donvote becasue don't convert to strings! is wrong statement.

    – jezrael
    Nov 13 '18 at 13:16
















0














Specify float format when writing to csv



Since your underlying problem is output format when exporting data, no manipulation is required. Just use:



df.to_csv('file.csv', float_format='%.0f')


Since you want only specific columns to have this formatting you can use to_string:



def format_int(x):
return f'x:.0f' if x==x else ''

with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))


Keep numeric data numeric




There is a column with IDs, which consist of only numbers




If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.



Float NaN prompts upcasting



Your issue is NaN values can't coexist with integers in a numeric series. Since NaN is a float, Pandas forces upcasting. This is natural, because the object dtype alternative is inefficient and not recommended.



If viable, you can use a sentinel value, e.g. -1 to indicate nulls:



df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)

print(df)

ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00





share|improve this answer

























  • If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

    – jezrael
    Nov 13 '18 at 12:40











  • @jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

    – jpp
    Nov 13 '18 at 12:40







  • 1





    OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

    – jezrael
    Nov 13 '18 at 12:46






  • 1





    I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

    – Georg B
    Nov 13 '18 at 12:49






  • 1





    I donvote becasue don't convert to strings! is wrong statement.

    – jezrael
    Nov 13 '18 at 13:16














0












0








0







Specify float format when writing to csv



Since your underlying problem is output format when exporting data, no manipulation is required. Just use:



df.to_csv('file.csv', float_format='%.0f')


Since you want only specific columns to have this formatting you can use to_string:



def format_int(x):
return f'x:.0f' if x==x else ''

with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))


Keep numeric data numeric




There is a column with IDs, which consist of only numbers




If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.



Float NaN prompts upcasting



Your issue is NaN values can't coexist with integers in a numeric series. Since NaN is a float, Pandas forces upcasting. This is natural, because the object dtype alternative is inefficient and not recommended.



If viable, you can use a sentinel value, e.g. -1 to indicate nulls:



df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)

print(df)

ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00





share|improve this answer















Specify float format when writing to csv



Since your underlying problem is output format when exporting data, no manipulation is required. Just use:



df.to_csv('file.csv', float_format='%.0f')


Since you want only specific columns to have this formatting you can use to_string:



def format_int(x):
return f'x:.0f' if x==x else ''

with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))


Keep numeric data numeric




There is a column with IDs, which consist of only numbers




If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.



Float NaN prompts upcasting



Your issue is NaN values can't coexist with integers in a numeric series. Since NaN is a float, Pandas forces upcasting. This is natural, because the object dtype alternative is inefficient and not recommended.



If viable, you can use a sentinel value, e.g. -1 to indicate nulls:



df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)

print(df)

ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 13 '18 at 13:06

























answered Nov 13 '18 at 12:28









jppjpp

94.3k2156108




94.3k2156108












  • If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

    – jezrael
    Nov 13 '18 at 12:40











  • @jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

    – jpp
    Nov 13 '18 at 12:40







  • 1





    OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

    – jezrael
    Nov 13 '18 at 12:46






  • 1





    I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

    – Georg B
    Nov 13 '18 at 12:49






  • 1





    I donvote becasue don't convert to strings! is wrong statement.

    – jezrael
    Nov 13 '18 at 13:16


















  • If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

    – jezrael
    Nov 13 '18 at 12:40











  • @jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

    – jpp
    Nov 13 '18 at 12:40







  • 1





    OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

    – jezrael
    Nov 13 '18 at 12:46






  • 1





    I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

    – Georg B
    Nov 13 '18 at 12:49






  • 1





    I donvote becasue don't convert to strings! is wrong statement.

    – jezrael
    Nov 13 '18 at 13:16

















If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

– jezrael
Nov 13 '18 at 12:40





If your column only includes numbers, don't convert to strings! - it OP need convert numeric to strings, why not? What is wrong about it?

– jezrael
Nov 13 '18 at 12:40













@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

– jpp
Nov 13 '18 at 12:40






@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."

– jpp
Nov 13 '18 at 12:40





1




1





OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

– jezrael
Nov 13 '18 at 12:46





OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.

– jezrael
Nov 13 '18 at 12:46




1




1





I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

– Georg B
Nov 13 '18 at 12:49





I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.

– Georg B
Nov 13 '18 at 12:49




1




1





I donvote becasue don't convert to strings! is wrong statement.

– jezrael
Nov 13 '18 at 13:16






I donvote becasue don't convert to strings! is wrong statement.

– jezrael
Nov 13 '18 at 13:16


















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53280650%2fpandas-dataframe-interpreting-columns-as-float-instead-of-string%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Top Tejano songwriter Luis Silva dead of heart attack at 64

政党

天津地下鉄3号線