Pandas Dataframe interpreting columns as float instead of String
I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
I want to read this column as String, but even if I specifiy it with
df=pd.read_csv(filename,dtype='ID': str)
I get
ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00
Is there an easy way get the ID as a string without decimal like '12345'
without having to edit the Strings after importing the table?
python python-3.x pandas dataframe types
add a comment |
I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
I want to read this column as String, but even if I specifiy it with
df=pd.read_csv(filename,dtype='ID': str)
I get
ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00
Is there an easy way get the ID as a string without decimal like '12345'
without having to edit the Strings after importing the table?
python python-3.x pandas dataframe types
Is possible empty values in numeric columns?
– jezrael
Nov 13 '18 at 12:34
If your concern is output format, then fix this when you export the data (e.g.to_csv
,to_string
), not by changing your underlying data (which looks fine) to awkward types.
– jpp
Nov 13 '18 at 13:18
I think you can upgrade your pandas version and all working nice.
– jezrael
Nov 13 '18 at 13:18
I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.
– Georg B
Nov 13 '18 at 13:24
add a comment |
I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
I want to read this column as String, but even if I specifiy it with
df=pd.read_csv(filename,dtype='ID': str)
I get
ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00
Is there an easy way get the ID as a string without decimal like '12345'
without having to edit the Strings after importing the table?
python python-3.x pandas dataframe types
I want to import a csv file into a pandas dataframe. There is a column with IDs, which consist of only numbers, but not every row has an ID.
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
I want to read this column as String, but even if I specifiy it with
df=pd.read_csv(filename,dtype='ID': str)
I get
ID xyz
0 '12345.0' 4.56
1 NaN 45.60
2 '54231.0' 987.00
Is there an easy way get the ID as a string without decimal like '12345'
without having to edit the Strings after importing the table?
python python-3.x pandas dataframe types
python python-3.x pandas dataframe types
edited Nov 13 '18 at 13:16
Malik Asad
283110
283110
asked Nov 13 '18 at 12:03
Georg BGeorg B
264
264
Is possible empty values in numeric columns?
– jezrael
Nov 13 '18 at 12:34
If your concern is output format, then fix this when you export the data (e.g.to_csv
,to_string
), not by changing your underlying data (which looks fine) to awkward types.
– jpp
Nov 13 '18 at 13:18
I think you can upgrade your pandas version and all working nice.
– jezrael
Nov 13 '18 at 13:18
I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.
– Georg B
Nov 13 '18 at 13:24
add a comment |
Is possible empty values in numeric columns?
– jezrael
Nov 13 '18 at 12:34
If your concern is output format, then fix this when you export the data (e.g.to_csv
,to_string
), not by changing your underlying data (which looks fine) to awkward types.
– jpp
Nov 13 '18 at 13:18
I think you can upgrade your pandas version and all working nice.
– jezrael
Nov 13 '18 at 13:18
I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.
– Georg B
Nov 13 '18 at 13:24
Is possible empty values in numeric columns?
– jezrael
Nov 13 '18 at 12:34
Is possible empty values in numeric columns?
– jezrael
Nov 13 '18 at 12:34
If your concern is output format, then fix this when you export the data (e.g.
to_csv
, to_string
), not by changing your underlying data (which looks fine) to awkward types.– jpp
Nov 13 '18 at 13:18
If your concern is output format, then fix this when you export the data (e.g.
to_csv
, to_string
), not by changing your underlying data (which looks fine) to awkward types.– jpp
Nov 13 '18 at 13:18
I think you can upgrade your pandas version and all working nice.
– jezrael
Nov 13 '18 at 13:18
I think you can upgrade your pandas version and all working nice.
– jezrael
Nov 13 '18 at 13:18
I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.
– Georg B
Nov 13 '18 at 13:24
I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.
– Georg B
Nov 13 '18 at 13:24
add a comment |
3 Answers
3
active
oldest
votes
A solution could be this, but after you have imported the df:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Or since there are NaN
with:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
1
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
1
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
add a comment |
Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False
for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
EDIT:
For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00
It works for your example, but not my csv file. Only difference to previous result is thatNaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.
– Georg B
Nov 13 '18 at 12:46
@GeorgB - what is expected output inID
column instead empty string?
– jezrael
Nov 13 '18 at 12:50
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
@GeorgB -df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?
– jezrael
Nov 13 '18 at 13:04
@GeorgB - What is your pandas version? Because in pandas0.23.4
it working nice like you need.
– jezrael
Nov 13 '18 at 13:09
|
show 1 more comment
Specify float format when writing to csv
Since your underlying problem is output format when exporting data, no manipulation is required. Just use:
df.to_csv('file.csv', float_format='%.0f')
Since you want only specific columns to have this formatting you can use to_string
:
def format_int(x):
return f'x:.0f' if x==x else ''
with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))
Keep numeric data numeric
There is a column with IDs, which consist of only numbers
If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.
Float NaN
prompts upcasting
Your issue is NaN
values can't coexist with integers in a numeric series. Since NaN
is a float
, Pandas forces upcasting. This is natural, because the object
dtype alternative is inefficient and not recommended.
If viable, you can use a sentinel value, e.g. -1
to indicate nulls:
df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)
print(df)
ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?
– jezrael
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
1
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
1
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
1
I donvote becasuedon't convert to strings!
is wrong statement.
– jezrael
Nov 13 '18 at 13:16
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53280650%2fpandas-dataframe-interpreting-columns-as-float-instead-of-string%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
A solution could be this, but after you have imported the df:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Or since there are NaN
with:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
1
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
1
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
add a comment |
A solution could be this, but after you have imported the df:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Or since there are NaN
with:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
1
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
1
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
add a comment |
A solution could be this, but after you have imported the df:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Or since there are NaN
with:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
A solution could be this, but after you have imported the df:
df = pd.read_csv(filename)
df['ID'] = df['ID'].astype(int).astype(str)
Or since there are NaN
with:
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
edited Nov 13 '18 at 12:46
answered Nov 13 '18 at 12:23
JoeJoe
5,89621129
5,89621129
1
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
1
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
add a comment |
1
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
1
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
1
1
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
Doesn't work, because I have empty cells, and NaN values can't be converted to int
– Georg B
Nov 13 '18 at 12:26
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
@GeorgB I've updated the answer
– Joe
Nov 13 '18 at 12:47
1
1
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
That worked thank you. Was trying smthng similar but yours works way better
– Georg B
Nov 13 '18 at 12:54
add a comment |
Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False
for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
EDIT:
For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00
It works for your example, but not my csv file. Only difference to previous result is thatNaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.
– Georg B
Nov 13 '18 at 12:46
@GeorgB - what is expected output inID
column instead empty string?
– jezrael
Nov 13 '18 at 12:50
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
@GeorgB -df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?
– jezrael
Nov 13 '18 at 13:04
@GeorgB - What is your pandas version? Because in pandas0.23.4
it working nice like you need.
– jezrael
Nov 13 '18 at 13:09
|
show 1 more comment
Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False
for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
EDIT:
For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00
It works for your example, but not my csv file. Only difference to previous result is thatNaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.
– Georg B
Nov 13 '18 at 12:46
@GeorgB - what is expected output inID
column instead empty string?
– jezrael
Nov 13 '18 at 12:50
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
@GeorgB -df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?
– jezrael
Nov 13 '18 at 13:04
@GeorgB - What is your pandas version? Because in pandas0.23.4
it working nice like you need.
– jezrael
Nov 13 '18 at 13:09
|
show 1 more comment
Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False
for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
EDIT:
For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00
Possible solution if missing values are not in numeric columns - ad parameter keep_default_na=False
for not convert empty values to strings, but it NOT convert to NaNs in all data, not always in first column, check also docs:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str, keep_default_na=False)
print (df)
ID xyz
0 12345 4.56
1 45.60
2 54231 987.00
EDIT:
For me in pandas 0.23.4 working your solution perfectly, so it means bug in lower pandas versions:
import pandas as pd
temp=u"""ID;xyz
0;12345;4.56
1;;45.60
2;54231;987.00"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", dtype='ID': str)
print (df)
ID xyz
0 12345 4.56
1 NaN 45.60
2 54231 987.00
edited Nov 13 '18 at 13:08
answered Nov 13 '18 at 12:30
jezraeljezrael
324k22266342
324k22266342
It works for your example, but not my csv file. Only difference to previous result is thatNaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.
– Georg B
Nov 13 '18 at 12:46
@GeorgB - what is expected output inID
column instead empty string?
– jezrael
Nov 13 '18 at 12:50
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
@GeorgB -df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?
– jezrael
Nov 13 '18 at 13:04
@GeorgB - What is your pandas version? Because in pandas0.23.4
it working nice like you need.
– jezrael
Nov 13 '18 at 13:09
|
show 1 more comment
It works for your example, but not my csv file. Only difference to previous result is thatNaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.
– Georg B
Nov 13 '18 at 12:46
@GeorgB - what is expected output inID
column instead empty string?
– jezrael
Nov 13 '18 at 12:50
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
@GeorgB -df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?
– jezrael
Nov 13 '18 at 13:04
@GeorgB - What is your pandas version? Because in pandas0.23.4
it working nice like you need.
– jezrael
Nov 13 '18 at 13:09
It works for your example, but not my csv file. Only difference to previous result is that
NaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.– Georg B
Nov 13 '18 at 12:46
It works for your example, but not my csv file. Only difference to previous result is that
NaN
became an empty string. I am really confused, I checked my file again, but there are definetly no floats there.– Georg B
Nov 13 '18 at 12:46
@GeorgB - what is expected output in
ID
column instead empty string?– jezrael
Nov 13 '18 at 12:50
@GeorgB - what is expected output in
ID
column instead empty string?– jezrael
Nov 13 '18 at 12:50
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
the empty columns don't matter, as long as I have an easy way to filter them out. I only need the non empty IDs as String without a ".0" at the end. The user Joe gave an answer that worked, so I can continue. Just have the feeling there is a way to do it while reading in the file and not afterwards.
– Georg B
Nov 13 '18 at 12:59
@GeorgB -
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?– jezrael
Nov 13 '18 at 13:04
@GeorgB -
df['ID'] = df['ID'].apply(lambda x: x if pd.isnull(x) else str(int(x)))
is your solution?– jezrael
Nov 13 '18 at 13:04
@GeorgB - What is your pandas version? Because in pandas
0.23.4
it working nice like you need.– jezrael
Nov 13 '18 at 13:09
@GeorgB - What is your pandas version? Because in pandas
0.23.4
it working nice like you need.– jezrael
Nov 13 '18 at 13:09
|
show 1 more comment
Specify float format when writing to csv
Since your underlying problem is output format when exporting data, no manipulation is required. Just use:
df.to_csv('file.csv', float_format='%.0f')
Since you want only specific columns to have this formatting you can use to_string
:
def format_int(x):
return f'x:.0f' if x==x else ''
with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))
Keep numeric data numeric
There is a column with IDs, which consist of only numbers
If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.
Float NaN
prompts upcasting
Your issue is NaN
values can't coexist with integers in a numeric series. Since NaN
is a float
, Pandas forces upcasting. This is natural, because the object
dtype alternative is inefficient and not recommended.
If viable, you can use a sentinel value, e.g. -1
to indicate nulls:
df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)
print(df)
ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?
– jezrael
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
1
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
1
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
1
I donvote becasuedon't convert to strings!
is wrong statement.
– jezrael
Nov 13 '18 at 13:16
add a comment |
Specify float format when writing to csv
Since your underlying problem is output format when exporting data, no manipulation is required. Just use:
df.to_csv('file.csv', float_format='%.0f')
Since you want only specific columns to have this formatting you can use to_string
:
def format_int(x):
return f'x:.0f' if x==x else ''
with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))
Keep numeric data numeric
There is a column with IDs, which consist of only numbers
If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.
Float NaN
prompts upcasting
Your issue is NaN
values can't coexist with integers in a numeric series. Since NaN
is a float
, Pandas forces upcasting. This is natural, because the object
dtype alternative is inefficient and not recommended.
If viable, you can use a sentinel value, e.g. -1
to indicate nulls:
df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)
print(df)
ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?
– jezrael
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
1
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
1
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
1
I donvote becasuedon't convert to strings!
is wrong statement.
– jezrael
Nov 13 '18 at 13:16
add a comment |
Specify float format when writing to csv
Since your underlying problem is output format when exporting data, no manipulation is required. Just use:
df.to_csv('file.csv', float_format='%.0f')
Since you want only specific columns to have this formatting you can use to_string
:
def format_int(x):
return f'x:.0f' if x==x else ''
with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))
Keep numeric data numeric
There is a column with IDs, which consist of only numbers
If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.
Float NaN
prompts upcasting
Your issue is NaN
values can't coexist with integers in a numeric series. Since NaN
is a float
, Pandas forces upcasting. This is natural, because the object
dtype alternative is inefficient and not recommended.
If viable, you can use a sentinel value, e.g. -1
to indicate nulls:
df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)
print(df)
ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00
Specify float format when writing to csv
Since your underlying problem is output format when exporting data, no manipulation is required. Just use:
df.to_csv('file.csv', float_format='%.0f')
Since you want only specific columns to have this formatting you can use to_string
:
def format_int(x):
return f'x:.0f' if x==x else ''
with open('file.csv', 'w') as fout:
fout.write(df.to_string(formatters='ID': format_int))
Keep numeric data numeric
There is a column with IDs, which consist of only numbers
If your column only includes numbers, don't convert to strings! Your desire to convert to strings seems an XY problem. Numeric identifiers should stay numeric.
Float NaN
prompts upcasting
Your issue is NaN
values can't coexist with integers in a numeric series. Since NaN
is a float
, Pandas forces upcasting. This is natural, because the object
dtype alternative is inefficient and not recommended.
If viable, you can use a sentinel value, e.g. -1
to indicate nulls:
df['ID'] = pd.to_numeric(df['ID'], errors='coerce').fillna(-1).astype(int)
print(df)
ID xyz
0 12345 4.56
1 -1 45.60
2 54231 987.00
edited Nov 13 '18 at 13:06
answered Nov 13 '18 at 12:28
jppjpp
94.3k2156108
94.3k2156108
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?
– jezrael
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
1
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
1
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
1
I donvote becasuedon't convert to strings!
is wrong statement.
– jezrael
Nov 13 '18 at 13:16
add a comment |
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?
– jezrael
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
1
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
1
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
1
I donvote becasuedon't convert to strings!
is wrong statement.
– jezrael
Nov 13 '18 at 13:16
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?– jezrael
Nov 13 '18 at 12:40
If your column only includes numbers, don't convert to strings!
- it OP need convert numeric to strings, why not? What is wrong about it?– jezrael
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
@jezrael, XY problem: "The XY problem is asking about your attempted solution rather than your actual problem."
– jpp
Nov 13 '18 at 12:40
1
1
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
OK, please add your commnet about XY problem to comment under question, but if need convert to strings numeric column it is absolutely not wrong.
– jezrael
Nov 13 '18 at 12:46
1
1
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
I need them as strings, or at least integers that can be converted to strings. I will try your method if I dont find another option, but I'll have to remove the -1 every time I save the file.
– Georg B
Nov 13 '18 at 12:49
1
1
I donvote becasue
don't convert to strings!
is wrong statement.– jezrael
Nov 13 '18 at 13:16
I donvote becasue
don't convert to strings!
is wrong statement.– jezrael
Nov 13 '18 at 13:16
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53280650%2fpandas-dataframe-interpreting-columns-as-float-instead-of-string%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Is possible empty values in numeric columns?
– jezrael
Nov 13 '18 at 12:34
If your concern is output format, then fix this when you export the data (e.g.
to_csv
,to_string
), not by changing your underlying data (which looks fine) to awkward types.– jpp
Nov 13 '18 at 13:18
I think you can upgrade your pandas version and all working nice.
– jezrael
Nov 13 '18 at 13:18
I mean my underlying data is a csv file with an ID that is not ment to be taken numeric but as the name suggest as an identification. String seems to be the best representation for that.
– Georg B
Nov 13 '18 at 13:24