Read in a CSV and recognize em dash (u'\u2014') and en dash (u'\u2013') in Python










I am trying to read in a CSV file containing a lot of text with em dashes and/or en dashes (not to be confused with the regular hyphen-minus). The problem is that every time I read in this CSV, the dashes are turned into the replacement character (�). If I try to encode or decode the file, I just get error messages saying that UTF-8 doesn't recognize the dashes. Should I try writing the CSV file from Python instead? This seems like a really simple problem that should be easy to fix.



My code is:



import pandas as pd
df = pd.read_csv('csv file with em dash or en dash')
print(df)


My output is:



col_name
� �


I have tried replacing the dashes after the file has been read in, but that isn't working. I have also tried replacing the replacement character, but that hasn't worked either. My ideal solution would be for the dashes to show up just as they are in the CSV file. I think it has something to do with how the file is being read into Python, but whenever I try an encoder/decoder, I just get errors that the dashes aren't supported.










  • Python 2 or Python 3? What happens if you write print(u"\u2014")? Is the dash output correctly? If you are on Windows, do you know about chcp? See superuser.com/questions/269818/…

    – quant
    Nov 15 '18 at 21:47











  • stackoverflow.com/questions/33307690/…

    – Xogle
    Nov 15 '18 at 21:47











  • You need to determine the actual encoding of the file; it seems it's not UTF-8.

    – Mark Ransom
    Nov 15 '18 at 21:50











  • This is in Python 2.7.

    – mgh5021
    Nov 15 '18 at 22:07






  • @mgh5021 No, because it is Python 2.7, and Python 2.7's internal default encoding is not UTF-8! But at least the output of the character is already working correctly, which is not always the case ...

    – quant
    Nov 15 '18 at 22:12
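
Mark Ransom's comment above suggests the file is not UTF-8. As a minimal sketch of what that could look like, assume (this is only a guess, the question does not say) that the CSV was saved as Windows-1252, where the en dash is the single byte 0x96 and the em dash is 0x97. The sample bytes and the helper names read_text and demo are hypothetical. The sketch uses only the standard library so it runs on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample data (not the asker's file): a header row and one value
# holding an en dash (0x96) and an em dash (0x97) as Windows-1252 bytes.
RAW_BYTES = b"col_name\n\x96 \x97\n"


def read_text(path, encoding, errors="strict"):
    """Read the whole file as text, decoding it with the given encoding."""
    with io.open(path, encoding=encoding, errors=errors) as handle:
        return handle.read()


def demo():
    """Decode the same bytes three ways and return the three results."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(RAW_BYTES)

        # Strict UTF-8 decoding rejects the bytes 0x96 and 0x97 outright.
        try:
            read_text(path, "utf-8")
            utf8_failed = False
        except UnicodeDecodeError:
            utf8_failed = True

        # Lenient UTF-8 decoding swaps each bad byte for U+FFFD, the
        # replacement character. The original dash cannot be recovered
        # from the decoded text after this point.
        replaced = read_text(path, "utf-8", errors="replace")

        # Decoding with the assumed real encoding yields the actual dashes.
        decoded = read_text(path, "cp1252")
    finally:
        os.remove(path)
    return utf8_failed, replaced, decoded
```

If the assumption about the encoding holds, the pandas equivalent would be to pass the same name through the encoding parameter, as in pd.read_csv(path, encoding='cp1252'). If the file turns out to use a different encoding, that name has to be substituted. Replacing U+FFFD after the fact cannot help, because the information about which dash was there is already gone once the bytes have been decoded leniently.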
















python unicode utf-8 ascii special-characters






asked Nov 15 '18 at 21:42









mgh5021
