XML not well-formed due to umlaut characters





















I have an XML file with this declaration:



<?xml version="1.0" encoding="utf-8"?>


When I open it in three different editors, I get the following:



[Image: the file as displayed in the three editors; not preserved]



Obviously, there are three different representations: Notepad shows the correct symbol, Notepad++ shows a hexadecimal code, and Emacs shows an octal code.



I have Perl code that tests whether an XML file is well-formed. As soon as the file contains these umlaut characters, it is reported as not well-formed and cannot be loaded into my database. When I remove all umlaut characters (and Greek symbols, etc.), the file is well-formed and I can import it into the database.



My goal is an XML file that I can import into the database with the umlaut characters (and Greek symbols, etc.) intact.



What is the reason for this behaviour? Was it caused when the XML was created?










Tags: xml, utf-8, character-encoding, diacritics
















asked Nov 10 at 13:41 by giordano






















1 Answer

















Accepted answer










It looks likely to me that the ä character in your input is encoded as the single byte 0xE4, which is how the character is represented in ISO-8859-1 (and Windows CP-1252) but is not its correct representation in UTF-8. Your three editors are handling the inconsistency between the encoding declared in the XML declaration and the actual encoding in different ways.
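To make the byte-level difference concrete, here is a minimal Python sketch (Python is my choice for illustration, not the Perl used in the question). It shows how ä is encoded in ISO-8859-1 and in UTF-8, and that a lone E4 byte followed by ASCII letters does not decode as UTF-8. The sample word is an assumption, not taken from the file in the question.

```python
# U+00E4 is the character "ä"; the escape keeps this source file pure ASCII.
char = "\u00e4"

latin1_bytes = char.encode("iso-8859-1")  # one byte
utf8_bytes = char.encode("utf-8")         # two bytes

print(latin1_bytes.hex(), utf8_bytes.hex())

# Hexadecimal and octal are just two notations for the same byte value.
hex_form = f"{latin1_bytes[0]:02X}"
octal_form = f"{latin1_bytes[0]:o}"
print(hex_form, octal_form)

# An ISO-8859-1 encoded word read as UTF-8: the E4 byte starts a multi-byte
# sequence, but the following ASCII letter is not a continuation byte.
sample = "M\u00e4rz".encode("iso-8859-1")  # assumed sample text
try:
    sample.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False
print(valid_as_utf8)
```

If the byte in the file really is E4, an editor that shows hexadecimal and one that shows octal would be displaying the same byte, just in different notations.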



To fix the problem, make sure the encoding named in the XML declaration matches the actual encoding of the bytes in the file.
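One way to bring the two into agreement is to transcode the bytes to UTF-8 and leave the declaration saying UTF-8. The following Python sketch does that under stated assumptions: the function name `transcode_to_utf8` is hypothetical, and the default `actual_encoding` of ISO-8859-1 is only a guess that has to be confirmed for the real file (CP-1252 differs from it in the 0x80 to 0x9F range).

```python
import re
import xml.etree.ElementTree as ET


def transcode_to_utf8(raw: bytes, actual_encoding: str = "iso-8859-1") -> bytes:
    """Decode with the encoding the bytes are really in, then re-encode as UTF-8.

    The encoding named in the XML declaration (if any) is rewritten to utf-8
    so that the declaration and the bytes agree afterwards.
    """
    text = raw.decode(actual_encoding)
    text = re.sub(
        r'(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2',
        r"\g<1>\g<2>utf-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")


# Assumed sample input: declared as UTF-8, but the umlaut is a single E4 byte.
raw = b'<?xml version="1.0" encoding="utf-8"?><month>M\xe4rz</month>'
fixed = transcode_to_utf8(raw)

# The repaired bytes now parse, and the umlaut survives.
root = ET.fromstring(fixed)
print(root.text == "M\u00e4rz")
```

The alternative is to leave the bytes alone and change the declaration to name the encoding the file is actually in. Which route is better depends on what the database import expects.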



The problem may have been introduced when the XML file was first created, or later by a process that changed the character encoding without updating the XML declaration to match. This can happen when the file is transcoded by a tool that is not XML-aware.
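As an illustration of that second scenario, the Python sketch below starts from a consistent UTF-8 document and passes it through a hypothetical non-XML-aware step (`naive_transcode`, invented for this example) that re-encodes the text but leaves the declaration untouched. The well-formedness check uses the standard library parser, standing in for the Perl check mentioned in the question.

```python
import xml.etree.ElementTree as ET

# A consistent document: declared as UTF-8 and actually encoded as UTF-8.
original = (
    '<?xml version="1.0" encoding="utf-8"?><month>M\u00e4rz</month>'
).encode("utf-8")


def naive_transcode(data: bytes) -> bytes:
    """Hypothetical non-XML-aware step: re-encodes the text as ISO-8859-1
    and does not touch the encoding named in the XML declaration."""
    return data.decode("utf-8").encode("iso-8859-1")


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the bytes as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


damaged = naive_transcode(original)

print(is_well_formed(original))
print(is_well_formed(damaged))
```

The damaged copy still declares UTF-8 but now contains a lone E4 byte, so the parser rejects it, which matches the symptom described in the question.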
















answered Nov 10 at 15:46 by Michael Kay



























                     
