Fgets writes different strings from the same file in Linux and Windows









I've just come across an issue where I was jumping between valgrind in Linux and other testing in Windows cmd.



I'm reading a certain line from a file like this:



fgets(buf, MAX_LINE_LEN, f_input);


Of course, buf is the size MAX_LINE_LEN + 1, but I digress.



This is the output of



printf("String length: %zu; Contents: ", strlen(buf));
for (size_t i = 0; i < strlen(buf); i++)
    printf("%x ", (unsigned char)buf[i]);
puts(";");


in Windows:



String length: 14; Contents: 41 6e 64 72 65 6a 20 50 6c 61 76 6b 61 a ;
String length: 22; Contents: 41 6e 6e 61 20 4d 61 72 69 61 20 43 69 63 6d 61 6e 63 6f 76 61 a ;
String length: 25; Contents: 4d 61 72 69 61 20 52 61 7a 75 73 6f 76 61 20 4d 61 72 74 61 6b 6f 76 61 a ;
String length: 24; Contents: 4d 69 6c 61 6e 20 52 61 73 74 69 73 6c 61 76 20 50 6f 6b 6f 6a 6e 79 a ;
String length: 21; Contents: 4d 69 6c 65 6e 61 20 53 65 64 6d 69 6b 72 61 73 6b 6f 76 61 a ;
String length: 15; Contents: 56 69 6e 63 65 6e 74 20 53 69 6b 75 6c 61 a ;
String length: 17; Contents: 56 69 6e 63 65 6e 74 20 76 61 6e 20 47 6f 67 68 a ;


and in Linux:



String length: 15; Contents: 41 6e 64 72 65 6a 20 50 6c 61 76 6b 61 d a ;
String length: 23; Contents: 41 6e 6e 61 20 4d 61 72 69 61 20 43 69 63 6d 61 6e 63 6f 76 61 d a ;
String length: 26; Contents: 4d 61 72 69 61 20 52 61 7a 75 73 6f 76 61 20 4d 61 72 74 61 6b 6f 76 61 d a ;
String length: 25; Contents: 4d 69 6c 61 6e 20 52 61 73 74 69 73 6c 61 76 20 50 6f 6b 6f 6a 6e 79 d a ;
String length: 22; Contents: 4d 69 6c 65 6e 61 20 53 65 64 6d 69 6b 72 61 73 6b 6f 76 61 d a ;
String length: 16; Contents: 56 69 6e 63 65 6e 74 20 53 69 6b 75 6c 61 d a ;
String length: 18; Contents: 56 69 6e 63 65 6e 74 20 76 61 6e 20 47 6f 67 68 d a ;


As you can see, in Linux there is another character before the NL: a carriage return. If anyone can explain this and save me the pain of adding #ifdef statements for separate Linux and Windows code, I'd appreciate it. I understand that Linux appends a carriage return after each line, but is this really the intended behaviour when the line is then read by fgets?










  • CRLF vs NL line endings. Windows uses two characters, '\r' and '\n', at the end of a line; Unix uses just '\n'. On Windows, the I/O system maps CRLF to '\n' on input, but Linux doesn't, because '\r' is just another control character to Unix. ('\r' typically maps to control-M or 0x0D; '\n' typically maps to control-J or 0x0A.)
    – Jonathan Leffler
    Nov 10 at 22:35











  • "Of course, buf is the size MAX_LINE_LEN + 1" Not needed: the maximum number of characters read into the buffer is one less than the size you specify, and the line is NUL-terminated. man7.org/linux/man-pages/man3/fgets.3p.html
    – Tim
    Nov 10 at 22:36











  • I'm guessing it's not that Linux is adding a CR, but that the CR is in the file data. To Linux the line ending looks like two separate characters, while to Windows it's a single line-ending sequence. I'm not sure why fgets represents it the way it does, though. Can you check the actual file contents?
    – Rodney
    Nov 10 at 22:38











  • @Tim Oh yeah, fgets reserves one byte for null, I guess that was a mistype on my part, buf is actually the size of MAX_LINE_LEN.
    – areuz
    Nov 10 at 22:39














c linux fgets






edited Nov 10 at 22:52

























asked Nov 10 at 22:30









areuz

3 Answers

As you can see in Linux, there is another character before the NL, a Carriage Return.




That is because your files use CR+LF newlines, i.e. each newline is actually two characters: "\r\n".



If you open files without the "b" flag in Windows, its C library will convert each \n you write to \r\n, and each \r\n you read to \n.



Use the "b" fopen() flag in Windows to see the actual file contents.



When you read a line using fgets(buf, sizeof buf, handle), you can use buf[strcspn(buf, "\r\n")] = '\0'; to remove the newline.
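
As a minimal sketch of the strcspn() approach (strip_eol is a hypothetical helper name, not something from the question), the line can be cut at whichever of '\r' or '\n' comes first, so the same code handles "\n", "\r\n", and a missing line ending:

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', whichever comes first.
 * strcspn() returns the length of the leading part that contains
 * neither character; if none is present, that is the index of the
 * terminating '\0', so the assignment changes nothing. */
static void strip_eol(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

After a successful fgets(buf, sizeof buf, f_input), calling strip_eol(buf) should leave the same string on both platforms, whether or not the file was opened with the "b" flag.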






  • I like the use of the b flag the most, as it removes the difference between the two platforms. I later remove the newline while copying the string anyways, so this allows me to remove both characters and now works in both windows and linux. Thanks.
    – areuz
    Nov 10 at 22:47







  • @areuz: You can also use len = strcspn(buf, "\r\n"); to obtain the length of the line excluding the newline, instead of len = strlen(buf);, when copying. If you use a temporary char pointer char *p = fgets(buf, sizeof buf, f_input);, you can skip leading whitespace using p += strspn(p, "\t\n\v\f\r "); and find the length of the rest of the line excluding newline using len = strcspn(p, "\r\n");. There is no strrcspn(), so to remove trailing whitespace, you need e.g. while (len > 1 && isspace(p[len-1])) len--;. Then, copy just len chars starting at p.
    – Nominal Animal
    Nov 10 at 22:53
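
The steps in the comment above could be put together roughly like this. copy_trimmed and its parameters are made-up names for illustration, and truncating when the destination is too small is an assumption about what the caller wants:

```c
#include <ctype.h>
#include <string.h>

/* Copy `line` into `dst` without leading whitespace, trailing
 * whitespace, or the line ending. `dst` must have room for dst_size
 * bytes. Returns the number of characters copied (terminator excluded). */
static size_t copy_trimmed(char *dst, size_t dst_size, const char *line)
{
    /* Skip leading whitespace. */
    const char *p = line + strspn(line, "\t\n\v\f\r ");
    /* Length of the rest, stopping before any '\r' or '\n'. */
    size_t len = strcspn(p, "\r\n");

    /* Drop trailing whitespace; the cast keeps isspace() well defined. */
    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;

    if (dst_size == 0)
        return 0;
    if (len >= dst_size)
        len = dst_size - 1;   /* truncate rather than overflow */
    memcpy(dst, p, len);
    dst[len] = '\0';
    return len;
}
```

Because the line ending is excluded by strcspn() before anything is copied, the result does not depend on whether the platform translated "\r\n" on input.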

















Windows and Linux have different expectations of a text file's line ending: "\r\n" vs "\n".



To cope, I recommend using strcspn() after fgets() to lop off the potential end-of-line sequence, be it "\n", "\r\n", or missing.



if (fgets(buf, MAX_LINE_LEN, f_input) != NULL)
    buf[strcspn(buf, "\n\r")] = '\0';



Some compilers on Windows will use "\n" as the end-of-line sequence and others use "\r\n", so I attribute the variation to compilers and their vendors more than to the OS. Also, some old Mac text files end lines with '\r', which will foul fgets() on Linux.



Further: when a file that uses "\r\n" is read as text on a system that expects "\n" as the end-of-line sequence, a full buffer can end in "......\r", with the line's remainder, "\n", returned by the next fgets(). Additional processing is needed to cope, as is the case whenever the buffer is too small for a line of input.



Text files of one variation are often copied to the other platforms, so this is a not-so-rare occurrence.



Due to editing, some text files will have a mixture of line-ending-sequences.



Pedantic code will read the file as binary and process variant line endings itself without fgets(). Good luck.
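
One possible shape for such a reader, as a sketch only: it reads character by character and treats "\n", "\r\n", and a lone "\r" as line endings. read_any_line is a hypothetical name, and silently discarding characters that do not fit in the buffer is an assumption made to keep the example short:

```c
#include <stdio.h>

/* Read one line from `in` (ideally opened with "rb"), accepting "\n",
 * "\r\n", or a lone "\r" as the line ending. The ending is not stored.
 * Characters that do not fit in buf are read and discarded.
 * Returns the stored length, or -1 if nothing could be read. */
static long read_any_line(char *buf, size_t size, FILE *in)
{
    size_t len = 0;
    int c;

    if (size == 0)
        return -1;
    c = getc(in);
    if (c == EOF)
        return -1;

    while (c != EOF && c != '\n') {
        if (c == '\r') {
            /* Peek past the CR: swallow a following LF, otherwise
             * push the character back for the next line. */
            int next = getc(in);
            if (next != '\n' && next != EOF)
                ungetc(next, in);
            break;
        }
        if (len + 1 < size)
            buf[len++] = (char)c;
        c = getc(in);
    }
    buf[len] = '\0';
    return (long)len;
}
```

Since the CR is consumed together with its LF inside a single call, the "......\r" then "\n" split described above cannot occur with this approach.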






    In C you can open a file stream in text or binary mode. In binary mode, no translation takes place, and the input and output are the bytes in the file. In text mode, the C "newline" character is translated into whatever is conventional on the platform in question. On UNIX-like systems, this is a 0A byte, and on DOS-like systems this is a 0D byte followed by a 0A byte. There are other cases on other operating systems, listed here:



    https://en.wikipedia.org/wiki/Newline



    So that you don't have to cope with every different text format in every program, these all get translated into an \n character as far as the C program sees in the default case (text mode). The input/output layer does the necessary translations for you.



    When you use fopen() to open a file stream in C for reading or writing, you provide a "file mode" parameter; you're probably using it here as "r" to read a file, or "w" to write one. If you don't want newline translation done, you can specify that the stream is opened in binary mode, with "rb" for reading or "wb" for writing.
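
To see the untranslated bytes, a small sketch along these lines may help. first_line_raw is a hypothetical helper; it opens the file with "rb" and reports how many bytes fgets() stored, line ending included:

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of `path` in binary mode, so that no newline
 * translation happens on any platform. Returns the number of bytes
 * stored in buf (line ending included), or 0 if the file cannot be
 * opened or nothing could be read. */
static size_t first_line_raw(const char *path, char *buf, int size)
{
    FILE *in = fopen(path, "rb");   /* "b": bytes arrive exactly as stored */
    size_t len = 0;

    if (in == NULL)
        return 0;
    if (size > 0 && fgets(buf, size, in) != NULL)
        len = strlen(buf);
    fclose(in);
    return len;
}
```

For a file whose first line is "Andrej Plavka" followed by CR+LF, this should report 15 bytes on both Windows and Linux, with 0x0D and 0x0A as the last two.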






    • "you don't have to cope with every different text format in every program, these all get translated into an \n character" is true when reading a text file native to the platform the C program runs on. The trick is reading, in text mode, a file that originated as some other system's text file.
      – chux
      Nov 10 at 23:40










    First answer (CR+LF newlines and the "b" flag): answered Nov 10 at 22:39 by Nominal Animal

    27.9k33259




    27.9k33259











    • I like the use of the b flag the most, as it removes the difference between the two platforms. I later remove the newline while copying the string anyways, so this allows me to remove both characters and now works in both windows and linux. Thanks.
      – areuz
      Nov 10 at 22:47







    • 1




      @areuz: You can also use len = strcspn(buf, "rn"); to obtain the length of the line excluding the newline, instead of len = strlen(buf);, when copying. If you use a temporary char pointer char *p = fgets(buf, sizeof buf, f_input);, you can skip leading whitespace using p += strspn(p, "tnvfr "); and find the length of the rest of the line excluding newline using len = strcspn(p, "rn");. There is no strrcspn(), so to remove trailing whitespace, you need e.g. while (len > 1 && isspace(p[len-1])) len--;. Then, copy just len chars starting at p.
      – Nominal Animal
      Nov 10 at 22:53
















    • I like the use of the b flag the most, as it removes the difference between the two platforms. I later remove the newline while copying the string anyways, so this allows me to remove both characters and now works in both windows and linux. Thanks.
      – areuz
      Nov 10 at 22:47







    • 1




      @areuz: You can also use len = strcspn(buf, "rn"); to obtain the length of the line excluding the newline, instead of len = strlen(buf);, when copying. If you use a temporary char pointer char *p = fgets(buf, sizeof buf, f_input);, you can skip leading whitespace using p += strspn(p, "tnvfr "); and find the length of the rest of the line excluding newline using len = strcspn(p, "rn");. There is no strrcspn(), so to remove trailing whitespace, you need e.g. while (len > 1 && isspace(p[len-1])) len--;. Then, copy just len chars starting at p.
      – Nominal Animal
      Nov 10 at 22:53















    I like the use of the b flag the most, as it removes the difference between the two platforms. I later remove the newline while copying the string anyways, so this allows me to remove both characters and now works in both windows and linux. Thanks.
    – areuz
    Nov 10 at 22:47





    I like the use of the b flag the most, as it removes the difference between the two platforms. I later remove the newline while copying the string anyways, so this allows me to remove both characters and now works in both windows and linux. Thanks.
    – areuz
    Nov 10 at 22:47





    1




    1




    @areuz: You can also use len = strcspn(buf, "rn"); to obtain the length of the line excluding the newline, instead of len = strlen(buf);, when copying. If you use a temporary char pointer char *p = fgets(buf, sizeof buf, f_input);, you can skip leading whitespace using p += strspn(p, "tnvfr "); and find the length of the rest of the line excluding newline using len = strcspn(p, "rn");. There is no strrcspn(), so to remove trailing whitespace, you need e.g. while (len > 1 && isspace(p[len-1])) len--;. Then, copy just len chars starting at p.
    – Nominal Animal
    Nov 10 at 22:53




    @areuz: You can also use len = strcspn(buf, "rn"); to obtain the length of the line excluding the newline, instead of len = strlen(buf);, when copying. If you use a temporary char pointer char *p = fgets(buf, sizeof buf, f_input);, you can skip leading whitespace using p += strspn(p, "tnvfr "); and find the length of the rest of the line excluding newline using len = strcspn(p, "rn");. There is no strrcspn(), so to remove trailing whitespace, you need e.g. while (len > 1 && isspace(p[len-1])) len--;. Then, copy just len chars starting at p.
    – Nominal Animal
    Nov 10 at 22:53












    up vote
    2
    down vote













    MS and Linux has a different expectation of a text file line ending:"rn" vs "n".



    To cope, recommend after fgets() use strcspn() to lop off the potential end of line sequence, be it "n", "rn" or missing.



    fgets(buf, MAX_LINE_LEN, f_input);
    buf[strcspn(buf, "nr")] = '';



    Some compilers on Windows will use "n" as the end-of-line sequence and others use "rn". So I attribute the variation to compilers and their manufacturers more so than the OS. Also some old MAC text files end with 'r' and will foul fgets() on Linux.



    Further: reading a file that has "rn" as a text file that expects "n" as the end-of-line sequence has a problem when reading a full buffer as "......r" and the line remainder as "n" on the next fgets(). Additional processing is needed to cope as is the case whenever the buffer is insufficient for a line of input.



    Text files of one variation are often copied to the other platforms, so this is a not-so-rare occurrence.



    Due to editing, some text files will have a mixture of line-ending-sequences.



    Pedantic code will read the file as binary and process variant line endings itself without fgets(). Good luck.






        edited Nov 10 at 23:32

























        answered Nov 10 at 22:38









        chux





















            up vote
            1
            down vote













            In C you can open a file stream in text or binary mode. In binary mode, no translation takes place, and the input and output are the bytes in the file. In text mode, the C "newline" character is translated into what is common on the platform in question. On UNIX-like systems, this is a 0A byte, and on DOS-like systems this is a 0D byte followed by a 0A byte. There are other cases on other operating systems, listed here:



            https://en.wikipedia.org/wiki/Newline



            So that you don't have to cope with every different text format in every program, these all get translated into a \n character as far as the C program sees in the default case (text mode). The input/output layer does the necessary translations for you.



            When you use fopen() to open a file stream in C for reading or writing, you provide a "file mode" parameter - you're probably using it here as "r" to read a file, or "w" to write one. If you want no newline translation done, you can specify that the stream is opened in binary mode, with "rb" for reading or "wb" for writing.
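To check what is really stored in a file, it can help to open it in binary mode and look at the raw bytes. The sketch below counts 0D (carriage return) bytes. The function name count_cr_bytes is made up for this example. On UNIX-like systems "r" and "rb" behave the same, so the difference between the two modes only shows up on platforms that translate line endings.

```c
#include <stdio.h>

/* Hypothetical helper (name is an assumption).
 * Opens the file in binary mode, so no newline translation happens,
 * and counts the carriage-return (0x0D) bytes it contains.
 * Returns the count, or -1 if the file cannot be opened. */
static long count_cr_bytes(const char *path)
{
    FILE *f = fopen(path, "rb");
    long count = 0;
    int ch;

    if (f == NULL)
        return -1;

    while ((ch = fgetc(f)) != EOF) {
        if (ch == '\r')
            count++;
    }

    fclose(f);
    return count;
}
```

A non-zero result for a file read on Linux means the file carries "\r\n" (or lone "\r") endings, which is the situation shown in the question's Linux output.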




























            • "you don't have to cope with every different text format in every program, these all get translated into a \n character" is true when reading a text file native to that C program. The trick is when reading in text mode of a file that originated as some other system's text file.
              – chux
              Nov 10 at 23:40














            edited Nov 10 at 22:52

























            answered Nov 10 at 22:41









            Tim











