Fgets writes different strings from the same file in Linux and Windows
I've just come across an issue where I was jumping between valgrind in Linux and other testing in Windows cmd.
I'm reading a certain line from a file like this:
fgets(buf, MAX_LINE_LEN, f_input);
Of course, buf is the size MAX_LINE_LEN + 1, but I digress.
This is the output of
printf("String length: %zu; Contents: ", strlen(buf));
for (size_t i = 0; i < strlen(buf); i++)
    printf("%x ", (unsigned char)buf[i]);
puts(";");
in Windows:
String length: 14; Contents: 41 6e 64 72 65 6a 20 50 6c 61 76 6b 61 a ;
String length: 22; Contents: 41 6e 6e 61 20 4d 61 72 69 61 20 43 69 63 6d 61 6e 63 6f 76 61 a ;
String length: 25; Contents: 4d 61 72 69 61 20 52 61 7a 75 73 6f 76 61 20 4d 61 72 74 61 6b 6f 76 61 a ;
String length: 24; Contents: 4d 69 6c 61 6e 20 52 61 73 74 69 73 6c 61 76 20 50 6f 6b 6f 6a 6e 79 a ;
String length: 21; Contents: 4d 69 6c 65 6e 61 20 53 65 64 6d 69 6b 72 61 73 6b 6f 76 61 a ;
String length: 15; Contents: 56 69 6e 63 65 6e 74 20 53 69 6b 75 6c 61 a ;
String length: 17; Contents: 56 69 6e 63 65 6e 74 20 76 61 6e 20 47 6f 67 68 a ;
and in Linux:
String length: 15; Contents: 41 6e 64 72 65 6a 20 50 6c 61 76 6b 61 d a ;
String length: 23; Contents: 41 6e 6e 61 20 4d 61 72 69 61 20 43 69 63 6d 61 6e 63 6f 76 61 d a ;
String length: 26; Contents: 4d 61 72 69 61 20 52 61 7a 75 73 6f 76 61 20 4d 61 72 74 61 6b 6f 76 61 d a ;
String length: 25; Contents: 4d 69 6c 61 6e 20 52 61 73 74 69 73 6c 61 76 20 50 6f 6b 6f 6a 6e 79 d a ;
String length: 22; Contents: 4d 69 6c 65 6e 61 20 53 65 64 6d 69 6b 72 61 73 6b 6f 76 61 d a ;
String length: 16; Contents: 56 69 6e 63 65 6e 74 20 53 69 6b 75 6c 61 d a ;
String length: 18; Contents: 56 69 6e 63 65 6e 74 20 76 61 6e 20 47 6f 67 68 d a ;
As you can see, in Linux there is another character before the NL: a carriage return. If anyone can explain this and save me the pain of adding ifdef statements for Linux and Windows code, I'd appreciate it. I understand that Linux appends a carriage return after each line, but is this really the intended behaviour when the line then gets read by fgets?
c linux fgets
edited Nov 10 at 22:52
asked Nov 10 at 22:30
areuz
CRLF vs NL line endings. Windows uses two characters, '\r' and '\n', at the end of a line; Unix uses just '\n'. And on Windows, the I/O system maps the CRLF to just '\n' on input, but Linux doesn't, because '\r' is just another control character to Unix. ('\r' typically maps to control-M or 0x0D; '\n' typically maps to control-J or 0x0A.)
– Jonathan Leffler
Nov 10 at 22:35
"Of course, buf is the size MAX_LINE_LEN + 1" Not needed: the maximum number of characters read into the buffer is one less than the size you specify, and the line is NUL-terminated. man7.org/linux/man-pages/man3/fgets.3p.html
– Tim
Nov 10 at 22:36
Guessing that it's not that Linux is adding a CR, but that the CR is in the file data: to Linux it looks like two separate characters, while to Windows it's one line ending. Not sure why fgets represents it the way it does, though. Can you check the actual file contents?
– Rodney
Nov 10 at 22:38
@Tim Oh yeah, fgets reserves one byte for the null. I guess that was a mistype on my part; buf is actually the size of MAX_LINE_LEN.
– areuz
Nov 10 at 22:39
3 Answers
"As you can see in Linux, there is another character before the NL, a Carriage Return."
That is because your files use CR+LF newlines, i.e. each newline is actually two characters: "\r\n".
If you open files without the "b" flag in Windows, its C library will convert each \n you write to \r\n, and each \r\n you read to \n.
Use the "b" fopen() flag in Windows to see the actual file contents.
When you read a line using fgets(buf, sizeof buf, handle), you can use buf[strcspn(buf, "\r\n")] = '\0'; to remove the newline.
answered Nov 10 at 22:39
Nominal Animal
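A minimal sketch of the strcspn() idiom from the answer above. The helper names strip_eol and read_stripped_line are made up for illustration and do not come from the thread; the stream is assumed to be already open. Because both '\r' and '\n' are in the reject set, the same code should cope with "\n" and "\r\n" endings, whichever platform and open mode delivered them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: cut buf at the first '\r' or '\n' and return the
 * resulting length. If neither is present, buf is left unchanged. */
size_t strip_eol(char *buf)
{
    size_t len = strcspn(buf, "\r\n"); /* index of first CR/LF, or of the NUL */
    buf[len] = '\0';
    return len;
}

/* Hypothetical helper: read one line from f and strip its line ending.
 * Returns 1 on success, 0 on end of file or read error. */
int read_stripped_line(FILE *f, char *buf, int size)
{
    if (fgets(buf, size, f) == NULL)
        return 0;
    strip_eol(buf);
    return 1;
}
```

With the first line of the question's file, strip_eol() would return 13 for both the 14-character Windows string and the 15-character Linux string.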
I like the use of the "b" flag the most, as it removes the difference between the two platforms. I later remove the newline while copying the string anyway, so this allows me to remove both characters, and it now works on both Windows and Linux. Thanks.
– areuz
Nov 10 at 22:47
@areuz: You can also use len = strcspn(buf, "\r\n"); to obtain the length of the line excluding the newline, instead of len = strlen(buf);, when copying. If you use a temporary char pointer, char *p = fgets(buf, sizeof buf, f_input);, you can skip leading whitespace using p += strspn(p, "\t\n\v\f\r "); and find the length of the rest of the line excluding the newline using len = strcspn(p, "\r\n");. There is no strrcspn(), so to remove trailing whitespace you need e.g. while (len > 1 && isspace(p[len-1])) len--;. Then copy just len chars starting at p.
– Nominal Animal
Nov 10 at 22:53
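The trimming steps from the comment above could be combined roughly as follows. This is only a sketch: trim_span is a hypothetical name, the loop uses len > 0 rather than len > 1 (equivalent once leading whitespace has been skipped), and the cast to unsigned char is added because passing a negative char to isspace() is undefined.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first non-whitespace
 * character of line and store in *len_out the length of the text up to,
 * but excluding, trailing whitespace and the line ending.
 * The input string is not modified. */
const char *trim_span(const char *line, size_t *len_out)
{
    const char *p = line + strspn(line, "\t\n\v\f\r "); /* skip leading whitespace */
    size_t len = strcspn(p, "\r\n");                    /* stop at the line ending */

    while (len > 0 && isspace((unsigned char)p[len - 1]))
        len--;                                          /* drop trailing whitespace */

    *len_out = len;
    return p;
}
```

The caller would then copy just len characters starting at the returned pointer, for example with memcpy() followed by an explicit terminating '\0'.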
MS and Linux have different expectations of a text file line ending: "\r\n" vs "\n".
To cope, I recommend that after fgets() you use strcspn() to lop off the potential end-of-line sequence, be it "\n", "\r\n" or missing.
fgets(buf, MAX_LINE_LEN, f_input);
buf[strcspn(buf, "\n\r")] = '\0';
Some compilers on Windows will use "\n" as the end-of-line sequence and others use "\r\n". So I attribute the variation to compilers and their manufacturers more so than the OS. Also, some old Mac text files end lines with '\r' and will foul fgets() on Linux.
Further: reading a file that has "\r\n" line endings as a text file on a system that expects "\n" has a problem when a full buffer is read as "......\r" and the line remainder as "\n" on the next fgets(). Additional processing is needed to cope, as is the case whenever the buffer is insufficient for a line of input.
Text files of one variation are often copied to the other platforms, so this is a not-so-rare occurrence.
Due to editing, some text files will have a mixture of line-ending sequences.
Pedantic code will read the file as binary and process variant line endings itself without fgets(). Good luck.
edited Nov 10 at 23:32
answered Nov 10 at 22:38
chux
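As an illustration of the last point in the answer above, here is one possible shape for a reader that handles the line endings itself. It is a sketch under assumptions, not the answerer's code: read_line_any_eol is a hypothetical name, the stream is assumed to be opened in binary mode ("rb"), and characters that do not fit in the buffer are silently discarded, which is a design choice made here for brevity.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from a binary-mode stream, accepting
 * "\n", "\r\n" or a lone "\r" as the line ending. Stores at most size-1
 * characters (without the line ending) and NUL-terminates buf.
 * Returns the number of characters stored, or -1 if nothing could be read. */
long read_line_any_eol(FILE *f, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return -1;

    c = fgetc(f);
    if (c == EOF)
        return -1;

    while (c != EOF && c != '\n' && c != '\r') {
        if (n + 1 < size)
            buf[n++] = (char)c;   /* keep the character if there is room */
        c = fgetc(f);
    }

    if (c == '\r') {
        /* A CR may be followed by an LF; if it is not, push the byte back. */
        int next = fgetc(f);
        if (next != '\n' && next != EOF)
            ungetc(next, f);
    }

    buf[n] = '\0';
    return (long)n;
}
```

Because the CR is consumed together with a following LF, the "......\r" then "\n" split described in the answer should not produce a spurious empty line with this approach.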
In C you can open a file stream in text or binary mode. In binary mode, no translation takes place, and the input and output are the bytes in the file. In text mode, the C "newline" character is translated into what is common on the platform in question. On UNIX-like systems, this is a 0A byte, and on DOS-like systems this is a 0D byte followed by a 0A byte. There are other cases on other operating systems, listed here:
https://en.wikipedia.org/wiki/Newline
So that you don't have to cope with every different text format in every program, these all get translated into a \n character as far as the C program sees in the default case (text mode). The input/output layer does the necessary translations for you.
When you use fopen() to open a file stream in C for reading or writing, you provide a "file mode" parameter - you're probably using it here as "r" to read a file, or "w" to write one. If you want no newline translation done, you can specify that the stream is opened in binary mode, with "rb" for reading or "wb"
for writing.
"you don't have to cope with every different text format in every program, these all get translated into an \n character" is true when reading a text file native to that C program. The trick is when reading, in text mode, a file that originated as some other system's text file.
– chux
Nov 10 at 23:40
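One way to check what a file really contains, in the spirit of the binary-mode advice above, is to count the raw bytes. The sketch below assumes the caller has opened the stream with "rb"; count_byte is a hypothetical helper name, not something from the thread.

```c
#include <stdio.h>

/* Hypothetical helper: count how often the byte value `byte` occurs
 * between the current position of f and end of file. The stream is
 * assumed to be in binary mode so that no newline translation hides
 * any 0x0D bytes. */
long count_byte(FILE *f, int byte)
{
    long count = 0;
    int c;

    while ((c = fgetc(f)) != EOF) {
        if (c == byte)
            count++;
    }
    return count;
}
```

For a file saved with CR+LF line endings, count_byte(f, '\r') on an "rb" stream should equal the number of lines. A text-mode stream on Windows would be expected to report zero, since the library removes the CRs before the program sees them.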
In C you an open a file stream in text or binary mode. In binary mode, no translation takes place, and the input and output are the bytes in the file. In text mode, the C "newline" character is translated into what is common on the platform in question. One UNIX-like systems, this is a 0A
byte, and on DOS-like systems this is a 0D
byte followed by a 0A
byte. There are other cases on other operating systems, listed here:
https://en.wikipedia.org/wiki/Newline
So that you don't have to cope with every different text format in every program, these all get translated into an n
character as far as the C program sees in the default case (text mode). The input/output layer does the necessary translations for you.
When you use fopen()
to open a file stream in C for reading or writing, you provide a "file mode" parameter - you're probably using it here as "r"
to read a file, or "w"
to write one. If you want to newline translation done you can specify that the stream is opened in binary mode, with "rb"
for reading or "wb"
for writing.
edited Nov 10 at 22:52
answered Nov 10 at 22:41
Tim
CRLF vs NL line endings. Windows uses two characters, '\r' and '\n', at the end of a line; Unix uses just '\n'. And on Windows, the I/O system maps the CRLF to '\n' only on input, but Linux doesn't because '\r' is just another control character to Unix. ('\r' typically maps to control-M or 0x0D; '\n' typically maps to control-J or 0x0A.)
– Jonathan Leffler
Nov 10 at 22:35
"Of course, buf is the size MAX_LINE_LEN + 1" Not needed: the maximum number of characters read into the buffer is one less than the size you specify, and the line is NUL-terminated. man7.org/linux/man-pages/man3/fgets.3p.html
– Tim
Nov 10 at 22:36
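To make the buffer-size point concrete, here is a small sketch. MAX_LINE_LEN comes from the question, but the value 8 and the helper name read_chunk are assumptions chosen only to make the truncation easy to see.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LEN 8   /* deliberately small, assumed value for the demo */

/* Read one chunk from `f` into a buffer of exactly MAX_LINE_LEN bytes and
 * return how many characters fgets stored (not counting the NUL).
 * Returns 0 at end of file or on a read error.
 * (read_chunk is a hypothetical helper name.) */
size_t read_chunk(FILE *f, char buf[MAX_LINE_LEN])
{
    if (fgets(buf, MAX_LINE_LEN, f) == NULL)
        return 0;
    return strlen(buf);   /* never more than MAX_LINE_LEN - 1 */
}
```

With an input line longer than the buffer, the first call stores MAX_LINE_LEN - 1 characters plus the NUL, and the next call picks up where it left off, so a buffer of MAX_LINE_LEN bytes is enough.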
Guessing that it's not that Linux is adding a CR, but that the CR is in the file data, which to Linux looks like two separate characters, while to Windows it's one line-ending sequence. Not sure why fgets represents it the way it does though. Can you check the actual file contents?
– Rodney
Nov 10 at 22:38
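One way to check the actual file contents from C is to open the file in binary mode and print every byte. This is a rough sketch only: dump_raw is a hypothetical helper, and its return convention (byte count, or -1 on failure) is an assumption made for the example.

```c
#include <stdio.h>

/* Print every byte of the file at `path` in hex, with no newline
 * translation. Returns the number of bytes read, or -1 if the file
 * cannot be opened. (dump_raw is a hypothetical helper name.) */
long dump_raw(const char *path)
{
    FILE *f = fopen(path, "rb");   /* "rb": bytes arrive untouched */
    if (f == NULL)
        return -1;

    long count = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        printf("%02x ", (unsigned)c);
        count++;
    }
    puts(";");
    fclose(f);
    return count;
}
```

If the file was saved with Windows line endings, each line should end in 0d 0a in this dump on both systems, which would confirm that the CR is in the file itself rather than being added by Linux.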
@Tim Oh yeah, fgets reserves one byte for null, I guess that was a mistype on my part, buf is actually the size of MAX_LINE_LEN.
– areuz
Nov 10 at 22:39