How to get the exact count of lines in a very large text file in R? [duplicate]

This question already has an answer here:

Get the number of lines in a text file using R

5 answers

I have multiple files with over 1,000,000 lines each, and I need to know the exact number of lines in each file using R. How can I achieve that?

asked Nov 13 '18 at 12:15

Daniel Gießing

marked as duplicate by hrbrmstr
Nov 13 '18 at 12:38

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

There are several different ways you could go. I'm not sure what's most efficient with files of that size, but an easy solution would be something like length(readLines(filename))

– duckmayr
Nov 13 '18 at 12:20

Works so far; I thought this one would fail on large documents.

– Daniel Gießing
Nov 13 '18 at 12:26
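
For reference, the readLines suggestion from the comment above could be applied to several files roughly as follows. This is only a sketch: the helper name n_lines and the .txt pattern are assumptions for illustration, and readLines loads each file fully into memory before counting.

```r
# Hypothetical helper: read the whole file, then count its lines.
# warn = FALSE silences the warning about a missing final newline.
n_lines <- function(path) length(readLines(path, warn = FALSE))

# Assumed selection: every .txt file in the current directory.
filenames <- dir(pattern = "[.]txt$")

# One integer count per file, named by file.
counts <- vapply(filenames, n_lines, integer(1))
```

Because the full contents are held in memory, this is convenient for files of a few million lines but may become slow or memory-hungry for much larger ones.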


1 Answer

1) wc This should be quite fast. First determine the filenames; here we assume all files in the current directory whose extension is .txt, so change the pattern as needed. Then run wc -l on each file and form a data frame from the output.

(If you are on Windows then install Rtools and ensure that Rtools\bin is on your PATH.)

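# All .txt files in the current directory.
filenames <- dir(pattern = "[.]txt$")
# shell() exists only on Windows; on other platforms use system() instead.
wc <- function(x) shell(paste("wc -l", x), intern = TRUE)
# Each output line is "<count> <filename>"; parse them into a data frame.
DF <- read.table(text = sapply(filenames, wc), col.names = c("count", "filename"))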
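
On Linux or macOS, where shell() is not available, the same idea might be sketched with system2. The helper name wc_count is hypothetical, and the sketch assumes a wc utility is on the PATH. Note that wc -l counts newline characters, so a last line without a trailing newline is not included in its result.

```r
# Hypothetical helper: count lines with the external `wc` utility.
wc_count <- function(path) {
  # shQuote() protects paths containing spaces or shell metacharacters.
  out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
  # wc prints "<count> <filename>", possibly with leading blanks;
  # keep only the leading count.
  as.numeric(sub("^[[:space:]]*([0-9]+).*$", "\\1", out[1]))
}

# Assumed selection: every .txt file in the current directory.
filenames <- dir(pattern = "[.]txt$")
counts <- vapply(filenames, wc_count, numeric(1))
```

Extracting the count with a regular expression avoids the problem that read.table would have with file names containing spaces.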

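2) count.fields An alternative is count.fields, which does not rely on any external command. filenames is as defined above. Note that by default count.fields ignores blank lines (blank.lines.skip = TRUE), so the result can be lower than the raw line count if the files contain empty lines.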

sapply(filenames, function(x) length(count.fields(x, sep = "\1")))
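
If memory is a concern, another pure-R possibility is to read each file in fixed-size binary chunks and count newline bytes. This is a sketch under stated assumptions rather than part of the original answer: the function name count_lines and the chunk_size default are made up for illustration, and lines are assumed to end in "\n" (which also covers "\r\n").

```r
# Hypothetical helper: count lines without loading the whole file.
count_lines <- function(path, chunk_size = 1e6) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  newline <- as.raw(10L)  # the byte for "\n"
  n <- 0
  last <- raw(0)          # last byte seen so far
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    n <- n + sum(chunk == newline)
    last <- chunk[length(chunk)]
  }
  # A final line without a trailing newline still counts as a line.
  if (length(last) == 1L && last != newline) n <- n + 1
  n
}

# Assumed selection: every .txt file in the current directory.
filenames <- dir(pattern = "[.]txt$")
counts <- vapply(filenames, count_lines, numeric(1))
```

Unlike count.fields with its defaults, this counts blank lines too, and it returns 0 for an empty file.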

edited Nov 13 '18 at 12:37

answered Nov 13 '18 at 12:29

G. Grothendieck
