Looking for advice on processing variable length text files


I need to write some R code to process tab-delimited text files that contain defined segments of variable length.

The table in each segment starts 1 line below "Group:" and stops either 2 lines above "~End" if the group is a control, or 6 lines above "~End" if the group is a standard. The tables themselves vary in length and can be empty, like the "SpikedControl" segment.

An example file looks like this:

Group: Controls 1
Sample Wells Values MeanValue CV Od-bkgd-blank
Anti-Hu Det A11 2.849 2.855 0.282 2.853
 A23 2.860 
Coat Control A12 0.161 0.160 0.530 0.159
 A24 0.160 
Diluent Standard 1 A9 0.114 0.113 1.379 0.104
 A21 0.112 
Diluent Standard 2 A8 0.012 0.013 2.817 0.012
 A20 0.013 

~End
Group: SpikedControl 1
Sample Wells Concentration Values MeanValue CV ODbkgdblank Conc %Expected

~End
Group: Standards 1
Sample ExpConc Wells OD CV OD ODblank MeanODBlank Result %Recovery
St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
 B13 2.882 2.875 1153.779 57.689
St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
 B14 2.855 2.847 670.358 100.554
St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
 B15 2.709 2.702 237.852 107.033
St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
 B16 2.258 2.248 77.452 104.560
St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
 B17 1.433 1.424 25.153 101.868
St006 8.230 B6 0.669 2.9 0.658 0.672 7.781 94.536
 B18 0.697 0.686 8.240 100.115
St007 2.743 B7 0.357 5.8 0.348 0.334 3.143 114.579
 B19 0.329 0.320 2.759 100.579
St008 0.914 B8 0.198 3.7 0.191 0.186 1.029 112.551
 B20 0.188 0.181 0.895 97.891
St009 0.305 B9 0.163 7.8 0.154 0.146 0.532 174.477
 B21 0.146 0.137 0.296 97.190
St010 0.102 B10 0.130 5.1 0.123 0.119 0.096 94.087
 B22 0.121 0.114 Range? Range?
St011 0.034 B11 0.133 4.7 0.126 0.122 0.134 394.778
 B23 0.125 0.117 Range? Range?
St012 0.011 B12 0.117 0.7 0.105 0.104 Range? Range?
 B24 0.115 0.104 Range? Range?

EC50 = 28.085

AUC = 5565.432

~End

I am not very experienced with processing text files like this, and am looking for some advice on how to approach identifying these segments and reading the tables within.

Thanks!

Edit - Link to example file:

https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0

PS: these files are spit out from a spectrophotometer so I don't have any control over the format as the software is pretty antiquated.

Edit 2 - Making some progress:

Read in the file and get the start and end lines for each segment:

inputtext <- readLines("ExampleFile.txt")

starts <- grep("Group:", inputtext)
ends <- grep("~End", inputtext)

which I can follow up with

test2 <- read.table("ExampleFile.txt", header = TRUE, sep = "\t", skip = 17, nrows = 24, blank.lines.skip = FALSE)

Now I am just trying to figure out how to accurately identify the number of rows to read.

So if the start row is 17 and the end row is 48, then nrows needs to be 24, which is 48 (the end row) - 17 (the first rows, which are skipped) - 7 (to account for the header line and the lines of fluff at the end of the table). For a control table the last term is 3 instead: the Controls group above starts at row 1 and ends at row 12, and 12 - 1 - 3 = 8 rows.

Now I just need to figure out how to loop this and properly identify whether the group is a control or standard to subtract the right amount of fluff.
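One way the loop could look is sketched below. It works on the lines already read with readLines() rather than re-reading the file, and it uses the offsets described above (2 lines before "~End" for controls, 6 for standards). The helper name `read_segments` is hypothetical, and treating every group whose label contains "Standard" as a standard is an assumption drawn from the example file, so it may need adjusting for other group names.

```r
# Hypothetical helper: return one data frame per "Group:" segment.
# `lines` is the character vector returned by readLines().
read_segments <- function(lines) {
  starts <- grep("^Group:", lines)
  ends   <- grep("^~End", lines)
  stopifnot(length(starts) == length(ends), all(ends > starts))

  # Group label without the "Group:" prefix and the trailing number
  labels <- sub("^Group:[[:space:]]*", "", lines[starts])
  labels <- trimws(sub("[[:space:]]+[0-9]+[[:space:]]*$", "", labels))

  # Assumption: standards end 6 lines above "~End", everything else 2 lines
  offset <- ifelse(grepl("Standard", labels), 6L, 2L)

  tables <- Map(function(first, last) {
    if (last < first) return(NULL)  # no header line found in this segment
    # first is the header line, last is the final table row
    read.delim(text = paste(lines[first:last], collapse = "\n"),
               header = TRUE, sep = "\t", quote = "",
               check.names = FALSE, stringsAsFactors = FALSE)
  }, starts + 1L, ends - offset)

  setNames(tables, make.unique(labels))
}
```

Calling `read_segments(inputtext)` should then give a named list, with an empty segment such as "SpikedControl" coming back as a zero-row data frame rather than being skipped.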

edited Nov 16 '18 at 17:46

asked Nov 16 '18 at 13:42

RevDev


2 Answers

I ended up doing the following:

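library(tidyverse)

inputtext <- readLines("Test.txt")

starts <- grep("Group:", inputtext)
ends <- grep("~End", inputtext)

# Rows to read per segment, not counting the header line and the "~End" line
realend <- (ends - starts) - 2

dff <- list()

# Read each segment's table, skipping everything up to its "Group:" line
for (i in seq_along(starts)) {
  dff[[i]] <- read.table("Test.txt",
                         header = TRUE,
                         sep = "\t",
                         skip = starts[i],
                         nrows = realend[i],
                         blank.lines.skip = FALSE,
                         row.names = NULL)
}

# Drop the blank rows, which are read in as NA
dff <- lapply(dff, function(x) x[!is.na(x$Values), ])

# Drop the segments whose table is empty
dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]

names(dff) <- letters[1:length(dff)]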

list2env(dff,.GlobalEnv)
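The NA filter above relies on every table having a `Values` column, which the Standards table in the question's example does not appear to have. A possible alternative, sketched here with a hypothetical helper name (`drop_empty_rows`), is to drop rows in which every cell is NA or blank, whatever the columns are called:

```r
# Hypothetical helper: drop rows in which every cell is NA or an empty string.
drop_empty_rows <- function(df) {
  if (nrow(df) == 0L || ncol(df) == 0L) return(df)
  # One logical column per data column: TRUE where the cell is NA or blank
  is_blank <- vapply(
    df,
    function(col) is.na(col) | trimws(as.character(col)) == "",
    logical(nrow(df))
  )
  # vapply() returns a plain vector for one-row input, so force a matrix
  is_blank <- matrix(is_blank, nrow = nrow(df))
  # Keep rows that have at least one non-blank cell
  df[rowSums(is_blank) < ncol(df), , drop = FALSE]
}
```

It could replace the `lapply()` line with `dff <- lapply(dff, drop_empty_rows)`. Continuation rows such as the "A23" line are kept, because their Wells and Values cells are filled.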

answered Nov 16 '18 at 19:13

RevDev


You can use a nested data frame for that purpose. I detailed the method in 3 steps below.

library(tidyverse)
inputtext <- readLines("~/downloads/ExampleFile.txt")

1. Read data in a single data frame, create a column with the group name

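dtf1 <- tibble(input = inputtext) %>% 
  # Split each line on tabs into up to 9 columns
  separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "\t") %>% 
  # Take the group name from the "Group:" lines and carry it down
  mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>% 
  fill(group) %>% 
  # Keep only table lines, which have at least 4 fields
  filter(!is.na(x4)) 
head(dtf1)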

2. Nest the data frame

dtf2 <- dtf1 %>% 
 group_by(group) %>% 
 nest() 
dtf2$data[[1]]

Give column names from the data to the first nested data frame

colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
colnames(dtf2$data[[1]]) <- colnames1

3. Give column names from the data to each sub data frame

dtf3 <- dtf2 %>% 
  mutate(names = map(data, slice, 1),
         names = map(names, unlist),
         names = map(names,
                     function(x) { # Replace NA column names by the default x_ names
                       x[is.na(x)] <- names(x[is.na(x)])
                       return(x)
                     }),
         data = map2(data, names, setNames), 
         data = map(data, slice, -1))

You now have a list of data frames. You can use the group name to call the corresponding data frame:

 > dtf3$data[dtf3$group=="Controls"][[1]]
# A tibble: 8 x 9
 Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9 
 <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA 
2 "" A23 2.860 "" "" "" NA NA NA 
3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA 
4 "" A24 0.160 "" "" "" NA NA NA 
5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA 
6 "" A21 0.112 "" "" "" NA NA NA 
7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA 
8 "" A20 0.013 "" "" "" NA NA NA 
> dtf3$data[dtf3$group=="Standards"][[1]]
# A tibble: 24 x 9
 Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
 <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> 
 1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range? 
 2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689 
 3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465 
 4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554 
 5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982 
 6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033 
 7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449 
 8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560 
 9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397 
10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868 
# ... with 14 more rows
>

Note: rename based on this answer.

Placed source data as a gist in case it gets lost:
spectrophotometer.txt
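The tables above still hold every value as character, with blank Sample cells in the continuation rows and all-NA placeholder columns (x7, x8, x9). A possible clean-up step is sketched below. It is not part of the original answer: `clean_table` and its `id_cols` argument are hypothetical names, and turning "Range?" into NA is an assumption that discards the out-of-range flag.

```r
# Hypothetical helper: tidy one of the nested tables.
clean_table <- function(df, id_cols = "Sample") {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  if (nrow(df) == 0L) return(df)

  # Drop placeholder columns that contain nothing but NA
  df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]

  # Carry identifier values down into the continuation rows
  for (col in intersect(id_cols, names(df))) {
    blank <- is.na(df[[col]]) | df[[col]] == ""
    idx <- cummax(ifelse(blank, 0L, seq_along(blank)))  # last non-blank row so far
    has <- idx > 0
    df[[col]][has] <- df[[col]][idx[has]]
  }

  # Convert character columns to numeric where possible.
  # Assumption: blanks and "Range?" are treated as missing values.
  df[] <- lapply(df, function(col) {
    if (is.character(col)) {
      type.convert(col, as.is = TRUE, na.strings = c("NA", "", "Range?"))
    } else {
      col
    }
  })
  df
}
```

It could be applied to all groups at once with something like `map(dtf3$data, clean_table)`. Columns that mix text and numbers, such as Wells, stay character.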

edited Nov 20 '18 at 8:41

answered Nov 16 '18 at 14:17

Paul Rougieux


Ah, I probably screwed it up when I trimmed it down for the example. I have edited my post with a link to a full example file.

– RevDev
Nov 16 '18 at 14:38
