Looking for advice on processing variable length text files
I need to write some R code to process tab-delimited text files that contain defined segments of variable length.
The table in each segment starts 1 line below "Group:" and stops either 2 lines above "~End" if the group is a control or 6 lines above "~End" if the group is a standard. The tables themselves vary in length and can be empty, like the "SpikedControl" segment.
An example file looks like this:
Group: Controls 1
Sample Wells Values MeanValue CV Od-bkgd-blank
Anti-Hu Det A11 2.849 2.855 0.282 2.853
A23 2.860
Coat Control A12 0.161 0.160 0.530 0.159
A24 0.160
Diluent Standard 1 A9 0.114 0.113 1.379 0.104
A21 0.112
Diluent Standard 2 A8 0.012 0.013 2.817 0.012
A20 0.013
~End
Group: SpikedControl 1
Sample Wells Concentration Values MeanValue CV ODbkgdblank Conc %Expected
~End
Group: Standards 1
Sample ExpConc Wells OD CV OD ODblank MeanODBlank Result %Recovery
St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
B13 2.882 2.875 1153.779 57.689
St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
B14 2.855 2.847 670.358 100.554
St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
B15 2.709 2.702 237.852 107.033
St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
B16 2.258 2.248 77.452 104.560
St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
B17 1.433 1.424 25.153 101.868
St006 8.230 B6 0.669 2.9 0.658 0.672 7.781 94.536
B18 0.697 0.686 8.240 100.115
St007 2.743 B7 0.357 5.8 0.348 0.334 3.143 114.579
B19 0.329 0.320 2.759 100.579
St008 0.914 B8 0.198 3.7 0.191 0.186 1.029 112.551
B20 0.188 0.181 0.895 97.891
St009 0.305 B9 0.163 7.8 0.154 0.146 0.532 174.477
B21 0.146 0.137 0.296 97.190
St010 0.102 B10 0.130 5.1 0.123 0.119 0.096 94.087
B22 0.121 0.114 Range? Range?
St011 0.034 B11 0.133 4.7 0.126 0.122 0.134 394.778
B23 0.125 0.117 Range? Range?
St012 0.011 B12 0.117 0.7 0.105 0.104 Range? Range?
B24 0.115 0.104 Range? Range?
EC50 = 28.085
AUC = 5565.432
~End
I am not very experienced with processing text files like this, and I am looking for advice on how to identify these segments and read the tables within them.
Thanks!
Edit - Link to example file:
https://www.dropbox.com/s/4m0lmbbequmpd9b/ExampleFile.txt?dl=0
PS: these files are spit out by a spectrophotometer, so I don't have any control over the format; the software is pretty antiquated.
Edit 2 - Making some progress:
Read in the file and get the start and end lines for each segment:
inputtext <- readLines("ExampleFile.txt")
starts <- grep("Group:", inputtext)
ends <- grep("~End", inputtext)
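Because `grep()` returns line numbers, the two vectors can be paired element-wise, giving one start and one end per segment. A minimal sketch, assuming every "Group:" line has a matching "~End" line and that the group name is the second tab-separated field; the inline sample and the `segments` name are illustrative only:

```r
# Small tab-delimited sample standing in for ExampleFile.txt (illustrative only)
inputtext <- c(
  "Group:\tControls\t1",
  "Sample\tWells\tValues",
  "Anti-Hu Det\tA11\t2.849",
  "~End",
  "Group:\tSpikedControl\t1",
  "Sample\tWells\tConcentration",
  "~End"
)

# Anchor the patterns to the start of the line so data cells cannot match
starts <- grep("^Group:", inputtext)
ends <- grep("^~End", inputtext)
stopifnot(length(starts) == length(ends), all(starts < ends))

# Group name = second tab-separated field of each "Group:" line
group_names <- vapply(strsplit(inputtext[starts], "\t"), `[`, character(1), 2)

segments <- data.frame(group = group_names, start = starts, end = ends,
                       stringsAsFactors = FALSE)
segments
```

Each row of `segments` then holds everything needed to read one table.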
With those line numbers, I can read a single table with:
test2 <- read.table("ExampleFile.txt", header = TRUE, sep = "\t", skip = 17, nrows = 24, blank.lines.skip = FALSE)
Now I am just trying to figure out how to determine the number of rows to read accurately.
So if the start row is 17 and the end row is 48, then nrows needs to be 24, which is 48 (the end row) - 17 (the rows that are skipped) - 7 (the header line plus the lines of fluff at the end of the table; this could also be 4 if it is a control table).
Now I just need to figure out how to loop over the segments and identify whether each group is a control or a standard, so that I subtract the right amount of fluff.
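One way to loop over the segments is to choose the trailing offset from the group name and compute `nrows` from it. The sketch below assumes the layout described in the question (header 1 line below "Group:", last row 2 lines above "~End" for a control and 6 for a standard) and that every group name contains either "Control" or "Standard"; `trailing_lines()` and `read_segment()` are hypothetical helpers, and the inline sample only mimics that layout. Under these offsets the subtraction is 7 for a standard (48 - 17 - 7 = 24) and 3 for a control, so the control offset is worth checking against a real file. Rather than re-reading the file with `skip`, each segment's lines are passed to `read.table()` through its `text` argument.

```r
# Lines between the last table row and "~End", per the layout described above (assumption)
trailing_lines <- function(group) {
  if (grepl("Standard", group)) return(6L)
  if (grepl("Control", group)) return(2L)
  stop("Unknown group type: ", group)
}

# Hypothetical helper: parse one segment from the in-memory lines
read_segment <- function(lines, start, end, group) {
  nrows <- end - start - 1L - trailing_lines(group)  # data rows below the header
  if (nrows <= 0L) return(NULL)                       # empty table, e.g. SpikedControl
  read.table(text = lines[(start + 1L):(start + 1L + nrows)],
             header = TRUE, sep = "\t", fill = TRUE, quote = "",
             stringsAsFactors = FALSE, check.names = FALSE)
}

# Illustrative sample following that layout (not the real instrument file)
inputtext <- c(
  "Group:\tControls\t1",
  "Sample\tWells\tValues",
  "Coat Control\tA12\t0.161",
  "\tA24\t0.160",
  "",
  "~End",
  "Group:\tSpikedControl\t1",
  "Sample\tWells\tConcentration",
  "~End",
  "Group:\tStandards\t1",
  "Sample\tExpConc\tWells\tOD",
  "St001\t2000.000\tB1\t2.939",
  "\t\tB13\t2.882",
  "", "EC50 = 28.085", "AUC = 5565.432", "", "",
  "~End"
)

starts <- grep("^Group:", inputtext)
ends <- grep("^~End", inputtext)
groups <- vapply(strsplit(inputtext[starts], "\t"), `[`, character(1), 2)

tables <- list()
for (i in seq_along(starts)) {
  # Assigning NULL leaves empty segments out of the list
  tables[[groups[i]]] <- read_segment(inputtext, starts[i], ends[i], groups[i])
}
names(tables)
```

Keying the list by group name means a table can be pulled out with, for example, `tables$Standards`.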
edited Nov 16 '18 at 17:46 by RevDev
asked Nov 16 '18 at 13:42 by RevDev
2 Answers
I ended up doing the following:
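library(tidyverse)
# Read the raw lines and locate the start ("Group:") and end ("~End") of each segment
inputtext <- readLines("Test.txt")
starts <- grep("Group:", inputtext)
ends <- grep("~End", inputtext)
# Number of lines between the header row and the ~End marker
realend <- (ends - starts) - 2
dff <- list()
for (i in seq_along(starts)) {
  dff[[i]] <- read.table("Test.txt",
                         header = TRUE,
                         sep = "\t",
                         skip = starts[i],
                         nrows = realend[i],
                         blank.lines.skip = FALSE,  # also turns on fill = TRUE
                         row.names = NULL)
}
# Drop trailing fluff rows (NA in the Values column)
dff <- lapply(dff, function(x) x[!is.na(x$Values), ])
# Drop tables that end up empty, such as SpikedControl
dff <- dff[sapply(dff, function(x) dim(x)[1]) > 0]
names(dff) <- letters[1:length(dff)]
list2env(dff, .GlobalEnv)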
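The `x$Values` filter relies on every table having a `Values` column. If one does not (the Standards header in the example has none), `x$Values` is `NULL`, `!is.na(NULL)` is `logical(0)`, indexing with it returns zero rows, and the next line then drops that table. A minimal sketch of an alternative that keys on the `Wells` column instead, assuming every table has one and every real row (including the second-replicate rows) names a well, while fluff such as blank lines and "EC50 = ..." does not; `keep_well_rows()` is a hypothetical helper and the example rows are made up:

```r
# Hypothetical helper: keep rows that name a well (assumes every table has a Wells column)
keep_well_rows <- function(x) {
  if (!"Wells" %in% names(x)) stop("Table has no Wells column")
  wells <- trimws(as.character(x$Wells))
  x[!is.na(wells) & wells != "", , drop = FALSE]
}

# Illustrative rows as read.table(fill = TRUE) might return them for a standard
example <- data.frame(
  Sample = c("St001", "", "EC50 = 28.085", ""),
  Wells  = c("B1", "B13", "", NA),
  OD     = c(2.939, 2.882, NA, NA),
  stringsAsFactors = FALSE
)
filtered <- keep_well_rows(example)
filtered

# In the loop above this would replace the Values-based filter:
# dff <- lapply(dff, keep_well_rows)
```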
answered Nov 16 '18 at 19:13 by RevDev
You can use a nested data frame for this. I detail the method in three steps below.
library(tidyverse)
inputtext <- readLines("~/downloads/ExampleFile.txt")
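1. Read the data into a single data frame and create a column with the group name
# One row per line of the file, split on tabs into up to 9 columns
dtf1 <- tibble(input = inputtext) %>%
  separate(input, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"), sep = "\t") %>%
  # Put the group name on each "Group:" line, then carry it down to the rows below
  mutate(group = ifelse(grepl("Group:", x1), x2, NA)) %>%
  fill(group) %>%
  # Keep only lines with at least 4 fields (table headers and rows)
  filter(!is.na(x4))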
head(dtf1)
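The idea behind step 1 (tag every line with the most recent "Group:" name, then keep only lines with at least four fields) can also be sketched in base R for readers without the tidyverse. This is an illustrative equivalent, not the answer's code; it assumes tab-separated fields, and the inline sample is made up:

```r
# Illustrative tab-delimited sample (not the real file)
inputtext <- c(
  "Group:\tControls\t1",
  "Sample\tWells\tValues\tMeanValue",
  "Coat Control\tA12\t0.161\t0.160",
  "\tA24\t0.160\t",
  "~End",
  "Group:\tStandards\t1",
  "Sample\tExpConc\tWells\tOD",
  "St001\t2000.000\tB1\t2.939",
  "EC50 = 28.085",
  "~End"
)

is_group <- startsWith(inputtext, "Group:")
# Group name is the second tab-separated field of each "Group:" line
group_names <- vapply(strsplit(inputtext[is_group], "\t"), `[`, character(1), 2)
# Carry each name down to the following lines (what tidyr::fill() does);
# lines before the first "Group:" get NA
group <- c(NA_character_, group_names)[cumsum(is_group) + 1L]

# Count fields as tabs + 1 so trailing empty fields still count (strsplit drops them)
n_fields <- nchar(gsub("[^\t]", "", inputtext)) + 1L
keep <- n_fields >= 4L  # same idea as filter(!is.na(x4))

out <- data.frame(group = group[keep], line = inputtext[keep], stringsAsFactors = FALSE)
out$group
```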
2. Nest the data frame
dtf2 <- dtf1 %>%
group_by(group) %>%
nest()
dtf2$data[[1]]
Take column names from the first row of the first nested data frame:
colnames1 <- dtf2$data[[1]] %>% slice(1) %>% unlist()
colnames1[is.na(colnames1)] <- names(colnames1[is.na(colnames1)])
colnames(dtf2$data[[1]]) <- colnames1
3. Give each nested data frame column names taken from its first row
dtf3 <- dtf2 %>%
  mutate(names = map(data, slice, 1),
         names = map(names, unlist),
         names = map(names,
                     function(x) { # Replace NA column names with the default x* names
                       x[is.na(x)] <- names(x[is.na(x)])
                       return(x)
                     }),
         data = map2(data, names, setNames),
         data = map(data, slice, -1))  # drop the row now used as the header
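The renaming above promotes each nested table's first row to column names, keeps the default `x7`-style name wherever that row is `NA`, and then drops the row. A minimal base-R sketch of that operation on a single data frame, for checking the logic without purrr; `promote_header()` is a hypothetical helper and the input is made up:

```r
# Hypothetical helper: use the first row as column names, then drop that row
promote_header <- function(df) {
  header <- vapply(df[1, , drop = FALSE], as.character, character(1))
  # Where the first row is NA, keep the existing default name (x7, x8, ...)
  header[is.na(header)] <- names(df)[is.na(header)]
  out <- df[-1, , drop = FALSE]
  names(out) <- unname(header)
  rownames(out) <- NULL
  out
}

# Made-up input shaped like one nested table before renaming
raw <- data.frame(x1 = c("Sample", "St001"),
                  x2 = c("Wells", "B1"),
                  x3 = c(NA, NA),
                  stringsAsFactors = FALSE)
res <- promote_header(raw)
res
```

Applying it per group, e.g. `lapply(split_tables, promote_header)` over a hypothetical list `split_tables`, mirrors what the `map()` calls do on the `data` column.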
You now have a list of data frames, and you can use the group name to select the corresponding one:
> dtf3$data[dtf3$group=="Controls"][[1]]
# A tibble: 8 x 9
Sample Wells Values MeanValue CV `Od-bkgd-blank` x7 x8 x9
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Anti-Hu Det A11 2.849 2.855 0.282 2.853 NA NA NA
2 "" A23 2.860 "" "" "" NA NA NA
3 Coat Control A12 0.161 0.160 0.530 0.159 NA NA NA
4 "" A24 0.160 "" "" "" NA NA NA
5 Diluent Standard 1 A9 0.114 0.113 1.379 0.104 NA NA NA
6 "" A21 0.112 "" "" "" NA NA NA
7 Diluent Standard 2 A8 0.012 0.013 2.817 0.012 NA NA NA
8 "" A20 0.013 "" "" "" NA NA NA
> dtf3$data[dtf3$group=="Standards"][[1]]
# A tibble: 24 x 9
Sample ExpConc Wells OD `CV OD` ODblank MeanODBlank Result `%Recovery`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 St001 2000.000 B1 2.939 1.4 2.932 2.904 Range? Range?
2 "" "" B13 2.882 "" 2.875 "" 1153.779 57.689
3 St002 666.667 B2 2.820 0.9 2.812 2.829 456.435 68.465
4 "" "" B14 2.855 "" 2.847 "" 670.358 100.554
5 St003 222.222 B3 2.677 0.9 2.669 2.686 208.849 93.982
6 "" "" B15 2.709 "" 2.702 "" 237.852 107.033
7 St004 74.074 B4 2.215 1.4 2.205 2.226 72.185 97.449
8 "" "" B16 2.258 "" 2.248 "" 77.452 104.560
9 St005 24.691 B5 1.406 1.3 1.397 1.410 24.296 98.397
10 "" "" B17 1.433 "" 1.424 "" 25.153 101.868
# ... with 14 more rows
>
Note: the renaming is based on this answer.
I placed the source data in a gist in case it gets lost:
spectrophotometer.txt
edited Nov 20 '18 at 8:41
answered Nov 16 '18 at 14:17 by Paul Rougieux
Ah, I probably screwed it up when I trimmed the file down for the example. I have edited my post with a link to a full example file.
– RevDev
Nov 16 '18 at 14:38