Create loop using awk BEGIN [duplicate]
This question already has an answer here:
How do I use shell variables in an awk script?
8 answers
I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.
This is the original code (that works great):
awk '
BEGIN
FS=OFS=","
FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
1
' obvs.csv > obvs1.csv
The above code takes the file obvs.csv that looks like this:
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
and edits the data to look like this (obvs1.csv):
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9
I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:
for j in 1..6
do
awk '
BEGIN
FS=OFS=","
FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
$j
' obvs.csv > obvs$j.csv
done
In the end I am hoping to have 6 files as follows:
obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited
So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!
arrays loops awk
marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
|
show 2 more comments
This question already has an answer here:
How do I use shell variables in an awk script?
8 answers
I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.
This is the original code (that works great):
awk '
BEGIN
FS=OFS=","
FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
1
' obvs.csv > obvs1.csv
The above code takes the file obvs.csv that looks like this:
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
and edits the data to look like this (obvs1.csv):
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9
I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:
for j in 1..6
do
awk '
BEGIN
FS=OFS=","
FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
$j
' obvs.csv > obvs$j.csv
done
In the end I am hoping to have 6 files as follows:
obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited
So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!
arrays loops awk
marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Please post samples of input and expected output, as it is NOT clear here.
– RavinderSingh13
Nov 15 '18 at 1:53
@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now
– Cae.rich
Nov 15 '18 at 2:20
@Cae.rich, still not clear,what is the logic lines are having*
in them and what is the condition of getting output into new files? Please be clear in it.
– RavinderSingh13
Nov 15 '18 at 2:52
In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13
– Cae.rich
Nov 15 '18 at 3:03
@Cae.rich, ok added solution now, try and let me know then?
– RavinderSingh13
Nov 15 '18 at 3:35
|
show 2 more comments
This question already has an answer here:
How do I use shell variables in an awk script?
8 answers
I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.
This is the original code (that works great):
awk '
BEGIN
FS=OFS=","
FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
1
' obvs.csv > obvs1.csv
The above code takes the file obvs.csv that looks like this:
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
and edits the data to look like this (obvs1.csv):
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9
I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:
for j in 1..6
do
awk '
BEGIN
FS=OFS=","
FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
$j
' obvs.csv > obvs$j.csv
done
In the end I am hoping to have 6 files as follows:
obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited
So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!
arrays loops awk
This question already has an answer here:
How do I use shell variables in an awk script?
8 answers
I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.
This is the original code (that works great):
awk '
BEGIN
FS=OFS=","
FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
1
' obvs.csv > obvs1.csv
The above code takes the file obvs.csv that looks like this:
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
and edits the data to look like this (obvs1.csv):
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9
I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:
for j in 1..6
do
awk '
BEGIN
FS=OFS=","
FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i
if($i=="NationalCowID")
value=i
$field==1
for(i=value+1;i<=NF;i++)
$i="*"
$j
' obvs.csv > obvs$j.csv
done
In the end I am hoping to have 6 files as follows:
obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited
So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!
This question already has an answer here:
How do I use shell variables in an awk script?
8 answers
arrays loops awk
arrays loops awk
edited Nov 15 '18 at 2:52
Cae.rich
asked Nov 14 '18 at 23:56
Cae.richCae.rich
204
204
marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
Please post samples of input and expected output, as it is NOT clear here.
– RavinderSingh13
Nov 15 '18 at 1:53
@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now
– Cae.rich
Nov 15 '18 at 2:20
@Cae.rich, still not clear,what is the logic lines are having*
in them and what is the condition of getting output into new files? Please be clear in it.
– RavinderSingh13
Nov 15 '18 at 2:52
In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13
– Cae.rich
Nov 15 '18 at 3:03
@Cae.rich, ok added solution now, try and let me know then?
– RavinderSingh13
Nov 15 '18 at 3:35
|
show 2 more comments
Please post samples of input and expected output, as it is NOT clear here.
– RavinderSingh13
Nov 15 '18 at 1:53
@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now
– Cae.rich
Nov 15 '18 at 2:20
@Cae.rich, still not clear,what is the logic lines are having*
in them and what is the condition of getting output into new files? Please be clear in it.
– RavinderSingh13
Nov 15 '18 at 2:52
In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13
– Cae.rich
Nov 15 '18 at 3:03
@Cae.rich, ok added solution now, try and let me know then?
– RavinderSingh13
Nov 15 '18 at 3:35
Please post samples of input and expected output, as it is NOT clear here.
– RavinderSingh13
Nov 15 '18 at 1:53
Please post samples of input and expected output, as it is NOT clear here.
– RavinderSingh13
Nov 15 '18 at 1:53
@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now
– Cae.rich
Nov 15 '18 at 2:20
@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now
– Cae.rich
Nov 15 '18 at 2:20
@Cae.rich, still not clear,what is the logic lines are having
*
in them and what is the condition of getting output into new files? Please be clear in it.– RavinderSingh13
Nov 15 '18 at 2:52
@Cae.rich, still not clear,what is the logic lines are having
*
in them and what is the condition of getting output into new files? Please be clear in it.– RavinderSingh13
Nov 15 '18 at 2:52
In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13
– Cae.rich
Nov 15 '18 at 3:03
In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13
– Cae.rich
Nov 15 '18 at 3:03
@Cae.rich, ok added solution now, try and let me know then?
– RavinderSingh13
Nov 15 '18 at 3:35
@Cae.rich, ok added solution now, try and let me know then?
– RavinderSingh13
Nov 15 '18 at 3:35
|
show 2 more comments
2 Answers
2
active
oldest
votes
Could you please try following once.
awk '
BEGIN
FS=OFS=","
FNR==1
head=$0
next
count[$6]
a[++val]=$0
END
for(i in count)
for(j=1;j<=val;j++)
num=split(a[j],array,",")
if(!header["output_file"i]++)
print (head) > "output_file"i
if(array[6]==i)
for(k=2;k<=(num-1);k++)
value=value?value OFS "*":"*"
if(value)
print (array[1],value,array[num]) > "output_file"i
else print (a[j]) > "output_file"i
value=""
' Input_file
Above code will create 2 output files named output_file1
and output_file2
you could change output file names as per your wish too.
Output files values will be as follows.
cat output_file1
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,9
cat output_file2
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,*,*,*,*,*,*,*,*,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,*,*,*,*,*,*,*,*,7
206004579,*,*,*,*,*,*,*,*,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
Will add explanation too shortly.
EDIT: Adding explanation too above code now.
awk ' ##awk code starts from here.
BEGIN ##Starting BEGIN section for awk code here.
FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
##Closing BEGIN block here.
FNR==1 ##Checking condition when first line is being read for Input_file.
head=$0 ##Creating a variable named head whose value is current line value.
next ##next will skip all further statements from here.
##Closing FNR==1 condition block here.
##Starting main block for awk code here.
count[$6] ##Creating an array named count whose index is $6(6th field of current line).
a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
##Closing main section of awk code here.
END ##Mentioning END block of this awk code here.
for(i in count) ##Starting a for loop to traverse through count array.
for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
##Closing for block for j<=val one here.
if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
##Closing for loop block here.
##Closing if condition block here.
if(value) ##Checking condition if variable value is NOT NULL then do following.
print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
##Closing block for if condition here.
else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
value="" ##Nullifying variable value here.
##Closing block for for loop here.
' Input_file ##Mentioning Input_file name here.
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
1
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
add a comment |
The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk
code you provide can also not perform the substitutions you mention.
This is how your original code should look like:
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "YBr") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
1
You can now adapt to include the value that $bfield
should have:
awk -v bvalue="$j" '
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "Batch") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
1
' file.csv
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Could you please try following once.
awk '
BEGIN
FS=OFS=","
FNR==1
head=$0
next
count[$6]
a[++val]=$0
END
for(i in count)
for(j=1;j<=val;j++)
num=split(a[j],array,",")
if(!header["output_file"i]++)
print (head) > "output_file"i
if(array[6]==i)
for(k=2;k<=(num-1);k++)
value=value?value OFS "*":"*"
if(value)
print (array[1],value,array[num]) > "output_file"i
else print (a[j]) > "output_file"i
value=""
' Input_file
Above code will create 2 output files named output_file1
and output_file2
you could change output file names as per your wish too.
Output files values will be as follows.
cat output_file1
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,9
cat output_file2
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,*,*,*,*,*,*,*,*,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,*,*,*,*,*,*,*,*,7
206004579,*,*,*,*,*,*,*,*,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
Will add explanation too shortly.
EDIT: Adding explanation too above code now.
awk ' ##awk code starts from here.
BEGIN ##Starting BEGIN section for awk code here.
FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
##Closing BEGIN block here.
FNR==1 ##Checking condition when first line is being read for Input_file.
head=$0 ##Creating a variable named head whose value is current line value.
next ##next will skip all further statements from here.
##Closing FNR==1 condition block here.
##Starting main block for awk code here.
count[$6] ##Creating an array named count whose index is $6(6th field of current line).
a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
##Closing main section of awk code here.
END ##Mentioning END block of this awk code here.
for(i in count) ##Starting a for loop to traverse through count array.
for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
##Closing for block for j<=val one here.
if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
##Closing for loop block here.
##Closing if condition block here.
if(value) ##Checking condition if variable value is NOT NULL then do following.
print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
##Closing block for if condition here.
else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
value="" ##Nullifying variable value here.
##Closing block for for loop here.
' Input_file ##Mentioning Input_file name here.
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
1
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
add a comment |
Could you please try following once.
awk '
BEGIN
FS=OFS=","
FNR==1
head=$0
next
count[$6]
a[++val]=$0
END
for(i in count)
for(j=1;j<=val;j++)
num=split(a[j],array,",")
if(!header["output_file"i]++)
print (head) > "output_file"i
if(array[6]==i)
for(k=2;k<=(num-1);k++)
value=value?value OFS "*":"*"
if(value)
print (array[1],value,array[num]) > "output_file"i
else print (a[j]) > "output_file"i
value=""
' Input_file
Above code will create 2 output files named output_file1
and output_file2
you could change output file names as per your wish too.
Output files values will be as follows.
cat output_file1
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,9
cat output_file2
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,*,*,*,*,*,*,*,*,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,*,*,*,*,*,*,*,*,7
206004579,*,*,*,*,*,*,*,*,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
Will add explanation too shortly.
EDIT: Adding explanation too above code now.
awk ' ##awk code starts from here.
BEGIN ##Starting BEGIN section for awk code here.
FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
##Closing BEGIN block here.
FNR==1 ##Checking condition when first line is being read for Input_file.
head=$0 ##Creating a variable named head whose value is current line value.
next ##next will skip all further statements from here.
##Closing FNR==1 condition block here.
##Starting main block for awk code here.
count[$6] ##Creating an array named count whose index is $6(6th field of current line).
a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
##Closing main section of awk code here.
END ##Mentioning END block of this awk code here.
for(i in count) ##Starting a for loop to traverse through count array.
for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
##Closing for block for j<=val one here.
if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
##Closing for loop block here.
##Closing if condition block here.
if(value) ##Checking condition if variable value is NOT NULL then do following.
print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
##Closing block for if condition here.
else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
value="" ##Nullifying variable value here.
##Closing block for for loop here.
' Input_file ##Mentioning Input_file name here.
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
1
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
add a comment |
Could you please try following once.
awk '
BEGIN
FS=OFS=","
FNR==1
head=$0
next
count[$6]
a[++val]=$0
END
for(i in count)
for(j=1;j<=val;j++)
num=split(a[j],array,",")
if(!header["output_file"i]++)
print (head) > "output_file"i
if(array[6]==i)
for(k=2;k<=(num-1);k++)
value=value?value OFS "*":"*"
if(value)
print (array[1],value,array[num]) > "output_file"i
else print (a[j]) > "output_file"i
value=""
' Input_file
Above code will create 2 output files named output_file1
and output_file2
you could change output file names as per your wish too.
Output files values will be as follows.
cat output_file1
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,9
cat output_file2
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,*,*,*,*,*,*,*,*,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,*,*,*,*,*,*,*,*,7
206004579,*,*,*,*,*,*,*,*,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
Will add explanation too shortly.
EDIT: Adding explanation too above code now.
awk ' ##awk code starts from here.
BEGIN ##Starting BEGIN section for awk code here.
FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
##Closing BEGIN block here.
FNR==1 ##Checking condition when first line is being read for Input_file.
head=$0 ##Creating a variable named head whose value is current line value.
next ##next will skip all further statements from here.
##Closing FNR==1 condition block here.
##Starting main block for awk code here.
count[$6] ##Creating an array named count whose index is $6(6th field of current line).
a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
##Closing main section of awk code here.
END ##Mentioning END block of this awk code here.
for(i in count) ##Starting a for loop to traverse through count array.
for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
##Closing for block for j<=val one here.
if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
##Closing for loop block here.
##Closing if condition block here.
if(value) ##Checking condition if variable value is NOT NULL then do following.
print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
##Closing block for if condition here.
else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
value="" ##Nullifying variable value here.
##Closing block for for loop here.
' Input_file ##Mentioning Input_file name here.
Could you please try following once.
awk '
BEGIN
FS=OFS=","
FNR==1
head=$0
next
count[$6]
a[++val]=$0
END
for(i in count)
for(j=1;j<=val;j++)
num=split(a[j],array,",")
if(!header["output_file"i]++)
print (head) > "output_file"i
if(array[6]==i)
for(k=2;k<=(num-1);k++)
value=value?value OFS "*":"*"
if(value)
print (array[1],value,array[num]) > "output_file"i
else print (a[j]) > "output_file"i
value=""
' Input_file
Above code will create 2 output files named output_file1
and output_file2
you could change output file names as per your wish too.
Output files values will be as follows.
cat output_file1
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,9
cat output_file2
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,*,*,*,*,*,*,*,*,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,*,*,*,*,*,*,*,*,7
206004579,*,*,*,*,*,*,*,*,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9
Will add explanation too shortly.
EDIT: Adding explanation too above code now.
awk ' ##awk code starts from here.
BEGIN ##Starting BEGIN section for awk code here.
FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
##Closing BEGIN block here.
FNR==1 ##Checking condition when first line is being read for Input_file.
head=$0 ##Creating a variable named head whose value is current line value.
next ##next will skip all further statements from here.
##Closing FNR==1 condition block here.
##Starting main block for awk code here.
count[$6] ##Creating an array named count whose index is $6(6th field of current line).
a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
##Closing main section of awk code here.
END ##Mentioning END block of this awk code here.
for(i in count) ##Starting a for loop to traverse through count array.
for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
##Closing for block for j<=val one here.
if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
##Closing for loop block here.
##Closing if condition block here.
if(value) ##Checking condition if variable value is NOT NULL then do following.
print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
##Closing block for if condition here.
else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
value="" ##Nullifying variable value here.
##Closing block for for loop here.
' Input_file ##Mentioning Input_file name here.
edited Nov 15 '18 at 7:32
answered Nov 15 '18 at 3:34
RavinderSingh13RavinderSingh13
28k41638
28k41638
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
1
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
add a comment |
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
1
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
@Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.
– RavinderSingh13
Nov 16 '18 at 18:52
1
1
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
Thanks for your explanation. Really helped to understand the code!
– Cae.rich
Nov 26 '18 at 0:00
add a comment |
The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk
code you provide can also not perform the substitutions you mention.
This is how your original code should look like:
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "YBr") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
1
You can now adapt to include the value that $bfield
should have:
awk -v bvalue="$j" '
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "Batch") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
1
' file.csv
add a comment |
The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk
code you provide can also not perform the substitutions you mention.
This is how your original code should look like:
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "YBr") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
1
You can now adapt to include the value that $bfield
should have:
awk -v bvalue="$j" '
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "Batch") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
1
' file.csv
add a comment |
The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk
code you provide can also not perform the substitutions you mention.
This is how your original code should look like:
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "YBr") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
1
You can now adapt to include the value that $bfield
should have:
awk -v bvalue="$j" '
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "Batch") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
1
' file.csv
The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk
code you provide can also not perform the substitutions you mention.
This is how your original code should look like:
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "YBr") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
1
You can now adapt to include the value that $bfield
should have:
awk -v bvalue="$j" '
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "Batch") bfield=i;
else if ($i == "NationalCowID") cfield=i
($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
1
' file.csv
edited Nov 15 '18 at 12:24
answered Nov 15 '18 at 12:19
kvantourkvantour
8,87331330
8,87331330
add a comment |
add a comment |
Please post samples of input and expected output, as it is NOT clear here.
– RavinderSingh13
Nov 15 '18 at 1:53
@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now
– Cae.rich
Nov 15 '18 at 2:20
@Cae.rich, still not clear,what is the logic lines are having
*
in them and what is the condition of getting output into new files? Please be clear in it.– RavinderSingh13
Nov 15 '18 at 2:52
In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13
– Cae.rich
Nov 15 '18 at 3:03
@Cae.rich, ok added solution now, try and let me know then?
– RavinderSingh13
Nov 15 '18 at 3:35