Create loop using awk BEGIN [duplicate]










0
















This question already has an answer here:



  • How do I use shell variables in an awk script?

    8 answers



I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.



This is the original code (that works great):



awk '
BEGIN
FS=OFS=","

FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


1
' obvs.csv > obvs1.csv


The above code takes the file obvs.csv that looks like this:



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9


and edits the data to look like this (obvs1.csv):



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9


I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:



for j in 1..6
do
awk '
BEGIN
FS=OFS=","

FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


$j
' obvs.csv > obvs$j.csv
done


In the end I am hoping to have 6 files as follows:



obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited


So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!










share|improve this question















marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















  • Please post samples of input and expected output, as it is NOT clear here.

    – RavinderSingh13
    Nov 15 '18 at 1:53











  • @RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now

    – Cae.rich
    Nov 15 '18 at 2:20











  • @Cae.rich, still not clear,what is the logic lines are having * in them and what is the condition of getting output into new files? Please be clear in it.

    – RavinderSingh13
    Nov 15 '18 at 2:52











  • In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13

    – Cae.rich
    Nov 15 '18 at 3:03











  • @Cae.rich, ok added solution now, try and let me know then?

    – RavinderSingh13
    Nov 15 '18 at 3:35















0
















This question already has an answer here:



  • How do I use shell variables in an awk script?

    8 answers



I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.



This is the original code (that works great):



awk '
BEGIN
FS=OFS=","

FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


1
' obvs.csv > obvs1.csv


The above code takes the file obvs.csv that looks like this:



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9


and edits the data to look like this (obvs1.csv):



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9


I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:



for j in 1..6
do
awk '
BEGIN
FS=OFS=","

FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


$j
' obvs.csv > obvs$j.csv
done


In the end I am hoping to have 6 files as follows:



obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited


So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!










share|improve this question















marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.


















  • Please post samples of input and expected output, as it is NOT clear here.

    – RavinderSingh13
    Nov 15 '18 at 1:53











  • @RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now

    – Cae.rich
    Nov 15 '18 at 2:20











  • @Cae.rich, still not clear,what is the logic lines are having * in them and what is the condition of getting output into new files? Please be clear in it.

    – RavinderSingh13
    Nov 15 '18 at 2:52











  • In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13

    – Cae.rich
    Nov 15 '18 at 3:03











  • @Cae.rich, ok added solution now, try and let me know then?

    – RavinderSingh13
    Nov 15 '18 at 3:35













0












0








0









This question already has an answer here:



  • How do I use shell variables in an awk script?

    8 answers



I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.



This is the original code (that works great):



awk '
BEGIN
FS=OFS=","

FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


1
' obvs.csv > obvs1.csv


The above code takes the file obvs.csv that looks like this:



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9


and edits the data to look like this (obvs1.csv):



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9


I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:



for j in 1..6
do
awk '
BEGIN
FS=OFS=","

FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


$j
' obvs.csv > obvs$j.csv
done


In the end I am hoping to have 6 files as follows:



obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited


So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!










share|improve this question

















This question already has an answer here:



  • How do I use shell variables in an awk script?

    8 answers



I am trying to turn a awk BEGIN code into a loop. The original code was used to edit data based the value of the "Batch" column and output a file.



This is the original code (that works great):



awk '
BEGIN
FS=OFS=","

FNR==1
for(i=1;i<=NF;i++)
if($i=="YBr")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


1
' obvs.csv > obvs1.csv


The above code takes the file obvs.csv that looks like this:



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9


and edits the data to look like this (obvs1.csv):



NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,*,9


I would like to take this code and turn it into a loop so that a new file is created with the edits begin applied to each value of the "Batch" column(1-6).I've read some examples and the command explanation, but I don't fully understand what each part of the code does.
For example, how do I code $j compared to the already coded $i?
this is the loop I have tried to create:



for j in 1..6
do
awk '
BEGIN
FS=OFS=","

FNR==$j
for(i=1;i<=NF;i++)
if($i=="Batch")
field=i

if($i=="NationalCowID")
value=i



$field==1
for(i=value+1;i<=NF;i++)
$i="*"


$j
' obvs.csv > obvs$j.csv
done


In the end I am hoping to have 6 files as follows:



obvs1.csv -> only lines with batch = 1 are edited
obvs2.csv -> only lines with batch = 2 are edited
obvs3.csv-> only lines with batch = 3 are edited
obvs4.csv-> only lines with batch = 4 are edited
obvs5.csv-> only lines with batch = 5 are edited
obvs6.csv-> only lines with batch = 6 are edited


So the file name corresponds to the "Batch" being used as an indicator for which line to edit. i.e. for obvs2.csv, for data lines where Batch equals 2, all columns except for the first and last would be edited to * .
So far, I end up with 6 files that are named correctly but the edits within the file are not correct.
Any direction/ code explanation is greatly appreciated!





This question already has an answer here:



  • How do I use shell variables in an awk script?

    8 answers







arrays loops awk






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 2:52







Cae.rich

















asked Nov 14 '18 at 23:56









Cae.richCae.rich

204




204




marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.









marked as duplicate by jas, Inian, kvantour, Michael Dodd, Pearly Spencer Nov 15 '18 at 15:24


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.














  • Please post samples of input and expected output, as it is NOT clear here.

    – RavinderSingh13
    Nov 15 '18 at 1:53











  • @RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now

    – Cae.rich
    Nov 15 '18 at 2:20











  • @Cae.rich, still not clear,what is the logic lines are having * in them and what is the condition of getting output into new files? Please be clear in it.

    – RavinderSingh13
    Nov 15 '18 at 2:52











  • In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13

    – Cae.rich
    Nov 15 '18 at 3:03











  • @Cae.rich, ok added solution now, try and let me know then?

    – RavinderSingh13
    Nov 15 '18 at 3:35

















  • Please post samples of input and expected output, as it is NOT clear here.

    – RavinderSingh13
    Nov 15 '18 at 1:53











  • @RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now

    – Cae.rich
    Nov 15 '18 at 2:20











  • @Cae.rich, still not clear,what is the logic lines are having * in them and what is the condition of getting output into new files? Please be clear in it.

    – RavinderSingh13
    Nov 15 '18 at 2:52











  • In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13

    – Cae.rich
    Nov 15 '18 at 3:03











  • @Cae.rich, ok added solution now, try and let me know then?

    – RavinderSingh13
    Nov 15 '18 at 3:35
















Please post samples of input and expected output, as it is NOT clear here.

– RavinderSingh13
Nov 15 '18 at 1:53





Please post samples of input and expected output, as it is NOT clear here.

– RavinderSingh13
Nov 15 '18 at 1:53













@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now

– Cae.rich
Nov 15 '18 at 2:20





@RavinderSingh13 I have made edits to the questions. Hopefully what I'm try to achieve is more clear now

– Cae.rich
Nov 15 '18 at 2:20













@Cae.rich, still not clear,what is the logic lines are having * in them and what is the condition of getting output into new files? Please be clear in it.

– RavinderSingh13
Nov 15 '18 at 2:52





@Cae.rich, still not clear,what is the logic lines are having * in them and what is the condition of getting output into new files? Please be clear in it.

– RavinderSingh13
Nov 15 '18 at 2:52













In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13

– Cae.rich
Nov 15 '18 at 3:03





In the obvs.csv file, I would like to find every line where $3 == 1, then change $2 - $9 to * , then print entire file into a new file named obvs1.csv. I would like to create a loop that does this for $3 == 2, then $3 == 3, then $3 == 4 …. until $3 == 6. so in the end I will have a total of 6 file that are created when I run the loop. Dose that help at all to make things more clear? Sorry Im not the best at using proper coding terminology yet @RavinderSingh13

– Cae.rich
Nov 15 '18 at 3:03













@Cae.rich, ok added solution now, try and let me know then?

– RavinderSingh13
Nov 15 '18 at 3:35





@Cae.rich, ok added solution now, try and let me know then?

– RavinderSingh13
Nov 15 '18 at 3:35












2 Answers
2






active

oldest

votes


















1














Could you please try following once.



awk '
BEGIN
FS=OFS=","

FNR==1
head=$0
next


count[$6]
a[++val]=$0

END
for(i in count)
for(j=1;j<=val;j++)
num=split(a[j],array,",")
if(!header["output_file"i]++)
print (head) > "output_file"i

if(array[6]==i)
for(k=2;k<=(num-1);k++)
value=value?value OFS "*":"*"


if(value)
print (array[1],value,array[num]) > "output_file"i

else print (a[j]) > "output_file"i
value=""


' Input_file


Above code will create 2 output files named output_file1 and output_file2 you could change output file names as per your wish too.



Output files values will be as follows.



cat output_file1
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,*,*,*,*,*,*,*,*,1
206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
206004575,*,*,*,*,*,*,*,*,6
206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
206004571,*,*,*,*,*,*,*,*,9

cat output_file2
NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
206004573,*,*,*,*,*,*,*,*,4
206004575,20141207,6,2,91,1,2014,380.94688,NA,6
206004576,*,*,*,*,*,*,*,*,7
206004579,*,*,*,*,*,*,*,*,8
206004571,20141207,6,2,91,1,2014,380.94688,NA,9


Will add explanation too shortly.



EDIT: Adding explanation too above code now.



awk ' ##awk code starts from here.
BEGIN ##Starting BEGIN section for awk code here.
FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
##Closing BEGIN block here.
FNR==1 ##Checking condition when first line is being read for Input_file.
head=$0 ##Creating a variable named head whose value is current line value.
next ##next will skip all further statements from here.
##Closing FNR==1 condition block here.
##Starting main block for awk code here.
count[$6] ##Creating an array named count whose index is $6(6th field of current line).
a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
##Closing main section of awk code here.
END ##Mentioning END block of this awk code here.
for(i in count) ##Starting a for loop to traverse through count array.
for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
##Closing for block for j<=val one here.
if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
##Closing for loop block here.
##Closing if condition block here.
if(value) ##Checking condition if variable value is NOT NULL then do following.
print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
##Closing block for if condition here.
else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
value="" ##Nullifying variable value here.
##Closing block for for loop here.

' Input_file ##Mentioning Input_file name here.





share|improve this answer

























  • @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

    – RavinderSingh13
    Nov 16 '18 at 18:52






  • 1





    Thanks for your explanation. Really helped to understand the code!

    – Cae.rich
    Nov 26 '18 at 0:00


















0














The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk code you provide can also not perform the substitutions you mention.



This is how your original code should look like:



BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "YBr") bfield=i;
else if ($i == "NationalCowID") cfield=i


($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
1


You can now adapt to include the value that $bfield should have:



awk -v bvalue="$j" '
BEGINFS=OFS=","
(FNR==1)
for (i=1;i<=NF;++i)
if ($i == "Batch") bfield=i;
else if ($i == "NationalCowID") cfield=i


($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
1
' file.csv





share|improve this answer































    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    Could you please try following once.



    awk '
    BEGIN
    FS=OFS=","

    FNR==1
    head=$0
    next


    count[$6]
    a[++val]=$0

    END
    for(i in count)
    for(j=1;j<=val;j++)
    num=split(a[j],array,",")
    if(!header["output_file"i]++)
    print (head) > "output_file"i

    if(array[6]==i)
    for(k=2;k<=(num-1);k++)
    value=value?value OFS "*":"*"


    if(value)
    print (array[1],value,array[num]) > "output_file"i

    else print (a[j]) > "output_file"i
    value=""


    ' Input_file


    Above code will create 2 output files named output_file1 and output_file2 you could change output file names as per your wish too.



    Output files values will be as follows.



    cat output_file1
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,*,*,*,*,*,*,*,*,1
    206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
    206004575,*,*,*,*,*,*,*,*,6
    206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
    206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
    206004571,*,*,*,*,*,*,*,*,9

    cat output_file2
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
    206004573,*,*,*,*,*,*,*,*,4
    206004575,20141207,6,2,91,1,2014,380.94688,NA,6
    206004576,*,*,*,*,*,*,*,*,7
    206004579,*,*,*,*,*,*,*,*,8
    206004571,20141207,6,2,91,1,2014,380.94688,NA,9


    Will add explanation too shortly.



    EDIT: Adding explanation too above code now.



    awk ' ##awk code starts from here.
    BEGIN ##Starting BEGIN section for awk code here.
    FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
    ##Closing BEGIN block here.
    FNR==1 ##Checking condition when first line is being read for Input_file.
    head=$0 ##Creating a variable named head whose value is current line value.
    next ##next will skip all further statements from here.
    ##Closing FNR==1 condition block here.
    ##Starting main block for awk code here.
    count[$6] ##Creating an array named count whose index is $6(6th field of current line).
    a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
    ##Closing main section of awk code here.
    END ##Mentioning END block of this awk code here.
    for(i in count) ##Starting a for loop to traverse through count array.
    for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
    num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
    if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
    print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
    ##Closing for block for j<=val one here.
    if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
    for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
    value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
    ##Closing for loop block here.
    ##Closing if condition block here.
    if(value) ##Checking condition if variable value is NOT NULL then do following.
    print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
    ##Closing block for if condition here.
    else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
    value="" ##Nullifying variable value here.
    ##Closing block for for loop here.

    ' Input_file ##Mentioning Input_file name here.





    share|improve this answer

























    • @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

      – RavinderSingh13
      Nov 16 '18 at 18:52






    • 1





      Thanks for your explanation. Really helped to understand the code!

      – Cae.rich
      Nov 26 '18 at 0:00















    1














    Could you please try following once.



    awk '
    BEGIN
    FS=OFS=","

    FNR==1
    head=$0
    next


    count[$6]
    a[++val]=$0

    END
    for(i in count)
    for(j=1;j<=val;j++)
    num=split(a[j],array,",")
    if(!header["output_file"i]++)
    print (head) > "output_file"i

    if(array[6]==i)
    for(k=2;k<=(num-1);k++)
    value=value?value OFS "*":"*"


    if(value)
    print (array[1],value,array[num]) > "output_file"i

    else print (a[j]) > "output_file"i
    value=""


    ' Input_file


    Above code will create 2 output files named output_file1 and output_file2 you could change output file names as per your wish too.



    Output files values will be as follows.



    cat output_file1
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,*,*,*,*,*,*,*,*,1
    206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
    206004575,*,*,*,*,*,*,*,*,6
    206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
    206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
    206004571,*,*,*,*,*,*,*,*,9

    cat output_file2
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
    206004573,*,*,*,*,*,*,*,*,4
    206004575,20141207,6,2,91,1,2014,380.94688,NA,6
    206004576,*,*,*,*,*,*,*,*,7
    206004579,*,*,*,*,*,*,*,*,8
    206004571,20141207,6,2,91,1,2014,380.94688,NA,9


    Will add explanation too shortly.



    EDIT: Adding explanation too above code now.



    awk ' ##awk code starts from here.
    BEGIN ##Starting BEGIN section for awk code here.
    FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
    ##Closing BEGIN block here.
    FNR==1 ##Checking condition when first line is being read for Input_file.
    head=$0 ##Creating a variable named head whose value is current line value.
    next ##next will skip all further statements from here.
    ##Closing FNR==1 condition block here.
    ##Starting main block for awk code here.
    count[$6] ##Creating an array named count whose index is $6(6th field of current line).
    a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
    ##Closing main section of awk code here.
    END ##Mentioning END block of this awk code here.
    for(i in count) ##Starting a for loop to traverse through count array.
    for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
    num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
    if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
    print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
    ##Closing for block for j<=val one here.
    if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
    for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
    value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
    ##Closing for loop block here.
    ##Closing if condition block here.
    if(value) ##Checking condition if variable value is NOT NULL then do following.
    print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
    ##Closing block for if condition here.
    else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
    value="" ##Nullifying variable value here.
    ##Closing block for for loop here.

    ' Input_file ##Mentioning Input_file name here.





    share|improve this answer

























    • @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

      – RavinderSingh13
      Nov 16 '18 at 18:52






    • 1





      Thanks for your explanation. Really helped to understand the code!

      – Cae.rich
      Nov 26 '18 at 0:00













    1












    1








    1







    Could you please try following once.



    awk '
    BEGIN
    FS=OFS=","

    FNR==1
    head=$0
    next


    count[$6]
    a[++val]=$0

    END
    for(i in count)
    for(j=1;j<=val;j++)
    num=split(a[j],array,",")
    if(!header["output_file"i]++)
    print (head) > "output_file"i

    if(array[6]==i)
    for(k=2;k<=(num-1);k++)
    value=value?value OFS "*":"*"


    if(value)
    print (array[1],value,array[num]) > "output_file"i

    else print (a[j]) > "output_file"i
    value=""


    ' Input_file


    Above code will create 2 output files named output_file1 and output_file2 you could change output file names as per your wish too.



    Output files values will be as follows.



    cat output_file1
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,*,*,*,*,*,*,*,*,1
    206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
    206004575,*,*,*,*,*,*,*,*,6
    206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
    206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
    206004571,*,*,*,*,*,*,*,*,9

    cat output_file2
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
    206004573,*,*,*,*,*,*,*,*,4
    206004575,20141207,6,2,91,1,2014,380.94688,NA,6
    206004576,*,*,*,*,*,*,*,*,7
    206004579,*,*,*,*,*,*,*,*,8
    206004571,20141207,6,2,91,1,2014,380.94688,NA,9


    Will add explanation too shortly.



    EDIT: Adding explanation too above code now.



    awk ' ##awk code starts from here.
    BEGIN ##Starting BEGIN section for awk code here.
    FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
    ##Closing BEGIN block here.
    FNR==1 ##Checking condition when first line is being read for Input_file.
    head=$0 ##Creating a variable named head whose value is current line value.
    next ##next will skip all further statements from here.
    ##Closing FNR==1 condition block here.
    ##Starting main block for awk code here.
    count[$6] ##Creating an array named count whose index is $6(6th field of current line).
    a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
    ##Closing main section of awk code here.
    END ##Mentioning END block of this awk code here.
    for(i in count) ##Starting a for loop to traverse through count array.
    for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
    num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
    if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
    print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
    ##Closing for block for j<=val one here.
    if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
    for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
    value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
    ##Closing for loop block here.
    ##Closing if condition block here.
    if(value) ##Checking condition if variable value is NOT NULL then do following.
    print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
    ##Closing block for if condition here.
    else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
    value="" ##Nullifying variable value here.
    ##Closing block for for loop here.

    ' Input_file ##Mentioning Input_file name here.





    share|improve this answer















    Could you please try following once.



    awk '
    BEGIN
    FS=OFS=","

    FNR==1
    head=$0
    next


    count[$6]
    a[++val]=$0

    END
    for(i in count)
    for(j=1;j<=val;j++)
    num=split(a[j],array,",")
    if(!header["output_file"i]++)
    print (head) > "output_file"i

    if(array[6]==i)
    for(k=2;k<=(num-1);k++)
    value=value?value OFS "*":"*"


    if(value)
    print (array[1],value,array[num]) > "output_file"i

    else print (a[j]) > "output_file"i
    value=""


    ' Input_file


    Above code will create 2 output files named output_file1 and output_file2 you could change output file names as per your wish too.



    Output files values will be as follows.



    cat output_file1
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,*,*,*,*,*,*,*,*,1
    206004573,20141209,6,2,93,2,2014,436.4504712,NA,4
    206004575,*,*,*,*,*,*,*,*,6
    206004576,20141208,6,2,92,2,2014,424.4410055,NA,7
    206004579,20141209,6,2,93,2,2014,436.4504712,NA,8
    206004571,*,*,*,*,*,*,*,*,9

    cat output_file2
    NationalCowID,TestDate,Batch,LN,DIM,YBr,year,CH4,PLS,qtl
    206004574,20141208,6,2,92,1,2014,424.4410055,NA,1
    206004573,*,*,*,*,*,*,*,*,4
    206004575,20141207,6,2,91,1,2014,380.94688,NA,6
    206004576,*,*,*,*,*,*,*,*,7
    206004579,*,*,*,*,*,*,*,*,8
    206004571,20141207,6,2,91,1,2014,380.94688,NA,9


    Will add explanation too shortly.



    EDIT: Adding explanation too above code now.



    awk ' ##awk code starts from here.
    BEGIN ##Starting BEGIN section for awk code here.
    FS=OFS="," ##Setting FS and OFS values here as comma for all lines.
    ##Closing BEGIN block here.
    FNR==1 ##Checking condition when first line is being read for Input_file.
    head=$0 ##Creating a variable named head whose value is current line value.
    next ##next will skip all further statements from here.
    ##Closing FNR==1 condition block here.
    ##Starting main block for awk code here.
    count[$6] ##Creating an array named count whose index is $6(6th field of current line).
    a[++val]=$0 ##Creating an array named a whose index is variable val value increment with 1 and value is $0.
    ##Closing main section of awk code here.
    END ##Mentioning END block of this awk code here.
    for(i in count) ##Starting a for loop to traverse through count array.
    for(j=1;j<=val;j++) ##Starting for loop which runs from j value from 1 to value of val.
    num=split(a[j],array,",") ##Spliting value of array a into array named array and its length stores in num variable.
    if(!header["output_file"i]++) ##Checking if string output_file with i value is NOT in header array then do following.
    print (head) > "output_file"i ##Printing header(taken from 1st line of Input_file) to output_file i(integer value).
    ##Closing for block for j<=val one here.
    if(array[6]==i) ##Checking condition if array[6] value is equal to i then do following.
    for(k=2;k<=(num-1);k++) ##Starting a for loop from k=2 to till value of (num-1) here.
    value=value?value OFS "*":"*" ##Creating variable value which is having * in it and its value is concatenating.
    ##Closing for loop block here.
    ##Closing if condition block here.
    if(value) ##Checking condition if variable value is NOT NULL then do following.
    print (array[1],value,array[num]) > "output_file"i ##printing array[1],value,array[num] values to output_file i(interger) value output file.
    ##Closing block for if condition here.
    else print (a[j]) > "output_file"i ##Going to else part of if condition printing value of a[j] to output_file i here.
    value="" ##Nullifying variable value here.
    ##Closing block for for loop here.

    ' Input_file ##Mentioning Input_file name here.






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 15 '18 at 7:32

























    answered Nov 15 '18 at 3:34









    RavinderSingh13RavinderSingh13

    28k41638




    28k41638












    • @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

      – RavinderSingh13
      Nov 16 '18 at 18:52






    • 1





      Thanks for your explanation. Really helped to understand the code!

      – Cae.rich
      Nov 26 '18 at 0:00

















    • @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

      – RavinderSingh13
      Nov 16 '18 at 18:52






    • 1





      Thanks for your explanation. Really helped to understand the code!

      – Cae.rich
      Nov 26 '18 at 0:00
















    @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

    – RavinderSingh13
    Nov 16 '18 at 18:52





    @Cae.rich, try to encourage people by up-voting helpful and good answers and always try to select an answer as correct one out of many to close the thread completely.

    – RavinderSingh13
    Nov 16 '18 at 18:52




    1




    1





    Thanks for your explanation. Really helped to understand the code!

    – Cae.rich
    Nov 26 '18 at 0:00





    Thanks for your explanation. Really helped to understand the code!

    – Cae.rich
    Nov 26 '18 at 0:00













    0














    The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk code you provide can also not perform the substitutions you mention.



    This is how your original code should look like:



    BEGINFS=OFS=","
    (FNR==1)
    for (i=1;i<=NF;++i)
    if ($i == "YBr") bfield=i;
    else if ($i == "NationalCowID") cfield=i


    ($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
    1


    You can now adapt to include the value that $bfield should have:



    awk -v bvalue="$j" '
    BEGINFS=OFS=","
    (FNR==1)
    for (i=1;i<=NF;++i)
    if ($i == "Batch") bfield=i;
    else if ($i == "NationalCowID") cfield=i


    ($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
    1
    ' file.csv





    share|improve this answer





























      0














      The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk code you provide can also not perform the substitutions you mention.



      This is how your original code should look like:



      BEGINFS=OFS=","
      (FNR==1)
      for (i=1;i<=NF;++i)
      if ($i == "YBr") bfield=i;
      else if ($i == "NationalCowID") cfield=i


      ($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
      1


      You can now adapt to include the value that $bfield should have:



      awk -v bvalue="$j" '
      BEGINFS=OFS=","
      (FNR==1)
      for (i=1;i<=NF;++i)
      if ($i == "Batch") bfield=i;
      else if ($i == "NationalCowID") cfield=i


      ($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
      1
      ' file.csv





      share|improve this answer



























        0












        0








        0







        The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk code you provide can also not perform the substitutions you mention.



        This is how your original code should look like:



        BEGINFS=OFS=","
        (FNR==1)
        for (i=1;i<=NF;++i)
        if ($i == "YBr") bfield=i;
        else if ($i == "NationalCowID") cfield=i


        ($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
        1


        You can now adapt to include the value that $bfield should have:



        awk -v bvalue="$j" '
        BEGINFS=OFS=","
        (FNR==1)
        for (i=1;i<=NF;++i)
        if ($i == "Batch") bfield=i;
        else if ($i == "NationalCowID") cfield=i


        ($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
        1
        ' file.csv





        share|improve this answer















        The problem is that you are mixing bash variables and awk variables. Hence the suggested duplicate. The awk code you provide can also not perform the substitutions you mention.



        This is how your original code should look like:



        BEGINFS=OFS=","
        (FNR==1)
        for (i=1;i<=NF;++i)
        if ($i == "YBr") bfield=i;
        else if ($i == "NationalCowID") cfield=i


        ($bfield == 1) for(i=cfield+1;i<NF;++i) $i="*"
        1


        You can now adapt to include the value that $bfield should have:



        awk -v bvalue="$j" '
        BEGINFS=OFS=","
        (FNR==1)
        for (i=1;i<=NF;++i)
        if ($i == "Batch") bfield=i;
        else if ($i == "NationalCowID") cfield=i


        ($bfield == bvalue) for(i=cfield+1;i<NF;++i) $i="*"
        1
        ' file.csv






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 15 '18 at 12:24

























        answered Nov 15 '18 at 12:19









        kvantourkvantour

        8,87331330




        8,87331330













            Popular posts from this blog

            Top Tejano songwriter Luis Silva dead of heart attack at 64

            政党

            天津地下鉄3号線