Merging data based on matching first column in Python

I currently have two sets of data files that look like this:

File 1:

test1 ba ab cd dh gf
test2 fa ab cd dh gf
test3 rt ty er wq ee
test4 er rt sf sd sa

and in file 2:

test1 123 344 123
test1 234 567 787
test1 221 344 566
test3 456 121 677

I would like to combine the files based on mathching rows in the first column (so that "tests" match up)

like so:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

I have this Code

def combineFiles(file1,file2,outfile):
 def read_file(file):
 data = 
 for line in csv.reader(file):
 data[line[0]] = line[1:]
 return data
 with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 data2 = read_file(f2)
 with open(outfile, 'w') as out:
 wtr= csv.writer(out)
 for key in data1.keys():
 try:
 wtr.writerow(((key), ','.join(data1[key]), ','.join(data2[key])))
 except KeyError:
 pass

However the output ends up looking like this:

test1 ba ab cd dh gf 123 344 123
test3 er rt sf sd sa 456 121 677

Can anyone help me with how to make the output so that test1 can be printed all three times?

Much Appreciated

asked Nov 14 '18 at 2:41

Rk_23

10117

add a comment |

I currently have two sets of data files that look like this:

File 1:

test1 ba ab cd dh gf
test2 fa ab cd dh gf
test3 rt ty er wq ee
test4 er rt sf sd sa

and in file 2:

test1 123 344 123
test1 234 567 787
test1 221 344 566
test3 456 121 677

I would like to combine the files based on mathching rows in the first column (so that "tests" match up)

like so:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

I have this Code

def combineFiles(file1,file2,outfile):
 def read_file(file):
 data = 
 for line in csv.reader(file):
 data[line[0]] = line[1:]
 return data
 with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 data2 = read_file(f2)
 with open(outfile, 'w') as out:
 wtr= csv.writer(out)
 for key in data1.keys():
 try:
 wtr.writerow(((key), ','.join(data1[key]), ','.join(data2[key])))
 except KeyError:
 pass

However the output ends up looking like this:

test1 ba ab cd dh gf 123 344 123
test3 er rt sf sd sa 456 121 677

Can anyone help me with how to make the output so that test1 can be printed all three times?

Much Appreciated

asked Nov 14 '18 at 2:41

Rk_23

10117

add a comment |

I currently have two sets of data files that look like this:

File 1:

test1 ba ab cd dh gf
test2 fa ab cd dh gf
test3 rt ty er wq ee
test4 er rt sf sd sa

and in file 2:

test1 123 344 123
test1 234 567 787
test1 221 344 566
test3 456 121 677

I would like to combine the files based on mathching rows in the first column (so that "tests" match up)

like so:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

I have this Code

def combineFiles(file1,file2,outfile):
 def read_file(file):
 data = 
 for line in csv.reader(file):
 data[line[0]] = line[1:]
 return data
 with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 data2 = read_file(f2)
 with open(outfile, 'w') as out:
 wtr= csv.writer(out)
 for key in data1.keys():
 try:
 wtr.writerow(((key), ','.join(data1[key]), ','.join(data2[key])))
 except KeyError:
 pass

However the output ends up looking like this:

test1 ba ab cd dh gf 123 344 123
test3 er rt sf sd sa 456 121 677

Can anyone help me with how to make the output so that test1 can be printed all three times?

Much Appreciated

asked Nov 14 '18 at 2:41

Rk_23

10117

I currently have two sets of data files that look like this:

File 1:

test1 ba ab cd dh gf
test2 fa ab cd dh gf
test3 rt ty er wq ee
test4 er rt sf sd sa

and in file 2:

test1 123 344 123
test1 234 567 787
test1 221 344 566
test3 456 121 677

I would like to combine the files based on mathching rows in the first column (so that "tests" match up)

like so:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

I have this Code

def combineFiles(file1,file2,outfile):
 def read_file(file):
 data = 
 for line in csv.reader(file):
 data[line[0]] = line[1:]
 return data
 with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 data2 = read_file(f2)
 with open(outfile, 'w') as out:
 wtr= csv.writer(out)
 for key in data1.keys():
 try:
 wtr.writerow(((key), ','.join(data1[key]), ','.join(data2[key])))
 except KeyError:
 pass

However the output ends up looking like this:

test1 ba ab cd dh gf 123 344 123
test3 er rt sf sd sa 456 121 677

Can anyone help me with how to make the output so that test1 can be printed all three times?

Much Appreciated

python python-3.x python-2.7

asked Nov 14 '18 at 2:41

Rk_23

10117

asked Nov 14 '18 at 2:41

Rk_23

10117

asked Nov 14 '18 at 2:41

Rk_23

10117

asked Nov 14 '18 at 2:41

Rk_23

10117

asked Nov 14 '18 at 2:41

Rk_23

10117

add a comment |

4 Answers
4

active

oldest

votes

While I would recommend Brad Solomon's approach as it's pretty succinct, you just need a small change in your code.

Since your second file is the one that has the "final say", you just need to create a dictionary for the first file. Then you can write the output file as you read from the second file, fetching values from the data1 dictionary as you go:

with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 with open(outfile, 'w') as out:
 wtr = csv.writer(out, delimiter=' ')
 for line in csv.reader(f2, delimiter=' '):
 # only write if there is a corresponding line in file1
 if line[0] in data1:
 # as you write, get the corresponding file1 data
 wtr.writerow(line[0:] + data1[line[0]] + line[1:])

edited Nov 15 '18 at 4:47

answered Nov 14 '18 at 3:48

slider

8,23811129

add a comment |

You might want to give the Pandas library a try; it makes cases like this easy:

>>> import pandas as pd
>>> pd.merge(df1, df2, on='testnum', how='inner')
 testnum 1_x 2_x 3_x 4 5 1_y 2_y 3_y
0 test1 ba ab cd dh gf 123 344 123
1 test1 ba ab cd dh gf 234 567 787
2 test1 ba ab cd dh gf 221 344 566
3 test3 rt ty er wq ee 456 121 677

This assumes the test column is named "testnum".

>>> df1
 testnum 1 2 3 4 5
0 test1 ba ab cd dh gf
1 test2 fa ab cd dh gf
2 test3 rt ty er wq ee
3 test4 er rt sf sd sa

>>> df2
 testnum 1 2 3
0 test1 123 344 123
1 test1 234 567 787
2 test1 221 344 566
3 test3 456 121 677

You'd read these in with pd.read_csv().

edited Nov 14 '18 at 12:48

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

add a comment |

The issue is that you are overwriting the keys in the line

data[line[0]] = line[1:]

Since your files have non-unique "keys," you could try manually making them unique using enumerate:

for ind, line in enumerate(csv.reader(file)):
 unique_key = ''.join([line[0], "_", str(ind)])
 data[unique_key] = line[1:]

Later, when you merge the result, you can strip the keys to remove anything after the underscore:

wtr.writerow(((key.split("_")[0], ','.join(data1[key]), ','.join(data2[key])))

To my taste, this is all very clumsy. If your goal is to read, manipulate, and write data to and from csv files, I would recommend looking into pandas, as this code can be written in a few lines using DataFrames (see the answer by Brad Solomon).

edited Nov 14 '18 at 3:12

answered Nov 14 '18 at 2:58

ppinchuk

813

add a comment |

You can try collecting your items into seperate collections.defaultdict(), then get the cartesian product of the intersecting rows with itertools.product():

from collections import defaultdict
from itertools import product

def collect_rows(file):
 collection = defaultdict(list)

 for line in file:
 col1, *rest = line.split()
 collection[col1].append(rest)

 return collection

with open("file1.txt") as f1, open("file2.txt") as f2, open("output.txt", "w") as out:
 f1_collection = collect_rows(f1)
 f2_collection = collect_rows(f2)

 # Ordered intersection, no need to sort
 set_2 = set(f2_collection)
 intersection = [key for key in f1_collection if key in set_2]

 for key in intersection:
 for x, y in product(f1_collection[key], f2_collection[key]):
 out.write("%sn" % " ".join([key] + x + y))

Which gives the following output.txt:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

Note: It's probably easier to follow Brad Solomon's Pandas approach, since it can be done with one command.

edited Nov 14 '18 at 3:37

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53292432%2fmerging-data-based-on-matching-first-column-in-python%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

While I would recommend Brad Solomon's approach as it's pretty succinct, you just need a small change in your code.

with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 with open(outfile, 'w') as out:
 wtr = csv.writer(out, delimiter=' ')
 for line in csv.reader(f2, delimiter=' '):
 # only write if there is a corresponding line in file1
 if line[0] in data1:
 # as you write, get the corresponding file1 data
 wtr.writerow(line[0:] + data1[line[0]] + line[1:])

edited Nov 15 '18 at 4:47

answered Nov 14 '18 at 3:48

slider

8,23811129

add a comment |

While I would recommend Brad Solomon's approach as it's pretty succinct, you just need a small change in your code.

with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 with open(outfile, 'w') as out:
 wtr = csv.writer(out, delimiter=' ')
 for line in csv.reader(f2, delimiter=' '):
 # only write if there is a corresponding line in file1
 if line[0] in data1:
 # as you write, get the corresponding file1 data
 wtr.writerow(line[0:] + data1[line[0]] + line[1:])

edited Nov 15 '18 at 4:47

answered Nov 14 '18 at 3:48

slider

8,23811129

add a comment |

While I would recommend Brad Solomon's approach as it's pretty succinct, you just need a small change in your code.

with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 with open(outfile, 'w') as out:
 wtr = csv.writer(out, delimiter=' ')
 for line in csv.reader(f2, delimiter=' '):
 # only write if there is a corresponding line in file1
 if line[0] in data1:
 # as you write, get the corresponding file1 data
 wtr.writerow(line[0:] + data1[line[0]] + line[1:])

edited Nov 15 '18 at 4:47

answered Nov 14 '18 at 3:48

slider

8,23811129

While I would recommend Brad Solomon's approach as it's pretty succinct, you just need a small change in your code.

with open(file1, 'r') as f1, open(file2, 'r') as f2:
 data1 = read_file(f1)
 with open(outfile, 'w') as out:
 wtr = csv.writer(out, delimiter=' ')
 for line in csv.reader(f2, delimiter=' '):
 # only write if there is a corresponding line in file1
 if line[0] in data1:
 # as you write, get the corresponding file1 data
 wtr.writerow(line[0:] + data1[line[0]] + line[1:])

edited Nov 15 '18 at 4:47

answered Nov 14 '18 at 3:48

slider

8,23811129

edited Nov 15 '18 at 4:47

answered Nov 14 '18 at 3:48

slider

8,23811129

answered Nov 14 '18 at 3:48

slider

8,23811129

answered Nov 14 '18 at 3:48

slider

8,23811129

add a comment |

You might want to give the Pandas library a try; it makes cases like this easy:

>>> import pandas as pd
>>> pd.merge(df1, df2, on='testnum', how='inner')
 testnum 1_x 2_x 3_x 4 5 1_y 2_y 3_y
0 test1 ba ab cd dh gf 123 344 123
1 test1 ba ab cd dh gf 234 567 787
2 test1 ba ab cd dh gf 221 344 566
3 test3 rt ty er wq ee 456 121 677

This assumes the test column is named "testnum".

>>> df1
 testnum 1 2 3 4 5
0 test1 ba ab cd dh gf
1 test2 fa ab cd dh gf
2 test3 rt ty er wq ee
3 test4 er rt sf sd sa

>>> df2
 testnum 1 2 3
0 test1 123 344 123
1 test1 234 567 787
2 test1 221 344 566
3 test3 456 121 677

You'd read these in with pd.read_csv().

edited Nov 14 '18 at 12:48

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

add a comment |

You might want to give the Pandas library a try; it makes cases like this easy:

>>> import pandas as pd
>>> pd.merge(df1, df2, on='testnum', how='inner')
 testnum 1_x 2_x 3_x 4 5 1_y 2_y 3_y
0 test1 ba ab cd dh gf 123 344 123
1 test1 ba ab cd dh gf 234 567 787
2 test1 ba ab cd dh gf 221 344 566
3 test3 rt ty er wq ee 456 121 677

This assumes the test column is named "testnum".

>>> df1
 testnum 1 2 3 4 5
0 test1 ba ab cd dh gf
1 test2 fa ab cd dh gf
2 test3 rt ty er wq ee
3 test4 er rt sf sd sa

>>> df2
 testnum 1 2 3
0 test1 123 344 123
1 test1 234 567 787
2 test1 221 344 566
3 test3 456 121 677

You'd read these in with pd.read_csv().

edited Nov 14 '18 at 12:48

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

add a comment |

You might want to give the Pandas library a try; it makes cases like this easy:

>>> import pandas as pd
>>> pd.merge(df1, df2, on='testnum', how='inner')
 testnum 1_x 2_x 3_x 4 5 1_y 2_y 3_y
0 test1 ba ab cd dh gf 123 344 123
1 test1 ba ab cd dh gf 234 567 787
2 test1 ba ab cd dh gf 221 344 566
3 test3 rt ty er wq ee 456 121 677

This assumes the test column is named "testnum".

>>> df1
 testnum 1 2 3 4 5
0 test1 ba ab cd dh gf
1 test2 fa ab cd dh gf
2 test3 rt ty er wq ee
3 test4 er rt sf sd sa

>>> df2
 testnum 1 2 3
0 test1 123 344 123
1 test1 234 567 787
2 test1 221 344 566
3 test3 456 121 677

You'd read these in with pd.read_csv().

edited Nov 14 '18 at 12:48

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

You might want to give the Pandas library a try; it makes cases like this easy:

>>> import pandas as pd
>>> pd.merge(df1, df2, on='testnum', how='inner')
 testnum 1_x 2_x 3_x 4 5 1_y 2_y 3_y
0 test1 ba ab cd dh gf 123 344 123
1 test1 ba ab cd dh gf 234 567 787
2 test1 ba ab cd dh gf 221 344 566
3 test3 rt ty er wq ee 456 121 677

This assumes the test column is named "testnum".

>>> df1
 testnum 1 2 3 4 5
0 test1 ba ab cd dh gf
1 test2 fa ab cd dh gf
2 test3 rt ty er wq ee
3 test4 er rt sf sd sa

>>> df2
 testnum 1 2 3
0 test1 123 344 123
1 test1 234 567 787
2 test1 221 344 566
3 test3 456 121 677

You'd read these in with pd.read_csv().

edited Nov 14 '18 at 12:48

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

edited Nov 14 '18 at 12:48

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

answered Nov 14 '18 at 2:49

Brad Solomon

13.4k73484

add a comment |

The issue is that you are overwriting the keys in the line

data[line[0]] = line[1:]

Since your files have non-unique "keys," you could try manually making them unique using enumerate:

for ind, line in enumerate(csv.reader(file)):
 unique_key = ''.join([line[0], "_", str(ind)])
 data[unique_key] = line[1:]

Later, when you merge the result, you can strip the keys to remove anything after the underscore:

wtr.writerow(((key.split("_")[0], ','.join(data1[key]), ','.join(data2[key])))

edited Nov 14 '18 at 3:12

answered Nov 14 '18 at 2:58

ppinchuk

813

add a comment |

The issue is that you are overwriting the keys in the line

data[line[0]] = line[1:]

Since your files have non-unique "keys," you could try manually making them unique using enumerate:

for ind, line in enumerate(csv.reader(file)):
 unique_key = ''.join([line[0], "_", str(ind)])
 data[unique_key] = line[1:]

Later, when you merge the result, you can strip the keys to remove anything after the underscore:

wtr.writerow(((key.split("_")[0], ','.join(data1[key]), ','.join(data2[key])))

edited Nov 14 '18 at 3:12

answered Nov 14 '18 at 2:58

ppinchuk

813

add a comment |

The issue is that you are overwriting the keys in the line

data[line[0]] = line[1:]

Since your files have non-unique "keys," you could try manually making them unique using enumerate:

for ind, line in enumerate(csv.reader(file)):
 unique_key = ''.join([line[0], "_", str(ind)])
 data[unique_key] = line[1:]

Later, when you merge the result, you can strip the keys to remove anything after the underscore:

wtr.writerow(((key.split("_")[0], ','.join(data1[key]), ','.join(data2[key])))

edited Nov 14 '18 at 3:12

answered Nov 14 '18 at 2:58

ppinchuk

813

The issue is that you are overwriting the keys in the line

data[line[0]] = line[1:]

Since your files have non-unique "keys," you could try manually making them unique using enumerate:

for ind, line in enumerate(csv.reader(file)):
 unique_key = ''.join([line[0], "_", str(ind)])
 data[unique_key] = line[1:]

Later, when you merge the result, you can strip the keys to remove anything after the underscore:

wtr.writerow(((key.split("_")[0], ','.join(data1[key]), ','.join(data2[key])))

edited Nov 14 '18 at 3:12

answered Nov 14 '18 at 2:58

ppinchuk

813

edited Nov 14 '18 at 3:12

answered Nov 14 '18 at 2:58

ppinchuk

813

answered Nov 14 '18 at 2:58

ppinchuk

813

answered Nov 14 '18 at 2:58

ppinchuk

813

add a comment |

You can try collecting your items into seperate collections.defaultdict(), then get the cartesian product of the intersecting rows with itertools.product():

from collections import defaultdict
from itertools import product

def collect_rows(file):
 collection = defaultdict(list)

 for line in file:
 col1, *rest = line.split()
 collection[col1].append(rest)

 return collection

with open("file1.txt") as f1, open("file2.txt") as f2, open("output.txt", "w") as out:
 f1_collection = collect_rows(f1)
 f2_collection = collect_rows(f2)

 # Ordered intersection, no need to sort
 set_2 = set(f2_collection)
 intersection = [key for key in f1_collection if key in set_2]

 for key in intersection:
 for x, y in product(f1_collection[key], f2_collection[key]):
 out.write("%sn" % " ".join([key] + x + y))

Which gives the following output.txt:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

Note: It's probably easier to follow Brad Solomon's Pandas approach, since it can be done with one command.

edited Nov 14 '18 at 3:37

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

add a comment |

You can try collecting your items into seperate collections.defaultdict(), then get the cartesian product of the intersecting rows with itertools.product():

from collections import defaultdict
from itertools import product

def collect_rows(file):
 collection = defaultdict(list)

 for line in file:
 col1, *rest = line.split()
 collection[col1].append(rest)

 return collection

with open("file1.txt") as f1, open("file2.txt") as f2, open("output.txt", "w") as out:
 f1_collection = collect_rows(f1)
 f2_collection = collect_rows(f2)

 # Ordered intersection, no need to sort
 set_2 = set(f2_collection)
 intersection = [key for key in f1_collection if key in set_2]

 for key in intersection:
 for x, y in product(f1_collection[key], f2_collection[key]):
 out.write("%sn" % " ".join([key] + x + y))

Which gives the following output.txt:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

Note: It's probably easier to follow Brad Solomon's Pandas approach, since it can be done with one command.

edited Nov 14 '18 at 3:37

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

add a comment |

You can try collecting your items into seperate collections.defaultdict(), then get the cartesian product of the intersecting rows with itertools.product():

from collections import defaultdict
from itertools import product

def collect_rows(file):
 collection = defaultdict(list)

 for line in file:
 col1, *rest = line.split()
 collection[col1].append(rest)

 return collection

with open("file1.txt") as f1, open("file2.txt") as f2, open("output.txt", "w") as out:
 f1_collection = collect_rows(f1)
 f2_collection = collect_rows(f2)

 # Ordered intersection, no need to sort
 set_2 = set(f2_collection)
 intersection = [key for key in f1_collection if key in set_2]

 for key in intersection:
 for x, y in product(f1_collection[key], f2_collection[key]):
 out.write("%sn" % " ".join([key] + x + y))

Which gives the following output.txt:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

Note: It's probably easier to follow Brad Solomon's Pandas approach, since it can be done with one command.

edited Nov 14 '18 at 3:37

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

You can try collecting your items into seperate collections.defaultdict(), then get the cartesian product of the intersecting rows with itertools.product():

from collections import defaultdict
from itertools import product

def collect_rows(file):
 collection = defaultdict(list)

 for line in file:
 col1, *rest = line.split()
 collection[col1].append(rest)

 return collection

with open("file1.txt") as f1, open("file2.txt") as f2, open("output.txt", "w") as out:
 f1_collection = collect_rows(f1)
 f2_collection = collect_rows(f2)

 # Ordered intersection, no need to sort
 set_2 = set(f2_collection)
 intersection = [key for key in f1_collection if key in set_2]

 for key in intersection:
 for x, y in product(f1_collection[key], f2_collection[key]):
 out.write("%sn" % " ".join([key] + x + y))

Which gives the following output.txt:

test1 ba ab cd dh gf 123 344 123
test1 ba ab cd dh gf 234 567 787
test1 ba ab cd dh gf 221 344 566
test3 rt ty er wq ee 456 121 677

Note: It's probably easier to follow Brad Solomon's Pandas approach, since it can be done with one command.

edited Nov 14 '18 at 3:37

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

edited Nov 14 '18 at 3:37

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

answered Nov 14 '18 at 3:05

RoadRunner

11.2k31340

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Myujth