Filtering the dataframe by matching values of two columns in different rows
This is a problem pretty similar to Filtering the dataframe by matching values of two columns, but I think it is still different: in that example the two columns only need to match within the same row. I am looking to filter a data.table on matching values in two columns, where the matches can be scattered across the whole range of the data.table.
The resulting data.table should look like the one below; it should keep only the rows that have matching values between msg_seq_nb and orig_msg_seq_nb. I know that I could achieve this by merging the data.table with itself on the two columns, but I thought there must be an easier way that avoids the merge and the additional overhead it might create.
dt <- structure(list(
  cusip_id = c("00208J702", "00208J702", "00208J702", "00208J702",
               "010284AH0", "010284AH0", "010284AH0", "U9220JAC4", "U9220JAC4"),
  orig_msg_seq_nb = c(NA, NA, NA, NA, NA, NA, "0003081", NA, "0012889"),
  msg_seq_nb = c("0008911", "0009000", "0009075", "0009152", "0031914",
                 "0003081", "0003105", "0012889", "0017534")),
  row.names = c(NA, -9L), class = c("data.table", "data.frame"))
> dt
cusip_id orig_msg_seq_nb msg_seq_nb
1: 00208J702 <NA> 0008911
2: 00208J702 <NA> 0009000
3: 00208J702 <NA> 0009075
4: 00208J702 <NA> 0009152
5: 010284AH0 <NA> 0031914
6: 010284AH0 <NA> 0003081
7: 010284AH0 0003081 0003105
8: U9220JAC4 <NA> 0012889
9: U9220JAC4 0012889 0017534
The result should look like this:
cusip_id orig_msg_seq_nb msg_seq_nb
1: 010284AH0 <NA> 0003081
2: 010284AH0 0003081 0003105
3: U9220JAC4 <NA> 0012889
4: U9220JAC4 0012889 0017534
Note that msg_seq_nb and orig_msg_seq_nb are not unique in this dataset.
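For reference, the merge the question mentions could be sketched as a self-join on cusip_id plus the two sequence-number columns. This is only a sketch: the name pairs and the ".orig" suffix are illustrative and not from the original post.

```r
library(data.table)

dt <- structure(list(
  cusip_id = c("00208J702", "00208J702", "00208J702", "00208J702",
               "010284AH0", "010284AH0", "010284AH0", "U9220JAC4", "U9220JAC4"),
  orig_msg_seq_nb = c(NA, NA, NA, NA, NA, NA, "0003081", NA, "0012889"),
  msg_seq_nb = c("0008911", "0009000", "0009075", "0009152", "0031914",
                 "0003081", "0003105", "0012889", "0017534")),
  row.names = c(NA, -9L), class = c("data.table", "data.frame"))
setDT(dt)  # repair the structure()-built object for data.table use

# Pair each row with the row whose msg_seq_nb equals this row's
# orig_msg_seq_nb, within the same cusip_id. Rows whose orig_msg_seq_nb
# is NA reference nothing and drop out of the join.
pairs <- merge(dt, dt,
               by.x = c("cusip_id", "orig_msg_seq_nb"),
               by.y = c("cusip_id", "msg_seq_nb"),
               suffixes = c("", ".orig"))
```

With the example data this yields the two follow-up rows (msg_seq_nb 0003105 and 0017534) joined to the rows they reference; taking the union of both sides of the join would reproduce the four-row desired output.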
r dplyr data.table
So for your example above, you would filter out all observations of cusip_id == 010284AH0? Maybe add a bigger example along with expected output. – Sotos, Nov 16 '18 at 9:51
The cusip doesn't matter for the matching to work. The dt is basically the desired output; the initial data.table is the same but with many more entries in the columns. I don't know if that would make it easier to get. – hannes101, Nov 16 '18 at 10:05
It is better to also give the initial table as well. No need for a lot of entries, just a few more that did not make the filtering. – Sotos, Nov 16 '18 at 10:17
asked Nov 16 '18 at 9:43 by hannes101, edited Nov 16 '18 at 11:46
1 Answer
The easiest way I can think of:
dt[orig_msg_seq_nb %in% msg_seq_nb | msg_seq_nb %in% orig_msg_seq_nb, ]
cusip_id orig_msg_seq_nb msg_seq_nb
1: 010284AH0 NA 0003081
2: 010284AH0 0003081 0003105
3: U9220JAC4 NA 0012889
4: U9220JAC4 0012889 0017534
Unfortunately, this is not possible, since the msg_seq_nb values are not unique, and there will be many more matches than the correct ones. – hannes101, Nov 16 '18 at 11:46
@hannes101 If you look for matching values between msg_seq_nb and orig_msg_seq_nb, then this behavior is correct. If you want to further reduce the number of rows, then you need an additional condition. Do you want to filter those rows where cusip_id matches as well? – MRau, Nov 16 '18 at 11:53
I just read that the msg_seq_nb are only unique per trading day, so yes, I would need an additional condition. I think I'll just go with a merge between these two tables. – hannes101, Nov 16 '18 at 11:59
@hannes101 The example you provided doesn't make it possible to test this additional condition, so I can only guess, but you can try this: ids <- dt[orig_msg_seq_nb %in% msg_seq_nb, cusip_id]; dt[orig_msg_seq_nb %in% msg_seq_nb | (msg_seq_nb %in% orig_msg_seq_nb & cusip_id %in% ids), ] – MRau, Nov 16 '18 at 12:45
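Following up on MRau's suggestion, one way to add the extra condition without a merge is to apply the same filter within each cusip_id group, so that sequence numbers from different cusips can never match each other. This is a sketch under the assumption, taken from the comment thread, that grouping on cusip_id is enough; if msg_seq_nb is only unique per trading day, the date column would have to be added to by as well.

```r
library(data.table)

dt <- structure(list(
  cusip_id = c("00208J702", "00208J702", "00208J702", "00208J702",
               "010284AH0", "010284AH0", "010284AH0", "U9220JAC4", "U9220JAC4"),
  orig_msg_seq_nb = c(NA, NA, NA, NA, NA, NA, "0003081", NA, "0012889"),
  msg_seq_nb = c("0008911", "0009000", "0009075", "0009152", "0031914",
                 "0003081", "0003105", "0012889", "0017534")),
  row.names = c(NA, -9L), class = c("data.table", "data.frame"))
setDT(dt)  # repair the structure()-built object for data.table use

# Grouped variant of the answer's filter: %in% now only sees the
# sequence numbers of the current cusip_id group, not the whole table.
res <- dt[, .SD[orig_msg_seq_nb %in% msg_seq_nb |
                msg_seq_nb %in% orig_msg_seq_nb],
          by = cusip_id]
```

On the example data this returns the same four rows as the ungrouped filter, but it would no longer match a msg_seq_nb against an orig_msg_seq_nb that belongs to a different cusip.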
answered Nov 16 '18 at 10:31 by MRau