Coding Help on Stata

I have an unbalanced panel data set that gives me information on how much banks lend in different areas. The geography ID and bank ID are numeric variables created with a Stata command like egen id=group(var).



The geography ID goes from 1 to n and the bank ID from 1 to k. To give you a more concrete idea of how my data look:



Geography ID (gid) | Bank ID (bid) | lending
--------------------------------------------
         1         |       1       |    25
         1         |       2       |    32
         1         |       4       |    83
--------------------------------------------
         2         |       1       |    76
         2         |       3       |    22
--------------------------------------------
         3         |       2       |    42
         3         |       3       |    12
         3         |       5       |    22
--------------------------------------------


My final goal is to create a dataframe that holds all the pairwise combinations of the geographical areas:



     1     2     3    ...    n
  -----------------------------
1| (1,1) (1,2) (1,3) ... (1,n)
2| (2,1) (2,2) (2,3) ... (2,n)
.|   .     .     .
n| (n,1)   .         ... (n,n)


where entry (i,j) gives me:



(i,j) = (lending from banks operating in both area i and area j) / (total lending in areas i and j)


So, for instance, given the data above:

(1,1) = 1
(1,2) = (25 + 76) / (25 + 32 + 83 + 76 + 22)
(1,3) = (32 + 42) / (25 + 32 + 83 + 42 + 12 + 22)
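For reference, here is a quick arithmetic check of the two worked entries (a minimal sketch; the names entry_12 and entry_13 are just illustrative, and the values match the R output in the answer below):

# Quick check of the worked example (illustrative names, not from the original post)
entry_12 <- (25 + 76) / (25 + 32 + 83 + 76 + 22)        # bank 1 lends in both areas 1 and 2
entry_13 <- (32 + 42) / (25 + 32 + 83 + 42 + 12 + 22)   # bank 2 lends in both areas 1 and 3
c(entry_12, entry_13)
# [1] 0.4243697 0.3425926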


I have a feeling that, as a first step, I should use levelsof and bysort in a loop, but I am unsure how exactly to tackle the problem.



Even if you can't provide an exact solution, I would be extremely grateful for any help or suggestions. Although I prefer Stata, I also have some knowledge of MATLAB/R, so if you think one of those is more suitable for this problem I am open to suggestions.

r matlab stata

asked Nov 14 '18 at 23:56 by nda, edited Nov 15 '18 at 9:12 by Nick Cox
  • In R that could be accomplished with xtabs(Lending ~ Geograph_ID + Bank_ID, data=dat), or possibly something with tapply if there are multiple entries in any given cell (which is not shown in the data schema you displayed). It would be an N x K matrix, not an N x N one. It's not clear how there could be a different total in an i-j combo than a single item in that same combo. (A sketch of that N x K table is shown after these comments.)

    – 42-, Nov 15 '18 at 0:09

  • Thank you for your answer. Sorry if I wasn't clear, but it should be N x N, since I want all the pairwise combinations of N different areas and not the pairwise combinations of area-bank (which would be N x K).

    – nda, Nov 15 '18 at 0:16
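To illustrate the first comment's point, here is a hedged sketch (assuming the sample data have been loaded as the data frame x constructed in the answer below) of the N x K area-by-bank table that xtabs() produces, as opposed to the N x N area-by-area matrix the question asks for:

# N x K lending table: areas in rows, banks in columns (zeros where a bank is absent)
xtabs(lending ~ geoid + bankid, data = x)
#      bankid
# geoid  1  2  3  4  5
#     1 25 32  0 83  0
#     2 76  0 22  0  0
#     3  0 42 12  0 22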















1 Answer


Here's an R method:



x <- data.frame(
  geoid   = c(1, 1, 1,  2, 2,  3, 3, 3),
  bankid  = c(1, 2, 4,  1, 3,  2, 3, 5),
  lending = c(25, 32, 83,  76, 22,  42, 12, 22)
)

myfunc <- function(x, i, j) {
  # rows belonging to area i or area j
  geos <- x$geoid %in% c(i, j)
  # banks that operate in both areas
  banks <- with(x, intersect(bankid[geoid == i], bankid[geoid == j]))
  # lending by the common banks over total lending in the two areas
  with(x, sum(lending[geos & bankid %in% banks]) / sum(lending[geos]))
}

outer(unique(x$geoid), unique(x$geoid),
      function(i, j) mapply(myfunc, list(x), i, j))
#           [,1]      [,2]      [,3]
# [1,] 1.0000000 0.4243697 0.3425926
# [2,] 0.4243697 1.0000000 0.1954023
# [3,] 0.3425926 0.1954023 1.0000000


It's not the most efficient, but it's a start. It's difficult (I think) to do this in a truly vectorized way, since each subset requires an intersection, though I'm sure it could be optimized so that intersect(bankid...) is not re-calculated twice for each equivalent pair (if that's a performance factor).




Edit: a slightly more efficient process that does not re-calculate equivalent pairs of geoid:



Split the data by geo:



geox <- split(x, x$geoid)

myfunc <- function(i, j) {
  # compute only the upper triangle; everything else is filled in afterwards
  if (i >= j) return(NA)
  # banks operating in both areas
  banks <- intersect(geox[[i]]$bankid, geox[[j]]$bankid)
  sum(with(geox[[i]], lending[bankid %in% banks]),
      with(geox[[j]], lending[bankid %in% banks])) /
    sum(geox[[i]]$lending, geox[[j]]$lending)
}

o <- outer(seq_along(geox), seq_along(geox),
           function(i, j) mapply(myfunc, i, j))
o
#      [,1]      [,2]      [,3]
# [1,]   NA 0.4243697 0.3425926
# [2,]   NA        NA 0.1954023
# [3,]   NA        NA        NA


(Just to prove we only calculated the minimum set.) Now, flip the upper triangle's data to the lower triangle:



o[lower.tri(o)] <- t(o)[lower.tri(o)]   # mirror the upper triangle into the lower triangle
o
#           [,1]      [,2]      [,3]
# [1,]        NA 0.4243697 0.3425926
# [2,] 0.4243697        NA 0.1954023
# [3,] 0.3425926 0.1954023        NA


And assign the known value of 1 to the diagonal:



diag(o) <- 1
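As a small optional follow-up (not part of the original answer): since split() names its list elements after the levels of geoid, those names can be reused to label the finished matrix, assuming o and geox as built above:

# Label rows and columns with the geography IDs carried by split()
dimnames(o) <- list(names(geox), names(geox))
o
#           1         2         3
# 1 1.0000000 0.4243697 0.3425926
# 2 0.4243697 1.0000000 0.1954023
# 3 0.3425926 0.1954023 1.0000000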





answered Nov 15 '18 at 0:14 by r2evans, edited Nov 15 '18 at 0:46