Simple way to do a weighted hot deck imputation in Stata?

I'd like to do a simple weighted hot deck imputation in Stata. In SAS the equivalent command would be the following (and note that this is a newer SAS feature, beginning with SAS/STAT 14.1 in 2015 or so):



proc surveyimpute method=hotdeck(selection=weighted); 


For clarity then, the basic requirements are:



  1. Imputations must be row-based, or simultaneous: if row 1 donates x to row 3, then it must also donate y.


  2. Imputations must account for weights: a donor with weight=2 should be twice as likely to be selected as a donor with weight=1.


I'm assuming the missing data is rectangular. In other words, if the set of potentially missing variables consists of x and y then either both are missing or neither is missing. Here's some code to generate sample data.



global miss_vars "wealth income"
global weight "weight"

clear
set obs 6
gen id = _n
gen type = id > 3
gen income = 5000 * _n
gen wealth = income * 4 + 500 * uniform()
gen weight = 1
replace weight = 4 if mod(id-1,3) == 0

// set income & wealth missing every 3 rows
gen impute = mod(_n,3) == 0
foreach v in $miss_vars {
    replace `v' = . if impute == 1
}



Data looks like this:



        id   type   income     wealth   weight   impute
  1.     1      0     5000   20188.03        4        0
  2.     2      0    10000   40288.81        1        0
  3.     3      0        .          .        1        1
  4.     4      1    20000   80350.85        4        0
  5.     5      1    25000   100378.8        1        0
  6.     6      1        .          .        1        1


In other words, for each row with missing values we need to randomly (with weighting) select a donor observation of the same type and use that donor to fill in both the income and wealth values. In practical use, generating the type variable is of course its own problem, but I'm keeping it very simple here to focus on the main issue.
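
Just for context, such cells are typically formed by crossing a few categorical covariates; a minimal sketch (region and agecat are hypothetical variables, not part of the sample data above):

// hypothetical illustration only: build donor cells by crossing
// categorical covariates (region and agecat are placeholders)
egen type = group(region agecat)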



For example, after the hot deck, row 3 might look like either of the following, because it fills both income and wealth from row 1, or both from row 2 (it would never take income from row 1 and wealth from row 2):



  3.     3      0     5000   20188.03        1        1
  3.     3      0    10000   40288.81        1        1


Also, since row 1 has weight=4 and row 2 has weight=1, row 1 should be the donor 80% of the time and row 2 should be the donor 20% of the time.










sas stata imputation

asked Nov 15 '18 at 16:44 by JohnE, edited Nov 29 '18 at 21:23


  • The community-contributed command hotdeck might do what you want. – Pearly Spencer, Nov 15 '18 at 16:50


2 Answers

Here's a concise and simple approach that should also be quite fast even for large datasets, since it only does two sorts and nothing else that should be computationally expensive. Here's the code with minimal comments; further below is the same code with more extensive comments:



gen sort_order = uniform()

// save recipient rows to file, keep donors
preserve
keep if impute == 1
save recipients, replace
restore
keep if impute == 0

// prep donor cells
sort type sort_order
by type: gen weight_sum = sum($weight)
by type: gen impute_weight = $weight / weight_sum[_N]
by type: replace impute_weight = sum(impute_weight)
drop weight_sum

// bring back recipient rows and sort entire data set
append using recipients
replace sort_order = impute_weight if impute_weight != .
gsort type -sort_order

// replace missing values via a simple replace
foreach v in $miss_vars {
    by type: replace `v' = `v'[_n-1] if impute == 1
}

// extra kludge step necessary to handle top rows
gsort type sort_order
foreach v in $miss_vars {
    by type: replace `v' = `v'[_n-1] if `v' == .
}



This seems to work fine for the test example but I haven't tested on larger and more complicated cases. As noted in the question, I expect this should give the same results as the SAS method:



proc surveyimpute method=hotdeck(selection=weighted);


Note also that if you don't want to use weights, you could just set them to be a column of ones (e.g. gen weight = 1).



And here is the same code, with more comments:



gen sort_order = uniform()

// split off and save the recipient rows
preserve
keep if impute == 1
save recipients, replace

// restore full dataset and keep only donor rows
restore
keep if impute == 0

// set up the donor rows. the key idea here is that each donor
// row represents a probability interval: the ordering of the
// intervals within a cell is random (based on the variable
// "sort_order") and the width of each interval is proportional
// to the donor's weight
sort type sort_order
by type: gen weight_sum = sum($weight)
by type: gen impute_weight = $weight / weight_sum[_N]
by type: replace impute_weight = sum(impute_weight)
drop weight_sum

// append the recipients so we again have a full dataset
// with both donors and recipients
append using recipients

// now we intersperse the donors and recipients using "sort_order",
// which is based on randomness and weight for the donors and
// is purely random for the recipients
replace sort_order = impute_weight if impute_weight != .
gsort type -sort_order

// fill recipient variables from donor rows. conceptually this is
// very simple: each recipient row falls within the range of some
// donor interval, which in practice is simply the nearest
// preceding donor row
foreach v in $miss_vars {
    by type: replace `v' = `v'[_n-1] if impute == 1
}

// however, there's a minor practical issue: recipients that fall
// in the range of the first donor interval need to be filled from
// the nearest following donor row, which can be done by reversing
// the sort and then filling from the nearest preceding donor row
gsort type sort_order
foreach v in $miss_vars {
    by type: replace `v' = `v'[_n-1] if `v' == .
}
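
As a rough sanity check of the weighting, here is a minimal sketch (weighted_hotdeck is a hypothetical wrapper standing in for the sample-data generation from the question plus the imputation steps above): it reruns the imputation many times and counts how often row 1 (weight 4) donates its income value of 5000 to row 3, which should happen about 80% of the time.

// hedged sketch: weighted_hotdeck is a hypothetical program wrapping the
// sample-data generation from the question and the imputation steps above
local hits 0
forvalues rep = 1/1000 {
    quietly weighted_hotdeck
    quietly count if id == 3 & income == 5000
    local hits = `hits' + r(N)
}
display "row 1 donated to row 3 in " `hits'/1000 " of runs (expect about .80)"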






answered Nov 16 '18 at 18:14, edited Nov 19 '18 at 2:12 – JohnE

Here are some brief notes about the community-contributed hotdeck routines by Adrian Mander and David Clayton mentioned in the comments above by @PearlySpencer (plus a follow-up version):

There seem to be a couple of versions:

  • hotdeck.ado (2007) https://ideas.repec.org/c/boc/bocode/s366901.html

  • whotdeck.ado (2011) https://econpapers.repec.org/software/bocbocode/s433201.htm

As best I can tell, both of these are designed to do an Approximate Bayesian Bootstrap, which is essentially a multiple-imputation version of a hot deck. Unfortunately, neither of them seems to handle sample (or survey) weights. The second of the two ("whotdeck") does have a parameter for weights, but this appears to be for predicting "missingness" and has nothing to do with sample/survey weights.

The first one ("hotdeck") does at least seem to do a standard hot deck, so it may be used that way if you don't need weights. The second one ("whotdeck") probably does a simple hot deck as well, but the syntax was a little trickier and I didn't succeed in getting it to do so (which is probably a failure on my part and, in any event, not a knock on the command, as it seems designed for more complex situations).

I emailed Adrian Mander and he said he doesn't use Stack Overflow, but that it would be OK for me to post his email response to my question about using sample/survey weights with hotdeck or whotdeck:


    Interesting problem, if the weights are frequency weights then the easiest thing to do is expand freq_weight and then use hotdeck.

    It might be able to be done with a single line of code to make it work with other types of weight because currently the imputation is done by randomly ordering the rows of your dataset by generating a random number and then sorting.. with weights you would need to generate random numbers and then probably multiply the weights to the random numbers and then order them (I think this sort of thing would work but this idea has just popped into my head so would need some thinking about).
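
Following up on the frequency-weight suggestion in that email, here is a minimal sketch of the expand route (an assumption on my part about how it would be wired up; only the expand step is shown concretely, and hotdeck's own syntax should be taken from its help file):

// hedged sketch of the frequency-weight route: rows are replicated by
// their frequency weight, so an unweighted hot deck on the expanded data
// picks donors in proportion to weight
expand weight
// run the community-contributed hotdeck command here (see "help hotdeck"
// for its actual syntax), then keep one copy of each original row if
// recipients were also expanded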







answered Nov 19 '18 at 19:30, edited Nov 21 '18 at 18:59 – JohnE