One-hot encoding with model.matrix. Is the intercept required?

I understand that one-hot encoding converts a factor with k levels into k-1 dummy variables, but I'm confused about whether the intercept column has to be kept or can be left out. For example, this removes the intercept:

# Predictor variables of train dataset
x <- model.matrix(y ~ ., train_data)[,-1]

But the model output seems the same regardless of whether I remove it or not.

asked Nov 16 '18 at 2:26

LucaS


I'm not sure exactly what you're trying to do, but you can leave the intercept out like model.matrix(~ a + 0, data=data.frame(a=factor(1:3))) which will give you a slightly different result.

– thelatemail
Nov 16 '18 at 3:24
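To make the difference concrete, here is a minimal sketch using the same toy factor as in the comment above. The object names (`d`, `with_int`, `no_int`, `dropped`) are made up for illustration, and it assumes R's default treatment contrasts.

```r
# Toy factor with three levels (hypothetical data, for illustration only)
d <- data.frame(a = factor(1:3))

# Default formula: an intercept column plus k - 1 = 2 dummy columns
with_int <- model.matrix(~ a, data = d)
colnames(with_int)   # "(Intercept)" "a2" "a3"

# "+ 0" removes the intercept, so the factor expands to all k = 3 dummies
no_int <- model.matrix(~ a + 0, data = d)
colnames(no_int)     # "a1" "a2" "a3"

# Dropping the first column afterwards leaves only the k - 1 dummies
dropped <- with_int[, -1]
colnames(dropped)    # "a2" "a3"
```

Note that `[, -1]` and `+ 0` are not the same thing: the first leaves k-1 dummies with no intercept column at all, while the second produces all k dummies.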

This might be better asked over at Cross Validated, since you're really just reparametrizing the model. A linear combination of the dummy variables would act as the new "intercept" (provided the model matrices have the same rank).

– mickey
Nov 16 '18 at 3:28

Some online examples I'd seen had removed the intercept with [,-1], but then I thought I had read somewhere that you should only do that if you retain as many dummies as there are levels of the factor. I just wasn't sure. I've posted over at Cross Validated.

– LucaS
Nov 16 '18 at 3:41

The main idea is that you don't want your model.matrix to be singular. So it's either the intercept + k-1 dummies, or no intercept and all k dummies. It can be shown that the result should be the same, just with slight differences in parameter interpretation.

– Nutle
Nov 16 '18 at 14:38
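A rough sketch of that equivalence with `lm`, on simulated data. The data frame `d`, the grouping factor `g`, and the seed are all made up for illustration, and this only covers ordinary least squares with a single factor, not other model types.

```r
set.seed(1)  # arbitrary seed so the toy data is reproducible

# Hypothetical data: one factor with k = 3 levels and a numeric response
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 10)))
d$y <- rnorm(30) + as.integer(d$g)

fit_int   <- lm(y ~ g, data = d)      # intercept + k - 1 dummies
fit_noint <- lm(y ~ g + 0, data = d)  # no intercept, all k dummies

# Both parametrizations give the same fitted values
same_fit <- all.equal(unname(fitted(fit_int)), unname(fitted(fit_noint)))

# Only the interpretation of the coefficients changes:
# without an intercept each coefficient is a group mean,
# with an intercept they are the baseline mean and differences from it
group_means <- as.vector(tapply(d$y, d$g, mean))
means_match <- all.equal(unname(coef(fit_noint)), group_means)
diffs_match <- all.equal(unname(coef(fit_int)),
                         c(group_means[1], group_means[-1] - group_means[1]))

# Intercept plus all k dummies is rank deficient: 4 columns, rank 3
X_bad <- cbind(1, model.matrix(~ g + 0, data = d))
rank_bad <- qr(X_bad)$rank
```

Here `same_fit`, `means_match` and `diffs_match` should all be `TRUE`, and `rank_bad` is smaller than `ncol(X_bad)`, which is the singular case the comment warns about.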

r r-caret
