One-hot encoding with model.matrix. Is the intercept required?

























I understand that model.matrix converts a factor with k levels into k-1 dummy variables alongside an intercept column, but I'm confused about whether the intercept column has to be kept or can be left out. For example, this removes the intercept column:



# Predictor variables of train dataset
x <- model.matrix(y ~ ., train_data)[,-1]


But the model output seems the same regardless of whether I remove it or not.
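To make concrete what `[,-1]` does, here is a minimal sketch on a made-up data frame. Only the name `train_data` comes from the question; the columns `y`, `num` and `grp` are assumptions for illustration. It shows which columns survive, not how any particular fitting function treats them.

```r
# Toy data: the columns are hypothetical, only the name `train_data` is from the question
train_data <- data.frame(
  y   = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  num = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5),
  grp = factor(c("a", "b", "c", "a", "b", "c"))
)

# Full design matrix: intercept column plus k - 1 = 2 dummies for `grp`
x_full <- model.matrix(y ~ ., train_data)
colnames(x_full)
# "(Intercept)" "num" "grpb" "grpc"

# Dropping the first column removes only the intercept column;
# the factor is still coded with k - 1 dummies
x <- model.matrix(y ~ ., train_data)[, -1]
colnames(x)
# "num" "grpb" "grpc"
```

If the fitting function adds its own intercept to `x`, the resulting model is the same as with the full matrix, which may be why the output looks unchanged.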
































  • I'm not sure exactly what you're trying to do, but you can leave the intercept out like model.matrix(~ a + 0, data=data.frame(a=factor(1:3))) which will give you a slightly different result.

    – thelatemail
    Nov 16 '18 at 3:24












  • This might be better asked over at Cross Validated since you're really just reparametrizing the model. A linear combination of the dummy variables would act as the new "intercept" (provided the model matrices have the same rank).

    – mickey
    Nov 16 '18 at 3:28











  • Some online examples I'd seen had removed the intercept with [,-1], but then I thought I had read somewhere you should only do that if you retain as many dummies as there are levels of the factor. I just wasn't sure. I've posted over at Cross Validated.

    – LucaS
    Nov 16 '18 at 3:41






  • The main idea is that you don't want your model.matrix to be singular. So it's either the intercept + k-1 dummies, or no intercept and all k dummies. It can be shown that the result should be the same, just with slight differences in parameter interpretation.

    – Nutle
    Nov 16 '18 at 14:38
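The singularity argument can be seen by comparing the rank of each design matrix with its number of columns. This is a minimal sketch reusing the three-level factor from the earlier comment; the names `X_ok1`, `X_ok2` and `X_bad` are made up for illustration.

```r
dat <- data.frame(a = factor(1:3))

X_ok1 <- model.matrix(~ a, data = dat)      # intercept + k - 1 dummies
X_ok2 <- model.matrix(~ a + 0, data = dat)  # no intercept, all k dummies
X_bad <- cbind(1, X_ok2)                    # intercept AND all k dummies

# Number of columns versus rank
c(ncol(X_ok1), qr(X_ok1)$rank)  # 3 3
c(ncol(X_ok2), qr(X_ok2)$rank)  # 3 3
c(ncol(X_bad), qr(X_bad)$rank)  # 4 3: the k dummy columns sum to the intercept column
```

The first two matrices have full column rank, while the third has one redundant column, which is the situation to avoid.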















Tags: r, r-caret
















asked Nov 16 '18 at 2:26









LucaS































