Convert list of edges dataframe to adjacency matrix dataframe










My dataframe represents a list of edges of a graph and has the following format:



      node1 node2  weight
    0     a     c       1
    1     b     c       2
    2     d     c       3


My goal is to generate the equivalent adjacency matrix:



       a  b  c  d
    a  0  0  1  0
    b  0  0  2  0
    c  0  0  0  0
    d  0  0  3  0


At the moment, while constructing the dataframe of edges, I count the number of nodes, create an NxN dataframe, and fill in the values manually. What is the pandas way of generating the second dataframe from the first one?
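For reference, the edge list shown in the question can be rebuilt with a snippet like the one below. This is only a minimal sketch: the variable name `df` is an assumption, chosen because the answers refer to the frame by that name.

```python
import pandas as pd

# Edge list from the question: one row per edge, node1 -> node2 with a weight.
df = pd.DataFrame({
    "node1": ["a", "b", "d"],
    "node2": ["c", "c", "c"],
    "weight": [1, 2, 3],
})

print(df)
```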











Tags: python, pandas


asked Nov 11 at 5:26 by Hamza

2 Answers

Use pivot with reindex (with NumPy imported as np and the edge list in df):

    In [20]: vals = np.unique(df[['node1', 'node2']])

    In [21]: df.pivot(index='node1', columns='node2', values='weight'
        ...: ).reindex(columns=vals, index=vals, fill_value=0)
    Out[21]:
    node2  a  b  c  d
    node1
    a      0  0  1  0
    b      0  0  2  0
    c      0  0  0  0
    d      0  0  3  0
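One caveat with the pivot approach: `fill_value` in `reindex` only fills cells for labels that reindexing adds. It does not touch `NaN` cells that `pivot` already produced, which happens as soon as the edges point at more than one target node. The sketch below uses a hypothetical edge list (`edges`, not the data from the question) to show the gap and one way to close it with `fillna`.

```python
import numpy as np
import pandas as pd

# Hypothetical edge list with two distinct target nodes (b and c).
edges = pd.DataFrame({
    "node1": ["a", "a", "b"],
    "node2": ["b", "c", "c"],
    "weight": [5, 1, 2],
})

nodes = np.unique(edges[["node1", "node2"]].to_numpy())

# pivot leaves NaN at (b, b): row b and column b both exist, but there is no edge.
wide = edges.pivot(index="node1", columns="node2", values="weight")

# reindex adds the missing row c and column a; fillna covers the NaN from pivot.
adj = (wide.reindex(index=nodes, columns=nodes, fill_value=0)
           .fillna(0)
           .astype(int))

print(adj)
```

For the three-edge example in the question every edge points at `c`, so `pivot` produces no `NaN` and the extra `fillna` is not needed there.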


Or use set_index and unstack:

    In [27]: (df.set_index(['node1', 'node2'])['weight'].unstack()
        ...: .reindex(columns=vals, index=vals, fill_value=0))
    Out[27]:
    node2  a  b  c  d
    node1
    a      0  0  1  0
    b      0  0  2  0
    c      0  0  0  0
    d      0  0  3  0
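Both `pivot` and `set_index(...).unstack()` raise a `ValueError` when the same (node1, node2) pair appears more than once. If the edge list may contain parallel edges, one option is `pivot_table` with an aggregation function. The sketch below sums duplicate weights; the sample data and the choice of `sum` are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical edge list in which the edge a -> c appears twice.
edges = pd.DataFrame({
    "node1": ["a", "a", "b", "d"],
    "node2": ["c", "c", "c", "c"],
    "weight": [1, 4, 2, 3],
})

nodes = np.unique(edges[["node1", "node2"]].to_numpy())

# pivot_table aggregates duplicate pairs instead of raising.
adj = (edges.pivot_table(index="node1", columns="node2", values="weight",
                         aggfunc="sum", fill_value=0)
            .reindex(index=nodes, columns=nodes, fill_value=0))

print(adj)
```

Whether duplicates should be summed, counted, or reduced with `max` depends on what the weights mean in the graph.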






Decided to have a little fun with the problem.

You can convert node1 and node2 to Categorical dtype and then use groupby.

    from functools import partial

    import numpy as np
    import pandas as pd

    vals = np.unique(df[['node1', 'node2']])
    p = partial(pd.Categorical, categories=vals)
    # Note: this overwrites node1/node2 in df with Categorical columns.
    df['node1'], df['node2'] = p(df['node1']), p(df['node2'])

    # observed=False keeps every (node1, node2) pair, including unseen ones.
    (df.groupby(['node1', 'node2'], observed=False)
       .first()
       .weight
       .unstack()
       .fillna(0)
       .astype(int))

    node2  a  b  c  d
    node1
    a      0  0  1  0
    b      0  0  2  0
    c      0  0  0  0
    d      0  0  3  0
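The same Categorical idea can be wrapped in a function that leaves the input frame untouched. This is a sketch under a few assumptions: the function name `adjacency_via_categorical` is hypothetical, the column names follow the question, and it aggregates with `sum` rather than `first`, so unobserved pairs come out as 0 directly and duplicate edges are added together.

```python
import numpy as np
import pandas as pd


def adjacency_via_categorical(edges):
    """Return a square adjacency DataFrame built with a categorical groupby."""
    nodes = np.unique(edges[["node1", "node2"]].to_numpy())
    # assign returns a new frame, so the caller's columns keep their dtype.
    cats = edges.assign(
        node1=pd.Categorical(edges["node1"], categories=nodes),
        node2=pd.Categorical(edges["node2"], categories=nodes),
    )
    # observed=False emits every category pair; empty groups sum to 0.
    return (cats.groupby(["node1", "node2"], observed=False)["weight"]
                .sum()
                .unstack(fill_value=0))


edges = pd.DataFrame({
    "node1": ["a", "b", "d"],
    "node2": ["c", "c", "c"],
    "weight": [1, 2, 3],
})
adj = adjacency_via_categorical(edges)
print(adj)
```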



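Another option is setting the underlying array values directly.

    df2 = pd.DataFrame(0, index=vals, columns=vals)
    f = df2.index.get_indexer
    # Writes through the array returned by .values; this relies on it being
    # a writable view (single dtype, pandas Copy-on-Write not enabled).
    df2.values[f(df.node1), f(df.node2)] = df.weight.values

    print(df2)
       a  b  c  d
    a  0  0  1  0
    b  0  0  2  0
    c  0  0  0  0
    d  0  0  3  0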
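Writing through `df2.values` depends on pandas handing back a writable view. With Copy-on-Write enabled, the array is read-only and the assignment fails. A variant that avoids this is to fill a plain NumPy array first and wrap it in a DataFrame afterwards. The sketch below does that; the names `edges`, `nodes`, and `matrix` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

edges = pd.DataFrame({
    "node1": ["a", "b", "d"],
    "node2": ["c", "c", "c"],
    "weight": [1, 2, 3],
})

# Sorted unique node labels, used for both the row and column axes.
nodes = pd.Index(np.unique(edges[["node1", "node2"]].to_numpy()))

# Translate labels to integer positions, then fill an N x N array in one step.
rows = nodes.get_indexer(edges["node1"])
cols = nodes.get_indexer(edges["node2"])
matrix = np.zeros((len(nodes), len(nodes)), dtype=edges["weight"].dtype)
matrix[rows, cols] = edges["weight"].to_numpy()

adj = pd.DataFrame(matrix, index=nodes, columns=nodes)
print(adj)
```

If the same edge can appear several times, plain indexed assignment keeps only one of the weights; `np.add.at(matrix, (rows, cols), weights)` accumulates them instead.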






First answer (pivot / unstack): answered Nov 11 at 5:31 by Zero


Second answer (Categorical / direct assignment): answered Nov 11 at 5:53 by coldspeed, edited Nov 11 at 9:13
