Strange issue with glom() on a PySpark DataFrame










I am using Spark 2.3 and am facing a strange issue with dates when using the glom method to check partition sizes.



Below is my DataFrame:



df1_data = spark.sql("""
SELECT *
FROM udb.partitioned_table_df1
WHERE VEH_ENGINE IN (
    'ABCDP3F27HL239911',
    'ABCDP3F27HL230011'
)
""")

+-----------------+-----------+--------------------+--------------+
|       VEH_ENGINE|VEH_COUNTRY|VEH_RETAIL_SALE_DATE|VEH_MODEL_YEAR|
+-----------------+-----------+--------------------+--------------+
|ABCDP3F27HL239911|        CAN|          0001-01-01|          2017|
|ABCDP3F27HL230011|        USA|          0001-01-01|          2018|
+-----------------+-----------+--------------------+--------------+


At the source, the default start date is '0001-01-01', and that date was loaded into the PySpark DataFrame as a date column without any issue.
I can perform other operations (joins, filters, etc.) as usual.
However, I hit a problem when inspecting the Spark partitions, which I normally do like this:



partitionSizedf = df1_data.rdd.glom().map(len).collect()


I get the following error:



ValueError: ('ordinal must be >= 1', <function <lambda> at 0x7fcc8d1c5848>, (u'ABCDP3F27HL239911', u'CAN', -719164, 2017))
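
The third element of the tuple in the error, -719164, looks like the date expressed as a day count relative to 1970-01-01. The following is a minimal sketch of how such a value would fail when turned into a Python `datetime.date`, assuming the conversion adds the epoch ordinal and calls `date.fromordinal`. The helper `days_to_date` and the variable names are hypothetical and are used only to illustrate the arithmetic.

```python
import datetime

# Ordinal of the Unix epoch in Python's proleptic Gregorian calendar.
EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()


def days_to_date(days_since_epoch):
    """Convert a day count relative to 1970-01-01 into a datetime.date.

    Hypothetical helper: it mirrors the assumed conversion, not a confirmed API.
    """
    return datetime.date.fromordinal(days_since_epoch + EPOCH_ORDINAL)


# Raw value taken from the error tuple in the question.
raw_value = -719164

try:
    days_to_date(raw_value)
    error_message = None
except ValueError as exc:
    # Python ordinals start at 1 (0001-01-01), so a smaller value is rejected.
    error_message = str(exc)

# Smallest day count that Python can still represent as a date.
smallest_ok = datetime.date.min.toordinal() - EPOCH_ORDINAL

print(raw_value + EPOCH_ORDINAL, error_message)
print(smallest_ok, days_to_date(smallest_ok))
```

Under that assumption, the stored value falls two days before the earliest date Python can represent, which would explain why DataFrame operations work while `df1_data.rdd` fails as soon as rows are converted to Python objects. If that is the cause, counting rows per partition on the DataFrame side (for example, grouping by `pyspark.sql.functions.spark_partition_id()`) may avoid the conversion; this has not been verified against the table in the question.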









      pyspark






      asked Nov 16 '18 at 8:52 by vikrant rana
