Spark SQL: convert milliseconds timestamp in JSON format to date format

The schema declares the field's dataType as Timestamp, but the Spark job is not converting it to the right format.



Dataset<Row> stream = sparkSession.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", kafkaBootstrapServersString)
    .option("subscribe", topic)
    // .option("maxOffsetsPerTrigger", 10000)
    .load();

Dataset<Row> rawStream = stream
    .selectExpr("CAST(value AS STRING)")   // Kafka value bytes -> JSON string
    .select(from_json(col("value"), eventSpecificStructType).as("eventData"))
    .select("eventData.*")
    .filter(col("eventType").equalTo("Test"));
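
For reference, a minimal sketch of what eventSpecificStructType might look like; the actual schema is not shown in the question, so the field names here are assumptions. Declaring the field as TimestampType is what triggers the cast described below.

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical schema: the real eventSpecificStructType is not shown in the question.
StructType eventSpecificStructType = new StructType()
    .add("eventType", DataTypes.StringType)
    .add("timestamp", DataTypes.TimestampType); // the epoch-millisecond value lands here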


A timestamp coming in as 1542126896113 gets converted to 50838-01-28 18:49:11.0. That is the symptom of an epoch-millisecond value being read as epoch seconds: 1,542,126,896,113 seconds is roughly 48,900 years, which lands around the year 50838.

Is there a way to cast milliseconds to a datetime format?

java apache-spark apache-spark-sql

asked Nov 13 '18 at 18:36 by Himanshu Yadav

2 Answers

You will have to create a UDF in Java.

import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

SimpleDateFormat dateFormat = new SimpleDateFormat("....Date time pattern...");

spark.udf().register("timestamp", new UDF1<String, Timestamp>() {
    private static final long serialVersionUID = 1335972766810808134L;

    @Override
    public Timestamp call(String source) {
        try {
            return new Timestamp(dateFormat.parse(source).getTime());
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return null;
    }
}, DataTypes.TimestampType);


Finally:

import static org.apache.spark.sql.functions.callUDF;

stream = stream.withColumn("col", callUDF("timestamp", stream.col("col")));
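
Since the incoming value in the question is epoch milliseconds rather than a formatted date string, a variant of the same UDF that parses the long directly may be a closer fit. A minimal sketch, assuming the JSON field arrives as a string of digits (the UDF name is made up):

// Hypothetical variant: java.sql.Timestamp takes epoch milliseconds directly.
spark.udf().register("millisToTimestamp", (UDF1<String, Timestamp>) source ->
    new Timestamp(Long.parseLong(source)), DataTypes.TimestampType);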
answered Nov 23 '18 at 11:37 by Nikhil

How about dividing the millisecond value by 1000? Does the below meet your expectation?



val df = Seq(("1542126896113"), ("1542126896116")).toDF("unixtime")
df.withColumn("times", from_unixtime('unixtime.cast("long") / 1000)).show(false)


Output:



+-------------+-------------------+
|unixtime     |times              |
+-------------+-------------------+
|1542126896113|2018-11-13 22:04:56|
|1542126896116|2018-11-13 22:04:56|
+-------------+-------------------+
answered Nov 14 '18 at 3:29 by stack0114106

• Thanks. I saw a couple of examples in Python and Scala, but I am looking for the syntax in Java. – Himanshu Yadav, Nov 16 '18 at 18:48
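
Following up on that comment, a minimal Java sketch of the same divide-by-1000 approach using only Spark's built-in functions (the input column name "timestamp" and the output name "eventTime" are assumptions):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_unixtime;

// from_unixtime expects epoch seconds, so divide the epoch milliseconds by 1000.
Dataset<Row> withTime = rawStream.withColumn(
    "eventTime",
    from_unixtime(col("timestamp").cast("long").divide(1000)).cast("timestamp"));

As in the Scala output above, this truncates the millisecond component.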