Not serializable result: org.apache.hadoop.io.IntWritable when reading Sequence File with Spark / Scala





















I am reading a sequence file whose keys are logically Int and whose values are logically String. If I do this:



    import org.apache.hadoop.io.{IntWritable, Text}

    val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
      .map { case (x, y) => (x.toString(), y.toString().split("/")(0), y.toString().split("/")(1)) }
      .collect


then it works, because the IntWritable key is converted to a String.



But if I do this, keeping the IntWritable key as is:



    val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
      .map { case (x, y) => (x, y.toString().split("/")(0), y.toString().split("/")(1)) }
      .collect


then I immediately get this error:



org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0 in stage 42.0 (TID 692) had a not serializable result: org.apache.hadoop.io.IntWritable


I can see it is about serialization, but the underlying reason is not really clear to me: why is it so difficult? This is yet another kind of serialization issue I have run into, and it only shows up at run time.
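Why the problem only appears at run time can be illustrated without Spark or Hadoop. This is a minimal sketch, assuming Spark's default `JavaSerializer` is what serializes the collected task results. `MutableIntBox` is a hypothetical stand-in for `IntWritable` (it is not the real Hadoop class), and `javaSerialize` is an illustrative helper. Both tuples compile, because Scala's type system does not track whether every element is `java.io.Serializable`; the failure only surfaces when `ObjectOutputStream` walks the object graph.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for org.apache.hadoop.io.IntWritable: a mutable
// wrapper around an Int that, like IntWritable, does not implement
// java.io.Serializable. It is NOT the real Hadoop class.
final class MutableIntBox(var value: Int) {
  def get(): Int = value
  override def toString: String = value.toString
}

object RuntimeSerializationCheck {
  // Serialize with plain Java serialization. Returns the byte count on
  // success, or the offending class name (the exception message) on failure.
  def javaSerialize(obj: AnyRef): Either[String, Int] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      out.writeObject(obj)
      out.flush()
      Right(bytes.size())
    } catch {
      case e: NotSerializableException => Left(e.getMessage)
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Both tuples type-check; the compiler does not know or care whether
    // every element is Serializable.
    val asStrings = ("1", "a", "b")
    val asWrapper = (new MutableIntBox(1), "a", "b")

    println(javaSerialize(asStrings)) // a Right holding the byte count
    println(javaSerialize(asWrapper)) // a Left naming MutableIntBox
  }
}
```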





























  • Well, String implements Serializable; IntWritable does not. hadoop.apache.org/docs/r2.7.4/api/org/apache/hadoop/io/…
    – cricket_007
    Nov 10 at 14:10










  • So when reading a sequence file, everything should be converted to String first, and then processed and converted from there onwards?
    – thebluephantom
    Nov 10 at 14:11






  • No, you can call x.get(); Integers are serializable as well (and have less serialization overhead than strings).
    – cricket_007
    Nov 10 at 14:13










  • I took the get() out in this example. Ok, gotcha!
    – thebluephantom
    Nov 10 at 14:14










  • You may as well post that as an answer.
    – thebluephantom
    Nov 10 at 14:14
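The point made in the comments can be checked directly with reflection. This is a minimal sketch: `MutableIntBox` is again a hypothetical stand-in for `IntWritable`, and `isJavaSerializable` is an illustrative helper that only inspects the class itself, not its fields.

```scala
// Hypothetical stand-in for IntWritable (not the real Hadoop class):
// a mutable Int wrapper that does not implement java.io.Serializable.
final class MutableIntBox(var value: Int) {
  def get(): Int = value
}

object SerializableCheck {
  // True when the class itself implements java.io.Serializable.
  // Fields of the class could still fail to serialize; this is not checked.
  def isJavaSerializable(c: Class[_]): Boolean =
    classOf[java.io.Serializable].isAssignableFrom(c)

  def main(args: Array[String]): Unit = {
    val key = new MutableIntBox(42)
    // key.get() returns a primitive Int; stored in a tuple it is boxed to
    // java.lang.Integer, which implements Serializable.
    val boxed: AnyRef = Int.box(key.get())

    println(isJavaSerializable(classOf[String]))        // true
    println(isJavaSerializable(boxed.getClass))         // true (java.lang.Integer)
    println(isJavaSerializable(classOf[MutableIntBox])) // false
  }
}
```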














apache-spark hadoop serialization sequencefile














edited Nov 10 at 14:11









cricket_007











asked Nov 10 at 14:06









thebluephantom










1 Answer



accepted










If the goal is just to get an Integer value, you need to call get() on the writable:



    .map { case (x, y) => (x.get(), y.toString().split("/")(0), y.toString().split("/")(1)) }


Then the JVM can serialize the resulting Integer object. It cannot serialize an IntWritable, because IntWritable does not implement the java.io.Serializable interface.



String, on the other hand, does implement Serializable.
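Here is a sketch of the answer's fix, with a plain Scala `Seq` standing in for the RDD so that it runs without Spark. `MutableIntBox` is a hypothetical stand-in for `IntWritable`, the values are plain `String`s rather than `Text`, and `toSerializableRows` and `serializedSize` are illustrative names. The key is unwrapped with `get()`, so everything that `collect` would ship back to the driver is made only of `Integer` and `String`.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for IntWritable (not the real Hadoop class).
final class MutableIntBox(var value: Int) {
  def get(): Int = value
}

object ConvertBeforeCollect {
  // Mirrors the answer's map step on a plain collection instead of an RDD:
  // unwrap the key with get() and split the value (a String here) once.
  def toSerializableRows(records: Seq[(MutableIntBox, String)]): Array[(Int, String, String)] =
    records.map { case (x, y) =>
      val parts = y.split("/")
      (x.get(), parts(0), parts(1))
    }.toArray

  // Java-serialize an object and return the number of bytes written;
  // throws NotSerializableException if anything in the graph is not Serializable.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try { out.writeObject(obj); out.flush(); bytes.size() }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val records = Seq((new MutableIntBox(1), "a/b"), (new MutableIntBox(2), "c/d"))
    val rows = toSerializableRows(records)
    println(rows.mkString(", "))      // (1,a,b), (2,c,d)
    println(serializedSize(rows) > 0) // true: Int keys are boxed to Integer
  }
}
```

Splitting once into `parts` gives the same result as calling `split("/")` twice as in the question, just without the repeated work.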
















        answered Nov 10 at 14:16









        cricket_007




























             
