Not serializable result: org.apache.hadoop.io.IntWritable when reading Sequence File with Spark / Scala
I am reading a sequence file whose records are logically an Int key and a String value.
If I do this:
val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
  .map { case (x, y) => (x.toString(), y.toString().split("/")(0), y.toString().split("/")(1)) }
  .collect
it works, because the IntWritable is converted to a String.
If I do this instead:
val sequence_data = sc.sequenceFile("/seq_01/seq-directory/*", classOf[IntWritable], classOf[Text])
  .map { case (x, y) => (x, y.toString().split("/")(0), y.toString().split("/")(1)) }
  .collect
then I get this error immediately:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0 in stage 42.0 (TID 692) had a not serializable result: org.apache.hadoop.io.IntWritable
The underlying reason is not clear to me. It is evidently a serialization problem, but why is it so difficult? This is a different kind of serialization issue from the ones I have seen before, and it only shows up at run time.
apache-spark hadoop serialization sequencefile
String implements Serializable; IntWritable does not. hadoop.apache.org/docs/r2.7.4/api/org/apache/hadoop/io/…
– cricket_007
Nov 10 at 14:10
So reading a sequence file means all aspects should be set to String and then processed and converted from there onwards?
– thebluephantom
Nov 10 at 14:11
No, you can call x.get(). Integers are serializable as well (and there is less overhead than serializing strings)
– cricket_007
Nov 10 at 14:13
I took the get() out in this example. Ok, gotcha!
– thebluephantom
Nov 10 at 14:14
You may as well make an answer
– thebluephantom
Nov 10 at 14:14
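To make the x.get() suggestion from the comments concrete, here is a minimal sketch of two ways to end up with plain Int keys. The object name ReadSequenceFile and both method names are made up for illustration, and the code assumes every value contains at least one "/". The second variant relies on the typed sequenceFile[K, V] overload of SparkContext, which converts Writables through implicit WritableConverters; treat it as an alternative worth checking against your Spark version rather than as the approach used in this thread.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.spark.SparkContext

// Hypothetical helper object, not part of Spark or of the original post.
object ReadSequenceFile {

  // Variant 1: read Hadoop Writables and unwrap them inside map,
  // so that only Int and String values reach collect().
  def viaWritables(sc: SparkContext, path: String): Array[(Int, String, String)] =
    sc.sequenceFile(path, classOf[IntWritable], classOf[Text])
      .map { case (k, v) =>
        val parts = v.toString.split("/") // assumes at least one "/" per value
        (k.get(), parts(0), parts(1))
      }
      .collect()

  // Variant 2: let Spark convert IntWritable -> Int and Text -> String
  // through the typed overload, so no Writable appears in user code.
  def viaConverters(sc: SparkContext, path: String): Array[(Int, String, String)] =
    sc.sequenceFile[Int, String](path)
      .map { case (k, v) =>
        val parts = v.split("/") // same assumption as above
        (k, parts(0), parts(1))
      }
      .collect()
}
```

Either variant could be called as ReadSequenceFile.viaWritables(sc, "/seq_01/seq-directory/*") from a shell where sc is already defined.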
edited Nov 10 at 14:11
cricket_007
75.9k1042106
asked Nov 10 at 14:06
thebluephantom
2,0632823
1 Answer
accepted
If the goal is just to get an Int value, call get() on the writable:
.map { case (x, y) => (x.get(), y.toString().split("/")(0), y.toString().split("/")(1)) }
The result then holds a plain Int, which can be serialized, whereas an IntWritable cannot because it does not implement the Serializable interface.
String does implement Serializable, which is why the toString() version works.
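The difference can be reproduced without Spark or Hadoop. The sketch below uses plain Java serialization and a hypothetical IntBox class as a stand-in for IntWritable (a mutable holder with get() that does not extend Serializable); SerializationCheck and isJavaSerializable are likewise names invented for this example. It only illustrates the general mechanism, not Spark's exact code path.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for org.apache.hadoop.io.IntWritable:
// a mutable holder that does NOT extend java.io.Serializable.
final class IntBox(private var value: Int) {
  def get(): Int = value
  def set(v: Int): Unit = { value = v }
  override def toString: String = value.toString
}

object SerializationCheck {
  // Returns true if Java serialization accepts the object,
  // false if it throws NotSerializableException.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val key = new IntBox(7)
    println(isJavaSerializable((key, "a", "b")))          // false: tuple holds the wrapper
    println(isJavaSerializable((key.get(), "a", "b")))    // true: tuple holds a plain Int
    println(isJavaSerializable((key.toString, "a", "b"))) // true: tuple holds a String
  }
}
```

The tuple itself is serializable in all three cases; only the element that lacks Serializable makes the first call fail, which mirrors why returning x fails while x.get() and x.toString() succeed.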
answered Nov 10 at 14:16
cricket_007
75.9k1042106