Resolve “org.apache.hadoop.hive.serde2.SerDeException: Unexpected tag” in Spark and Hive

We usually see the error “org.apache.hadoop.hive.serde2.SerDeException: Unexpected tag” in Spark when connecting to a Hive table via the Hive Warehouse Connector (HWC).

The error below indicates an issue with the MapJoin (hash values) while joining multiple tables.

ERROR:

[ERROR] [TezChild] |tez.TezProcessor|: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.SerDeException: Unexpected tag: 45 reserialized to 28
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.SerDeException: Unexpected tag: 45 reserialized to 28
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:298)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:252)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
	...

Symptom:

This error causes the Spark/Hive query to fail.

Resolution:

To resolve the issue, we need to either increase the container memory or change the join logic so that the join uses less memory.

1) Increase Container memory

We can increase the memory to make sure the container is able to handle the hash values built while joining the tables:

set hive.tez.java.opts=-Xmx8192m;
set hive.tez.container.size=12384;
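When picking these two values together, a commonly cited rule of thumb (not something stated in the original error, so treat it as an assumption) is to set the Tez task heap (`-Xmx`) to roughly 80% of `hive.tez.container.size`, leaving headroom for off-heap usage. The helper below is a hypothetical sketch of that pairing:

```python
def tez_memory_settings(container_mb: int, heap_fraction: float = 0.8) -> dict:
    """Hypothetical helper: suggest a Tez task -Xmx for a given container size.

    Assumes the common ~80% heap-to-container rule of thumb; the exact
    fraction appropriate for your cluster may differ.
    """
    xmx_mb = int(container_mb * heap_fraction)
    return {
        "hive.tez.container.size": str(container_mb),  # container size in MB
        "hive.tez.java.opts": f"-Xmx{xmx_mb}m",        # task JVM heap
    }

# Example: for a 12384 MB container, suggest the matching heap setting
print(tez_memory_settings(12384))
```

Note that the values shown in the snippet above (8192m heap in a 12384 MB container) are more conservative than the 80% rule; either is reasonable as long as the heap stays comfortably below the container size.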

2) Change the Join Logic

set hive.auto.convert.join.noconditionaltask = true;
set hive.auto.convert.join.noconditionaltask.size = 10000000;

This reverts the value of “hive.auto.convert.join.noconditionaltask.size” to its default of 10 MB (10000000 bytes).

Tuning the above property lets you control what size of table can fit in memory. Its value represents the sum of the sizes of the tables that can be converted to hashmaps that fit in memory. Reducing this value prevents Hive from using a map join for larger tables.
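To illustrate the decision, here is a simplified, hypothetical model of the planner's check (not Hive's actual code): the join is converted to a map join only when the summed size of the small-side tables fits under the `noconditionaltask.size` threshold.

```python
def can_use_map_join(small_table_sizes_bytes, threshold_bytes=10_000_000):
    """Simplified model: Hive converts a join to a map join when the
    combined size of the small-side tables fits under
    hive.auto.convert.join.noconditionaltask.size (default 10 MB).
    """
    return sum(small_table_sizes_bytes) <= threshold_bytes

# Two small tables of 4 MB and 3 MB fit under the 10 MB default -> map join
print(can_use_map_join([4_000_000, 3_000_000]))   # map join allowed
# 8 MB + 5 MB exceeds the threshold -> falls back to a shuffle join
print(can_use_map_join([8_000_000, 5_000_000]))   # map join not allowed
```

Lowering the threshold therefore pushes more joins to the shuffle path, which is slower but avoids building large in-memory hash tables in the task container.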

NOTE: This is applicable to both the Spark and Hive/MR execution engines.

Good Luck with your Learning !!

Related Topics:

Resolve the “Container killed by YARN for exceeding memory limits” Issue in Hive, Tez, and Spark jobs

Resolve the “java.lang.OutOfMemoryError: Java heap space” issue in Spark and Hive(Tez and MR)

Resolve “Cannot modify at runtime. It is not in list of params that are allowed to be modified at runtime” in Hive
