복붙노트

[HADOOP] 아파치 돼지 오류 JSON 데이터를 덤프 동안

HADOOP

아파치 돼지 오류 JSON 데이터를 덤프 동안

내가 JSON 파일이 아파치 돼지를 사용하여로드 할.

나는 다음은 샘플 JSON 데이터이며, 내장 JSONLOADER로드 JSON 데이터를 사용하고 있습니다.

cat jsondata1.json
{ "response": { "id": 10123, "thread": "Sloths", "comments": ["Sloths are adorable So chill"] }, "response_time": 0.425 }
{ "response": { "id": 13828, "thread": "Bigfoot", "comments": ["hello world"] } , "response_time": 0.517 }

저는 여기에 내장 JSON 로더를 사용하여 JSON 데이터를로드. 로드하는 동안 오류가 없지만, 데이터를 덤프하는 동안 다음과 같은 오류를 제공합니다.

grunt> a = load '/home/cloudera/jsondata1.json' using JsonLoader('response:tuple (id:int, thread:chararray, comments:bag {tuple(comment:chararray)}), response_time:double');

grunt> dump a;

2016-04-17 01:11:13,286 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/cloudera/jsondata1.json:0+229
2016-04-17 01:11:13,287 [pool-4-thread-1] WARN  org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2016-04-17 01:11:13,311 [pool-4-thread-1] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-04-17 01:11:13,321 [pool-4-thread-1] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[5,4] C:  R: 
2016-04-17 01:11:13,349 [Thread-16] INFO  org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2016-04-17 01:11:13,351 [Thread-16] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local801054416_0004
java.lang.Exception: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
 at [Source: java.io.ByteArrayInputStream@2484de3c; line: 1, column: 120]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
 at [Source: java.io.ByteArrayInputStream@2484de3c; line: 1, column: 120]
    at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
    at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
    at org.codehaus.jackson.impl.JsonNumericParserBase._parseNumericValue(JsonNumericParserBase.java:399)
    at org.codehaus.jackson.impl.JsonNumericParserBase.getDoubleValue(JsonNumericParserBase.java:311)
    at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:203)
    at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2016-04-17 01:11:13,548 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local801054416_0004
2016-04-17 01:11:13,548 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2016-04-17 01:11:13,548 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[5,4] C:  R: 
2016-04-17 01:11:18,059 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-04-17 01:11:18,059 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local801054416_0004 has failed! Stop running all dependent jobs
2016-04-17 01:11:18,059 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-04-17 01:11:18,059 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2016-04-17 01:11:18,060 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2016-04-17 01:11:18,060 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.0.0-cdh4.7.0  0.11.0-cdh4.7.0 cloudera    2016-04-17 01:11:12 2016-04-17 01:11:18 UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local801054416_0004 a   MAP_ONLY    Message: Job failed!    file:/tmp/temp-1766116741/tmp1151698221,

Input(s):
Failed to read data from "/home/cloudera/jsondata1.json"

Output(s):
Failed to produce result in "file:/tmp/temp-1766116741/tmp1151698221"

Job DAG:
job_local801054416_0004


2016-04-17 01:11:18,060 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-04-17 01:11:18,061 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/cloudera/pig_1460877001124.log

나는이 문제를 찾을 수 없습니다. 나는 위의 JSON 데이터에 대한 올바른 스키마를 정의하는 방법을 알 수있다?

해결법

  1. ==============================

    1.이 시도:

    이 시도:

    comments:{(chararray)}
    

    이 버전 때문에 :

    comments:bag {tuple(comment:chararray)}
    

    이 JSON 스키마에 맞는 :

    "comments": [{comment:"hello world"}]
    

    당신은 간단한 문자열 값이 아닌 다른 중첩 된 문서가 :

    "comments": ["hello world"]
    
  2. from https://stackoverflow.com/questions/36674290/apache-pig-error-while-dumping-json-data by cc-by-sa and MIT license