[HADOOP] Apache Pig error while dumping JSON data
I have a JSON file that I want to load with Apache Pig.
I am using the built-in JsonLoader to load the JSON data. Here is a sample of the data:
cat jsondata1.json
{ "response": { "id": 10123, "thread": "Sloths", "comments": ["Sloths are adorable So chill"] }, "response_time": 0.425 }
{ "response": { "id": 13828, "thread": "Bigfoot", "comments": ["hello world"] } , "response_time": 0.517 }
This is how I load the JSON data with the built-in JsonLoader. The load statement itself raises no error, but dumping the data produces the following error.
grunt> a = load '/home/cloudera/jsondata1.json' using JsonLoader('response:tuple (id:int, thread:chararray, comments:bag {tuple(comment:chararray)}), response_time:double');
grunt> dump a;
2016-04-17 01:11:13,286 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/home/cloudera/jsondata1.json:0+229
2016-04-17 01:11:13,287 [pool-4-thread-1] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2016-04-17 01:11:13,311 [pool-4-thread-1] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-04-17 01:11:13,321 [pool-4-thread-1] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: a[5,4] C: R:
2016-04-17 01:11:13,349 [Thread-16] INFO org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2016-04-17 01:11:13,351 [Thread-16] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local801054416_0004
java.lang.Exception: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream@2484de3c; line: 1, column: 120]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: org.codehaus.jackson.JsonParseException: Current token (FIELD_NAME) not numeric, can not use numeric value accessors
at [Source: java.io.ByteArrayInputStream@2484de3c; line: 1, column: 120]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonNumericParserBase._parseNumericValue(JsonNumericParserBase.java:399)
at org.codehaus.jackson.impl.JsonNumericParserBase.getDoubleValue(JsonNumericParserBase.java:311)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:203)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local801054416_0004
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2016-04-17 01:11:13,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[5,4] C: R:
2016-04-17 01:11:18,059 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local801054416_0004 has failed! Stop running all dependent jobs
2016-04-17 01:11:18,059 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-04-17 01:11:18,059 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.7.0 0.11.0-cdh4.7.0 cloudera 2016-04-17 01:11:12 2016-04-17 01:11:18 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local801054416_0004 a MAP_ONLY Message: Job failed! file:/tmp/temp-1766116741/tmp1151698221,
Input(s):
Failed to read data from "/home/cloudera/jsondata1.json"
Output(s):
Failed to produce result in "file:/tmp/temp-1766116741/tmp1151698221"
Job DAG:
job_local801054416_0004
2016-04-17 01:11:18,060 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-04-17 01:11:18,061 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
Details at logfile: /home/cloudera/pig_1460877001124.log
I cannot find the problem. How do I define the correct schema for the JSON data above?
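Solution
==============================
1. Try this schema for the comments field:
comments:{(chararray)}
The version used in the question:
comments:bag {tuple(comment:chararray)}
matches JSON in which each array element is an object with a comment field:
"comments": [{"comment": "hello world"}]
Your data, however, holds plain string values rather than nested documents:
"comments": ["hello world"]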
from https://stackoverflow.com/questions/36674290/apache-pig-error-while-dumping-json-data by cc-by-sa and MIT license
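The distinction the answer draws (an array of plain strings versus an array of objects) can be checked directly on the data before writing a schema. The following is a minimal sketch in Python, not part of the original answer; the helper name `comment_shapes` is hypothetical, and the two records are the samples from the question.

```python
import json

# The two sample records from the question, one JSON object per line.
SAMPLE = """{ "response": { "id": 10123, "thread": "Sloths", "comments": ["Sloths are adorable So chill"] }, "response_time": 0.425 }
{ "response": { "id": 13828, "thread": "Bigfoot", "comments": ["hello world"] } , "response_time": 0.517 }"""


def comment_shapes(text):
    """Collect the Python type names of the elements inside each record's comments array."""
    shapes = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for item in record["response"]["comments"]:
            # 'str' means a plain string element, 'dict' means a nested object
            shapes.add(type(item).__name__)
    return shapes


print(comment_shapes(SAMPLE))
```

If the set contains only `str`, the array elements are plain strings, which is the case the answer describes. A `dict` entry would indicate nested objects, the shape that the named-field schema in the question assumes.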