
[HADOOP] Cannot query example AddressBook protobuf data in Hive with elephant-bird

HADOOP


I am trying to use elephant-bird to query some example protobuf data. I am using the AddressBook example: I serialized a handful of fake AddressBooks to files and put them in HDFS at /user/foo/data/elephant-bird/addressbooks/, but the queries return no results.
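To illustrate what "serialized" means here, the write side was roughly like the following (an illustrative sketch, not the exact code; file name and message contents are made up). Each message is written with protobuf's plain writeTo(), i.e. raw uncompressed bytes with no container format around them:

import java.io.FileOutputStream;

import com.twitter.data.proto.tutorial.AddressBookProtos.AddressBook;
import com.twitter.data.proto.tutorial.AddressBookProtos.Person;

public class WriteRawAddressBook {
    public static void main(String[] args) throws Exception {
        AddressBook book = AddressBook.newBuilder()
            .addPerson(Person.newBuilder().setId(1).setName("fake"))
            .build();
        // Raw serialization: just the message bytes, no header or framing.
        try (FileOutputStream out = new FileOutputStream("addressbook.bin")) {
            book.writeTo(out);
        }
    }
}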

I set up the table and query like so:

add jar /home/foo/downloads/elephant-bird/hadoop-compat/target/elephant-bird-hadoop-compat-4.6-SNAPSHOT.jar;
add jar /home/foo/downloads/elephant-bird/core/target/elephant-bird-core-4.6-SNAPSHOT.jar;
add jar /home/foo/downloads/elephant-bird/hive/target/elephant-bird-hive-4.6-SNAPSHOT.jar;

create external table addresses
row format serde "com.twitter.elephantbird.hive.serde.ProtobufDeserializer"
with serdeproperties (
"serialization.class"="com.twitter.data.proto.tutorial.AddressBookProtos$AddressBook")
STORED AS
-- elephant-bird provides an input format for use with hive
INPUTFORMAT "com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat"
-- placeholder as we will not be writing to this table
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/user/foo/data/elephant-bird/addressbooks/';


describe formatted addresses;

OK
# col_name              data_type               comment

person                  array<struct<name:string,id:int,email:string,phone:array<struct<number:string,type:string>>>>   from deserializer
byteData                binary                  from deserializer

# Detailed Table Information
Database:               default
Owner:                  foo
CreateTime:             Tue Oct 28 13:49:53 PDT 2014
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://foo:8020/user/foo/data/elephant-bird/addressbooks
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        transient_lastDdlTime   1414529393

# Storage Information
SerDe Library:          com.twitter.elephantbird.hive.serde.ProtobufDeserializer
InputFormat:            com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.class     com.twitter.data.proto.tutorial.AddressBookProtos$AddressBook
        serialization.format    1
Time taken: 0.421 seconds, Fetched: 29 row(s)

When I try to select data, it returns no results (it does not appear to read any rows):

select count(*) from addresses;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1413311929339_0061, Tracking URL = http://foo:8088/proxy/application_1413311929339_0061/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1413311929339_0061
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
2014-10-28 13:50:37,674 Stage-1 map = 0%,  reduce = 0%
2014-10-28 13:50:51,055 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 2.14 sec
2014-10-28 13:50:52,152 Stage-1 map = 0%,  reduce = 100%, Cumulative CPU 2.14 sec
MapReduce Total cumulative CPU time: 2 seconds 140 msec
Ended Job = job_1413311929339_0061
MapReduce Jobs Launched:
Job 0: Reduce: 1   Cumulative CPU: 2.14 sec   HDFS Read: 0 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 140 msec
OK
0
Time taken: 37.519 seconds, Fetched: 1 row(s)

I see the same thing if I explicitly load the data into the external table, or if I create a non-external table.

Version info for my setup:

Thrift 0.7
protobuf: libprotoc 2.5.0
hadoop:
Hadoop 2.5.0-cdh5.2.0
Subversion http://github.com/cloudera/hadoop -r e1f20a08bde76a33b79df026d00a0c91b2298387
Compiled by jenkins on 2014-10-11T21:00Z
Compiled with protoc 2.5.0
From source with checksum 309bccd135b199bdfdd6df5f3f4153d

Update:

I see this error in the logs. My data in HDFS is raw protobuf (uncompressed). I am trying to figure out whether that is the problem, i.e. whether raw binary protobuf can be read at all.


    Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:346)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:293)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:407)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:560)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
    Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:332)
    ... 11 more
    Caused by: java.io.IOException: No codec for file hdfs://foo:8020/user/foo/data/elephantbird/addressbooks/1000AddressBooks-1684394246.bin found
    at com.twitter.elephantbird.mapreduce.input.MultiInputFormat.determineFileFormat(MultiInputFormat.java:176)
    at com.twitter.elephantbird.mapreduce.input.MultiInputFormat.createRecordReader(MultiInputFormat.java:88)
    at com.twitter.elephantbird.mapreduce.input.RawMultiInputFormat.createRecordReader(RawMultiInputFormat.java:36)
    at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper$RecordReaderWrapper.<init>(DeprecatedInputFormatWrapper.java:256)
    at com.twitter.elephantbird.mapred.input.DeprecatedInputFormatWrapper.getRecordReader(DeprecatedInputFormatWrapper.java:121)
    at com.twitter.elephantbird.mapred.input.DeprecatedFileInputFormatWrapper.getRecordReader(DeprecatedFileInputFormatWrapper.java:55)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
    ... 16 more
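The "No codec for file ... found" message suggests that MultiInputFormat could not identify the file's container format, not that the bytes themselves are bad. A quick sanity check (an illustrative sketch, assuming the tutorial's generated classes) is to parse a file directly with the generated class:

import java.io.FileInputStream;

import com.twitter.data.proto.tutorial.AddressBookProtos.AddressBook;

public class ParseRawAddressBook {
    public static void main(String[] args) throws Exception {
        // If this parses, the data is valid raw protobuf and the failure
        // above is purely about the missing container format/codec.
        try (FileInputStream in = new FileInputStream(args[0])) {
            AddressBook book = AddressBook.parseFrom(in);
            System.out.println("persons: " + book.getPersonCount());
        }
    }
}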

Solution

  1. ==============================

    Did you solve the problem?

    I had the same problem you describe.

    And yes, you are right: I found that raw binary protobuf cannot be read directly.

    This is the question I asked about it: Using elephant-bird and hive to read protobuf data

    Hope this helps.

    Regards
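    For reference, the fix in the linked question was to write the data in
    elephant-bird's block container format rather than as raw concatenated
    protobufs, so that MultiInputFormat can detect the file format. A minimal
    write-side sketch, assuming elephant-bird's ProtobufBlockWriter and the
    tutorial's generated classes (file name and record contents are
    illustrative):

    import java.io.FileOutputStream;

    import com.twitter.data.proto.tutorial.AddressBookProtos.AddressBook;
    import com.twitter.data.proto.tutorial.AddressBookProtos.Person;
    import com.twitter.elephantbird.mapreduce.io.ProtobufBlockWriter;

    public class WriteBlockAddressBooks {
        public static void main(String[] args) throws Exception {
            // Wrap each AddressBook in elephant-bird's block format so the
            // reader can detect the file format instead of failing with
            // "No codec for file ... found".
            ProtobufBlockWriter<AddressBook> writer =
                new ProtobufBlockWriter<AddressBook>(
                    new FileOutputStream("addressbooks.bin"), AddressBook.class);
            for (int i = 0; i < 10; i++) {
                AddressBook book = AddressBook.newBuilder()
                    .addPerson(Person.newBuilder()
                        .setId(i)
                        .setName("person-" + i)
                        .build())
                    .build();
                writer.write(book);
            }
            writer.finish();  // flush the last partial block
            writer.close();
        }
    }

    Copying files written this way to the table's HDFS location should let
    the same count(*) query see the rows.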

  2. from https://stackoverflow.com/questions/26618929/cannot-query-example-addressbook-protobuf-data-in-hive-with-elephant-bird by cc-by-sa and MIT license