[HADOOP] hbase 스캔을하는 동안 예외가 발생했습니다.
HADOOPhbase 스캔을하는 동안 예외가 발생했습니다.
내가 hbase 스파크 분산 스캔 예제를 시도했다.
내 간단한 코드는 다음과 같습니다.
public class DistributedHBaseScanToRddDemo {
public static void main(String[] args) {
JavaSparkContext jsc = getJavaSparkContext("hbasetable1");
Configuration hbaseConf = getHbaseConf(0, "", "");
JavaHBaseContext javaHbaseContext = new JavaHBaseContext(jsc, hbaseConf);
Scan scan = new Scan();
scan.setCaching(100);
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> javaRdd =
javaHbaseContext.hbaseRDD(TableName.valueOf("hbasetable1"), scan);
List<String> results = javaRdd.map(new ScanConvertFunction()).collect();
System.out.println("Result Size: " + results.size());
}
public static Configuration getHbaseConf(int pRimeout, String pQuorumIP, String pClientPort)
{
Configuration hbaseConf = HBaseConfiguration.create();
hbaseConf.setInt("timeout", 120000);
hbaseConf.set("hbase.zookeeper.quorum", "10.56.36.14");
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181");
return hbaseConf;
}
public static JavaSparkContext getJavaSparkContext(String pTableName)
{
SparkConf sparkConf = new SparkConf().setAppName("JavaHBaseBulkPut" + pTableName);
sparkConf.setMaster("local");
sparkConf.set("spark.testing.memory", "471859200");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
return jsc;
}
private static class ScanConvertFunction implements Function<Tuple2<ImmutableBytesWritable, Result>, String> {
public String call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
return Bytes.toString(v1._1().copyBytes());
}
}
}
다음과 같은 예외가 발생합니다.
Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: /10.56.48.219:16020 is unable to read call parameter from client 10.56.49.148; java.lang.UnsupportedOperationException: GetRegionLoad
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:93)
at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:83)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:368)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:345)
at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.getRegionLoad(ProtobufUtil.java:1746)
at org.apache.hadoop.hbase.client.HBaseAdmin.getRegionLoad(HBaseAdmin.java:2089)
at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.init(RegionSizeCalculator.java:82)
at org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator.<init>(RegionSizeCalculator.java:60)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.oneInputSplitPerRegion(TableInputFormatBase.java:293)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:257)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:254)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:121)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:892)
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at com.myproj.poc.sparkhbaseneo4j.DistributedHBaseScanToRddDemo.main(DistributedHBaseScanToRddDemo.java:32)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): /10.56.48.219:16020 is unable to read call parameter from client 10.56.49.148; java.lang.UnsupportedOperationException: GetRegionLoad
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:387)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.readResponse(NettyRpcDuplexHandler.java:161)
at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelRead(NettyRpcDuplexHandler.java:191)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at org.apache.hadoop.hbase.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at org.apache.hadoop.hbase.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at org.apache.hadoop.hbase.shaded.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at org.apache.hadoop.hbase.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at org.apache.hadoop.hbase.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at org.apache.hadoop.hbase.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at org.apache.hadoop.hbase.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at org.apache.hadoop.hbase.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at org.apache.hadoop.hbase.shaded.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:745)
나는 또한 벌크 시도하고 예제를 넣어 그들은 제대로 작동하고 있습니다. 그래서 벌크 스캔 예제가 잘못되어 가고있는 것 같았습니다.
해결법
-
==============================
1.이 Cloudera hbase-spark 커넥터가 작동하는 것 같습니다.
이 Cloudera hbase-spark 커넥터가 작동하는 것 같습니다.
https://mvnrepository.com/artifact/org.apache.hbase/hbase-spark?repo=cloudera
그래서, 다음과 같이 pom.xml에 추가하십시오 :
<repositories> <repository> <id>cloudera</id> <name>cloudera</name> <url>https://repository.cloudera.com/content/repositories/releases/</url> </repository> </repositories>
의존성 :
<dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase-spark</artifactId> <version>${hbase-spark.version}</version> </dependency>
한가지 주목할 점은이 기능이 HBase 연결을 잘 재사용하지 않는 것으로 보아 모든 파티션에 대해 다시 설정하려고한다는 것입니다. 내 질문 및 관련 토론을 참조하십시오.
HBase-Spark 커넥터 : 매 스캔마다 HBase에 연결 되었습니까?
이런 이유로 나는 실제로이 기능을 피하지만, 이것에 대한 당신의 경험을 알고 싶어합니다.
from https://stackoverflow.com/questions/50271222/exception-while-doing-hbase-scan by cc-by-sa and MIT license
'HADOOP' 카테고리의 다른 글
[HADOOP] hadoop 스트리밍을위한 분할 자 지정 방법 (0) | 2019.06.30 |
---|---|
[HADOOP] pod에서 ssh를 사용할 수 있습니까? (0) | 2019.06.30 |
[HADOOP] Hadoop Reducer : 어떻게 투기 실행을 사용하여 여러 디렉토리로 출력 할 수 있습니까? (0) | 2019.06.30 |
[HADOOP] impala 및 hbase에 연결하는 중 Kerberos 오류가 발생했습니다. (0) | 2019.06.30 |
[HADOOP] 하이브 udf에서 하이브 conf 변수를 전달하는 방법은 무엇입니까? (0) | 2019.06.30 |