
[HADOOP] Spark fails to read from HDFS when using Mesos


We have set up a small Spark cluster and are testing whether it can read from HDFS. The test job reads a file from HDFS and logs the number of lines in that file to stdout. It works fine when run with a local master (local[*]). However, when we try to submit the job to Mesos (mesos://zk://host1:2181,host2:2181,host3:2181/mesos), it fails.
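For reference, the test job is essentially the following sketch; the HDFS path, app name, and jar name here are placeholders, and the real job may read via wholeTextFiles (as the trace suggests) rather than textFile:

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal HDFS read test: count the lines of one file and log the result.
    object HdfsReadTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HdfsReadTest"))
        // hdfs://namenode:9000/... is a placeholder for the real NameNode and file.
        val lines = sc.textFile("hdfs://namenode:9000/path/to/testfile.txt")
        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }

Submitting with --master "local[*]" works; submitting with --master mesos://zk://host1:2181,host2:2181,host3:2181/mesos produces the following error: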

16/03/14 11:10:27 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, XXX): java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "XXXX"; destination host is: "XXX":9000;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1220)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1210)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1200)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:271)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:238)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:231)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1498)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
    at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:79)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:163)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1553)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1848)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't set up IO streams
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:786)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 38 more
Caused by: java.util.ServiceConfigurationError: org.apache.hadoop.security.SecurityInfo: Provider org.apache.hadoop.security.AnnotatedSecurityInfo not found
    at java.util.ServiceLoader.fail(ServiceLoader.java:231)
    at java.util.ServiceLoader.access$300(ServiceLoader.java:181)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:365)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
    at org.apache.hadoop.security.SecurityUtil.getTokenInfo(SecurityUtil.java:327)
    at org.apache.hadoop.security.SaslRpcClient.getServerToken(SaslRpcClient.java:263)
    at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:219)
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:159)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
    ... 41 more

All Spark nodes have the same version of the Hadoop libraries (2.6.0) and the same configuration. I have manually verified that the class exists, and I have also tested it with Class.forName(...) in a spark-shell. We use Kerberos authentication, and all nodes have a valid Kerberos ticket for the read.
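The spark-shell check was along these lines (assuming the class in question is the org.apache.hadoop.security.AnnotatedSecurityInfo named in the ServiceConfigurationError at the bottom of the trace):

    $ spark-shell --master "local[*]"
    scala> Class.forName("org.apache.hadoop.security.AnnotatedSecurityInfo")
    res0: Class[_] = class org.apache.hadoop.security.AnnotatedSecurityInfo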

We are running Spark 1.5.1.

Any suggestions are welcome.

Solution

    From https://stackoverflow.com/questions/35986055/spark-hdfs-read-fails-when-using-mesos (cc-by-sa and MIT license)