[HADOOP] UnknownHostException with Mesos + Spark and a custom jar
I am getting an UnknownHostException when running a custom jar with Spark on Mesos. The problem does not occur when running the spark-shell.
My spark-env.sh contains the following:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
My spark-defaults.conf contains the following:
spark.master mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home /spark-1.5.0-bin-hadoop2.6/
These settings are present on all masters and slaves.
Starting the spark-shell as follows and running the following line works correctly:
/spark-1.5.0-bin-hadoop2.6/bin/spark-shell
sc.textFile("/tmp/Input").collect.foreach(println)
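For reference, because core-site.xml below sets fs.defaultFS to hdfs://affinio, the bare path /tmp/Input is resolved against the HA nameservice, so (assuming the HDFS client configuration is visible) the call above is equivalent to the fully qualified form:
sc.textFile("hdfs://affinio/tmp/Input").collect.foreach(println)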
Log from spark-shell:
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(88528) called with curMem=0, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 86.5 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(20236) called with curMem=88528, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.8 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.31.21.104:49048 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:49 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:22
15/09/28 20:04:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/28 20:04:49 INFO spark.SparkContext: Starting job: collect at <console>:22
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:22) with 3 output partitions
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(collect at <console>:22)
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Missing parents: List()
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at textFile at <console>:22), which has no missing parents
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(3120) called with curMem=108764, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 530.2 MB)
15/09/28 20:04:49 INFO storage.MemoryStore: ensureFreeSpace(1784) called with curMem=111884, maxMem=556038881
15/09/28 20:04:49 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1784.0 B, free 530.2 MB)
15/09/28 20:04:49 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.21.104:49048 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:49 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:861
15/09/28 20:04:49 INFO scheduler.DAGScheduler: Submitting 3 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at textFile at <console>:22)
15/09/28 20:04:49 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-37-82.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-172-31-21-104.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:49 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, ip-172-31-4-4.us-west-2.compute.internal, NODE_LOCAL, 2142 bytes)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-4-4.us-west-2.compute.internal:50648 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S2, ip-172-31-4-4.us-west-2.compute.internal, 50648)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-37-82.us-west-2.compute.internal:52624 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S1, ip-172-31-37-82.us-west-2.compute.internal, 52624)
15/09/28 20:04:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-21-104.us-west-2.compute.internal:56628 with 530.3 MB RAM, BlockManagerId(20150928-190245-1358962604-5050-11297-S0, ip-172-31-21-104.us-west-2.compute.internal, 56628)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-37-82.us-west-2.compute.internal:52624 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-21-104.us-west-2.compute.internal:56628 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-4-4.us-west-2.compute.internal:50648 (size: 1784.0 B, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-37-82.us-west-2.compute.internal:52624 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-21-104.us-west-2.compute.internal:56628 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-4-4.us-west-2.compute.internal:50648 (size: 19.8 KB, free: 530.3 MB)
15/09/28 20:04:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3907 ms on ip-172-31-37-82.us-west-2.compute.internal (1/3)
15/09/28 20:04:53 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 3884 ms on ip-172-31-4-4.us-west-2.compute.internal (2/3)
15/09/28 20:04:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 3907 ms on ip-172-31-21-104.us-west-2.compute.internal (3/3)
15/09/28 20:04:53 INFO scheduler.DAGScheduler: ResultStage 0 (collect at <console>:22) finished in 3.940 s
15/09/28 20:04:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/09/28 20:04:53 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:22, took 4.019454 s
pepsi
cocacola
The following sample code, compiled into a jar, fails.
Sample code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
sc.textFile("/tmp/Input").collect.foreach(println)
}
}
Run via:
/spark-1.5.0-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" /home/hdfs/test_2.10-0.1.jar
Log from spark-submit:
java.lang.IllegalArgumentException: java.net.UnknownHostException: affinio
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1007)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: affinio
... 35 more
hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>affinio</value>
</property>
<property>
<name>dfs.ha.namenodes.affinio</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.affinio.nn1</name>
<value>172.31.16.81:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.affinio.nn2</name>
<value>172.31.32.81:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.affinio.nn1</name>
<value>172.31.16.81:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.affinio.nn2</name>
<value>172.31.32.81:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>file:///nfs/dfs/ha-name-dir-shared</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.affinio</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/namenode</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hdfs</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.31.16.81:2181,172.31.32.81:2181,172.31.0.81:2181</value>
</property>
</configuration>
core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://affinio</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
spark-shell conf.toDebugString:
spark.app.id=20150929-173220-1361059756-5050-16026-0005
spark.app.name=Spark shell
spark.driver.host=172.31.25.67
spark.driver.port=37613
spark.executor.id=driver
spark.externalBlockStore.folderName=spark-d4bf255f-f1f3-4026-83bf-b377a24f5f2c
spark.fileserver.uri=http://172.31.25.67:54526
spark.jars=
spark.master=mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home=/spark-1.5.0-bin-hadoop2.6/
spark.repl.class.uri=http://172.31.25.67:45553
spark.submit.deployMode=client
spark-submit conf.toDebugString:
spark.app.id=20150929-173220-1361059756-5050-16026-0004
spark.app.name=Simple Application
spark.driver.host=172.31.25.67
spark.driver.port=47968
spark.executor.id=driver
spark.externalBlockStore.folderName=spark-846de0d9-8bb1-414b-8b81-f2d6646a58d3
spark.fileserver.uri=http://172.31.25.67:45283
spark.jars=file:/home/hdfs/./test_2.10-0.1.jar
spark.master=mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
spark.mesos.executor.home=/spark-1.5.0-bin-hadoop2.6/
spark.submit.deployMode=client
I am able to get it to work if I run it like this:
spark-submit --files /hadoop-2.7.1/etc/hadoop/hdfs-site.xml,/hadoop-2.7.1/etc/hadoop/core-site.xml ./test_2.10-0.1.jar
So the configuration from /hadoop-2.7.1/etc/hadoop/ is not loaded by default, even though I set HADOOP_CONF_DIR on every machine both in /spark-1.5.0-bin-hadoop2.6/conf/spark-env.sh and in the user profile settings:
cat /etc/profile.d/hadoop.sh
# Set path for hadoop
export HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
export PATH=$PATH:/hadoop-2.7.1/bin
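One possible programmatic workaround, offered only as an untested sketch, would be to load the HA client configuration into the driver's Hadoop configuration explicitly instead of relying on --files; the assumption is that this configuration is the one used to build the JobConf that textFile ships to the executors:
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Explicitly add the HA client configs (paths taken from HADOOP_CONF_DIR above)
    sc.hadoopConfiguration.addResource(new Path("/hadoop-2.7.1/etc/hadoop/core-site.xml"))
    sc.hadoopConfiguration.addResource(new Path("/hadoop-2.7.1/etc/hadoop/hdfs-site.xml"))
    sc.textFile("/tmp/Input").collect.foreach(println)
  }
}
This is not verified against this cluster; the --files invocation above remains the known-working workaround.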
Output of the --verbose switch:
System properties:
spark.local.dir -> /data/spark/
SPARK_SUBMIT -> true
spark.files -> file:///hadoop-2.7.1/etc/hadoop/hdfs-site.xml,file:///hadoop-2.7.1/etc/hadoop/core-site.xml
spark.app.name -> SimpleApp
spark.jars -> file:/home/hdfs/./test_2.10-0.1.jar
spark.submit.deployMode -> client
spark.mesos.executor.home -> /spark-1.5.0-bin-hadoop2.6
spark.master -> mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
Classpath elements:
file:/home/hdfs/./test_2.10-0.1.jar
I also printed the environment variables from within the application executors:
sc.parallelize(Array(1)).flatMap( v=>System.getenv ).collect.foreach(v=>println(s"${v._1}=${v._2}"))
Output:
LIBPROCESS_PORT=0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
SPARK_EXECUTOR_MEMORY=1024m
SHLVL=1
MESOS_EXECUTOR_ID=20150930-115952-1361059756-5050-15990-S1
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
MESOS_DIRECTORY=/data/slaves/20150930-115952-1361059756-5050-15990-S1/frameworks/20150930-115952-1361059756-5050-15990-0008/executors/20150930-115952-1361059756-5050-15990-S1/runs/2baa786a-be89-4823-a248-bb35034bb2fa
MESOS_SLAVE_PID=slave(1)@172.31.32.118:5051
_SPARK_ASSEMBLY=/spark-1.5.0-bin-hadoop2.6/lib/spark-assembly-1.5.0-hadoop2.6.0.jar
SPARK_HOME=/spark-1.5.0-bin-hadoop2.6
MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-0.24.0.so
SPARK_SCALA_VERSION=2.10
SPARK_USER=hdfs
PWD=/data/slaves/20150930-115952-1361059756-5050-15990-S1/frameworks/20150930-115952-1361059756-5050-15990-0008/executors/20150930-115952-1361059756-5050-15990-S1/runs/2baa786a-be89-4823-a248-bb35034bb2fa
SPARK_ENV_LOADED=1
MESOS_FRAMEWORK_ID=20150930-115952-1361059756-5050-15990-0008
MESOS_SLAVE_ID=20150930-115952-1361059756-5050-15990-S1
MESOS_CHECKPOINT=0
HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop/
SPARK_EXECUTOR_OPTS=
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
So we can see that the executors have HADOOP_CONF_DIR in their environment, yet it still does not work without spark.files.
Update:
Downgrading to Spark 1.3.1 makes the problem go away. Something in Spark 1.5 breaks the classpath.
Spark 1.3.1 output:
System properties:
SPARK_SUBMIT -> true
spark.app.name -> SimpleApp
spark.jars -> file:/home/hdfs/./test_2.10-0.1.jar
spark.mesos.executor.home -> /spark-1.3.1-bin-hadoop2.6
spark.master -> mesos://zk://172.31.0.81:2181,172.31.16.81:2181,172.31.32.81:2181/mesos
Classpath elements:
file:/home/hdfs/./test_2.10-0.1.jar
Executor environment:
LIBPROCESS_PORT=0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
SPARK_EXECUTOR_MEMORY=512m
SHLVL=1
MESOS_EXECUTOR_ID=20150930-115952-1361059756-5050-15990-S2
CLASSPATH=/spark-1.3.1-bin-hadoop2.6/conf:/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar:/spark-1.3.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/spark-1.3.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/spark-1.3.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/hadoop-2.7.1/etc/hadoop
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
MESOS_DIRECTORY=/data/slaves/20150930-115952-1361059756-5050-15990-S2/frameworks/20150930-115952-1361059756-5050-15990-0013/executors/20150930-115952-1361059756-5050-15990-S2/runs/23c38710-14d7-4550-b3f7-2879576ce1d2
MESOS_SLAVE_PID=slave(1)@172.31.18.189:5051
PYTHONPATH=/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip:/spark-1.3.1-bin-hadoop2.6/python:
SPARK_HOME=/spark-1.3.1-bin-hadoop2.6
SPARK_CONF_DIR=/spark-1.3.1-bin-hadoop2.6/conf
MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-0.24.0.so
SPARK_SCALA_VERSION=2.10
SPARK_USER=hdfs
PWD=/data/slaves/20150930-115952-1361059756-5050-15990-S2/frameworks/20150930-115952-1361059756-5050-15990-0013/executors/20150930-115952-1361059756-5050-15990-S2/runs/23c38710-14d7-4550-b3f7-2879576ce1d2
SPARK_ENV_LOADED=1
MESOS_FRAMEWORK_ID=20150930-115952-1361059756-5050-15990-0013
MESOS_SLAVE_ID=20150930-115952-1361059756-5050-15990-S2
MESOS_CHECKPOINT=0
HADOOP_CONF_DIR=/hadoop-2.7.1/etc/hadoop
SPARK_EXECUTOR_OPTS=
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
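Note that the Spark 1.3.1 executor CLASSPATH above includes /hadoop-2.7.1/etc/hadoop, while the Spark 1.5.0 executor environment earlier shows no CLASSPATH entry at all. An untested guess at a fix for 1.5.0 would therefore be to put the Hadoop conf dir back on the driver and executor classpaths explicitly, for example in spark-defaults.conf:
spark.driver.extraClassPath /hadoop-2.7.1/etc/hadoop
spark.executor.extraClassPath /hadoop-2.7.1/etc/hadoop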
Solution
From https://stackoverflow.com/questions/32833860/unknownhostexception-with-mesos-spark-and-custom-jar, under cc-by-sa and MIT license