[HADOOP] ClassNotFoundException running GiraphRunner on a modified SimpleShortestPathsVertex
I'm new to Giraph and am trying to get a Giraph edit-compile-deploy loop working on my own code. I can run various examples inspired by http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/, but I hit a ClassNotFoundException when running a modified version of the SimpleShortestPathsVertex Giraph example. I have tried various combinations of -libjars and HADOOP_CLASSPATH and am out of ideas, so any help would be appreciated. Details follow.
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.benchmark.PageRankBenchmark \
-Dgiraph.zkList=<myhost>:2181 \
-e 1 -s 3 -v -V 50 -w 1
...
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
...
(full output is below)
$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
...
(full output is below)
Bonus: the results are correct:
$ hadoop fs -cat goutput/shortestpathsC2/p*
0 1.0
2 2.0
1 0.0
3 1.0
4 5.0
The jar containing my modified vertex (KdlSimpleShortestPathsVertex, no package) looks fine:
$ jar -tf ~/kdl_hadoop_play.jar
META-INF/MANIFEST.MF
KdlSimpleShortestPathsVertex.class
META-INF/
However, my run fails:
$ hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ~/kdl_hadoop_play.jar \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca KdlSimpleShortestPathsVertex.source=2 \
-w 1
Exception in thread "main" java.lang.ClassNotFoundException: KdlSimpleShortestPathsVertex
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.giraph.utils.ConfigurationUtils.populateGiraphConfiguration(ConfigurationUtils.java:210)
at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:147)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
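...
After looking around, I suspect that GiraphRunner does not follow the advice at http://grepalex.com/2013/02/25/hadoop-libjars/ ("Make sure your code is using GenericOptionsParser."). Looking through the Giraph source, I don't see it accessing that class. I also tried setting HADOOP_CLASSPATH to my jar, but that did not fix the problem.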
Any help would be appreciated!
14/08/01 11:42:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:42:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:42:28 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
14/08/01 11:42:29 INFO mapred.JobClient: Running job: job_201407291058_0015
14/08/01 11:42:30 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:42:40 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:42:41 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:42:44 INFO mapred.JobClient: Job complete: job_201407291058_0015
14/08/01 11:42:44 INFO mapred.JobClient: Counters: 39
14/08/01 11:42:44 INFO mapred.JobClient: File System Counters
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of bytes written=369846
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes read=88
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of bytes written=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of read operations=2
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:42:44 INFO mapred.JobClient: HDFS: Number of write operations=1
14/08/01 11:42:44 INFO mapred.JobClient: Job Counters
14/08/01 11:42:44 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=15772
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:42:44 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:42:44 INFO mapred.JobClient: Map input records=2
14/08/01 11:42:44 INFO mapred.JobClient: Map output records=0
14/08/01 11:42:44 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:42:44 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:42:44 INFO mapred.JobClient: CPU time spent (ms)=2230
14/08/01 11:42:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=411357184
14/08/01 11:42:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2428895232
14/08/01 11:42:44 INFO mapred.JobClient: Total committed heap usage (bytes)=806027264
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Stats
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate edges=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate finished vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Aggregate vertices=50
14/08/01 11:42:44 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:42:44 INFO mapred.JobClient: Current workers=1
14/08/01 11:42:44 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:42:44 INFO mapred.JobClient: Sent messages=0
14/08/01 11:42:44 INFO mapred.JobClient: Superstep=4
14/08/01 11:42:44 INFO mapred.JobClient: Giraph Timers
14/08/01 11:42:44 INFO mapred.JobClient: Input superstep (milliseconds)=238
14/08/01 11:42:44 INFO mapred.JobClient: Setup (milliseconds)=2903
14/08/01 11:42:44 INFO mapred.JobClient: Shutdown (milliseconds)=68
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 0 (milliseconds)=77
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 1 (milliseconds)=64
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 2 (milliseconds)=45
14/08/01 11:42:44 INFO mapred.JobClient: Superstep 3 (milliseconds)=43
14/08/01 11:42:44 INFO mapred.JobClient: Total (milliseconds)=3442
14/08/01 11:47:37 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
14/08/01 11:47:37 INFO utils.ConfigurationUtils: Setting custom argument [SimpleShortestPathsVertex.source] to [2] in GiraphConfiguration
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
14/08/01 11:47:37 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
14/08/01 11:47:37 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
14/08/01 11:47:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/08/01 11:47:38 INFO mapred.JobClient: Running job: job_201407291058_0017
14/08/01 11:47:39 INFO mapred.JobClient: map 0% reduce 0%
14/08/01 11:47:44 INFO mapred.JobClient: map 50% reduce 0%
14/08/01 11:47:45 INFO mapred.JobClient: map 100% reduce 0%
14/08/01 11:47:46 INFO mapred.JobClient: Job complete: job_201407291058_0017
14/08/01 11:47:46 INFO mapred.JobClient: Counters: 39
14/08/01 11:47:46 INFO mapred.JobClient: File System Counters
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes read=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of bytes written=367068
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: FILE: Number of write operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes read=200
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of bytes written=30
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of read operations=5
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/08/01 11:47:46 INFO mapred.JobClient: HDFS: Number of write operations=2
14/08/01 11:47:46 INFO mapred.JobClient: Job Counters
14/08/01 11:47:46 INFO mapred.JobClient: Launched map tasks=2
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=8538
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/01 11:47:46 INFO mapred.JobClient: Map-Reduce Framework
14/08/01 11:47:46 INFO mapred.JobClient: Map input records=2
14/08/01 11:47:46 INFO mapred.JobClient: Map output records=0
14/08/01 11:47:46 INFO mapred.JobClient: Input split bytes=88
14/08/01 11:47:46 INFO mapred.JobClient: Spilled Records=0
14/08/01 11:47:46 INFO mapred.JobClient: CPU time spent (ms)=1590
14/08/01 11:47:46 INFO mapred.JobClient: Physical memory (bytes) snapshot=341344256
14/08/01 11:47:46 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2363527168
14/08/01 11:47:46 INFO mapred.JobClient: Total committed heap usage (bytes)=504758272
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Stats
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate edges=12
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate finished vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Aggregate vertices=5
14/08/01 11:47:46 INFO mapred.JobClient: Current master task partition=0
14/08/01 11:47:46 INFO mapred.JobClient: Current workers=1
14/08/01 11:47:46 INFO mapred.JobClient: Last checkpointed superstep=0
14/08/01 11:47:46 INFO mapred.JobClient: Sent messages=0
14/08/01 11:47:46 INFO mapred.JobClient: Superstep=4
14/08/01 11:47:46 INFO mapred.JobClient: Giraph Timers
14/08/01 11:47:46 INFO mapred.JobClient: Input superstep (milliseconds)=181
14/08/01 11:47:46 INFO mapred.JobClient: Setup (milliseconds)=313
14/08/01 11:47:46 INFO mapred.JobClient: Shutdown (milliseconds)=128
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 0 (milliseconds)=57
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 1 (milliseconds)=54
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 2 (milliseconds)=36
14/08/01 11:47:46 INFO mapred.JobClient: Superstep 3 (milliseconds)=35
14/08/01 11:47:46 INFO mapred.JobClient: Total (milliseconds)=805
Answers
-
==============================
1. After looking at the hadoop script, along with the Hadoop and Giraph sources, I think I have it figured out. The big hint was a line in the output that I got when using the -libjars option with Hadoop.
That led me to find that GiraphRunner does not support the 'libjars' option, because it uses its own ConfigurationUtils.parseArgs() to obtain an org.apache.commons.cli.CommandLine instead of the recommended org.apache.hadoop.util.GenericOptionsParser.getCommandLine(). That in turn led me to Hadoop's usual classpath handling tools, CLASSPATH and/or HADOOP_CLASSPATH. Here is what worked.
For example, on my machine:
$ export GIRAPH_HOME=/share/apps/giraph
$ export HADOOP_CLASSPATH=/home/<me>/kdl_hadoop_play.jar:$GIRAPH_HOME/giraph-ex.jar:$HADOOP_CLASSPATH
$ export LIBJARS=/home/<me>/kdl_hadoop_play.jar,$GIRAPH_HOME/giraph-core.jar
$ hadoop fs -rm -R goutput/shortestpathsC2
$ hadoop jar $GIRAPH_HOME/giraph-ex.jar org.apache.giraph.GiraphRunner \
-Dgiraph.zkList=<myhost>:2181 \
-libjars ${LIBJARS} \
KdlSimpleShortestPathsVertex \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /user/cornell/ginput/tiny_graph.txt \
-of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /user/cornell/goutput/shortestpathsC2 \
-ca SimpleShortestPathsVertex.source=2 \
-w 1
...
$ hadoop fs -cat goutput/shortestpathsC2/p*
This produces the expected output and results.
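One detail that is easy to get wrong here is that the two settings use different separators: HADOOP_CLASSPATH is a colon-separated classpath that the hadoop launcher script adds to the client JVM, while the -libjars value is a comma-separated list. The sketch below builds both from one list of jars. The jar paths and the `join_by` helper are hypothetical, chosen only for illustration; substitute your own locations.

```shell
#!/bin/sh
# Join all arguments after the first, using the first argument as separator.
# Expects at least one item after the separator.
join_by() {
    sep=$1
    shift
    out=$1
    shift
    for item in "$@"; do
        out="${out}${sep}${item}"
    done
    printf '%s\n' "$out"
}

# Hypothetical jar locations (assumptions, not taken from a real install).
MY_JAR=/home/me/kdl_hadoop_play.jar
GIRAPH_CORE_JAR=/share/apps/giraph/giraph-core.jar

# Client-side classpath entries: colon-separated.
CLIENT_CP=$(join_by : "$MY_JAR" "$GIRAPH_CORE_JAR")

# Value for -libjars: comma-separated.
LIBJARS=$(join_by , "$MY_JAR" "$GIRAPH_CORE_JAR")

# Prepend to any existing HADOOP_CLASSPATH without leaving a trailing colon.
export HADOOP_CLASSPATH="${CLIENT_CP}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
export LIBJARS
```

After sourcing this, `${LIBJARS}` can be passed to `-libjars` as in the command above.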
More generally, it would help if the Giraph team changed the code to use the standard parser.
Hope this helps!
-
==============================
2. I don't know why this doesn't work, but there is a quick and dirty way to work around it. Put your code in the giraph-examples/src/main/java/org/apache/giraph/examples/ directory (the one that contains SimpleShortestPath). Then run mvn -DskipTests --projects giraph-examples --also-make package to build the giraph-examples jar. Then run your program the same way you run SimpleShortestPath, replacing SimpleShortestPath with your file name. Hope this helps.
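The workaround above can be sketched as a small script that prints the steps for review rather than running them. The checkout location `GIRAPH_SRC` and the `fqcn_for` helper are hypothetical names introduced for illustration. The sketch also assumes the source file is edited to declare `package org.apache.giraph.examples;`, since a class placed in that directory needs a matching package declaration and is then referred to by its fully qualified name.

```shell
#!/bin/sh
# Examples package directory inside a Giraph source checkout (from the answer above).
EXAMPLES_DIR=giraph-examples/src/main/java/org/apache/giraph/examples

# Print the fully qualified class name a source file has once it lives in that package.
fqcn_for() {
    printf 'org.apache.giraph.examples.%s\n' "$(basename "$1" .java)"
}

# Hypothetical inputs: adjust to your own checkout and source file.
GIRAPH_SRC=${GIRAPH_SRC:-/path/to/giraph}
SRC_FILE=KdlSimpleShortestPathsVertex.java
VERTEX_CLASS=$(fqcn_for "$SRC_FILE")

# Print the steps instead of executing them, so they can be checked first.
echo "cp $SRC_FILE $GIRAPH_SRC/$EXAMPLES_DIR/"
echo "(cd $GIRAPH_SRC && mvn -DskipTests --projects giraph-examples --also-make package)"
echo "vertex class to pass to GiraphRunner: $VERTEX_CLASS"
```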
from https://stackoverflow.com/questions/25084629/classnotfoundexception-running-giraphrunner-on-a-modified-simpleshortestpathsver by cc-by-sa and MIT license