[HADOOP] How do I submit a MapReduce job to a remote cluster configured with YARN?

I am trying to run a simple MapReduce program from Eclipse. Here is my program:

package wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "quickstart.cloudera:8032");
        conf.set("yarn.app.mapreduce.am.staging-dir", "/user");
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-mapreduce-client-app-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-yarn-common-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-common-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-yarn-api-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.7.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-logging-1.2.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/guava-15.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-collections-3.2.2.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/protobuf-java-2.5.0.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-configuration-1.7.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/commons-lang-2.6.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/log4j-1.2.16.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/slf4j-api-1.7.5.jar"));
        job.addFileToClassPath(new Path("/user/cloudera/prasad/jars/slf4j-log4j12-1.7.5.jar"));
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/cloudera/prasad/test.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/user/cloudera/prasad/wordout2"));
        job.waitForCompletion(true);
    }
}

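When I first ran the program above, the container logs showed ClassNotFoundExceptions, so I added all the corresponding jars, as listed in the program. The container logs no longer show any errors, but the MapReduce job still fails.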
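
One thing that may be worth ruling out when launching from Eclipse: `job.setJarByClass(...)` can only ship a jar if the given class was actually loaded from one. When the IDE runs the class straight from a build directory, there may be no job jar to send to the cluster. The following is a minimal diagnostic sketch using only the JDK, not a confirmed cause of the failure above. The class name `JarLocationCheck` and its two helper methods are hypothetical names introduced for illustration.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.Locale;

public class JarLocationCheck {

    /** Returns the location a class was loaded from, or null if the JVM reports none. */
    public static URL codeSourceOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    /** True if the location string ends in ".jar", i.e. it is not a plain class directory. */
    public static boolean looksLikeJar(String location) {
        return location != null && location.toLowerCase(Locale.ROOT).endsWith(".jar");
    }

    public static void main(String[] args) {
        // In the real driver, pass WordCount.class here instead (assumption: same classpath).
        URL loc = codeSourceOf(JarLocationCheck.class);
        System.out.println("loaded from: " + loc);
        System.out.println("looks like a jar: " + looksLikeJar(loc == null ? null : loc.toString()));
    }
}
```

If the check reports a directory rather than a jar, one option is to export the project as a jar and point the job at it explicitly with `job.setJar(...)`. The path to use depends on the local setup.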

However, the ResourceManager shows the error below:

Exception from container-launch with container ID: container_1473338609943_0003_01_000001 and exit code: 1
ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
    at org.apache.hadoop.util.Shell.run(Shell.java:478)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

When I click on the application logs, nothing is shown except the following messages:

 Log Type: stderr

Log Upload Time: Thu Sep 08 05:26:35 -0700 2016

Log Length: 243

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.


Log Type: stdout

Log Upload Time: Thu Sep 08 05:26:35 -0700 2016

Log Length: 0 

Please tell me what is wrong with my program.

Below is a screenshot of the libs I am using.

Solution

    from https://stackoverflow.com/questions/39396103/how-to-sumit-a-mapreduce-job-to-remote-cluster-configured-with-yarn by cc-by-sa and MIT license