Hadoop producing no output

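I had been running a Hadoop job with the old API. After moving my implementation to the new API, I am having trouble getting it to run. No exception is thrown when the job runs, but no output file is ever produced. With the old API, the job produced an output file containing the sorted list of results. This is the job I am running: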

Configuration config = new Configuration();
Job job = Job.getInstance(config, "sorting");

job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);

job.setMapperClass(SortMapper.class);
job.setCombinerClass(SortReducer.class);
job.setReducerClass(SortReducer.class);

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.setInputPaths(job, new Path(inputFileLocation));
FileOutputFormat.setOutputPath(job, new Path(outputFileLocation));

job.setJarByClass(HadoopTest.class);

long startTime = System.currentTimeMillis();
job.submit();
long endTime = System.currentTimeMillis();

long duration = endTime - startTime;
System.out.println("Duration: " + duration);

This is my mapper implementation:

public static class SortMapper extends MultithreadedMapper<LongWritable, Text, IntWritable, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private IntWritable intKey = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        intKey.set(Integer.parseInt(value.toString()));
        context.write(intKey, one);
    }
}

This is my reducer:

public static class SortReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    @Override
    protected void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        Iterator<IntWritable> iterator = values.iterator();
        while (iterator.hasNext()) {
            sum += iterator.next().get();
        }
        context.write(key, new IntWritable(sum));
    }
}

The logs look like this (the complaints about "Unable to load realm mapping info ..." and "Unable to load native-hadoop ..." always appeared when running with the old API as well):

2014-03-18 10:19:41.299 java[13311:1d03] Unable to load realm mapping info from SCDynamicStore
14/03/18 10:19:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/03/18 10:19:41 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/03/18 10:19:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/03/18 10:19:41 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/03/18 10:19:41 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/03/18 10:19:41 INFO input.FileInputFormat: Total input paths to process : 2
14/03/18 10:19:41 INFO mapreduce.JobSubmitter: number of splits:2
14/03/18 10:19:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local904621238_0001
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/staging/james.mchugh904621238/.staging/job_local904621238_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/staging/james.mchugh904621238/.staging/job_local904621238_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/local/localRunner/james.mchugh/job_local904621238_0001/job_local904621238_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/03/18 10:19:42 WARN conf.Configuration: file:/tmp/hadoop-james.mchugh/mapred/local/localRunner/james.mchugh/job_local904621238_0001/job_local904621238_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/03/18 10:19:42 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/03/18 10:19:42 INFO mapred.LocalJobRunner: OutputCommitter set in config null

Solution

==============================
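1. Try job.waitForCompletion(true); instead of job.submit();. job.submit() returns as soon as the job has been submitted, without waiting for it to finish. Because you are running MapReduce locally, you need to wait for the result before JUnit shuts down the local job tracker.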
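The difference between the two calls can be illustrated without a Hadoop installation. The following is only a sketch using the Java standard library: FakeJob, its methods, and the 300 ms delay are hypothetical stand-ins invented for this illustration, not the Hadoop Job class or its behavior in detail. It shows the general pattern the answer relies on: a non-blocking submit returns before the work has produced anything, while a blocking wait returns only after the work is done.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class SubmitVersusWait {

    /** Hypothetical stand-in for a job handle; this is NOT the Hadoop Job class. */
    static final class FakeJob {
        private final AtomicBoolean outputWritten = new AtomicBoolean(false);
        private CompletableFuture<Void> running;

        /** Starts the work on a background thread and returns immediately. */
        void submit() {
            running = CompletableFuture.runAsync(() -> {
                try {
                    Thread.sleep(300); // simulated map and reduce work (assumed duration)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                outputWritten.set(true); // simulated "output file written"
            });
        }

        /** Submits the work if that has not happened yet, then blocks until it finishes. */
        boolean waitForCompletion() {
            if (running == null) {
                submit();
            }
            running.join();
            return outputWritten.get();
        }

        /** Reports whether the simulated output exists at this moment. */
        boolean hasOutput() {
            return outputWritten.get();
        }
    }

    public static void main(String[] args) {
        // Fire and forget: the caller continues while the work is still in progress.
        FakeJob fireAndForget = new FakeJob();
        fireAndForget.submit();
        System.out.println("after submit(): output written = " + fireAndForget.hasOutput());

        // Blocking: the caller resumes only after the work has completed.
        FakeJob blocking = new FakeJob();
        System.out.println("after waitForCompletion(): output written = " + blocking.waitForCompletion());
    }
}
```

In a test that ends right after the non-blocking call, the process can be torn down while the background work is still running, which matches the symptom in the question: no exception and no output file.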

from https://stackoverflow.com/questions/22476852/hadoop-producing-no-output by cc-by-sa and MIT license
