MapReduce WordCount program - output is the same as the input file

The expected output is the count of every word in the input file, but my output is the entire input file. I extend Mapper for my mapper class and Reducer for my reducer class. Here is my code.

driver.java

public class driver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setMapperClass(mapper.class);
        job.setReducerClass(reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
        //JobClient.runJob((JobConf) conf);
        //System.exit(job.waitForCompletion(true) ? 0 : 1);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        //int res = ToolRunner.run(new Configuration(), new driver(), args);
        int res = ToolRunner.run(new Configuration(), new driver(), args);
        long stop = System.currentTimeMillis();
        System.out.println("Time : " + (stop - start));
        System.exit(res);
    }
}

mapper.java

public class mapper extends Mapper {
    // Hadoop-supported data types
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // map method: tokenizes the line and frames the initial key-value pairs
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

reducer.java

public class reducer extends Reducer {
    // reduce method: accepts the key-value pairs from the mapper, aggregates them by key and produces the final output
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

Solutions

==============================
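1. You are mixing up the new and old MapReduce APIs. I think you tried to write the WordCount program with the new API but copied snippets from the old API (perhaps from an old blog post). If you add the @Override annotation to both the map and reduce methods, you will find the problem yourself: the compiler will report that neither method overrides anything.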


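Look at how these methods changed when the API evolved. In the old API (org.apache.hadoop.mapred) they are map(key, value, OutputCollector, Reporter) and reduce(key, Iterator, OutputCollector, Reporter). In the new API (org.apache.hadoop.mapreduce) they became map(key, value, Context) and reduce(key, Iterable, Context).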

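You just wrote two new methods with the old signatures, so they override nothing and nothing ever calls them. The methods the framework actually calls are the ones inherited from the new-API Mapper and Reducer. Their default implementations are identity operations: map() writes each input key/value pair out unchanged, and reduce() writes every value for a key out unchanged. Your own code therefore never runs, which is why the output looks like the input.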
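The override-versus-overload mistake can be reproduced without a Hadoop cluster. Below is a plain-Java sketch, not Hadoop code. BaseMapper, OverloadingMapper, OverridingMapper and run() are hypothetical names. BaseMapper stands in for the new-API Mapper, and its default map() passes records through unchanged. The "framework" call only knows the base-class signature, so a same-named method with a different parameter list is never reached.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideDemo {

    // Hypothetical framework base class (NOT Hadoop's API). Like the default
    // new-API Mapper.map(), its map() passes each record through unchanged.
    public static class BaseMapper {
        public void map(String key, String value, List<String> out) {
            out.add(key + "\t" + value);
        }
    }

    // Mirrors the question's mistake: same name, different parameter list.
    // This is an overload, so run() below never reaches it, and putting
    // @Override on it would be a compile error.
    public static class OverloadingMapper extends BaseMapper {
        public void map(String key, String value, StringBuilder out) {
            for (String token : value.trim().split("\\s+")) {
                out.append(token).append("\t1\n");
            }
        }
    }

    // Fixed version: identical signature, so @Override compiles and the call is dispatched here.
    public static class OverridingMapper extends BaseMapper {
        @Override
        public void map(String key, String value, List<String> out) {
            for (String token : value.trim().split("\\s+")) {
                out.add(token + "\t1");
            }
        }
    }

    // Hypothetical "framework" driver: it only knows the base-class signature.
    public static List<String> run(BaseMapper mapper, String key, String value) {
        List<String> out = new ArrayList<>();
        mapper.map(key, value, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new OverloadingMapper(), "0", "hello world"));
        System.out.println(run(new OverridingMapper(), "0", "hello world"));
    }
}
```

The first call returns the input record untouched (the identity default). The second call returns one (word, 1) entry per token.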

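Either way, you should follow basic Java coding conventions (for example, class names such as driver, mapper and reducer should start with an uppercase letter).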
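The identity defaults also explain the exact symptom. The question's driver reads its input with KeyValueTextInputFormat. With its default tab separator, that format splits each line at the first tab into a key and a value. TextOutputFormat then writes each pair back as key<TAB>value.

The sketch below simulates that pipeline in plain Java. It is an illustration, not Hadoop code, and IdentityJobSketch, splitKeyValue and run are hypothetical names. Values for the same key are kept in arrival order, a simplification Hadoop does not guarantee. In this simulation the output reproduces the input lines, sorted by key, and a line without a tab comes back with a trailing tab (an empty value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IdentityJobSketch {

    // Like KeyValueTextInputFormat with its default tab separator: split at the
    // first tab; a line without a tab becomes the key with an empty value.
    public static String[] splitKeyValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    // Identity map -> shuffle (group and sort by key) -> identity reduce,
    // written out as key<TAB>value lines the way TextOutputFormat formats them.
    public static List<String> run(List<String> inputLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            String[] kv = splitKeyValue(line);  // identity map: the pair is emitted unchanged
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            for (String value : entry.getValue()) {  // identity reduce: every value written unchanged
                output.add(entry.getKey() + "\t" + value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("banana\tyellow", "apple\tred", "no tab here");
        run(input).forEach(System.out::println);
    }
}
```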

==============================

2. Try this:


import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;


public class WordCount  {

    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // Emit (word, 1) for every whitespace-separated token in the line
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                output.collect(value, new IntWritable(1));
            }

        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }

            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // Input and output paths come from the command line: args[0] and args[1]
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);

    }
}

Build a jar and run the following command from the command line (the output directory must not exist yet):

hadoop jar WordCount.jar WordCount /inputfile /outputfile
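To check what the job should print before running it on a cluster, here is a plain-Java simulation of the same map, shuffle/sort and reduce steps for a few in-memory lines. It is a sketch, not Hadoop code. LocalWordCount and wordCount are hypothetical names. A TreeMap stands in for the shuffle's sort by key; String order matches Text's byte order for ASCII words. Each result line uses TextOutputFormat's key<TAB>value layout.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    public static List<String> wordCount(List<String> lines) {
        // Map phase: one (word, 1) pair per whitespace-separated token, as Map.map() emits
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                emitted.add(new AbstractMap.SimpleEntry<String, Integer>(tokenizer.nextToken(), 1));
            }
        }

        // Shuffle/sort: group the values by key; TreeMap keeps the keys sorted
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce phase: sum each key's values, as Reduce.reduce() does
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int count : entry.getValue()) {
                sum += count;
            }
            output.add(entry.getKey() + "\t" + sum);  // key<TAB>value, like TextOutputFormat
        }
        return output;
    }

    public static void main(String[] args) {
        wordCount(Arrays.asList("the cat", "the hat")).forEach(System.out::println);
    }
}
```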

==============================

3. If you are having problems with your code, run this code instead. It contains the mapper, the reducer and the main function.


import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;    
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token of a line
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also registered as the combiner): sums the counts for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = input path, args[1] = output directory (must not exist yet)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
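This version also registers Reduce as a combiner with conf.setCombinerClass(Reduce.class). That is valid for two reasons: Reduce's input and output types are both (Text, IntWritable), and summing is associative and commutative. Pre-summing each mapper's output locally therefore does not change the totals, even though Hadoop may run the combiner zero or more times.

The plain-Java sketch below simulates this with two hard-coded input splits. CombinerSketch, mapAndCombine and reduce are hypothetical names. Each split is counted locally, then the partial counts are summed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class CombinerSketch {

    // Map + combine for one input split: the mapper's (word, 1) pairs are
    // pre-summed locally, which is what running Reduce as the combiner does.
    public static Map<String, Integer> mapAndCombine(List<String> split) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : split) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                partial.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return partial;
    }

    // Reduce: sum the partial counts that arrive from every mapper.
    public static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = mapAndCombine(Arrays.asList("the cat", "the hat"));
        Map<String, Integer> second = mapAndCombine(Arrays.asList("the end"));
        System.out.println(first);                                   // partial counts from split 1
        System.out.println(reduce(Arrays.asList(first, second)));    // final totals
    }
}
```

With the combiner, the first split sends three pre-summed pairs to the reducer instead of four (word, 1) pairs.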

Then build a jar file from this code and save it in your home directory (e.g. /home/user/wordcount.jar). Run the following command, where classname is the main class (WordCount here):

hadoop jar wordcount.jar classname /inputfile /outputfile

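This creates the output directory /outputfile under the HDFS root (/) directory. Because this job uses the old mapred API and has a reduce phase, the result is written to part-00000 rather than part-m-00000. To view the result:

hadoop fs -cat /outputfile/part-00000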

With this, the word count program runs successfully.

from https://stackoverflow.com/questions/26710866/mapreduce-wordcount-program-output-is-same-as-the-input-file by cc-by-sa and MIT license


복붙노트
