Hadoop -> Mapper -> How can we read only the top N rows from each file in a given input path?

I am new to Hadoop. My requirement is to process only the first 10 rows of each input file. How do I exit the mapper after reading 10 rows of each file?

If someone can provide some sample code, it would be a great help.

Thanks in advance.

Solution

==============================
1. You can override your mapper's run method and break out of the while loop once the map loop has run 10 times. This assumes your files are not splittable; otherwise you will get the first 10 lines of each split:

```
@Override
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    int rows = 0;
    // Test the counter before nextKeyValue() so no 11th record is read.
    while (rows < 10 && context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
      rows++;
    }
  } finally {
    // cleanup runs even if map() throws.
    cleanup(context);
  }
}
```
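To make the splittability caveat concrete, here is a stdlib-only Java simulation (no Hadoop involved) of running the capped loop once per split. The class and method names (`SplitCapDemo`, `lines`, `runJob`) are hypothetical, N is set to 3 to keep the output short, and splits are modeled as fixed line counts, whereas real Hadoop splits are byte ranges, so treat it as an approximation rather than Hadoop's actual splitting logic.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only simulation (no Hadoop): one "mapper" per split, each keeping
// only its first n lines. Splits are modeled as fixed line counts here;
// real Hadoop splits are byte ranges, so this is an approximation.
public class SplitCapDemo {

    // Hypothetical helper: builds ["line1", ..., "line<count>"].
    static List<String> lines(int count) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            out.add("line" + i);
        }
        return out;
    }

    // Cut a file's lines into consecutive splits of at most splitLines lines.
    static List<List<String>> split(List<String> fileLines, int splitLines) {
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < fileLines.size(); start += splitLines) {
            int end = Math.min(start + splitLines, fileLines.size());
            splits.add(fileLines.subList(start, end));
        }
        return splits;
    }

    // What the capped run() loop emits for one split: its first n lines.
    static List<String> firstN(List<String> split, int n) {
        return split.subList(0, Math.min(n, split.size()));
    }

    // Run one capped "mapper" per split and collect all emitted lines.
    static List<String> runJob(List<String> fileLines, int splitLines, int n) {
        List<String> emitted = new ArrayList<>();
        for (List<String> s : split(fileLines, splitLines)) {
            emitted.addAll(firstN(s, n));
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> file = lines(25);
        // Not splittable: the whole file is one split.
        System.out.println(runJob(file, file.size(), 3));
        // prints [line1, line2, line3]

        // Splittable into 10-line splits: first 3 lines of EACH split.
        System.out.println(runJob(file, 10, 3));
        // prints [line1, line2, line3, line11, line12, line13, line21, line22, line23]
    }
}
```

If each file must be handled by exactly one mapper, the usual options are an input format whose `isSplitable` method returns `false`, or input that is not splittable anyway (for example, gzip-compressed text read through `TextInputFormat`).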

==============================

2. Assuming N = 10 and an input file containing line1, line2, ..., line20 (one per line), we can use the following code to read the first 10 records:


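```
// Imports needed by both classes (org.apache.hadoop.mapreduce API)
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// mapper: emits only the first 10 lines of its input split
class Mapcls extends Mapper<LongWritable, Text, Text, NullWritable>
{
    @Override
    public void run(Context con) throws IOException, InterruptedException
    {
        setup(con);
        try
        {
            int rows = 0;
            // Check the counter first so no 11th record is read.
            while (rows < 10 && con.nextKeyValue())
            {
                map(con.getCurrentKey(), con.getCurrentValue(), con);
                rows++;
            }
        }
        finally
        {
            cleanup(con);
        }
    }

    @Override
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
    {
        // Emit the whole line as the key; no value is needed.
        con.write(value, NullWritable.get());
    }
}

// driver
public class Testjob extends Configured implements Tool
{
    @Override
    public int run(String[] args) throws Exception
    {
        // Use the Configuration that ToolRunner populated (e.g. from -D options).
        Job job = Job.getInstance(getConf(), "tst001");
        job.setJarByClass(getClass());

        job.setMapperClass(Mapcls.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        // The default (identity) reducer passes the same types through.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception
    {
        int rc = ToolRunner.run(new Configuration(), new Testjob(), args);
        System.exit(rc);
    }
}
```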
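The control flow of the mapper's `run()` can be checked without a cluster. This sketch replaces Hadoop's `Context` with a hypothetical in-memory `FakeContext` (keys are byte offsets, as `TextInputFormat` produces, assuming ASCII text with `\n` line endings) and compares a `rows < n` loop with an `if (rows++ == n) break;` loop. Both emit `line1` through `line10` from the 20-line file, but the break form pulls an 11th record before stopping. All names here are illustrative, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch of the capped Mapper.run() loop. FakeContext is a
// hypothetical stand-in for Hadoop's Mapper.Context, backed by a list of lines.
public class TopNMapperSim {

    static final class FakeContext {
        private final List<String> lines;
        private int index = -1;       // current line, -1 before the first read
        private long offset = 0;      // byte offset of the current line (ASCII, '\n' endings)
        int recordsRead = 0;          // number of successful nextKeyValue() calls
        final List<String> written = new ArrayList<>();

        FakeContext(List<String> lines) {
            this.lines = lines;
        }

        boolean nextKeyValue() {
            if (index + 1 >= lines.size()) {
                return false;         // no more records in this split
            }
            if (index >= 0) {
                offset += lines.get(index).length() + 1; // skip previous line and its '\n'
            }
            index++;
            recordsRead++;
            return true;
        }

        long getCurrentKey() { return offset; }
        String getCurrentValue() { return lines.get(index); }
        void write(String key) { written.add(key); }
    }

    // Identity map, like Mapcls.map(): emit the line itself.
    static void map(long key, String value, FakeContext context) {
        context.write(value);
    }

    // Counter checked first: stops without reading record n + 1.
    static void run(FakeContext context, int n) {
        int rows = 0;
        while (rows < n && context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
            rows++;
        }
    }

    // Break form: emits the same n lines but reads one extra record first.
    static void runBreakStyle(FakeContext context, int n) {
        int rows = 0;
        while (context.nextKeyValue()) {
            if (rows++ == n) {
                break;
            }
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    }

    static List<String> lines(int count) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            out.add("line" + i);
        }
        return out;
    }

    public static void main(String[] args) {
        FakeContext a = new FakeContext(lines(20));
        run(a, 10);
        System.out.println(a.written + " read=" + a.recordsRead);   // ... read=10

        FakeContext b = new FakeContext(lines(20));
        runBreakStyle(b, 10);
        System.out.println(b.written + " read=" + b.recordsRead);   // ... read=11
    }
}
```

Reading one extra record is harmless for correctness, but checking the counter first makes the intent explicit and avoids the wasted read.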

The output will be: line1 line10 line2 line3 line4 line5 line6 line7 line8 line9
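With the job's default single reducer, map output keys are sorted before they are written, so the ten lines do not come out in file order. Hadoop's `Text` keys compare as raw bytes; for ASCII strings like these that matches Java's `String.compareTo`, so this stdlib-only sketch (class name `SortOrderDemo` is hypothetical) reproduces the order with `Collections.sort`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Reproduces the reducer's key ordering for line1..line10 using plain Strings.
// Assumption: ASCII-only keys, where String order equals Text's byte order.
public class SortOrderDemo {

    static List<String> sortedKeys(int count) {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            keys.add("line" + i);
        }
        Collections.sort(keys); // lexicographic, character by character
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(10));
        // prints [line1, line10, line2, line3, line4, line5, line6, line7, line8, line9]
    }
}
```

If file order matters more than sorting, calling `job.setNumReduceTasks(0)` in the driver makes the job map-only, so each mapper writes its lines, unsorted, to its own output file.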

From https://stackoverflow.com/questions/20009648/hadoop-mapper-how-can-we-read-only-top-n-rows-from-each-file-from-given-input under the CC BY-SA and MIT licenses
