Serialization using ArrayWritable seems to work in a funny way

I was working with ArrayWritable and at some point needed to check how Hadoop serializes it. This is what I got after setting job.setNumReduceTasks(0):

0    IntArrayWritable@10f11b8
3    IntArrayWritable@544ec1
6    IntArrayWritable@fe748f
8    IntArrayWritable@1968e23
11    IntArrayWritable@14da8f4
14    IntArrayWritable@18f6235

This is the test mapper I was using:

public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        int red = Integer.parseInt(value.toString());
        IntWritable[] a = new IntWritable[100];

        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red+i);
        }

        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}

IntArrayWritable is taken from the example given in the ArrayWritable javadoc:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}

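I actually checked the Hadoop source code, and this output makes no sense to me. ArrayWritable should not serialize the class name, and there is no way an array of 100 IntWritable values can be serialized into 6 or 7 hexadecimal digits. The application seems to work just fine, and the reducer deserializes the right values. What is happening? What am I missing?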

Solutions

==============================
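1. The problem is that the output you are getting from the MapReduce job is not the serialized version of that data. It is the data translated into a pretty-printed string.

When you set the number of reducers to zero, the mapper output is passed straight through an output format, which formats your data, most likely by converting it to a readable string. It is not dumped in serialized form, as it would be if it were going to be picked up by a reducer.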
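The difference between the two paths can be sketched without Hadoop on the classpath. In the example below, IntArrayHolder is a hypothetical stand-in for IntArrayWritable, and its write method assumes a layout of an element count followed by each int, which is how ArrayWritable and IntWritable are generally understood to serialize. The sketch only illustrates why the text output shows a class name and a hash code while the binary form is much larger; it is not Hadoop's own code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ToStringVsSerialized {

    // Hypothetical stand-in for IntArrayWritable; it does not override toString().
    static final class IntArrayHolder {
        private final int[] values;

        IntArrayHolder(int[] values) {
            this.values = values;
        }

        // Assumed wire layout: element count first, then each int (4 bytes each).
        void write(DataOutputStream out) throws IOException {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    // Returns the binary form, i.e. what a serializer would emit.
    static byte[] serialize(IntArrayHolder holder) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            holder.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        IntArrayHolder holder = new IntArrayHolder(new int[100]);
        // Text path: Object.toString() yields "<class name>@<hex hash code>".
        System.out.println(holder);
        // Binary path: 4 bytes for the length plus 100 * 4 bytes of payload.
        System.out.println(serialize(holder).length + " bytes");
    }
}
```

The first line printed has the same shape as the IntArrayWritable@10f11b8 entries in the question, while the second reports 404 bytes, so the short hexadecimal value is an identity hash code and not the serialized array.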

==============================

2. You have to override the default toString() method.

TextOutputFormat calls it to produce the human-readable output.

Try the following code and check the result:

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (String s : super.toStrings())
        {
            sb.append(s).append(" ");
        }
        return sb.toString();
    }
}

==============================
3. Have you looked into SequenceFileInputFormat and SequenceFileOutputFormat? You can set them up with:
```
job.setInputFormatClass(SequenceFileInputFormat.class); 
```
and
```
job.setOutputFormatClass(TextOutputFormat.class);
```
==============================
4. It is very simple. Hadoop uses the write(DataOutput out) method to write the object in its serialized form (see the Hadoop ArrayWritable documentation for details). When you extend ArrayWritable with IntArrayWritable, your own class uses these methods inherited from the parent class.
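As a rough illustration of that inheritance, the sketch below uses two hypothetical classes, BaseArray and IntArray, standing in for ArrayWritable and IntArrayWritable. The write and readFields signatures mirror the Writable contract, but the bodies are an assumption made for this example and are not copied from Hadoop. The subclass defines no serialization code of its own, yet the round trip still restores the values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class InheritedSerialization {

    // Hypothetical base class playing the role of ArrayWritable.
    static class BaseArray {
        protected int[] values = new int[0];

        void set(int[] values) {
            this.values = values;
        }

        int[] get() {
            return values;
        }

        // Writes the element count, then every element.
        void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }

        // Reads back exactly what write() produced.
        void readFields(DataInput in) throws IOException {
            values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
        }
    }

    // Plays the role of IntArrayWritable: adds nothing, inherits everything.
    static class IntArray extends BaseArray { }

    // Serializes the input through one IntArray and restores it into another.
    static int[] roundTrip(int[] input) {
        try {
            IntArray source = new IntArray();
            source.set(input);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));

            IntArray copy = new IntArray();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new int[] {7, 8, 9})));
    }
}
```

This is consistent with the observation in the question that the reducer deserializes the right values even though the text output looks wrong: the text output and the binary form come from different methods.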

from https://stackoverflow.com/questions/7919035/serialization-using-arraywritable-seems-to-work-in-a-funny-way by cc-by-sa and MIT license
