How do I serialize a Java object in Hadoop?

An object must implement the Writable interface so that it can be serialized when it is transferred in Hadoop. Take the Lucene ScoreDoc class as an example:

public class ScoreDoc implements java.io.Serializable {

  /** The score of this document for the query. */
  public float score;

  /** Expert: A hit document's number.
   * @see Searcher#doc(int) */
  public int doc;

  /** Only set by {@link TopDocs#merge} */
  public int shardIndex;

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score) {
    this(doc, score, -1);
  }

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  // A convenience method for debugging.
  @Override
  public String toString() {
    return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
  }
}

How should I serialize it with the Writable interface? And what is the connection between the Writable and java.io.Serializable interfaces?

Solution

==============================

1. I think it is not a good idea to tamper with the built-in Lucene class. Instead, write your own class that holds a field of type ScoreDoc and implements Hadoop's Writable interface. It would look something like this:

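import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable {

  private ScoreDoc sd;

  // Hadoop creates Writable instances reflectively, so a no-arg constructor is needed.
  public MyScoreDoc() {}

  public MyScoreDoc(ScoreDoc sd) {
    this.sd = sd;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // ScoreDoc's fields are public, so use them directly instead of parsing toString().
    out.writeFloat(sd.score);
    out.writeInt(sd.doc);
    out.writeInt(sd.shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Read in the same order and with the same types as write().
    float score = in.readFloat();
    int doc = in.readInt();
    int shardIndex = in.readInt();

    // The ScoreDoc constructor takes (doc, score, shardIndex).
    sd = new ScoreDoc(doc, score, shardIndex);
  }

  @Override
  public String toString() {
    return String.valueOf(sd);
  }
}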
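
A round trip through plain java.io streams is a quick way to check a write/readFields pair before running it in a job. The sketch below is only an illustration: the local `Writable` interface is a stand-in that mirrors the two methods of org.apache.hadoop.io.Writable so that the code runs without Hadoop on the classpath, and `ScoreDocWritable` and `RoundTripDemo` are hypothetical names that hold the three ScoreDoc fields directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical class carrying the same three fields as Lucene's ScoreDoc.
class ScoreDocWritable implements Writable {
  int doc;
  float score;
  int shardIndex;

  ScoreDocWritable() {} // empty instance to be filled in by readFields()

  ScoreDocWritable(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeFloat(score);
    out.writeInt(doc);
    out.writeInt(shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    score = in.readFloat();
    doc = in.readInt();
    shardIndex = in.readInt();
  }
}

class RoundTripDemo {
  // Serialize any Writable into a byte array.
  static byte[] toBytes(Writable w) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      w.write(out);
    }
    return buffer.toByteArray();
  }

  // Fill an existing Writable from a byte array.
  static void fromBytes(Writable w, byte[] bytes) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      w.readFields(in);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = toBytes(new ScoreDocWritable(42, 0.75f, 3));

    ScoreDocWritable copy = new ScoreDocWritable();
    fromBytes(copy, bytes);

    System.out.println(bytes.length + " bytes: doc=" + copy.doc
        + " score=" + copy.score + " shardIndex=" + copy.shardIndex);
  }
}
```

The payload is exactly 12 bytes (one float and two ints), with no class name or field metadata in the stream. That is why the reader has to know the layout in advance.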

==============================
2. First, see "Hadoop: Easy way to have object as output value without Writable interface", where using Java serialization is an option.

Alternatively, see http://developer.yahoo.com/hadoop/tutorial/module5.html and write your own write and read functions. This is simple, because inside them you can call the API to read and write int, float, String, and so on.
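
To see what the Java serialization route looks like in isolation, here is a minimal sketch that uses only java.io. `Hit` and `JavaSerializationDemo` are hypothetical names, not part of Lucene or Hadoop, and the sketch does not cover how a job is configured to use Java serialization. It only shows how java.io.Serializable differs from Writable: nothing is hand-written, but the stream also carries a class description.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with the same three fields as ScoreDoc.
class Hit implements Serializable {
  private static final long serialVersionUID = 1L;

  final int doc;
  final float score;
  final int shardIndex;

  Hit(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }
}

class JavaSerializationDemo {
  // Serialize with the JVM's built-in mechanism; no per-field code is needed.
  static byte[] serialize(Serializable obj) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(obj);
    }
    return buffer.toByteArray();
  }

  static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] bytes = serialize(new Hit(42, 0.75f, 3));
    Hit copy = (Hit) deserialize(bytes);

    // The stream holds a header and class descriptor in addition to the 12 bytes of field data.
    System.out.println(bytes.length + " bytes: doc=" + copy.doc
        + " score=" + copy.score + " shardIndex=" + copy.shardIndex);
  }
}
```

The two interfaces are independent of each other. Serializable is a marker interface handled by the JVM, while Writable is a Hadoop contract in which the class writes and reads its own fields. A class can implement both.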

Your example using Writable (imports are needed):
```
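import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ScoreDoc implements java.io.Serializable, Writable {
  /** The score of this document for the query. */
  public float score; // ... doc and shardIndex as above

  // Hadoop creates Writable instances reflectively, so add a no-arg constructor.
  public ScoreDoc() {}

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeFloat(score); // score is a float, so writeFloat rather than writeInt
    out.writeInt(doc);
    out.writeInt(shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Same order and same types as write().
    score = in.readFloat();
    doc = in.readInt();
    shardIndex = in.readInt();
  }

  // rest: other constructors, toString, etc.
}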
```
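Note: the order of writes and reads must be the same, otherwise the value of one field ends up in another. If the types differ, you will get a serialization error while reading.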
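
The effect of a mismatch can be reproduced with java.io alone. The following is a small sketch (the class and method names are hypothetical) that writes a float followed by an int, then reads the bytes back three ways. In this particular setup a swapped order does not throw at all, because both types are four bytes wide. An exception only appears once the reader asks for more bytes than were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

class OrderMismatchDemo {
  // Writer side: score first, then doc (8 bytes in total).
  static byte[] writeScoreThenDoc(float score, int doc) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeFloat(score);
      out.writeInt(doc);
    }
    return buffer.toByteArray();
  }

  static DataInputStream open(byte[] bytes) {
    return new DataInputStream(new ByteArrayInputStream(bytes));
  }

  public static void main(String[] args) throws IOException {
    byte[] bytes = writeScoreThenDoc(1.5f, 7);

    // Matching order and types: the original values come back.
    try (DataInputStream in = open(bytes)) {
      float score = in.readFloat();
      int doc = in.readInt();
      System.out.println("matching order: score=" + score + " doc=" + doc);
    }

    // Swapped order: no exception, the bytes are silently reinterpreted.
    try (DataInputStream in = open(bytes)) {
      int doc = in.readInt();       // really the bit pattern of 1.5f (0x3FC00000)
      float score = in.readFloat(); // really the int 7 viewed as a float
      System.out.println("swapped order: doc=" + doc + " score=" + score);
    }

    // Wider types than were written: the stream runs out of data.
    try (DataInputStream in = open(bytes)) {
      in.readLong(); // consumes all 8 bytes
      in.readLong(); // nothing left to read
    } catch (EOFException e) {
      System.out.println("too-wide read: EOFException");
    }
  }
}
```

Silent corruption is the more dangerous case, so it is worth keeping write and readFields next to each other in the source and changing them together.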

from https://stackoverflow.com/questions/16837640/how-to-serialize-an-java-object-in-hadoop by cc-by-sa and MIT license
