How do I implement sorting in Hadoop?

My problem is sorting the values in a file. Both keys and values are integers, and each key must stay with its value after sorting.

key   value
1     24
3     4
4     12
5     23

Output:

I'm working with a huge amount of data, so the code has to run on a Hadoop cluster. How can I do this with MapReduce?

Solution

==============================

1. You can probably do it like this (I'm assuming you are using Java here).

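In the mapper, emit each record with the value as the key and the original key as the value, like this:

// Map output: the original value becomes the key, the original key becomes the value
context.write(new LongWritable(24), new LongWritable(1));
context.write(new LongWritable(4), new LongWritable(3));
context.write(new LongWritable(12), new LongWritable(4));
context.write(new LongWritable(23), new LongWritable(5));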

In other words, every value you want to sort must become the key of the MapReduce job. By default, Hadoop sorts keys in ascending order.
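To show the whole flow end to end, here is a minimal plain-Java sketch that uses no Hadoop classes. The hypothetical `SwapSortSimulation.sortByValue` parses each `key value` line and emits (value, key) as a mapper would, lets a `TreeMap` stand in for Hadoop's ascending shuffle-and-sort, and then swaps each pair back as a reducer would. It illustrates the idea under those assumptions; it is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation (no Hadoop classes) of the
// "emit the value as the key" trick. TreeMap stands in for Hadoop's
// shuffle-and-sort phase, which orders map output keys ascending.
class SwapSortSimulation {

    static List<int[]> sortByValue(List<String> lines) {
        TreeMap<Integer, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {
            // Map phase: parse "key value" and emit (value, key).
            String[] parts = line.trim().split("\\s+");
            int key = Integer.parseInt(parts[0]);
            int value = Integer.parseInt(parts[1]);
            // Shuffle/sort: group the original keys that share the same value.
            shuffled.computeIfAbsent(value, v -> new ArrayList<>()).add(key);
        }
        // Reduce phase: swap back to (key, value), now ordered by value.
        List<int[]> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : shuffled.entrySet()) {
            for (int key : e.getValue()) {
                out.add(new int[] {key, e.getKey()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1 24", "3 4", "4 12", "5 23");
        for (int[] pair : sortByValue(input)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

On the sample input from the question it prints the pairs ordered by value: 3 4, 4 12, 5 23, 1 24. Keeping a list per value matters when several keys share a value, which is also why a real reducer should iterate over all of its values rather than read only the first one.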

To sort in descending order instead, either use Hadoop's built-in decreasing comparator for LongWritable keys,

job.setSortComparatorClass(LongWritable.DecreasingComparator.class);

or set a custom descending sort comparator on your job, like this:

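import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Sorts LongWritable map output keys in descending order.
// Register it in the driver: job.setSortComparatorClass(DescendingKeyComparator.class);
public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        // The key class must match the map output key class (LongWritable, not Text);
        // 'true' tells WritableComparator to create key instances for deserialization.
        super(LongWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        // Negate the natural ascending order to get descending order
        return -1 * key1.compareTo(key2);
    }
}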

Hadoop's shuffle-and-sort phase then sorts the keys in descending order: 24, 23, 12, 4.
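A small plain-Java sketch of what a descending sort comparator does to the swapped map output above. It uses no Hadoop classes, and `DescendingSortSimulation` is a hypothetical name: the map output keys (the original values) are sorted in reverse natural order, and the original keys travel along as values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration (no Hadoop classes) of a descending sort on map output keys.
class DescendingSortSimulation {

    // Mirrors DescendingKeyComparator: reverse the natural ascending order.
    // Integer.compare(b, a) avoids negating a comparison result.
    static final Comparator<int[]> DESCENDING_BY_KEY = (a, b) -> Integer.compare(b[0], a[0]);

    // Each pair is (mapOutputKey, mapOutputValue) = (original value, original key).
    static List<int[]> sortDescending(List<int[]> mapOutput) {
        List<int[]> sorted = new ArrayList<>(mapOutput);
        sorted.sort(DESCENDING_BY_KEY);
        return sorted;
    }

    public static void main(String[] args) {
        List<int[]> mapOutput = Arrays.asList(
                new int[] {24, 1}, new int[] {4, 3}, new int[] {12, 4}, new int[] {23, 5});
        for (int[] pair : sortDescending(mapOutput)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

It prints 24 1, 23 5, 12 4, 4 3. Negating the result, as `-1 * key1.compareTo(key2)` does, is safe for `LongWritable.compareTo`, which only returns -1, 0 or 1; `Integer.compare(b, a)` is used here because it stays correct even for comparators that could return `Integer.MIN_VALUE`.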

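Update after the comments:

If you need a descending comparator for IntWritable keys, you can write one and register it like this:

job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

If you are using the old JobConf API, use this setting instead:

jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

Put the following code in your driver class (the comparator class is nested in the driver, outside main()):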

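import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ToolRunner;

public static void main(String[] args) throws Exception {
    // YourDriver is a placeholder for your own Tool implementation
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}

// This class is defined outside main(), not inside it
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing Comparator optimized for IntWritable. */
    public static class DecreasingComparator extends Comparator {
        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            // Reverse the ascending order of deserialized keys
            return -super.compare(a, b);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            // Reverse the ascending raw-bytes comparison of IntWritable.Comparator
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}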
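Why override the `byte[]` version as well? Hadoop's sort can compare keys in their serialized form without deserializing them. The plain-Java sketch below (no Hadoop classes; `RawDescendingIntCompare` is a hypothetical name) assumes the IntWritable encoding of `DataOutput.writeInt`, i.e. 4 big-endian bytes, reproduces it with `ByteBuffer`, and reverses the comparison of the decoded ints, analogous to `-super.compare(b1, s1, l1, b2, s2, l2)`.

```java
import java.nio.ByteBuffer;

// Plain-Java sketch (no Hadoop classes) of a descending comparison on
// serialized 4-byte big-endian ints, the layout DataOutput.writeInt produces.
class RawDescendingIntCompare {

    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array(); // big-endian by default
    }

    // Decode the int at each offset, then reverse the ascending comparison.
    static int compareDescending(byte[] b1, int s1, byte[] b2, int s2) {
        int v1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(v2, v1);
    }

    public static void main(String[] args) {
        byte[] a = serialize(24);
        byte[] b = serialize(4);
        System.out.println(compareDescending(a, 0, b, 0)); // prints -1: 24 sorts before 4
    }
}
```

Decoding before comparing matters: a plain unsigned byte-by-byte comparison would place negative numbers (sign bit set) after positive ones.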

Source: https://stackoverflow.com/questions/18154686/how-to-implement-sort-in-hadoop, licensed under CC BY-SA and MIT.

복붙노트
