Hadoop: output key/value separator

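I want to change the output separator to ; instead of a tab. I have already tried the answer to "Hadoop: key and value are tab separated in the output file. How to make them semicolon-separated?", but my output is still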

key (tab) value

I am using the Cloudera Demo (CDH 4.1.3). My code is as follows:

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Driver <in> <out>");
            System.exit(2);
        }
        conf.set("mapreduce.textoutputformat.separator", ";");

        Path in = new Path(otherArgs[0]);
        Path out = new Path(otherArgs[1]);

        Job job= new Job(getConf());
        job.setJobName("MapReduce");

        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJarByClass(Driver.class);
        return job.waitForCompletion(true) ? 0 : 1;

What I want is

key;value

as my output.

Solutions

==============================
1. The property is called mapreduce.output.textoutputformat.separator, so you are basically missing the "output" part of the name.

You can verify this in the latest trunk source code in the Apache SVN repository.
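A misspelled property name fails silently because the separator is looked up under one specific key and a tab is used when that key is absent. The sketch below is a plain-Java model of that lookup, not Hadoop code: a Map stands in for Configuration, and SeparatorLookup and formatLine are hypothetical names. It uses the property name from this answer; which name a given Hadoop release actually reads may differ, as the other answers here suggest.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the separator lookup; NOT Hadoop code.
// A Map stands in for Hadoop's Configuration (assumption for illustration).
class SeparatorLookup {
    // Property name given in this answer.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Formats one output line, falling back to a tab when the key is absent.
    static String formatLine(Map<String, String> conf, String key, String value) {
        String separator = conf.getOrDefault(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }

    public static void main(String[] args) {
        Map<String, String> wrong = new HashMap<>();
        wrong.put("mapreduce.textoutputformat.separator", ";"); // name used in the question

        Map<String, String> right = new HashMap<>();
        right.put(SEPARATOR_KEY, ";");

        System.out.println(formatLine(wrong, "key", "value")); // separator is still a tab
        System.out.println(formatLine(right, "key", "value")); // key;value
    }
}
```

No error is raised for the unknown name; the setting is simply never read, which matches the symptom described in the question.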

==============================

2. You should use conf.set("mapred.textoutputformat.separator", ";"); instead of conf.set("mapreduce.textoutputformat.separator", ";");


Full code (this works):

    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: Driver <in> <out>");
        System.exit(2);
    }
    conf.set("mapred.textoutputformat.separator", ";");

    Path in = new Path(otherArgs[0]);
    Path out = new Path(otherArgs[1]);

    // Pass the conf that carries the separator; a Job copies its configuration when created.
    Job job = new Job(conf);
    job.setJobName("MapReduce");

    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);

    job.setJarByClass(Driver.class);
    return job.waitForCompletion(true) ? 0 : 1;

==============================
3. In 2017, use getConf().set(TextOutputFormat.SEPARATOR, ";");

Using the built-in constant improves maintainability and leads to fewer surprises.

Important: this property must be set before calling Job.getInstance(getConf()) / new Job(getConf()), because the Job copies the parameters and ignores any later modification of the conf.
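The ordering rule follows from the copy: once the job has taken its own snapshot of the configuration, changes to the original object no longer reach it. The following is a minimal plain-Java model of that behavior, not Hadoop code. CopyOnCreate and ModelJob are hypothetical stand-ins, and a Map plays the role of Configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of "the job copies its configuration"; NOT Hadoop code.
class CopyOnCreate {
    static final String KEY = "mapreduce.output.textoutputformat.separator";

    // Hypothetical stand-in for a job that snapshots the configuration at creation.
    static final class ModelJob {
        private final Map<String, String> snapshot;

        ModelJob(Map<String, String> conf) {
            this.snapshot = new HashMap<>(conf); // copy taken here, once
        }

        String separator() {
            return snapshot.getOrDefault(KEY, "\t"); // tab when the key is absent
        }
    }

    public static void main(String[] args) {
        Map<String, String> lateConf = new HashMap<>();
        ModelJob tooLate = new ModelJob(lateConf);
        lateConf.put(KEY, ";"); // set after creation: the job never sees it

        Map<String, String> earlyConf = new HashMap<>();
        earlyConf.put(KEY, ";"); // set before creation: copied into the job
        ModelJob inTime = new ModelJob(earlyConf);

        System.out.println(tooLate.separator().equals("\t")); // true
        System.out.println(inTime.separator());               // ;
    }
}
```

In a real driver, setting the value on job.getConfiguration() after the job has been created is another way to reach the job's own copy.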

from https://stackoverflow.com/questions/16614029/hadoop-output-key-value-separator by cc-by-sa and MIT license
