
[HADOOP] MapReduce job taking too long to complete



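We have written a MapReduce job to process log files. Right now we have around 52 GB of input data, but processing it takes about an hour. By default the job creates only one reducer task. We often see a timeout error in the reduce task; the task is then restarted and completes. Below are the statistics for a successful run of the job. Kindly let us know how we can improve the performance.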

    File System Counters
            FILE: Number of bytes read=876100387
            FILE: Number of bytes written=1767603407
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=52222279591
            HDFS: Number of bytes written=707429882
            HDFS: Number of read operations=351
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
    Job Counters 
            Failed reduce tasks=1
            Launched map tasks=116
            Launched reduce tasks=2
            Other local map tasks=116
            Total time spent by all maps in occupied slots (ms)=9118125
            Total time spent by all reduces in occupied slots (ms)=7083783
            Total time spent by all map tasks (ms)=3039375
            Total time spent by all reduce tasks (ms)=2361261
            Total vcore-seconds taken by all map tasks=3039375
            Total vcore-seconds taken by all reduce tasks=2361261
            Total megabyte-seconds taken by all map tasks=25676640000
            Total megabyte-seconds taken by all reduce tasks=20552415744
    Map-Reduce Framework
            Map input records=49452982
            Map output records=5730971
            Map output bytes=864140911
            Map output materialized bytes=876101077
            Input split bytes=13922
            Combine input records=0
            Combine output records=0
            Reduce input groups=1082133
            Reduce shuffle bytes=876101077
            Reduce input records=5730971
            Reduce output records=5730971
            Spilled Records=11461942
            Shuffled Maps =116
            Failed Shuffles=0
            Merged Map outputs=116
            GC time elapsed (ms)=190633
            CPU time spent (ms)=4536110
            Physical memory (bytes) snapshot=340458307584
            Virtual memory (bytes) snapshot=1082745069568
            Total committed heap usage (bytes)=378565820416
    Shuffle Errors
    File Input Format Counters 
            Bytes Read=52222265669
    File Output Format Counters 
            Bytes Written=707429882
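A rough reading of these counters, as a sketch: map tasks average about 26 s each, the two reduce attempts (one of which failed) together account for about 39 minutes, and about 835 MiB of map output is shuffled to the reduce side. With the default of one reducer (`mapreduce.job.reduces`, set per job with `job.setNumReduceTasks(int)`), all 5,730,971 reduce input records go through a single task. Note that the map figure is summed over tasks that run in parallel, so it is not wall-clock time. The snippet only does this arithmetic on the counter values quoted above; `CounterSummary` is a hypothetical helper name.

```java
// Derives a few rough figures from the job counters quoted above.
// Counter values are copied verbatim; everything else is plain arithmetic.
class CounterSummary {
    static final long MAP_TASKS = 116;                 // Launched map tasks
    static final long MAP_TIME_MS = 3_039_375L;        // Total time spent by all map tasks (ms)
    static final long REDUCE_TIME_MS = 2_361_261L;     // Total time spent by all reduce tasks (ms), both attempts
    static final long SHUFFLE_BYTES = 876_101_077L;    // Reduce shuffle bytes
    static final long REDUCE_INPUT_RECORDS = 5_730_971L;

    // Average time per map task, in seconds (summed time / number of tasks).
    static double avgMapSeconds() {
        return MAP_TIME_MS / 1000.0 / MAP_TASKS;
    }

    // Time of all reduce attempts together, in minutes.
    static double totalReduceMinutes() {
        return REDUCE_TIME_MS / 60_000.0;
    }

    // Shuffled map output, in MiB.
    static double shuffleMiB() {
        return SHUFFLE_BYTES / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("avg map task:       %.1f s%n", avgMapSeconds());
        System.out.printf("all reduce tasks:   %.1f min%n", totalReduceMinutes());
        System.out.printf("shuffled to reduce: %.1f MiB, %d records%n", shuffleMiB(), REDUCE_INPUT_RECORDS);
    }
}
```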

If I increase the number of reducers, I get the ClassCastException below. I think the problem is coming from the partitioner class.

java.lang.Exception: java.lang.ClassCastException: com.emaar.bigdata.exchg.logs.CompositeWritable cannot be cast to org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: com.emaar.bigdata.exchg.logs.CompositeWritable cannot be cast to org.apache.hadoop.io.Text
    at com.emaar.bigdata.exchg.logs.ActualKeyPartitioner.getPartition(ActualKeyPartitioner.java:1)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at com.emaar.bigdata.exchg.logs.ExchgLogsMapper.map(ExchgLogsMapper.java:56)
    at com.emaar.bigdata.exchg.logs.ExchgLogsMapper.map(ExchgLogsMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
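The `ActualKeyPartitioner.java:1` frame (like `ExchgLogsMapper.java:1` below it) looks like a compiler-generated bridge method. The partitioner declares its value type as `Text`, while the mapper's output value type is `Writable` and it actually emits `CompositeWritable`. The framework calls `getPartition` through the erased signature, and the bridge casts each value to `Text` before calling the real method. In Hadoop 2.x, `MapTask` only instantiates the configured partitioner when there is more than one reduce task, which would explain why the exception appears only after raising the reducer count. The sketch below reproduces the cast in plain Java; `SimplePartitioner`, `MismatchedPartitioner` and `MatchingPartitioner` are hypothetical stand-ins, not Hadoop classes. By analogy, declaring the partitioner as `Partitioner<CompositeKey, Writable>` should avoid the cast.

```java
// Plain-Java stand-ins (hypothetical, not Hadoop classes) that reproduce the
// generics behaviour behind the ClassCastException.
abstract class SimplePartitioner<K, V> {
    abstract int getPartition(K key, V value, int numPartitions);
}

// Declared value type String, but the caller passes something else: this
// mirrors Partitioner<CompositeKey, Text> receiving CompositeWritable values.
class MismatchedPartitioner extends SimplePartitioner<String, String> {
    @Override
    int getPartition(String key, String value, int numPartitions) {
        return 0;
    }
}

// Value type widened to Object, mirroring Partitioner<CompositeKey, Writable>.
class MatchingPartitioner extends SimplePartitioner<String, Object> {
    @Override
    int getPartition(String key, Object value, int numPartitions) {
        return 0;
    }
}

class BridgeMethodDemo {
    // Call through the erased (raw) type, as generic framework code effectively
    // does at runtime; javac's bridge method then casts the arguments to the
    // subclass's declared parameter types.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean throwsClassCast(SimplePartitioner p, Object key, Object value) {
        try {
            p.getPartition(key, value, 2);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object value = Integer.valueOf(42); // stands in for a CompositeWritable
        System.out.println(throwsClassCast(new MismatchedPartitioner(), "subject", value)); // true
        System.out.println(throwsClassCast(new MatchingPartitioner(), "subject", value));   // false
    }
}
```

The value type only has to be a supertype of what the mapper emits; the partitioner itself never needs to look at the value.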

Partitioner code

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class ActualKeyPartitioner extends Partitioner<CompositeKey, Text> {

    HashPartitioner<Text, Text> hashPartitioner = new HashPartitioner<Text, Text>();
    Text newKey = new Text();

    @Override
    public int getPartition(CompositeKey key, Text value, int numReduceTasks) {

        try {
            // Execute the default partitioner over the first part of the key
            return hashPartitioner.getPartition(newKey, value, numReduceTasks);
        } catch (Exception e) {
            // Fall back to a random value in the range [0, numReduceTasks)
            return (int) (Math.random() * numReduceTasks);
        }
    }
}
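In the partitioner as shown, `newKey` is never filled from `key`, so every record would hash the same empty `Text` and land in a single partition. The `Math.random()` fallback, if it ever triggered, would send records with the same key to different reducers and break grouping. A minimal sketch of the intended idea: hash only the first part of the composite key, using the same `(hashCode() & Integer.MAX_VALUE) % numReduceTasks` arithmetic as Hadoop's `HashPartitioner`. `CompositeKeyStub` and `FirstPartPartitioner` are hypothetical stand-ins, and `String.hashCode()` replaces `Text.hashCode()`, so the concrete partition numbers differ from what Hadoop would compute.

```java
// Hypothetical stand-in for the user's CompositeKey: only the two constructor
// arguments seen in the mapper (subject, values.get(0)) are modelled.
class CompositeKeyStub {
    final String subject;
    final String secondPart;

    CompositeKeyStub(String subject, String secondPart) {
        this.subject = subject;
        this.secondPart = secondPart;
    }
}

class FirstPartPartitioner {
    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit, then
    // take the remainder, so the result is always in [0, numReduceTasks).
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hash only the subject, so all keys sharing a subject reach the same
    // reducer whatever their second part is. Deterministic: no random fallback.
    static int getPartition(CompositeKeyStub key, int numReduceTasks) {
        return hashPartition(key.subject, numReduceTasks);
    }

    public static void main(String[] args) {
        CompositeKeyStub a = new CompositeKeyStub("Quarterly report", "id-1");
        CompositeKeyStub b = new CompositeKeyStub("Quarterly report", "id-2");
        System.out.println(getPartition(a, 8) == getPartition(b, 8)); // true
        // A key that is never filled in always maps to the same partition.
        System.out.println(hashPartition("", 8)); // 0
    }
}
```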

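Mapper code

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class ExchgLogsMapper extends Mapper<LongWritable, List<Text>, CompositeKey, Writable> {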
    String recepientAddresses = "";
    public static final String DELIVER = "DELIVER";
    public static final String RESOLVED = "Resolved";
    public static final String JUNK = "Junk E-mail";
    public static final String SEMICOLON = ";";
    public static final String FW1 = "FW: ";
    public static final String FW2 = "Fw: ";
    public static final String FW3 = "FWD: ";
    public static final String FW4 = "Fwd: ";
    public static final String FW5 = "fwd: ";
    public static final String RE1 = "RE: ";
    public static final String RE2 = "Re: ";
    public static final String RE3 = "re: ";

    Text mailType = new Text("NEW");
    Text fwType = new Text("FW");
    Text reType = new Text("RE");
    Text recepientAddr = new Text();

    public void map(LongWritable key, List<Text> values, Context context) throws IOException, InterruptedException {
        String subj = null;
        int lstSize = values.size();
        if (lstSize >= 26) {
            if (values.get(8).toString().equals(DELIVER)) {
                if (!(ExclusionList.exclusions.contains(values.get(18).toString()))) {
                    if (!(JUNK.equals((values.get(12).toString())))) {
                        subj = values.get(17).toString();
                        recepientAddresses = values.get(11).toString();
                        String[] recepientAddressArr = recepientAddresses.split(SEMICOLON);
                        if (subj.startsWith(FW1) || subj.startsWith(FW2) || subj.startsWith(FW3)
                                || subj.startsWith(FW4) || subj.startsWith(FW5)) {
                            mailType = fwType;
                            subj = subj.substring(4);
                        } else if (subj.startsWith(RE1) || subj.startsWith(RE2) || subj.startsWith(RE3)) {
                            mailType = reType;
                            subj = subj.substring(4);
                        }
                        for (int i = 0; i < recepientAddressArr.length; i++) {
                            CompositeKey ckey = new CompositeKey(subj, values.get(0).toString());
                            CompositeWritable out = new CompositeWritable(mailType, recepientAddr, values.get(18),
                                    // (the remaining constructor arguments are cut off in the original post)
                            context.write(ckey, out);
//                          System.err.println(out);
                        }
                    }
                }
            }
        }
    }
}
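Two details in the mapper as shown may also skew the keys. `subj.substring(4)` removes four characters, but `FWD: `, `Fwd: ` and `fwd: ` are five characters long, so those subjects keep a leading space and become different keys from the same subject sent with `FW: `. And the `mailType` field is switched to `fwType` or `reType` but, in the code shown, never set back, so a later record without a prefix would keep the previous type. A minimal sketch of the classification step, assuming only the prefixes listed in the mapper; `SubjectClassifier` is a hypothetical helper, not part of the original job.

```java
// Hypothetical helper: classifies a subject line as NEW / FW / RE and strips
// the prefix using the prefix's own length instead of a fixed substring(4).
class SubjectClassifier {
    static final String[] FW_PREFIXES = {"FW: ", "Fw: ", "FWD: ", "Fwd: ", "fwd: "};
    static final String[] RE_PREFIXES = {"RE: ", "Re: ", "re: "};

    // Returns {type, subjectWithoutPrefix}. The type is decided fresh on every
    // call, so one forwarded mail cannot leak its type into the next record.
    static String[] classify(String subj) {
        for (String p : FW_PREFIXES) {
            if (subj.startsWith(p)) {
                return new String[] {"FW", subj.substring(p.length())};
            }
        }
        for (String p : RE_PREFIXES) {
            if (subj.startsWith(p)) {
                return new String[] {"RE", subj.substring(p.length())};
            }
        }
        return new String[] {"NEW", subj};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"FWD: Budget", "Re: Budget", "Budget"}) {
            String[] r = classify(s);
            System.out.println(r[0] + " | " + r[1]);
        }
    }
}
```

Returning the type per call, rather than mutating a shared field, keeps each record independent of the one before it.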



  1. Answer

    There were a few System.out.println calls inside a loop in the reducer code, and they wrote a huge amount of log output. After removing them, the reducer finishes in a few minutes!
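The counters show 5,730,971 reduce input records, so a print statement executed once per value inside the reduce loop would write millions of lines to the task logs. A minimal sketch contrasting per-value logging with counting inside the loop and logging once; `ReduceLoggingSketch` and its `LogSink` interface are hypothetical stand-ins with no Hadoop dependency. In a real reducer, a Hadoop counter (`context.getCounter(group, name).increment(1)`) is a common alternative for such tallies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical reduce-style loops; LogSink stands in for System.out / System.err.
class ReduceLoggingSketch {
    interface LogSink {
        void log(String line);
    }

    // One log line per value: with millions of values this means millions of lines.
    static long reducePerRecordLogging(List<String> values, LogSink sink) {
        long count = 0;
        for (String v : values) {
            sink.log("value: " + v);
            count++;
        }
        return count;
    }

    // Count inside the loop, then emit a single summary line afterwards.
    static long reduceSummaryLogging(List<String> values, LogSink sink) {
        long count = 0;
        for (String v : values) {
            // ... process v here without logging it ...
            count++;
        }
        sink.log("processed " + count + " values");
        return count;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c");
        long[] lines = {0};
        LogSink counting = line -> lines[0]++;   // counts lines instead of printing them
        reducePerRecordLogging(values, counting);
        System.out.println("per-record lines: " + lines[0]); // 3
        lines[0] = 0;
        reduceSummaryLogging(values, counting);
        System.out.println("summary lines: " + lines[0]);    // 1
    }
}
```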

  2. Source: https://stackoverflow.com/questions/37773114/mapreduce-job-taking-too-long-to-complete, licensed under CC BY-SA and MIT.