[HADOOP] Accessing a mapper's counter from a reducer in Hadoop MapReduce
I need to access a counter from my mapper in my reducer. I tried to implement this solution. My WordCount code is below.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    // Custom counter, incremented once per token emitted by the mapper.
    static enum TestCounters { TEST }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
                context.getCounter(TestCounters.TEST).increment(1);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Value of the mapper-side counter, read once in setup().
        private long mapperCounter;

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Look up the running job through the cluster and read the mapper's counter.
            Configuration conf = context.getConfiguration();
            Cluster cluster = new Cluster(conf);
            Job currentJob = cluster.getJob(context.getJobID());
            mapperCounter = currentJob.getCounters().findCounter(TestCounters.TEST).getValue();
            System.out.println(mapperCounter);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
I run this code in IntelliJ with the following dependencies.
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
However, I get a NoSuchFieldError: SEPERATOR that I could not resolve. The error occurs when the line Cluster cluster = new Cluster(conf); is executed.
15/10/01 19:55:29 WARN mapred.LocalJobRunner: job_local482979212_0001
java.lang.NoSuchFieldError: SEPERATOR
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapreduce.Cluster.<clinit>(Cluster.java:71)
at WordCount$Reduce.setup(WordCount.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
15/10/01 19:55:30 INFO mapred.JobClient: map 100% reduce 0%
15/10/01 19:55:30 INFO mapred.JobClient: Job complete: job_local482979212_0001
15/10/01 19:55:30 INFO mapred.JobClient: Counters: 20
15/10/01 19:55:30 INFO mapred.JobClient: Map-Reduce Framework
15/10/01 19:55:30 INFO mapred.JobClient: Spilled Records=16
15/10/01 19:55:30 INFO mapred.JobClient: Map output materialized bytes=410
15/10/01 19:55:30 INFO mapred.JobClient: Reduce input records=0
15/10/01 19:55:30 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
15/10/01 19:55:30 INFO mapred.JobClient: Map input records=8
15/10/01 19:55:30 INFO mapred.JobClient: SPLIT_RAW_BYTES=103
15/10/01 19:55:30 INFO mapred.JobClient: Map output bytes=372
15/10/01 19:55:30 INFO mapred.JobClient: Reduce shuffle bytes=0
15/10/01 19:55:30 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
15/10/01 19:55:30 INFO mapred.JobClient: Reduce input groups=0
15/10/01 19:55:30 INFO mapred.JobClient: Combine output records=0
15/10/01 19:55:30 INFO mapred.JobClient: Reduce output records=0
15/10/01 19:55:30 INFO mapred.JobClient: Map output records=16
15/10/01 19:55:30 INFO mapred.JobClient: Combine input records=0
15/10/01 19:55:30 INFO mapred.JobClient: CPU time spent (ms)=0
15/10/01 19:55:30 INFO mapred.JobClient: Total committed heap usage (bytes)=160432128
15/10/01 19:55:30 INFO mapred.JobClient: WordCount$TestCounters
15/10/01 19:55:30 INFO mapred.JobClient: TEST=16
15/10/01 19:55:30 INFO mapred.JobClient: File Input Format Counters
15/10/01 19:55:30 INFO mapred.JobClient: Bytes Read=313
15/10/01 19:55:30 INFO mapred.JobClient: FileSystemCounters
15/10/01 19:55:30 INFO mapred.JobClient: FILE_BYTES_WRITTEN=51594
15/10/01 19:55:30 INFO mapred.JobClient: FILE_BYTES_READ=472
After that, I built a jar file and ran it on a single-node Hadoop 2.6.0 installation. There I got the following error.
15/10/01 20:58:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/01 20:58:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/01 20:58:17 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/10/01 20:58:31 INFO input.FileInputFormat: Total input paths to process : 1
15/10/01 20:58:33 INFO mapreduce.JobSubmitter: number of splits:1
15/10/01 20:58:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443718874432_0002
15/10/01 20:58:36 INFO impl.YarnClientImpl: Submitted application application_1443718874432_0002
15/10/01 20:58:36 INFO mapreduce.Job: The url to track the job: http://tolga-Aspire-5741G:8088/proxy/application_1443718874432_0002/
15/10/01 20:58:36 INFO mapreduce.Job: Running job: job_1443718874432_0002
15/10/01 20:59:22 INFO mapreduce.Job: Job job_1443718874432_0002 running in uber mode : false
15/10/01 20:59:22 INFO mapreduce.Job: map 0% reduce 0%
15/10/01 21:00:17 INFO mapreduce.Job: map 100% reduce 0%
15/10/01 21:00:20 INFO mapreduce.Job: map 0% reduce 0%
15/10/01 21:00:20 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
15/10/01 21:00:47 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
15/10/01 21:00:55 INFO mapreduce.Job: Task Id : attempt_1443718874432_0002_m_000000_2, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
15/10/01 21:01:03 INFO mapreduce.Job: map 100% reduce 100%
15/10/01 21:01:07 INFO mapreduce.Job: Job job_1443718874432_0002 failed with state FAILED due to: Task failed task_1443718874432_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/10/01 21:01:08 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=3
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=78203
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=78203
Total vcore-seconds taken by all map tasks=78203
Total megabyte-seconds taken by all map tasks=80079872
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
How can I solve this problem?
Thank you for your attention.
Note: the input file is used both in IntelliJ and on the single-node Hadoop cluster.
Solution
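Both errors in the question look like a Hadoop 1.x / 2.x class mismatch rather than a flaw in the counter logic. In Hadoop 1.x, org.apache.hadoop.mapreduce.Counter is a concrete class, while in 2.x it is an interface. "Found interface ... but class was expected" is the message the JVM produces when bytecode compiled against one form runs against the other. With hadoop-core 1.2.1 and hadoop-client 2.7.1 both declared, it is plausible (an assumption, not verified against this project) that the jar was compiled against the 1.x classes and then run on the 2.6.0 cluster. A minimal sketch for checking which variant the classpath resolves is shown below; ClasspathProbe and describe are hypothetical names introduced only for this illustration, and the code uses only standard JDK reflection calls.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    // Reports whether a type resolves as a class or an interface on the
    // current classpath, and which jar or directory supplied it.
    public static String describe(String className) {
        try {
            // initialize=false: load the type without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            String kind = c.isInterface() ? "interface" : "class";
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Types loaded by the bootstrap loader have no code source
            String origin = (src == null || src.getLocation() == null)
                    ? "JDK runtime"
                    : src.getLocation().toString();
            return className + ": " + kind + ", loaded from " + origin;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " is not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Counter is the type named in the cluster error message;
        // Cluster is the type whose static initializer failed in IntelliJ.
        System.out.println(describe("org.apache.hadoop.mapreduce.Counter"));
        System.out.println(describe("org.apache.hadoop.mapreduce.Cluster"));
    }
}
```

Run it with the same classpath as the job. If Counter is reported as a class, or the origin is a hadoop-core-1.2.1 jar, a 1.x artifact is still winning on the classpath. In that case the first thing to try would be removing the hadoop-core 1.2.1 dependency and keeping a single hadoop-client whose version matches the cluster (2.6.0 here), then rebuilding the jar.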
from https://stackoverflow.com/questions/32894366/accessing-a-mappers-counter-from-a-reducer-in-hadoop-mapreduce by cc-by-sa and MIT license