[HADOOP] Nutch SOLR dataimport handler?
I have set up the Nutch crawler on top of Hadoop. The software stack, with versions, is: apache-nutch-2.3.1 and hbase-0.98.8-hadoop2, both running on top of hadoop-2.5.2. Everything works fine up to the point where the crawled data is inserted into HBase. The problem comes when I invoke the IndexingJob class (org.apache.nutch.indexer.IndexingJob): the command runs successfully, but no records are actually indexed into SOLR. The SOLR version is solr-5.3.1.
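The post does not show the exact command line used. In Nutch 2.3.x the IndexingJob class is normally driven through the bin/nutch wrapper, roughly as sketched below; the deploy path, Solr URL, and core name are assumptions for illustration, not taken from the post.

```shell
# Sketch only: paths and the Solr URL are assumptions, adjust to your deployment.
# In Nutch 2.3.x, "bin/nutch index" runs org.apache.nutch.indexer.IndexingJob
# with whatever index writers are configured (here the Solr writer).
cd $NUTCH_HOME/runtime/deploy

# Index every parsed row regardless of its batch mark:
bin/nutch index -D solr.server.url=http://localhost:8983/solr/nutch -all

# Or index only the rows produced by one fetch/parse cycle:
bin/nutch index -D solr.server.url=http://localhost:8983/solr/nutch <batchId>
```

Note that if a batchId is passed that matches no rows (for example, after the marks from that cycle are gone), the job still exits successfully but indexes nothing, which is consistent with the counters shown below.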
Below is the output of the command I ran:
15/12/15 18:26:32 INFO mapreduce.Job: Running job: job_1450175405767_0007
15/12/15 18:26:43 INFO mapreduce.Job: Job job_1450175405767_0007 running in uber mode : false
15/12/15 18:26:43 INFO mapreduce.Job: map 0% reduce 0%
15/12/15 18:28:00 INFO mapreduce.Job: map 50% reduce 0%
15/12/15 18:28:22 INFO mapreduce.Job: map 100% reduce 0%
15/12/15 18:28:22 INFO mapreduce.Job: Job job_1450175405767_0007 completed successfully
15/12/15 18:28:23 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=230132
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1324
HDFS: Number of bytes written=0
HDFS: Number of read operations=2
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Killed map tasks=1
Launched map tasks=3
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=192484
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=192484
Total vcore-seconds taken by all map tasks=192484
Total megabyte-seconds taken by all map tasks=197103616
Map-Reduce Framework
Map input records=3312819
Map output records=0
Input split bytes=1324
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1678
CPU time spent (ms)=62560
Physical memory (bytes) snapshot=406765568
Virtual memory (bytes) snapshot=3877060608
Total committed heap usage (bytes)=239075328
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/12/15 18:28:23 INFO indexer.IndexWriters: Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
15/12/15 18:28:23 INFO indexer.IndexingJob: Active IndexWriters :
SOLRIndexWriter
solr.server.url : URL of the SOLR instance (mandatory)
solr.commit.size : buffer size when sending to SOLR (default 1000)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication
15/12/15 18:28:23 INFO conf.Configuration: found resource solrindex-mapping.xml at file:/tmp/hadoop-root/hadoop-unjar491190780945254030/solrindex-mapping.xml
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: content dest: content
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: title dest: title
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: host dest: host
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: batchId dest: batchId
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: boost dest: boost
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: digest dest: digest
15/12/15 18:28:23 INFO solr.SolrMappingReader: source: tstamp dest: tstamp
15/12/15 18:28:23 INFO solr.SolrIndexWriter: Total 0 document is added.
15/12/15 18:28:23 INFO indexer.IndexingJob: IndexingJob: done.
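The counters above already contain the telltale symptom: 3,312,819 rows were read from HBase, but zero map output records were emitted, and SolrIndexWriter reports "Total 0 document is added". A tiny script (mine, not from the post) makes that check explicit; in Nutch 2.x the IndexingJob mapper skips any row whose batch mark does not match the requested batchId, so "input > 0, output == 0" usually means the filter matched nothing.

```python
# Quick sanity check over pasted Hadoop job counters: flag the
# "read everything, emitted nothing" pattern seen in this indexing run.
import re

def diagnose(counter_text: str) -> str:
    """Return a short verdict based on the map input/output counters."""
    def grab(name: str):
        m = re.search(rf"{re.escape(name)}=(\d+)", counter_text)
        return int(m.group(1)) if m else None

    read = grab("Map input records")
    written = grab("Map output records")
    if read and written == 0:
        return (f"{read} rows scanned but 0 emitted: "
                "nothing matched the index filter/batchId")
    return "counters look normal"

log = """
Map input records=3312819
Map output records=0
"""
print(diagnose(log))
```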
Solution
from https://stackoverflow.com/questions/34290214/nutch-solr-dataimport-handler by cc-by-sa and MIT license