Hadoop: copying from HDFS to S3

I've successfully completed a Mahout vectorizing job on Amazon EMR. Now I want to copy the results from HDFS to S3 (to use them in future clustering).

For that I've used hadoop distcp:

den@aws:~$ elastic-mapreduce --jar s3://elasticmapreduce/samples/distcp/distcp.jar \
> --arg hdfs://my.bucket/prj1/seqfiles \
> --arg s3n://ACCESS_KEY:SECRET_KEY@my.bucket/prj1/seqfiles \
> -j $JOBID

That failed. I then found a suggestion to use s3distcp and tried that as well:

elastic-mapreduce --jobflow $JOBID \
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
> --arg --src --arg 'hdfs://my.bucket/prj1/seqfiles' \
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'

In both cases I get the same error: java.net.UnknownHostException: unknown host: my.bucket. The full error output for the second case is below.

2012-09-06 13:25:08,209 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
java.net.UnknownHostException: unknown host: my.bucket
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1193)
    at org.apache.hadoop.ipc.Client.call(Client.java:1047)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:127)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:249)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:214)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:431)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

Solution

==============================

1. I found the bug.

The line that matters is not the java.net.UnknownHostException itself, but this one:

2012-09-06 13:27:33,909 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system

So, after adding one more slash to the source path, the job started without problems. The correct command is:

elastic-mapreduce --jobflow $JOBID \
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
> --arg --src --arg 'hdfs:///my.bucket/prj1/seqfiles' \
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'

P.S. So, it works. The job finished correctly, and I successfully copied a directory containing a 30 GB file.
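A likely reason the extra slash matters (this is my reading of the stack trace, not something the answer states): in hdfs://my.bucket/prj1/seqfiles the part right after the two slashes is the URI authority, so the HDFS client treats my.bucket as the NameNode host and fails to resolve it, which matches the DFSClient.createRPCNamenode frames above. With three slashes the authority is empty, the cluster's default file system is used, and my.bucket becomes an ordinary directory name. Hadoop itself parses paths with java.net.URI; the sketch below uses Python's urllib.parse only as a stand-in for the same generic URI rules, and split_hdfs_uri is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def split_hdfs_uri(uri):
    """Return (authority, path) as a generic URI parser sees them.

    Hypothetical helper for illustration; it approximates how the
    host part and the path part of an hdfs:// URI are separated.
    """
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# Two slashes: "my.bucket" lands in the authority slot (the NameNode host).
broken = split_hdfs_uri("hdfs://my.bucket/prj1/seqfiles")

# Three slashes: the authority is empty and "my.bucket" is just a directory.
fixed = split_hdfs_uri("hdfs:///my.bucket/prj1/seqfiles")

print("broken:", broken)
print("fixed: ", fixed)
```

If the first element of the tuple is non-empty and is not the actual NameNode address, the source path is probably missing a slash.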

from https://stackoverflow.com/questions/12301613/hadoop-copying-from-hdfs-to-s3 by cc-by-sa and MIT license
