
[HADOOP] Hadoop NodeManager exits without any log


I found that my Hadoop NodeManager process crashes for no apparent reason and without writing any log.

1. Server information

Red Hat Enterprise Linux Server release 7.2 (Maipo)

Hadoop 2.8.0

NameNode: 32 cores, Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 64G RAM; 2 nodes for NameNode and ResourceManager, 1 for Hive

DataNode: 24 cores, Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz, 128G RAM; 5 DataNodes in total

NodeManager conf:

2. When the crash happens

When I run SQL like the following in Hive, some DataNodes may crash. I don't know the reason, but the crashes seem to happen at the same time.

select max(ts) from beacon where year = 2017 and month = 6 and day >= 21

delete from beacon where ts = 1498629599829

insert into table beacon partition(year,month,day) select type,page,name,ts,year,month,day from beacon_txt where ts >= 1498629599829

3. What symptoms appear when it crashes

1) The NodeManager process disappears

2) The YARN log shows no exception; below is the snippet at the end of the log on the data node

2017-06-28 17:11:20,189 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://avatarcluster/user/hadoop/.hiveJars/hive-exec-2.1.1-5f4a7e952d29bb8013edd30bbc39476ec56bc381b96b0530a6b2fbbf28e309d3.jar(->/home/hadoop/hadoop-data/hadoop-tmp-data/nm-local-dir/usercache/hadoop/filecache/10/hive-exec-2.1.1-5f4a7e952d29bb8013edd30bbc39476ec56bc381b96b0530a6b2fbbf28e309d3.jar) transitioned from DOWNLOADING to LOCALIZED

2017-06-28 17:11:20,701 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://avatarcluster/apps/tez-0.8.5.tar.gz(->/home/hadoop/hadoop-data/hadoop-tmp-data/nm-local-dir/filecache/10/tez-0.8.5.tar.gz) transitioned from DOWNLOADING to LOCALIZED

2017-06-28 17:11:20,703 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1496722904961_18405_01_000006 transitioned from LOCALIZING to LOCALIZED

2017-06-28 17:11:20,703 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1496722904961_18405_01_000011 transitioned from LOCALIZING to LOCALIZED

2017-06-28 17:11:20,740 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1496722904961_18405_01_000006 transitioned from LOCALIZED to RUNNING

2017-06-28 17:11:20,740 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1496722904961_18405_01_000011 transitioned from LOCALIZED to RUNNING

2017-06-28 17:11:20,744 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/hadoop/hadoop-data/hadoop-tmp-data/nm-local-dir/usercache/hadoop/appcache/application_1496722904961_18405/container_1496722904961_18405_01_000011/default_container_executor.sh]

2017-06-28 17:11:20,744 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/hadoop/hadoop-data/hadoop-tmp-data/nm-local-dir/usercache/hadoop/appcache/application_1496722904961_18405/container_1496722904961_18405_01_000006/default_container_executor.sh]

Some logs show up in /var/log/messages

Jun 28 17:11:11 hdp06 systemd: Removed slice user-0.slice.
Jun 28 17:11:11 hdp06 systemd: Stopping user-0.slice.
Jun 28 17:11:33 hdp06 abrt-server: Executable '/opt/jdk1.8.0_71/bin/java' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jun 28 17:11:33 hdp06 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2017-06-28-17:11:20-375612' exited with 1
Jun 28 17:11:33 hdp06 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2017-06-28-17:11:20-375612'
Jun 28 17:11:43 hdp06 systemd-logind: Removed session 206241.
Jun 28 17:12:01 hdp06 systemd: Created slice user-0.slice.
Jun 28 17:12:01 hdp06 systemd: Starting user-0.slice.

It looks like Java crashed or exited abnormally, but there is no log.
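
A minimal sketch, in Python, for pulling the abrt-server entries out of a syslog file such as /var/log/messages and extracting the problem directory each entry mentions. The function name abrt_events and the regular expression are illustrative assumptions, not part of ABRT; the line format is inferred only from the excerpt above.

```python
import re

# Matches the quoted problem directory that abrt-server prints, for example
# '/var/spool/abrt/ccpp-2017-06-28-17:11:20-375612' (format assumed from the excerpt).
PROBLEM_DIR = re.compile(r"'(/var/spool/abrt/[^']+)'")


def abrt_events(lines):
    """Yield (line, problem_dir or None) for every abrt-server syslog line."""
    for line in lines:
        if " abrt-server: " not in line:
            continue  # skip systemd, systemd-logind and other services
        match = PROBLEM_DIR.search(line)
        yield line.rstrip("\n"), (match.group(1) if match else None)
```

Passing an open file handle for /var/log/messages (reading it usually requires root) would list only the ABRT entries. In the excerpt, the "ProcessUnpackaged is set to 'no'" message is followed by the deletion of the problem directory, which suggests ABRT discarded that first dump because the JDK binary was not installed from a package.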

4. Core dump

I added the abrt-java-connector option to yarn-env.sh

It then creates some crash log files in /var/spool/abrt/ccpp-2017-07-03-14:24:28-63796

-rw-r----- 1 root abrt          6 Jul  3 14:24 abrt_version
-rw-r----- 1 root abrt          4 Jul  3 14:24 analyzer
-rw-r----- 1 root abrt          6 Jul  3 14:24 architecture
-rw-r----- 1 root abrt        178 Jul  3 14:24 cgroup
-rw-r----- 1 root abrt       1974 Jul  3 14:24 cmdline
-rw-r----- 1 root abrt     380795 Jul  3 14:25 core_backtrace
-rw-r----- 1 root abrt 4887654400 Jul  3 14:24 coredump
-rw-r----- 1 root abrt          1 Jul  3 14:25 count
-rw-r----- 1 root abrt       1072 Jul  3 14:25 dso_list
-rw-r----- 1 root abrt       3318 Jul  3 14:24 environ
-rw-r----- 1 root abrt          0 Jul  3 14:25 event_log
-rw-r----- 1 root abrt         26 Jul  3 14:24 executable
-rw-r----- 1 root abrt         82 Jul  3 14:25 exploitable
-rw-r----- 1 root abrt          5 Jul  3 14:24 global_pid
-rw-r----- 1 root abrt         25 Jul  3 14:24 hostname
-rw-r----- 1 root abrt         21 Jul  3 14:24 kernel
-rw-r----- 1 root abrt         10 Jul  3 14:24 last_occurrence
-rw-r----- 1 root abrt       1323 Jul  3 14:24 limits
-rw-r----- 1 root abrt        135 Jul  3 14:25 machineid
-rw-r----- 1 root abrt      60706 Jul  3 14:24 maps
-rw-r----- 1 root abrt        243 Jul  3 14:24 open_fds
-rw-r----- 1 root abrt        495 Jul  3 14:24 os_info
-rw-r----- 1 root abrt         51 Jul  3 14:24 os_release
-rw-r----- 1 root abrt          5 Jul  3 14:24 pid
-rw-r----- 1 root abrt       1137 Jul  3 14:24 proc_pid_status
-rw-r----- 1 root abrt        149 Jul  3 14:24 pwd
-rw-r----- 1 root abrt         22 Jul  3 14:24 reason
-rw-r----- 1 root abrt          4 Jul  3 14:24 runlevel
-rw-r----- 1 root abrt   10746600 Jul  3 14:25 sosreport.tar.xz
-rw-r----- 1 root abrt         10 Jul  3 14:24 time
-rw-r----- 1 root abrt          4 Jul  3 14:24 type
-rw-r----- 1 root abrt          9 Jul  3 14:24 uid
-rw-r----- 1 root abrt          7 Jul  3 14:24 username
-rw-r----- 1 root abrt         40 Jul  3 14:25 uuid
-rw-r----- 1 root abrt     185370 Jul  3 14:25 var_log_messages

The "reason" file shows "java killed by SIGSEGV"
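
A small Python sketch for reading the short text files in an ABRT problem directory in one pass, instead of opening them one by one. The file names come from the listing above; the function name and the choice of which files count as a summary are assumptions. The files are owned by root:abrt, so this would need matching permissions.

```python
from pathlib import Path

# Small text files from the listing that are worth reading first (an assumed selection).
SUMMARY_FILES = ("reason", "executable", "cmdline", "pid", "time", "type")


def summarize_problem_dir(problem_dir):
    """Return {file name: stripped content} for the summary files that exist."""
    root = Path(problem_dir)
    summary = {}
    for name in SUMMARY_FILES:
        path = root / name
        if path.is_file():
            summary[name] = path.read_text(errors="replace").strip()
    return summary
```

Calling summarize_problem_dir on the directory under /var/spool/abrt returns a dictionary whose "reason" entry should hold the same text quoted above.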

When I run "gdb /opt/jdk/bin/java coredump", it shows

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/jdk1.8.0_131/bin/java...Missing separate debuginfo for /opt/jdk1.8.0_131/bin/java
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/c9/0f19ee0af98c47ccaa7181853cfd14867bc931.debug
(no debugging symbols found)...done.
[New LWP 63796]
[New LWP 62554]
[New LWP 62605]
[New LWP 62604]
[New LWP 62606]
[New LWP 62575]
[New LWP 62603]
[New LWP 62607]
[New LWP 62576]
[New LWP 62610]
[New LWP 62668]
[New LWP 62708]
[New LWP 62574]
[New LWP 63790]
[New LWP 63717]
[New LWP 63739]
[New LWP 62580]
[New LWP 62579]
[New LWP 63738]
[New LWP 62592]
[New LWP 62577]
[New LWP 62583]
[New LWP 63740]
[New LWP 62581]
[New LWP 62688]
[New LWP 62587]
[New LWP 62597]
[New LWP 63744]
[New LWP 63737]
[New LWP 62591]
[New LWP 62578]
[New LWP 62582]
[New LWP 62758]
[New LWP 62573]
[New LWP 62626]
[New LWP 63720]
[New LWP 63719]
[New LWP 63726]
[New LWP 63727]
[New LWP 62598]
[New LWP 62774]
[New LWP 62705]
[New LWP 62614]
[New LWP 62703]
[New LWP 62593]
[New LWP 62720]
[New LWP 62590]
[New LWP 62690]
[New LWP 63731]
[New LWP 63810]
[New LWP 63724]
[New LWP 62585]
[New LWP 62753]
[New LWP 62682]
[New LWP 62709]
[New LWP 62684]
[New LWP 62773]
[New LWP 62588]
[New LWP 63722]
[New LWP 62595]
[New LWP 62734]
[New LWP 62616]
[New LWP 62728]
[New LWP 62721]
[New LWP 62689]
[New LWP 62769]
[New LWP 62659]
[New LWP 63743]
[New LWP 62726]
[New LWP 62680]
[New LWP 62704]
[New LWP 62750]
[New LWP 63759]
[New LWP 62594]
[New LWP 63791]
[New LWP 62768]
[New LWP 62600]
[New LWP 63741]
[New LWP 62613]
[New LWP 63718]
[New LWP 62710]
[New LWP 62589]
[New LWP 62731]
[New LWP 63735]
[New LWP 62683]
[New LWP 62760]
[New LWP 63801]
[New LWP 62776]
[New LWP 62678]
[New LWP 62615]
[New LWP 62685]
[New LWP 62737]
[New LWP 62599]
[New LWP 63742]
[New LWP 63808]
[New LWP 62755]
[New LWP 62707]
[New LWP 62694]
[New LWP 63729]
[New LWP 63755]
[New LWP 62711]
[New LWP 63725]
[New LWP 63732]
[New LWP 62745]
[New LWP 62596]
[New LWP 62608]
[New LWP 62735]
[New LWP 63721]
[New LWP 62748]
[New LWP 62736]
[New LWP 62712]
[New LWP 63756]
[New LWP 63793]
[New LWP 63787]
[New LWP 63803]
[New LWP 62602]
[New LWP 62743]
[New LWP 62733]
[New LWP 62742]
[New LWP 63710]
[New LWP 62744]
[New LWP 62677]
[New LWP 62739]
[New LWP 62713]
[New LWP 63789]
[New LWP 62601]
[New LWP 63812]
[New LWP 62725]
[New LWP 62724]
[New LWP 63709]
[New LWP 62718]
[New LWP 62759]
[New LWP 62686]
[New LWP 62715]
[New LWP 62740]
[New LWP 62655]
[New LWP 62749]
[New LWP 62722]
[New LWP 63708]
[New LWP 62716]
[New LWP 63800]
[New LWP 62687]
[New LWP 62723]
[New LWP 63733]
[New LWP 62609]
[New LWP 62738]
[New LWP 63707]
[New LWP 62719]
[New LWP 62714]
[New LWP 62691]
[New LWP 62780]
[New LWP 62625]
[New LWP 62778]
[New LWP 63788]
[New LWP 62717]
[New LWP 63802]
[New LWP 62681]
[New LWP 62692]
[New LWP 62730]
[New LWP 63736]
[New LWP 62679]
[New LWP 62693]
[New LWP 63728]
[New LWP 62697]
[New LWP 62729]
[New LWP 62746]
[New LWP 62698]
[New LWP 62747]
[New LWP 63734]
[New LWP 62727]
[New LWP 62695]
[New LWP 62675]
[New LWP 62676]
[New LWP 63711]
[New LWP 63713]
[New LWP 62699]
[New LWP 62752]
[New LWP 62700]
[New LWP 63723]
[New LWP 62706]
[New LWP 62756]
[New LWP 63706]
[New LWP 62702]
[New LWP 63751]
[New LWP 62658]
[New LWP 62779]
[New LWP 62754]
[New LWP 62771]
[New LWP 62701]
[New LWP 62751]
[New LWP 63730]
[New LWP 62612]
[New LWP 62696]
[New LWP 62611]
[New LWP 62757]
[New LWP 62761]
[New LWP 62732]
[New LWP 62772]
[New LWP 62741]
[New LWP 62777]
[New LWP 62775]
[New LWP 62770]
[New LWP 63792]
[New LWP 62586]
[New LWP 62584]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/78/b091327a0bf6d146f8881f285955b4f7f2b712.debug
Missing separate debuginfo for /opt/jdk1.8.0_131/jre/lib/amd64/libverify.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/89/8659e6261ad966b4b638afc8e3dd214896253d.debug
Missing separate debuginfo for /opt/jdk1.8.0_131/jre/lib/amd64/libmanagement.so
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/88/c0c64eb685c329ad849281135cfe113f3812e8.debug
Core was generated by `/opt/jdk/bin/java -Dproc_nodemanager -Xmx4096m -Xms4g -Xmx4g -Xmn3g -server -XX'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2f4460bfd7 in VMError::report_and_die() ()
   from /opt/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
Missing separate debuginfos, use: debuginfo-install glibc-2.17-105.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 sssd-client-1.13.0-40.el7.x86_64
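
The top frame, VMError::report_and_die(), belongs to HotSpot's fatal error handler. That handler normally tries to write an hs_err_pid<pid>.log file into the JVM's working directory, falling back to a temporary directory such as /tmp, unless -XX:ErrorFile points elsewhere. A minimal Python sketch for checking whether such a file was left behind; the function name and the directories you choose to search are assumptions.

```python
from pathlib import Path


def find_hs_err_logs(directories, pid=None):
    """Return HotSpot fatal error logs found directly inside the given directories.

    If pid is given, only hs_err_pid<pid>.log is matched; otherwise any pid.
    """
    pattern = f"hs_err_pid{pid}.log" if pid is not None else "hs_err_pid*.log"
    found = []
    for directory in directories:
        root = Path(directory)
        if root.is_dir():  # silently skip directories that do not exist
            found.extend(sorted(root.glob(pattern)))
    return found
```

Reasonable candidates for the directory list are /tmp and the path stored in the "pwd" file of the ABRT problem directory. If nothing turns up, the handler may itself have failed before the file was written, which would be consistent with a segmentation fault inside report_and_die().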

So what could be the possible reason?

Solution

    from https://stackoverflow.com/questions/44800619/hadoop-nodemanager-exit-without-log by cc-by-sa and MIT license