[HADOOP] Parsing repeated XML tags in Hive
I am parsing an XML file with hivexmlserde. The repeated tags in my XML are parsed into separate arrays, one per tag, like this:
["completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed"]
["10160-0"]
["20140403","20151207","20160313","20101225","20100420","20110208","20100419","20110310","20100412","20120130","20110729"]
["20160306","20110822","20110822","20110822","20110321","20110608","20110822","20120326","20110822"]
["24","12","24","24","7","24","8","8","7","24","24","24","24","6"]
["h","h","h","h","d","h","h","h","d","h","h","h","h","h"]
I would like the result to look like this instead, with one row per entry:
-------------------------------------------------------------------------------
| statusCode | code    | startTime | endTime  | strengthValue | strengthUnits |
-------------------------------------------------------------------------------
| completed  | 10160-0 | 20140403  | 20160306 | 24            | h             |
| completed  | 10160-0 | 20151207  | 20110822 | 12            | h             |
| completed  | 10160-0 | 20160313  | 20120326 | 24            | h             |
| completed  | 10160-0 | 20100412  | 20110608 | 24            | h             |
| completed  | 10160-0 | 20110310  | 20110822 | 7             | d             |
| completed  | 10160-0 | 20110822  | 20110822 | 8             | h             |
-------------------------------------------------------------------------------
Please let me know how to achieve this using the Hive XML SerDe.
Update:
Sample:
<document>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</document>
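To make the target shape concrete, here is a minimal sketch in Python (not Hive) that builds one row per entry from the sample. It illustrates why the per-tag arrays are hard to line up: once each tag is collected into its own array, an entry with a missing child shifts every later value, whereas reading each `<entryInfo>` as a unit keeps its fields together. The XML literal is a copy of the sample, assuming every entry uses `<entryInfo>` and well-formed closing tags. The names `SAMPLE`, `FIELDS`, and `flatten` are made up for this illustration. In Hive, the analogous idea would presumably be to map a single column to the whole `entryInfo` element and expand it with `LATERAL VIEW explode(...)`, but that is an assumption and is not verified here.

```python
import xml.etree.ElementTree as ET

# Copy of the sample document (assumed well-formed, uniform <entryInfo> tags).
SAMPLE = """<document>
<code>10160-0</code>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20110729</startTime>
<endTime>20110822</endTime>
<strengthValue>24</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20120130</startTime>
<endTime>20120326</endTime>
<strengthValue>12</strengthValue>
<strengthUnits>h</strengthUnits>
</entryInfo>
<entryInfo>
<statusCode>completed</statusCode>
<startTime>20100412</startTime>
<endTime>20110822</endTime>
<strengthValue>8</strengthValue>
<strengthUnits>d</strengthUnits>
</entryInfo>
</document>"""

# Child tags read from each <entryInfo>, in output column order.
FIELDS = ("startTime", "endTime", "strengthValue", "strengthUnits")


def flatten(xml_text):
    """Return one tuple per <entryInfo>, repeating the document-level <code>."""
    root = ET.fromstring(xml_text)
    code = root.findtext("code")
    rows = []
    for entry in root.findall("entryInfo"):
        status = entry.findtext("statusCode")
        # findtext returns None for a missing child, so a gap stays in its
        # own row instead of shifting values from later entries.
        values = tuple(entry.findtext(name) for name in FIELDS)
        rows.append((status, code) + values)
    return rows


for row in flatten(SAMPLE):
    # Render None as an empty cell so incomplete entries still print.
    print(" | ".join("" if cell is None else cell for cell in row))
```

Each tuple follows the column order of the desired table: statusCode, code, startTime, endTime, strengthValue, strengthUnits.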
Solution
from https://stackoverflow.com/questions/41462410/parse-repeating-xml-tags-in-hive by cc-by-sa and MIT license