복붙노트

[HADOOP] Parsing repeating XML tags in Hive

I am using hivexmlserde to parse XML files. I parse the repeated tags in my XML and store them as arrays. The result I get is shown below:

["completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed","completed"]   ["10160-0"] ["20140403","20151207","20160313","20101225","20100420","20110208","20100419","20110310","20100412","20120130","20110729"]  ["20160306","20110822","20110822","20110822","20110321","20110608","20110822","20120326","20110822"]    ["24","12","24","24","7","24","8","8","7","24","24","24","24","6"]  ["h","h","h","h","d","h","h","h","d","h","h","h","h","h"]

I would like the result to look like this:

-------------------------------------------------------------------------------
| statusCode | code    | startTime | endTime  | strengthValue | strengthUnits |
-------------------------------------------------------------------------------
| completed  | 10160-0 | 20140403  | 20160306 | 24            | h             |
| completed  | 10160-0 | 20151207  | 20110822 | 12            | h             |
| completed  | 10160-0 | 20160313  | 20120326 | 24            | h             |
| completed  | 10160-0 | 20100412  | 20110608 | 24            | h             |
| completed  | 10160-0 | 20110310  | 20110822 | 7             | d             |
| completed  | 10160-0 | 20110822  | 20110822 | 8             | h             |
-------------------------------------------------------------------------------
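The target table pairs the i-th element of every array and repeats the single `code` value on each row. Below is a minimal Python sketch of that positional pairing, assuming every array holds exactly one value per `entryInfo` element in document order; `arrays_to_rows` is a hypothetical helper name. The arrays in the output above have different lengths, so the assumption does not hold for them, and the sketch raises an error in that case rather than silently misaligning rows.

```python
# Sketch: turn parallel arrays (one array per repeated tag) into one row
# per position, repeating the scalar `code` on every row.
# ASSUMPTION: element i of every array belongs to the same entryInfo.


def arrays_to_rows(code, columns):
    """Pair equally indexed array elements into row dicts.

    columns: dict mapping column name -> list of values.
    Raises ValueError when the arrays differ in length, because an entry
    that lacks a tag shifts every later value and positional pairing
    would then produce wrong rows.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"array lengths differ: {lengths}")
    names = list(columns)
    rows = []
    # zip(*...) yields one tuple per position across all arrays.
    for values in zip(*(columns[n] for n in names)):
        rows.append({"code": code, **dict(zip(names, values))})
    return rows


if __name__ == "__main__":
    example = arrays_to_rows(
        "10160-0",
        {
            "statusCode": ["completed", "completed"],
            "startTime": ["20110729", "20120130"],
            "endTime": ["20110822", "20120326"],
            "strengthValue": ["24", "12"],
            "strengthUnits": ["h", "h"],
        },
    )
    for row in example:
        print(row)
```

In HiveQL the same positional pairing is commonly written with `LATERAL VIEW posexplode(...)`, indexing the other arrays with the returned position. That is a general Hive idiom, and it carries the same equal-length assumption.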

Please tell me how to achieve this using the Hive XML SerDe.

Update:

Sample:

<document>
  <code>10160-0</code>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20110729</startTime>
    <endTime>20110822</endTime>
    <strengthValue>24</strengthValue>
    <strengthUnits>h</strengthUnits>
  </entryInfo>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20120130</startTime>
    <endTime>20120326</endTime>
    <strengthValue>12</strengthValue>
    <strengthUnits>h</strengthUnits>
  </entryInfo>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20100412</startTime>
    <endTime>20110822</endTime>
    <strengthValue>8</strengthValue>
    <strengthUnits>d</strengthUnits>
  </entryInfo>
</document>
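Because positional pairing breaks as soon as one `entryInfo` lacks a tag, a sturdier route is to iterate over the `entryInfo` elements themselves and read each child relative to its own entry. The sketch below does that outside Hive with Python's standard `xml.etree.ElementTree`, on a well-formed copy of the sample (closing `<` restored on `strengthUnits`, and `entryinfo` spelled `entryInfo`, since XML tag names are case-sensitive). `document_to_rows` and `FIELDS` are hypothetical names; the sketch illustrates the desired row shape, not the hivexmlserde table definition the question asks about.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample document.
SAMPLE = """<document>
  <code>10160-0</code>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20110729</startTime>
    <endTime>20110822</endTime>
    <strengthValue>24</strengthValue>
    <strengthUnits>h</strengthUnits>
  </entryInfo>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20120130</startTime>
    <endTime>20120326</endTime>
    <strengthValue>12</strengthValue>
    <strengthUnits>h</strengthUnits>
  </entryInfo>
  <entryInfo>
    <statusCode>completed</statusCode>
    <startTime>20100412</startTime>
    <endTime>20110822</endTime>
    <strengthValue>8</strengthValue>
    <strengthUnits>d</strengthUnits>
  </entryInfo>
</document>"""

# Child tags read from each entryInfo (hypothetical column list).
FIELDS = ["statusCode", "startTime", "endTime", "strengthValue", "strengthUnits"]


def document_to_rows(xml_text):
    """Return one dict per entryInfo, tagged with the document's code."""
    root = ET.fromstring(xml_text)
    code = root.findtext("code")
    rows = []
    for entry in root.findall("entryInfo"):
        # findtext returns None for a missing child, so a gap stays in
        # its own row instead of shifting values into later rows.
        row = {"code": code}
        for field in FIELDS:
            row[field] = entry.findtext(field)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in document_to_rows(SAMPLE):
        print(row)
```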

Solution

    from https://stackoverflow.com/questions/41462410/parse-repeating-xml-tags-in-hive by cc-by-sa and MIT license