[HADOOP] Hive: sort an array column with respect to another array column in the same table

I have a Hive table with two array columns, col1 and col2. Selecting from it returns:

col1                col2
[1,2,3,4,5]         [0.43,0.01,0.45,0.22,0.001]

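I need to sort col2 in ascending order and reorder col1 by the same permutation, so that each col1 element stays paired with the col2 value that was at the same index: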

col1                col2
[5,2,4,1,3]         [0.001,0.01,0.22,0.43,0.45]
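The requirement amounts to sorting the (col2, col1) pairs by the col2 value and then splitting them apart again. Below is a minimal plain-Python sketch of that reordering, meant only to pin down the expected result (it is not Hive code, and the helper name `sort_by_other` is hypothetical). For the sample row it gives col1 = [5,2,4,1,3], because 0.43 sits at the same index as 1 and 0.45 at the same index as 3.

```python
def sort_by_other(keys, values):
    """Sort `values` ascending and reorder `keys` by the same permutation.

    Hypothetical helper mirroring the required Hive output; elements are
    paired by position, so both lists must have the same length.
    """
    if len(keys) != len(values):
        raise ValueError("arrays must have the same length")
    # Pair each col2 value with the col1 element at the same position,
    # sort the pairs by the col2 value only, then split them apart again.
    pairs = sorted(zip(values, keys), key=lambda p: p[0])
    new_values = [v for v, _ in pairs]
    new_keys = [k for _, k in pairs]
    return new_keys, new_values


col1 = [1, 2, 3, 4, 5]
col2 = [0.43, 0.01, 0.45, 0.22, 0.001]
print(sort_by_other(col1, col2))
# ([5, 2, 4, 1, 3], [0.001, 0.01, 0.22, 0.43, 0.45])
```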

Solution

  1. Explode both arrays, pair their elements by position, sort the rows, and aggregate them back into arrays. Do the sorting with sort by in the subquery that feeds collect_list, so the elements are collected in ascending order of col2.

    with your_data as (
      select array(1,2,3,4,5) as col1, array(0.43,0.01,0.45,0.22,0.001) as col2
    )

    select original_col1, original_col2,
           collect_list(c1_x) as new_col1,
           collect_list(c2_x) as new_col2
    from
    (
      -- posexplode returns (position, element); keep only pairs from the same position
      select d.col1 as original_col1, d.col2 as original_col2, c1.x as c1_x, c2.x as c2_x
      from your_data d
           lateral view posexplode(col1) c1 as i, x
           lateral view posexplode(col2) c2 as i, x
      where c1.i = c2.i
      -- send all rows of one original row to the same reducer, sorted by the col2 value,
      -- so that collect_list gathers them in ascending col2 order
      distribute by original_col1, original_col2
      sort by c2_x
    ) s
    group by original_col1, original_col2;
    

    Result:

    OK
    original_col1   original_col2                   new_col1        new_col2
    [1,2,3,4,5]     [0.43,0.01,0.45,0.22,0.001]     [5,2,4,1,3]     [0.001,0.01,0.22,0.43,0.45]
    Time taken: 34.642 seconds, Fetched: 1 row(s)
    
  2. Source: https://stackoverflow.com/questions/57389401/hive-sort-array-column-with-respect-to-other-array-column-in-same-table (licensed under CC BY-SA and MIT)
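To see why the query in the first answer filters on c1.i = c2.i, here is a plain-Python model of what it does to a single input row. This is an illustration only, not Hive code: `posexplode` and `reorder_like_query` below are hypothetical stand-ins, and the model assumes, as the query does, that rows reach collect_list in sort by order. Two lateral views over the same row produce every combination of their rows, so without the filter each col1 element would be paired with every col2 value.

```python
from itertools import product


def posexplode(arr):
    """Model of Hive's posexplode: one (position, element) row per element, 0-based."""
    return list(enumerate(arr))


def reorder_like_query(col1, col2):
    # Two lateral views over the same source row yield every combination
    # of their rows: len(col1) * len(col2) rows before filtering.
    crossed = list(product(posexplode(col1), posexplode(col2)))
    # WHERE c1.i = c2.i keeps only the pairs taken from the same position.
    matched = [(c1_x, c2_x) for (c1_i, c1_x), (c2_i, c2_x) in crossed if c1_i == c2_i]
    # SORT BY c2_x: order the surviving rows by the col2 value.
    matched.sort(key=lambda row: row[1])
    # collect_list(...): gather each column back into an array.
    new_col1 = [c1_x for c1_x, _ in matched]
    new_col2 = [c2_x for _, c2_x in matched]
    return len(crossed), new_col1, new_col2


n_rows, new_col1, new_col2 = reorder_like_query([1, 2, 3, 4, 5], [0.43, 0.01, 0.45, 0.22, 0.001])
print(n_rows)    # 25
print(new_col1)  # [5, 2, 4, 1, 3]
print(new_col2)  # [0.001, 0.01, 0.22, 0.43, 0.45]
```

The model leaves out distribute by because a single input row forms only one group; in Hive, that clause is what keeps all exploded rows of one original row on the same reducer so that sort by orders them together.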