Why does partition elimination not happen for this query?

I have a Hive table partitioned by year, month, day and hour. I need to run a query against it to fetch the last 7 days of data. This is on Hive 0.14.0.2.2.4.2-2. My query currently looks like this:

SELECT COUNT(column_name) from table_name 
where year >= year(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

This takes a very long time. If I substitute the actual numbers for the function calls, like this:

SELECT COUNT(column_name) from table_name 
where year >= 2017
AND month >= 2
AND day >= 13

it finishes in a few minutes. Is there a way to change the script above so that the query actually contains the numbers instead of the functions?

I tried using set as follows:

set yearLimit = year(date_sub(from_unixtime(unix_timestamp()), 7));

SELECT COUNT(column_name) from table_name 
where year >= ${hiveconf:yearLimit}
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

But this does not solve the problem.

Solution

==============================

1.

select      count (column_name) 

from        table_name 

where       year  >= year  (date_sub (current_date,7)) 
        and month >= month (date_sub (current_date,7)) 
        and day   >= day   (date_sub (current_date,7))
;

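(I have changed the documentation a bit :-))

Hive treats unix_timestamp() as non-deterministic: its value can change during execution, so the expression has to be evaluated for every row and cannot be reduced to a constant when the query is compiled. That prevents partition elimination. current_date is fixed for the whole query, so the comparison can be resolved against the partition values up front.

set is only a text-substitution mechanism. Nothing is computed during set; the only thing that happens is that a piece of text is assigned to a variable. Before the query runs, each variable placeholder (${hiveconf:...}) is replaced with the assigned text. Only then is the query parsed and executed. In the session below, three fragments are pasted together into select 1+1 before Hive parses anything: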

hive> set a=sele;
hive> set b=ct 1+;
hive> set c=1;
hive> ${hiveconf:a}${hiveconf:b}${hiveconf:c};
OK
2
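
Because set only substitutes text, one way to get literal numbers into the query is to compute them outside Hive and pass them in with --hiveconf. The following is a minimal sketch, not the answer's own method: the script name last7days.hql and the variable names yearLimit, monthLimit and dayLimit are assumptions, and the sketch only builds the command line rather than running it. It may help where current_date is unavailable (as far as I know it was added in Hive 1.2.0, while the question mentions 0.14).

```python
from datetime import date, timedelta


def build_hive_args(today, days_back=7, script="last7days.hql"):
    """Build a hive CLI argument list with the cutoff date as literal values.

    script is a hypothetical file whose query refers to
    ${hiveconf:yearLimit}, ${hiveconf:monthLimit} and ${hiveconf:dayLimit}.
    """
    cutoff = today - timedelta(days=days_back)
    return [
        "hive",
        "--hiveconf", f"yearLimit={cutoff.year}",
        "--hiveconf", f"monthLimit={cutoff.month}",
        "--hiveconf", f"dayLimit={cutoff.day}",
        "-f", script,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_hive_args(date.today())))
```

With today set to 2017-02-20 the cutoff is 2017-02-13, so after substitution Hive sees year >= 2017, month >= 2 and day >= 13, the same literals as in the fast query from the question.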

Demo: a table with one partition per day, going back about 100 days from today.

create table table_name (column_name int) partitioned by (year int,month int,day int);

set hive.exec.dynamic.partition.mode=nonstrict;


insert into table_name partition (year,month,day) 

select  pos
       ,year(dt)
       ,month(dt)
       ,day(dt) 

from   (select  pe.pos
               ,date_sub (current_date,pe.pos) as dt

        from    (select 1) x 
                lateral view posexplode (split (space (99),' ')) pe
        ) t
;

With unix_timestamp(), per the explanation above, no partitions are eliminated:

explain dependency

select      count (column_name) 

from        table_name 

where       year  >= year  (date_sub (from_unixtime (unix_timestamp ()),7)) 
        and month >= month (date_sub (from_unixtime (unix_timestamp ()),7)) 
        and day   >= day   (date_sub (from_unixtime (unix_timestamp ()),7))
;

With current_date, partition elimination can take place:

explain dependency

select      count (column_name) 

from        table_name 

where       year  >= year  (date_sub (current_date,7)) 
        and month >= month (date_sub (current_date,7)) 
        and day   >= day   (date_sub (current_date,7))
;
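
One caveat that the answer does not cover: comparing year, month and day independently only selects the intended range while the 7-day window stays inside a single month. On 2017-03-05, for example, the cutoff is 2017-02-26, and day >= 26 excludes the partitions for March 1 through March 5. A hedged sketch of one workaround is to generate an explicit list of (year, month, day) literals. The helper name last_days_predicate is hypothetical, and whether a long OR list prunes well on a given Hive version is worth confirming with explain dependency.

```python
from datetime import date, timedelta


def last_days_predicate(today, days_back=7):
    """Return a WHERE-clause fragment listing each day in the window as literals."""
    # Oldest day first: the cutoff day through today, inclusive.
    days = [today - timedelta(days=n) for n in range(days_back, -1, -1)]
    terms = [
        f"(year = {d.year} AND month = {d.month} AND day = {d.day})"
        for d in days
    ]
    return "(" + "\n    OR ".join(terms) + ")"


if __name__ == "__main__":
    query = (
        "SELECT COUNT(column_name) FROM table_name\nWHERE "
        + last_days_predicate(date.today())
    )
    print(query)
```

The generated text contains only constants, so it can be pasted into a script or passed through a hiveconf variable in the same way as any other substituted text.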

from https://stackoverflow.com/questions/42376268/why-partitions-elimination-does-not-happen-for-this-query by cc-by-sa and MIT license
