Crosstab with a large or undefined number of categories

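My real problem has to do with recording which of a very large number of antivirus products agree that a given sample is a member of a given antivirus family. The database has millions of samples, with tens of antivirus products voting on each sample. I want to ask queries like "For the malware containing the name 'XYZ', which sample had the most votes, and which vendors voted for it?" and get results like: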

"BadBadVirus"
                     V1  V2  V3  V4  V5  V6  V7
Sample 1 - 4 votes    1   0   1   0   0   1   1
Sample 2 - 5 votes    1   0   1   0   1   1   1
Sample 3 - 5 votes    1   0   1   0   1   1   1

Total - 14 votes      3   0   3   0   2   3   3

This would tell me that vendors V2 and V4 either don't know how to detect this malware or call it something else.
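The totals row is just the column sums of that 0/1 matrix, and the vendors with a zero total are the ones that never agreed. A minimal Python sketch, assuming the votes for one family are already in memory as one list of flags per sample (`VENDORS`, `SAMPLES`, and `summarize` are illustrative names, not part of the question's schema):

```python
# Illustrative vote matrix for the "BadBadVirus" example above:
# one row per sample, one 0/1 flag per vendor V1..V7.
VENDORS = ["V1", "V2", "V3", "V4", "V5", "V6", "V7"]
SAMPLES = {
    "Sample 1": [1, 0, 1, 0, 0, 1, 1],
    "Sample 2": [1, 0, 1, 0, 1, 1, 1],
    "Sample 3": [1, 0, 1, 0, 1, 1, 1],
}


def summarize(samples, vendors):
    """Return (votes per sample, votes per vendor, vendors that never voted)."""
    per_sample = {name: sum(flags) for name, flags in samples.items()}
    # zip(*rows) transposes the matrix, so each tuple is one vendor's column
    per_vendor = {v: sum(col) for v, col in zip(vendors, zip(*samples.values()))}
    silent = [v for v, total in per_vendor.items() if total == 0]
    return per_sample, per_vendor, silent


per_sample, per_vendor, silent = summarize(SAMPLES, VENDORS)
print(per_sample)
print(per_vendor)
print(silent)  # ['V2', 'V4']
```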

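I'm going to try to generalize my question slightly, hopefully without destroying your ability to help me. Suppose I have five voters (Alex, Bob, Carol, Dave, Ed) who have been asked to look at five photos (P1, P2, P3, P4, P5) and decide what the "main subject" of each photo is. For our example, assume they were limited to "Cat", "Dog", or "Horse". Not every voter votes on everything.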

The data is stored in the database in this form:

Photo, Voter, Decision
(1, 'Alex', 'Cat')
(1, 'Bob', 'Dog')
(1, 'Carol', 'Cat')
(1, 'Dave', 'Cat')
(1, 'Ed', 'Cat')
(2, 'Alex', 'Cat')
(2, 'Bob', 'Dog')
(2, 'Carol', 'Cat')
(2, 'Dave', 'Cat')
(2, 'Ed', 'Dog')
(3, 'Alex', 'Horse')
(3, 'Bob', 'Horse')
(3, 'Carol', 'Dog')
(3, 'Dave', 'Horse')
(3, 'Ed', 'Horse')
(4, 'Alex', 'Horse')
(4, 'Bob', 'Horse')
(4, 'Carol', 'Cat')
(4, 'Dave', 'Horse')
(4, 'Ed', 'Horse')
(5, 'Alex', 'Dog')
(5, 'Bob', 'Cat')
(5, 'Carol', 'Cat')
(5, 'Dave', 'Cat')
(5, 'Ed', 'Cat')

The goal is: given the photo subject I'm looking for, I want to know how many voters thought it was the main subject of each photo, and also list which voters thought so.

Query for: "Cat"
      Total  Alex  Bob Carol Dave Ed
1 -     4      1    0    1     1   1
2 -     3      1    0    1     1   0 
3 -     0      0    0    0     0   0 
4 -     1      0    0    1     0   0 
5 -     4      0    1    1     1   1
------------------------------------
total  12      2    1    4     3   2 

Query for: "Dog"
      Total  Alex  Bob Carol Dave Ed
1 -     1      0    1    0     0   0
2 -     2      0    1    0     0   1
3 -     1      0    0    1     0   0
4 -     0      0    0    0     0   0
5 -     1      1    0    0     0   0
------------------------------------
total   5      1    2    1     0   1

Is this something I can do with the data in the format I have it stored in?

I'm having a hard time writing a query that does it. Although it would be simple enough to dump the data out and write a program to do it, I'd really like to do it in the database if I can.

Thanks for any suggestions.
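For reference, the "dump the data and write a program" route mentioned in the question takes only a few lines. A minimal Python sketch, assuming the rows have already been fetched into memory as (photo, voter, decision) tuples (`VOTERS`, `BALLOTS`, `VOTES`, and `pivot_for` are illustrative names); a voter who did not vote on a photo simply counts as 0:

```python
VOTERS = ["Alex", "Bob", "Carol", "Dave", "Ed"]
# Sample data from the question: each photo's decisions, in VOTERS order.
BALLOTS = {
    1: ["Cat", "Dog", "Cat", "Cat", "Cat"],
    2: ["Cat", "Dog", "Cat", "Cat", "Dog"],
    3: ["Horse", "Horse", "Dog", "Horse", "Horse"],
    4: ["Horse", "Horse", "Cat", "Horse", "Horse"],
    5: ["Dog", "Cat", "Cat", "Cat", "Cat"],
}
# Flatten into the (photo, voter, decision) rows the table stores.
VOTES = [(p, v, d) for p, ds in BALLOTS.items() for v, d in zip(VOTERS, ds)]


def pivot_for(votes, subject):
    """Return (voters, grid, column_totals) for one subject.

    grid maps photo -> {voter: 1 if that voter chose `subject`, else 0}.
    """
    voters = sorted({voter for _, voter, _ in votes})
    photos = sorted({photo for photo, _, _ in votes})
    # every cell starts at 0, so photos with no matching vote still get a row
    grid = {photo: dict.fromkeys(voters, 0) for photo in photos}
    for photo, voter, decision in votes:
        if decision == subject:
            grid[photo][voter] = 1
    column_totals = {v: sum(grid[p][v] for p in photos) for v in voters}
    return voters, grid, column_totals


voters, grid, totals = pivot_for(VOTES, "Cat")
print("photo total", *voters)
for photo, row in grid.items():
    print(photo, sum(row.values()), *row.values())
print("total", sum(totals.values()), *totals.values())
```

On the sample data this reproduces the "Cat" table above, and `pivot_for(VOTES, "Dog")` gives the "Dog" table.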

Solutions

==============================

1.

create extension if not exists tablefunc;  -- provides crosstab()
create table vote (Photo integer, Voter text, Decision text);
insert into vote values
(1, 'Alex', 'Cat'),
(1, 'Bob', 'Dog'),
(1, 'Carol', 'Cat'),
(1, 'Dave', 'Cat'),
(1, 'Ed', 'Cat'),
(2, 'Alex', 'Cat'),
(2, 'Bob', 'Dog'),
(2, 'Carol', 'Cat'),
(2, 'Dave', 'Cat'),
(2, 'Ed', 'Dog'),
(3, 'Alex', 'Horse'),
(3, 'Bob', 'Horse'),
(3, 'Carol', 'Dog'),
(3, 'Dave', 'Horse'),
(3, 'Ed', 'Horse'),
(4, 'Alex', 'Horse'),
(4, 'Bob', 'Horse'),
(4, 'Carol', 'Cat'),
(4, 'Dave', 'Horse'),
(4, 'Ed', 'Horse'),
(5, 'Alex', 'Dog'),
(5, 'Bob', 'Cat'),
(5, 'Carol', 'Cat'),
(5, 'Dave', 'Cat'),
(5, 'Ed', 'Cat')
;

Query for Cat:

select photo,
    alex + bob + carol + dave + ed as Total,
    alex, bob, carol, dave, ed
from crosstab($$
    select
        photo, voter,
        case decision when 'Cat' then 1 else 0 end
    from vote
    order by photo
    $$,'
    select distinct voter
    from vote
    order by voter
    '
) as (
    photo integer,
    Alex integer,
    Bob integer,
    Carol integer,
    Dave integer,
    Ed integer
);
 photo | total | alex | bob | carol | dave | ed 
-------+-------+------+-----+-------+------+----
     1 |     4 |    1 |   0 |     1 |    1 |  1
     2 |     3 |    1 |   0 |     1 |    1 |  0
     3 |     0 |    0 |   0 |     0 |    0 |  0
     4 |     1 |    0 |   0 |     1 |    0 |  0
     5 |     4 |    0 |   1 |     1 |    1 |  1
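The two-argument form of crosstab (from the tablefunc extension) places each (row_name, category, value) triple in the column whose position matches that category in the second query's result, ignores categories not in that list, and leaves NULL wherever no triple exists. The Python sketch below only mimics that layout to make it visible; it is not how tablefunc is implemented, and photo 6 is hypothetical data that is not part of the sample. Because the question says not every voter votes on every photo, it also shows the consequence: one NULL cell makes `alex + bob + carol + dave + ed` NULL, so wrapping each column in `coalesce(..., 0)` would be one way to keep the total.

```python
def crosstab(source_rows, categories):
    """Mimic the layout of two-argument crosstab(source_sql, category_sql).

    source_rows: (row_name, category, value) triples, ordered by row_name.
    categories:  ordered category list; each becomes one output column.
    Missing cells stay None (SQL NULL); unknown categories are ignored.
    """
    position = {cat: i for i, cat in enumerate(categories)}
    out = []
    for row_name, category, value in source_rows:
        if not out or out[-1][0] != row_name:  # a new row_name starts a new row
            out.append([row_name] + [None] * len(categories))
        if category in position:
            out[-1][1 + position[category]] = value
    return [tuple(r) for r in out]


def sql_sum(*terms):
    """SQL '+' semantics: any NULL operand makes the whole sum NULL."""
    return None if any(t is None for t in terms) else sum(terms)


# Hypothetical photo 6, on which Ed did not vote at all.
source = [(6, "Alex", 1), (6, "Bob", 0), (6, "Carol", 1), (6, "Dave", 1)]
row = crosstab(source, ["Alex", "Bob", "Carol", "Dave", "Ed"])[0]
print(row)                            # (6, 1, 0, 1, 1, None)
print(sql_sum(*row[1:]))              # None
print(sum(v or 0 for v in row[1:]))   # 3, like coalesce(col, 0) on each column
```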

If the number of voters is large or not known in advance, it can be done dynamically:

do $do$
declare
voter_list text;
r record;
begin

drop table if exists pivot;

voter_list := (
    select string_agg(distinct voter, ' ' order by voter) from vote
    );

execute(format('
    create table pivot (
        decision text,
        photo integer,
        Total integer,
        %1$s
    )', (replace(voter_list, ' ', ' integer, ') || ' integer')
));

for r in
select distinct decision from vote
loop
    execute (format($f$
        insert into pivot
        select
            %3$L as decision,
            photo,
            %1$s as Total,
            %2$s
        from crosstab($ct$
            select
                photo, voter,
                case decision when %3$L then 1 else 0 end
            from vote
            order by photo
            $ct$,$ct$
            select distinct voter
            from vote
            order by voter
            $ct$
        ) as (
            photo integer,
            %4$s
        );$f$,
        replace(voter_list, ' ', ' + '),
        replace(voter_list, ' ', ', '),
        r.decision,
        replace(voter_list, ' ', ' integer, ') || ' integer'
    ));
end loop;
end; $do$;

The code above creates a table named pivot holding the results for every decision, so a single decision can be selected with:

select * from pivot where decision = 'Cat';
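The DO block builds its SQL by string substitution on voter_list, which is `string_agg(distinct voter, ' ' order by voter)`. The Python sketch below mirrors only the three `replace()` calls so the generated fragments are visible; `column_fragments` is a hypothetical helper, not part of the answer. It also exposes the scheme's assumption: each voter name must be a single word usable as an unquoted identifier (PostgreSQL also folds unquoted identifiers to lower case, so the column for Alex becomes alex).

```python
def column_fragments(voters):
    """Mirror the three replace(voter_list, ...) calls in the DO block."""
    # string_agg(distinct voter, ' ' order by voter)
    voter_list = " ".join(sorted(set(voters)))
    total_expr = voter_list.replace(" ", " + ")                 # the Total expression
    select_list = voter_list.replace(" ", ", ")                 # the selected columns
    coldefs = voter_list.replace(" ", " integer, ") + " integer"  # the column definitions
    return total_expr, select_list, coldefs


total_expr, select_list, coldefs = column_fragments(
    ["Bob", "Alex", "Ed", "Carol", "Dave", "Alex"])
print(total_expr)   # Alex + Bob + Carol + Dave + Ed
print(select_list)  # Alex, Bob, Carol, Dave, Ed
print(coldefs)      # Alex integer, Bob integer, Carol integer, Dave integer, Ed integer

# A name containing a space is silently split into two "columns":
print(column_fragments(["Mary Ann", "Bob"])[2])  # Bob integer, Mary integer, Ann integer
```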

==============================
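2.

What you want means moving part of the data (the names) into the column headings, i.e. into the schema of the generated table. Since that is somewhere between inconvenient and impossible, I would recommend only sorting and summing the data in SQL and doing the rest outside the database.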

```
SELECT Photo, Voter
FROM data
WHERE Decision = '...'
ORDER BY Photo, Voter
```
and
```
SELECT Photo, COUNT(*) AS Total
FROM data
WHERE Decision = '...'
GROUP BY Photo
ORDER BY Photo;
```
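This approach leaves the final layout to client code. A minimal Python sketch of that last step, assuming the first query has been run for Decision = 'Cat' on the sample data and its rows fetched as (photo, voter) tuples; `CAT_ROWS` and `layout` are illustrative names, and the voter and photo lists are assumed to come from separate queries. Note that neither query returns photo 3 for 'Cat', because it has no matching rows, so the client has to fill such gaps itself.

```python
from itertools import groupby
from operator import itemgetter

# What "SELECT Photo, Voter ... WHERE Decision = 'Cat' ORDER BY Photo, Voter"
# returns on the sample data.
CAT_ROWS = [
    (1, "Alex"), (1, "Carol"), (1, "Dave"), (1, "Ed"),
    (2, "Alex"), (2, "Carol"), (2, "Dave"),
    (4, "Carol"),
    (5, "Bob"), (5, "Carol"), (5, "Dave"), (5, "Ed"),
]


def layout(rows, voters, photos):
    """Turn sorted (photo, voter) pairs into (photo, total, 0/1 flags...) rows.

    groupby relies on the ORDER BY Photo: equal photos must be adjacent.
    Photos absent from the query output get an all-zero row.
    """
    chosen = {photo: {voter for _, voter in grp}
              for photo, grp in groupby(rows, key=itemgetter(0))}
    report = []
    for photo in photos:
        marks = [1 if v in chosen.get(photo, set()) else 0 for v in voters]
        report.append((photo, sum(marks), *marks))
    return report


for line in layout(CAT_ROWS, ["Alex", "Bob", "Carol", "Dave", "Ed"], range(1, 6)):
    print(*line)
```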

==============================

3.

Using the same sample data as Clodoaldo ("create table vote ...") and the plpythonu function make_pivot_table (below), you can run:


-- both temp tables are ON COMMIT DROP, so run everything in one transaction
begin;

create temp table pivot_data on commit drop as 
    select * from vote where decision = 'Cat' union select photo, null, null from vote;

select * from make_pivot_table('{photo}', 'voter',  'decision', 'count', 'pivot_data',
  'pivot_result', false);

select * from pivot_result order by photo;

commit;

The make_pivot_table function is defined as follows:

-- make_pivot_table
-- python version 0.9
-- last edited 2015-08-11 

create or replace function
 make_pivot_table(row_headers text[], category_field text, value_field text,
  value_action text, input_table text, output_table text, keep_result boolean)
returns void as
$$
# imports
from collections import defaultdict
import operator
import string

# constants
BATCH_SIZE = 100
VALID_ACTIONS = ('count', 'sum', 'min', 'max')
NULL_CATEGORY_NAME = 'NULL_CATEGORY'
TOTAL_COL = 'total'

# functions
def table_exists(tablename):
    plan = plpy.prepare("""select table_schema, table_name from
        information_schema.Tables where table_schema not in ('information_schema',
        'pg_catalog') and table_name = $1""", ["text"])
    # look up the table that was passed in, not the global input_table
    rows = plpy.execute(plan, [tablename], 2)
    return bool(rows)

def make_rowkey(row):
    return tuple([row[header] for header in row_headers])

def quote_if_needed(value):
    return plpy.quote_literal(value) if isinstance(value, basestring) else str(value)

# assumes None is never a value in the dct
def update_if(dct, key, new_value, op, result=True):
    current_value = dct.get(key)
    # compare the incoming new_value, not the caller's loop variable `value`
    if current_value is None or op(new_value, current_value) == result:
        dct[key] = new_value

def update_output_table(output_table, row_headers, colname, value):
    pg_value = plpy.quote_literal(value) if isinstance(value, basestring) else value
    sql = 'update %s set %s = %s where ' % (output_table, plpy.quote_ident(colname), 
                                            pg_value)
    conditions = []
    for index, row_header in enumerate(row_headers):
        conditions.append('%s = %s' % (plpy.quote_ident(row_header),
                                       quote_if_needed(rowkey[index])))
    sql += ' and '.join(conditions)
    plpy.execute(sql)


# -----------------

if not table_exists(input_table):
    plpy.error('input_table %s does not exist' % input_table)

if value_action not in VALID_ACTIONS:
    plpy.error('%s is not a recognised action' % value_action)

# load the data into a dict
count_dict = defaultdict(int)
sum_dict = defaultdict(float)
total_dict = defaultdict(float)
min_dict = dict()
max_dict = dict()
categories_seen = set()
rowkeys_seen = set()
do_total = value_action in ('count', 'sum')

cursor = plpy.cursor('select * from %s' % plpy.quote_ident(input_table))
while True:
    rows = cursor.fetch(BATCH_SIZE)
    if not rows:
        break
    for row in rows:
        rowkey = make_rowkey(row)
        rowkeys_seen.add(rowkey)
        category = row[category_field]           
        value = row[value_field]
        dctkey = (rowkey, category)

        # skip if value field is null
        if value is None:
            continue

        categories_seen.add(category)

        if value_action == 'count':
            count_dict[dctkey] += 1
            total_dict[rowkey] += 1
        if value_action == 'sum':
            sum_dict[dctkey] += value
            total_dict[rowkey] += value
        if value_action == 'min':
            update_if(min_dict, dctkey, value, operator.lt)
        if value_action == 'max':
            update_if(max_dict, dctkey, value, operator.gt)

plpy.notice('seen %s summary rows and %s categories' % (len(rowkeys_seen),
                                                        len(categories_seen)))

# get the columns types
coltype_dict = dict()
input_type_sql = 'select * from %s where false' % plpy.quote_ident(input_table)
input_type_result = plpy.execute(input_type_sql)
for index, colname in enumerate(input_type_result.colnames()):
    coltype_num = input_type_result.coltypes()[index]
    coltype_sql = 'select typname from pg_type where oid = %s' % coltype_num
    coltype = list(plpy.cursor(coltype_sql))[0]
    plpy.notice('%s: %s' % (colname, coltype['typname']))
    coltype_dict[colname] = coltype['typname']

plpy.execute('drop table if exists %s' % plpy.quote_ident(output_table))
sql_parts = []
if keep_result:
    sql_parts.append('create table %s (' % plpy.quote_ident(output_table))
else:
    sql_parts.append('create temp table %s (' % plpy.quote_ident(output_table))

cols = []
for row_header in row_headers:
    cols.append('%s %s' % (plpy.quote_ident(row_header), coltype_dict[row_header]))

cat_type = 'bigint' if value_action == 'count' else coltype_dict[value_field]

for col in sorted(categories_seen):
    if col is None:
        cols.append('%s %s' % (plpy.quote_ident(NULL_CATEGORY_NAME), cat_type))
    else:
        cols.append('%s %s' % (plpy.quote_ident(col), cat_type))

if do_total:
    cols.append('%s %s' % (TOTAL_COL, cat_type))

sql_parts.append(',\n'.join(cols))
if keep_result:
    sql_parts.append(')')
else:
    sql_parts.append(') on commit drop')
plpy.execute('\n'.join(sql_parts))

dict_map = {'count': count_dict, 'sum': sum_dict, 'min': min_dict, 'max': max_dict }
value_dict = dict_map[value_action]
for rowkey in rowkeys_seen:
    sql = 'insert into %s values (' % plpy.quote_ident(output_table)
    sql += ', '.join([quote_if_needed(part) for part in rowkey])
    sql += ')'
    plpy.execute(sql)

if do_total:
    for rowkey, value in total_dict.iteritems():
        update_output_table(output_table, row_headers, TOTAL_COL, value)

for (rowkey, category), value in value_dict.iteritems():
    # put in category value
    colname = NULL_CATEGORY_NAME if category is None else category
    update_output_table(output_table, row_headers, colname, value)

$$ language plpythonu;
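To see what the 'count' path of make_pivot_table produces for the query above, here is a plain-Python sketch of the same accumulation outside PostgreSQL. It is a simplification, not the function itself: it assumes a single row header instead of the row_headers array, models the UNION with Python set operations, and uses illustrative names (`BALLOTS`, `count_pivot`). Cells with no counted row are left as None, matching the NULLs the function leaves in pivot_result; this is also why photo 3 shows up as an empty row instead of disappearing.

```python
from collections import defaultdict

VOTERS = ["Alex", "Bob", "Carol", "Dave", "Ed"]
# Sample data from the question: each photo's decisions, in VOTERS order.
BALLOTS = {
    1: ["Cat", "Dog", "Cat", "Cat", "Cat"],
    2: ["Cat", "Dog", "Cat", "Cat", "Dog"],
    3: ["Horse", "Horse", "Dog", "Horse", "Horse"],
    4: ["Horse", "Horse", "Cat", "Horse", "Horse"],
    5: ["Dog", "Cat", "Cat", "Cat", "Cat"],
}
VOTES = [(p, v, d) for p, ds in BALLOTS.items() for v, d in zip(VOTERS, ds)]


def count_pivot(rows, row_header, category_field, value_field):
    """Simplified 'count' path of make_pivot_table for a single row header."""
    counts, totals = defaultdict(int), defaultdict(int)
    rowkeys, categories = set(), set()
    for row in rows:
        key = row[row_header]
        rowkeys.add(key)              # every row key gets an output row
        if row[value_field] is None:  # ...but NULL values are never counted
            continue
        category = row[category_field]
        categories.add(category)
        counts[(key, category)] += 1
        totals[key] += 1
    cats = sorted(categories)
    # .get() returns None for cells that were never counted (NULL in SQL)
    table = {k: [counts.get((k, c)) for c in cats] + [totals.get(k)]
             for k in sorted(rowkeys)}
    return cats + ["total"], table


# pivot_data: Cat votes UNION one (photo, NULL, NULL) row per photo;
# set union drops duplicates just as SQL UNION does.
cat_rows = {t for t in VOTES if t[2] == "Cat"}
null_rows = {(p, None, None) for p, _, _ in VOTES}
pivot_data = [{"photo": p, "voter": v, "decision": d} for p, v, d in cat_rows | null_rows]

columns, table = count_pivot(pivot_data, "photo", "voter", "decision")
print(columns)  # ['Alex', 'Bob', 'Carol', 'Dave', 'Ed', 'total']
for photo, cells in table.items():
    print(photo, cells)
```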

from https://stackoverflow.com/questions/12988575/crosstab-with-a-large-or-undefined-number-of-categories under the CC BY-SA and MIT licenses
