Requests with multiple connections

I use the Python Requests library to download a big file, e.g.:

```
import requests

r = requests.get("http://bigfile.com/bigfile.bin")
content = r.content
```

The big file downloads at roughly 30 KB per second, which is a bit slow. Every connection to the bigfile server is throttled, so I would like to open multiple connections.

Is there a way to open multiple connections at the same time to download a single file?

Solutions

==============================
1. You can use the HTTP Range header to fetch just part of the file (this has already been covered for Python here).

Just start several threads, each fetching a different range, then join the parts in order and you're done.
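```
import threading
import urllib.request

url = 'http://bigfile.com/bigfile.bin'  # example URL from the question
chunk_size = 1 << 20  # bytes per range (example value)
num_chunks = 10       # assumes the file spans exactly 10 ranges (the last may be partial)
parts = {}

def download(url, start):
    # The Range end offset is inclusive, hence the -1 (avoids overlapping bytes)
    byte_range = 'bytes=%d-%d' % (start, start + chunk_size - 1)
    req = urllib.request.Request(url, headers={'Range': byte_range})
    with urllib.request.urlopen(req) as f:
        parts[start] = f.read()

threads = []

# Initialize threads, one per byte range
for i in range(num_chunks):
    t = threading.Thread(target=download, args=(url, i * chunk_size))
    t.start()
    threads.append(t)

# Join threads back (order doesn't matter, you just want them all)
for t in threads:
    t.join()

# Sort parts by offset and you're done
result = b''.join(parts[i * chunk_size] for i in range(num_chunks))
```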
Also note that not every server supports the Range header. In particular, servers where a PHP script is responsible for serving the data usually don't implement Range handling.

==============================

2. Here is a Python script that saves a given URL to a file and downloads it using multiple threads.

```
#!/usr/bin/env python3
import sys
from functools import partial
from multiprocessing.dummy import Pool  # use threads
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def download_chunk(url, byterange):
    req = Request(url, headers={'Range': 'bytes=%d-%d' % byterange})
    try:
        with urlopen(req) as response:
            return response.read()
    except HTTPError as e:
        return b'' if e.code == 416 else None  # treat range error as EOF
    except OSError:  # EnvironmentError in Python 2
        return None

def main():
    url, filename = sys.argv[1:]
    nconnections = 4  # define number of concurrent connections
    chunksize = 1 << 16  # bytes requested per HTTP request
    offset = 0
    with Pool(nconnections) as pool, open(filename, 'wb') as file:
        while True:
            # Hand the pool a finite batch of ranges: Pool.imap() drains its input
            # in a background thread, so an endless iterator would queue tasks forever.
            ranges = [(start, start + chunksize - 1)
                      for start in range(offset, offset + nconnections * chunksize, chunksize)]
            offset += nconnections * chunksize
            for s in pool.imap(partial(download_chunk, url), ranges):
                if not s:
                    return  # error or EOF
                file.write(s)
                if len(s) != chunksize:
                    return  # EOF (servers with no Range support end up here)

if __name__ == "__main__":
    main()
```

The end of the file is detected when the server returns an empty body or an HTTP 416 status code, or when the response size is not exactly chunksize.
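These end-of-file rules form a small decision table. They can be written as a pure function and tested without a server. `classify_chunk` is a hypothetical helper and is not part of the script above. As in download_chunk(), `None` stands for a failed request.

```python
def classify_chunk(body, chunksize):
    """Classify one download_chunk() result as 'error', 'eof', 'last' or 'more'."""
    if body is None:
        return 'error'  # request failed
    if not body:
        return 'eof'    # empty body, or a 416 that download_chunk() mapped to b''
    if len(body) != chunksize:
        return 'last'   # short (or whole-file) response: write it, then stop
    return 'more'       # full chunk: write it and keep going
```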

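The script supports servers that don't understand the Range header; in that case everything is downloaded in a single request. To support large files, change download_chunk() to save each chunk to a temporary file and return that file's name, which the main thread then reads, instead of returning the file content itself.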
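Here is a minimal sketch of the temporary-file variant. It assumes the same inclusive `(start, end)` ranges as the script. `download_chunk_to_file` and `append_chunk` are hypothetical names. Each worker streams its range into a `tempfile` and returns the path: `''` for EOF, `None` on error. The main thread copies that file into the output and deletes it, so no whole chunk is ever held in memory.

```python
import os
import shutil
import tempfile
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def download_chunk_to_file(url, byterange):
    """Stream one byte range to a temp file.

    Returns the file's path, '' for EOF (HTTP 416), or None on any other error.
    """
    req = Request(url, headers={'Range': 'bytes=%d-%d' % byterange})
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix='.part')
    try:
        with tmp, urlopen(req) as response:
            shutil.copyfileobj(response, tmp)  # copies in small buffers, not all at once
        return tmp.name
    except OSError as e:  # HTTPError is a subclass of OSError
        os.remove(tmp.name)  # do not leave partial temp files behind
        if isinstance(e, HTTPError) and e.code == 416:
            return ''  # treat range error as EOF
        return None


def append_chunk(out, path):
    """Append a downloaded chunk to the open output file, delete it, and return its size."""
    size = os.path.getsize(path)
    with open(path, 'rb') as part:
        shutil.copyfileobj(part, out)
    os.remove(path)
    return size
```

In main(), pass `partial(download_chunk_to_file, url)` to `pool.imap()` and stop on an empty or `None` result. For every path, call `append_chunk(file, path)`, and treat a returned size different from chunksize as EOF, just as before.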

You can change the number of concurrent connections (the pool size) and the number of bytes requested in a single HTTP request independently of each other.

To use multiple processes instead of threads, change the import:

```
from multiprocessing.pool import Pool  # use processes (other code unchanged)
```

==============================

3. This solution requires a Linux utility called "aria2c", but it has the advantage that downloads can easily be resumed.

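It also assumes that all the files you want to download are listed in an HTTP directory listing at the location MY_HTTP_LOC. I tested this script on an instance of the lighttpd/1.4.26 HTTP server, but it can easily be modified to work with other setups.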

```
#!/usr/bin/env python3

import os
import re
import subprocess
from urllib.request import urlopen

MY_HTTP_LOC = "http://AAA.BBB.CCC.DDD/"

# retrieve webpage source code
with urlopen(MY_HTTP_LOC) as f:
    page = f.read().decode('utf-8', errors='replace')

# extract the relevant URL segments (file names) from the listing's source code
rgxp = r'<td class="n"><a href="([0-9a-zA-Z()\-_.]+)"'
files = re.findall(rgxp, page)

# download (using aria2c) files
for afile in files:
    # an unfinished aria2c download leaves a <name>.aria2 control file beside it
    if os.path.exists(afile) and not os.path.exists(afile + '.aria2'):
        print('Skipping already-retrieved file: ' + afile)
    else:
        print('Downloading file: ' + afile)
        # -x: max connections per server (at most 16), -s: split into 20 pieces
        subprocess.run(["aria2c", "-x", "16", "-s", "20", MY_HTTP_LOC + afile])
```
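The regular expression above depends on lighttpd's exact markup. An alternative technique, swapped in here and not the original author's, is the standard-library `html.parser`. It picks out the same links from `<td class="n">` cells while tolerating extra attributes and whitespace inside tags. `ListingParser` and `list_files` are hypothetical names, and the `class="n"` layout is assumed from the lighttpd listing targeted above. Hrefs are returned exactly as written in the page, so percent-encoded names stay encoded.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect href values of links inside <td class="n"> cells (lighttpd listing layout)."""

    def __init__(self):
        super().__init__()
        self.in_name_cell = False
        self.files = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td':
            self.in_name_cell = attrs.get('class') == 'n'
        elif tag == 'a' and self.in_name_cell:
            href = attrs.get('href') or ''
            # skip the parent link and subdirectories, which end with '/'
            if href and not href.endswith('/'):
                self.files.append(href)

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_name_cell = False


def list_files(html):
    """Return the file names linked from the listing's name column."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.files
```

In the script, `files = list_files(page)` would replace the `re.findall()` call.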

Source: https://stackoverflow.com/questions/13973188/requests-with-multiple-connections (licensed under CC BY-SA and MIT)
