首先需要区分Python的并行和并发
什么场景应用并行和并发呢?
1、单线程示例
import requests
import time
def download_one(url):
resp = requests.get(url)
print('Read {} from {}'.format(len(resp.content),url))
def download_all(sites):
for site in sites:
download_one(site)
def main():
sites = [
'https://en.wikipedia.org/wiki/Portal:Arts',
'https://en.wikipedia.org/wiki/Portal:History',
'https://en.wikipedia.org/wiki/Portal:Society',
'https://en.wikipedia.org/wiki/Portal:Biography',
'https://en.wikipedia.org/wiki/Portal:Mathematics',
'https://en.wikipedia.org/wiki/Portal:Technology',
'https://en.wikipedia.org/wiki/Portal:Geography',
'https://en.wikipedia.org/wiki/Portal:Science',
'https://en.wikipedia.org/wiki/Portal:Computer_science',
'https://en.wikipedia.org/wiki/Portal:Python_(programming_language)',
'https://en.wikipedia.org/wiki/Portal:Java_(programming_language)',
'https://en.wikipedia.org/wiki/Portal:PHP',
'https://en.wikipedia.org/wiki/Portal:Node.js',
'https://en.wikipedia.org/wiki/Portal:The_C_Programming_Language',
'https://en.wikipedia.org/wiki/Portal:Go_(programming_language)'
]
start_time = time.perf_counter()
download_all(sites)
end_time = time.perf_counter()
print('Download {} sites in {} second'.format(len(sites), end_time - start_time))
if __name__ == '__main__':
main()
2、多线程示例
import concurrent.futures
import requests
import time
def download_one(url):
resp = requests.get(url)
print('Read {} from {}'.format(len(resp.content),url))
def download_all(sites):
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as exector:
exector.map(download_one, sites)
def main():
sites = [
'https://en.wikipedia.org/wiki/Portal:Arts',
'https://en.wikipedia.org/wiki/Portal:History',
'https://en.wikipedia.org/wiki/Portal:Society',
'https://en.wikipedia.org/wiki/Portal:Biography',
'https://en.wikipedia.org/wiki/Portal:Mathematics',
'https://en.wikipedia.org/wiki/Portal:Technology',
'https://en.wikipedia.org/wiki/Portal:Geography',
'https://en.wikipedia.org/wiki/Portal:Science',
'https://en.wikipedia.org/wiki/Portal:Computer_science',
'https://en.wikipedia.org/wiki/Portal:Python_(programming_language)',
'https://en.wikipedia.org/wiki/Portal:Java_(programming_language)',
'https://en.wikipedia.org/wiki/Portal:PHP',
'https://en.wikipedia.org/wiki/Portal:Node.js',
'https://en.wikipedia.org/wiki/Portal:The_C_Programming_Language',
'https://en.wikipedia.org/wiki/Portal:Go_(programming_language)'
]
start_time = time.perf_counter()
download_all(sites)
end_time = time.perf_counter()
print('Download {} sites in {} second'.format(len(sites), end_time - start_time))
if __name__ == '__main__':
main()
上述两段代码,单线程和多线程版的主要区别在于:
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as exector:
exector.map(download_one, sites)
上述代码表示,创建了个线程池,总共有5个线程可以分配使用,map高阶函数表示并发的对sites的每一个元素调用函数download_one()
也可以通过并行的方式运行上述代码
with futures.ThreadPoolExecutor(workers) as executor
=>
with futures.ProcessPoolExecutor() as executor
在上述代码中,ProcessPoolExecutor()表示创建进程池,使用多个进程并行,这里,通常省略参数workers,因为系统会自动返回CPU的数量作为可以调用的进程数,但是使用多进程效果并不一定显著,因为并行的方式一般用在CPU heavy的场景中,而上述场景是I/O heavy
3、详解Futures
4、常用方法
submit()
除了上述的map调用方式,还可以使用submit方法来调用:
def download_all(sites):
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
executor.submit(download_one,site)
result()
def download_all(sites):
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
future = executor.submit(download_one,site)
result = future.result()
print(result)
如果download_one函数有返回值,那么result就是返回值,若没有,返回值为None
as_completed()
def download_all(sites):
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
futures = [executor.submit(download_one,site) for site in sites]
for future in concurrent.futures.as_completed(futures):
result = future.result()
print(result)