Advanced Python 11: Concurrency, Part 7: Efficiency Tests of Several Concurrency Approaches

Testing map, apply_async, and a gevent coroutine crawler
Test program: a web crawler

Function code

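# Imports inferred from the names used below; ThreadPool is assumed to be
# the thread-backed pool from multiprocessing.dummy.
import threading
from multiprocessing import Pool, Process, cpu_count
from multiprocessing.dummy import Pool as ThreadPool

# `urls` (the pages to fetch) and `process` (downloads a single URL) are
# defined elsewhere in the test script.


def thread_multi():
    # One thread per URL: start them all, then wait for all of them.
    threads = [threading.Thread(target=process, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def thread_map():
    # Thread pool; map blocks until every result is ready.
    pool = ThreadPool(max(1, cpu_count() - 1))
    results = pool.map(process, urls)
    pool.close()
    pool.join()
    print(results)


def thread_async():
    # Thread pool; apply_async returns an AsyncResult immediately.
    pool = ThreadPool(max(1, cpu_count() - 1))
    results = [pool.apply_async(process, args=(url,)) for url in urls]
    pool.close()
    pool.join()
    print([result.get() for result in results])


def process_multi():
    # One process per URL: start them all, then wait for all of them.
    processes = [Process(target=process, args=(url,)) for url in urls]
    for p in processes:
        p.start()
    for p in processes:
        p.join()


def process_map():
    # Process pool; map blocks until every result is ready.
    pool = Pool(processes=max(1, cpu_count() - 1))
    results = pool.map(process, urls)
    pool.close()
    pool.join()
    print(results)


def process_async():
    # Process pool; apply_async returns an AsyncResult immediately.
    pool = Pool(processes=max(1, cpu_count() - 1))
    results = [pool.apply_async(process, (url,)) for url in urls]
    pool.close()
    pool.join()
    print([result.get() for result in results])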
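
The pool-based functions differ mainly in how results are collected. Here is a minimal sketch, assuming a stand-in `fake_process` that sleeps instead of downloading (`fake_process`, `fake_urls`, and the two `collect_*` helpers are hypothetical names, not part of the test script). `map` blocks until every result is ready and returns them in input order, while `apply_async` hands back an `AsyncResult` right away and `get()` blocks per result.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool


def fake_process(url):
    # Hypothetical stand-in for the real download function.
    time.sleep(0.01)
    return len(url)


fake_urls = ["http://a.example", "http://bb.example", "http://ccc.example"]


def collect_with_map(items, workers=2):
    # map: one blocking call, results come back in input order.
    with ThreadPool(workers) as pool:
        return pool.map(fake_process, items)


def collect_with_apply_async(items, workers=2):
    # apply_async: submit everything first, then wait on each handle.
    with ThreadPool(workers) as pool:
        handles = [pool.apply_async(fake_process, (item,)) for item in items]
        return [handle.get() for handle in handles]
```

Both helpers return the same list here; the difference is only when the caller blocks.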

Test results: concurrentOpt

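| Round | thread_multi | thread_map | thread_async | process_multi | process_map | process_async |
|---|---|---|---|---|---|---|
| 0 | 5.732065 | 9.236784 | 7.831096 | 9.954077 | 9.778723 | 12.086315 |
| 1 | 4.868261 | 14.948347 | 8.431347 | 9.679722 | 17.086732 | 6.354689 |
| 2 | 7.17074 | 9.833528 | 12.248446 | 6.584711 | 17.405191 | 17.600024 |
| 3 | 11.223755 | 10.848167 | 9.235662 | 6.841372 | 9.969995 | 11.37249 |
| 4 | 10.391303 | 8.540373 | 8.236726 | 10.971645 | 8.964562 | 9.265784 |
| 5 | 8.3693 | 9.942565 | 9.541138 | 8.789822 | 8.266148 | 10.571744 |
| 6 | 8.799133 | 10.436757 | 13.669565 | 10.497021 | 9.668785 | 10.168379 |
| 7 | 5.603222 | 9.04568 | 9.843495 | 4.587275 | 14.596141 | 10.470989 |
| 8 | 9.003843 | 6.43141 | 9.941858 | 4.738146 | 8.170778 | 9.773284 |
| 9 | 9.414749 | 15.56822 | 9.23152 | 8.254023 | 8.781076 | 14.082026 |
| 10 | 8.0576371 | 10.4831831 | 9.8210853 | 8.0897814 | 11.2688131 | 11.1745724 |

The last row is the mean. Times are in seconds.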
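
The table layout (rounds 0 to 9 plus a mean row) suggests each function was timed ten times and averaged. Below is a minimal sketch of such a harness, assuming wall-clock timing with `time.perf_counter`; `benchmark` and `rounds` are hypothetical names, and the original script may measure differently.

```python
import time
from statistics import mean


def benchmark(funcs, rounds=10):
    """Time each function `rounds` times; return {name: [t0, ..., mean]}."""
    table = {}
    for func in funcs:
        timings = []
        for _ in range(rounds):
            start = time.perf_counter()
            func()
            timings.append(time.perf_counter() - start)
        # Append the mean as the final entry, like the last table row.
        table[func.__name__] = timings + [mean(timings)]
    return table
```

Calling `benchmark([thread_multi, thread_map, thread_async])` would then give one column per function, with the mean as the final value.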

Test results: concurrentOptGevent

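| Round | thread_multi | thread_map | thread_async | process_multi | process_map | process_async | gevent_test |
|---|---|---|---|---|---|---|---|
| 0 | 10.770623 | 10.072167 | 11.220298 | 11.308327 | 3.40E-05 | 1.00E-05 | 6.035623 |
| 1 | 5.628367 | 8.850531 | 8.939288 | 7.608235 | 3.20E-05 | 1.00E-05 | 6.700398 |
| 2 | 5.214341 | 6.726455 | 13.69806 | 6.808565 | 3.20E-05 | 1.10E-05 | 6.868222 |
| 3 | 6.849362 | 7.976406 | 6.922554 | 5.7132 | 1.70E-05 | 5.00E-06 | 3.650169 |
| 4 | 5.442727 | 8.179533 | 8.46556 | 5.084351 | 3.00E-05 | 1.10E-05 | 7.655325 |
| 5 | 7.949327 | 9.9421 | 9.234288 | 4.723601 | 3.30E-05 | 1.00E-05 | 4.739602 |
| 6 | 5.765848 | 9.534865 | 7.956348 | 9.004707 | 2.00E-05 | 6.00E-06 | 5.40825 |
| 7 | 3.89613 | 10.026686 | 6.81114 | 5.685556 | 3.60E-05 | 1.00E-05 | 4.534598 |
| 8 | 8.838316 | 10.292949 | 6.709802 | 9.328648 | 1.80E-05 | 5.00E-06 | 5.724812 |
| 9 | 7.144312 | 9.321319 | 9.64821 | 5.898414 | 3.30E-05 | 1.00E-05 | 8.251311 |
| 10 | 6.7499353 | 9.0923011 | 8.9605548 | 7.1163604 | 2.85E-05 | 8.80E-06 | 5.956831 |

The last row is the mean. Times are in seconds.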

Summary

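concurrentOpt:

| Function | Process or thread | Sync or async (not certain) | Blocking or non-blocking (not certain) | Mean time (s) |
|---|---|---|---|---|
| thread_multi | multithreading | async | non-blocking | 8.0576371 |
| thread_map | thread pool (batch) | sync | blocking | 10.4831831 |
| thread_async | thread pool | async | non-blocking | 9.8210853 |
| process_multi | multiprocessing | async | non-blocking | 8.0897814 |
| process_map | process pool (batch) | sync | blocking | 11.2688131 |
| process_async | process pool | async | non-blocking | 11.1745724 |

concurrentOptGevent:

| Function | Process or thread | Sync or async (not certain) | Blocking or non-blocking (not certain) | Mean time (s) |
|---|---|---|---|---|
| thread_multi | multithreading | async | non-blocking | 6.7499353 |
| thread_map | thread pool (batch) | sync | non-blocking | 9.0923011 |
| thread_async | thread pool | async | non-blocking | 8.9605548 |
| process_multi | multiprocessing | async | non-blocking | 7.1163604 |
| process_map | process pool (batch) | sync | non-blocking | hung |
| process_async | process pool | async | non-blocking | hung |
| gevent_test | coroutine | async | non-blocking | 5.956831 |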
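
The code for `gevent_test` is not shown in this post. Since gevent is a third-party package, the sketch below swaps in `asyncio` from the standard library to illustrate the same cooperative idea: many waits overlapping on a single thread. It is a stand-in, not the author's gevent code; `fake_fetch`, `crawl`, and `run_crawl` are hypothetical names, and `asyncio.sleep` takes the place of a network wait.

```python
import asyncio
import time


async def fake_fetch(url):
    # Awaiting yields control, so other coroutines run during the "wait".
    await asyncio.sleep(0.1)
    return url.upper()


async def crawl(urls):
    # gather runs the coroutines concurrently and keeps input order.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def run_crawl(urls):
    start = time.perf_counter()
    results = asyncio.run(crawl(urls))
    return results, time.perf_counter() - start
```

With five simulated URLs the total time stays close to a single wait rather than five, which is the effect the coroutine row is measuring.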

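Conclusions:
01, With gevent enabled, and leaving aside the cases that hung, both the thread and process variants ran roughly 1 s faster.
02, Coroutines were the fastest, beating every thread-based variant.
03, Multithreading downloaded slightly faster than multiprocessing.
04, For both processes and threads, the apply_async variants were faster than the map variants.
05, Leaving coroutines aside, plain multithreading was faster than the thread pool, and plain multiprocessing was faster than the process pool. This does not quite match theory; my feeling is that it comes from the small number of URLs.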
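
One possible contributor to conclusion 05, offered as an assumption rather than a measured finding: the pools are sized `max(1, cpu_count() - 1)`, so at most that many downloads are in flight at once, whereas `thread_multi` starts one thread per URL. For I/O-bound work a small pool queues up the waits. The sketch below simulates this with `time.sleep` in place of a download; `fake_download`, `run_pool`, and `run_thread_per_task` are hypothetical names.

```python
import threading
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed pool


def fake_download(_):
    time.sleep(0.1)  # simulated network wait


def run_pool(n_tasks, workers):
    # At most `workers` tasks wait at the same time.
    start = time.perf_counter()
    with ThreadPool(workers) as pool:
        pool.map(fake_download, range(n_tasks))
    return time.perf_counter() - start


def run_thread_per_task(n_tasks):
    # Every task gets its own thread, so all waits overlap.
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_download, args=(i,))
        for i in range(n_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With 8 tasks and 2 workers the pool needs about four rounds of waiting, while the thread-per-task version needs about one. Rerunning the real benchmark with a larger pool would show whether this accounts for the gap.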

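As for why the process pool hangs once gevent is enabled, I could not find any reliable material online. If anyone knows the cause, an explanation would be much appreciated.
Test code: concurrentOpt.py and concurrentOptGevent.py on GitHub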

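Advanced Python series
Advanced Python 01: Partial functions
Advanced Python 02: yield
Advanced Python 03: UnboundLocalError and NameError
Advanced Python 04: Synchronous vs. asynchronous and blocking vs. non-blocking I/O
Advanced Python 05: Concurrency, Part 1: Basic concepts
Advanced Python 06: Concurrency, Part 2: Key technical terms
Advanced Python 07: Concurrency, Part 3: Other issues
Advanced Python 08: Concurrency, Part 4: Differences between map, apply, map_async, and apply_async
Advanced Python 09: Concurrency, Part 5: Producer-consumer
Advanced Python 10: Concurrency, Part 6: Parallelizing existing code
Advanced Python 11: Concurrency, Part 7: Efficiency tests of several concurrency approaches
Advanced Python 12: Concurrency, Part 8: Multithreading and data synchronization
Advanced Python 13: Concurrency, Part 9: Multiprocessing and data sharing
Advanced Python 14: Variable scope (LEGB)
Advanced Python 15: Multiple inheritance and Mixin
Advanced Python 16: Showy tricks
Advanced Python 17: Regular expressions
Advanced Python 18: Garbage collection (GC)
Advanced Python 19: Decorators and closures
Advanced Python 20: actor
Advanced Python 21: The singleton pattern revisited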
