
A minimalist introduction to Python coroutines with asyncio, plus a crawler in practice


After learning about multithreading and multiprocessing in Python concurrent programming, let's take a look at asynchronous IO programming with asyncio: coroutines.

01

A brief introduction to coroutines

A coroutine (Coroutine), also called a micro-thread or fiber, is neither a process nor a thread. Its execution looks like an ordinary Python function call. In the asynchronous IO programming framework implemented by Python's asyncio module, a coroutine is what you get by calling an asynchronous function defined with the async keyword.

A process can contain multiple threads, just as an organ in the human body contains many kinds of cells doing their work; likewise, a program can contain multiple coroutines. Multiple threads are relatively independent, and switching between threads is controlled by the system. Similarly, multiple coroutines are also relatively independent, but switching between them is controlled by the program itself.

02

A simple example

Let's use a simple example to understand coroutines. First look at the following code:

import time

def display(num):
    time.sleep(1)
    print(num)

for num in range(10):
    display(num)

This is easy to understand: the program outputs the numbers 0 to 9, one number every second, so the whole run takes about 10 seconds. It is worth noting that, because neither multithreading nor multiprocessing (concurrency) is used, there is only one execution unit in the program (a single thread), and the time.sleep(1) call stalls that entire thread for 1 second.

For the code above, during that second the CPU sits idle, doing nothing.

Let's take a look at what happens when we use coroutines:

import asyncio

async def display(num):   # the async keyword before def turns this into an asynchronous function
    await asyncio.sleep(1)
    print(num)

Asynchronous functions differ from ordinary functions: calling an ordinary function gives you its return value, while calling an asynchronous function gives you a coroutine object. We need to put the coroutine object into an event loop so that it can cooperate with other coroutine objects, because the event loop is responsible for switching between coroutines.

Simply put, a coroutine that is blocked gives up the CPU to a coroutine that is ready to run.
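To actually see the benefit, the coroutine objects have to be handed to an event loop (covered in detail in the following sections). The lines below are a minimal sketch, assuming the async version of display above: the ten sleeps overlap, so the whole run takes roughly 1 second instead of 10.

import asyncio

async def display(num):
    await asyncio.sleep(1)   # while this coroutine sleeps, the loop runs the others
    print(num)

loop = asyncio.get_event_loop()
# run all ten coroutines concurrently; total time is about 1 second, not 10
loop.run_until_complete(asyncio.gather(*(display(num) for num in range(10))))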

03

Basic concepts

Asynchronous IO means that after the program initiates an IO operation (which would otherwise block while waiting), it does not have to wait for the IO operation to finish and can go on doing other things; when the IO operation completes, the program is notified and continues from where it left off. Asynchronous IO programming is one way to achieve concurrency, and it is suited to IO-intensive tasks.

The Python asyncio module provides an asynchronous programming framework; the overall flow chart is as follows:

The following describes each of these components at the code level.

async: defines a method (function); when called later, the method is not executed immediately but returns a coroutine object;

async def test():
    print('hello asynchronous')

test()   # calling the asynchronous function
Output: RuntimeWarning: coroutine 'test' was never awaited

coroutine: a coroutine object; it can be added to an event loop, which will call it;

async def test():
    print('hello asynchronous')

c = test()   # calling the asynchronous function returns the coroutine object c
print(c)
Output: <coroutine object test at 0x0000023FD05AA360>

event_loop: the event loop, which works like an infinite loop; you can add functions to it, and they are not executed immediately but are called by the loop once certain conditions are met;

import asyncio

async def test():
    print('hello asynchronous')

c = test()                        # calling the asynchronous function returns the coroutine object c
loop = asyncio.get_event_loop()   # create an event loop
loop.run_until_complete(c)        # hand the coroutine object to the loop and run the function body
Output: hello asynchronous

await: used to suspend execution at a blocking (time-consuming) call;

import asyncio

def running1():
    async def test1():
        print('1')
        await test2()
        print('2')

    async def test2():
        print('3')
        print('4')

    loop = asyncio.get_event_loop()
    loop.run_until_complete(test1())

if __name__ == '__main__':
    running1()

Output:
1
3
4
2

task: a task, a further encapsulation of a coroutine object that also carries the task's state;

import asyncio

async def test():
    print('hello asynchronous')

c = test()                        # calling the asynchronous function returns the coroutine object c
loop = asyncio.get_event_loop()   # create an event loop
task = loop.create_task(c)        # create a task from the coroutine
print(task)
loop.run_until_complete(task)     # run the task
Output:
<Task pending coro=<test() running at D: /xxxx.py>>   # the task object
hello asynchronous                                    # the asynchronous function's body is executed

future: represents a task that will be executed (or not) in the future; in practice there is no essential difference from a task, so no separate code example is shown here;

First, let's create a function in the ordinary (synchronous) way:

def func(url):
    print(f'Sending a request to {url}:')
    print(f'Request to {url} succeeded!')

func('www.baidu.com')

The results are shown below:

Sending a request to www.baidu.com:
Request to www.baidu.com succeeded!

04

Basic operation

Creating a coroutine object

An asynchronous function is defined with the async keyword; calling it returns a coroutine object.

During execution an asynchronous function can be suspended so that other asynchronous functions get to run; once the suspending condition (for example await asyncio.sleep(n)) finishes, execution resumes where it left off. Let's modify the code above:

async def func(url):
    print(f'Sending a request to {url}:')
    print(f'Request to {url} succeeded!')

func('www.baidu.com')

The result is as follows:

RuntimeWarning: coroutine 'func' was never awaited

This is what was mentioned earlier: with the async keyword, calling the function yields a coroutine object. A coroutine cannot run by itself; it needs to be added to an event loop, which then calls it at the appropriate time.

Creating a task object

A task object is a further encapsulation of a coroutine object:

import asyncio

async def func(url):
    print(f'Sending a request to {url}:')
    print(f'Request to {url} succeeded!')

c = func('www.baidu.com')         # calling the function returns the coroutine object c
loop = asyncio.get_event_loop()   # create an event loop object
task = loop.create_task(c)
loop.run_until_complete(task)     # register the task and start the loop
print(task)

The result is as follows:

Sending a request to www.baidu.com:
Request to www.baidu.com succeeded!
<Task finished coro=<func() done, defined at D:/data_/test.py:10> result=None>

Using a future

As mentioned earlier, there is no essential difference between a future and a task:

import asyncio

async def func(url):
    print(f'Sending a request to {url}:')
    print(f'Request to {url} succeeded!')

c = func('www.baidu.com')                # calling the function returns the coroutine object c
loop = asyncio.get_event_loop()          # create an event loop object
future_task = asyncio.ensure_future(c)   # wrap the coroutine in a future (in fact a Task)
print(future_task, 'not yet executed')
loop.run_until_complete(future_task)     # register and start
print(future_task, 'executed')

The result is as follows:

<Task pending coro=<func() running at D:/data/test.py:10>> not yet executed
Sending a request to www.baidu.com:
Request to www.baidu.com succeeded!
<Task finished coro=<func() done, defined at D:/data/test.py:10> result=None> executed

Using the await keyword

Inside an asynchronous function, the await keyword can be used to suspend at time-consuming operations (such as network requests, file reads, and other IO operations). When an asynchronous program reaches a step that would need a long wait, it simply suspends there and lets other asynchronous functions run instead.

import asyncio, time

async def do_some_work(n):          # define an asynchronous function with the async keyword
    print('Waiting: {} second(s)'.format(n))
    await asyncio.sleep(n)          # sleep for a while without blocking the loop
    return 'Returned after {} second(s), ending the operation'.format(n)

start_time = time.time()            # start time
coro = do_some_work(2)
loop = asyncio.get_event_loop()     # create an event loop object
loop.run_until_complete(coro)
print('Elapsed time:', time.time() - start_time)

The results are as follows:

Waiting: 2 second(s)
Elapsed time: 2.001312017440796

05

Multi-task coroutines

A Task object wraps a coroutine object and keeps the state of the running coroutine; the run_until_complete() method registers a task with the event loop and runs it.

If we want to run multiple tasks, we need to register a list of tasks at once, which can be done with run_until_complete(asyncio.wait(tasks)), where tasks is a sequence of tasks (usually a list).

Registering multiple tasks can also be done with run_until_complete(asyncio.gather(*tasks)).

import asyncio, time

async def do_some_work(i, n):       # define an asynchronous function with the async keyword
    print('Task {} waiting: {} second(s)'.format(i, n))
    await asyncio.sleep(n)          # sleep for a while without blocking the loop
    return 'Task {} returned after {} second(s), ending the operation'.format(i, n)

start_time = time.time()            # start time
tasks = [asyncio.ensure_future(do_some_work(1, 2)),
         asyncio.ensure_future(do_some_work(2, 1)),
         asyncio.ensure_future(do_some_work(3, 3))]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
for task in tasks:
    print('Task execution result:', task.result())
print('Elapsed time:', time.time() - start_time)

The results are as follows:

Task 1 waiting: 2 second(s)
Task 2 waiting: 1 second(s)
Task 3 waiting: 3 second(s)
Task execution result: Task 1 returned after 2 second(s), ending the operation
Task execution result: Task 2 returned after 1 second(s), ending the operation
Task execution result: Task 3 returned after 3 second(s), ending the operation
Elapsed time: 3.0028676986694336
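As mentioned above, asyncio.gather can be used in place of asyncio.wait. Unlike wait, gather returns the results directly, in the order the coroutines were passed in. Below is a minimal variant of the same example, reusing the do_some_work coroutine and the imports from the listing above:

start_time = time.time()
loop = asyncio.get_event_loop()
results = loop.run_until_complete(asyncio.gather(
    do_some_work(1, 2),
    do_some_work(2, 1),
    do_some_work(3, 3),
))
for result in results:   # gather returns the results in the order the coroutines were passed
    print('Task execution result:', result)
print('Elapsed time:', time.time() - start_time)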

06

Practice | Crawling LOL skins

First, open the official website :

You can see the list of heroes (not shown in detail here). Each hero has many skins; our goal is to crawl every hero's skins and save them into a folder named after that hero.

Open one hero's skin page, as shown below:

This is the skin page of the Dark Child; the small images below correspond to her different skins. Then, by inspecting the Network tab, we find the corresponding skin data in a js file.

Then we look for the pattern in the urls where the hero skin data is stored:

url1 = 'https://game.gtimg.cn/images/lol/act/img/js/hero/1.js'
url2 = 'https://game.gtimg.cn/images/lol/act/img/js/hero/2.js'
url3 = 'https://game.gtimg.cn/images/lol/act/img/js/hero/3.js'

We find that only the id parameter changes, so the url can be constructed dynamically:

'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'.format(i)

However, only the first ids are sequential; for the rest you have to look up each hero's id on the page that lists all heroes, so crawling everything would require collecting the ids first. Since the earlier ids are in order, here we only crawl the skins of the first 20 heroes.

1. Get the hero skin url addresses:

The ids of the first heroes are sequential, so range(1, 21) can be used to construct the urls dynamically:

def get_page():
    page_urls = []
    for i in range(1, 21):
        url = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'.format(i)
        print(url)
        page_urls.append(url)
    return page_urls

2. Request each page url address

Parse each response to get the skin image url addresses:

import json
import requests

def get_img():
    img_urls = []
    page_urls = get_page()
    for page_url in page_urls:
        res = requests.get(page_url, headers=headers)   # headers: a request-headers dict (e.g. User-Agent), defined elsewhere
        result = res.content.decode('utf-8')
        res_dict = json.loads(result)
        skins = res_dict["skins"]
        for hero in skins:
            item = {}
            item['name'] = hero["heroName"]
            item['skin_name'] = hero["name"]
            if hero["mainImg"] == '':
                continue
            item['imgLink'] = hero["mainImg"]
            print(item)
            img_urls.append(item)
    return img_urls

Explanation:

  • res_dict = json.loads(result): converts the json string we obtained into a dictionary;

  • heroName: the hero's name (the same for every skin of a hero, which makes it easy to create one folder per hero later);

  • name: the full name, including the skin name (different for every skin). Some entries have an empty 'mainImg', so we need to check for that;

3. Create the coroutine function

Here we create a folder based on the hero's name and then build the image file name; don't forget the /, so the directory structure is set up correctly. A sketch of such a save_img coroutine is shown below.
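The original listing for save_img is not reproduced here; the following is only a minimal sketch, assuming aiohttp is used for the asynchronous download and that each item is the dict produced by get_img above ('name', 'skin_name', 'imgLink'):

import os
import aiohttp

async def save_img(index, img):
    # img is one item produced by get_img(): {'name': ..., 'skin_name': ..., 'imgLink': ...}
    folder = os.path.join('skin', img['name'])   # one folder per hero
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, img['skin_name'].replace('/', '') + '.jpg')
    async with aiohttp.ClientSession() as session:
        async with session.get(img['imgLink']) as resp:   # suspend here while downloading
            content = await resp.read()
    with open(path, 'wb') as f:                           # write the image to disk
        f.write(content)
    print(index, 'saved', path)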

The main function :

def main():
    loop = asyncio.get_event_loop()
    img_urls = get_img()
    print(len(img_urls))
    tasks = [save_img(img[0], img[1]) for img in enumerate(img_urls)]
    try:
        loop.run_until_complete(asyncio.wait(tasks))
    finally:
        loop.close()
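As an aside, on recent Python versions asyncio.get_event_loop() outside a running loop is deprecated, and from Python 3.11 asyncio.wait() no longer accepts bare coroutines. A rough modern equivalent of the main function, assuming the same get_img and save_img helpers, could use asyncio.run and asyncio.gather:

import asyncio

async def main_async():
    img_urls = get_img()
    print(len(img_urls))
    # schedule all downloads concurrently; asyncio.run manages the event loop for us
    await asyncio.gather(*(save_img(i, img) for i, img in enumerate(img_urls)))

if __name__ == '__main__':
    asyncio.run(main_async())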

4. The program runs

if __name__ == '__main__':
    start = time.time()
    main()
    end = time.time()
    print(end - start)

Running results :

Downloading 233 pictures took about 42 s, which is quite fast. The resulting directory structure is as follows:

Comparison with requests

After crawling the images asynchronously, we crawl the same data synchronously with requests and compare the efficiency. The modification to the original code is skipped here since most of it is the same; essentially, the event-loop part is replaced with an ordinary loop:

img_urls = get_img()
print(len(img_urls))
for i, img_url in enumerate(img_urls):
    save_img(i, img_url)   # here save_img is the synchronous, requests-based version

We can see that the coroutine version is noticeably faster than the synchronous requests version.

That is the whole content of this article; interested readers are encouraged to type out the code themselves~
