
Python zero-basics crawler exercise: how to use Python to crawl Gaode Maps


Hello everyone, I am Ambitious.

This time I'll introduce a super simple demo for crawling a dynamic web page.

When it comes to dynamic web pages, how much do you know about them?

Simply put, to get the data of a static web page you only need to send a request to the page's URL. The data of a dynamic web page, however, lives in a back-end database, so to get it we must send the request to the URL of the data interface, not the URL of the page itself.

OK, let's get to the point.

1. Analyze the structure of the web page

This blog post starts with Gaode Maps: https://www.amap.com/

After opening it, we find a bunch of div tags but none of the data we need. At this point we can conclude that it is a dynamic web page, and we need to find the data interface.

Click on the Network tab and we can see that the page sends a lot of requests to the server. There is so much there that finding the right one would take too much time.

Clicking the XHR category filters out a lot of irrelevant files and saves a lot of time.

XHR-type requests are sent via XMLHttpRequest, which exchanges data with the server in the background. This means part of a page can be updated without reloading the whole page. In other words, the data requested from the database comes back in an XHR-type response.

Then we can start searching under the XHR type, and we find the following data.

By looking at Headers we get the URL.

After opening the URL, we find that it is the weather conditions for the last two days.

After opening it we can see the situation above. This is data in JSON format: its information is organized as a dictionary, and the part we want is stored under the "data" key.

OK, we've found the JSON data. Let's compare it and see whether it is what we are looking for.

Comparing them, the data corresponds exactly, which means we have found our data.

2. Get the relevant URL


OK, we've got the URL. Next comes the concrete code implementation. How do we do it?

We know that JSON data can be turned into a dictionary with response.json(), after which we can work with the dictionary directly.
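As a minimal illustration of that (the exact response layout here is an assumption modelled on the DevTools screenshots, not a verified schema), once the JSON is a dictionary everything is plain key access:

```python
import json

# A simplified stand-in for what response.json() would return from the
# city-list interface. The "data" and "cityByLetter" keys are assumptions
# based on the screenshots; the real payload has more fields.
raw = """
{
  "data": {
    "cityByLetter": {
      "A": [{"adcode": "340800", "name": "anqing"}]
    }
  }
}
"""

payload = json.loads(raw)  # the same kind of dict response.json() gives us
cities = payload["data"]["cityByLetter"]
first = cities["A"][0]
print(first["name"], first["adcode"])
```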

3. Code implementation

Now that we know where the data is, we can start writing the code.

3.1 Query all city names and codes

First we fetch the page, adding headers to disguise the request as coming from a browser, to avoid being identified and blocked when accessing the data interface.

url_city = "https://www.amap.com/service/cityList?version=202092419"

After we get the data we want, a quick search shows that the codes and names we need are inside cityByLetter, so let's go get them.

 if "data" in content:
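Putting that together, here is a hedged sketch of this step. The version query parameter is copied from the post, and the cityByLetter layout is an assumption based on the screenshots, so verify both against DevTools before relying on them:

```python
import requests

CITY_URL = "https://www.amap.com/service/cityList?version=202092419"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # disguise the request as a browser


def parse_city_list(data):
    """Flatten the assumed {"cityByLetter": {letter: [{"adcode", "name"}]}}
    layout into a single {name: adcode} mapping."""
    cities = {}
    for letter_group in data.get("cityByLetter", {}).values():
        for city in letter_group:
            cities[city["name"]] = city["adcode"]
    return cities


def fetch_city_list():
    """Fetch the city-list interface and return {name: adcode}."""
    content = requests.get(CITY_URL, headers=HEADERS).json()
    if "data" in content:
        return parse_city_list(content["data"])
    return {}
```

parse_city_list is kept separate from the network call so the flattening logic can be checked without touching the server.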

3.2 Query the weather by city code

Now that we have the codes and names, the next step is naturally the weather query!

Let's look at the interface first.

From the figure above, we can identify the maximum temperature, minimum temperature, and so on, so we can crawl the data accordingly.

url_weather = "https://www.amap.com/service/weather?adcode={}"
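A sketch of the weather query, with the same caveat: the adcode URL template comes from the post, but the path down to the forecast list and the field names (forecast_data, max_temp, min_temp) are assumptions read off the screenshot, so check them in DevTools:

```python
import requests

WEATHER_URL = "https://www.amap.com/service/weather?adcode={}"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # disguise the request as a browser


def parse_forecast(day):
    """Pick the fields we care about out of one forecast entry.
    The key names here are assumptions, not a verified schema."""
    return {"max_temp": day.get("max_temp"), "min_temp": day.get("min_temp")}


def query_weather(adcode):
    """Fetch and parse the forecast list for one city code."""
    content = requests.get(WEATHER_URL.format(adcode), headers=HEADERS).json()
    # Assumed path down to the per-day forecast list.
    blocks = content.get("data", {}).get("data") or []
    days = blocks[0].get("forecast_data", []) if blocks else []
    return [parse_forecast(day) for day in days]
```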

OK, our idea has become reality.

4. Complete code

# encoding: utf-8
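Only the first line of the original listing survives in this copy, so below is a reconstructed sketch of the whole script under the assumptions already flagged above (the interface URLs, the version parameter, and all JSON key names should be re-checked against DevTools):

```python
# encoding: utf-8
import requests

CITY_URL = "https://www.amap.com/service/cityList?version=202092419"
WEATHER_URL = "https://www.amap.com/service/weather?adcode={}"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # disguise as a browser


def get_cities():
    """Return a {name: adcode} mapping built from cityByLetter."""
    content = requests.get(CITY_URL, headers=HEADERS).json()
    cities = {}
    if "data" in content:
        for group in content["data"].get("cityByLetter", {}).values():
            for city in group:
                cities[city["name"]] = city["adcode"]
    return cities


def get_weather(adcode):
    """Return [{"max_temp": ..., "min_temp": ...}, ...] for one city.
    The data/data[0]/forecast_data path and field names are assumptions."""
    content = requests.get(WEATHER_URL.format(adcode), headers=HEADERS).json()
    blocks = content.get("data", {}).get("data") or []
    days = blocks[0].get("forecast_data", []) if blocks else []
    return [{"max_temp": d.get("max_temp"), "min_temp": d.get("min_temp")}
            for d in days]


if __name__ == "__main__":
    cities = get_cities()
    # Print a small sample rather than hammering the server for every city.
    for name, adcode in list(cities.items())[:3]:
        print(name, adcode, get_weather(adcode))
```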

5. Save the results
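The saving code itself is not shown in this copy of the post, so here is one plausible way to persist the results to CSV, assuming each row is (city, max_temp, min_temp) as in the sketches above:

```python
import csv


def save_weather_csv(path, rows):
    """Write (city, max_temp, min_temp) rows to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["city", "max_temp", "min_temp"])
        writer.writerows(rows)


# Example with made-up values:
save_weather_csv("weather.csv", [("beijing", "31", "22")])
```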

