
Collecting Meituan takeout data with Python


Prerequisites (https://jq.qq.com/?_wv=1027&k=Ap5XvyNN)

1. Dynamic packet capture (inspecting the requests a page makes)
2. Parsing JSON data
3. Using the requests module
4. Saving data to CSV

Install command for requests: pip install requests

Installing modules:

To install a third-party Python module, do either of the following:

  1. Press Win + R, type cmd and click OK, then enter the install command pip install <module name> (for example, pip install requests) and press Enter
  2. In PyCharm, click Terminal and enter the install command there

How do I configure the Python interpreter in PyCharm?

  1. Choose File >>> Settings >>> Project >>> Python Interpreter
  2. Click the gear icon and choose Add
  3. Add the Python installation path

How do I install plug-ins in PyCharm?

  1. Choose File >>> Settings >>> Plugins
  2. Click Marketplace and enter the name of the plug-in you want, for example "translation" for a translation plug-in or "Chinese" for the Chinese language pack
  3. Select the matching plug-in and click Install
  4. After a successful installation, PyCharm offers to restart; click OK and restart for the plug-in to take effect

Approach and workflow: if you can see it in the browser, you can scrape it …

I. Data source analysis

Use the browser's developer tools to capture network traffic and work out where the data you want comes from.

Start the analysis from the second page of results:

Open the developer tools
Click through to the second page
Click the search button and search for the content you want
Inspect the response data returned by the server

II. Code implementation steps: send a request >>> get the data >>> parse the data >>> save the data

Use code to simulate a browser sending a request and obtain the data:

  1. Send a request to the URL identified in the analysis above
  2. Get the data returned by the server
  3. Parse the data and extract the basic shop information we want
  4. Save the data to a table (a CSV file)

Code

1. Import the modules

import requests  # HTTP request module
import pprint  # pretty-printing module
import csv  # built-in CSV module
import time
import re


def get_shop_info(html_url):
    # url = 'https://www.meituan.com/xiuxianyule/193306807/'
    # The Cookie was copied from a logged-in browser session; replace it with your own
    headers = {
        'Cookie': '_lxsdk_cuid=17e102d3914c8-000093bbbb0ed8-4303066-1fa400-17e102d3914c8; __mta=48537241.1640948906361.1640948906361.1640948906361.1; _hc.v=e83bebb5-d6ee-d90e-dd4b-4f2124f8f982.1640951715; ci=70; rvct=70; mt_c_token=2Tmbj8_Qihel3QR9oEXS4nEpnncAAAAABBEAAB9N2m2JXSE0N6xtRrgG6ikfQZQ3NBdwyQdV9vglW8XGMaIt38Lnu1_89Kzd0vMKEQ; iuuid=3C2110909379198F1809F560B5E33A58B83485173D8286ECD2C7F8AFFCC724B4; isid=2Tmbj8_Qihel3QR9oEXS4nEpnncAAAAABBEAAB9N2m2JXSE0N6xtRrgG6ikfQZQ3NBdwyQdV9vglW8XGMaIt38Lnu1_89Kzd0vMKEQ; logintype=normal; cityname=%E9%95%BF%E6%B2%99; _lxsdk=3C2110909379198F1809F560B5E33A58B83485173D8286ECD2C7F8AFFCC724B4; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; latlng=28.302546%2C112.868692; ci3=70; uuid=f7c4d3664ab34f13ad7f.1650110501.1.0.0; mtcdn=K; lt=9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA; u=266252179; n=qSP946594369; token2=9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA; unc=qSP946594369; firstTime=1650118043342; _lxsdk_s=18032a80c4c-4d4-d30-e8f%7C%7C129',
        'Referer': 'https://chs.meituan.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    # print(response.text)
    phone = re.findall('"phone":"(.*?)"', response.text)[0]
    # In the page source, \n is a literal backslash followed by n (an escaped sequence),
    # not a real newline, so it is removed as '\\n'
    openTime = re.findall('"openTime":"(.*?)"', response.text)[0].replace('\\n', '')
    address = re.findall('"address":"(.*?)"', response.text)[0]
    shop_info = [phone, openTime, address]
    return shop_info


# Open the output file. encoding='utf-8' sets the file encoding; if the CSV looks
# garbled when opened in Excel, 'utf-8-sig' is a common alternative
# mode='w' overwrites the file, mode='a' appends to it
f = open('meituan_shops.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=[
    ' Shop name ',
    ' Per capita consumption ',
    ' minimum consumption ',
    ' Business circle ',
    ' Store type ',
    ' score ',
    ' Telephone ',
    ' Business Hours ',
    ' Address ',
    ' latitude ',
    ' longitude ',
    ' Details page ',
])
csv_writer.writeheader()  # write the header row
# html_url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/70?uuid=f7c4d3664ab34f13ad7f.1650110501.1.0.0&userid=266252179&limit=32&offset=64&cateId=-1&q=%E4%BC%9A%E6%89%80&token=9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA'
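
The regular-expression extraction used in get_shop_info can be tried offline. The sketch below is only an illustration: the sample string and the helper name extract_field are made up for the demonstration and do not come from a real Meituan page. Unlike indexing [0] on the findall result, which raises IndexError when nothing matches, the helper returns an empty string for a missing field.

```python
import re

# Hypothetical fragment imitating the JSON embedded in a shop detail page.
# The raw string keeps \n as two characters (backslash + n), as in page source.
SAMPLE_HTML = r'{"phone":"0731-0000000","openTime":"Mon-Sun\n10:00-22:00","address":"Sample Road 1"}'


def extract_field(name, text):
    """Return the first value of a "name":"value" pair, or '' if it is absent."""
    matches = re.findall(f'"{name}":"(.*?)"', text)
    return matches[0] if matches else ''


phone = extract_field('phone', SAMPLE_HTML)
# '\\n' is the two-character sequence backslash + n, not a newline character
open_time = extract_field('openTime', SAMPLE_HTML).replace('\\n', ' ')
address = extract_field('address', SAMPLE_HTML)
missing = extract_field('email', SAMPLE_HTML)  # field not present in the sample

print([phone, open_time, address, missing])
```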

1. Send a request to the URL identified above. Paging through the results shows how the request URL changes: only the offset parameter differs, increasing by 32 per page

for page in range(0, 321, 32):  # offsets 0, 32, 64, 96, ..., 320
    time.sleep(1.5)  # wait 1.5 s between requests
    url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/70'
    # PyCharm tip: Ctrl + R opens find-and-replace; with regular expressions enabled
    # it can turn copied request parameters into dictionary entries in bulk
    data = {
        'uuid': 'f7c4d3664ab34f13ad7f.1650110501.1.0.0',
        'userid': '266252179',
        'limit': '32',
        'offset': page,
        'cateId': '-1',
        'q': '会所',  # search keyword ("clubhouse")
        'token': '9WbeLmhHHLhTVpnVu264fUCMYeIAAAAAQREAAKnrFL00wW5eC7mPjhHwIZwkUL11aa7lM7wOfgoO53f0uJpjKSRpO6LwCBDd9Fm-wA',
    }
    # headers disguise the Python script as a browser
    # User-Agent: basic identity of the browser; the simplest anti-scraping check,
    # sent so the request is not flagged as a crawler
    # Referer: hotlink protection; tells the server which page the request came from
    headers = {
        'Referer': 'https://chs.meituan.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36'
    }
    response = requests.get(url=url, params=data, headers=headers)
    # print(response)  # <Response [403]> means access was denied (hotlink protection); <Response [200]> means the request succeeded
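
To see what the loop requests on each page, the query string can be built without touching the network. This sketch uses the standard library's urllib.parse.urlencode as a stand-in for what requests does with params=. The helper build_page_url is hypothetical, and the uuid, userid and token values are placeholders; the keyword 会所 is the decoded form of q=%E4%BC%9A%E6%89%80 in the commented-out URL.

```python
from urllib.parse import urlencode

BASE_URL = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/70'
PAGE_SIZE = 32


def build_page_url(offset, keyword):
    """Build the search URL for one page; uuid, userid and token are placeholders."""
    params = {
        'uuid': 'YOUR_UUID',
        'userid': 'YOUR_USER_ID',
        'limit': PAGE_SIZE,
        'offset': offset,
        'cateId': -1,
        'q': keyword,
        'token': 'YOUR_TOKEN',
    }
    return f'{BASE_URL}?{urlencode(params)}'


offsets = list(range(0, 321, PAGE_SIZE))  # 0, 32, 64, ..., 320
urls = [build_page_url(offset, '会所') for offset in offsets]

print(len(urls))  # number of pages the loop walks through
print(urls[2])    # third page, offset=64
```

The third URL carries offset=64, the same offset as in the commented-out example URL.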


2. Get the data: response.text returns the body as a string; response.json() parses it into a dictionary

    # print(response.json())
    # pprint.pprint(response.json())  # the instructor's Python version is 3.8
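
The difference between the two accessors can be shown without a live request. In this sketch the body string is invented, and json.loads plays the role of response.json(), which parses the same text into Python objects.

```python
import json

# Hypothetical response body, shaped like the structure the tutorial reads from
body_text = '{"data": {"searchResult": [{"id": 1, "title": "Shop A"}]}}'

parsed = json.loads(body_text)  # what response.json() would give for this body

print(type(body_text).__name__)  # the raw text is a string
print(type(parsed).__name__)     # the parsed result is a dictionary
print(parsed['data']['searchResult'][0]['title'])
```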
3. Parse the data: look up dictionary values by key, that is, use the content to the left of the colon (the key) to extract the content to the right of the colon (the value)
    searchResult = response.json()['data']['searchResult']
    for index in searchResult:  # take the entries out of the list one by one
        # pprint.pprint(index)
        href = f'https://www.meituan.com/xiuxianyule/{index["id"]}/'
        shop_info = get_shop_info(href)
        title = index['title']  # shop name
        price = index['avgprice']  # average spend per person
        lowest_price = index['lowestprice']  # minimum spend
        area = index['areaname']  # business district
        shop_type = index['backCateName']  # shop category
        score = index['avgscore']  # rating
        latitude = index['latitude']  # latitude
        longitude = index['longitude']  # longitude
        # PyCharm tips: Ctrl + D duplicates a line, Tab indents the selection,
        # Shift + Tab removes the indent
        dit = {
            ' Shop name ': title,
            ' Per capita consumption ': price,
            ' minimum consumption ': lowest_price,
            ' Business circle ': area,
            ' Store type ': shop_type,
            ' score ': score,
            ' Telephone ': shop_info[0],
            ' Business Hours ': shop_info[1],
            ' Address ': shop_info[2],
            ' latitude ': latitude,
            ' longitude ': longitude,
            ' Details page ': href,
        }
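
Direct key lookups such as index['title'] raise KeyError when a result lacks a field. Below is a minimal sketch of a more forgiving variant that uses dict.get with a default. The sample record and the helper parse_result are hypothetical; only the key names and the detail-page URL pattern come from the code above.

```python
# Hypothetical search result; real records contain many more keys
sample_result = {
    'id': 193306807,
    'title': 'Shop A',
    'avgprice': 88,
    'avgscore': 4.5,
    'latitude': 28.2,
    'longitude': 112.9,
}

FIELDS = ['title', 'avgprice', 'lowestprice', 'areaname',
          'backCateName', 'avgscore', 'latitude', 'longitude']


def parse_result(result):
    """Pick the fields of interest; missing keys become empty strings."""
    row = {field: result.get(field, '') for field in FIELDS}
    # The id is still required, because the detail-page URL is built from it
    row['href'] = f'https://www.meituan.com/xiuxianyule/{result["id"]}/'
    return row


row = parse_result(sample_result)
print(row)
```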

4. Save the data

        csv_writer.writerow(dit)
        print(dit)
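
Because the file is opened in append mode, calling writeheader() on every run adds another header row each time the script is re-run. One possible way around this, sketched below, is to write the header only when the file is new or empty. The helper append_rows, the file name and the three column names are illustrative, not part of the original script.

```python
import csv
import os
import tempfile

FIELDNAMES = ['Shop name', 'Score', 'Address']  # illustrative subset of columns


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode='a', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)


# Two separate calls simulate two runs of the scraper against the same file
demo_path = os.path.join(tempfile.mkdtemp(), 'shops_demo.csv')
append_rows(demo_path, [{'Shop name': 'Shop A', 'Score': 4.5, 'Address': 'Sample Road 1'}])
append_rows(demo_path, [{'Shop name': 'Shop B', 'Score': 4.0, 'Address': 'Sample Road 2'}])

with open(demo_path, encoding='utf-8', newline='') as f:
    lines = f.read().splitlines()
print(lines)
```

The with block also closes the file after each batch, so rows are flushed to disk even if a later request fails.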

Closing words

That's the end of the article!

If you have suggestions or questions, leave a comment or send me a private message! Let's keep working hard together (ง •_•)ง

If you liked it, follow the blogger, or like, bookmark and comment on the article!
