[Python crawler] Scraping product information from Vipshop (唯品會)

Editor: Python

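Steps for collecting Vipshop product information:

  1. Get each brand's ID and name;
  2. Get the total number of pages in the current brand's product list;
  3. Get the information for the products on each page of that list.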
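The three steps chain together as nested loops: keyword, then brand, then page offset. The sketch below shows only that control flow. `crawl`, `fetch_brands`, `fetch_page_info` and `fetch_page` are hypothetical names standing in for the HTTP calls, so the loop can be exercised offline; it is not the author's code.

```python
from typing import Callable, Iterable, List, Tuple


# Hypothetical callables standing in for the three steps:
#   fetch_brands(keyword)                   -> [(brand_id, brand_name), ...]
#   fetch_page_info(keyword, brand_id)      -> (total_num, batch_size)
#   fetch_page(keyword, brand_id, offset)   -> [goods_id, ...] on that page
def crawl(
    keywords: Iterable[str],
    fetch_brands: Callable[[str], List[Tuple[int, str]]],
    fetch_page_info: Callable[[str, int], Tuple[int, int]],
    fetch_page: Callable[[str, int, int], List[int]],
) -> List[Tuple[str, int, str, int]]:
    """Chain the three steps; return (keyword, brand_id, brand_name, goods_id) rows."""
    rows = []
    for keyword in keywords:
        # Step 1: brands matching this keyword
        for brand_id, brand_name in fetch_brands(keyword):
            # Step 2: total product count and page size for this brand
            total_num, batch_size = fetch_page_info(keyword, brand_id)
            if total_num <= 0 or batch_size <= 0:
                continue  # nothing to page through (or the lookup failed)
            # Step 3: walk every page by offset: 0, batch_size, 2*batch_size, ...
            for offset in range(0, total_num, batch_size):
                for goods_id in fetch_page(keyword, brand_id, offset):
                    rows.append((keyword, brand_id, brand_name, goods_id))
    return rows
```

With stubs that report 5 products at 2 per page, `crawl` requests offsets 0, 2 and 4 and collects all five IDs.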

I. Get the brand IDs and brand names

import datetime
import random
import time
from urllib.parse import urlencode


# Method of the crawler class; self.sc is the author's helper object
# (get_html, store_data, collect_error), which is not shown in the article.
def get_task(self, task_list=None):
    '''
    Build crawl tasks: for each product keyword, fetch the matching brands
    and store (goods_type, brand_id, brand_name, add_time, is_state) rows.
    :return: None
    '''
    try:
        task_list = task_list or []
        print("Product keywords:", task_list)
        for task_ in task_list:
            key_dit = {
                "keyword": task_
            }
            url_str = urlencode(key_dit)
            start_api = f'''https://mapi-rp.vip.com/vips-mobile/rest/shop/search/brand_store/get/v3?app_name=shop_wap&app_version=4.0&api_key=8cec5243ade04ed3a02c5972bcda0d3f&mobile_platform=2&source_app=yd_wap&warehouse=VIP_NH&fdc_area_id=104104101&province_id=104104&mars_cid=1584322664117_812f182347fe5848add8d04b91257af6&mobile_channel=mobiles-adp%3Ag1o71nr0%3A%3A%3A%3A%7C%7C&standby_id=nature&channel_id=1&isAZSort=1&gPlatform=WAP&mvip=true&_=1599117093&{url_str}'''
            print(task_, " ", start_api)
            task_resp = self.sc.get_html(start_api)
            if not task_resp:
                continue
            task_json = task_resp.json()
            if task_json['code'] == 1:
                brand_list = task_json['data']['list']
                # print(brand_list)
                rows = []  # rows for the task table
                for brand_dict in brand_list:
                    # Brand ID
                    brand_id = int(brand_dict['id'])
                    # Brand name
                    brand_name = brand_dict['name']
                    # Insert time
                    add_time = datetime.datetime.now()
                    # is_state = 1: task not crawled yet
                    rows.append((task_, brand_id, brand_name, add_time, 1))
                if rows:
                    sql = f'''insert into {self.task_tbl}(goods_type,brand_id,brand_name,add_time,is_state)
                    values(%s,%s,%s,%s,%s)'''
                    print("Current tasks:", rows)
                    self.sc.store_data(sql, data_list=rows)
    except Exception:
        self.sc.collect_error()

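The response parsing in `get_task` can be pulled out into a pure function so it can be tested without network access. This is a minimal sketch assuming the response shape that the code above reads (`code`, `data.list[].id`, `data.list[].name`); `parse_brand_rows` is a hypothetical helper, not part of the original crawler.

```python
import datetime
from typing import Any, Dict, List, Optional, Tuple

# (goods_type, brand_id, brand_name, add_time, is_state)
BrandRow = Tuple[str, int, str, datetime.datetime, int]


def parse_brand_rows(
    keyword: str,
    resp: Dict[str, Any],
    now: Optional[datetime.datetime] = None,
) -> List[BrandRow]:
    """Turn a brand-search JSON response into task-table rows.

    Assumes the shape used by get_task:
    {"code": 1, "data": {"list": [{"id": "...", "name": "..."}, ...]}}
    """
    if resp.get("code") != 1:
        return []  # the API reported failure; nothing to store
    now = now or datetime.datetime.now()
    rows = []
    for brand in (resp.get("data") or {}).get("list") or []:
        # is_state = 1 marks the task as "not crawled yet"
        rows.append((keyword, int(brand["id"]), brand["name"], now, 1))
    return rows
```

Passing `now` explicitly keeps the output deterministic in tests; in the crawler it can be left out.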
II. Get the total page count of the current brand's product list

def get_totalpage(self, task_id, brand_id, goods_type):
    '''
    Fetch the total product count and page size for one brand.
    :return: (total_num, page_offset), or (0, 0) on failure
    '''
    try:
        key_dit = {
            "keyword": goods_type
        }
        url_str = urlencode(key_dit)
        total_api = f'''https://mapi-rp.vip.com/vips-mobile/rest/shopping/search/product/rank?app_name=shop_wap&app_version=4.0&api_key=8cec5243ade04ed3a02c5972bcda0d3f&mobile_platform=2&source_app=yd_wap&warehouse=VIP_NH&fdc_area_id=104104101&province_id=104104&mars_cid=1584322664117_812f182347fe5848add8d04b91257af6&mobile_channel=mobiles-adp%3Ag1o71nr0%3A%3A%3A%3A%7C%7C&standby_id=nature&{url_str}&brandStoreSns={brand_id}&sort=0&pageOffset=0&channelId=1&wapConsumer=A1&gPlatform=WAP&functions=bsBrands%2CfavNumLabel%2CtotalLabel&mvip=true&_=1599122080'''
        print("Task ID:", task_id, "Product type:", goods_type, "Brand ID:", brand_id, "Fetching total page count:")
        print(total_api)
        html = self.sc.get_html(total_api)
        if not html:
            return (0, 0)
        resp = html.json()
        # Total number of products for this brand
        total_num = int(resp['data']['total'])
        # Products per page, used as the pageOffset step
        page_offset = int(resp['data']['batchSize'])
        up_sql = f'''update {self.task_tbl} set total_num={total_num} where id={task_id}'''
        print("Updating total product count:", up_sql)
        self.sc.store_data(up_sql)
        return (total_num, page_offset)
    except Exception:
        self.sc.collect_error()
        return (0, 0)

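The two numbers returned here drive the paging in step III: `pageOffset` starts at 0 and advances by `batchSize` until it reaches `total`, so the number of requests is ⌈total / batchSize⌉. A minimal sketch of that arithmetic, assuming the `data.total` and `data.batchSize` fields read above (`page_offsets` is a hypothetical name):

```python
from typing import Any, Dict, List


def page_offsets(resp: Dict[str, Any]) -> List[int]:
    """Return the pageOffset values needed to cover every product.

    Assumes the response carries data.total (product count) and
    data.batchSize (products per page), as read by get_totalpage.
    """
    data = resp.get("data") or {}
    total = int(data.get("total", 0))
    batch = int(data.get("batchSize", 0))
    if total <= 0 or batch <= 0:
        return []  # range() with a step of 0 would raise ValueError
    # One request per page: 0, batch, 2*batch, ... while < total
    return list(range(0, total, batch))
```

For example, `total = 250` and `batchSize = 120` give offsets `[0, 120, 240]`: three requests, the last of which returns the remaining 10 products.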
III. Get the product information on each page of the product list

def get_products(self, data_tuple):
    '''Fetch one brand's product list page by page and store the product IDs.'''
    try:
        task_id, goods_type, brand_id, brand_name, total_num, add_time, is_state = data_tuple
        page_info = self.get_totalpage(task_id, brand_id, goods_type)
        # get_totalpage may fail; leave the task pending so it can be retried
        if not isinstance(page_info, tuple) or page_info[1] <= 0:
            print("Failed to get the page count; task left pending")
            return
        total_num, page_offset = page_info
        time.sleep(3.2)
        for i in range(0, total_num, page_offset):
            key_dit = {
                "keyword": goods_type
            }
            url_str = urlencode(key_dit)
            brand_api = f'''https://mapi-rp.vip.com/vips-mobile/rest/shopping/search/product/rank?app_name=shop_wap&app_version=4.0&api_key=8cec5243ade04ed3a02c5972bcda0d3f&mobile_platform=2&source_app=yd_wap&warehouse=VIP_NH&fdc_area_id=104104101&province_id=104104&mars_cid=1584322664117_812f182347fe5848add8d04b91257af6&mobile_channel=mobiles-adp%3Ag1o71nr0%3A%3A%3A%3A%7C%7C&standby_id=nature&{url_str}&brandStoreSns={brand_id}&sort=0&pageOffset={i}&channelId=1&wapConsumer=A1&gPlatform=WAP&functions=bsBrands%2CfavNumLabel%2CtotalLabel&mvip=true&_=1599122080'''
            print(f"Fetching products at offset {i}", brand_api)
            html = self.sc.get_html(brand_api)
            if not html:
                print("Failed to fetch the product ID page")
                continue
            resp_json = html.json()
            products_list = resp_json['data']['products']
            if products_list:
                insert_list = []
                for products_dict in products_list:
                    # Product ID
                    goods_id = int(products_dict['pid'])
                    # Insert time
                    add_time = datetime.datetime.now()
                    insert_list.append((goods_type, brand_id, brand_name, goods_id, add_time, 1))
                insert_sql = f'''insert ignore into {self.data_tbl}(goods_type,brand_id,brand_name,
                goods_id,add_time,is_state) values (%s,%s,%s,%s,%s,%s)'''
                print("Inserting data", "*" * 50)
                self.sc.store_data(insert_sql, data_list=insert_list)
            # Random pause between pages to avoid hammering the API
            time.sleep(random.uniform(1.7, 4.2))
        # All pages done: mark the task as finished (is_state = 0)
        up_sql = f'''update {self.task_tbl} set is_state=0 where id={task_id}'''
        print(f"{brand_id} - {brand_name}: finished crawling product IDs", up_sql)
        self.sc.store_data(up_sql)
    except Exception:
        self.sc.collect_error()

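`self.sc.store_data` is not shown in the article, so the storage side is sketched here with Python's built-in `sqlite3` as a stand-in for MySQL. SQLite's `INSERT OR IGNORE` plays the role of MySQL's `insert ignore`; in both cases duplicates are only skipped if the table has a UNIQUE (or primary) key on the column, here assumed to be `goods_id`. The table layouts and the names `goods`, `task`, `store_products` and `mark_task_done` are hypothetical.

```python
import datetime
import sqlite3
from typing import Any, Dict

# Hypothetical, simplified versions of the data and task tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS goods (
    goods_type TEXT, brand_id INTEGER, brand_name TEXT,
    goods_id   INTEGER UNIQUE,  -- the UNIQUE key is what makes IGNORE skip duplicates
    add_time   TEXT, is_state INTEGER
);
CREATE TABLE IF NOT EXISTS task (
    id INTEGER PRIMARY KEY, goods_type TEXT, brand_id INTEGER,
    brand_name TEXT, total_num INTEGER, is_state INTEGER DEFAULT 1
);
"""


def store_products(conn: sqlite3.Connection, goods_type: str, brand_id: int,
                   brand_name: str, resp: Dict[str, Any]) -> int:
    """Insert one page of products (data.products[].pid); return how many rows were new."""
    products = (resp.get("data") or {}).get("products") or []
    now = datetime.datetime.now().isoformat(sep=" ")
    rows = [(goods_type, brand_id, brand_name, int(p["pid"]), now, 1) for p in products]
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO goods "
        "(goods_type, brand_id, brand_name, goods_id, add_time, is_state) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    # Ignored duplicates are not counted as changes
    return conn.total_changes - before


def mark_task_done(conn: sqlite3.Connection, task_id: int) -> None:
    """Set is_state = 0 so the task is not picked up again."""
    conn.execute("UPDATE task SET is_state = 0 WHERE id = ?", (task_id,))
    conn.commit()
```

Storing the same page twice returns 2 and then 0, because the second batch hits the UNIQUE key and is ignored. Using `?` placeholders instead of f-strings for the values also keeps them out of the SQL text.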
That's all I have to share. If you spot any shortcomings, please point them out; comments and discussion are welcome. Thanks!

If you need more data or a custom crawler, feel free to send me a private message.

Copyright © 程式師世界 All Rights Reserved