
Python crawler lesson 2


For loop

name = 'neusoft'
for x in name:
    print(x)
    if x == 's':
        print('ha-ha')
# result 
>>> n
e
u
s
ha-ha
o
f
t

Progress bar

Install tqdm with pip install tqdm
Import the tqdm and time libraries
from tqdm import tqdm
import time
mylist = []
for i in range(20):
    mylist.append(i)
# Traverse mylist, showing a progress bar
for x in tqdm(mylist):
    time.sleep(1)
# result 
>>> 35%|███▌ | 7/20 [00:07<00:13, 1.01s/it]

String manipulation

String replacement
# Common operations 
price = '¥9.9'
# String replacement 
price = price.replace("¥", '')
print(price)
# result 
>>> 9.9
# Multiply the price by 10
new_price = float(price) * 10
print(new_price)
# result 
>>> 99.0
while True:
    seg = input('')
    seg = seg.replace('?', '!')
    print(seg)
# result 
>>> Hello?
Hello!
strip removes leading and trailing whitespace
name = '   neuedu    '
print(len(name))
# result 
>>> 13
name = name.strip()
print(len(name))
# result 
>>> 6
join turns a list into a string
li = ['you ', 'are ', 'handsome']
disk_path = ['C:', 'Users', 'Administrator', 'Desktop', 'CCF']
path = '\\'.join(disk_path)
print(path)
# result 
>>> C:\Users\Administrator\Desktop\CCF
li = ''.join(li)
print(li)
# result 
>>> you are handsome

Tuples

The function of tuples
  • Write-protected and safe; many Python built-in functions return tuples
  • Compared with lists, tuples use less memory and are more efficient
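Both points can be checked directly. A minimal sketch (not from the original; exact byte sizes are CPython implementation details and vary by version):

```python
import sys

t = (1, 2, 3)
l = [1, 2, 3]

# Write protection: assigning to a tuple element raises TypeError
try:
    t[0] = 99
except TypeError as e:
    print('tuples are read-only:', e)

# Space: on CPython a tuple is smaller than the equivalent list
print(sys.getsizeof(t), '<', sys.getsizeof(l))
```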
Creating a tuple
# Tuples are very similar to lists , It just can't be modified 
a = (1, '1', 3)
print(a)
print(type(a))
# result 
>>> (1, '1', 3)
>>> <class 'tuple'>
# A tuple with a single element needs a trailing comma 
c = (100)  # without the comma this is just an int
print(type(c))
# result 
>>> <class 'int'>
b = (100,)
print(type(b))
# result 
>>> <class 'tuple'>
# A combination we often use: a list of (key, value) tuples
list2 = [('a', 22), ('b', 33), ('c', 99)]
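The list2 pattern above pairs a key with a value; tuple unpacking and sorting make it convenient to work with. A small illustrative extension, not from the original:

```python
list2 = [('a', 22), ('b', 33), ('c', 99)]

# Tuple unpacking in a for loop
for letter, num in list2:
    print(letter, num)

# Sort by the second element of each tuple, largest first
by_value = sorted(list2, key=lambda pair: pair[1], reverse=True)
print(by_value)  # [('c', 99), ('b', 33), ('a', 22)]
```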
Access tuples
print(a[2])
# result 
>>> 3

Dictionaries

Create a dictionary: key-value pairs
info = {'name': 'Li Bai', 'age': 18, 'gender': 'female'}
print(type(info))
# result 
>>> <class 'dict'>
# Access a value by its key 
print(info['name'])
# result 
>>> Li Bai
# Access to nonexistent keys 
print(info['addr'])
# result ( Report errors )
>>> KeyError: 'addr'
# get returns a default value when the key is missing,
# and the stored value when the key exists 
print(info.get('addr', 'Fushun City'))
# result 
>>> Fushun City
# Modify 
info['age'] = 3
print(info)
# result 
>>> {'name': 'Li Bai', 'age': 3, 'gender': 'female'}
# Add: when the key does not exist in the dictionary, it is added 
info['addr'] = 'Anshan City'
print(info)
# result 
>>> {'name': 'Li Bai', 'age': 3, 'gender': 'female', 'addr': 'Anshan City'}
# Delete 
del info['age']
print(info)
# result 
>>> {'name': 'Li Bai', 'gender': 'female', 'addr': 'Anshan City'}
# Traverse 
for k, v in info.items():
    print(k, '---->', v)
# result 
>>> name ----> Li Bai
gender ----> female
addr ----> Anshan City
# Get all keys 
print(list(info.keys()))
# result 
>>> ['name', 'gender', 'addr']
# Get all values 
print(list(info.values()))
# result 
>>> ['Li Bai', 'female', 'Anshan City']

function

The difference between a function and a method
  • A function is procedure-oriented: it is defined and called on its own
  • A method is object-oriented: it is attached to an object
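The distinction shows up in one snippet: a function is called directly by name, while a method is called through an object. upper is a built-in str method; shout is a name made up here:

```python
def shout(text):           # a function: stands alone
    return text.upper()

print(shout('neusoft'))    # function call
print('neusoft'.upper())   # method call through a str object
```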
Functions in Python
# Definition of function 
def say_hello(name):
    print('hello', name)

say_hello('neusoft')
# result 
>>> hello neusoft
# The sum of 1 to any number 
def calculate_num(num):
    sum_num = 0  # running total
    for i in range(1, num + 1):
        sum_num = sum_num + i
    return sum_num

print(calculate_num(100))
# result 
>>> 5050

Crawlers

A crawler is a web crawler, known in English as a Web Spider. If we regard the Internet as a big web, then a crawler is a spider crawling around on that web: when it meets the food it wants, it grabs it.

When we enter a web address in the browser and hit Enter, we see the page of that website. What happens is that the browser requests the website's server and fetches network resources. A crawler likewise simulates a browser sending requests to obtain the HTML code. HTML code usually contains tags and text, and we extract the information we want from it.

Usually a crawler starts from a certain page of a website, crawls the contents of that page, finds other link addresses in it, then moves through those addresses to the next page, and keeps crawling like this to grab information in batches. So a web crawler is a program that keeps fetching web pages to extract information.
Quoted from https://blog.csdn.net/Computer_Hu/article/details/83351766
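The "find links, then follow them" loop described above can be sketched with requests and a regular expression. crawl and extract_links are names invented here for illustration, and a real crawler would use an HTML parser rather than a regex:

```python
import re

def extract_links(html):
    # Pull absolute http(s) links out of an HTML string
    return re.findall(r'href="(https?://[^"]+)"', html)

def crawl(start_url, max_pages=3):
    import requests  # imported lazily; installed earlier with pip
    to_visit, seen = [start_url], set()
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue
        # queue the links found on this page and keep crawling
        to_visit.extend(extract_links(html))
    return seen
```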
Web crawling

# Import the requests library 
import requests
# Get the source code of the specified domain name 
response = requests.get('http://www.baidu.com')
# The encoding of the response 
# Set the encoding mode 
response.encoding = 'utf-8'
# Response status code: 200 OK, 404 Not Found 
print(response.status_code)
# result 
>>> 200
print(response.encoding)
# result 
>>> utf-8
# Get the response body as a string 
html_data = response.text
print(html_data)
# Write the crawled HTML to a local file
# arguments: file path, write mode, encoding 
with open('index.html', 'w', encoding='utf-8') as f:
    f.write(html_data)
Image crawling
# Image crawling 
# Picture address 
url = 'http://www.cpnic.com/UploadFiles/img_0_57266514_681418918_26.jpg'
response2 = requests.get(url)
# Get the byte-type response body 
img_data = response2.content
# arguments: file path, binary write mode 
if response2.status_code == 200:
    with open('kebi.jpg', 'wb') as f:
        f.write(img_data)
