
Python crawler 32: font-based anti-crawling in Python, where the page you see differs from the data you download (Theory)

Editor: Python


A custom font is a font created by the website's developer. Precisely because it is self-made, the text looks garbled when we inspect the page source, since our system does not recognize it.

However, the font has to be delivered to the front end for display. If all of the several thousand common Chinese characters were remade, every visit to the website would require downloading more than ten megabytes. So in practice usually only the digits, or a few sensitive words, are remade.

1. Description

After crawling enough websites, you will run into this situation: you can clearly see the actual data on the web page, but once a Python crawler has downloaded it, the data is garbled and impossible to identify.

For example, the web page shows: Price: 100/yuan

But when we fetch the html code with a Python crawler, the data that was visible on the page can no longer be read. It is garbled, and none of the common encodings can decode it. This happens because the website's developer created a custom font to keep the data confidential.

Our computer does not have this font library, so all we get is unrecognizable garbage. To convert the garbled text back into plaintext, we first need to understand how a font is created and what kind of mechanism this is.
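One quick way to spot this in scraped text is to look for code points that carry no standard meaning. The sketch below assumes the site's custom font places its glyphs in the Unicode Private Use Area (U+E000 to U+F8FF), which is a common choice but not guaranteed. The function name and the sample string are invented for illustration.

```python
def find_private_use_chars(text):
    """Return (index, 'U+XXXX') for every Private Use Area character in text."""
    hits = []
    for index, ch in enumerate(text):
        # Assumption: the custom font uses the Basic Multilingual Plane PUA range.
        if 0xE000 <= ord(ch) <= 0xF8FF:
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical scraped text: the digits of the price were replaced by PUA characters.
scraped = "Price: \ue123\ue456\ue456/yuan"
print(find_private_use_chars(scraped))
```

If the list is non-empty, the page is probably using a custom font for those positions, and changing the text encoding will not help.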

2. Creating and parsing a font

Here we use a piece of software called FontCreator. It can create fonts and it can also parse them. Since our main goal is analysis, we will demonstrate parsing.

How to get the software package:

Reply FontCreator in the chat window of the WeChat official account “Operation and maintenance home”

to receive the download address of the software.

Then we install it step by step. Once opened, the interface looks as follows:

Here we pick a font that ships with the system and see what it looks like after opening it.

First we need to decide which font library to parse. Here we pick any file in the C:\Windows\Fonts directory, because all the fonts installed on our computer are stored there, and copy it to another location;

In FontCreator, click File -- Open in the top right corner and open the font library we just copied. The interface is as follows:

We can now see the information for every glyph in this font library.
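The same per-glyph information can also be read from Python. The sketch below is an alternative to the FontCreator interface, not the method used in this article. It assumes the third-party fontTools package is installed, and the file name in the comment is hypothetical. `TTFont.getBestCmap()` returns a dictionary that maps each code point to a glyph name.

```python
def load_cmap(font_path):
    """Read the code point -> glyph name table from a font file."""
    # Imported lazily so the rest of the sketch runs without fontTools installed.
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    return font.getBestCmap()


def format_cmap(cmap):
    """Render each (code, name) pair as a line such as 'U+0031 -> one'."""
    return [f"U+{code:04X} -> {name}" for code, name in sorted(cmap.items())]


# Hand-written stand-in for load_cmap("copied_font.ttf") (hypothetical file name).
example_cmap = {0x32: "two", 0x31: "one", 0x30: "zero"}
for line in format_cmap(example_cmap):
    print(line)
```

With a real font file, passing the result of `load_cmap(...)` to `format_cmap` would list the same code and name columns that FontCreator shows for each glyph.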

If we want to create a font, the simplest way is to modify one that already exists. Double-click the glyph you want to change, and an interface like this appears:

Adjust the anchor points and save. The font is now unique to you.

3. A deeper understanding

From the above, we know that the garbled text comes from a font that the website's developer created for security. So how do we solve this problem?

To answer that, we need to understand four concepts:

1. The character itself

That is, the value of the Chinese character or digit;

2. Shape

That is, what the character looks like. It can simply be thought of as a drawing;

3. name

The name of this character in the corresponding font library;

4. code

The code that corresponds to the name of this character;

What the browser page shows is the character itself. What the page source shows is the code.
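The four concepts can be pictured as one small record per glyph. This is only a sketch: the class, the field names, and every value below are invented for illustration, and a real font stores drawing contours rather than a text label for the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    value: str  # the character itself, e.g. "1"
    shape: str  # stand-in for the outline drawing
    name: str   # glyph name inside the font library
    code: int   # code point that appears in the page source


def as_seen_in_source(glyphs):
    """What a crawler downloads: one character per code."""
    return "".join(chr(g.code) for g in glyphs)


def as_seen_in_browser(glyphs):
    """What a reader sees once the browser has drawn each shape."""
    return "".join(g.value for g in glyphs)


# Invented glyphs spelling the price "100".
price = [
    Glyph("1", "outline-of-1", "glyph_a", 0xE001),
    Glyph("0", "outline-of-0", "glyph_b", 0xE002),
    Glyph("0", "outline-of-0", "glyph_b", 0xE002),
]
print(as_seen_in_browser(price))  # 100
print(repr(as_seen_in_source(price)))
```

The crawler only ever receives the second string, so the task is to get from code back to value.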

4. Common anti-crawling approaches

There are two approaches:

First: generate one new font library and keep it. The four elements are fixed and do not change;

Second: randomly generate a new font library on every request;

The first approach is relatively easy to deal with: parse the values once and store them.
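For the fixed-font case, "parse the values and store them" can be as small as a lookup table. This is a minimal sketch: the table contents are invented, and a real table would come from inspecting the site's font once.

```python
# Assumed result of parsing the site's font once: code point -> real character.
CODE_TO_VALUE = {0xE001: "1", 0xE002: "0", 0xE003: "5"}


def decode(text, code_to_value=CODE_TO_VALUE):
    """Replace every custom-font code in text with the character it stands for."""
    # str.translate accepts a dict keyed by code point.
    return text.translate(code_to_value)


print(decode("Price: \ue001\ue002\ue002/yuan"))  # Price: 100/yuan
```

Because the font never changes, the same table keeps working for every page of the site.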

The second approach is more complicated, though not by much. Of the four elements, the character itself, its shape, and its name generally do not change. The only thing that changes is the code we see. So can we work backwards from the parts that stay the same?

This article covers only the theory. In the next article we will practice hands-on.
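One way to work backwards is to go through the stable name: build a name to character table once, then for each request read that request's code to name table from the font and compose the two. The sketch below assumes the names really are stable, as described above. All names, codes, and tables are invented, and in practice the per-request table would be read from the downloaded font file.

```python
# Built once by hand: glyph name -> real character (invented values).
NAME_TO_VALUE = {"glyph_a": "1", "glyph_b": "0"}


def build_decoder(cmap, name_to_value=NAME_TO_VALUE):
    """Compose code -> name (this request's font) with name -> value."""
    return {
        code: name_to_value[name]
        for code, name in cmap.items()
        if name in name_to_value
    }


def decode(text, cmap):
    """Decode text using the code -> name table of the font sent with it."""
    return text.translate(build_decoder(cmap))


# Two requests whose fonts assign different codes to the same glyph names.
first_cmap = {0xE001: "glyph_a", 0xE002: "glyph_b"}
second_cmap = {0xF10A: "glyph_b", 0xF2B7: "glyph_a"}

print(decode("\ue001\ue002\ue002", first_cmap))   # 100
print(decode("\uf2b7\uf10a\uf10a", second_cmap))  # 100
```

If a site were to randomize the names as well, the same composition could in principle be keyed on the shape instead, but that goes beyond what this article covers.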

That is the end of this article. Related content is updated daily.

For more, follow the WeChat official account “Operation and maintenance home” to get the latest articles.