
Python - string encoding and decoding

Editor: Python

Contents

- About codecs
  - Types of encodings
- Encoding and decoding in code
  - Plain string to bytes conversion
  - Byte-style string encoding and decoding
  - URL encoding and decoding
  - Appending bytes

About codecs

Encoding and decoding are essentially a mapping.
The character A is encoded in ASCII as 65, which the computer stores as 01000001.
A must be encoded to 01000001 before the computer can use it.
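
The mapping can be checked directly in Python. This is a minimal sketch using only built-ins; the variable names are illustrative.

```python
# Character -> code point -> bit pattern, and back again.
code_point = ord("A")               # 65
bits = format(code_point, "08b")    # '01000001'
restored = chr(int(bits, 2))        # 'A'

print(code_point, bits, restored)
```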

Encoding: the mapping from real characters to binary strings (real characters → binary string).
Decoding: the mapping from binary strings back to real characters (binary string → real characters).

For example:
UTF-8 --> decode --> Unicode
Unicode --> encode --> GBK / UTF-8, etc.
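
As a rough illustration of the two directions above, the sketch below encodes one string to UTF-8 and GBK and then converts between the two by passing through Unicode (`str`). The sample text and variable names are chosen for illustration only.

```python
# Sketch: text (str) -> bytes via encode, bytes -> text via decode.
text = "你好"

utf8_bytes = text.encode("utf-8")   # Unicode -> UTF-8 bytes
gbk_bytes = text.encode("gbk")      # Unicode -> GBK bytes

# Decoding must use the same codec that produced the bytes.
from_utf8 = utf8_bytes.decode("utf-8")
from_gbk = gbk_bytes.decode("gbk")

# Converting UTF-8 bytes to GBK bytes goes through Unicode in the middle.
transcoded = utf8_bytes.decode("utf-8").encode("gbk")

print(from_utf8 == text, from_gbk == text, transcoded == gbk_bytes)
```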

Types of encodings

ASCII: 1 byte per character; supports English only.
GB2312: 2 bytes per Chinese character; supports 6,700+ Chinese characters.
GBK: an extension of GB2312; supports 21,000+ Chinese characters, each taking 2 bytes.
Unicode: a character set rather than a single byte encoding; it includes 136,690 characters (the Unicode 10.0 count), and its encodings take up to 4 bytes per character.
UTF-8: uses 1, 2, 3, or 4 bytes per character.
It uses 1 byte where possible and adds a byte whenever that is not enough, up to 4 bytes.
English takes 1 byte, most other European scripts take 2, East Asian scripts take 3 (so Chinese is 3 bytes), and other or special characters take 4.
UTF-16: uses 2 or 4 bytes per character.
It uses 2 bytes where possible and 4 bytes otherwise.
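
The byte counts listed above can be checked with a short sketch. The helper `encoded_length` and the sample characters are hypothetical choices for illustration; `utf-16-le` is used so that no byte-order mark is counted.

```python
def encoded_length(ch, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

# English letter, accented European letter, Chinese character, emoji.
samples = ["a", "é", "你", "😀"]

utf8_sizes = [encoded_length(ch, "utf-8") for ch in samples]
utf16_sizes = [encoded_length(ch, "utf-16-le") for ch in samples]
gbk_size_chinese = encoded_length("你", "gbk")

print(utf8_sizes)        # [1, 2, 3, 4]
print(utf16_sizes)       # [2, 2, 2, 4]
print(gbk_size_chinese)  # 2
```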

ASCII represents each character with one byte (8 bits) whose first bit is always 0, so it covers only 128 characters, which is clearly not enough.

Unicode is designed to express any language. To avoid wasting storage (for example, on the part that corresponds to ASCII), it can be stored with a variable-length encoding. But variable-length encoding makes decoding harder: without extra information, a decoder cannot tell how many bytes make up one character.

UTF-8 is a prefix-based variable-length encoding designed for Unicode: the leading bits of the first byte tell the decoder how many bytes make up the character.
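
To make the prefix idea concrete, here is a minimal sketch that reads the leading bits of the first byte to decide how long each character is. The function names `utf8_sequence_length` and `split_utf8` are hypothetical, and the code assumes well-formed UTF-8 input (it does not validate continuation bytes).

```python
def utf8_sequence_length(first_byte):
    """Return the byte length of a UTF-8 character from its leading byte."""
    if first_byte >> 7 == 0b0:        # 0xxxxxxx: 1 byte (ASCII range)
        return 1
    if first_byte >> 5 == 0b110:      # 110xxxxx: 2 bytes
        return 2
    if first_byte >> 4 == 0b1110:     # 1110xxxx: 3 bytes
        return 3
    if first_byte >> 3 == 0b11110:    # 11110xxx: 4 bytes
        return 4
    raise ValueError("not a valid UTF-8 leading byte: %#x" % first_byte)


def split_utf8(data):
    """Split UTF-8 bytes into one bytes object per character."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


parts = split_utf8("a你😀".encode("utf-8"))
print(parts)  # [b'a', b'\xe4\xbd\xa0', b'\xf0\x9f\x98\x80']
```

A continuation byte (10xxxxxx) passed as the first byte raises `ValueError`, which is how a decoder can notice that it has started in the middle of a character.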

Default encoding in Python

Python 2 defaults to ASCII.
Python 3 defaults to Unicode: str holds Unicode text, and UTF-8 is the default encoding for source files and for str.encode().
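
On Python 3 the defaults can be inspected as follows. This is a small sketch; the sample string is illustrative.

```python
import sys

# Default codec used by str.encode() / bytes.decode() when none is given.
default_encoding = sys.getdefaultencoding()
print(default_encoding)  # utf-8

text = "你好"
# A str counts Unicode characters, not bytes.
print(type(text), len(text))

# Calling encode() without arguments uses UTF-8.
same_as_utf8 = text.encode() == text.encode("utf-8")
print(same_as_utf8)  # True
```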

Encoding and decoding in code

Plain string to bytes conversion

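# Sample strings; each comment shows the UTF-8 bytes (and GBK where noted).
# The variable is named s so that the built-in str is not shadowed.
s = '你好'  # b'\xe4\xbd\xa0\xe5\xa5\xbd'  gbk: b'\xc4\xe3\xba\xc3'
s = 'abc'  # b'abc'
s = 'นั่ง'  # b'\xe0\xb8\x99\xe0\xb8\xb1\xe0\xb9\x88\xe0\xb8\x87'
s = 'นั่'  # b'\xe0\xb8\x99\xe0\xb8\xb1\xe0\xb9\x88'
# s = 2  # AttributeError: 'int' object has no attribute 'encode'
s = '*'  # b'*'
a = s.encode('UTF-8')
a = s.encode('gbk')  # the last assignment wins, so GBK bytes are printed
print(a)
print(type(a))  # <class 'bytes'>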
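
Not every string can be encoded with every codec. The sketch below shows what happens when Thai text is encoded as ASCII, and how the standard `errors` argument of `str.encode` changes the outcome. The variable names are illustrative.

```python
thai = "นั่ง"

# Strict mode (the default) raises when a character has no mapping.
try:
    thai.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Alternative error handlers trade correctness for robustness.
replaced = thai.encode("ascii", errors="replace")          # each character becomes b'?'
ignored = thai.encode("ascii", errors="ignore")            # characters are dropped
escaped = thai.encode("ascii", errors="backslashreplace")  # kept as \uXXXX escapes

print(failed, replaced, ignored, escaped)
```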

Byte-style string encoding and decoding

The key here is the raw_unicode_escape codec, which turns a string whose characters are really byte values back into bytes.

s = '\xe5\x90\x8d\xe7\xa7\xb0'
s_b = s.encode("raw_unicode_escape")  # b'\xe5\x90\x8d\xe7\xa7\xb0'
s_origin = s_b.decode("utf-8")  # '名称' ("name")
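
For characters in the range 0 to 255, the `latin-1` codec performs the same character-to-byte mapping, so the repair can also be sketched as below. The helper name `bytes_like_str_to_text` is hypothetical. The second half assumes a different input shape, where the escapes arrive as literal backslash text (for example, read from a log file) rather than as already-interpreted characters.

```python
def bytes_like_str_to_text(s, encoding="utf-8"):
    """Reinterpret a str whose characters are really byte values (0-255)."""
    return s.encode("latin-1").decode(encoding)


garbled = "\xe5\x90\x8d\xe7\xa7\xb0"
fixed = bytes_like_str_to_text(garbled)
print(fixed)  # 名称

# Assumed variant: the text contains literal backslashes, e.g. r"\xe5\x90...".
literal = r"\xe5\x90\x8d\xe7\xa7\xb0"
as_bytes = literal.encode("ascii").decode("unicode_escape").encode("latin-1")
fixed_literal = as_bytes.decode("utf-8")
print(fixed_literal)  # 名称
```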

URL encoding and decoding

Use the urllib library.
Reference: https://www.cnblogs.com/miaoxiaochao/p/13705936.html

import urllib.parse

s = '你好'
a = urllib.parse.quote(s)
print(a)  # %E4%BD%A0%E5%A5%BD
b = urllib.parse.unquote(a)
print(b)  # 你好
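
`quote` and `unquote` use UTF-8 by default but accept an `encoding` argument, and `quote` leaves `/` unescaped unless `safe` says otherwise. A short sketch of those options follows; the sample values are illustrative.

```python
from urllib.parse import quote, quote_plus, unquote

text = "你好"

# Percent-encode with a non-default codec, then decode with the same codec.
gbk_quoted = quote(text, encoding="gbk")
gbk_roundtrip = unquote(gbk_quoted, encoding="gbk")
print(gbk_quoted, gbk_roundtrip)

# By default "/" is treated as safe; pass safe="" to escape it too.
path_default = quote("a b/c")
path_strict = quote("a b/c", safe="")

# quote_plus targets query strings: spaces become "+" and "/" is escaped.
query_value = quote_plus("a b/c")
print(path_default, path_strict, query_value)
```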

Appending bytes

b = b''
b += b'a'
b += b' b'
print(b)  # b'a b'
print(b.decode('utf-8'))  # a b
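
Because bytes objects are immutable, each `+=` builds a new object. When many pieces are appended, two common alternatives are `bytes.join` and a mutable `bytearray`. The sketch below is illustrative, with hypothetical chunk data.

```python
chunks = [b"a", b" ", b"b"]

# Option 1: collect the pieces and join them once.
joined = b"".join(chunks)

# Option 2: grow a mutable buffer in place, then freeze it as bytes.
buf = bytearray()
for chunk in chunks:
    buf += chunk
result = bytes(buf)

print(joined, result, result.decode("utf-8"))
```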

Yizhi 2022-06-24 (Friday)
