
Description of Python encode and decode functions

Editor: Python

Common string encodings include utf-8, gb2312, cp936, and gbk.
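To make the difference between these encodings concrete, here is a minimal Python 3 sketch that encodes the same two-character text with each codec named above. The sample text and the variable names are illustrative choices, not part of the original article.

```python
# The same text produces different bytes depending on the encoding.
text = "中文"  # illustrative sample text

encoded = {name: text.encode(name) for name in ("utf-8", "gb2312", "gbk", "cp936")}

for name, data in encoded.items():
    # utf-8 needs 3 bytes per character here; the gb* family needs 2
    print(name, len(data), data)
```

For these two characters the gb2312, gbk, and cp936 results are identical, while the utf-8 result is longer and shares no bytes with them.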

In Python, we use decode() and encode() to decode and encode.

In Python 2, the unicode type serves as the base type for encoding. That is:

        decode              encode
str  --------->  unicode  --------->  str

# -*- coding: utf-8 -*-
# Python 2
u = u'中文'  # a unicode object
str = u.encode('gb2312')  # encode the unicode object as gb2312
str1 = u.encode('gbk')  # encode the unicode object as gbk
str2 = u.encode('utf-8')  # encode the unicode object as utf-8
u1 = str.decode('gb2312')  # decode str with gb2312 to get the original unicode back
u2 = str.decode('utf-8')  # wrong codec: str holds gb2312 bytes, so this cannot restore the original unicode and raises UnicodeDecodeError

As the code above shows, str, str1, and str2 are all of the string type (str), even though each holds bytes in a different encoding, which makes string handling more complicated.

The good news is Python 3. Python 3 removes the separate unicode type. In its place, the string type (str) is a Unicode string and becomes the base type. Encoding a str produces a bytes object, but the two functions are used in the same way:

          decode                  encode
bytes  --------->  str (unicode)  --------->  bytes

u = '中文'  # a str object
str = u.encode('gb2312')  # encode u as gb2312; str is now a bytes object
u1 = str.decode('gb2312')  # decode the bytes with gb2312 to get the str u1
u2 = str.decode('utf-8')  # wrong codec: the gb2312 bytes cannot be restored to the original string with utf-8, so this raises UnicodeDecodeError
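The round trip and the wrong-codec failure described above can be sketched in Python 3 as follows. This is an illustrative example, not the article's own code; the names data, restored, wrong_codec_failed, and lossy are made up here, and the builtin name str is deliberately not reused as a variable.

```python
text = "中文"

data = text.encode("gb2312")        # str -> bytes
restored = data.decode("gb2312")    # bytes -> str, same codec: lossless
print(type(data).__name__, type(restored).__name__, restored == text)

# Decoding with a codec that does not match the bytes fails loudly.
try:
    data.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError as exc:
    wrong_codec_failed = True
    print("decode failed:", exc.reason)

# errors="replace" suppresses the exception, but the text is still lost:
# invalid bytes become U+FFFD replacement characters.
lossy = data.decode("utf-8", errors="replace")
print(lossy == text)
```

The point of the last step is that silencing the error does not recover the text; only decoding with the codec that was used for encoding does.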

Inevitably, this leads to the problem of reading files:

Suppose we read a file. The encoding chosen when the file was saved determines the encoding of the content we read from it. For example, create a new text file test.txt in Notepad and edit its content. When saving, note that the encoding can be chosen; say we choose gb2312. The file can then be read in Python as follows:

# Python 2
f = open('test.txt', 'r')
s = f.read()  # read the file contents; in Python 2 this is the raw, still-encoded byte string (in Python 3 a text-mode read decodes with a system-dependent default encoding and can fail here)
f.close()
# Assume the file was saved with gb2312 encoding
u = s.decode('gb2312')  # decode the content using the encoding the file was saved in, giving a unicode string
# Now the content can be converted to other encodings
str = u.encode('utf-8')  # convert to a utf-8 encoded string str
str1 = u.encode('gbk')  # convert to a gbk encoded string str1
str2 = u.encode('utf-16')  # convert to a utf-16 encoded string str2
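In Python 3 the same read-then-convert flow can be written with the encoding argument of the built-in open(), which returns an already decoded str. The sketch below is a hedged illustration: it writes its own gb2312 file to a temporary directory (a stand-in for the Notepad file, so the example is self-contained), and the names path, as_utf8, and as_utf16 are assumptions of this example.

```python
import os
import tempfile

# Stand-in for the test.txt saved from Notepad as gb2312.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="gb2312") as f:
    f.write("中文")

# Tell open() which encoding the file was saved in; read() returns str.
with open(path, "r", encoding="gb2312") as f:
    u = f.read()

# The decoded text can now be encoded into any other encoding.
as_utf8 = u.encode("utf-8")
as_utf16 = u.encode("utf-16")  # includes a byte order mark
print(u, as_utf8, as_utf16)
```

Passing the encoding explicitly avoids relying on the platform default, which is the source of the system-dependent behavior mentioned above.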

Python provides the codecs module for reading files. The open() function in this module lets you specify the encoding:

import codecs
f = codecs.open('text.text', 'r+', encoding='utf-8')  # the file's encoding must be known in advance; here the file is utf-8
content = f.read()  # if the encoding passed to open() does not match the file's actual encoding, this can raise an error or return garbled text
f.write('The information you want to write')
f.close()
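What happens when the encoding passed to codecs.open() does not match the file can be seen in a small Python 3 sketch. The temporary file, its name demo.txt, and the variable mismatch_error are hypothetical and exist only for this illustration.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file

# Write the file as gb2312.
with codecs.open(path, "w", encoding="gb2312") as f:
    f.write("中文")

# Matching encoding: the text comes back intact.
with codecs.open(path, "r", encoding="gb2312") as f:
    content = f.read()

# Mismatched encoding: these gb2312 bytes are not valid utf-8.
try:
    with codecs.open(path, "r", encoding="utf-8") as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc

print(content, type(mismatch_error).__name__)
```

In this case the mismatch raises an exception. With other codec pairs a mismatch may instead decode without error into the wrong characters, so a successful read is not proof that the encoding was right.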

encode() and decode()

  • decode means to decode; encode means to encode.
  • Python represents strings internally as unicode. So when converting between encodings, unicode is normally used as the intermediate form: first decode the string in the other encoding into unicode, then encode the unicode into the target encoding.
  • decode converts a string in some other encoding into unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 into unicode.
  • encode converts unicode into a string in some other encoding. For example, str2.encode('gb2312') converts the unicode string str2 into gb2312.
  • In short: to convert some other encoding into utf-8, first decode it into unicode and then encode that into utf-8; unicode is the medium of the conversion. For example, take s='中文'. In a utf8 source file the string is utf8-encoded; in a gb2312 source file it is gb2312-encoded. To convert it in either case, first use the decode method to turn it into unicode, then use the encode method to turn it into the other encoding. When no encoding is specified, source files are usually created in the system default encoding.
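The decode-then-encode rule from the list above can be wrapped in a small helper. This is a minimal Python 3 sketch; the function name transcode and its parameter names are invented for illustration and are not a standard library API.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via a Unicode str."""
    text = data.decode(source)   # step 1: source encoding -> unicode text
    return text.encode(target)   # step 2: unicode text -> target encoding


gb_bytes = "中文".encode("gb2312")
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")
print(gb_bytes, "->", utf8_bytes)

# The conversion is reversible as long as both codecs can represent the text.
print(transcode(utf8_bytes, "utf-8", "gb2312") == gb_bytes)
```

The helper raises UnicodeDecodeError if source does not match the bytes, and UnicodeEncodeError if target cannot represent a character in the text.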
