
The difference between bytes and str in Python

Editor: Python

This article explains the difference between bytes and str in Python.¹
Updated: 2022/6/16

The difference between bytes and str

Reference link

1. Python has two types that can represent character sequences

bytes
An instance of bytes contains raw data, namely unsigned 8-bit values (usually displayed using the ASCII encoding)
str
An instance of str contains Unicode code points, which correspond to text characters in human languages

a = b'h\x6511o'
print(list(a))
# [104, 101, 49, 49, 111]
print(a)
# b'he11o'
a = 'a\u0300 propos'
print(list(a))
# ['a', '̀', ' ', 'p', 'r', 'o', 'p', 'o', 's']
print(a)
# à propos
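
One practical consequence of having two types can be sketched as follows: indexing bytes yields integers, indexing str yields one-character strings, and the two lengths diverge once non-ASCII characters are involved. The variable names are illustrative and not from the original article.

```python
text = 'caf\u00e9'                # str: 4 Unicode code points ("cafe" with an acute e)
data = text.encode('utf-8')       # bytes: the accented letter takes two bytes in UTF-8

first_char = text[0]              # a str of length 1
first_byte = data[0]              # an int in the range 0..255

print(len(text), len(data))       # 4 5
print(first_char, first_byte)     # c 99
print(data)                       # b'caf\xc3\xa9'
```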

2. Converting between Unicode data and binary data

To convert Unicode data into binary data, call the encode method of str (encoding)

FileContent = 'This is file content.'
print(FileContent)
# This is file content.
print(type(FileContent))
# <class 'str'>
FileContent = FileContent.encode(encoding='utf-8')
print(FileContent)
# b'This is file content.'
print(type(FileContent))
# <class 'bytes'>

To convert binary data into Unicode data, call the decode method of bytes (decoding)

FileContent = b'This is file content.'
print(FileContent)
# b'This is file content.'
print(type(FileContent))
# <class 'bytes'>
FileContent = FileContent.decode(encoding='utf-8')
print(FileContent)
# This is file content.
print(type(FileContent))
# <class 'str'>
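
The ASCII-only examples above round-trip under almost any codec, so the following sketch uses non-ASCII text to show why the encoding passed to decode has to match the one used for encode. The sample string and variable names are assumptions made for illustration.

```python
original = 'na\u00efve caf\u00e9'          # contains two non-ASCII characters

utf8_bytes = original.encode('utf-8')
latin1_bytes = original.encode('latin-1')

# Decoding with the matching codec restores the original text.
assert utf8_bytes.decode('utf-8') == original
assert latin1_bytes.decode('latin-1') == original

# Decoding Latin-1 bytes as UTF-8 fails outright for this input.
try:
    latin1_bytes.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# Decoding UTF-8 bytes as Latin-1 "succeeds" but yields garbled text.
mojibake = utf8_bytes.decode('latin-1')

print(wrong_codec_failed)        # True
print(mojibake != original)      # True
```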

When calling these methods, you can specify the encoding explicitly or accept the default, which is UTF-8.
To check the default text encoding of the current operating system, run the following one-liner in a terminal:
python3 -c 'import locale; print(locale.getpreferredencoding())'
# UTF-8
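
The same check can be made from inside a script. This is a minimal sketch: it relies on the fact that str.encode and bytes.decode default to UTF-8 when no encoding is given, while the locale's preferred encoding (queried below) is what text-mode file operations typically fall back to. The printed locale value depends on the machine, so no output is shown for it.

```python
import locale

# With no argument, encode() and decode() use UTF-8.
encoded = '\u00e9'.encode()
assert encoded == b'\xc3\xa9'
assert encoded.decode() == '\u00e9'

# The locale's preferred encoding is platform-dependent.
locale_encoding = locale.getpreferredencoding(False)
print(locale_encoding)
```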

3. Working with raw 8-bit values and Unicode strings
There are two issues to watch for when working with raw 8-bit values and Unicode strings (that is, when working with bytes and str):

3.1 bytes and str are incompatible with each other

Using the + operator

# bytes+bytes
print(b'a' + b'1')
# b'a1'
# str+str
print('b' + '2')
# b2
# bytes+str
print('c' + b'2')
# TypeError: can only concatenate str (not "bytes") to str
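
One way around the TypeError, sketched here under the assumption that the data is plain ASCII, is to convert one operand explicitly so that both sides have the same type before concatenating.

```python
prefix = 'c'
suffix = b'2'

as_text = prefix + suffix.decode('ascii')     # str + str
as_bytes = prefix.encode('ascii') + suffix    # bytes + bytes

print(as_text)    # c2
print(as_bytes)   # b'c2'
```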

Binary comparison operators also work between instances of the same type

# bytes vs. bytes
assert b'c' > b'a'
assert b'c' < b'a'
# AssertionError
print(b'a' == b'a')
# True
# str vs. str
assert 'c' > 'a'
assert 'c' < 'a'
# AssertionError
print('a' == 'a')
# True
# bytes vs. str
assert b'c' > 'a'
# TypeError: '>' not supported between instances of 'bytes' and 'str'
print('a' == b'a')
# False
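
Because comparing str with bytes using == quietly returns False instead of raising, it can help to normalize values to a single type first. The helpers to_str and to_bytes below are hypothetical names introduced for this sketch, and the UTF-8 default is an assumption about the data.

```python
def to_str(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print('a' == b'a')                      # False
print(to_str('a') == to_str(b'a'))      # True
print(to_bytes('c') > to_bytes(b'a'))   # True
```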

Using %s in format strings
Instances of both types can appear on the right side of the % operator to replace %s in the format string on the left. However, if the format string is bytes, a str instance cannot be used to replace %s, because Python does not know which encoding to use to convert the str.
# bytes % str
print(b'red %s' % 'blue')
# TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'
# str % bytes
print('red %s' % b'blue')
# red b'blue'
# Python calls __repr__ on the bytes instance and substitutes the result for %s in the format string, so the program prints b'blue' rather than blue
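
Both formatting problems go away when the template and the value have the same type. A minimal sketch, assuming the bytes value holds UTF-8 text:

```python
name = b'blue'

bytes_result = b'red %s' % name                 # bytes template, bytes value
text_result = 'red %s' % name.decode('utf-8')   # str template, str value

print(bytes_result)   # b'red blue'
print(text_result)    # red blue
```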

3.2 File handles default to Unicode strings rather than raw bytes

A file opened in w mode is in text mode and accepts only str, so writing binary data to it raises an error:
# Write binary data
with open('test.txt', "w+") as f:
    f.write(b"\xf1\xf2")
# TypeError: write() argument must be str, not bytes

In wb mode, binary data can be written normally:
# Write binary data
with open('test.txt', "wb") as f:
    f.write(b"\xf1\xf2")

A file opened in r mode is read in text mode, so reading binary data from it raises an error:

# Read binary data
with open('test.txt', "r+") as f:
    f.read()
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 0: invalid continuation byte
# When a file handle is opened in text mode, the system uses the default text encoding to interpret binary data: reads go through `bytes.decode` and writes go through `str.encode`. On most systems the default text encoding is `UTF-8`, so the system tries to decode `b'\xf1\xf2'` as a `UTF-8` string, which produces the error above.

In rb mode, binary data can be read normally:
# Read binary data
with open('test.txt', "rb") as f:
    print(b"\xf1\xf2" == f.read())
# True
Alternatively, pass the encoding parameter to open to specify the text encoding explicitly:
with open('test.txt', "r", encoding="cp1252") as f:
    print(f.read())
# ñò
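
The write and read variants above can be combined into one self-contained round trip. This sketch writes to a temporary directory so it leaves no files behind. The file name test.bin is an assumption, and cp1252 is used only because it maps these two byte values to printable characters.

```python
import os
import tempfile

payload = b'\xf1\xf2'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.bin')   # hypothetical file name

    # Binary mode: bytes go in and come out unchanged.
    with open(path, 'wb') as f:
        f.write(payload)
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with an explicit encoding: bytes are decoded to str on read.
    with open(path, 'r', encoding='cp1252') as f:
        text = f.read()

print(raw == payload)                    # True
print(ascii(text))                       # '\xf1\xf2'
print(text.encode('cp1252') == payload)  # True
```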

Additional notes: