
The difference between bytes and str in Python

This article explains the difference between bytes and str in Python. [1]
Updated: 2022 / 6 / 16


1. Python has two types that represent sequences of character data

  • bytes
    In the first example below, the bytes instance contains raw data: unsigned 8-bit values (usually displayed using the ASCII encoding).
  • str
    In the second example below, the str instance contains Unicode code points, which represent textual characters from human languages.

# bytes: raw 8-bit values (\x65 is the byte for 'e')
a = b'h\x6511o'
print(list(a))
# [104, 101, 49, 49, 111]
print(a)
# b'he11o'

# str: Unicode code points (\u0300 is the combining grave accent)
a = 'a\u0300 propos'
print(list(a))
# ['a', '̀', ' ', 'p', 'r', 'o', 'p', 'o', 's']
print(a)
# à propos
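
The practical difference between the two types shows up when you index them or measure their length. The following is a minimal sketch; the variable names are illustrative and UTF-8 is an assumed encoding.

```python
text = 'a\u0300 propos'        # str: a sequence of Unicode code points
data = text.encode('utf-8')    # bytes: the same text as raw 8-bit values

# Indexing a str yields a one-character str; indexing bytes yields an int.
first_char = text[0]
first_byte = data[0]
print(repr(first_char), first_byte)   # 'a' 97

# U+0300 occupies two bytes in UTF-8, so the two lengths differ.
print(len(text), len(data))           # 9 10
```

The str has nine code points, while its UTF-8 form has ten bytes, so the length of one cannot be assumed from the other.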

2. Converting between Unicode data and binary data

  • To convert Unicode data to binary data, you must call the encode method of str (encoding)
FileContent = 'This is file content.'
print(FileContent)
# This is file content.
print(type(FileContent))
# <class 'str'>
FileContent = FileContent.encode(encoding='utf-8')
print(FileContent)
# b'This is file content.'
print(type(FileContent))
# <class 'bytes'>
  • To convert binary data to Unicode data, you must call the decode method of bytes (decoding)
FileContent = b'This is file content.'
print(FileContent)
# b'This is file content.'
print(type(FileContent))
# <class 'bytes'>
FileContent = FileContent.decode(encoding='utf-8')
print(FileContent)
# This is file content.
print(type(FileContent))
# <class 'str'>
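
One common way to apply these two methods is to convert at the edges of a program, so that the core logic only ever sees one type. The sketch below assumes UTF-8; to_str and to_bytes are hypothetical helper names, not functions from the standard library or from this article.

```python
def to_str(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print(repr(to_str(b'caf\xc3\xa9')))   # 'café'
print(repr(to_bytes('café')))         # b'caf\xc3\xa9'
```

Both helpers pass a value through unchanged when it already has the target type, so they are safe to call on input of either kind.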

When you call these methods, you can specify the character encoding explicitly. If you omit it, both methods use UTF-8.

The default encoding of the current operating system is a separate setting: it is what open() typically uses for text-mode files when no encoding argument is given. You can check it with one line of code, run from a command line (in Windows cmd, wrap the code in double quotes instead of single quotes):

python3 -c 'import locale; print(locale.getpreferredencoding())'
# UTF-8 (the output varies by system)
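
The two defaults can also be inspected from inside a program. This is a small sketch using only the standard library; the variable names are illustrative, and the second value will differ between machines.

```python
import locale
import sys

# Default for str.encode() and bytes.decode() when no encoding is passed.
codec_default = sys.getdefaultencoding()
print(codec_default)   # utf-8

# Locale-preferred encoding, which text-mode open() typically falls back to.
file_default = locale.getpreferredencoding(False)
print(file_default)

# Calling encode() with no arguments therefore produces UTF-8 bytes.
encoded = 'é'.encode()
print(encoded)         # b'\xc3\xa9'
```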

3. Working with raw 8-bit values and Unicode strings
There are two issues to watch for when working with raw 8-bit values and Unicode strings (that is, when working with bytes and str):

  • 3.1 bytes and str are not compatible with each other

Using the + operator

# bytes + bytes
print(b'a' + b'1')
# b'a1'
# str + str
print('b' + '2')
# b2
# str + bytes
print('c' + b'2')
# TypeError: can only concatenate str (not "bytes") to str
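
To combine values of the two types, convert one of them first so that both operands match. A minimal sketch, assuming the bytes hold UTF-8 text; the variable names are illustrative.

```python
prefix = 'c'
suffix = b'2'

# Decode the bytes operand to get a str result...
as_text = prefix + suffix.decode('utf-8')
# ...or encode the str operand to get a bytes result.
as_binary = prefix.encode('utf-8') + suffix

print(as_text)     # c2
print(as_binary)   # b'c2'
```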

Binary operators can also compare instances of the same type

# bytes vs. bytes
assert b'c' > b'a'
assert b'c' < b'a'
# AssertionError (raised by the second assert)
print(b'a' == b'a')
# True
# str vs. str
assert 'c' > 'a'
assert 'c' < 'a'
# AssertionError (raised by the second assert)
print('a' == 'a')
# True
# bytes vs. str
assert b'c' > 'a'
# TypeError: '>' not supported between instances of 'bytes' and 'str'
print('a' == b'a')
# False (no error is raised, but bytes and str never compare equal)
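
Because an equality test between bytes and str returns False without any error, a mismatch of types can go unnoticed. The sketch below shows the comparison before and after converting; the variable names are illustrative and UTF-8 is assumed.

```python
received = b'a'   # for example, data read from a binary file
expected = 'a'

mixed = received == expected                      # bytes vs. str
converted = received.decode('utf-8') == expected  # str vs. str

print(mixed)       # False
print(converted)   # True
```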

In %s format strings

Instances of both types can appear on the right side of the % operator to fill in a %s in the format string on the left. However, if the format string is of type bytes, you cannot substitute a str instance for %s, because Python does not know which character encoding to use for that str.

# bytes % str
print(b'red %s' % 'blue')
# TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'
# str % bytes
print('red %s' % b'blue')
# red b'blue'
# Note: no error is raised here. Python calls the __repr__ method of the bytes instance and substitutes the result for %s in the format string, so the program prints b'blue' rather than blue.
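
Both formatting problems go away when the argument is converted to the type of the format string first. A minimal sketch, assuming UTF-8; the variable names are illustrative.

```python
color = 'blue'
raw = b'blue'

# bytes format string: encode the str argument first.
bytes_result = b'red %s' % color.encode('utf-8')
# str format string: decode the bytes argument first.
str_result = 'red %s' % raw.decode('utf-8')

print(bytes_result)   # b'red blue'
print(str_result)     # red blue
```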
  • 3.2 File handles opened in text mode require Unicode strings, not raw bytes

The w mode opens the file in text mode, so writing binary data to it raises an error:

# Write binary data (fails in text mode)
with open('test.txt', "w+") as f:
    f.write(b"\xf1\xf2")
# TypeError: write() argument must be str, not bytes

The wb mode writes binary data normally:

# Write binary data
with open('test.txt', "wb") as f:
    f.write(b"\xf1\xf2")

The r mode also opens the file in text mode, so reading binary data from it raises an error:

# Read binary data (fails in text mode)
with open('test.txt', "r+") as f:
    f.read()
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 0: invalid continuation byte
# Note: when a file handle is in text mode, the system uses the default text encoding to convert between str and binary data, calling `str.encode` when writing and `bytes.decode` when reading. On most systems the default text encoding is `UTF-8`, so the system tries to decode `b'\xf1\xf2'` as a `UTF-8` string, which fails with the error above.
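
The same failure can be reproduced without a file, since it comes from the decoding step rather than from the read itself. A small sketch using the two bytes written above; the variable names are illustrative.

```python
data = b'\xf1\xf2'

try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    # The exception records why and where decoding failed.
    reason = exc.reason
    position = exc.start
    print(reason)     # invalid continuation byte
    print(position)   # 0
```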

The rb mode reads binary data normally:

# Read binary data
with open('test.txt', "rb") as f:
    print(b"\xf1\xf2" == f.read())
# True

Alternatively, keep text mode and set the encoding parameter of open to specify how the file contents are decoded:

with open('test.txt', "r", encoding="cp1252") as f:
    print(f.read())
# ñò
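
The file examples in this section can be combined into one self-contained round trip. This sketch writes to a temporary directory rather than test.txt so that it leaves nothing behind; the file name test.bin and the variable names are illustrative.

```python
import os
import tempfile

payload = b'\xf1\xf2'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.bin')

    # Binary mode: bytes go in and come out unchanged.
    with open(path, 'wb') as f:
        f.write(payload)
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with an explicit encoding: the same bytes decoded as cp1252.
    with open(path, 'r', encoding='cp1252') as f:
        text = f.read()

print(raw)    # b'\xf1\xf2'
print(text)   # ñò
```

Passing encoding explicitly makes the result independent of the system default, which is why the text-mode read succeeds here.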


Reference link


  1. The difference between bytes and str in Python

