
How to use Python's struct module and its format characters

Editor: Python


This article introduces Python's struct module and its format characters. Many people run into difficulties with them in real projects, so the sections below walk through how to handle the common cases.

Introduction

There are two ways to store the contents of a file: as binary or as text. Either way, reading the file back raises the question of how to convert the stored bytes into Python data types. Stored data usually follows a fixed structure. Because the reference implementation of Python is written in C and such layouts mirror C structs, we call them C structures here.

Lib/struct.py is the module responsible for converting between these structures and Python values.

Functions in struct

Let's take a look at what struct exports:

__all__ = [
    # Functions
    'calcsize', 'pack', 'pack_into', 'unpack', 'unpack_from',
    'iter_unpack',
    # Classes
    'Struct',
    # Exceptions
    'error'
    ]

The module exports 6 functions, 1 class (Struct), and 1 exception (error).

Let's look mainly at how the 6 functions are used:

struct.pack(format, v1, v2, ...)
    Returns a bytes object containing the values v1, v2, ... packed according to the format string format. The number of arguments must exactly match the values required by the format string.

struct.pack_into(format, buffer, offset, v1, v2, ...)
    Packs v1, v2, ... according to the format string format and writes the packed bytes into the writable buffer buffer, starting at position offset. Note that offset is a required argument.

struct.unpack(format, buffer)
    Unpacks from the buffer buffer (presumably packed by pack(format, ...)) according to the format string format. The result is a tuple, even if it contains only one item. The size of the buffer in bytes must match the size required by the format.

struct.unpack_from(format, /, buffer, offset=0)
    Unpacks from buffer according to the format string format, starting at position offset. The result is a tuple, even if it contains only one item.

struct.iter_unpack(format, buffer)
    Iteratively unpacks from the buffer buffer according to the format string format. This function returns an iterator that reads equally sized chunks from the buffer until its contents are exhausted.

struct.calcsize(format)
    Returns the size of the structure corresponding to the format string format (that is, the size of the bytes object produced by pack(format, ...)).
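
The six functions can be seen side by side in a small sketch. The format '<HH' (two little-endian unsigned shorts) and the sample values are chosen only for illustration:

```python
import struct

fmt = '<HH'                      # two unsigned 16-bit integers, little-endian
size = struct.calcsize(fmt)      # bytes needed for one record

# pack: returns a new bytes object
packed = struct.pack(fmt, 1, 2)

# pack_into: writes into an existing writable buffer at a given offset
buf = bytearray(2 * size)
struct.pack_into(fmt, buf, 0, 1, 2)
struct.pack_into(fmt, buf, size, 3, 4)

# unpack: the buffer must be exactly `size` bytes long
first = struct.unpack(fmt, packed)

# unpack_from: reads one record starting at an offset in a larger buffer
second = struct.unpack_from(fmt, buf, size)

# iter_unpack: yields one tuple per record until the buffer is exhausted
records = list(struct.iter_unpack(fmt, buf))

print(size, packed, first, second, records)
```

Because pack_into and unpack_from work on an existing buffer, they avoid creating intermediate bytes objects when many records share one buffer.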

These functions mainly perform packing and unpacking. Their most important parameter is format, also known as the format string, which specifies the format in which each value is packed.

Format strings

Format strings are the mechanism used to specify the data layout when packing and unpacking data. They are built from format characters, which specify the type of data being packed or unpacked. In addition, there are special characters for controlling byte order, size, and alignment.

Byte order, size, and alignment

By default, C types are represented in the machine's native format and byte order, and are properly aligned by skipping pad bytes where necessary (according to the rules used by the C compiler).

We can also specify the byte order, size, and alignment explicitly through the first character of the format string:

Character  Byte order              Size      Alignment
@          native                  native    native
=          native                  standard  none
<          little-endian           standard  none
>          big-endian              standard  none
!          network (= big-endian)  standard  none

Big-endian and little-endian are two ways of storing multi-byte data.

Big-endian stores the most significant byte at the starting address.

Little-endian stores the least significant byte at the starting address.

Big-endian is closer to the way people read and write numbers, while little-endian is more convenient for the machine.

There are two main CPU camps today: the PowerPC family stores data in big-endian order, while the x86 family stores data in little-endian order.

If machines with different CPU architectures exchange data directly, the difference in byte order can cause problems.
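
As a minimal sketch of the difference, the same 32-bit value can be packed with each byte-order prefix. The value 0x12345678 is an arbitrary example:

```python
import struct
import sys

value = 0x12345678

little = struct.pack('<I', value)   # least significant byte first
big = struct.pack('>I', value)      # most significant byte first
network = struct.pack('!I', value)  # network order, same as big-endian
native = struct.pack('=I', value)   # this machine's byte order, standard size

print(little.hex(), big.hex(), sys.byteorder)
```

Using an explicit '<', '>', or '!' prefix in file formats and network protocols keeps the encoding the same on every machine.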

Padding is only added automatically between consecutive structure members. No padding is added at the beginning or the end of the encoded structure.
No padding is added when using non-native size and alignment, that is, with '<', '>', '=', and '!'.
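
The following sketch compares the sizes of a char plus an int in standard and native modes. The native results depend on the platform, so only the standard-mode sizes are fixed:

```python
import struct

# Standard modes never insert padding, so field order does not change the size.
standard_sizes = {
    prefix: (struct.calcsize(prefix + 'ci'), struct.calcsize(prefix + 'ic'))
    for prefix in '<>=!'
}

# Native mode may pad between members so that the int is properly aligned.
native_ci = struct.calcsize('@ci')   # char, padding, int
native_ic = struct.calcsize('@ic')   # int, char, no trailing padding

print(standard_sizes, native_ci, native_ic)
```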

Format characters

Let's look at the available format characters:

Format  C type                 Python type        Standard size (bytes)
x       pad byte               no value
c       char                   bytes of length 1  1
b       signed char            integer            1
B       unsigned char          integer            1
?       _Bool                  bool               1
h       short                  integer            2
H       unsigned short         integer            2
i       int                    integer            4
I       unsigned int           integer            4
l       long                   integer            4
L       unsigned long          integer            4
q       long long              integer            8
Q       unsigned long long     integer            8
n       ssize_t                integer
N       size_t                 integer
e       (IEEE 754 half float)  float              2
f       float                  float              4
d       double                 float              8
s       char[]                 bytes
p       char[]                 bytes
P       void *                 integer
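
The standard sizes in the table can be checked with calcsize. This is a small sketch: the '=' prefix selects standard sizes, and the codes n, N, and P are left out because they are only available in native mode:

```python
import struct

# Standard sizes apply when the format starts with '=', '<', '>' or '!'.
standard_sizes = {
    code: struct.calcsize('=' + code)
    for code in 'cbB?hHiIlLqQefd'
}
print(standard_sizes)

# Native sizes (no prefix, or '@') depend on the platform's C compiler.
native_long = struct.calcsize('l')   # 4 or 8 on common platforms
print(native_long)
```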

Formatting numbers

For example, to pack an int object, we can write:

In [101]: from struct import *
In [102]: pack('i',10)
Out[102]: b'\n\x00\x00\x00'
In [103]: unpack('i',b'\n\x00\x00\x00')
Out[103]: (10,)
  
In [105]: calcsize('i')
Out[105]: 4

In the example above, we packed the int value 10 and then unpacked it. We also calculated that the i format is 4 bytes long.

The output is b'\n\x00\x00\x00'. The leading b marks a bytes object, and what follows is the byte content: \n is the byte 0x0a (decimal 10), followed by three zero bytes, because this machine stores the least significant byte first.
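
To make the packed bytes easier to read, the sketch below fixes the byte order with '<' (an assumption made here so that the result is the same on every machine) and inspects the individual byte values:

```python
import struct

packed = struct.pack('<i', 10)   # '<' fixes little-endian so the result is portable

print(list(packed))                        # the four byte values
print(packed == b'\x0a\x00\x00\x00')       # b'\n' is just another spelling of b'\x0a'
print(int.from_bytes(packed, 'little'))    # decode the same bytes without struct
```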

A format character can be preceded by an integer repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

Let's see how to pack 4 values of type short:

In [106]: pack('4h',2,3,4,5)
Out[106]: b'\x02\x00\x03\x00\x04\x00\x05\x00'
In [107]: unpack('4h',b'\x02\x00\x03\x00\x04\x00\x05\x00')
Out[107]: (2, 3, 4, 5)

Whitespace between formats is ignored. However, a repeat count and its format character must not be separated by whitespace.
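
A short sketch of the whitespace rule, using calcsize so that no values are needed. The format strings are arbitrary examples:

```python
import struct

# Whitespace between format characters is ignored.
same_size = struct.calcsize('h h i') == struct.calcsize('hhi')

# Whitespace between a repeat count and its format character is rejected.
try:
    struct.calcsize('4 h')
    count_with_space_ok = True
except struct.error:
    count_with_space_ok = False

print(same_size, count_with_space_ok)
```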

When packing a value x with one of the integer formats ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q'), struct.error is raised if x is outside the valid range for that format.
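
The range check can be sketched as follows. The helper try_pack is hypothetical and exists only for this illustration:

```python
import struct

def try_pack(fmt, value):
    """Return the packed bytes, or None if the value does not fit the format."""
    try:
        return struct.pack(fmt, value)
    except struct.error:
        return None

print(try_pack('b', 127))      # fits in a signed char
print(try_pack('b', 128))      # too large for a signed char
print(try_pack('B', -1))       # negative value in an unsigned format
print(try_pack('<H', 65535))   # largest unsigned short
```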

Formatting characters

Besides numbers, the most commonly used formats are characters and strings.

Let's first look at the character format. Because a character is 1 byte long, we write:

In [109]: pack('4c',b'a',b'b',b'c',b'd')
Out[109]: b'abcd'
In [110]: unpack('4c',b'abcd')
Out[110]: (b'a', b'b', b'c', b'd')
In [111]: calcsize('4c')
Out[111]: 4

The b prefix in front of each character makes it a bytes object of length 1, which is what the 'c' format requires. Without the prefix, the value is a str and is rejected.
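
If the data starts out as text, one possible approach (a sketch, assuming ASCII-only text) is to encode it first and then split it into single-byte bytes objects:

```python
import struct

text = 'abcd'

# Each 'c' needs a bytes object of length 1; a str is rejected.
try:
    struct.pack('c', 'a')
    str_accepted = True
except struct.error:
    str_accepted = False

# Encode the text, then slice it into single-byte bytes objects.
# (Iterating over bytes directly would yield integers, not bytes.)
encoded = text.encode('ascii')
chars = [encoded[i:i + 1] for i in range(len(encoded))]
packed = struct.pack('4c', *chars)

print(str_accepted, chars, packed)
```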

Formatting strings

Let's look at the string format:

In [114]: pack('4s',b'abcd')
Out[114]: b'abcd'
In [115]: unpack('4s',b'abcd')
Out[115]: (b'abcd',)
In [116]: calcsize('4s')
Out[116]: 4
In [117]: calcsize('s')
Out[117]: 1

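For strings, calcsize returns the length in bytes. Note that the number before 's' is the length of a single byte string, not a repeat count: '4s' is one 4-byte string, whereas '4c' is four separate characters.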
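
When the value does not match the declared length, the 's' format pads or truncates it. A minimal sketch with arbitrary sample values:

```python
import struct

short_value = struct.pack('4s', b'ab')       # padded with null bytes to 4 bytes
long_value = struct.pack('4s', b'abcdef')    # truncated to 4 bytes
(roundtrip,) = struct.unpack('4s', short_value)

print(short_value, long_value, roundtrip.rstrip(b'\x00'))
```

Unpacking always returns the full declared length, so trailing null bytes usually have to be stripped by the caller.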

The effect of padding

The order of the format characters can affect the size, because the padding needed to satisfy alignment requirements differs:

>>> # Native mode; output shown for a little-endian machine
>>> pack('ci', b'*', 0x12131415)
b'*\x00\x00\x00\x15\x14\x13\x12'
>>> pack('ic', 0x12131415, b'*')
b'\x15\x14\x13\x12*'
>>> calcsize('ci')
8
>>> calcsize('ic')
5
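
In the standard modes no padding is inserted automatically, but the x format character can add pad bytes explicitly. The sketch below uses '<' so the result does not depend on the machine:

```python
import struct

# Standard size, no automatic padding: 1 byte for the char + 4 for the int.
compact = struct.pack('<ci', b'*', 0x12131415)

# Three explicit pad bytes ('3x') place the int on a 4-byte boundary.
padded = struct.pack('<c3xi', b'*', 0x12131415)

print(len(compact), compact)
print(len(padded), padded)
print(struct.unpack('<c3xi', padded))   # pad bytes produce no value
```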

The following example shows how to control the padding manually:

In [120]: pack('llh',1, 2, 3)
Out[120]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00'

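In the example above, we pack the three numbers 1, 2, and 3, each with its own format: long, long, and short.

In native mode, long is 8 bytes on the machine that produced this output (its standard size of 4 bytes applies only with the '=', '<', '>', and '!' prefixes) and short is 2 bytes. The packed data is therefore 18 bytes long, and its end is not aligned to a long boundary.

To align the end, we can append 0l to the format, meaning zero longs. This pads the end of the structure to the alignment required by long: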

In [118]: pack('llh0l', 1, 2, 3)
Out[118]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
In [122]: unpack('llh0l',b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00')
Out[122]: (1, 2, 3)
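
The effect of the trailing '0l' can also be checked with calcsize. This is a sketch: the native sizes depend on the platform, and it assumes that long is aligned to its own size, as on common platforms:

```python
import struct

long_size = struct.calcsize('l')       # 4 or 8 on common platforms

unaligned = struct.calcsize('llh')     # ends right after the short
aligned = struct.calcsize('llh0l')     # end padded to the alignment of long

# In standard mode there is no alignment, so '0l' adds nothing.
standard_plain = struct.calcsize('=llh')
standard_zero = struct.calcsize('=llh0l')

print(long_size, unaligned, aligned, standard_plain, standard_zero)
```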

A more complex application

Finally, let's look at a slightly more complex application, in which the unpacked data is read directly into a tuple and then into a named tuple:

>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
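
When the same format is used for many records, the Struct class listed in __all__ compiles the format once. The following is a sketch that reuses the record above. The names student_struct and clean_name are chosen here for illustration:

```python
import struct
from collections import namedtuple

# Compile the format once; the Struct object can be reused for many records.
student_struct = struct.Struct('<10sHHb')
Student = namedtuple('Student', 'name serialnum school gradelevel')

record = b'raymond   \x32\x12\x08\x01\x08'
student = Student._make(student_struct.unpack(record))

# The name field is fixed-width, so strip the padding and decode it.
clean_name = student.name.rstrip().decode('ascii')

# Packing with the same Struct reproduces the original record.
repacked = student_struct.pack(*student)

print(student_struct.size, clean_name, student.serialnum, repacked == record)
```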
