程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Python Craftsman: doorway of container

編輯:Python

preface

This is a “Python craftsman ” Series No 4 An article

Containers ” These two words are rarely used Python Technical articles mention . At the sight of “ Containers ”, Most people think of the little blue whale :Docker, But this article has nothing to do with it . Containers in this article , yes Python An abstract concept in , It is a general term for the data types specially used to hold other objects .

stay Python in , There are four most common types of built-in containers : list (list) Tuples (tuple) Dictionaries (dict) aggregate (set). By using them individually or in combination , You can do a lot of things efficiently .

Python The internal implementation details of the language itself are also closely related to these container types . such as Python Class instance properties of 、 Global variables globals() And so on are stored by dictionary type .

In this article , I will start with the definition of container types , Try to summarize some of the best practices for everyday coding . Then the special functions provided around each container type , Share some programming tips .

Content catalog

1) Bottom view container
- Avoid frequently expanding the list / Create a new list
- Use in scenarios with multiple operations at the top of the list deque modular
- Use set / Dictionary to determine whether members exist
- Write faster code
2) High level viewing container
- Container oriented interface programming
- Write more extensible code
3) Common skills
- Use tuples to improve branching code
- Use dynamic unpacking in more places
- Better not “ Get permission ”, No need “ Ask for forgiveness ”
- Use next() function
- Use an ordered dictionary to eliminate duplication
4) Common misconceptions about
- Watch out for the exhausted iterators
- Don't modify the iterated object in the loop body
5) summary
6) Other articles in the series
7) annotation 

When we talk about containers , What are we talking about ?

I gave it to “ Containers ” A simple definition : The container is used to hold other objects . But this definition is too broad , It doesn't have any guiding value for our daily programming . To really master Python The container in , We need to start from two levels :

  • Underlying implementation : What data structure does the built-in container type use ? How an operation works ?
  • High level abstraction : What determines whether an object is a container ? Which behaviors define the container ?

below , Let's stand on these two different levels , Re understand the container .


Bottom view container

Python It's a high-level programming language , The type of built-in container it provides , It's all the result of a high degree of encapsulation and abstraction . and “ Linked list ”、“ Red and black trees ”、“ Hashtable ” Compared to these names , all Python The name of the built-in type , They only describe the functional features of this type , Other people can't understand even the slightest detail of these names .

This is a Python One of the advantages of programming languages . comparison C Programming languages that are closer to the bottom of the computer ,Python Redesigned and implemented a more programmer friendly built-in container type , Shielding out memory management and other extra work . Provides us with a better development experience .

But if this is Python The advantages of language , Why should we bother to understand the implementation details of container types ? The answer is : Attention to detail can help us write faster code .

Write faster code

1. Avoid frequently expanding the list / Create a new list

All built-in container types do not limit capacity . If you will , You can keep pushing increasing numbers into an empty list , Finally, the memory of the whole machine will burst .

stay Python Language implementation details , The memory of the list is allocated on demand notes 1, When a list currently has insufficient memory , The memory expansion logic will be triggered . Allocating memory is an expensive operation . Although in most cases , It will not have any serious impact on the performance of your program . But when you deal with a very large amount of data , It is easy for memory allocation to drag down the performance of the entire program .

not so bad ,Python I have long been aware of this problem , It also provides official guidelines for problem solving , That's it :“ Become lazy ”.

How to explain “ Become lazy ”?range() The evolution of functions is a very good example .

stay Python 2 in , If you call range(100000000), It takes several seconds to get the result , Because it needs to return a huge list , Spent a lot of time on memory allocation and calculation . But in Python 3 in , The same call will get the result right away . Because the function no longer returns a list , Instead, a type is range The lazy object of , Only when you iterate it 、 Or slice it , It will return the real number to you .

So , To improve performance , Built-in functions range “ Become lazy ” 了 . In order to avoid too frequent memory allocation , In everyday coding , Our functions also need to be lazy , This includes :

  • More use yield keyword , Returns the generator object
  • Try to use builder expressions instead of list derived expressions
    • Generator Expressions :(i for i in range(100))
    • List derived expressions :i for i in range(100)
  • Try to use lazy objects provided by the module :
    • Use re.finditer replace re.findall
    • Use iteratable file objects directly : for line in fp, instead of for line in fp.readlines()

2. Use in scenarios with multiple operations at the top of the list deque modular

Lists are based on array structures (Array) Realized , When you insert a new member at the head of the list (list.insert(0, item)) when , All other members behind it need to be moved , The time complexity of the operation is O(n). This results in inserting members at the head of the list rather than appending them at the end (list.append(item) The time complexity is O(1)) slower .

If your code needs to do this many times , Please consider using collections.deque Type to replace the list . because deque It is based on double ended queue , Whether you append elements to the head or tail , The time complexity is zero O(1).

3. Use set / Dictionary to determine whether members exist

When you need to determine whether a member exists in a container , Sets are more appropriate than lists . because item in [...] The time complexity of the operation is O(n), and item in {...} The time complexity of is O(1). This is because dictionaries and collections are based on hash tables (Hash Table) Data structure .

# This example is not particularly appropriate , Because when the target set is especially small , Using sets or lists has little impact on efficiency
# But that's not the point :)
VALID_NAMES = ["piglei", "raymond", "bojack", "caroline"]
# Conversion to collection type is used for member judgment
VALID_NAMES_SET = set(VALID_NAMES)
def validate_name(name):
if name not in VALID_NAMES_SET:
# Here we use Python 3.6 Added f-strings characteristic
raise ValueError(f"{name} is not a valid name!")

Hint: It is highly recommended to read TimeComplexity - Python Wiki, Learn more about the time complexity of common container types .

If you are interested in the implementation details of the dictionary , It is also highly recommended to watch Raymond Hettinger Speech Modern Dictionaries(YouTube)


High level viewing container

Python It's a door “ The duck type ” Language :“ When you see a bird walk like a duck 、 Swimming like a duck 、 It also sounds like a duck , Then this bird can be called a duck .” therefore , When we say what type an object is , It basically means : This object satisfies the specific interface specification for that type , Can be used as this type . For all built-in container types , The same is true of .

Open up collections Under the module of abc(“ abstract class Abstract Base Classes” An acronym for ) Sub module , All container related interfaces can be found ( abstract class )[ notes 2] Definition . Let's take a look at the interfaces that the built-in container types meet :

  • list (list): Satisfy Iterable、Sequence、MutableSequence Such as the interface
  • Tuples (tuple): Satisfy Iterable、Sequence
  • Dictionaries (dict): Satisfy Iterable、Mapping、MutableMapping [ notes 3]
  • aggregate (set): Satisfy Iterable、Set、MutableSet [ notes 4]

Each built-in container type , In fact, it is a composite entity that satisfies multiple interface definitions . For example, all container types meet “ Can be iterated ”(Iterable) This interface , This means that they are “ Can be iterated ” Of . But the other way around , Not all “ Can be iterated ” All objects are containers . Just like the string can be iterated , But we don't usually think of it as “ Containers ” To look at .

Knowing this fact , We will be in Python One of the most important principles of object-oriented programming : Programming for interfaces rather than concrete implementations .

Let's take an example , See how to understand Python Inside “ Interface oriented programming ”.

Write more extensible code

One day , We received a demand : There is a list , It contains a lot of user comments , For normal display on the page , All comments longer than a certain length need to be replaced by ellipsis .

This demand is easy to do , Soon we wrote the first version of the code :

# notes : To enhance the illustrative nature of the sample code , Some code snippets in this article use Python 3.5
# Version added Type Hinting characteristic
def add_ellipsis(comments: typing.List[str], max_length: int = 12):
""" If the list of comments exceeds max_length, The remaining characters are replaced by ellipsis
"""
index = 0
for comment in comments:
comment = comment.strip()
if len(comment) > max_length:
comments[index] = comment[:max_length] + '...'
index += 1
return comments
comments = [
"Implementation note",
"Changed",
"ABC for generator",
]
print("\n".join(add_ellipsis(comments)))
# OUTPUT:
# Implementati...
# Changed
# ABC for gene...

In the code above ,add_ellipsis The function takes a list as an argument , And then traverse it , Replace the members that need to be modified . All this seems reasonable , Because the original demand we received was :“ There is one list , Inside ...”. But if one day , The comments we get are no longer kept on the list , But in immutable tuples ?

In that case , The existing function design will force us to write add_ellipsis(list(comments)) This kind of slow and ugly code .

Container oriented interface programming

We need to improve the function to avoid this problem . because add_ellipsis Function strongly depends on the list type , So when the parameter type becomes tuple , The current function is no longer applicable ( reason : to comments[index] The assignment will throw TypeError abnormal ). How to improve the design of this part ? The secret is : Let functional dependency “ Iteratable object ” This abstract concept , Instead of entity list types .

Use generator properties , The function can be changed to :

def add_ellipsis_gen(comments: typing.Iterable[str], max_length: int = 12):
""" If the content in the iteratable comment exceeds max_length, The remaining characters are replaced by ellipsis
"""
for comment in comments:
comment = comment.strip()
if len(comment) > max_length:
yield comment[:max_length] + '...'
else:
yield comment
print("\n".join(add_ellipsis_gen(comments)))

In the new function , We changed the dependent parameter type from list to iterative abstract class . There are many advantages to doing so , One of the most obvious is : Whether the comment is from a list 、 Tuples or a file , New functions can be easily satisfied :

# Handle comments placed in tuples
comments = ("Implementation note", "Changed", "ABC for generator")
print("\n".join(add_ellipsis_gen(comments)))
# Deal with comments placed in documents
with open("comments") as fp:
for comment in add_ellipsis_gen(fp):
print(comment)

After changing the dependency from a specific container type to an abstract interface , Functions have become more widely applicable . besides , The new function also has more advantages in terms of execution efficiency . Now let's go back to the previous question . From the top , What defines a container

The answer is : The interface protocol implemented by each container type defines the container . Different container types in our eyes , Should be Can we iterate Is it possible to modify Is there any length And so on . We need to write the relevant code , Pay more attention to the abstract properties of the container , Not the container type itself , This will help us write more elegant 、 More extensible code .

Hint: stay itertools In the built-in module, you can find more treasures about dealing with iteratable objects .


Common skills

Use tuples to improve branching code

Sometimes , There will be more than three branches in our code if/else . It looks like this :

import time
def from_now(ts):
""" Receive a past timestamp , Returns a text description of the relative time from the current time
"""
now = time.time()
seconds_delta = int(now - ts)
if seconds_delta < 1:
return "less than 1 second ago"
elif seconds_delta < 60:
return "{} seconds ago".format(seconds_delta)
elif seconds_delta < 3600:
return "{} minutes ago".format(seconds_delta // 60)
elif seconds_delta < 3600 * 24:
return "{} hours ago".format(seconds_delta // 3600)
else:
return "{} days ago".format(seconds_delta // (3600 * 24))
now = time.time()
print(from_now(now))
print(from_now(now - 24))
print(from_now(now - 600))
print(from_now(now - 7500))
print(from_now(now - 87500))
# OUTPUT:
# less than 1 second ago
# 24 seconds ago
# 10 minutes ago
# 2 hours ago
# 1 days ago

The above function can't find too many problems , Many, many people will write similar code . however , If you look at it carefully , You can find some obvious in the branch code section “ The border ”. such as , When the function determines whether a certain time should be used “ Number of seconds ” Display time , Yes 60. And to judge whether it should take minutes , Yes 3600.

Extracting rules from boundaries is the key to optimizing this code . If we put all these boundaries in an ordered tuple , Then cooperate with the binary search module bisect. The control flow of the entire function can be greatly simplified :

import bisect
# BREAKPOINTS It must be sequenced , Otherwise, binary search cannot be performed
BREAKPOINTS = (1, 60, 3600, 3600 * 24)
TMPLS = (
# unit, template
(1, "less than 1 second ago"),
(1, "{units} seconds ago"),
(60, "{units} minutes ago"),
(3600, "{units} hours ago"),
(3600 * 24, "{units} days ago"),
)
def from_now(ts):
""" Receive a past timestamp , Returns a text description of the relative time from the current time
"""
seconds_delta = int(time.time() - ts)
unit, tmpl = TMPLS[bisect.bisect(BREAKPOINTS, seconds_delta)]
return tmpl.format(units=seconds_delta // unit)

In addition to using tuples, you can optimize too many if/else Outside the branch , In some cases dictionaries can be used to do the same thing . The key is to find repetitive logic and laws from existing code , And try more .

Use dynamic unpacking in more places

Dynamic unpacking refers to the use of * or ** Operator will iterate over the object “ Untie ” act , stay Python 2 Time , This operation can only be used in the function parameters section , And there are very strict requirements for the order and quantity of occurrence , The usage scenario is very simple .

def calc(a, b, multiplier=1):
return (a + b) * multiplier
# Python2 Only dynamic unpacking in function parameters is supported in
print calc(*[1, 2], **{"multiplier": 10})
# OUTPUT: 30

however ,Python 3 In especial 3.5 After version ,* and ** The usage scenario of is greatly expanded . for instance , stay Python 2 in , If we need to merge two dictionaries , It needs to be done :

def merge_dict(d1, d2):
# Because dictionaries are modifiable objects , To avoid modifying the original object , You need to copy one here d1 The shallow copy
result = d1.copy()
result.update(d2)
return result
user = merge_dict({"name": "piglei"}, {"movies": ["Fight Club"]})

But in Python 3.5 Later versions , You can use it directly ** Operator to quickly complete the dictionary merging operation :

user = {**{"name": "piglei"}, **{"movies": ["Fight Club"]}}

besides , You can also use in normal assignment statements * Operator to dynamically unpack iteratible objects . If you want to know more about it , You can read the following recommendations PEP.

Hint: Two methods to promote dynamic unpacking scenario expansion PEP:PEP 3132 -- Extended Iterable Unpacking | Python.orgPEP 448 -- Additional Unpacking Generalizations | Python.org

Better not “ Get permission ”, No need “ Ask for forgiveness ”

This subtitle may be a little confusing , Let me explain briefly :“ Get permission ” And “ Ask for forgiveness ” There are two different programming styles . If you use a classic requirement :“ Count the number of occurrences of each element in the list ” As an example , Two different styles of code would look like this :

# AF: Ask for Forgiveness
# To do do , If an exception is thrown , Then handle the exception
def counter_af(l):
result = {}
for key in l:
try:
result[key] += 1
except KeyError:
result[key] = 1
return result
# AP: Ask for Permission
# Before you do it , First ask if you can do it , You can do it again
def counter_ap(l):
result = {}
for key in l:
if key in result:
result[key] += 1
else:
result[key] = 1
return result

Whole Python Community to the first Ask for Forgiveness There is a clear preference for the exception - trapping programming style of . There are many reasons for this , First , stay Python Throwing an exception in is a very lightweight operation . secondly , The first method is also better than the second in performance , Because it doesn't have to do an extra member check every time the loop runs .

however , The two pieces of code in the example are very rare in the real world . Why? ? Because if you want to count the times , Direct use collections.defaultdict That's all right. :

from collections import defaultdict
def counter_by_collections(l):
result = defaultdict(int)
for key in l:
result[key] += 1
return result

Such code does not use either “ Get permission ”, No need “ Ask for forgiveness ”. The control flow of the whole code is more clear and natural . therefore , If possible , Please try your best to omit those Non core Exception capture logic for . Some tips :

  • When working with dictionary members : Use collections.defaultdict type
    • Or use dict[key] = dict.setdefault(key, 0) + 1 Built-in functions
  • If you remove a dictionary member , Don't care if it exists :
    • call pop Function to set the default value , such as dict.pop(key, None)
  • Specify the default value when the dictionary gets members :dict.get(key, default_value)
  • A non-existent slice access to the list does not throw IndexError abnormal :["foo"][100:200]

Use next() function

next() Is a very practical built-in function , It takes an iterator as a parameter , Then return the next element of the iterator . Use it in conjunction with generator expressions , Can efficiently achieve “ Find the first member in the list that meets the condition ” Such needs .

numbers = [3, 7, 8, 2, 21]
# Get and ** Return immediately ** The first even number in the list
print(next(i for i in numbers if i % 2 == 0))
# OUTPUT: 8

Use an ordered dictionary to eliminate duplication

The structural characteristics of dictionaries and collections ensure that their members do not repeat , So they are often used to remove weight . however , The order of the original list will be lost after using both of them . This is made up of the underlying data structure “ Hashtable (Hash Table)” Determined by the characteristics of .

>>> l = [10, 2, 3, 21, 10, 3]
# De duplication but lost order
>>> set(l)
{3, 10, 2, 21}

What if the weight needs to be removed and the order must be preserved ? We can use collections.OrderedDict modular :

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(l).keys())
[10, 2, 3, 21]

Hint: stay Python 3.6 in , The default dictionary type modifies the implementation , It has become orderly . And in Python 3.7 in , This function has been changed from Details of language implementation It became Dependable formal language features . But I think it makes the whole Python It will take some time for the community to get used to this , Now, after all, “ The dictionary is out of order ” It is still printed in countless books Python Book . therefore , I still recommend that you use it wherever you need an organized dictionary OrderedDict.


Common misconceptions about

Watch out for the exhausted iterators

In front of the article , We mentioned the use of “ lazy ” The benefits of generators . however , All things have two sides . One of the biggest drawbacks of generators is : It will dry up . When you have traversed them completely , After repeated traversal, you can't get any new content

numbers = [1, 2, 3]
numbers = (i * 2 for i in numbers)
# The first cycle will output 2, 4, 6
for number in numbers:
print(number)
# This loop will not output anything , Because iterators have dried up
for number in numbers:
print(number)

And it's not just generator expressions ,Python 3 Inside map、filter Built in functions have the same characteristics . Ignoring this feature can easily lead to some imperceptible problems in the code Bug.

Instagram Just before the project starts Python 2 To Python 3 Encountered this problem during the migration of . They are PyCon 2017 Share the story of how to deal with this problem . Visit article Instagram stay PyCon 2017 Summary of your speech , Search for “ iterator ” Can view details .

Don't modify the iterated object in the loop body

This is a lot Python Mistakes beginners make . such as , We need a function to delete all even numbers from the list :

def remove_even(numbers):
""" Remove all even numbers from the list
"""
for i, number in enumerate(numbers):
if number % 2 == 0:
# Code in question
del numbers[i]
numbers = [1, 2, 7, 4, 8, 11]
remove_even(numbers)
print(numbers)
# OUTPUT: [1, 7, 8, 11]

Notice the extra one in the result “8” Did you? ? When you go through a list and modify it , This will happen . Because the object being iterated numbers Modified during the loop . The subscript of traversal is increasing , The length of the list itself is also shrinking . This will cause some members in the list to not be traversed at all .

So for this kind of operation , Please use a new empty list to save the results , Or make use of yield Return to a generator . Instead of modifying the iterated list or the dictionary object itself .


summary

In this article , We start with “ Container type ” Starting from the definition of , The container types are discussed at the bottom and top levels . Then follow the tradition of the series , Provides some tips for writing container related code .

Let's finally summarize the main points :

  • Understand the underlying implementation of container types , Can help you write better code
  • Refine abstract concepts in requirements , Interface oriented rather than implementation programming
  • More use “ lazy ” The object of , Less generation “ urgent ” A list of
  • Using tuples and dictionaries can simplify the branch code structure
  • Use next() Functions with iterators can accomplish many things efficiently , But also need to pay attention to “ exhausted ” problem
  • collections、itertools There are many useful tools in the module , Go and have a look !

After reading the article, you , What do you want to make complaints about? ? Please leave a message or at project Github Issues Tell me .


annotation

1) Python This language has nothing but CPython Outside , There are many other versions that implement . If there is no special instruction , This article and “Python craftsman ” All that appears in the series Python All in particular Python Of C Language implementation CPython

2) Python There is no such thing as in other programming languages “Interface Interface ” type , Only similar “ abstract class ” Concept . For the convenience of expression , The following contents are used uniformly “ Interface ” To replace “ abstract class ”.

3) Have you only realized Mapping But it's not MutableMapping The type of ? try MappingProxyType({})

4) Have you only realized Set But it's not MutableSet The type of ? try frozenset()


Other articles in the series

Python craftsman : Make good use of variables to improve code quality

Python craftsman : Skills in writing conditional branch code

Python craftsman : Tips for using strings and numbers


Blue whale wisdom cloud

This article is edited and released by Tencent blue whale Zhiyun , Tencent blue whale Zhiyun ( Short for blue whale ) The software system is a set of systems based on PaaS Technology solutions for , Committed to building an industry-leading one-stop automatic operation and maintenance platform . At present, the community version has been launched 、 Enterprise Edition , Welcome to experience .

  • Official website :https://bk.tencent.com/
  • Download link :https://bk.tencent.com/download/
  • Community :https://bk.tencent.com/s-mart/community/question

  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved