This is a “Python craftsman ” Series No 4 An article
Containers ” These two words are rarely used Python Technical articles mention . At the sight of “ Containers ”, Most people think of the little blue whale :Docker, But this article has nothing to do with it . Containers in this article , yes Python An abstract concept in , It is a general term for the data types specially used to hold other objects .
stay Python in , There are four most common types of built-in containers : list (list)、 Tuples (tuple)、 Dictionaries (dict)、 aggregate (set). By using them individually or in combination , You can do a lot of things efficiently .
Python The internal implementation details of the language itself are also closely related to these container types . such as Python Class instance properties of 、 Global variables globals() And so on are stored by dictionary type .
In this article , I will start with the definition of container types , Try to summarize some of the best practices for everyday coding . Then the special functions provided around each container type , Share some programming tips .
1) Bottom view container
- Avoid frequently expanding the list / Create a new list
- Use in scenarios with multiple operations at the top of the list deque modular
- Use set / Dictionary to determine whether members exist
- Write faster code
2) High level viewing container
- Container oriented interface programming
- Write more extensible code
3) Common skills
- Use tuples to improve branching code
- Use dynamic unpacking in more places
- Better not “ Get permission ”, No need “ Ask for forgiveness ”
- Use next() function
- Use an ordered dictionary to eliminate duplication
4) Common misconceptions about
- Watch out for the exhausted iterators
- Don't modify the iterated object in the loop body
5) summary
6) Other articles in the series
7) annotation I gave it to “ Containers ” A simple definition : The container is used to hold other objects . But this definition is too broad , It doesn't have any guiding value for our daily programming . To really master Python The container in , We need to start from two levels :
below , Let's stand on these two different levels , Re understand the container .
Python It's a high-level programming language , The type of built-in container it provides , It's all the result of a high degree of encapsulation and abstraction . and “ Linked list ”、“ Red and black trees ”、“ Hashtable ” Compared to these names , all Python The name of the built-in type , They only describe the functional features of this type , Other people can't understand even the slightest detail of these names .
This is a Python One of the advantages of programming languages . comparison C Programming languages that are closer to the bottom of the computer ,Python Redesigned and implemented a more programmer friendly built-in container type , Shielding out memory management and other extra work . Provides us with a better development experience .
But if this is Python The advantages of language , Why should we bother to understand the implementation details of container types ? The answer is : Attention to detail can help us write faster code .
1. Avoid frequently expanding the list / Create a new list
All built-in container types do not limit capacity . If you will , You can keep pushing increasing numbers into an empty list , Finally, the memory of the whole machine will burst .
stay Python Language implementation details , The memory of the list is allocated on demand notes 1, When a list currently has insufficient memory , The memory expansion logic will be triggered . Allocating memory is an expensive operation . Although in most cases , It will not have any serious impact on the performance of your program . But when you deal with a very large amount of data , It is easy for memory allocation to drag down the performance of the entire program .
not so bad ,Python I have long been aware of this problem , It also provides official guidelines for problem solving , That's it :“ Become lazy ”.
How to explain “ Become lazy ”?range() The evolution of functions is a very good example .
stay Python 2 in , If you call range(100000000), It takes several seconds to get the result , Because it needs to return a huge list , Spent a lot of time on memory allocation and calculation . But in Python 3 in , The same call will get the result right away . Because the function no longer returns a list , Instead, a type is range The lazy object of , Only when you iterate it 、 Or slice it , It will return the real number to you .
So , To improve performance , Built-in functions range “ Become lazy ” 了 . In order to avoid too frequent memory allocation , In everyday coding , Our functions also need to be lazy , This includes :
2. Use in scenarios with multiple operations at the top of the list deque modular
Lists are based on array structures (Array) Realized , When you insert a new member at the head of the list (list.insert(0, item)) when , All other members behind it need to be moved , The time complexity of the operation is O(n). This results in inserting members at the head of the list rather than appending them at the end (list.append(item) The time complexity is O(1)) slower .
If your code needs to do this many times , Please consider using collections.deque Type to replace the list . because deque It is based on double ended queue , Whether you append elements to the head or tail , The time complexity is zero O(1).
3. Use set / Dictionary to determine whether members exist
When you need to determine whether a member exists in a container , Sets are more appropriate than lists . because item in [...] The time complexity of the operation is O(n), and item in {...} The time complexity of is O(1). This is because dictionaries and collections are based on hash tables (Hash Table) Data structure .
# This example is not particularly appropriate , Because when the target set is especially small , Using sets or lists has little impact on efficiency
# But that's not the point :)
VALID_NAMES = ["piglei", "raymond", "bojack", "caroline"]
# Conversion to collection type is used for member judgment
VALID_NAMES_SET = set(VALID_NAMES)
def validate_name(name):
if name not in VALID_NAMES_SET:
# Here we use Python 3.6 Added f-strings characteristic
raise ValueError(f"{name} is not a valid name!")Hint: It is highly recommended to read TimeComplexity - Python Wiki, Learn more about the time complexity of common container types .
If you are interested in the implementation details of the dictionary , It is also highly recommended to watch Raymond Hettinger Speech Modern Dictionaries(YouTube)
Python It's a door “ The duck type ” Language :“ When you see a bird walk like a duck 、 Swimming like a duck 、 It also sounds like a duck , Then this bird can be called a duck .” therefore , When we say what type an object is , It basically means : This object satisfies the specific interface specification for that type , Can be used as this type . For all built-in container types , The same is true of .
Open up collections Under the module of abc(“ abstract class Abstract Base Classes” An acronym for ) Sub module , All container related interfaces can be found ( abstract class )[ notes 2] Definition . Let's take a look at the interfaces that the built-in container types meet :
Each built-in container type , In fact, it is a composite entity that satisfies multiple interface definitions . For example, all container types meet “ Can be iterated ”(Iterable) This interface , This means that they are “ Can be iterated ” Of . But the other way around , Not all “ Can be iterated ” All objects are containers . Just like the string can be iterated , But we don't usually think of it as “ Containers ” To look at .
Knowing this fact , We will be in Python One of the most important principles of object-oriented programming : Programming for interfaces rather than concrete implementations .
Let's take an example , See how to understand Python Inside “ Interface oriented programming ”.
One day , We received a demand : There is a list , It contains a lot of user comments , For normal display on the page , All comments longer than a certain length need to be replaced by ellipsis .
This demand is easy to do , Soon we wrote the first version of the code :
# notes : To enhance the illustrative nature of the sample code , Some code snippets in this article use Python 3.5
# Version added Type Hinting characteristic
def add_ellipsis(comments: typing.List[str], max_length: int = 12):
""" If the list of comments exceeds max_length, The remaining characters are replaced by ellipsis
"""
index = 0
for comment in comments:
comment = comment.strip()
if len(comment) > max_length:
comments[index] = comment[:max_length] + '...'
index += 1
return comments
comments = [
"Implementation note",
"Changed",
"ABC for generator",
]
print("\n".join(add_ellipsis(comments)))
# OUTPUT:
# Implementati...
# Changed
# ABC for gene... In the code above ,add_ellipsis The function takes a list as an argument , And then traverse it , Replace the members that need to be modified . All this seems reasonable , Because the original demand we received was :“ There is one list , Inside ...”. But if one day , The comments we get are no longer kept on the list , But in immutable tuples ?
In that case , The existing function design will force us to write add_ellipsis(list(comments)) This kind of slow and ugly code .
We need to improve the function to avoid this problem . because add_ellipsis Function strongly depends on the list type , So when the parameter type becomes tuple , The current function is no longer applicable ( reason : to comments[index] The assignment will throw TypeError abnormal ). How to improve the design of this part ? The secret is : Let functional dependency “ Iteratable object ” This abstract concept , Instead of entity list types .
Use generator properties , The function can be changed to :
def add_ellipsis_gen(comments: typing.Iterable[str], max_length: int = 12):
""" If the content in the iteratable comment exceeds max_length, The remaining characters are replaced by ellipsis
"""
for comment in comments:
comment = comment.strip()
if len(comment) > max_length:
yield comment[:max_length] + '...'
else:
yield comment
print("\n".join(add_ellipsis_gen(comments)))In the new function , We changed the dependent parameter type from list to iterative abstract class . There are many advantages to doing so , One of the most obvious is : Whether the comment is from a list 、 Tuples or a file , New functions can be easily satisfied :
# Handle comments placed in tuples
comments = ("Implementation note", "Changed", "ABC for generator")
print("\n".join(add_ellipsis_gen(comments)))
# Deal with comments placed in documents
with open("comments") as fp:
for comment in add_ellipsis_gen(fp):
print(comment)After changing the dependency from a specific container type to an abstract interface , Functions have become more widely applicable . besides , The new function also has more advantages in terms of execution efficiency . Now let's go back to the previous question . From the top , What defines a container ?
The answer is : The interface protocol implemented by each container type defines the container . Different container types in our eyes , Should be Can we iterate 、 Is it possible to modify 、 Is there any length And so on . We need to write the relevant code , Pay more attention to the abstract properties of the container , Not the container type itself , This will help us write more elegant 、 More extensible code .
Hint: stay itertools In the built-in module, you can find more treasures about dealing with iteratable objects .
Sometimes , There will be more than three branches in our code if/else . It looks like this :
import time
def from_now(ts):
""" Receive a past timestamp , Returns a text description of the relative time from the current time
"""
now = time.time()
seconds_delta = int(now - ts)
if seconds_delta < 1:
return "less than 1 second ago"
elif seconds_delta < 60:
return "{} seconds ago".format(seconds_delta)
elif seconds_delta < 3600:
return "{} minutes ago".format(seconds_delta // 60)
elif seconds_delta < 3600 * 24:
return "{} hours ago".format(seconds_delta // 3600)
else:
return "{} days ago".format(seconds_delta // (3600 * 24))
now = time.time()
print(from_now(now))
print(from_now(now - 24))
print(from_now(now - 600))
print(from_now(now - 7500))
print(from_now(now - 87500))
# OUTPUT:
# less than 1 second ago
# 24 seconds ago
# 10 minutes ago
# 2 hours ago
# 1 days ago The above function can't find too many problems , Many, many people will write similar code . however , If you look at it carefully , You can find some obvious in the branch code section “ The border ”. such as , When the function determines whether a certain time should be used “ Number of seconds ” Display time , Yes 60. And to judge whether it should take minutes , Yes 3600.
Extracting rules from boundaries is the key to optimizing this code . If we put all these boundaries in an ordered tuple , Then cooperate with the binary search module bisect. The control flow of the entire function can be greatly simplified :
import bisect
# BREAKPOINTS It must be sequenced , Otherwise, binary search cannot be performed
BREAKPOINTS = (1, 60, 3600, 3600 * 24)
TMPLS = (
# unit, template
(1, "less than 1 second ago"),
(1, "{units} seconds ago"),
(60, "{units} minutes ago"),
(3600, "{units} hours ago"),
(3600 * 24, "{units} days ago"),
)
def from_now(ts):
""" Receive a past timestamp , Returns a text description of the relative time from the current time
"""
seconds_delta = int(time.time() - ts)
unit, tmpl = TMPLS[bisect.bisect(BREAKPOINTS, seconds_delta)]
return tmpl.format(units=seconds_delta // unit) In addition to using tuples, you can optimize too many if/else Outside the branch , In some cases dictionaries can be used to do the same thing . The key is to find repetitive logic and laws from existing code , And try more .
Dynamic unpacking refers to the use of * or ** Operator will iterate over the object “ Untie ” act , stay Python 2 Time , This operation can only be used in the function parameters section , And there are very strict requirements for the order and quantity of occurrence , The usage scenario is very simple .
def calc(a, b, multiplier=1):
return (a + b) * multiplier
# Python2 Only dynamic unpacking in function parameters is supported in
print calc(*[1, 2], **{"multiplier": 10})
# OUTPUT: 30 however ,Python 3 In especial 3.5 After version ,* and ** The usage scenario of is greatly expanded . for instance , stay Python 2 in , If we need to merge two dictionaries , It needs to be done :
def merge_dict(d1, d2):
# Because dictionaries are modifiable objects , To avoid modifying the original object , You need to copy one here d1 The shallow copy
result = d1.copy()
result.update(d2)
return result
user = merge_dict({"name": "piglei"}, {"movies": ["Fight Club"]})But in Python 3.5 Later versions , You can use it directly ** Operator to quickly complete the dictionary merging operation :
user = {**{"name": "piglei"}, **{"movies": ["Fight Club"]}} besides , You can also use in normal assignment statements * Operator to dynamically unpack iteratible objects . If you want to know more about it , You can read the following recommendations PEP.
Hint: Two methods to promote dynamic unpacking scenario expansion PEP:PEP 3132 -- Extended Iterable Unpacking | Python.orgPEP 448 -- Additional Unpacking Generalizations | Python.org
This subtitle may be a little confusing , Let me explain briefly :“ Get permission ” And “ Ask for forgiveness ” There are two different programming styles . If you use a classic requirement :“ Count the number of occurrences of each element in the list ” As an example , Two different styles of code would look like this :
# AF: Ask for Forgiveness
# To do do , If an exception is thrown , Then handle the exception
def counter_af(l):
result = {}
for key in l:
try:
result[key] += 1
except KeyError:
result[key] = 1
return result
# AP: Ask for Permission
# Before you do it , First ask if you can do it , You can do it again
def counter_ap(l):
result = {}
for key in l:
if key in result:
result[key] += 1
else:
result[key] = 1
return result Whole Python Community to the first Ask for Forgiveness There is a clear preference for the exception - trapping programming style of . There are many reasons for this , First , stay Python Throwing an exception in is a very lightweight operation . secondly , The first method is also better than the second in performance , Because it doesn't have to do an extra member check every time the loop runs .
however , The two pieces of code in the example are very rare in the real world . Why? ? Because if you want to count the times , Direct use collections.defaultdict That's all right. :
from collections import defaultdict
def counter_by_collections(l):
result = defaultdict(int)
for key in l:
result[key] += 1
return resultSuch code does not use either “ Get permission ”, No need “ Ask for forgiveness ”. The control flow of the whole code is more clear and natural . therefore , If possible , Please try your best to omit those Non core Exception capture logic for . Some tips :
collections.defaultdict type dict[key] = dict.setdefault(key, 0) + 1 Built-in functions dict.pop(key, None)dict.get(key, default_value)IndexError abnormal :["foo"][100:200]next() Is a very practical built-in function , It takes an iterator as a parameter , Then return the next element of the iterator . Use it in conjunction with generator expressions , Can efficiently achieve “ Find the first member in the list that meets the condition ” Such needs .
numbers = [3, 7, 8, 2, 21]
# Get and ** Return immediately ** The first even number in the list
print(next(i for i in numbers if i % 2 == 0))
# OUTPUT: 8The structural characteristics of dictionaries and collections ensure that their members do not repeat , So they are often used to remove weight . however , The order of the original list will be lost after using both of them . This is made up of the underlying data structure “ Hashtable (Hash Table)” Determined by the characteristics of .
>>> l = [10, 2, 3, 21, 10, 3]
# De duplication but lost order
>>> set(l)
{3, 10, 2, 21} What if the weight needs to be removed and the order must be preserved ? We can use collections.OrderedDict modular :
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(l).keys())
[10, 2, 3, 21]Hint: stay Python 3.6 in , The default dictionary type modifies the implementation , It has become orderly . And in Python 3.7 in , This function has been changed from Details of language implementation It became Dependable formal language features . But I think it makes the whole Python It will take some time for the community to get used to this , Now, after all, “ The dictionary is out of order ” It is still printed in countless books Python Book . therefore , I still recommend that you use it wherever you need an organized dictionary OrderedDict.
In front of the article , We mentioned the use of “ lazy ” The benefits of generators . however , All things have two sides . One of the biggest drawbacks of generators is : It will dry up . When you have traversed them completely , After repeated traversal, you can't get any new content
numbers = [1, 2, 3]
numbers = (i * 2 for i in numbers)
# The first cycle will output 2, 4, 6
for number in numbers:
print(number)
# This loop will not output anything , Because iterators have dried up
for number in numbers:
print(number)And it's not just generator expressions ,Python 3 Inside map、filter Built in functions have the same characteristics . Ignoring this feature can easily lead to some imperceptible problems in the code Bug.
Instagram Just before the project starts Python 2 To Python 3 Encountered this problem during the migration of . They are PyCon 2017 Share the story of how to deal with this problem . Visit article Instagram stay PyCon 2017 Summary of your speech , Search for “ iterator ” Can view details .
This is a lot Python Mistakes beginners make . such as , We need a function to delete all even numbers from the list :
def remove_even(numbers):
""" Remove all even numbers from the list
"""
for i, number in enumerate(numbers):
if number % 2 == 0:
# Code in question
del numbers[i]
numbers = [1, 2, 7, 4, 8, 11]
remove_even(numbers)
print(numbers)
# OUTPUT: [1, 7, 8, 11] Notice the extra one in the result “8” Did you? ? When you go through a list and modify it , This will happen . Because the object being iterated numbers Modified during the loop . The subscript of traversal is increasing , The length of the list itself is also shrinking . This will cause some members in the list to not be traversed at all .
So for this kind of operation , Please use a new empty list to save the results , Or make use of yield Return to a generator . Instead of modifying the iterated list or the dictionary object itself .
In this article , We start with “ Container type ” Starting from the definition of , The container types are discussed at the bottom and top levels . Then follow the tradition of the series , Provides some tips for writing container related code .
Let's finally summarize the main points :
After reading the article, you , What do you want to make complaints about? ? Please leave a message or at project Github Issues Tell me .
1) Python This language has nothing but CPython Outside , There are many other versions that implement . If there is no special instruction , This article and “Python craftsman ” All that appears in the series Python All in particular Python Of C Language implementation CPython
2) Python There is no such thing as in other programming languages “Interface Interface ” type , Only similar “ abstract class ” Concept . For the convenience of expression , The following contents are used uniformly “ Interface ” To replace “ abstract class ”.
3) Have you only realized Mapping But it's not MutableMapping The type of ? try MappingProxyType({})
4) Have you only realized Set But it's not MutableSet The type of ? try frozenset()
Python craftsman : Make good use of variables to improve code quality
Python craftsman : Skills in writing conditional branch code
Python craftsman : Tips for using strings and numbers
This article is edited and released by Tencent blue whale Zhiyun , Tencent blue whale Zhiyun ( Short for blue whale ) The software system is a set of systems based on PaaS Technology solutions for , Committed to building an industry-leading one-stop automatic operation and maintenance platform . At present, the community version has been launched 、 Enterprise Edition , Welcome to experience .