A Ten-Thousand-Word Summary: Python Construction Guide and Design Pattern Overview


The purpose of this article is to give a quick tour of Python's data structures and syntactic sugar, including how to express familiar design ideas and design patterns in Python, and then to develop Python projects rapidly on top of mature environment-management tools and excellent third-party libraries. It falls roughly into four parts:

  1. Python environment configuration (Anaconda) and basic syntax.
  2. Python project engineering (see Python Basics: Engineering).
  3. How to bring OOP and FP paradigm design, as well as metaprogramming, into Python.
  4. A brief introduction to Numpy and Pandas, two fundamental numerical-analysis libraries.

Environment and configuration

As the saying goes: a workman who wants to do his job well must first sharpen his tools. A Python project relies on the support of a variety of software packages. Rather than managing dependencies and runtime environments by hand after the fact, it is better to hand these chores to efficient tooling up front, so that we can focus on development. Therefore, before introducing Python itself, it is worth getting to know the conda tool.

conda is an open-source package management and environment management system. Here, "package" covers not only the dependencies circulating in the Python ecosystem, but also binary programs developed in other languages (such as C/C++) that run directly without the user compiling them manually, such as mkl and cuda. These binaries may not appear in the user's Python project directly, but the packages the project depends on may call them locally under the hood.

Developers are currently most familiar with Anaconda. The prefix Ana- comes from the English word "analysis"; Anaconda is effectively conda + Python + an integrated bundle of 180 scientific computing packages. See: Anaconda | The World's Most Popular Data Science Platform

The Anaconda installer is about 500 MB. If you only want conda's core functionality, installing the lightweight Miniconda is enough. See: Miniconda — Conda documentation

During installation, it is recommended not to add the Anaconda directory to the $PATH environment variable, to avoid conflicts with a separately installed Python on the same machine. The consequence is that interacting with conda directly from a plain terminal will report that the command cannot be found.

Don't worry too much about this. Anaconda ships a separate Anaconda Prompt tool, which sets the necessary environment variables at startup so that users can interact with conda.

The installation path should not contain spaces, and preferably no Chinese characters.

In what follows, a heading styled *title marks a key section, while a heading styled title* marks an optional one.

conda Environments

The content of this section draws on: Anaconda introduction: installation, package and environment management - CSDN Blog.

For a complete Anaconda installation tutorial, you can also refer to: Anaconda Introduction, Installation and Usage Guide - Zhihu (zhihu.com)

Conceptually, conda is very similar to container-management tools like docker: conda creates environments in order to isolate the runtime environment of each Python project, so that projects do not interfere with one another.

After installation, first confirm the conda version information via conda --version. conda can update itself via update:

conda update conda

env list shows all environments under the current conda installation together with their physical paths; conda marks the current (officially, "activated") environment with a * sign. When no environment is activated, it points to the base environment by default.

conda env list

list prints out the packages in the current environment. When no environment is activated, it prints the base environment's packages by default.

conda list

Precisely because conda ships with its own Python packages, there is no need to install Python separately from the official website.

The activate command switches to the specified environment, after which its dependencies and packages become available. For example, after switching to the base environment, typing the python command interacts directly with that environment's Python interpreter.

conda activate base

Use the deactivate command to leave the current base environment.

conda deactivate

Creating Environments and Managing Dependencies

Anaconda puts its various scientific computing packages into the base environment. Although in principle all Python projects could run in this single base environment, multiple projects may depend on different versions of the same package, which leads to dependency conflicts. Therefore, in actual development, we always create a separate conda environment for each Python project:

conda create <--name|-n> <env_name> [pkg1[=v1]] [pkg2[=v2]]

One or more required packages can be listed after the environment name, separated by spaces, and each package may carry an explicit version number. For example, here we create a new environment named py3env and install Python as well as the pandas package, while pinning the Python version explicitly to 3.8:

conda create -n py3env python=3.8 pandas

If no Python version is specified explicitly, conda defaults to the Python release bundled with Anaconda.

If an installed package itself depends on other, more basic packages, conda installs those into the environment as well. For packages that Anaconda does not ship, conda needs to connect to a mirror source to download them. Because the default mirrors are overseas, downloads may be slow or even fail; this is a problem common to any package-management tool (whether conda, yum, docker, sdkman! or maven).

For developers in mainland China, the mirror provided by Tsinghua University is a good choice:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

The current mirror configuration can be confirmed via config:

conda config --show-sources

search makes conda look up available packages against the current mirror configuration. By default conda fuzzy-matches the given name; the --full-name option performs an exact match.

conda search <pkg_name>
conda search --full-name <full_pkg_name>

A new environment can be cloned from an existing one via clone:

conda create -n <new_env_name> --clone <other_env_name>

Third-party packages are installed via install. By default they go into the current environment; the --name option can specify a different target environment.

conda install <pkg[=ver1]> [pkg[=ver2]] ...
conda install --name <target_env> <pkg[=ver1]> [pkg[=ver2]] ...

When some packages are no longer needed, simply switch the commands above to remove to uninstall them. When an entire environment is no longer needed, delete it via conda env remove. conda cannot delete the currently activated environment; exit it first with conda deactivate.
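For instance, removing the pandas package from the py3env environment created earlier might look like this (a sketch reusing the example names above):

conda remove -n py3env pandas

The environment itself is deleted with: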

conda env remove -n <env_name>

On Windows, installing Anaconda also ships a program named Anaconda Navigator. It provides a graphical interface over conda, letting users perform the operations above in a simpler way and lowering the barrier to entry.

Exporting and Importing conda Environments

Python projects are very sensitive to package versions. When we finally decide to upload an open-source Python project to Github, the project's dependent packages and version numbers should be stated clearly; otherwise it will be hard for others to run our code smoothly.

First, conda can solve this by exporting and importing the environment as a whole. Suppose we have already entered the py3env environment; the following command exports the current environment's dependency information as text. Note that the extension must be one of *.txt, *.yaml, *.yml.

conda env export > imports.yml

Suppose another machine also has conda installed. When creating a new environment on it, the -f option imports everything recorded in the file: the environment name, the dependent packages with their version numbers, and the mirror sources.

conda env create -f imports.yml

The second way is to import/export project dependencies via the pip tool (for example, when some machines do not manage environments with conda). pip is a dedicated tool for downloading and managing Python library dependencies, and conda always builds it into each environment. The pip inside an environment manages only that environment's dependencies. By convention, the dependency file exported by pip is named requirements.txt.

pip list --format=freeze > requirements.txt

Most recommendations online use the pip freeze command. For why we avoid doing that directly here, see: notes on pip freeze exports containing paths (@ file:///) - CSDN Blog

Similarly, this file can be imported into other environments with the pip tool:

pip install -r requirements.txt

Judging purely by the ability to download dependencies, conda and pip overlap a little, but dependency management with conda is more convenient than with pip. For the few Python dependencies that conda cannot install, you can then try pip.

Development Environment

PyCharm: the Python IDE for Professional Developers by JetBrains

With the environment issues resolved, the next step is to pick a good IDE for project development; here we choose PyCharm from JetBrains. When creating a new project in PyCharm, select New Conda environment, and set the machine's local conda as the interpreter.

There is no need to create an environment through conda manually beforehand; PyCharm takes care of it via the local conda.exe. By default, the newly created environment is named after the project.

As development proceeds, further dependencies may need to be introduced. Package management for the conda environment can be done directly in PyCharm's Settings:

After that, we can focus on the project itself; in most cases there is no need to interact with conda manually.

Python Basics

This chapter uses Python version 3.8.

Python lays down various conventions for code formatting, collected in the Python Enhancement Proposals (PEP). As your learning progresses you will naturally adapt to and follow them, so they are not detailed here. In PyCharm, Ctrl + Alt + L quickly reformats code to the standard style.

Basic Data Types

Numeric Types

Here we simply divide numeric values into four types: integers int, floating-point numbers float, Boolean values bool, and complex numbers complex. Floating-point numbers do not distinguish between single and double precision. Python is a dynamically typed language: all variables are dynamically typed, and the type() function reports a variable's current type. For example:

# <class 'float'>
x = 1.00
# print() is Python's built-in console output; it appends a newline by default.
print(type(x))
# <class 'int'>
x = 1
print(type(x))
# bool: True, False
# <class 'bool'>
x = True
print(type(x))
# <class 'complex'>
x = 3 + 2j
print(type(x))

This example prints the data type of variable x four times, and each time x has a different type. A variable's type can be declared actively with the : type annotation, but in fact this does not affect how the script executes.

x: int = 10
x = "hello"
print(x, end="\n")

Python automatically handles precision conversion in numeric calculations. For instance:

print(1/2)

The program's output will be 0.5, not 0. However, Python provides type-conversion functions such as int(), float(), str(), complex(), which achieve the effect of a forced cast. The output below will be 0:

print(int(1/2))

Strings

Python's string type is str. Text enclosed in either '' or "" is considered a string. For example:

h = "hello" # str
w = 'world' # str
print(h, w)

Three quotation marks delimit a text block (still of type str); another use for them is writing long comments in scripts. For example:

"""
2022/6/17
author: Me
This is the first python script written by myself.
You can use a text block as a code description.
"""
print("hello world")

Python's str has two practical operator overloads: the + operator simply concatenates two strings, while the * operator repeats a string by splicing it with itself.

x = "hello"
print(x + "world") # helloworld
print(x * 2) # hellohello

Note that in Python a string can be regarded as a list made of single characters. All list operations described later also apply to strings.

Python also has an embedded string-template syntax, for example:

age = 18
name = "me"
info = f"""
studentInfo: {age}
name: {name}
"""

The f before the string stands for format. Python embeds the values of the variables age and name inside the info string.

Composite Data Types

Lists (list) and Ranges (range)

The list is the most commonly used linear data structure, declared with []. Python does not require all elements of a list to share the same type. For example:

xs = [1, "2", 3, 4.00, 5]
# len() is a Python built-in function that returns the length of the list.
print(len(xs))

High-dimensional lists can be generated by nesting. However, we prefer the numpy library for generating high-dimensional arrays (or matrices), which performs better in numerical calculation.

xxs = [[1, 2, 3], [3, 4, 5]]
print(xxs)

In Python, a non-negative subscript n starting from 0 denotes the (n + 1)-th position from the left of the list, while a negative subscript -m starting from -1 denotes the m-th position from the right. For example:

xs = [1, "2", 3, 4.00, 5]
p1 = xs[-2] # 4.00
p2 = xs[2] # 3

In Python, subscript access such as x[0] is backed by the __getitem__() method; it is essentially a kind of operator overloading.

The element references in a list are mutable. For example:

xs = [1, 2, 3]
xs[2] = 4
print(xs) # [1, 2, 4]

Lists, like strings, can be concatenated with the + operator or repeated with the * operator.

xs = [1, 2, 3, 4] * 2
ys = [1, 2, 3, 4] + [5, 6, 7, 8]
print(xs) # [1, 2, 3, 4, 1, 2, 3, 4]
print(ys) # [1, 2, 3, 4, 5, 6, 7, 8]

This feature offers a quick way to generate a list of length n whose elements all have the initial value i. For example:

i = 0
n = 10
xs = [i] * n
print(*xs)

Traversing a list is the most common program logic. In Python it reads:

for x in xs:
    print(x)

If xs is a list of objects, then on each iteration Python extracts the list element into the temporary variable x by copying the reference. In other words, if the loop body rebinds x to a new reference, later changes to that object's state will not propagate back to the original list, because the reference sharing has been broken. For example:

# This object holds a value v
class Foo:
    def __init__(self, v_):
        self.v = v_

xs = [Foo(1)]
for x in xs:
    # Rebinding x breaks the reference sharing
    x = Foo(2)
    x.v = 3
# 1, not 2 or 3
print(xs[0].v)

Without breaking the shared reference, modifications to x's internal state do propagate to the original list. For example:

# This object holds a value v
class Foo:
    def __init__(self, v_):
        self.v = v_

xs = [Foo(1)]
for x in xs:
    x.v = 2
# 2.
print(xs[0].v)

A similar phenomenon appears in slices, introduced later. By contrast, value types (including str) are immutable, so modifying x never propagates back to the original list.

xs = [1, 2, 3, 4, 5]
# Attempt to map every value x in xs to 2x
for x in xs:
    x = x * 2
# Still prints [1, 2, 3, 4, 5]
print(*xs)

If you want a concise list → list mapping, refer to the comprehensions described later, instead of racking your brains over how to reproduce a for(i=0;i<n;i++) style loop.

To generate an arithmetic sequence like [0, 1, 2,..., n], use the range() function to build an interval directly; a custom step is supported. For example:

# The generated interval is half-open: [0, 1, ..., 9]
xs = range(0, 10)
# If the start is 0, it can be abbreviated.
xs = range(10)
# [10, 7, 4, 1]
xs = range(10, 0, -3)

Putting this together, reverse traversal of a list can also be written as:

# start: len(xs)-1 -> subscripts start at 0, so the last index is length-1.
# stop: -1 -> iterate until just before index -1, i.e. down to index 0.
# step: -1 -> the index decreases by 1 on each iteration.
for x in range(len(xs) - 1, -1, -1):
    print(xs[x])

Python also has a built-in function that returns a reverse iterator: reversed().

sx = reversed("hello") # a string is a kind of list too
s = "".join([x for x in sx]) # see comprehensions later
# The most compact version, via slicing:
sx = "hello"[::-1]

An interval range and a list are two different types, as the type() function will confirm. A range can be regarded as an abstract immutable list, so it can also be iterated, and it even supports indexed reads, but its elements cannot be assigned. For example:

xs = range(10)
xs[1] = -1 # don't do this

To build a list from an interval, convert it with the list() function.

Slices

A slice is a subsequence cut from a list (or a subinterval cut from a range); it is not an independent data type. For example, the following code takes the slice of xs over the index range [2, 4):

xs = [1, 2, 3, 4, 5]
ss = xs[2:4] # [3, 4]

A slice can also specify a step, in the order [start:stop:step], where start <= stop for a forward slice.

rs = range(1, 101)
# [51, 53, ..., 99]
ss = rs[50:100:2]
# *ss passes each element of the sub-range ss as a separate argument;
# otherwise print would only show: range(51, 101, 2)
# See the variadic-parameters section later.
print(*ss)

start, stop, and step may be omitted; their defaults are 0, len(rs), 1. Slices also come in two directions:

  1. If step > 0, the slice runs left to right, with defaults start = 0, stop = len(rs).
  2. If step < 0, the slice runs right to left, with defaults start = -1, stop = -len(rs)-1.

Slicing therefore has very flexible notation; all of the following hold:

rs = range(1, 10) # [1, 2,..., 9]
print(*rs[:2]) # [1, 2]
print(*rs[4:]) # [5, 6,..., 9] == rs[4::]
print(*rs[::]) # [1, 2,..., 9] == rs
print(*rs[::2]) # [1, 3, 5, 7, 9] != rs[:2]
print(*rs[4::2]) # [5, 7, 9]
print(*rs[4::]) # [5, 6,..., 9] == rs[4:]

Among these, the slice rs[::-1] is worth memorizing: it is equivalent to reversing rs, and it applies to strings as well.

Python fills a slice by copying references to the object elements. In other words, changes to the state of elements inside a slice are propagated.

class VV:
    def __init__(self, v_):
        self.v = v_

x = [VV(1)]
y = x[:]
y[0].v = 2
# 2 2
print(x[0].v, y[0].v)

To avoid this coupling, assign a fresh instance reference, which breaks the reference sharing.

class VV:
    def __init__(self, v_):
        self.v = v_

x = [VV(1)]
y = x[:]
y[0] = VV(2)
# 1 2
print(x[0].v, y[0].v)

Numeric lists have no such problem, because no reference copying is involved here.

a = [1]
b = a[:]
b[0] = 2
# [1] [2]
print(a, b)

Tuples (tuple)

A tuple can be seen as a lightweight, reference-immutable data container, declared with () in the standard notation. For example:

t = (1, 2, 3)
# Elements can be read by subscript index, but not modified.
e = t[1]
print(e)

A considerable number of features extend from tuples. For example, multiple assignment can be done with tuples, which may also be understood as tuple destructuring. A discarded element can simply be ignored with the _ symbol.

(x, y, _) = (1, 2, 3)
print(x, y) # x = 1, y = 2 

A Python function can also return a tuple, which can be understood as returning multiple values the way Go functions do. For example:

def swap(x, y): return (y, x)
(x,y) = swap(1,2)
print(x, y) # x = 2, y = 1

Python tuples may omit the (), separating multiple elements with only commas. The code above can thus be abbreviated as:

def swap(x, y): return y, x
a, b = swap(1, 2)
print(a, b)

Specifically, to treat a single element as a tuple, append a comma, as in a,.
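A minimal illustration of the trailing-comma rule:

t1 = (1,) # a one-element tuple
t2 = 1, # parentheses may be omitted
n = (1) # just the integer 1, not a tuple
print(type(t1), type(t2), type(n)) # tuple, tuple, int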

*Sets (set)

With sets and dictionaries we are really touching a deeper topic: the equality of Python objects.

The important difference between the set type and a list is that elements of a set do not repeat. Sets are declared with {}. To start, the things that can go directly into a set are values, strings, and tuples. For example:

sets = {1, 1, 2}
# len(sets) == 2: the duplicate 1 was filtered out.
print(len(sets))

Now let's see what a set of objects looks like, starting with a code example:

class Foo:
    def __init__(self, v_):
        self.v = v_

ref = Foo(1)
sets = {ref, ref}
# len(sets) == 1
print(len(sets))

Internally, Python determines whether elements repeat by computing hash values. By default, Python computes the hash from the object's reference, and identical references obviously must collide. With that understood, the following result is easy to explain: the two Foo(1) are different references, so they coexist in the same set.

sets = {Foo(1), Foo(1)}
# len(sets) == 2
print(len(sets))

Usually, however, we would rather build a set of objects with no duplicate values. An effective approach is to compute the hash from all of an instance's state (i.e., its attributes). The intuition is obvious: if two objects' states are equal, their hash values must also be equal, from which we further conclude that the two are duplicates.

To do this, override the two methods __eq__() and __hash__() in the class definition.

class Foo:
    def __init__(self, v_):
        self.v = v_
    # The officially recommended practice is to pack the instance's attributes
    # into a tuple and hash the tuple.
    def __hash__(self): return hash((self.v,))
    def __eq__(self, other): return self.v == other.v

st = {Foo(1), Foo(2), Foo(1)}
print(len(st)) # The set actually holds only 2 elements.

A class like this, whose hash is computable, is abbreviated as a hashable type. Python stipulates that a class which overrides __eq__() without also overriding __hash__() is an unhashable type: its instances can neither be put into a set as elements nor serve as dictionary keys. It is worth mentioning that __eq__() is itself the overload of the == operator.

Python has many built-in methods and functions named __XX__, also called "magic" functions. Python relies on them to implement idioms and internal mechanisms.

Python provides operators for the basic set algebra of intersection, union, and difference, such as:

A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
print(A - B) # {1, 2}
print(B - A) # {6, 7}
print(A ^ B) # {1, 2, 6, 7}, symmetric difference
print(A | B) # {1, 2, 3, 4, 5, 6, 7}, union
print(A & B) # {3, 4, 5}, intersection

*Dictionaries (dict)

A dictionary is a special kind of set. Internally it stores key : value pairs, is also declared with {}, and is written as the dict type. Within one dictionary, keys do not repeat. What can act as a key: numbers, strings, tuples, and hashable types in general, for the same reason as with sets.

dictionary = {"a": "abandon", "b": "banana", "c": "clap"}

Python offers two ways to get a value out of a dict:

  1. Index access. However, the dictionary throws an exception when the specified key is not found.
  2. Calling dict's get method, which substitutes the default supplied as the second argument when the key lookup fails; this is the safer way to access.

dictionary = {"a": "abandon", "b": "banana", "c": "clap"}
maybeKey = dictionary["d"] # error
"""
A special detail here. Don't write:
    dictionary.get("d", default="Nil")
get() is implemented at the C level (for better performance) and is
incompatible with passing the argument as default=xxx.
"""
getNilIfNull = dictionary.get("d", "Nil")

The in keyword can quickly query whether a key exists; this keyword is explained further later.

dictionary = {"a": "abandon", "b": "banana", "c": "clap"}
boolean = "d" in dictionary
print(boolean) # False

The del keyword deletes the specified key-value pair from a dictionary. If the key does not exist, an exception is thrown.

dictionary = {"a": "abandon", "b": "banana", "c": "clap"}
del_key = "c"
if del_key in dictionary:
    del dictionary["c"]
print(dictionary)

A dictionary can be updated with another dict: key-value pairs already present in the original dictionary are overwritten, and absent ones are added. For example:

dictionary = {"a": "abandon", "b": "banana", "c": "clap"}
dictionary.update({"a": "abuse", "d": "desk"})
# {'a': 'abuse', 'b': 'banana', 'c': 'clap', 'd': 'desk'}
print(dictionary)

A dictionary can be traversed with a for loop. In the example below, each key of the dictionary is extracted into the temporary variable k:

dictionary = {"a": "abandon", "b": "banana", "c": "clap"}
for k in dictionary:
    print(dictionary[k], end=",")

*Comprehensions

Comprehensions are a distinctive Python feature and one of the effective ways to map over composite data types. For example:

xs = [1, 2, 3, 4, 5]
""" xs.map{x => 2*x} """
x2s = [2*x for x in xs]
print(x2s)

The code above performs a one-step mapping (map) over xs: the for expression extracts each value of xs, transforms it, and collects the results into a new list.

Furthermore, inserting an if guard lets the comprehension also filter xs. For example:

xs = [1, 2, 3, 4, 5]
""" xs.filter{_ % 2 == 0}.map{_ * 2} """
x2s = [2*i for i in xs if i % 2 == 0]
print(x2s)

Comprehensions apply to lists, slices, tuples, sets, and dictionaries. For example, generating a series of key-value pairs:

words = ["hello", "world", "python"]
# {'h': 'hello', 'w': 'world', 'p': 'python'}
k = {w[0]: w for w in words}
print(k)

List comprehensions can be nested. For example:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# seq2d -> row -> num
def flatten(seq2d): return [num for row in seq2d for num in row]
# 1 2 3 ... 8 9
xs = flatten(matrix)
print(*xs)

Python Keywords

or & and & not

To improve code readability, Python uses the word or for disjunction, and for conjunction, and not for negation; these operators appear frequently in conditional expressions. For example:

print(not False) # True
print(False or False) # False
print(False or True) # True
print(True and False) # False
print(True and True) # True
print(1 not in [1, 2, 3]) # False

Beyond this, or and and have an extended usage based on short-circuit evaluation: x or y evaluates to x if x is truthy and to y otherwise, while x and y evaluates to x if x is falsy and to y otherwise. For example:

print(2 or 3) # 2: the first operand is truthy, so 3 is never evaluated
print(3 and 5) # 5: both operands are truthy, so the last one is returned

In other words, it is the operands' truthiness, not their magnitude, that decides the result; this short-circuit mechanism is also handy for supplying defaults.
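A few more cases make the rule concrete; note again that truthiness, not magnitude, decides the outcome:

print(3 or 2) # 3: the first truthy operand wins, even though it is larger
print(0 or "fallback") # fallback: 0 is falsy, so the right operand is returned
print([] and [1]) # []: a falsy left operand short-circuits and
print("a" and "b") # b: both truthy, so the last operand is returned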

pass

The pass keyword acts as a syntactic placeholder in Python. For example:

x = 5
if x >= 10:
    pass
else:
    print("x < 10")

Some functions foo that are declared but not yet implemented also get a pass to keep the syntax complete. For example:

def foo(): # TODO waiting for implementation
    pass

None

None is a special constant in Python whose type is NoneType. It represents a semantically null value, but is itself neither 0 nor False.

None can be used to design partial functions. For instance, when a function f chooses not to handle certain inputs, it can return a plain None instead of throwing an exception or picking some preset default.

def div(x, y):
    if y == 0:
        return None
    else:
        return x / y

print(div(3, 0))

This design idea is widely used in functional programming. See, for example: Scala: exception handling under functional programming - Juejin (juejin.cn)

*is & ==

The is keyword is often discussed together with ==. The main differences are:

  1. == compares values; it emphasizes equality.
  2. is compares references; it emphasizes identity.

In Python, the built-in id() function returns an object's global identity, which is comparable to an address in the C language. If two objects have the same identity, their references are considered identical; comparing them with is then yields True, otherwise False.

An object's == operator is backed by the __eq__() method, which appears in a pair with __hash__(). Likewise, overloads for operators such as >= and <= can be defined.

class Obj:
    # __init__ corresponds to object constructors in other languages.
    # self.v declares an internal attribute of the object.
    def __init__(self, v_):
        self.v = v_
    def __eq__(self, other): return self.v == other.v
    def __hash__(self): return hash((self.v,))

o1 = Obj(1)
o2 = Obj(1)
print(o1 == o2) # True
print(o1 is o2) # False

Comparisons between values should use ==. On the other hand, use is when comparing a value against None: since None is effectively a global singleton, every variable assigned None points to the same reference.

*in

in is a practical keyword for quickly checking whether an element is inside an iterable data structure such as a list, slice, tuple, set, or dictionary. The underlying search mechanism still rests on comparison, that is, on equality. If the element sought is an object, Python gives priority to a user-overridden __eq__() method; otherwise it falls back to comparing references, as the following example shows.

class Foo:
    def __init__(self, v_):
        self.v = v_
    def __eq__(self, other): return self.v == other.v
    def __hash__(self): return hash((self.v,))

class Goo:
    def __init__(self, v_):
        self.v = v_

cond1 = Foo(1) in [Foo(1)]
print(cond1) # True
cond2 = Goo(1) in [Goo(1)]
print(cond2) # False

Evidently, overriding __eq__() matters for clarifying a class's semantics. Otherwise, seemingly readable code will return results that flatly contradict it.

yield from*

Lazy loading is a topic that leans functional.

First, the yield keyword can produce a lazily loaded data stream, which avoids reading all pending data into memory at once and thus wasting resources. For example, the seq() function below produces an infinite stream:

def seq(start: int = 0):
    while True:
        yield start
        start += 1

gen = seq(0)

Once a function uses yield to produce its values, Python translates it into a generator.

The call seq(0) above creates a generator instance and assigns it to gen. Each next() call resumes the generator and obtains one value: the function body runs up to the next yield statement, produces a value for the caller, then pauses awaiting the next call, until the final yield, after which it exits.

As you can see, yield lets a function be "paused". This property can be used to design coroutines; interested readers can refer to: What are the usages and uses of Python's yield keyword? - Zhihu (zhihu.com)

Using the generator gen above, for instance, we can keep producing increasing consecutive sequences:

"""
The for loops here call next(gen) repeatedly, producing 10 consecutive
natural numbers at a time:
xs = [0, 1, 2, ..., 9]
ys = [10, 11, 12, ..., 19]
"""
n = 10
xs = [next(gen) for _ in range(n)]
ys = [next(gen) for _ in range(n)]
print(*xs)
print(*ys)

Because the seq() function itself loops forever, gen can always return a steady stream of values. Here is a simpler, easier-to-understand generator containing no loop statements at all:

def finite_seq():
    yield 1
    yield 3
    yield 5

finite_gen = finite_seq()
print(next(finite_gen)) # Returns the first yielded value, 1
print(next(finite_gen)) # Returns the second yielded value, 3
print(next(finite_gen)) # Returns the third yielded value, 5
print(next(finite_gen)) # StopIteration

The generator finite_gen produces the data 1 3 5 in turn and then closes. Attempting to produce more data after that makes the program throw a StopIteration exception.

Generators are also traversable objects. In this case a for loop can extract all elements of the stream directly, because finite_gen does not produce elements endlessly.

def finite_seq():
    yield 1
    yield 3
    yield 5

finite_gen = finite_seq()
for x in finite_gen:
    print(x, end=", ")

Do not do this with an infinite stream, or the program will fall into an endless loop.
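If only a finite prefix of an infinite stream is needed, the standard library's itertools.islice can bound the iteration safely; a minimal sketch reusing the seq generator above:

from itertools import islice

def seq(start: int = 0):
    while True:
        yield start
        start += 1

# islice stops after 5 elements, so the infinite stream is never exhausted.
first_five = list(islice(seq(0), 5))
print(first_five) # [0, 1, 2, 3, 4]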

Some higher-order generators rely on other generators (or on recursive calls to themselves) to produce elements; this requires the yield from keyword. Returning to the first example, we can now define an increasing infinite stream recursively:

# An infinite stream like this is also called corecursive.
def seq(start):
    yield start
    yield from seq(start + 1)

gen = seq(0)
xs = [next(gen) for _ in range(10)]
print(*xs)

Here is a slightly complicated case :

def flatten(xs: list):
    for i in range(len(xs)):
        if isinstance(xs[i], list):
            yield from flatten(xs[i])
        else:
            yield xs[i]

xxs = [1, [2, 3, [4, 5]], 6, [7, 8]]
xs = [x for x in flatten(xxs)]
print(*xs) # 1 2 3 4 5 6 7 8

The flatten generator checks whether an element of xs is itself a list; if so, it recursively creates a sub-generator to extract the sublist's elements. flatten can therefore flatten an arbitrarily nested list into a one-dimensional one.

* Summary

In Python's design philosophy, the equality of objects is a subproblem of duplicability:

  1. __eq__() defines equality, which determines the results of the == and in operators.
  2. __eq__() together with __hash__() defines duplicability under hashing, which further determines whether an instance can serve as an element of a set or as a key of a dict.

Second, when traversing a list or a slice, avoid accidental reference sharing, and avoid breaking it inadvertently, lest the resulting program behave contrary to expectations.

Finally, on immutability: the immutability, equality, and duplicability of values and strings are intuitive. A tuple's immutability refers to the immutability of its references, while the state inside its elements remains mutable. To avoid unexpected trouble, a tuple used as a dictionary key should contain only hashable object elements.

Additionally, unlike Java, Python's referential identity relies on the global identity determined by id(), which decides the result of the is operator. It is a different thing from the __hash__() hash function.

Important Syntax

Only the important parts are covered here; for the rest, search the Python3 tutorial at runoob.com.

Conditional Branches

Python has no switch branch; all multi-way branches use if instead, with else if shortened to elif. For example (note that string values should be compared with ==, not is):

identity = "Student"
if identity == "Student":
    print("he is a student.")
elif identity == "Tutor":
    print("he is a tutor")
elif identity == "Professor":
    print("he is a Professor")
else:
    print("unknown.")

Python's if statement has another use: acting as the ternary operator of other programming languages. The logic: if the if expression holds, the value before it is assigned; otherwise the value after else. For example:

# a = (10 > 1) ? true : false
# If 10 > 1 holds, then a = True; otherwise a = False.
a = True if 10 > 1 else False
print(a)
print(a)

Finding Extreme Values

Python's built-in max() and min() functions simplify such lookups. For example:

seq = [3,6,7,8,1,4,2]
max(seq)
min(seq)

If the inner elements are objects, pass in an expression specifying the field to compare by, for example:

class Foo:
    def __init__(self, v_):
        self.v = v_

foos = [Foo(1), Foo(2), Foo(3)]
# Compare by Foo's v value.
mx = max(foos, key=lambda foo: foo.v)
print(mx.v)

Range Checks

When judging whether a numeric variable falls within a range, Python offers a more readable notation. For example:

x = 100
# other lang: if (x >= 0 && x <= 100) {...}
if 0 <= x <= 100:
    print("x in [0,100]")
else:
    print("x not in [0,100]")

Assertion

An assertion is a strict conditional check. The assert keyword creates one from a condition cond and a message msg. When the condition evaluates to False, the program throws an AssertionError and outputs the msg message to the console. For example:

x = 100
y = 0
assert y != 0, "y should not be 0."
# The code below is unreachable .
z = x / y

Exception Handling

Python uses try, except, and finally (which may be omitted) to guard a piece of code, catching any exception the block throws so the program does not abort.

try:
    100 / 0
except ZeroDivisionError as e: # Bind the caught exception to e
    print(f"error! => {e}")
finally: # finally is optional
    print("done.")

To catch several exception types at once, write the except clause as: except (ErrorType1, ErrorType2, ...) as e:.
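For example (int("abc") raises a ValueError, which the tuple form catches):

try:
    int("abc")
except (ValueError, TypeError) as e:
    print(f"error! => {e}")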

The raise keyword actively throws an exception. For example:

raise Exception("throw a new Exception by user code")

In functional-style data stream processing, a function generally collects abnormal values and throws them to the upper-level code in one batch. For example:

xs = [1, -2, 3, -4, 5] # sample input
def f(x: int): return (x, None) if x > 0 else (None, ArithmeticError(f"{x} is an invalid value"))
nonNegative = [f(x) for x in xs]
right = map(lambda x: x[0], filter(lambda x: x[0] is not None, nonNegative))
left = map(lambda x: x[1], filter(lambda x: x[0] is None, nonNegative))
print(*right) # The normally processed data
print(*left, sep="\n") # The exceptions encountered while processing

In this way, the data-processing logic is kept separate from the exception-handling logic.

Opening and Closing Resources with the with Keyword

Think of the with syntax as a more abstract try - catch model.

In scenarios like opening IO streams or acquiring locks, automating the closing of resources saves a lot of effort, much like Go's defer mechanism.

f = open(filePath, mode="r", encoding="UTF-8") # filePath is a placeholder path
f.readline()
f.close()

Through the with .. as keywords, Python provides a general post-action hook. File closing can now be rewritten as the following logic:

with open(filePath, mode="r", encoding="UTF-8") as f:
    f.readline()
    pass

Under the hood, a with block works through __enter__() and __exit__(). In other words, an instance of any class that implements these two magic functions can be used with a with block. Here is a simple example:

class Daemon:
    def __enter__(self):
        # TODO
        pass
    def __exit__(self, exc_type, exc_val, exc_tb):
        r = "success" if exc_type is None else "failure"
        print(f"end.{r}")

d = Daemon()
with d:
    print("do something")
    pass

In this code, the simple expression d points to a Daemon instance. The inner code block is guarded by the __exit__() function: whether or not the block executes successfully, this function is always called. When an exception is thrown inside the block, the three parameters exc_type, exc_val, exc_tb take non-None values.

d = Daemon()
with d:
    # end.failure
    print(1 / 0)
    pass

Besides, if __enter__() returns a meaningful non-None value, it can be received via the as keyword. For example:

class Daemon:
    def __enter__(self):
        return 10, 5
    def __exit__(self, exc_type, exc_val, exc_tb):
        r = "success" if exc_type is None else "failure"
        print(f"end.{r}")

d = Daemon()
# Daemon's __enter__() returns the two values 10, 5,
# so a tuple is used here to extract the parameters x1, x2.
with d as (x1, x2):
    # 2.0
    # end.success
    print(x1 / x2)
    pass

It is not hard to imagine that with blocks can also be used to hide try ... catch ... finally logic, shielding the various details of resource cleanup and improving the readability of user code.
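As an aside, the standard library's contextlib.contextmanager decorator can build such a guard from a generator, without hand-writing __enter__()/__exit__(); a minimal sketch (the resource() name is made up for illustration):

from contextlib import contextmanager

@contextmanager
def resource():
    print("open") # runs like __enter__()
    try:
        yield 10, 5 # the yielded value is what 'as' binds
    finally:
        print("close") # runs like __exit__(), even on exceptions

with resource() as (x1, x2):
    print(x1 / x2) # 2.0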

Function Declaration Details

Function definitions that live directly in a module are generally called functions; those defined inside classes are generally called methods.

Python functions do not strictly require a declared return type, but they do strictly require return values to be produced with an explicit return keyword.

def add(a, b): return a + b

A function that specifies parameters and return value types can be declared as :

def add(a: int, b: int) -> int: return a + b

This type specification is declarative only, because Python is not a compiled language. Even if arguments of mismatched types are passed, the interpreter will not refuse to execute.
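A quick demonstration of this point: the annotated add() above happily concatenates strings, because annotations are ignored at runtime (a static checker such as mypy would flag this):

def add(a: int, b: int) -> int: return a + b
print(add("a", "b")) # 'ab': the int annotations are not enforced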

If a function does not return any meaningful value, its return type is equivalent to None. For example:

def println(anything) -> None: print(anything)

Default values for parameters can be set when defining a function. For example:

def f(v1=0, v2=0): return v1 + v2
print(f()) # 0

A function can declare variadic parameters, meaning that the parameter position receives any number of values; the parameter name is decorated with a leading *. For example:

# non-keyword arguments
def receive(*args):
    print(type(args))
    for i in args:
        print(i, end=", ")

receive(1, 2, 3, 4, 5)
# A list can be unpacked into variadic arguments
xs = [1, 2, 3, 4, 5]
receive(*xs)

Here, the incoming arguments 1, 2, ... 5 are wrapped into a tuple. If the parameter name is decorated with two asterisks **, it receives any number of key-value pairs. For example:

# keyword arguments
def config(**kwargs):
    print(type(kwargs))
    for k in kwargs:
        print(kwargs[k], end=", ")

config(port=8080, server="tomcat", max_connection=500)

The whole argument list is wrapped into a dict, whose keys must be of type str. An external dictionary can likewise be passed into the function as variadic key-value arguments. For example:

dictionary = {"a": "abandon", "b": "banana", "c": "clap", "d": "12"}
config(**dictionary)

To avoid ambiguity, Python fixes the order of ordinary parameters, variadic parameters, and variadic key-value parameters as:

def foo(parm, *args, **kwargs): pass
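A small demonstration of how arguments are distributed across the three kinds of parameters:

def foo(parm, *args, **kwargs):
    print(parm, args, kwargs)

foo(1, 2, 3, port=8080) # 1 (2, 3) {'port': 8080}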

Sometimes, to improve code readability, we also choose to pass arguments in **kwargs form, which is fine in most cases. For example:

def f(v1=0, v2=0): return v1 + v2
print(f(v2=3)) # 3

The dictionary get() method is one exception; see: python - TypeError: get() takes no keyword arguments - Stack Overflow.

C-level Python APIs generally do not support being called with **kwargs.

Functions can be defined inside functions, and a function can itself return another function. For example:

def hof(p1):
    # The function f is available only within hof's scope.
    def f(p2): return 2 * p2 + p1
    # Returning the identifier f returns f itself.
    return f

ff = hof(5)
y = ff(1)
print(y)

This feature opens up many topics; see function-oriented programming in the design patterns section later.

Engineering

In Python, a single *.py file is called a module; multiple modules are organized into a package; a huge project consists of packages at various levels. It is worth noting that a parent package does not automatically import its child packages' modules.

Importing Dependencies

import can bring in a package, or a specific module. For example, first install numpy into the project environment via conda, then import it into the current script:

# Multiple packages can be imported at once, separated by commas; likewise below.
import numpy
# The . operator can also import a specific module under the package.
# This is just a demonstration; importing numpy alone is enough.
import numpy.core.multiarray

# By fixing the element data type, numpy can allocate a neat, compact,
# contiguous block of memory to hold the data.
arr = numpy.array([1, 2, 3], dtype=float)
print(*arr)
print(*arr)

An imported package or module can be given an alias via the as keyword. For example:

import numpy as np
arr = np.array([1, 2, 3], dtype=float)
print(*arr)

Python also has a from .. import statement supporting finer-grained imports, with two uses:

One: if from is followed by a package, you can import modules under that package.

from pkg import module1 as m1
m1.var1 # ok
var1 # error

Two: if from is followed by a module, you can import the variables, functions, and class definitions inside that module. These imports can be used directly, with no need to qualify them by module name.

from pkg.module1 import var1 as v1, var2 as v2, func1 as f1
v1 # ok

Script or Module

Any Python module has two uses:

  1. As a component providing functionality for other modules, including variables, functions, and class definitions.
  2. As a script handed to the Python interpreter for execution.

When a module is imported, the interpreter executes all of its lines from top to bottom to obtain the variable, function, and class definitions. However, when the module serves only as a component, the script part of its logic generally should not run. To separate the two responsibilities, write a special conditional branch at the top level of the module file:

# some definitions
def func(): pass
var = 10
print("as module")

if __name__ == '__main__':
    print("as script")
    # ...

The code in this branch executes only when the module is used as the program entry point, in which case both as module and as script are printed. When the module is merely imported as a component by other modules, the interpreter prints only as module.

Directory structure

A large Python project usually consists of multiple packages. For example, a simple Python project may include the following hierarchy:

Project/
|
|-- project/
| |-- test/
| | |-- __init__.py
| | |-- test_main.py
| |
| |-- __init__.py
| |-- main.py
|
|-- setup.py
|-- requirements.txt
|-- README.md

The module serving as the project's program entry is generally named main.py, though this is not mandatory; some data-processing projects, for example, use several executable modules to provide different functionality. Below is a brief introduction to the two modules __init__.py and setup.py.

__init__.py

A special module can be created in each project subdirectory: __init__.py. Python recognizes any directory containing this module as a package. Moreover, when other modules import the package, Python first loads the logic declared in __init__.py.

Everything declared in __init__.py, or brought in there via from .. import statements (variables, functions, class definitions), is included in the package's namespace, and any module importing the package obtains them automatically. In other words, if the project developer declares imports of subpackage modules in a package's __init__.py, project users only need to import the package itself, as sketched below.
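A minimal sketch (the package layout and names here are hypothetical):

# project/__init__.py -- 'core' and 'main_func' are made-up names
from .core import main_func
# User code then only needs:
#   from project import main_func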

Besides providing data definitions, project developers usually also write some initialization, or pre-flight work such as feature verification, here. For example:

# Let users know if they're missing any of our hard dependencies
hard_dependencies = ("numpy", "pytz", "dateutil")
missing_dependencies = []

for dependency in hard_dependencies:
    try:
        __import__(dependency)
    except ImportError as e:
        missing_dependencies.append(f"{dependency}: {e}")

if missing_dependencies:
    raise ImportError(
        "Unable to import required dependencies:\n" + "\n".join(missing_dependencies)
    )
del hard_dependencies, dependency, missing_dependencies

This is part of the code of the pandas/__init__.py module. During initialization, pandas first tries to import the numpy, pytz, and dateutil packages from the environment, and throws an ImportError when an import fails.

setup.py*

If you are not specializing in Python tool-library development, but instead focus on building runnable applications or data-processing tasks, then this module is unnecessary.

Earlier sections introduced exporting project dependencies to a requirements.txt file via pip; that file effectively declares an environment that guarantees the project runs normally. Suppose we write a directly runnable Python project and open-source it to a remote Git repository: this file helps users who clone the project replicate its startup environment quickly.

Of course, we may also develop a more foundational Python library, which other developers bring in as a dependency and install via the pip install pkg command.

This is where a tool named setuptools comes in to package the project; it, too, is preinstalled by conda. Packaging requires metadata such as the version number, the open-source license, the project homepage, the developer's email, and third-party dependencies (important), all of which can be set in the prescribed format in setup.py (an alternative is setup.cfg).

The setup.py below is a simple demonstration. Other libraries the toolkit depends on are configured through the install_requires parameter; pip reads this configuration to resolve dependencies when installing the toolkit.

from setuptools import setup

setup(
    name='pythonProject3',
    version='1.0.0',
    packages=['project', 'project.core'],
    author='lijunhu',
    author_email='[email protected]',
    description='a sample',
    # If you are sure the dependency is backward compatible,
    # this can also be set to 'numpy>=1.22.0'
    install_requires=[
        'numpy==1.22.0'
    ]
)

In PyCharm's top menu, Tools > Create setup.py / Run setup task quickly generates a setup template and lets you choose the packaging format, such as .egg, .whl (installable directly by pip), a Windows installer, or even a Linux *.rpm.

The packaged files are generated in the /dist directory under the project root. Users can install the toolkit into their environment via pip install your_project.whl, and pip will by itself download version 1.22.0 of numpy and install it into the environment as well.

You can refer to the following links :

setup.py And requirements.txt difference - SegmentFault Think no

Python Medium requirements.txt And setup.py_deephub The blog of -CSDN Blog _requirements.txt

It took two days , Finally put Python Of setup.py I get it - You know (zhihu.com)

Python Packaging tools for (setup.py) Actual combat - Yin Zhengjie - Blog Garden (cnblogs.com)

Command-Line Argument Parsing

A project's entry program usually comes with configuration options, letting the user pass in necessary parameters or extra configuration before the program starts. For example:

python main.py --address "hadoop1" "hadoop2" "hadoop3" --num 3

Python ships the argparse library in its standard library to help developers parse command-line arguments. Inside the script, we simply instantiate an ArgumentParser parser object and add the desired command-line parameters to it. By convention, an identifier prefixed with - is recognized as a configuration option, and the arguments that follow are the option's value. See the implementation below:

import argparse
parser = argparse.ArgumentParser(description="config your cluster")
parser.add_argument("--address", "-a", help="your server ip", nargs="+", type=str, required=True)
parser.add_argument("--num", "-n", help="number of the slaves", type=int, default="3")

Brief notes:

  1. The help parameter gives the usage hint for the option.
  2. type specifies the type of the option's value.
  3. default specifies the value used when the user does not supply the option.
  4. nargs="+" means the option takes at least one value; the values are collected into a list.
  5. required=True marks the option as mandatory.

Only common command-line parameter configurations are listed here. For the complete reference, see the official docs: argparse — Parser for command-line options, arguments and sub-commands — Python 3.10.5 documentation, as well as: detailed argparse usage examples - Zhihu (zhihu.com)

Calling the parser's parse_args() method extracts the parameters the user passed in from outside. For testing purposes, the command-line arguments are passed in manually here as an array:

# conf = parser.parse_args()
conf = parser.parse_args(["-a", "hadoop1", "hadoop2", "hadoop3", "--num", "3"])

A successful parse yields conf, of the Namespace type. The parsed option values can be read directly as conf.{arg}. For example:

print(conf.num) # 3
print(conf.address) # ['hadoop1', 'hadoop2', 'hadoop3']

This approach rests on attributes injected dynamically via the __getattr__() magic-function machinery; see the metaprogramming techniques later.

argparse reserves the -h and --help options, which print help information assembled from the developer's settings. For example:

"""
usage: main.py [-h] --address ADDRESS [ADDRESS ...] [--num NUM]

config your cluster

optional arguments:
  -h, --help            show this help message and exit
  --address ADDRESS [ADDRESS ...], -a ADDRESS [ADDRESS ...]
                        your server ip
  --num NUM, -n NUM     number of the slaves
"""
parser.parse_args(["-h"])

Design Patterns

Capability-Based Design

Capability-based design is characteristic of all dynamic languages, for example the scripting language Groovy; see: understanding dynamic languages through Groovy - Juejin (juejin.cn). Consider the following function declaration f:

def f(anyone) -> None:
    anyone.g()

In the absence of Context ( Context ) Under the environment of , We don't know anyone What kind of type is it , Nor is it guaranteed that it has g() Method ; This is for Python The same goes for the interpreter .anyone The type of parameter is determined dynamically , Scripting language sacrifices some performance in exchange for the ability of dynamic distribution .

But think positively , We could also say :f() The function is wrong anyone Make any constraints . Because of this , We don't need to define any interface definition specification from the top level in advance , just " Think " anyone Should be able to provide g() Method . such " Contractual development " The idea of is very suitable for agile development of lightweight projects , At the same time, expandability is preserved .

Of course, nothing forbids strongly constraining types with interface-style programming in Python. The code might just end up looking like this:

class Foo:
    def g(self): raise NotImplementedError("declare a sub type and implement it.")

class Foo1(Foo):
    def g(self): print("method from Foo1")

class Foo2(Foo):
    def g(self): print("method from Foo2")

def f(anyone: Foo) -> None:
    assert isinstance(anyone, Foo), "anyone should implement class: Foo"
    anyone.g()

f(Foo1())
f(Foo2())

Variables without any type annotation are also jokingly said to be "duck typed". The allusion comes from the design philosophy: "If it walks like a duck and it quacks like a duck, then it is a duck."

Here, isinstance determines whether the object satisfies the type. We have other means of dynamically inspecting (and even modifying) an object's attributes and methods; see metaprogramming.

Object-oriented programming

Strictly speaking, Python classes are closer to what other languages call traits, or the concept of rich interfaces, because Python classes support multiple inheritance: through composition we can grant a class powerful functionality. For example:

class Swim:  # trait
    def swim(self): print(f"{self.__class__.__name__} swim")

class Quack:  # trait
    def quack(self): print(f"{self.__class__.__name__} quack")

# Parentheses indicate inheritance; multiple inheritance is allowed.
class Duck(Swim, Quack): pass

duck = Duck()
duck.swim()   # Duck swim
duck.quack()  # Duck quack

Earlier sections already covered Python class definitions and instance creation; here are some further details.

Method receiver self

The first parameter of a Python instance method must be self, which refers to the object being called; it can be understood as something like Go's "method receiver". When calling the method, however, the self parameter is omitted. If a method's parameter list does not contain self, it must be labeled as a static method with the @staticmethod decorator (this symbol denotes annotations in Java; see below).

class Foo:
    def methood(self): print("a method")

    @staticmethod
    def functioon(): print("a function")

foo = Foo()
foo.methood()    # invoke the instance method 'methood'
# No need to create a 'Foo' instance:
Foo.functioon()  # call the static method 'functioon'

Static methods mainly exist for organized namespace management within modules. In addition, calling a static method through an instance of the class does not raise an error. For example:

foo.functioon()

Another similar decorator is @classmethod, used to label class methods. Its difference from @staticmethod is that the method carries a cls parameter representing the class itself. It can be used for many things, such as metaprogramming.
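
As a small illustration (an addition, not from the original text), @classmethod is often used as an alternative constructor; the Point class and the from_pair name below are made up for the sketch:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_pair(cls, pair):
        # 'cls' refers to the class itself, so a subclass would construct itself here.
        return cls(pair[0], pair[1])

p = Point.from_pair((3, 4))
print(p.x, p.y)  # 3 4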

* Underscore prefix identifier

First, identifiers named with a single underscore _xx represent private declarations in a module's namespace or a class's static scope; by convention they are not public. For example:

def _private_func(): print("only accessible in this module")
_private_var = 100

Identifiers named with a double underscore __xx represent private attributes of an instance; these attributes are not public. For example:

class Foo:
    def __init__(self, v_):
        self.__private_v = v_

    def __private_method(self): print("private method.")

Methods named in the __xx__ style are called Python magic functions; Python uses them to implement various syntactic sugars, or trick mechanisms. So far the magic function we use most is __init__(), which can be regarded as the class's initializer: it declares the attributes of class instances through self.xxx. In fact, attribute declaration can also be implemented through meta information injection; see metaprogramming later.
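
As a quick, minimal sketch (added here for illustration): overriding __len__() and __repr__() hooks an object into the built-in len() and print() sugar; the Playlist class is invented for the example.

class Playlist:
    def __init__(self, songs):
        self.songs = songs

    def __len__(self):  # hooks the built-in len()
        return len(self.songs)

    def __repr__(self):  # hooks printing and the REPL echo
        return f"Playlist({self.songs!r})"

pl = Playlist(["a", "b"])
print(len(pl))  # 2
print(pl)       # Playlist(['a', 'b'])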

operators overloading

Among the magic functions Python provides, one group serves operator overloading. These function names correspond one-to-one with operators: for example, __add__() corresponds to the + operator, __sub__() to the - operator, __getitem__() to the [] access operator, and so on; the full list is omitted here. The operator overloading mechanism greatly enriches the expressiveness of Python programs. For example:

class Pipeline:
    def __init__(self, seq_):
        self.seq = seq_

    def __getitem__(self, lamb):
        stream = [lamb(x) for x in self.seq if x is not None]
        return Pipeline(stream)

    def __iter__(self): return iter(self.seq)

# This design can be optimized; see the "free theorems" section later.
pipe = Pipeline([1, 2, 3, 4, 5])[lambda x: x + 1][lambda x: x * 3] \
    [lambda x: x * 2 if x % 2 == 0 else x]  # a partial function that doubles even numbers
# 12 9 24 15 36
print(*pipe)

Here, method chaining and operator overloading combine into a symbolic data-flow pipeline: the user keeps passing lambda expressions through the compact [] operator, and the list in the pipeline undergoes each mapping transformation in turn.

Besides __getitem__(), which overloads the x[] operator, Python also provides the __call__() method for overloading the x() operator. Put differently: an object that overloads this method becomes a callable object.

class Foo:
    def __call__(self, *args, **kwargs):
        param = kwargs.get("param", "none")
        print(f"callable test:{param}")

foo = Foo()
print(callable(foo))  # True
foo(param="test")     # arguments can be passed in

As we can see, an operator we are familiar with may carry completely different semantics in different Python contexts.

The singleton pattern

Every Python *.py module is naturally a singleton. For example, we can define the following in a first module A:

class Foo: pass
foo = Foo()

Then, in another module B, only ever reference foo:

from moduleA import foo

Searching the web for "Python singleton pattern" usually yields a variety of answers. But either way, Python's singleton pattern is only convention-based, just like capability design. The essential reason: we cannot fundamentally prohibit other users from calling the constructor.
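
For reference, a widely circulated convention-based variant caches the instance inside __new__(); a minimal sketch (the Config name is made up):

class Config:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Reuse the cached instance instead of allocating a new one.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

c1, c2 = Config(), Config()
print(c1 is c2)  # True

Even this only redirects the constructor rather than forbidding it (and __init__ still reruns on every call), which is why the module-level instance above remains the idiomatic choice.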

Metaprogramming *

Owing to Python's dynamic execution model, the attributes (fields) and methods a class instance should have (these definitions are called meta information) are not fixed before the program runs, unlike in compiled languages; they may be modified or even created on the fly as the script executes. In other words, Python can alter the meta information of a class or instance at any point at runtime; this is called metaprogramming for short.

Metaprogramming greatly expands the flexibility of a scripting language. You may also draw interesting inspiration from the Groovy language's meta object protocol (MOP); see: Reading through Groovy's meta object protocol MOP - Juejin (juejin.cn).

Meta information check

Let's begin with meta information inspection, which can be done through two built-in functions:

  1. vars() prints an instance's attributes in dictionary form.
  2. dir() prints, in list form, the identifiers of all attributes and methods of an instance (including the magic functions inherited from the object class). For example:
class Foo:
    def __init__(self, v_):
        self.v = v_

    def f(self): pass

    @staticmethod
    def g(): pass

foo = Foo(1)
print(dir(foo))   # ['__class__', '__delattr__', ... , 'f', 'g', 'v']
print(vars(foo))  # {'v': 1}

Besides these, Python objects commonly carry built-in attributes or methods that reflect meta information:

  1. The foo.__dict__ attribute, equivalent to calling vars(foo).
  2. The foo.__dir__() method, equivalent to calling dir(foo).

The hasattr() function detects whether an object's meta information contains an identifier. To fetch an identifier dynamically, use the getattr() function.

print(hasattr(foo,"f"))
print(getattr(foo,"f"))

If the identifier returned by getattr() is an attribute, the call directly returns its value or reference. If the returned identifier is a method, it can be invoked as a callable object (of the method type), e.g.:

foo = Foo(1)
invokable = getattr(foo,"f")
invokable()

Python's inspect module provides the built-in ismethod() function for detecting whether an identifier is a method. Note that ismethod() returns a bool, so the attribute itself must be fetched separately before it can be called:

import inspect
attr = getattr(foo, "f")
print(inspect.ismethod(attr))  # True
attr()

Meta information injection

The setattr() function inserts new attribute values into an already constructed object. For example:

foo = Foo(1)
setattr(foo, "a", 2)
print(getattr(foo,"a")) # 2

Or inject attributes directly by hardcoding. For example:

class Goo: pass
goo1 = Goo()
goo1.i = 20
print(goo1.i)

With MethodType (from the types module), an expression can be injected into an object directly as a bound method. For example:

from types import MethodType

goo1 = Goo()
goo1.i = 20  # the injected methods below read this attribute

# Keep the first parameter as the method receiver.
def method_(this): print(this.i)

# A lambda expression likewise keeps the first parameter as the receiver.
lambda_ = lambda this: print(this.i)

goo1.invocable = MethodType(lambda_, goo1)
goo1.invocable()  # 20

It is not hard to see that if the expression is instead bound to the Goo class itself via MethodType, it behaves like a class-level method: the receiver parameter this will then refer to the class rather than an instance.
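
A minimal sketch of binding to the class instead of an instance (the attribute names tag and describe are invented for the example):

from types import MethodType

class Goo: pass

Goo.tag = "Goo-class"
# Bind the receiver to the class object itself.
Goo.describe = MethodType(lambda this: print(this.tag), Goo)
Goo.describe()    # Goo-class
Goo().describe()  # Goo-class: instances see the same class-bound method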

Access interceptor

Python objects have three built-in magic functions: __getattr__(), __getattribute__() and __setattr__():

  1. Accessing an undefined attribute or method is intercepted by the __getattr__() method.
  2. Whether or not the accessed attribute or method is defined, the access is always intercepted by the __getattribute__() method. When __getattribute__() and __getattr__() are both overridden, the former is called first; only when it raises AttributeError is the latter tried.
  3. Injecting an attribute or method into an object is always intercepted by the object's __setattr__() method.

Using these three magic functions judiciously, we can create dynamic objects that can be accessed safely. Here is a simple example:

class Foo:
    def __init__(self):
        self.v_ = 10

    def __getattribute__(self, item):
        try:
            v = object.__getattribute__(self, item)
        except AttributeError:
            return None
        return v

foo = Foo()
foo.a = 100
print(foo.a)  # 100
print(foo.b)  # None

With this implementation, accessing an undefined attribute makes a Foo instance return None instead of raising AttributeError.
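
The same fallback can be had more cheaply with __getattr__(), which only fires when normal lookup fails; combining it with __setattr__() also lets us observe every injection. A sketch (the Record class is invented):

class Record:
    def __getattr__(self, item):
        # Only called when normal attribute lookup fails.
        return None

    def __setattr__(self, key, value):
        print(f"set {key} = {value}")
        object.__setattr__(self, key, value)  # delegate to avoid infinite recursion

r = Record()
r.x = 1     # set x = 1
print(r.x)  # 1
print(r.y)  # None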

Decorator pattern

Functions can be defined inside functions, and a function can itself return a function. The decorator pattern is one effective means of achieving Aspect Oriented Programming (AOP).

A Python decorator is itself a function. Here is a simple example:

def before(ff):
    print("before")
    return ff

@before
def f(): print("f")

f()  # overall output: before (printed when the decorator runs), then f

The function f() is decorated by the before() function: at definition time, f is passed into before() as the parameter ff. Following before's logic, it performs some pre-operation first, then returns the intercepted function ff (the decorated f) unchanged for later execution. This realizes a pre-operation hook; for example, unified logging logic could be inserted here.

Going further, how do we wrap around the target function? The next step achieves this by declaring a nested function (a closure).

def around(ff):
    def wrap():
        print("before")
        ff()
        print("after")
    return wrap

@around
def f(): print("f")

f()  # before, f, after

Further, consider that the target function also takes parameters and returns a value. We therefore accept *args and **kwargs inside the wrap closure and forward them to the target function.

def around(ff):
    def wrap(*args, **kwargs):
        print("before")
        r = ff(*args, **kwargs)
        print("after")
        return r
    return wrap

@around
def f(v1, v2): return v1 + v2

result = f(3, 4)
print(result)  # before, after, 7

Furthermore, the decorator around can itself carry parameter values. To achieve this, the following version adds one more level of nesting.

def around(param):
    def deepwrap(ff):
        def wrap(*args, **kwargs):
            print(param)
            print("before")
            r = ff(*args, **kwargs)
            print("after")
            return r
        return wrap
    return deepwrap

@around(param="decorator param")
def f(v1, v2): return v1 + v2

result = f(3, 4)
print(result)  # decorator param, before, after, 7

From the nesting levels we can summarize the order in which parameters are supplied: decorator parameters > the target function itself > the target function's parameters. This style of nesting has a more formal name: function currying.
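
One practical note not covered above: wrapping replaces the target function's metadata (its __name__, docstring, and so on). The standard library's functools.wraps decorator restores them; a minimal sketch:

import functools

def around(ff):
    @functools.wraps(ff)  # copy ff's __name__, __doc__, ... onto wrap
    def wrap(*args, **kwargs):
        return ff(*args, **kwargs)
    return wrap

@around
def f(): "docstring of f"

print(f.__name__)  # f, not wrap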

Functional programming *

The ideas of functional programming are well suited to writing stateless stream data processing systems.

Recursion

Like generalized linked lists, trees can be considered recursively defined, and handling recursively defined data types with recursive logic is a perfect fit. Besides that, some backtracking and dynamic programming problems also suit recursion well. A simple example: combining Python's slicing, only a few lines are needed to describe the logic of quicksort.

# Each recursion focuses on only three parts: numbers smaller than the pivot,
# numbers equal to the pivot, and numbers larger than the pivot.
def quicksort(seq: list) -> list:
    if len(seq) <= 1: return seq
    pivot = seq[len(seq) // 2]
    return quicksort([x for x in seq if x < pivot]) \
        + [x for x in seq if x == pivot] \
        + quicksort([x for x in seq if x > pivot])

The core idea of recursion is decomposing a seemingly complex problem into combinations of repeated subproblems. Meanwhile, the base condition already states when the program exits, so mechanisms such as semaphores, break, and continue are no longer needed.

If each recursive call either returns a simple value, or returns a recursive call to itself with no further work attached, it is called tail recursion. The current quicksort implementation is not tail recursive, because it still concatenates lists when returning. In theory a tail recursive function needs only one stack frame, so it cannot overflow the stack, and every tail recursion can be rewritten as an equivalent while or for loop; this line of work is called tail call optimization.

Every non-tail-recursive function theoretically risks stack overflow, an obvious drawback compared with imperative iteration via for and while; in exchange, recursion offers a more concise expression.
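
Since CPython itself performs no tail call optimization (deep recursion still hits the interpreter's recursion limit), the rewrite is done by hand. A sketch of the mechanical translation, using a made-up total() example:

# Tail recursive: the recursive call is the entire return expression.
def total(seq, acc=0):
    if not seq: return acc
    return total(seq[1:], acc + seq[0])

# The equivalent loop: the parameters become loop variables.
def total_iter(seq):
    acc = 0
    while seq:
        acc += seq[0]
        seq = seq[1:]
    return acc

print(total([1, 2, 3, 4]))       # 10
print(total_iter([1, 2, 3, 4]))  # 10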

lambda expression

In Python, a function identifier can be regarded as an expression, and expressions can be passed anywhere in a program like variables. For example:

def f(): print("method")
invokable = f
print(callable(invokable)) # True
invokable()

For simple expressions, we usually declare them as lambda expressions using the lambda keyword. For example:

f1 = lambda: print("hello")               # a lambda that takes no parameters
f2 = lambda x, y: print(f"x={x}, y={y}")  # a lambda that takes parameters
f3 = lambda x, y: x + y                   # a lambda with a return value

The result of evaluating the expression is itself the return value (None if there is none), so there is no need for an explicit return keyword.

Functional converter

Besides the list comprehensions seen earlier, there is another style for expressing list -> list mappings. Here we introduce five commonly used converters: map, reduce, set, filter, zip.

map() and filter() are the fundamental, most used operators: the former maps a list, the latter filters its elements. For example:

nums = [1, 2, 3, 4, 5]  # avoid shadowing the built-in name 'list'
xs = map(lambda x: x + 1, nums)
print(*xs)  # 2, 3, 4, 5, 6
ys = filter(lambda x: x > 1, nums)
print(*ys)  # 2, 3, 4, 5

zip(), as the name implies, stitches the elements of two lists into pairs in order, producing a new list of 2-tuples. When the two lists differ in length, the trailing elements of the longer one are discarded.

key = ["c", "j", "g", "p"]
value = ["c++", "java", "golang"]
kws = zip(key,value)
print(*kws) # ('c', 'c++') ('j', 'java') ('g', 'golang')

The merge operator reduce() must be imported from the functools library. If what we handle is a foldable data structure (say, elements of type A for which a combining law A + A → A is defined), then the reduce() operator quickly implements the reduction (in SQL terms, an aggregate function). In fact the zip() operator, as well as sum(), max(), min() mentioned above, can all be regarded as special cases of merge operators.

from functools import reduce

class Foo:
    def __init__(self, v_):
        self.v = v_

xs = [Foo(1), Foo(2), Foo(3)]
merge = lambda foo1, foo2: Foo(foo1.v + foo2.v)
fooN = reduce(merge, xs)
print(fooN.v)  # 6

The set() operator removes duplicate elements from a list; you can also read it as a cast from list to set. The elements put in must be of hashable types; see the earlier section on sets.

class Foo:
    def __init__(self, v_):
        self.v = v_

    def __hash__(self): return hash(self.v)

    def __eq__(self, other): return self.v == other.v

xs = [2, 2, 3, 4, 5, Foo(1), Foo(1)]
x = set(xs)
print(x)

Currying and closures

Currying is an important means of memorizing parameters and delaying a function's execution. Its core idea is to build higher-order functions from nested closures. For example:

def telnet(host):
    def _port(port):
        print(f"connect to {host}:{port}")
    return _port

# telnet("192.168.229.140")(6379)
server = telnet("192.168.229.140")
server(6379)

Assume telnet is a function for server connections. When the program needs to connect to different ports on the same machine, currying achieves the effect of reusing the configured path (that is, reusing earlier arguments):

master = telnet("192.168.229.140")
master(6379)  # host = 192.168.229.140, port = 6379
master(8080)  # host = 192.168.229.140, port = 8080
follower = telnet("192.168.229.141")
follower(6379)  # host = 192.168.229.141, port = 6379
follower(8080)  # host = 192.168.229.141, port = 8080

Implementing currying with nested closure definitions is tedious, and the order in which parameters are fixed is rigid. Python offers a simpler way to curry functions: the functools.partial() function turns a strictly evaluated function into a thunk.

from functools import partial

def telnet(host, port): print(f"connect to {host}:{port}")

# The first argument is the function identifier, i.e. the function itself.
# partial(telnet, host="192.168.229.150")(port=6379)
leader = partial(telnet, host="192.168.229.150")
leader(port=6379)

In this case telnet() is an ordinary function, and partial() lets it be curried in any parameter order. Note additionally that the remaining parameters here need to be passed as keyword arguments (in **kwargs form).
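
For instance, fixing port first instead of host works just as well, a flexibility the handwritten closure version lacks (the on_6379 name is made up):

from functools import partial

def telnet(host, port): print(f"connect to {host}:{port}")

# Fix the second parameter first; the nested-closure version could not do this.
on_6379 = partial(telnet, port=6379)
on_6379(host="192.168.229.140")  # connect to 192.168.229.140:6379
on_6379(host="192.168.229.141")  # connect to 192.168.229.141:6379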

Function composition

Given two functions f and g, there are two basic ways to combine them:

  1. compose(f, g), equivalent to f(g()), where g's output serves as f's input.
  2. andThen(f, g), equivalent to g(f()), where f's output serves as g's input.

The names are borrowed from Java/Scala functional interfaces; the law compose(f, g) == andThen(g, f) always holds. In Python they can be defined as follows:

def compose(f, g):
    def closure(*args, **kwargs): return f(g(*args, **kwargs))
    return closure

def andThen(f, g):
    def closure(*args, **kwargs): return g(f(*args, **kwargs))
    return closure

f = compose(lambda t: t[0] + t[1], lambda x, y: (x + 1, y + 1))
print(f(2, 3))  # 7
g = andThen(lambda x, y: (x + 1, y + 1), lambda t: t[0] + t[1])
print(g(2, 3))  # 7

This idea is often applied to map operator fusion; it traces back to Philip Wadler's 1989 paper "Theorems for free!", hence the name free theorems (the paper is available at free.dvi (ttic.edu)). Map fusion effectively avoids producing large amounts of intermediate results during list transformation. For example:

lambda_1 = lambda x: x + 1
lambda_2 = lambda x: x * 2
xs = [1, 2, 3, 4, 5]
# Run λ1 first, then λ2: the list is converted twice,
# producing an intermediate list in between.
y1s = map(lambda_2, map(lambda_1, xs))
# Compose andThen(λ1, λ2) first, then convert once:
# no intermediate results need to be kept.
y2s = map(andThen(lambda_1, lambda_2), xs)
print(*y1s)
print(*y2s)

Appendix: Numerical analysis libraries

Numpy preview

For complete coverage, head to the official guide: Numpy and Scipy Documentation — Numpy and Scipy documentation; here we only introduce the basics.

Numpy is a library created and open-sourced in 2005, designed to provide an array object up to 50x faster than traditional Python lists. To secure this performance, roughly 35% of Numpy is implemented in the C language. See: GitHub - numpy/numpy: The fundamental package for scientific computing with Python.

Numpy is commonly used in scientific computing, so in this chapter array elements default to immutable values, and only one-dimensional and two-dimensional arrays are discussed.

Arrays and matrices

With Numpy you can create arrays. An array can be regarded as a memory-optimized version of the native Python list type (dtype specifies the element type so that more compact memory can be allocated), and it is therefore compatible with the usual list operations.

import numpy as np

arr1d = np.array([1, 2, 3, 4], dtype="int")
# List comprehensions are supported.
print(*[x * 2 for x in arr1d])

arr2d = np.array(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]], dtype="int"
)
# (Multi-dimensional) slicing is supported.
sub = arr2d[0:2, 0:2]
"""
[[1, 2],
 [4, 5]]
"""
print(*sub)

A two-dimensional array can be regarded as a matrix or a column vector. The array's shape attribute reveals the dimensions. For example:

print(arr1d.shape)  # (4,) -> a one-dimensional array of length 4
print(sub.shape)    # (2, 2) -> a 2×2 matrix

The reshape() method can stack a one-dimensional array into a two-dimensional one. The -1 argument has special semantics, meaning "whatever length fits", which depends on the shape of the original array. For example, the following operations on a two-dimensional array:

  1. reshape(-1) flattens to a one-dimensional array of whatever length fits.
  2. reshape(-1, 1) reshapes to any number of rows but one column: a column vector, still a two-dimensional array.
  3. reshape(1, -1) reshapes to one row but any number of columns: a row vector, still a two-dimensional array.
""" np.arange(n) amount to range(n) Of numpy edition . matrix: [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] """
matrix = np.arange(12).reshape(3, 4)
print(matrix)
# [ 0 1 2 3 4 5 6 7 8 9 10 11]
vect_1d = matrix.reshape(-1)
print(vect_1d)
# [[ 0 1 2 3 4 5 6 7 8 9 10 11]]
vect_v = matrix.reshape(1,-1)
print(vect_v)
""" [[ 0] [ 1] : : [11]] """
vect_h = matrix.reshape(-1,1)
print(vect_h)

When stacking a one-dimensional array into a two-dimensional one, the order parameter specifies the filling order: "C" fills in row-major order (the default), "F" fills in column-major order. For example:

 """ [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] """
m1 = np.arange(12).reshape([3,4],order="C")
print(m1)
""" [[ 0 3 6 9] [ 1 4 7 10] [ 2 5 8 11]] """
m2 = np.arange(12).reshape([3,4],order="F")
print(m2)

Passing a search condition to the np.where() function yields the subscripts of the elements that satisfy the condition. For example:

v1 = np.array([1, 2, 3, 3, 6, 3])
# In the predicate 'v1 == 3', 'v1' stands for every element of the array.
indexes = np.where(v1 == 3)
# (array([2, 3, 5]),): the elements whose value is 3 sit at subscripts 2, 3 and 5.
print(indexes)

Array splicing

Numpy supports splicing arrays together. Joining one-dimensional arrays is done by stacking: horizontal stacking (the hstack() function) and vertical stacking (the stack() function). Horizontal stacking still yields a one-dimensional array; vertical stacking yields a matrix.

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
v3 = np.hstack([v1, v2])  # [1, 2, 3, 4, 5, 6]
print(v3)
"""
[[1 2 3]
 [4 5 6]]
"""
v4 = np.stack([v1, v2])
print(v4)

The np.concatenate() function supports matrix splicing; likewise, the axis argument marks the splicing direction. If no axis is specified, the default is axis=0.

m1 = np.array([
    [1, 2],
    [3, 4]
])
m2 = np.array([
    [5, 6],
    [7, 8]
])
"""
Column count unchanged, rows appended (vertical splicing):
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
"""
m3 = np.concatenate([m1, m2], axis=0)
print(m3)
"""
Row count unchanged, columns appended (horizontal splicing):
[[1 2 5 6]
 [3 4 7 8]]
"""
m4 = np.concatenate([m1, m2], axis=1)
print(m4)

Linear algebra

Since two-dimensional arrays can be regarded as matrices or vectors, the standard linear algebra operations extend naturally. For example, matrix.T performs transposition:

matrix = np.arange(12).reshape(3, 4)
"""
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
"""
print(matrix.T)

Numpy also provides four functions for generating the identity matrix, the all-0 matrix, the all-1 matrix, and an uninitialized matrix. See:

n, m = 3, 4
E = np.eye(n)         # n-order identity matrix
I = np.zeros([m, n])  # m × n all-0 matrix
K = np.ones([m, n])   # m × n all-1 matrix
H = np.empty([m, n])  # m × n uninitialized matrix (arbitrary values, not truly random)

The np.linalg.det() function evaluates the determinant of a matrix, and np.linalg.inv() solves for a matrix's inverse (internally it solves Ax = E for x, so precision issues exist). For example:

A = np.array([
    [1, 2],
    [3, 4]
], dtype=int)
det = int(np.linalg.det(A))
print(det)  # 1*4 - 2*3 = -2
A_inverse = np.linalg.inv(A)
print(A_inverse)

Now suppose there is another matrix B. The np.matmul() function or the @ operator (operator overloading) computes the matrix product (which does not commute):

B = np.array([
    [1, 3],
    [2, 4]
], dtype=int)
"""
[[1*1 + 2*2, 1*3 + 2*4],
 [3*1 + 4*2, 3*3 + 4*4]]
"""
print(np.matmul(A, B))
print(A @ B)

Specifically, given two one-dimensional arrays a and b, @ is treated as the vector inner product (which commutes), and the result is a scalar. For example:

a = np.array([1, 2, 3], dtype=int)
b = np.array([4, 5, 6], dtype=int)
c = np.array([1, 2], dtype=int)  # used in the np.dot() tests below
print(np.matmul(a, b))  # 32

The np.multiply() function or the * operator computes the product of a matrix and a scalar (which commutes):

""" [[1*3, 2*3], [3*3, 4*3]] """
print(np.multiply(A,3))
print(A * 3)

The methods above can all be generalized by a single np.dot() function, whose behavior depends on the input:

  1. When one of the two arguments is a scalar, it is equivalent to A * n.
  2. When both arguments are two-dimensional matrices, it is equivalent to A @ B.
  3. When the arguments are a two-dimensional matrix and a one-dimensional array, the array is promoted to a column vector and the matrix product @ is taken.
  4. When both arguments are one-dimensional arrays, it is the vector inner product a·b.

See: the difference between numpy's dot(), outer(), multiply() and matmul() - Jianshu (jianshu.com)

# np.array_equal compares whether two arrays are element-wise identical.
assert np.array_equal(np.dot(a, b), a @ b)
assert np.array_equal(np.dot(A, B), A @ B)
assert np.array_equal(np.dot(A, c), A @ c)
assert np.array_equal(np.dot(A, 3), A * 3)
print("test passed")

To avoid ambiguity, np.matmul() or @ is the recommended choice for two-dimensional matrix operations.
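
One concrete reason, sketched below: beyond two dimensions the two functions genuinely diverge. np.matmul() broadcasts over stacks of matrices, while np.dot() forms a sum-product over every stack combination:

import numpy as np

s = np.ones((2, 3, 4))  # a stack of two 3x4 matrices
t = np.ones((2, 4, 5))  # a stack of two 4x5 matrices
print(np.matmul(s, t).shape)  # (2, 3, 5): pairwise matrix products
print(np.dot(s, t).shape)     # (2, 3, 2, 5): every stack paired with every stack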

Pandas preview

If you are familiar with SQL, Pandas is very easy to pick up. See the official guide for complete coverage: User Guide — pandas 1.4.3 documentation (pydata.org); here we only introduce the basics.

We mainly use the Pandas library to process tabular data, abstracted as the Dataframe type. First, a demonstration of basic IO: convert a native Python data structure into a Dataframe, then write it out to the music.csv file:

import pandas as pd

# In Pandas, a single column of table data is called a Series.
artist = ["Billie Holiday", "Jimi Hendrix", "Miles Davis", "SIA"]
genre = ["Jazz", "Rock", "Jazz", "Pop"]
listeners = [1_300_000, 2_700_000, 1_500_000, 2_000_000]
plays = [27_000_000, 70_000_000, 48_000_000, 74_000_000]
dict_ = {"artist": artist,
         "genre": genre,
         "listeners": listeners,
         "plays": plays
         }
# The dict keys already imply the column names, so there is
# no need to specify columns explicitly.
df = pd.DataFrame(dict_)
# No extra index column is wanted, hence: index=False.
df.to_csv(path_or_buf="music.csv", index=False)

pd.DataFrame converts a native Python dict into a table, where each key is a column name and each value is that column's data. Another, more natural idea is to organize the DataFrame by rows, abstracting each row as a tuple:

table = [
    ("Billie Holiday", "Jazz", 1_300_000, 27_000_000),
    ("Jimi Hendrix", "Rock", 2_700_000, 70_000_000),
    ("Miles Davis", "Jazz", 1_500_000, 48_000_000),
    ("SIA", "Pop", 2_000_000, 74_000_000)
]
df = pd.DataFrame(data=table, columns=["artist", "genre", "listeners", "plays"])
df.to_csv(path_or_buf="music.csv", index=False)

The structure of the generated table is as follows :

artist         | genre | listeners | plays
Billie Holiday | Jazz  | 1300000   | 27000000
Jimi Hendrix   | Rock  | 2700000   | 70000000
Miles Davis    | Jazz  | 1500000   | 48000000
SIA            | Pop   | 2000000   | 74000000

Next, try pd.read_csv(path) to read it back into memory as a Dataframe. By default, Pandas takes the first row read as the header.

df = pd.read_csv("music.csv")
""" artist genre listeners plays 0 Billie Holiday Jazz 1300000 27000000 1 Jimi Hendrix Rock 2700000 70000000 2 Miles Davis Jazz 1500000 48000000 3 SIA Pop 2000000 74000000 """
print(df)

Data Extraction

Start with the most basic operations. To extract a column of df, use the overloaded [] operator (see operator overloading in the earlier chapter); it accepts a single column name, or a list of column names. Extracting one column and multiple columns:

df["artist"] # Extract a column 
df[["artist","plays"]] # Extract multiple columns 

Note especially that extracting a single column yields a Series, while extracting multiple columns yields a Dataframe.
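
This is easy to verify directly; note that the single-element list df[["artist"]] still counts as "multiple columns":

print(type(df["artist"]))    # <class 'pandas.core.series.Series'>
print(type(df[["artist"]]))  # <class 'pandas.core.frame.DataFrame'>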

To extract the n-th row from df, use df.loc[n]. Here n is the index, which by default is the row subscript starting from 0. For example, extracting the first row:

# Extract the row at subscript 0:
# 0  Billie Holiday  Jazz  1300000  27000000
row = df.loc[0]
# Then extract the artist column from row:
# Billie Holiday
print(row["artist"])

Pandas provides functions to preview the first n or last n rows of the data:

n = 2
print(df.head(n))  # view the first n rows
print(df.tail(n))  # view the last n rows

Filtering (the WHERE predicate) is the most basic table operation. For example, select the rows whose plays exceed 50 million:

top_df = df[df["plays"] > 50_000_000]
print(top_df)

Another intuitive way is to call the query() method and pass in the predicate directly; the string content follows Python syntax. For example:

top_df = df.query("plays >= 50_000_000 and listeners >= 2_500_000")
print(top_df)

Suppose you want to summarize the musicians' genres from the genre column; the unique() method deduplicates the values. For example:

col = df["genre"].unique()
# ['Jazz' 'Rock' 'Pop']
# col.tolist()
print(col)

Sorting (the ORDER BY predicate) is also one of the common table operations. For example, sort by the plays column in descending (DESC) order:

sorted_df = df.sort_values("plays", ascending=False)
"""
           artist genre  listeners     plays
3             SIA   Pop    2000000  74000000
1    Jimi Hendrix  Rock    2700000  70000000
2     Miles Davis  Jazz    1500000  48000000
0  Billie Holiday  Jazz    1300000  27000000
"""
print(sorted_df)

Aggregation operation

You can refer to this excellent article: Pandas tutorial | The super useful Groupby explained in detail - Zhihu (zhihu.com)

Pandas provides SQL-like aggregation operations. For example, to group the table by the genre field and count the rows in each group:

"""SQL: select genre, count(*) from df group by genre """
print(df.groupby("genre")["artist"].count())

Besides the count() method, other aggregation methods include: mean() (average), sum(), median(), min(), max(), var() (variance), and std() (standard deviation). A more general aggregation method is agg(), which accepts a dict so that different columns get different aggregations: each key is a column name, each value an aggregate function name. For example:

report_df = df.groupby("genre").agg({
    "artist": "count",
    "listeners": "mean",
    "plays": "max"
})
"""
       artist  listeners     plays
genre
Jazz        2  1400000.0  48000000
Pop         1  2000000.0  74000000
Rock        1  2700000.0  70000000
"""
print(report_df)

Conversion operation

The Dataframe provided by Pandas is mutable, which means we can modify the original table. For example, compute each musician's influence score from listeners and plays, and append it to the original table as a new column:

df = pd.read_csv("music.csv")
""" artist genre listeners plays score 0 Billie Holiday Jazz 1300000 27000000 0.605 1 Jimi Hendrix Rock 2700000 70000000 1.535 2 Miles Davis Jazz 1500000 48000000 1.035 3 SIA Pop 2000000 74000000 1.580 """
# Assume score = 5 * listeners_count + 2 * plays
df["score"] = (df["listeners"] * 5 + df["plays"] * 2) / 10**8
print(df)

The map() function transforms a single column of data (i.e. a Series) with a lambda expression. For example:

""" artist genre listeners plays 0 Billie Holiday Jazz 1300000 50mio- 1 Jimi Hendrix Rock 2700000 50mio+ 2 Miles Davis Jazz 1500000 50mio- 3 SIA Pop 2000000 50mio+ """
df["plays"] = df["plays"].map(lambda x: "50mio+" if x > 50_000_000 else "50mio-")
print(df)

To transform based on multiple columns (a DataFrame, in this case), introduce the apply() method. For example:

# axis=1 means the lambda receives each row in turn.
df["scores"] = df[["listeners", "plays"]].apply(lambda t: (t["listeners"] * 5 + 2 * t["plays"]) / 10**8, axis=1)
print(df)

Table joins

The merge() method implements SQL-style table joins: inner join (the default), left join, right join, and full outer join, configured through the how parameter. For example:

info = [
("Billie Holiday", "US"),
("Jimi Hendrix", "US"),
("Justin bieber", "Canada"),
]
artist_info = pd.DataFrame(data=info, columns=["artist", "country"])
m0 = df.merge(artist_info) # how="inner"
m1 = df.merge(artist_info, how="outer")
m2 = df.merge(artist_info, how="left")
m3 = df.merge(artist_info, how="right")

Pandas' own concat() function connects two data tables. For example:

row = [("Justin Bieber", "Pop", 300_000, 1_000_000)]
append_df = pd.DataFrame(data=row, columns=["artist", "genre", "listeners", "plays"])
# axis=0 means appending by rows
ndf = pd.concat([df, append_df], axis=0)
print(ndf)

Appendix

In general, Linux distributions ship with their own Python interpreter, so a script can begin with a shebang line such as:

#!/bin/python
#!/bin/env python

Reference: learning the shell script interpreter line (#!) - likecs.com

