The itertools module provides functions for creating and working with iterators efficiently.
The functions in itertools are inspired by similar features in languages such as Clojure, Haskell, APL, and SML. They are designed to be fast and memory-efficient, and to be chained together to express more complex iteration-based algorithms.
Iterator-based code has better memory-consumption characteristics than list-based code. Because data is not produced from the iterator until it is needed, the whole data set does not have to be held in memory at once. This "lazy" processing model can reduce swapping and other side effects of large data sets, which improves performance.
In addition to the functions defined in itertools, the examples in this article also use some built-in functions for iteration.
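As a rough illustration of the memory point above, the sketch below compares a list with a generator expression that produces the same values. The names n, squares_list, and squares_iter are made up for this example, and the exact sizes reported by sys.getsizeof() depend on the interpreter, so only the comparison matters.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # every value is stored up front
squares_iter = (i * i for i in range(n))   # values are produced on demand

# getsizeof() reports the container itself, not the integers it refers to.
list_size = sys.getsizeof(squares_list)
iter_size = sys.getsizeof(squares_iter)
print(list_size > iter_size)

# The generator can still be consumed once, one value at a time.
total = sum(squares_iter)
print(total == sum(squares_list))
```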
The chain() function takes several iterators as arguments and returns a single iterator that produces the contents of all of them, as though they came from one iterator.
from itertools import chain
for i in chain([1, 2, 3], ['a', 'b', 'c']):
    print(i, end=' ')
chain() makes it easy to process several sequences without constructing one larger sequence.
# Output
1 2 3 a b c
If the iterables to be combined are not all known in advance, or need to be evaluated lazily, chain.from_iterable() can be used in place of chain().
from itertools import chain
def make_iterables_to_chain():
    yield [1, 2, 3]
    yield ['a', 'b', 'c']
for i in chain.from_iterable(make_iterables_to_chain()):
    print(i, end=' ')
# Output
1 2 3 a b c
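A common use of chain.from_iterable() is flattening one level of nesting. The sketch below uses a made-up list called nested; it is only meant to show the shape of the call.

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]

# Each inner list is consumed in turn; only one level is flattened.
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```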
The built-in function zip() also returns an iterator, but it combines the elements of several iterators into tuples.
for i in zip([1, 2, 3], ['a', 'b', 'c']):
    print(i)
As with the functions in this module, the return value of zip() is an iterable object that produces values one at a time.
# Output
(1, 'a')
(2, 'b')
(3, 'c')
However, zip() stops as soon as the shortest input iterator is exhausted. To process all of the inputs, even if the iterators produce different numbers of values, use zip_longest().
from itertools import zip_longest
r1 = range(3)
r2 = range(2)
print('zip stops early:')
print(list(zip(r1, r2)))
print()
print('zip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))
By default, zip_longest() substitutes None for any missing values. Use the fillvalue argument to supply a different substitute value.
# Output
zip stops early:
[(0, 0), (1, 1)]
zip_longest processes all of the values:
[(0, 0), (1, 1), (2, None)]
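The fillvalue argument mentioned above can be sketched as follows. The placeholder '-' is an arbitrary choice for this example.

```python
from itertools import zip_longest

r1 = range(3)
r2 = range(2)

# Missing positions are filled with '-' instead of None.
padded = list(zip_longest(r1, r2, fillvalue='-'))
print(padded)  # [(0, 0), (1, 1), (2, '-')]
```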
The islice() function returns an iterator that returns selected items from the input iterator, by index.
from itertools import islice
print('Stop at 5:')
for i in islice(range(100), 5):
    print(i, end=' ')
print()
print('Start at 5, Stop at 10:')
for i in islice(range(100), 5, 10):
    print(i, end=' ')
print()
print('By tens to 100:')
for i in islice(range(100), 0, 100, 10):
    print(i, end=' ')
print()
islice() takes the same arguments as the slice operator for lists: start, stop, and step. The start and step arguments are optional.
# Output
Stop at 5:
0 1 2 3 4
Start at 5, Stop at 10:
5 6 7 8 9
By tens to 100:
0 10 20 30 40 50 60 70 80 90
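Unlike list slicing, islice() also works on iterators that have no length, such as generators. The sketch below uses a hypothetical generator named squares() that never ends on its own; islice() is what bounds it.

```python
from itertools import islice

def squares():
    """Yield 0, 1, 4, 9, ... without end (illustrative helper)."""
    n = 0
    while True:
        yield n * n
        n += 1

# Take the first five values.
first_five = list(islice(squares(), 5))
print(first_five)   # [0, 1, 4, 9, 16]

# Take items at indexes 1, 3, 5, 7, 9.
every_other = list(islice(squares(), 1, 10, 2))
print(every_other)  # [1, 9, 25, 49, 81]
```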
The tee() function returns several independent iterators (two by default) based on a single original input.
from itertools import islice, tee
r = islice(range(10), 5)
i1, i2 = tee(r)
print('i1:', list(i1))
print('i2:', list(i2))
tee() has semantics similar to the Unix tee utility, which repeats the values it reads from its input and writes them to a named file and to standard output. The iterators returned by tee() can be used to feed the same set of data into multiple algorithms to be processed in parallel.
# Output
i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]
Note that the new iterators created by tee() share their input, so the original iterator should not be used after the new ones are created.
from itertools import islice, tee
r = islice(range(10), 5)
i1, i2 = tee(r)
print('r:', end=' ')
for i in r:
    print(i, end=' ')
    if i > 1:
        break
print()
print('i1:', list(i1))
print('i2:', list(i2))
If values are consumed from the original input, the new iterators will not produce those values.
# Output
r: 0 1 2
i1: [3, 4]
i2: [3, 4]
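tee() also accepts a second argument giving the number of iterators to create. The sketch below, with made-up names, splits one single-pass iterator three ways and only ever touches the copies, which avoids the problem shown above.

```python
from itertools import tee

readings = iter([3, 1, 4, 1, 5])   # a one-shot iterator

# Ask for three independent iterators; do not use `readings` afterwards.
for_sum, for_max, for_list = tee(readings, 3)

total = sum(for_sum)
largest = max(for_max)
items = list(for_list)
print(total, largest, items)  # 14 5 [3, 1, 4, 1, 5]
```

Because the copies are consumed one after another here, tee() has to buffer the values internally, so this pattern is best suited to inputs of modest size.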
The built-in map() function returns an iterator that calls a function on the values in the input iterators and returns the results. It stops as soon as any one of the input iterators is exhausted.
def times_two(x):
    return 2 * x
def multiply(x, y):
    return (x, y, x * y)
print('Single input:')
for i in map(times_two, range(5)):
    print(i, end=' ')
print('\nMultiple inputs:')
r1 = range(5)
r2 = range(5, 10)
for i in map(multiply, r1, r2):
    print('{:d} * {:d} = {:d}'.format(*i))
print('\nStopping:')
r1 = range(5)
r2 = range(2)
for i in map(multiply, r1, r2):
    print(i)
In the first example, the function multiplies every input value by 2. In the second example, the function multiplies two arguments taken from separate iterators and returns a tuple containing the original arguments and the computed value. The third example stops after producing two tuples because the second input is exhausted.
# Output
Single input:
0 2 4 6 8
Multiple inputs:
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
Stopping:
(0, 0, 0)
(1, 1, 1)
The starmap() function is similar to map(), but instead of constructing a tuple from multiple iterators, it uses the * syntax to unpack the items of a single iterator as arguments to the mapping function.
from itertools import starmap
values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
for i in starmap(lambda x, y: (x, y, x * y), values):
    print('{} * {} = {}'.format(*i))
Where the mapping function passed to map() is called as f(i1, i2), the mapping function passed to starmap() is called as f(*i).
# Output
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
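The difference between the two calling conventions can be sketched with the built-in pow(). The list pairs is invented for this example.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap() unpacks each tuple: pow(2, 5), pow(3, 2), pow(10, 3)
powers = list(starmap(pow, pairs))
print(powers)  # [32, 9, 1000]

# The same result with map() needs the unpacking written out by hand.
via_map = list(map(lambda p: pow(*p), pairs))
print(via_map == powers)  # True
```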
The count() function returns an iterator that produces consecutive integers indefinitely. The first number can be passed as an argument (the default is 0). There is no upper-bound argument (see the built-in range() for more control over the result set).
from itertools import count
for i in zip(count(1), ['a', 'b', 'c']):
    print(i)
This example stops because the other argument to zip() is a list of finite length.
# Output
(1, 'a')
(2, 'b')
(3, 'c')
The start and step arguments to count() can be any numerical values that can be added together.
import fractions
from itertools import count
start = fractions.Fraction(1, 3)
step = fractions.Fraction(1, 3)
for i in zip(count(start, step), ['a', 'b', 'c']):
    print('{}: {}'.format(*i))
In this example, the start point and step are Fraction objects from the fractions module.
# Output
1/3: a
2/3: b
1: c
The cycle() function returns an iterator that repeats the contents of its argument indefinitely. Because it has to remember the entire contents of the input iterator, it may consume a considerable amount of memory if the iterator is long.
from itertools import cycle
for i in cycle(['a', 'b', 'c']):
    print(i)
Unless something interrupts it, this loop runs forever.
# Output
a
b
c
a
b
...
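One way to keep a cycle() loop from running forever is to bound it with islice(). The limit of seven items below is arbitrary.

```python
from itertools import cycle, islice

# Take only the first seven values from the endless cycle.
limited = list(islice(cycle(['a', 'b', 'c']), 7))
print(limited)  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```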
The repeat() function returns an iterator that produces the same value each time it is accessed.
from itertools import repeat
for i in repeat('over-and-over', times=5):
    print(i)
The iterator returned by repeat() keeps returning data forever, unless the optional times argument is provided to limit it.
# Output
over-and-over
over-and-over
over-and-over
over-and-over
over-and-over
Combining repeat() with zip() or map() is useful when a fixed value needs to be included alongside the values from other iterators.
from itertools import repeat, count
for i, s in zip(count(), repeat('over-and-over', 5)):
    print(i, s)
In this example, a counter value is combined with the constant returned by repeat().
The next example uses map() to multiply the numbers 0 through 4 by 2.
from itertools import repeat
for i in map(lambda x, y: (x, y, x * y), repeat(2), range(5)):
    print('{:d} * {:d} = {:d}'.format(*i))
In this example repeat() does not need to be explicitly limited, because range() returns only five elements and map() stops processing when any of its inputs ends.
# Output
2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
The dropwhile() function returns an iterator that produces the elements of the input iterator only after a condition becomes false for the first time.
from itertools import dropwhile
def should_drop(x):
    print('Input:', x)
    return x < 1
for i in dropwhile(should_drop, [-1, 0, 1, 2, -2]):
    print('Produce:', i)
dropwhile() does not filter every item of the input. Once the condition is false for the first time, all of the remaining items in the input are returned without being tested.
# Output
Input: -1
Input: 0
Input: 1
Produce: 1
Produce: 2
Produce: -2
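A typical use of this behavior is skipping a leading header while leaving later matching lines alone. The list lines below is made-up sample data.

```python
from itertools import dropwhile

lines = ['# header', '# more header', 'data 1', '# not dropped', 'data 2']

# Only the leading run of comment lines is skipped; once a line fails
# the test, everything after it passes through untested.
body = list(dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['data 1', '# not dropped', 'data 2']
```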
The opposite of dropwhile() is takewhile(). It returns an iterator that returns items from the input iterator as long as the test function returns true.
from itertools import takewhile
def should_take(x):
    print('Input:', x)
    return x < 1
for i in takewhile(should_take, [-1, 0, 1, 2, -2]):
    print('Produce:', i)
As soon as should_take() returns False, takewhile() stops processing the input.
# Output
Input: -1
Produce: -1
Input: 0
Produce: 0
Input: 1
The built-in function filter() returns an iterator that includes only the items for which the test function returns true.
def check_item(x):
    print('Input:', x)
    return x < 1
for i in filter(check_item, [-1, 0, 1, 2, -2]):
    print('Produce:', i)
filter() differs from dropwhile() and takewhile() in that every item is passed to the test function before it is returned.
# Output
Input: -1
Produce: -1
Input: 0
Produce: 0
Input: 1
Input: 2
Input: -2
Produce: -2
filterfalse() returns an iterator that includes only the items for which the test function returns false.
from itertools import filterfalse
def check_item(x):
    print('Input:', x)
    return x < 1
for i in filterfalse(check_item, [-1, 0, 1, 2, -2]):
    print('Produce:', i)
The test function check_item() is the same as in the previous example, so the results are the opposite of those produced by filter().
# Output
Input: -1
Input: 0
Input: 1
Produce: 1
Input: 2
Produce: 2
compress() offers another way to filter the contents of an iterable. Instead of calling a function, it uses the values in another iterable to indicate when to accept a value and when to ignore it.
from itertools import compress, cycle
every_third = cycle([False, False, True])
data = range(1, 10)
for i in compress(data, every_third):
    print(i, end=' ')
The first argument to compress() is the data iterable to process. The second argument is a selector iterable that produces Boolean values indicating which elements to take from the data input (a true value causes the element to be produced, a false value causes it to be ignored).
# Output
3 6 9
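The selector does not have to come from cycle(); any iterable of truthy and falsy values works. The sketch below uses two made-up parallel lists, names and active.

```python
from itertools import compress

names = ['alice', 'bob', 'carol', 'dave']
active = [True, False, True, False]

# Keep each name whose matching selector value is true.
selected = list(compress(names, active))
print(selected)  # ['alice', 'carol']
```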
The groupby() function returns an iterator that produces sets of values organized by a common key. The following example shows grouping related values based on an attribute.
from itertools import groupby
import functools
import operator
import pprint
@functools.total_ordering
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return '({}, {})'.format(self.x, self.y)
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __gt__(self, other):
        return (self.x, self.y) > (other.x, other.y)
# Create a data set of Point instances
data = list(map(Point, [1, 2, 3, 1, 2], range(5)))
print('Data:')
pprint.pprint(data, width=35)
print()
# Try to group the unsorted data based on the x attribute
print('Grouped, unsorted:')
for k, g in groupby(data, operator.attrgetter('x')):
    print(k, list(g))
print()
# Sort the data
data.sort()
print('Sorted:')
pprint.pprint(data, width=35)
print()
# Group the sorted data based on the x attribute
print('Grouped, sorted:')
for k, g in groupby(data, operator.attrgetter('x')):
    print(k, list(g))
The input sequence needs to be sorted on the key value for the groupings to come out as expected, because groupby() only collects consecutive items that share a key.
# Output
Data:
[(1, 0),
 (2, 1),
 (3, 2),
 (1, 3),
 (2, 4)]
Grouped, unsorted:
1 [(1, 0)]
2 [(2, 1)]
3 [(3, 2)]
1 [(1, 3)]
2 [(2, 4)]
Sorted:
[(1, 0),
 (1, 3),
 (2, 1),
 (2, 4),
 (3, 2)]
Grouped, sorted:
1 [(1, 0), (1, 3)]
2 [(2, 1), (2, 4)]
3 [(3, 2)]
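The sort-then-group pattern can be sketched more compactly by passing the same key function to both sorted() and groupby(). The word list below is invented for this example.

```python
from itertools import groupby

words = ['apple', 'bob', 'cat', 'dog', 'eagle', 'fig']

# Sort by the grouping key first so equal keys end up next to each other.
by_length = sorted(words, key=len)

# Each group iterator must be consumed before moving on, hence list(g).
groups = {k: list(g) for k, g in groupby(by_length, key=len)}
print(groups)  # {3: ['bob', 'cat', 'dog', 'fig'], 5: ['apple', 'eagle']}
```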
The accumulate() function processes the input iterable, passing the accumulated result so far and the next item to a function and producing that function's return value instead of the raw input. The default function adds the two values, so accumulate() can be used to produce the cumulative sum of a series of numerical inputs.
from itertools import accumulate
print(list(accumulate(range(5))))
print(list(accumulate('abcde')))
When the input is a sequence of non-integer values, the result depends on what it means to "add" two items together. The second call to accumulate() in this example receives a string and returns progressively longer concatenations of its characters.
# Output
[0, 1, 3, 6, 10]
['a', 'ab', 'abc', 'abcd', 'abcde']
accumulate() also accepts a custom function that takes two arguments.
from itertools import accumulate
def f(a, b):
    print(a, b)
    return b + a
print(list(accumulate('abcde', f)))
# Output
a b
ba c
cba d
dcba e
['a', 'ba', 'cba', 'dcba', 'edcba']
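Existing two-argument functions can be passed in the same way. The sketch below, with made-up input values, uses the built-in max() for a running maximum and operator.mul for a running product.

```python
from itertools import accumulate
import operator

values = [3, 1, 4, 1, 5, 9, 2, 6]

# Each output is the largest value seen so far.
running_max = list(accumulate(values, max))
print(running_max)      # [3, 3, 4, 4, 5, 9, 9, 9]

# Each output is the product of all items up to that point.
running_product = list(accumulate([1, 2, 3, 4, 5], operator.mul))
print(running_product)  # [1, 2, 6, 24, 120]
```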
Nested for loops that iterate over multiple sequences can often be replaced with product(), which produces a single iterator whose values are the Cartesian product of the set of input values.
from itertools import product
char = ['a', 'b', 'c']
integer = [1, 2, 3]
for each in product(char, integer):
    print(each)
The values produced by product() are tuples, with the members taken from each of the iterables passed in as arguments, in the order they are passed.
# Output
('a', 1)
('a', 2)
('a', 3)
('b', 1)
('b', 2)
('b', 3)
('c', 1)
('c', 2)
('c', 3)
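The correspondence with nested loops can be checked directly. The sketch below builds the same pairs both ways and compares them; the variable names are chosen for this example.

```python
from itertools import product

char = ['a', 'b', 'c']
integer = [1, 2, 3]

# The explicit nested-loop version.
nested = []
for c in char:
    for n in integer:
        nested.append((c, n))

# product() yields the same tuples in the same order.
same = nested == list(product(char, integer))
print(same)  # True
```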
To compute the Cartesian product of a sequence with itself, use the repeat argument to specify how many times the input should be repeated.
from itertools import product
char = ['a', 'b']
for each in product(char, repeat=3):
    print(each)
# Output
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'b', 'a')
('a', 'b', 'b')
('b', 'a', 'a')
('b', 'a', 'b')
('b', 'b', 'a')
('b', 'b', 'b')
The permutations() function produces the items of the input iterable arranged in every possible ordering of the given length. By default it produces the full set of all permutations.
from itertools import permutations
for each in permutations('abc'):
    print(each)
print()
for each in permutations('abc', r=2):
    print(each)
Use the r argument to limit the length of the individual permutations returned.
# Output
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')
To limit the output to unique combinations rather than permutations, use combinations(). As long as the members of the input are unique, the output will not include any repeated values.
from itertools import combinations
for each in combinations('abc', r=2):
    print(each)
Unlike permutations(), the r argument to combinations() is required.
# Output
('a', 'b')
('a', 'c')
('b', 'c')
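The difference between the two functions shows up in how many results they produce. The sketch below counts both for a made-up four-item input and compares the counts with math.comb() and math.perm(), which assumes Python 3.8 or later.

```python
from itertools import combinations, permutations
from math import comb, perm

items = 'abcd'

# Order does not matter for combinations, so ('a', 'b') appears once.
n_comb = len(list(combinations(items, 2)))
# Order matters for permutations, so ('a', 'b') and ('b', 'a') both appear.
n_perm = len(list(permutations(items, 2)))

print(n_comb, comb(4, 2))  # 6 6
print(n_perm, perm(4, 2))  # 12 12
```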
combinations() does not repeat individual input elements, but sometimes it is useful to consider combinations that do include repeated elements. For those cases, use combinations_with_replacement().
from itertools import combinations_with_replacement
for each in combinations_with_replacement('abc', r=2):
    print(each)
In this output, each input item is paired with itself as well as with all of the other members of the input sequence.
# Output
('a', 'a')
('a', 'b')
('a', 'c')
('b', 'b')
('b', 'c')
('c', 'c')