How to Work with Files in Python

Python has several built-in modules and functions for handling files. These functions are spread across modules such as os, os.path, shutil, and pathlib. This article covers the most common file operations in Python and the functions that perform them.

In this article, you will learn how to:

Get file properties
Create directories
Match filename patterns
Traverse directory trees
Create temporary files and directories
Delete files and directories
Copy, move, and rename files and directories
Create and extract ZIP and TAR archives
Open multiple files with the fileinput module

Reading and Writing File Data in Python

Reading and writing files with Python is straightforward. To do either, you must first open the file in the appropriate mode. Here is an example that opens a text file and reads its contents.

with open('data.txt', 'r') as f:
    data = f.read()
    print('content: {}'.format(data))

open() takes a filename and a mode as its arguments. The mode r opens the file read-only. To write data to a file, pass w as the mode instead.

with open('data.txt', 'w') as f:
    data = 'some data to be written to the file'
    f.write(data)

In the examples above, open() opens a file for reading or writing and returns a file handle (f in this case) that provides methods for reading or writing file data. Read Working With File I/O in Python to learn more about reading and writing files.
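As a minimal sketch of the two modes working together, the following writes a string and reads it back. The temporary directory and the file name demo.txt are illustrative assumptions that keep the sketch away from real files; they are not part of the original example.

```python
import os
import tempfile

# Hypothetical scratch location so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('some data to be written to the file')

    # 'r' opens the same file read-only.
    with open(path, 'r', encoding='utf-8') as f:
        data = f.read()

print(data)
```

Passing encoding explicitly is a design choice here: it avoids depending on the platform's default text encoding.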

Getting a Directory Listing

Suppose your current working directory has a subdirectory named my_directory with the following contents:

.
├── file1.py
├── file2.csv
├── file3.txt
├── sub_dir
│   ├── bar.py
│   └── foo.py
├── sub_dir_b
│   └── file4.txt
└── sub_dir_c
    ├── config.py
    └── file5.txt

Python's built-in os module has many useful functions for listing directory contents and filtering the results. To get a list of all the files and folders in a particular directory, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. If you also want file and directory properties such as file size and modification date, os.scandir() is the preferred method.

Directory Listing in Legacy Python Versions

import os
entries = os.listdir('my_directory')

os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument.

['file1.py', 'file2.csv', 'file3.txt', 'sub_dir', 'sub_dir_b', 'sub_dir_c']

A directory listing like that is not easy to read. Printing the results of the os.listdir() call in a loop makes them easier to scan.

for entry in entries:
    print(entry)
"""
file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c
"""

Directory Listing in Modern Python Versions

In modern versions of Python, you can use os.scandir() and pathlib.Path as alternatives to os.listdir().

os.scandir() was introduced in Python 3.5 and is documented in PEP 471.

os.scandir() returns an iterator rather than a list.

import os
entries = os.scandir('my_directory')
print(entries)
# <posix.ScandirIterator at 0x105b4d4b0>

The ScandirIterator points to all the entries in the given directory. You can loop over the contents of the iterator and print out the filenames.

import os
with os.scandir('my_directory') as entries:
    for entry in entries:
        print(entry.name)

Here, os.scandir() is used together with the with statement because it supports the context manager protocol. Using a context manager closes the iterator and frees the acquired resources automatically once the iterator has been exhausted. Printing the filenames in my_directory gives the same result as in the os.listdir() example:

file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c

Another way to get a directory listing is to use the pathlib module:

from pathlib import Path
entries = Path('my_directory')
for entry in entries.iterdir():
    print(entry.name)

pathlib.Path() returns either a PosixPath or a WindowsPath object, depending on the operating system.

pathlib.Path() objects have an .iterdir() method that creates an iterator over all the files and folders in a directory. Each entry yielded by .iterdir() contains information about the file or directory, such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a welcome addition to Python that provides an object-oriented interface to the file system.

In the example above, you call pathlib.Path() and pass it a path argument. You then call .iterdir() to get a listing of all the files and directories in my_directory.

pathlib offers a set of classes that cover most common path operations in a simple, object-oriented way. Using pathlib is as efficient as, if not more efficient than, using the functions in os. Another benefit of pathlib over os is that it reduces the number of imports you need in order to manipulate file system paths. To learn more, read Python 3's pathlib Module: Taming the File System.

Running the code above produces the following:

file1.py
file2.csv
file3.txt
sub_dir
sub_dir_b
sub_dir_c

Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way to get a directory listing, especially when you also need file type and file attribute information. pathlib.Path() offers much of the file and path handling functionality found in os and shutil, and its methods are at least as efficient as those modules. Getting file attributes is covered shortly.

| Function | Description |
| --- | --- |
| os.listdir() | Returns a list of all files and folders in a directory |
| os.scandir() | Returns an iterator over all the objects in a directory, including file attribute information |
| pathlib.Path().iterdir() | Returns an iterator over all the objects in a directory, including file attribute information |

These functions return everything in the directory, including subdirectories. That may not always be what you want. The next section shows how to filter the results of a directory listing.
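To see the three functions side by side, here is a small sketch that runs them against the same folder. The temporary directory and the two entry names are assumptions made for illustration, and the results are sorted because none of these functions promises a particular order.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # Build a tiny stand-in for my_directory (names are illustrative).
    base = Path(tmpdir)
    (base / 'file1.py').touch()
    (base / 'sub_dir').mkdir()

    # os.listdir() gives plain name strings.
    from_listdir = sorted(os.listdir(base))

    # os.scandir() gives DirEntry objects; .name holds the bare name.
    with os.scandir(base) as entries:
        from_scandir = sorted(entry.name for entry in entries)

    # Path.iterdir() gives Path objects; .name holds the bare name.
    from_iterdir = sorted(entry.name for entry in base.iterdir())

print(from_listdir)
print(from_listdir == from_scandir == from_iterdir)
```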

List all files in the directory

This section shows how to print the names of the files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and list only the files from a listing produced by os.listdir(), use os.path:

import os
basepath = 'my_directory'
for entry in os.listdir(basepath):
    # Use os.path.isfile() to check whether the path is a regular file
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)

Here, the call to os.listdir() returns a list of everything in the specified path, and os.path.isfile() then filters that list so that only files, not directories, are shown. This produces the following output:

file1.py
file2.csv
file3.txt

A simpler way to list all the files in a directory is to use os.scandir() or pathlib.Path():

import os
basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)

Using os.scandir() looks cleaner and is easier to understand than using os.listdir(). Calling entry.is_file() on each item in the ScandirIterator returns True if the item is a file. The output of the code above is:

file1.py
file3.txt
file2.csv

Next, here is how to list the files in a directory using pathlib.Path():

from pathlib import Path
basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        print(entry.name)

Here, .is_file() is called on each entry yielded by .iterdir(). The output is the same as above:

file1.py
file3.txt
file2.csv

The code above can be made more concise by combining the for loop and the if statement into a single generator expression. Dan Bader has an article on generator expressions that is worth reading.

The modified version looks like this:

from pathlib import Path
basepath = Path('my_directory')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)

This produces the same output as the previous example. As this section shows, filtering files or directories with os.scandir() and pathlib.Path() is more intuitive, and the code looks cleaner, than with os.listdir() combined with os.path.

List subdirectories

To list subdirectories instead of files, use one of the methods below. Here is how to do it with os.listdir() and os.path:

import os
basepath = 'my_directory'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

Manipulating file system paths this way quickly becomes cumbersome when you have multiple calls to os.path.join(). Running this code on my computer produces the following output:

sub_dir
sub_dir_b
sub_dir_c

Here is how to use os.scandir():

import os
basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

As in the file listing example, .is_dir() is called on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True and the directory's name is printed. The output contains the same names as above, possibly in a different order:

sub_dir_c
sub_dir_b
sub_dir

Here is how to use pathlib.Path():

from pathlib import Path
basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)

Calling .is_dir() on each entry of the .iterdir() iterator checks whether the entry is a file or a directory. If the entry is a directory, its name is printed, and the output is the same as in the previous example:

sub_dir_c
sub_dir_b
sub_dir
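The file and subdirectory filters shown so far can be combined into a single pass. The following is a sketch under stated assumptions: split_entries() is a hypothetical helper name, and the scratch directory with its two entries exists only to make the example runnable.

```python
import os
import tempfile


def split_entries(path):
    """Return (files, dirs) name lists for path using one os.scandir() pass."""
    files, dirs = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                dirs.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    # Sort because os.scandir() yields entries in arbitrary order.
    return sorted(files), sorted(dirs)


with tempfile.TemporaryDirectory() as tmpdir:
    # Illustrative contents: one file and one subdirectory.
    open(os.path.join(tmpdir, 'file1.py'), 'w').close()
    os.mkdir(os.path.join(tmpdir, 'sub_dir'))
    files, dirs = split_entries(tmpdir)

print(files, dirs)
```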

Get file properties

Python makes it easy to retrieve file attributes such as file size and modification time. You can do this with os.stat(), os.scandir(), or pathlib.Path.

os.scandir() and pathlib.Path() retrieve a directory listing with file attributes combined. This can be more efficient than using os.listdir() to list the files and then fetching the attribute information for each file.

The following example shows how to get the last modification time of the files in my_directory. The output is a timestamp:

import os
with os.scandir('my_directory') as entries:
    for entry in entries:
        info = entry.stat()
        print(info.st_mtime)
"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""

os.scandir() returns a ScandirIterator object. Each entry in a ScandirIterator has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and the time of last modification. In the example above, the code prints the st_mtime attribute, which is the time the file's content was last modified.
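The same .stat() result also carries the file size in st_size. The sketch below assumes a scratch directory containing a single five-byte file; both the directory and the file name are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Write exactly five bytes so the expected size is unambiguous.
    with open(os.path.join(tmpdir, 'file1.py'), 'wb') as f:
        f.write(b'12345')

    sizes = {}
    with os.scandir(tmpdir) as entries:
        for entry in entries:
            info = entry.stat()
            sizes[entry.name] = info.st_size  # size in bytes

print(sizes)
```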

The pathlib module has corresponding methods for retrieving the same file information:

from pathlib import Path
basepath = Path('my_directory')
for entry in basepath.iterdir():
    info = entry.stat()
    print(info.st_mtime)
"""
1548163662.3952665
1548163689.1982062
1548163697.9175904
1548163721.1841028
1548163740.765162
1548163769.4702623
"""

In the example above, the code loops over the iterator returned by .iterdir() and calls .stat() on each entry to retrieve its attributes. The st_mtime attribute is a floating-point value that represents a timestamp. To make the value of st_mtime easier to read, you can write a helper function that converts it to a datetime object:

import datetime
from pathlib import Path

def timestamp2datetime(timestamp, convert_to_local=True, utc=8, is_remove_ms=True):
    """
    Convert a UNIX timestamp to a datetime object
    :param timestamp: UNIX timestamp
    :param convert_to_local: whether to shift the result to local time
    :param utc: time zone offset in hours; China is UTC+8
    :param is_remove_ms: whether to drop the fractional seconds
    :return: datetime object
    """
    if is_remove_ms:
        timestamp = int(timestamp)
    # Naive UTC datetime (utcfromtimestamp() is deprecated since Python 3.12)
    dt = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc).replace(tzinfo=None)
    if convert_to_local:
        dt = dt + datetime.timedelta(hours=utc)
    return dt

def convert_date(timestamp, format='%Y-%m-%d %H:%M:%S'):
    dt = timestamp2datetime(timestamp)
    return dt.strftime(format)

basepath = Path('my_directory')
for entry in basepath.iterdir():
    if entry.is_file():
        info = entry.stat()
        print('{} Last modified on {}'.format(entry.name, convert_date(info.st_mtime)))

This first gets a list of the files in my_directory along with their attributes, and then calls convert_date() to convert each file's last modification time into a human-readable form. convert_date() uses .strftime() to convert the datetime object into a string.

Output of the code above:

file3.txt Last modified on 2019-01-24 09:04:39
file2.csv Last modified on 2019-01-24 09:04:39
file1.py Last modified on 2019-01-24 09:04:39

The syntax for converting dates and times to strings can be confusing .
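A few format directives cover most needs. The sketch below uses a fixed, made-up datetime so the results are reproducible, and it also shows one possible alternative to adding the UTC offset by hand: passing a fixed-offset time zone to fromtimestamp(). The UTC+8 offset mirrors the helper above and is an assumption, not a requirement.

```python
import datetime

# A fixed, made-up moment so the output is reproducible.
dt = datetime.datetime(2019, 1, 24, 9, 4, 39)

# %Y = 4-digit year, %m = month, %d = day, %H:%M:%S = 24-hour time.
iso_style = dt.strftime('%Y-%m-%d %H:%M:%S')
day_first = dt.strftime('%d/%m/%Y')

print(iso_style)   # 2019-01-24 09:04:39
print(day_first)   # 24/01/2019

# Alternative to manual offset arithmetic: an aware datetime at UTC+8.
tz = datetime.timezone(datetime.timedelta(hours=8))
aware = datetime.datetime.fromtimestamp(0, tz=tz)
aware_text = aware.strftime('%Y-%m-%d %H:%M:%S %z')

print(aware_text)  # 1970-01-01 08:00:00 +0800
```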

Create directory

Sooner or later, your programs will need to create directories in which to store data. Both os and pathlib include functions for creating directories. We will look at the following:

| Function | Description |
| --- | --- |
| os.mkdir() | Creates a single subdirectory |
| os.makedirs() | Creates multiple directories, including intermediate directories |
| pathlib.Path.mkdir() | Creates single or multiple directories |

Create a single directory

To create a single directory, pass the path of the directory as an argument to os.mkdir():

import os
os.mkdir('example_directory')

If the directory already exists, os.mkdir() raises a FileExistsError. Alternatively, you can create a directory using pathlib:

from pathlib import Path
p = Path('example_directory')
p.mkdir()

If the path already exists, mkdir() raises a FileExistsError:

FileExistsError: [Errno 17] File exists: 'example_directory'

To avoid an unhandled error like this, catch the exception when it occurs and let your user know:

from pathlib import Path
p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as e:
    print(e)

Alternatively, you can ignore the FileExistsError by passing exist_ok=True to .mkdir():

from pathlib import Path
p = Path('example_directory')
p.mkdir(exist_ok=True)

This does not raise an error if the directory already exists.

Create multiple directories

os.makedirs() is similar to os.mkdir(). The difference between the two is that os.makedirs() can create not only individual directories but also whole directory trees recursively. In other words, it creates any intermediate folders needed to ensure that the full path exists.

os.makedirs() is similar to running mkdir -p in Bash. For example, to create a group of directories like 2018/10/05, you can do the following:

import os
os.makedirs('2018/10/05', mode=0o770)

The code above creates the 2018/10/05 directory structure and requests read, write, and execute permissions for the owner and group. The default mode is 0o777, which would also grant permissions to other users. Note that the process umask is applied to mode, and since Python 3.7 mode only affects the leaf directory, so intermediate directories may end up with different permissions than shown here.

Run the tree command to confirm that the expected permissions were applied:

$ tree -p -i .
.
[drwxrwx---] 2018
[drwxrwx---] 10
[drwxrwx---] 05

The command above prints the directory tree of the current directory. tree is normally used to list the contents of directories in a tree-like format. Passing the -p and -i arguments prints the directory names and their permission information in a vertical list: -p prints the file permissions, and -i makes tree produce a vertical list without indentation lines.

In this listing, every directory has 770 permissions. Another way to create multiple directories is to use .mkdir() from pathlib.Path:

from pathlib import Path
p = Path('2018/10/05')
p.mkdir(parents=True, exist_ok=True)

Passing parents=True to Path.mkdir() makes it create the directory 05 and any parent directories needed to make the path valid.

By default, os.makedirs() and Path.mkdir() raise an OSError if the target directory already exists. You can override this behavior by passing exist_ok=True as a keyword argument (supported by os.makedirs() since Python 3.2 and by Path.mkdir() since Python 3.5).

Running the code above produces a directory structure like the following:

└── 2018
    └── 10
        └── 05

I prefer pathlib because I can use the same method to create one directory or several.
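The effect of exist_ok can be checked directly. The sketch below assumes a scratch directory so nothing is created in the working directory; the 2018/10/05 path mirrors the example above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / '2018' / '10' / '05'

    # First call creates 2018, 2018/10 and 2018/10/05.
    target.mkdir(parents=True, exist_ok=True)
    # Second call is a no-op because exist_ok=True suppresses the error.
    target.mkdir(parents=True, exist_ok=True)
    created = target.is_dir()

    # Without exist_ok, an existing directory raises FileExistsError.
    try:
        target.mkdir()
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```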

File name pattern matching

After getting a list of the files in a directory using one of the methods above, you will most likely want to search for files that match a particular pattern.

These are the methods and functions available to you:

endswith() and startswith() string methods
fnmatch.fnmatch()
glob.glob()
pathlib.Path.glob()

Each of these is discussed below. The examples in this section are run on a directory called some_directory that has the following structure:

.
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
├── sub_dir
│ ├── file1.py
│ └── file2.py
└── tests.py

If you are using the Bash shell, you can create the directory structure above with the following commands:

mkdir some_directory
cd some_directory
mkdir sub_dir
touch sub_dir/file1.py sub_dir/file2.py
touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py

This creates the some_directory directory and changes into it, then creates sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line creates all the other files using brace expansion.

Using String Methods

Python has several built-in methods for modifying and manipulating strings. Two of them, .startswith() and .endswith(), are useful when matching filenames. To use them, first get a directory listing and then iterate over it:

import os
for f_name in os.listdir('some_directory'):
    if f_name.endswith('.txt'):
        print(f_name)

The code above finds all the entries in some_directory, iterates over them, and uses .endswith() to print the names of the files that have the .txt extension. Running this on my computer produces the following output:

data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt

Simple Filename Pattern Matching Using `fnmatch`

String methods are limited in their matching abilities. fnmatch has more advanced functions for pattern matching. We will look at fnmatch.fnmatch(), a function that supports wildcards such as * and ? for matching filenames. For example, to find all the .txt files in a directory using fnmatch, you would do the following:

import os
import fnmatch
for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, '*.txt'):
        print(f_name)

This iterates over the list of files in some_directory and uses fnmatch.fnmatch() to perform a wildcard search for files with the .txt extension.

More advanced pattern matching

Suppose you want to find .txt files that meet certain criteria. For example, you may only be interested in .txt files whose names contain the word data, a number between a pair of underscores, and the word backup, such as data_01_backup, data_02_backup, or data_03_backup.

You can use fnmatch.fnmatch() like this:

import os
import fnmatch
for f_name in os.listdir('some_directory'):
    if fnmatch.fnmatch(f_name, 'data_*_backup.txt'):
        print(f_name)

This prints only the names of files that match the data_*_backup.txt pattern. The * in the pattern matches any run of characters, so running this code finds all text files whose names start with data_ and end with _backup.txt, as the output below shows:

data_01_backup.txt
data_02_backup.txt
data_03_backup.txt
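When the names are already in a list, fnmatch.filter() applies a pattern to the whole list in one call. The following sketch uses a hard-coded list of names modelled on some_directory, so it runs without touching the file system; the list itself is an assumption for illustration.

```python
import fnmatch

# Names modelled on some_directory; no file system access is needed.
names = [
    'admin.py',
    'data_01.txt',
    'data_01_backup.txt',
    'data_02.txt',
    'data_02_backup.txt',
    'tests.py',
]

# * matches any run of characters.
backups = fnmatch.filter(names, 'data_*_backup.txt')

# [0-9] matches exactly one digit.
numbered = fnmatch.filter(names, 'data_0[0-9].txt')

print(backups)
print(numbered)
```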

Filename Pattern Matching Using `glob`

Another useful module for pattern matching is glob.

glob.glob() in the glob module works much like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special.

UNIX and related systems translate name patterns containing wildcards such as ? and * into a list of matching files. This is called globbing.

For example, typing mv *.py python_files/ in a UNIX shell moves all files with the .py extension from the current directory to the directory python_files. The * character is a wildcard that means any number of characters, and *.py is the glob pattern. This shell capability is not available in the Windows operating system. The glob module adds this capability in Python, which lets Windows programs use the feature.

Here is an example that uses the glob module to find all Python source files in the current directory:

import glob
print(glob.glob('*.py'))

glob.glob('*.py') searches the current directory for all files with the .py extension and returns them as a list. glob also supports shell-style wildcards to match patterns:

import glob
for name in glob.glob('*[0-9]*.txt'):
    print(name)

This finds all text files (.txt) that contain digits in the filename:

data_01.txt
data_01_backup.txt
data_02.txt
data_02_backup.txt
data_03.txt
data_03_backup.txt

glob also makes it easy to search for files recursively in subdirectories:

import glob
for name in glob.iglob('**/*.py', recursive=True):
    print(name)

This example uses glob.iglob() to search for .py files in the current directory and its subdirectories. Passing recursive=True as an argument to .iglob() makes the ** pattern match files in the current directory and any subdirectories. The difference between glob.glob() and glob.iglob() is that iglob() returns an iterator instead of a list.

Running the code above produces the following:

admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py

pathlib contains similar methods for flexible file listings. The example below shows how you can use Path.glob() to list files whose extension starts with the letter p.

from pathlib import Path
p = Path('.')
for name in p.glob('*.p*'):
    print(name)

Calling p.glob('*.p*') returns a generator object that points to all the files in the current directory whose extension starts with the letter p.

Path.glob() is similar to glob.glob() discussed above. As you can see, pathlib combines many of the best features of the os, os.path, and glob modules into a single module, which makes it a pleasure to use.
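pathlib also offers Path.rglob(), which behaves like Path.glob() with '**/' prepended to the pattern. The sketch below compares the two on a scratch tree; the directory and file names are assumptions modelled on some_directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # Illustrative tree: two top-level .py files, one .txt, one nested .py.
    base = Path(tmpdir)
    (base / 'sub_dir').mkdir()
    for rel in ('admin.py', 'tests.py', 'data_01.txt', 'sub_dir/file1.py'):
        (base / rel).touch()

    # glob() only looks at the top level for this pattern.
    top_level = sorted(p.name for p in base.glob('*.py'))

    # rglob() descends into subdirectories as well.
    recursive = sorted(p.relative_to(base).as_posix() for p in base.rglob('*.py'))

print(top_level)
print(recursive)
```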

To recap, here is a table of the functions covered in this section:

| Function | Description |
| --- | --- |
| startswith() | Tests whether a string starts with a specified pattern and returns True or False |
| endswith() | Tests whether a string ends with a specified pattern and returns True or False |
| fnmatch.fnmatch(filename, pattern) | Tests whether the filename matches the pattern and returns True or False |
| glob.glob() | Returns a list of filenames that match a pattern |
| pathlib.Path.glob() | Returns a generator object that yields paths matching a pattern |

Traverse directories and process files

A common programming task is walking a directory tree and processing the files in it. Let's explore how the built-in Python function os.walk() can be used to do this. os.walk() generates the filenames in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we will work with the following directory tree:

├── folder_1
│ ├── file1.py
│ ├── file2.py
│ └── file3.py
├── folder_2
│ ├── file4.py
│ ├── file5.py
│ └── file6.py
├── test1.txt
└── test2.txt

Here is an example that shows how to list all the files and directories in a directory tree using os.walk().

By default, os.walk() traverses directories top-down:

import os
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

os.walk() yields three values on each iteration of the loop:

The name of the current folder
List of subfolders in the current folder
List of files in the current folder

On each iteration, the code prints the name of the current directory and the files it contains:

Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py

To traverse the directory tree bottom-up, pass the topdown=False keyword argument to os.walk():

import os
for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

Passing topdown=False makes os.walk() print the files it finds in the subdirectories first:

Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
Found directory: .
test1.txt
test2.txt

As you can see, the program lists the contents of the subdirectories before the contents of the root directory. This is useful when you want to delete files and directories recursively, which is covered in the sections below. By default, os.walk() does not walk down into symbolic links that resolve to directories. Pass followlinks=True to override this behavior.
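Processing files during a walk usually means doing something with each name in files. As a sketch, the hypothetical helper count_files() below counts files by suffix across a whole tree; the helper name and the scratch tree are assumptions for illustration.

```python
import os
import tempfile


def count_files(top, suffix):
    """Count files under top (recursively) whose names end with suffix."""
    total = 0
    for dirpath, dirnames, files in os.walk(top):
        total += sum(1 for name in files if name.endswith(suffix))
    return total


with tempfile.TemporaryDirectory() as tmpdir:
    # Illustrative tree: one .txt at the top, two .py files in folder_1.
    os.mkdir(os.path.join(tmpdir, 'folder_1'))
    for rel in ('test1.txt',
                os.path.join('folder_1', 'file1.py'),
                os.path.join('folder_1', 'file2.py')):
        open(os.path.join(tmpdir, rel), 'w').close()

    py_count = count_files(tmpdir, '.py')
    txt_count = count_files(tmpdir, '.txt')

print(py_count, txt_count)
```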

Create temporary files and directories

Python provides the tempfile module for conveniently creating temporary files and directories.

tempfile can be used to open and store data temporarily in a file or directory while your program is running. tempfile handles the deletion of the temporary files once your program is done with them.

Here is how to create a temporary file:

from tempfile import TemporaryFile
# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello World!')
# Go back to the beginning and read the data from the file
fp.seek(0)
data = fp.read()
print(data)
# Close the file, after which it is deleted
fp.close()

The first step is to import TemporaryFile from the tempfile module. Next, a file-like object is created by calling TemporaryFile() with the mode in which you want to open the file. This creates and opens a file that can be used as a temporary storage area.

In the example above, the mode is w+t, which makes tempfile create a temporary text file opened for both writing and reading. It is not necessary to give the temporary file a name, since it is destroyed after the script finishes running.

After writing to the file, you can read from it and close it when you are done processing it. Once the file is closed, it is removed from the file system. If you need to name the temporary files produced with tempfile, use tempfile.NamedTemporaryFile().
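As a brief sketch of the named variant, the following shows that a NamedTemporaryFile exposes its path through the .name attribute and, with the default settings, is still removed when it is closed. The variable names are illustrative.

```python
import os
from tempfile import NamedTemporaryFile

with NamedTemporaryFile('w+t') as fp:
    name = fp.name  # full path of the temporary file
    fp.write('Hello World!')
    fp.seek(0)
    data = fp.read()
    existed_while_open = os.path.exists(name)

# With the default delete=True, closing the file removes it.
exists_after_close = os.path.exists(name)

print(data, existed_while_open, exists_after_close)
```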

Temporary files and directories created with tempfile are stored in a special system directory for temporary files. Python searches a standard list of directories to find one in which the user can create files.

On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. As a last resort, tempfile saves temporary files and directories in the current directory.
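To find out which directory was actually chosen on a given machine, you can ask tempfile directly. This is a small sketch; the printed path varies by system, so no expected output is shown.

```python
import os
import tempfile

# gettempdir() returns the directory tempfile selected after its search.
tmp_root = tempfile.gettempdir()

print(tmp_root)
print(os.path.isdir(tmp_root))
```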

TemporaryFile() is also a context manager, so it can be used with the with statement. Using a context manager takes care of closing and deleting the file automatically after it has been read:

from tempfile import TemporaryFile

with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()
# The temporary file is now closed and deleted

This creates a temporary file and reads data from it. As soon as the with block ends, the temporary file is closed and removed from the file system.

tempfile can also be used to create temporary directories. Here is how to do that with tempfile.TemporaryDirectory():

import tempfile
import os
tmp = ''
with tempfile.TemporaryDirectory() as tmpdir:
    print('Created temporary directory ', tmpdir)
    tmp = tmpdir
    print(os.path.exists(tmpdir))
print(tmp)
print(os.path.exists(tmp))

Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing that directory. In the example above, the directory is created using a context manager, and its name is stored in tmpdir. The first print() call outputs the name of the temporary directory, and os.path.exists(tmpdir) confirms that the directory was actually created in the file system.

After the context manager exits, the temporary directory is deleted, and the call to os.path.exists(tmp) returns False, which means the directory was removed successfully.

Delete files and directories

You can delete single files, directories, and entire directory trees using functions in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.

Deleting Files in Python

To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().

os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:

import os
data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)

Deleting a file using os.unlink() works the same way as with os.remove():

import os
data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.unlink(data_file)

Calling .unlink() or .remove() on a file deletes the file from the file system. Both functions raise an OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you are about to delete is actually a file and delete it only if it is, or use exception handling to handle the OSError:

import os
data_file = 'home/data.txt'
# Delete the path only if it is a file
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')

os.path.isfile() checks whether data_file is actually a file. If it is, the file is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.

The following example shows how to use exception handling to handle errors when deleting files ：

import os
data_file = 'home/data.txt'
# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')

The code above attempts to delete the file first, without checking its type. If data_file is not actually a file, the OSError that is raised is handled in the except clause, and an error message is printed to the console. The error message is formatted using a Python f-string.

Finally, you can also use pathlib.Path.unlink() to delete files:

from pathlib import Path
data_file = Path('home/data.txt')
try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')

This creates a Path object called data_file that points to a file. Calling .unlink() on data_file deletes home/data.txt. If data_file points to a directory, an IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it. If the user does not have permission to delete the file, a PermissionError is raised.
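A related case is a file that does not exist at all, which raises FileNotFoundError. Since Python 3.8, Path.unlink() accepts missing_ok=True to suppress that error. The sketch below assumes a scratch directory and an illustrative file name.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    data_file = Path(tmpdir) / 'data.txt'
    data_file.write_text('temporary')

    data_file.unlink()
    gone = not data_file.exists()

    # Python 3.8+: no error even though the file is already gone.
    data_file.unlink(missing_ok=True)

    # Without missing_ok, a missing file raises FileNotFoundError.
    try:
        data_file.unlink()
        raised = False
    except FileNotFoundError:
        raised = True

print(gone, raised)
```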

Delete directory

The standard library provides the following functions to delete directories :

os.rmdir()
pathlib.Path.rmdir()
shutil.rmtree()

To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These two functions only work if the directory you are trying to delete is empty. If the directory is not empty, an OSError is raised. Here is how to delete a folder:

import os
trash_dir = 'my_documents/bad_dir'
try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

If trash_dir is empty, os.rmdir() deletes it. If the directory is not empty, the except clause prints an error message. Without the try/except, the uncaught exception would look like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'

Alternatively, you can use pathlib to delete a directory:

from pathlib import Path
trash_dir = Path('my_documents/bad_dir')
try:
trash_dir.rmdir()
except OSError as e:
print(f'Error: {trash_dir} : {e.strerror}')
Copy code

We created one here Path Object points to the directory to be deleted . If the directory is empty , call Path Object's .rmdir() Method to delete it .

Delete entire directory trees

To delete a non-empty directory and its entire tree, Python provides shutil.rmtree():

import shutil
trash_dir = 'my_documents/bad_dir'
try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

When shutil.rmtree() is called, everything in trash_dir is deleted. In some cases, you may want to delete empty folders recursively. You can do this by combining one of the methods discussed above with os.walk():

import os
for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError as ex:
        pass

This walks the directory tree and tries to delete every directory it finds. If a directory is not empty, OSError is raised and the directory is skipped. The following table lists the functions covered in this section:

| Function | Description |
| --- | --- |
| os.remove() | Deletes a single file; cannot delete directories |
| os.unlink() | Same as os.remove(); deletes a single file |
| pathlib.Path.unlink() | Deletes a single file; cannot delete directories |
| os.rmdir() | Deletes an empty directory |
| pathlib.Path.rmdir() | Deletes an empty directory |
| shutil.rmtree() | Deletes an entire directory tree; can be used to delete non-empty directories |
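
The recursive cleanup of empty folders described in this section can be sketched as a reusable function. The name remove_empty_dirs is hypothetical, and the demo builds a throwaway tree in a temporary directory rather than touching real data.

```python
import os
import tempfile


def remove_empty_dirs(root):
    """Remove empty directories below root, deepest first; return removed paths."""
    removed = []
    # topdown=False yields children before parents, so a directory that
    # only contained empty directories is itself empty when we reach it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # keep the starting directory itself
        try:
            os.rmdir(dirpath)
            removed.append(dirpath)
        except OSError:
            pass  # not empty (or not removable): skip it
    return removed


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'empty_a', 'empty_b'))
    os.makedirs(os.path.join(tmp, 'keep'))
    with open(os.path.join(tmp, 'keep', 'file.txt'), 'w') as f:
        f.write('data')
    removed = sorted(os.path.relpath(p, tmp) for p in remove_empty_dirs(tmp))
    remaining = sorted(os.listdir(tmp))
```

Walking bottom-up is the design choice that matters here: a top-down walk would try to remove empty_a while empty_b still exists inside it.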

Copy, move, and rename files and directories

Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level file operations that support copying, archiving, and removing files and directories. In this section, you will learn how to move and copy files and directories.

Copy file

shutil provides several functions for copying files. The most commonly used are shutil.copy() and shutil.copy2(). To copy a file from one location to another with shutil.copy(), do the following:

import shutil
src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)

shutil.copy() is comparable to the cp command on UNIX-based systems. shutil.copy(src, dst) copies the file src to the location specified in dst. If dst is a file, its contents are replaced with the contents of src. If dst is a directory, src is copied into that directory. shutil.copy() copies only the file's contents and its permissions. Other metadata, such as the file's creation and modification times, is not preserved.

To preserve as much file metadata as possible when copying, use shutil.copy2():

import shutil
src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)

Using .copy2() preserves details about the file such as the last access time, permission bits, last modification time, and flags.
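
To make the difference between the two functions concrete, the following sketch backdates a scratch file and then checks which copy keeps the old modification time. The file names and the timestamp are arbitrary choices for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'file.txt')
    with open(src, 'w') as f:
        f.write('hello')
    # Backdate the source so the difference is easy to see.
    old_time = 1_000_000_000  # an arbitrary timestamp from 2001
    os.utime(src, (old_time, old_time))

    plain = shutil.copy(src, os.path.join(tmp, 'plain.txt'))
    full = shutil.copy2(src, os.path.join(tmp, 'full.txt'))

    # copy() gives the new file a fresh modification time...
    copy_keeps_mtime = int(os.path.getmtime(plain)) == old_time
    # ...while copy2() carries the original one over.
    copy2_keeps_mtime = int(os.path.getmtime(full)) == old_time
    with open(full) as f:
        copied_text = f.read()
```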

Copy directories

While shutil.copy() copies only a single file, shutil.copytree() copies an entire directory and everything in it. shutil.copytree(src, dst) takes two arguments: the source directory and the destination directory to which the files and folders are copied.

Here is an example of how to copy the contents of a folder to another location ：

import shutil
dst = shutil.copytree('data_1', 'data1_backup')
print(dst) # data1_backup

In this example, .copytree() copies the contents of data_1 to a new location, data1_backup, and returns the destination directory. By default, the destination directory must not already exist (since Python 3.8, passing dirs_exist_ok=True relaxes this). It is created along with any missing parent directories. shutil.copytree() is a good way to back up files.
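
A slightly fuller sketch of a backup with shutil.copytree() follows. It uses the ignore argument together with shutil.ignore_patterns() to skip scratch files, and it shows the error raised when the destination already exists. The directory layout and the '*.tmp' pattern are assumptions made for the demo.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data_1')
    os.makedirs(os.path.join(src, 'sub_dir'))
    for name in ('a.txt', 'b.tmp', os.path.join('sub_dir', 'c.txt')):
        with open(os.path.join(src, name), 'w') as f:
            f.write(name)

    # ignore_patterns() builds a filter that skips matching names.
    dst = shutil.copytree(
        src,
        os.path.join(tmp, 'backups', 'data1_backup'),  # parent is created too
        ignore=shutil.ignore_patterns('*.tmp'),
    )
    copied = sorted(os.listdir(dst))

    # Copying onto an existing destination fails by default.
    try:
        shutil.copytree(src, dst)
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True
```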

Move files and directories

To move a file or directory to another location, use shutil.move(src, dst).

src is the file or directory to move, and dst is the destination:

import shutil
dst = shutil.move('dir_1/', 'backup/')
print(dst)  # the path the directory was moved to

If backup/ exists, shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/. If backup/ does not exist, dir_1/ is renamed to backup.
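
Both outcomes can be observed in a short sketch. The directory names mirror the example above, with a second directory, dir_2, added as an assumption so that both cases can be shown in one run.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    dir_1 = os.path.join(tmp, 'dir_1')
    dir_2 = os.path.join(tmp, 'dir_2')
    backup = os.path.join(tmp, 'backup')
    os.mkdir(dir_1)
    os.mkdir(dir_2)

    # backup does not exist yet: dir_1 is renamed to backup.
    first = shutil.move(dir_1, backup)
    # backup now exists: dir_2 is moved inside it.
    second = shutil.move(dir_2, backup)

    first_rel = os.path.relpath(first, tmp)
    second_rel = os.path.relpath(second, tmp)
    backup_contents = os.listdir(backup)
```

The return value of shutil.move() is the final location, which is why it differs between the two calls even though the same destination was passed.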

Rename files and directories

Python provides os.rename(src, dst) for renaming files and directories:

import os
os.rename('first.zip', 'first_01.zip')

The line above renames first.zip to first_01.zip. If the destination path points to a directory, OSError is raised.

Another way to rename a file or directory is to use rename() from the pathlib module:

from pathlib import Path
data_file = Path('data_01.txt')
data_file.rename('data.txt')

To rename a file with pathlib, first create a pathlib.Path() object containing the path of the file you want to rename. Then call .rename() on the Path object and pass in the new name for the file or directory.
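
One detail worth a sketch: a relative target passed to .rename() is resolved against the current working directory, not against the directory of the Path object. Building the target with .with_name() keeps the file where it is. The file names below follow the example above; the temporary directory is only there to make the demo self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / 'data_01.txt'
    data_file.write_text('payload')

    # with_name() builds the sibling path, so the file stays in its directory.
    target = data_file.with_name('data.txt')
    data_file.rename(target)

    old_exists = data_file.exists()
    new_exists = target.exists()
    new_text = target.read_text()
```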

Archiving

Archiving is a convenient way to package multiple files into one. The two most common archive types are ZIP and TAR. The Python programs you write can create archives and read and extract data from them. In this section, you will learn how to read and write both archive formats.

Read ZIP file

zipfile is a low-level module that is part of the Python standard library. It provides functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, first create a ZipFile object. ZipFile objects are similar to file objects created with open(). ZipFile is also a context manager and therefore supports the with statement:

import zipfile
with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass

This creates a ZipFile object by passing in the name of the ZIP file and opening it in read mode. Once the ZIP file is open, information about the archive can be accessed through functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data, which contains a total of 5 files and 1 subdirectory:

.
|
├── sub_dir/
| ├── bar.py
| └── foo.py
|
├── file1.py
├── file2.py
└── file3.py

To get a list of the files in an archive, call namelist() on the ZipFile object:

import zipfile
with zipfile.ZipFile('data.zip', 'r') as zipobj:
    print(zipobj.namelist())

This generates a list of files :

['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']

.namelist() returns a list of the names of the files and directories in the archive. To retrieve information about a file in the archive, use .getinfo():

import zipfile
with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    print(bar_info.file_size)

This outputs:

15277

.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, pass its path as an argument to .getinfo(). With .getinfo(), you can retrieve details about archive members such as the date the file was last modified, its compressed size, and its full filename. Accessing .file_size retrieves the file's original size in bytes.

The following example shows how to retrieve more details about archived files in the Python REPL. It assumes that the zipfile module has been imported and that bar_info is the same object created in the previous example:

>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'

bar_info contains details about bar.py, such as its compressed size and its full path.

The first line shows how to retrieve a file's last modified date. The next line shows how to get the size of the file after compression. The last line shows the full path to bar.py in the archive.

ZipFile supports the context manager protocol, which is why you can use it with the with statement. The ZipFile object is closed automatically once the operations are complete. Trying to open or extract files from a closed ZipFile object results in an error.
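
The points above can be tried out without an existing data.zip. The sketch below builds a small stand-in archive in a temporary directory (the member names and contents are made up for the demo), collects the original size of every member with .infolist(), and then shows that reading from the closed ZipFile fails.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'data.zip')
    # Build a small stand-in archive so the example is self-contained.
    with zipfile.ZipFile(archive, 'w') as zipobj:
        zipobj.writestr('file1.py', 'print("one")\n')
        zipobj.writestr('sub_dir/bar.py', 'x = 1\n' * 100)

    with zipfile.ZipFile(archive, 'r') as zipobj:
        # infolist() returns one ZipInfo per member, in archive order.
        sizes = {info.filename: info.file_size for info in zipobj.infolist()}

    # The ZipFile is closed here; reading from it raises ValueError.
    try:
        zipobj.read('file1.py')
        closed_error = False
    except ValueError:
        closed_error = True
```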

Extract ZIP files

The zipfile module lets you extract one or more files from a ZIP archive through .extract() and .extractall().

By default, these methods extract files to the current directory. Both take an optional path parameter that lets you specify a different directory to extract to. If the directory does not exist, it is created automatically. To extract files from an archive, do the following:

>>> import zipfile
>>> import os
>>> os.listdir('.')
['data.zip']
>>> data_zip = zipfile.ZipFile('data.zip', 'r')
>>> # Extract a single file to the current directory
>>> data_zip.extract('file1.py')
'/home/test/dir1/zip_extract/file1.py'
>>> os.listdir('.')
['file1.py', 'data.zip']
>>> # Extract all files to the specified directory
>>> data_zip.extractall(path='extract_dir/')
>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']
>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']
>>> data_zip.close()

The third line of code is a call to os.listdir(), which shows that the current directory contains only one file, data.zip.

Next, data.zip is opened in read mode and .extract() is called to extract file1.py from it. .extract() returns the full path of the extracted file. Because no path was specified, .extract() extracts file1.py to the current directory.

The next line prints a directory listing, showing that the current directory now contains the extracted file in addition to the original archive. The lines after that show how to extract the entire archive to a specified directory. .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.
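
Between extracting one file and extracting everything, .extractall() also accepts a members argument that limits extraction to a chosen subset. The following sketch extracts only the .py files; the archive contents are placeholders created for the demo.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'data.zip')
    with zipfile.ZipFile(archive, 'w') as zipobj:
        for name in ('file1.py', 'notes.txt', 'sub_dir/bar.py'):
            zipobj.writestr(name, 'placeholder\n')

    extract_dir = os.path.join(tmp, 'extract_dir')
    with zipfile.ZipFile(archive, 'r') as zipobj:
        # Keep only the members whose names end in .py.
        py_members = [n for n in zipobj.namelist() if n.endswith('.py')]
        zipobj.extractall(path=extract_dir, members=py_members)

    # Collect what actually landed on disk, relative to extract_dir.
    extracted = sorted(
        os.path.relpath(os.path.join(dirpath, name), extract_dir)
        for dirpath, _, names in os.walk(extract_dir)
        for name in names
    )
```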

Extract data from password-protected archives

zipfile supports extracting password-protected ZIP archives. To extract one, pass the password as an argument to the .extract() or .extractall() method:

>>> import zipfile
>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract from the password-protected archive.
...     # The password below is a placeholder; pwd must be a bytes object.
...     pwd_zip.extractall(path='extract_dir', pwd=b'archive-password')

This opens the secret.zip archive in read mode. The password is supplied to .extractall(), and the archive's contents are extracted to extract_dir. Thanks to the with statement, the archive is closed automatically after extraction.

Create a new archive

To create a new ZIP archive, open a ZipFile object in write mode (w) and add files to it:

>>> import zipfile
>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)

In this example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement finishes, new_zip is closed. Opening a ZIP file in write mode erases the archive's contents and creates a new archive.

To add files to an existing archive, open a ZipFile object in append mode and then add the files:

>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')

This opens the new.zip archive in append mode. Opening a ZipFile object in append mode lets you add new files to the ZIP file without deleting its current contents. After the files are added, the with statement goes out of context and closes the ZIP file.
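
Two optional arguments are often useful when writing archives, shown here as a sketch: compression=zipfile.ZIP_DEFLATED compresses members (by default ZipFile stores them uncompressed), and arcname controls the name a file gets inside the archive. The file name and the 'docs/' prefix are arbitrary choices for the demo.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'latin.txt')
    with open(source, 'w') as f:
        f.write('lorem ipsum ' * 500)

    archive = os.path.join(tmp, 'new.zip')
    # ZIP_DEFLATED needs the zlib module, which most Python builds include.
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:
        # arcname sets the name stored in the archive (no temp-dir prefix).
        new_zip.write(source, arcname='docs/latin.txt')

    with zipfile.ZipFile(archive, 'r') as new_zip:
        info = new_zip.getinfo('docs/latin.txt')
        names = new_zip.namelist()
        original_size = info.file_size
        is_smaller = info.compress_size < info.file_size
```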

Open TAR archives

TAR files are file archives like ZIP files, but a plain TAR file is uncompressed. TAR files can be compressed using the gzip, bzip2, and lzma compression methods. The TarFile class allows reading and writing of TAR archives.

To read from an archive, do this:

import tarfile
with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())

tarfile objects open like most file-like objects. The module has an open() function that takes a mode determining how the file is opened.

Use the 'r', 'w', or 'a' mode to open an uncompressed TAR file for reading, writing, or appending, respectively. To open a compressed TAR file, pass a mode argument to tarfile.open() in the form filemode[:compression]. The following table lists the possible modes for opening TAR files:

| Mode | Behavior |
| --- | --- |
| r | Opens the archive for reading (compression, if any, is detected transparently) |
| r:gz | Opens the archive for reading with gzip compression |
| r:bz2 | Opens the archive for reading with bzip2 compression |
| w | Opens the archive for uncompressed writing |
| w:gz | Opens the archive for gzip-compressed writing |
| w:xz | Opens the archive for lzma-compressed writing |
| a | Opens the archive for appending without compression |

.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of its files, use .getnames():

>>> import tarfile
>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']

This returns the names of the archive's contents as a list.

Note: To show how the different tarfile object methods work, the TAR file in the examples is opened and closed manually in an interactive REPL session. Interacting with the TAR file this way lets you see the output of each command. Normally, you would use a context manager to open file-like objects.

In addition, the metadata of each entry in the archive can be accessed through special attributes:

>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
 Modified: Sat Nov  1 09:09:51 2018
 Size : 402 bytes

README.md
 Modified: Sat Nov  3 07:29:40 2018
 Size : 5426 bytes

app.py
 Modified: Sat Nov  3 07:29:13 2018
 Size : 6218 bytes

This example loops over the list of members returned by .getmembers() and prints the attributes of each one. The objects returned by .getmembers() have attributes that can be accessed programmatically, such as the name, size, and last modified time of each file in the archive. After reading from or writing to the archive, it must be closed to free up system resources.

Extract files from TAR archives

In this section, you will learn how to extract files from TAR archives using the following methods:

.extract()
.extractfile()
.extractall()

To extract a single file from a TAR archive, use extract() and pass in the filename:

>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']

The README.md file is extracted from the archive to the file system. Calling os.listdir() confirms that README.md was extracted to the current directory. To extract everything from the archive, use .extractall():

>>> tar.extractall(path="extracted/")

.extractall() has an optional path argument that specifies where the extracted files should go. Here, the archive is extracted into the extracted directory. The following commands show that the archive was extracted successfully:

$ ls
example.tar extracted README.md
$ tree
.
├── example.tar
├── extracted
| ├── app.py
| ├── CONTRIBUTING.rst
| └── README.md
└── README.md
1 directory, 5 files
$ ls extracted/
app.py CONTRIBUTING.rst README.md

To extract a file object for reading, use .extractfile(), which takes a filename or a TarInfo object as its argument. .extractfile() returns a file-like object that can be read and used:

>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()

Open archives should always be closed after they have been read or written to. To close an archive, call .close() on the archive file handle, or use the with statement when creating tarfile objects so that the archive is closed automatically when you are done. This frees up system resources and writes any changes you made to the archive to the file system.
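
Putting the last two points together, here is a sketch that reads a member through .extractfile() inside a with statement, so nothing is written to disk and the archive is closed automatically. The archive is built in a temporary directory from an in-memory payload; the member name and its content are made up for the demo.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, 'example.tar')
    payload = b'print("hello from app.py")\n'

    with tarfile.open(archive, mode='w') as tar:
        # Add a member from memory: TarInfo describes it, BytesIO supplies it.
        info = tarfile.TarInfo(name='app.py')
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    with tarfile.open(archive, mode='r') as tar:
        member = tar.extractfile('app.py')  # file-like object, read-only
        content = member.read()
        names = tar.getnames()
    files_on_disk = os.listdir(tmp)  # nothing was extracted to disk
```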

Create a new TAR archive

To create a new TAR archive, you can do this:

>>> import tarfile
>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)
...
>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
...
app.py
config.py
CONTRIBUTORS.md
tests.py

First, you create a list of the files to add to the archive so that you don't have to add each file manually.

The next line uses the with context manager to open a new archive named packages.tar in write mode. Opening an archive in write mode ('w') lets you write new files to it. Any existing files in the archive are deleted and a new archive is created.

After the archive is created and populated, the with context manager automatically closes it and saves it to the file system. The last three lines open the archive you just created and print the names of the files it contains.

To add new files to an existing archive, open it in append mode ('a'):

>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')
...
>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
...
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar

Opening an archive in append mode lets you add new files to it without deleting the existing ones.

Work with compressed archives

tarfile can also read and write TAR archives compressed with gzip, bzip2, and lzma. To read or write a compressed archive, use tarfile.open() and pass in the appropriate mode for the compression type.

For example, to read or write data in a gzip-compressed TAR archive, use the 'r:gz' or 'w:gz' mode, respectively:

>>> import tarfile
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     for name in files:
...         tar.add(name)
...
>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
...
app.py
config.py
tests.py

The 'w:gz' mode opens a gzip-compressed archive for writing, and 'r:gz' opens one for reading. Compressed archives cannot be opened in append mode. To add files to a compressed archive, you must create a new archive.
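
The following sketch runs the full round trip in a temporary directory and also demonstrates the append restriction: requesting the 'a:gz' mode raises ValueError. The file name and its content are placeholders for the demo.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'app.py')
    with open(source, 'w') as f:
        f.write('print("app")\n')

    archive = os.path.join(tmp, 'packages.tar.gz')
    with tarfile.open(archive, mode='w:gz') as tar:
        tar.add(source, arcname='app.py')

    # Plain 'r' (the same as 'r:*') detects the compression automatically.
    with tarfile.open(archive, mode='r') as tar:
        names = tar.getnames()

    # There is no append mode for compressed archives.
    try:
        tarfile.open(archive, mode='a:gz')
        append_failed = False
    except ValueError:
        append_failed = True
```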

An easier way to create an archive

The Python standard library also supports creating TAR and ZIP archives using high-level functions in the shutil module. The archiving utilities in shutil let you create, read, and extract ZIP and TAR archives. These utilities rely on the lower-level tarfile and zipfile modules.

Use shutil.make_archive() Create Archive

shutil.make_archive() takes at least two arguments: the name of the archive and the archive format.

By default, it compresses all the files in the current directory into the archive format specified in the format argument. You can pass in an optional root_dir argument to compress files in a different directory. .make_archive() supports the zip, tar, bztar, and gztar archive formats.

Here is how to create a TAR archive using shutil:

import shutil
# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')

This archives everything in data/, creates an archive named backup.tar in the file system, and returns its name. Because base_name is 'data/backup', the archive is written to data/backup.tar. To extract an archive, call .unpack_archive():

shutil.unpack_archive('data/backup.tar', 'extract_dir/')

Calling .unpack_archive() with the archive name and a destination directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted in the same way.
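
As a sketch of the same round trip with a ZIP archive, the example below archives a small directory and unpacks it again, all inside a temporary directory. The directory layout is an assumption made for the demo, and the archive is deliberately written outside the directory being archived.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, 'data')
    os.makedirs(os.path.join(data_dir, 'sub_dir'))
    with open(os.path.join(data_dir, 'sub_dir', 'foo.py'), 'w') as f:
        f.write('x = 1\n')

    # Write the archive outside data/ so it does not end up inside itself.
    archive = shutil.make_archive(os.path.join(tmp, 'backup'), 'zip', data_dir)

    extract_dir = os.path.join(tmp, 'extract_dir')
    shutil.unpack_archive(archive, extract_dir)

    archive_name = os.path.basename(archive)
    restored = os.path.exists(os.path.join(extract_dir, 'sub_dir', 'foo.py'))
```

make_archive() appends the extension for the chosen format itself, which is why base_name is given without one.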

Read multiple files

Python supports reading data from multiple input streams or from a list of files through the fileinput module. This module lets you loop over the contents of one or more text files quickly and easily. Here is the typical way to use fileinput:

import fileinput
for line in fileinput.input():
    process(line)  # process() stands for your own line-handling function

By default, fileinput takes its input from the command-line arguments passed in sys.argv, falling back to standard input if none are given.

Use fileinput Loop through multiple files

Let's use fileinput to build a crude version of the common UNIX utility cat. The cat utility reads files sequentially and writes them to standard output. When given more than one file as command-line arguments, cat concatenates the text files and displays the result in the terminal:

# File: fileinput-example.py
import fileinput
import sys
files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()

With two text files in the current directory, running this command produces the following output:

$ python3 fileinput-example.py bacon.txt cupcake.txt
--- Reading bacon.txt ---
-> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
-> irure cillum drumstick elit.
-> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
-> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
-> Tri-tip doner kevin cillum ham veniam cow hamburger.
-> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
-> Ball tip dolor do magna laboris nisi pancetta nostrud doner.
--- Reading cupcake.txt ---
-> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
-> Topping muffin cotton candy.
-> Gummies macaroon jujubes jelly beans marzipan.

fileinput lets you retrieve more information about each line, such as whether it is the first line of a file (.isfirstline()), the line number (.lineno()), and the filename (.filename()).
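
These helpers can also be used without command-line arguments by passing a list of paths through the files parameter. The sketch below records, for every line, the file name, the cumulative line number, and the per-file line number from .filelineno(). The two files and their contents are made up for the demo.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in (('bacon.txt', 'one\ntwo\n'), ('cupcake.txt', 'three\n')):
        path = os.path.join(tmp, name)
        with open(path, 'w') as f:
            f.write(text)
        paths.append(path)

    records = []
    # Passing files= explicitly bypasses sys.argv; the context manager
    # closes the input when the loop is done.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            records.append((
                os.path.basename(fileinput.filename()),
                fileinput.lineno(),      # cumulative across all files
                fileinput.filelineno(),  # restarts at 1 for each file
                line.rstrip('\n'),
            ))
```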

Summary

You now know how to use Python to perform the most common operations on files and groups of files. You have learned how to use the different built-in modules to read, find, and manipulate them.