程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Python processing Excel

編輯:Python

Python Handle excel

  • One 、 install
  • Two 、 Module import
  • 3、 ... and 、 Read excel file
    • get files
    • Get worksheet
  • Four 、 establish excel file
  • 5、 ... and 、 Save the file
  • 6、 ... and 、 Get cell value
  • 7、 ... and 、 Write data
  • 8、 ... and 、 Style adjustment
  • Nine 、 Insert a picture

You may face this problem :

Piles of scientific experimental data need to be imported Excel Analyze
Piles of sales data tables are waiting for statistics
Piles of commodity data await analysis
As a person whose life is short Python The programmer , How to operate gracefully Excel? Actually Python Operations provided Excel The library of 7 There are so many , Which is better and more convenient to use ? First of all, let's grasp the characteristics of different libraries as a whole :

xlrd、xlwt、xlutils Each has its own limitations , But the three complement each other , covers Excel Documents, especially .xls Operation of file .xlwt Can generate .xls file ,xlrd You can read existing .xls file ,xlutils Connect xlrd and xlwt Two modules , So that users can read and write one at the same time .xls file . Simply speaking ,xlrd Responsible for reading 、xlwt Responsible for writing 、xlutils Responsible for providing support and interface
xlwings It's very convenient to read and write Excel The data in the file , And can modify the cell format
XlsxWriter It's a way to write .xlsx File format module . It can be used to write text 、 Numbers 、 Formula and support cell formatting 、 picture 、 Chart 、 Document configuration 、 Automatic filtering and other features . But it can't be used to read and modify Excel file
openpyxl adopt “ workbook workbook - Worksheet sheet - Cell cell” That's right .xlsx Read the file 、 Write 、 Change , And you can adjust the style
pandas Everyone is no stranger , It is a powerful module for data processing and analysis , Sometimes it can be used to automate Excel
Here are the characteristics of each from different angles ( If you don't want to look at the detailed comparison process , You can drag it to the end of the article to see the summary chart ):

One 、 install

7 All modules are non-standard Libraries , So it needs to be on the command line

pip/pip3 Installation :

pip/pip3 install xlrd

pip/pip3 install xlwt

pip/pip3 install xlutils

pip/pip3 install xlwings

pip/pip3 install XlsxWriter

pip/pip3 install openpyxl

pip/pip3 install pandas

Tips :

  1. xlutils Support only xls file , namely 2003 The following versions ;
  2. xlwings After successful installation , If the operation prompt reports an error “ImportError: no module named win32api”, Please reinstall pypiwin32 perhaps pywin32 package ;

Two 、 Module import

Most modules can be imported directly by name , Some module conventions use abbreviations

import xlrd

import xlwt

import xlwings as xw

import xlsxwriter

import openpyxl

import pandas as pd

xlutils The module is xlrd and xlwt The bridge between , The core function is to copy a copy through xlrd Read into memory .xls object , And then copy the object through xlwt modify .xls The content of the form .xlutils Can be xlrd Of Book Object copy to xlwt Of Workbook object , When it is used, it is usually imported from the module copy Sub module :

import xlutils.copy

xlwings Can be combined with VBA Realize to Excel Programming , Strong data input and analysis ability , At the same time, it has rich interfaces , combination pandas/numpy/matplotlib Take it easy Excel Data processing work .

xlsxwriter It has rich characteristics , Support pictures / form / Chart / Screening / Format / Formula, etc , Function and openpyxl be similar , The advantage is that compared with openpyxl And support VBA File import , Mini map and other functions , The disadvantage is that it can't be opened / Modify existing documents , It means using xlsxwriter Need to start from scratch .

openpyxl Simple and easy to use , Wide range of functions , Format cells / picture / form / The formula / Screening / Comments / File protection and other functions , The chart function is one of its highlights , The disadvantage is right VBA The support is not good enough .

pandas Data processing is pandas The foundation of our life ,Excel As pandas Input / Container for outputting data .

xlutils be based on xlrd/xlwt, old brand python package , A pioneer in this field , Functional features are regular , The big disadvantage is that it only supports xls file .

3、 ... and 、 Read excel file

get files

get files

Use two sizes below, both of which are 8MB Of .xls and .xlsx File to test :

xls_path = '~/Desktop/test.xls'
xlsx_path = '~/Desktop/test.xlsx'

xlrd Read the file

xlrd Can read .xls and .xlsx file

xls = xlrd.open_workbook(xls_path)
xlsx = xlrd.open_workbook(xlsx_path)

xlwings Read the file

xlwings Direct docking is apps, That is to say Excel Applications , So your computer must have Microsft Of Excel perhaps WPS Of Excel Can only be ( The default is Microsft Excel, It can be done by App It is specified in spec Parameter declarations wpsoffice), Then you can operate the workbook books And worksheet sheets.

app = xw.App(visible=True, add_book=False) # The program is visible , Open only, don't create a new workbook
app.display_alerts = False # Warning off
app.screen_updating = False # Screen update off
# wb = app.books.open(xls_path)
wb = app.books.open(xlsx_path)
wb.save() # Save the file
wb.close() # Close file
app.quit() # Shut down the program

openpyxl Read the file

openpyxl Can read .xlsx file

wb = openpyxl.load_workbook(xlsx_path)

If reading .xls The document will report an error :

wb = openpyxl.load_workbook(xls_path)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.

pandas Read the file

pandas Can read .xls and .xlsx file

xls = pd.read_excel(xls_path, sheet_name='Sheet1')
xlsx = pd.read_excel(xlsx_path, sheet_name='Sheet1')

Next, compare the four modules read in the same configuration computer 10MB .xlsx Time of file ( function 3 Second average ), The code used is :

import time
import xxx
time_start = time.time()
# Here can be the above test code , Omitted for the time being
....
time_end = time.time()
print('time cost: ', time_end-time_start, 's')

The result of the final test is ,xlwings Read 10MB Files are the fastest ,xlrd second ,openpyxl The slowest ( It's different from computer to computer , Results are for reference only )

Read in Excel The tables in the document section are summarized as follows :

Get worksheet

In view of the above 4 One can read Excel Module of file , Further discussion of its access worksheet sheet The way

xlrd Get worksheet

Can pass sheet Name search :

sheet = xlsx.sheet_by_name("Sheet1")

You can also look up :

sheet = xlsx.sheet_by_index(0)

xlwings Get worksheet

xlwings The worksheets are divided into active worksheets and specific worksheets under specified workbooks :

sheet = xw.sheets.active # In the active workbook
sheet = wb.sheets.active # In a specific workbook

openpyxl Get worksheet

.active Method gets the first sheet of the workbook by default

sheet = wb.active

In addition, you can also get the worksheet by specifying the name of the worksheet :

sheet = wb['Sheet1']

pandas Get worksheet

There's no way to get a worksheet alone pandas What thing , Because when reading a file, you must specify a worksheet to read it :

xlsx = pd.read_excel(xlsx_path, sheet_name='Sheet1')

Not all 7 All modules can read Excel file , And even if you can read Excel Files are also discussed in different suffixes , The specific summary is as follows :

  1. xlwt、xlutils、XlsxWriter Can't read file
  2. xlrd Can read .xls and .xlsx file
  3. xlwings Can read .xls and .xlsx file
  4. openpyxl Can read .xlsx file
  5. pandas Can read .xls and .xlsx file

Four 、 establish excel file

xlwt create a file

xlwt Can only be created .xls file , Can't create .xlsx file

xls = xlwt.Workbook(encoding= 'ascii')
# Create a new sheet surface
worksheet = xls.add_sheet("Sheet1")

xlwings create a file

xlwings You can create .xls and .xlsx file , Just write the suffix clearly when you save it . Use the following command :

wb = app.books.add()

Whether you create a new workbook or open it, you need to save the workbook 、 Close workbook 、 Shut down the program , namely :

wb.save(path + '/new_practice.xlsx')
wb.close()
app.quit()

XlsxWriter create a file

XlsxWriter You can create .xlsx file :

xlsx = xlsxwriter.Workbook()
# Add sheet
sheet = xlsx .add_worksheet('Sheet1')

openpyxl create a file

openpyxl You can create .xls and .xlsx file , Just write the suffix clearly when you save it . Use the following command :

wb = Workbook()
# Create a worksheet as specified in the new workbook
sheet = wb.active

pandas create a file

pandas You just need to write the suffix clearly when you transfer it . It's actually more abstract ,pandas You don't need to create one first Excel file , You can do all kinds of operations around the data frame .to_excel Order again .xls perhaps .xlsx Do file suffixes . If you have to create a blank Excel You can use the following command for the file :

df = pd.DataFrame([])
df.to_excel('~/Desktop/test1.xlsx')

Simply summarize and create Excel File status :

  1. xlrd、xlutils Can't create Excel file
  2. xlwt Can only be created .xls file , Can't create .xlsx file
  3. xlwings You can create .xls and .xlsx file
  4. XlsxWriter You can create .xlsx file
  5. openpyxl You can create .xls and .xlsx file
  6. pandas Not created Excel The concept of , But it can be generated when it's stored .xls or .xlsx file

5、 ... and 、 Save the file

xlwt Save the file

xlwt It can be saved .xls file

# xls = xlwt.Workbook(encoding= 'ascii')
# worksheet = xls.add_sheet("Sheet1")
xls.save("new_table.xls")

xlutils Save the file

xlutils Can be xlrd Object is copied as xlwt Object after saving .xls file

# xls_path = '~/Desktop/test.xls'
# xls = xlrd.open_workbook(xls_path)
xls_xlutils = xlutils.copy.copy(xls)
xls_xlutils.save('new_text.xls')

xlwings Save the file

xlwings It can be saved .xls and .xlsx file

# wb = app.books.open(xls_path)
wb = app.books.open(xlsx_path)
wb.save() # Save the file
wb.close() # Close file
app.quit() # Shut down the program

XlsxWriter Save the file

XlsxWriter It can be saved .xlsx file ,.close After the command is executed, the file is closed and saved at the same time :

# xlsx = xlsxwriter.Workbook()
# sheet = xlsx .add_worksheet('Sheet1')
xlsx.close()

openoyxl Save the file

openpyxl It can be saved .xlsx file

# wb = openpyxl.load_workbook(xlsx_path)
# wb = Workbook()
# sheet = wb.active
wb.save('new_test.xlsx')

pandas Save the file

pandas It can be saved .xls or .xlsx file

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1, 2, 4])
df1.to_excel('~/Desktop/test1.xls')
df2.to_excel('~/Desktop/test2.xlsx')

Simply summarize and save Excel File status :

  1. xlrd Can't save Excel file
  2. xlwt It can be saved .xls file
  3. xlutils Can be xlrd Object is copied as xlwt Object after saving .xls file
  4. xlwings It can be saved .xls and .xlsx file
  5. XlsxWriter It can be saved .xlsx file
  6. openpyxl It can be saved .xlsx file
  7. pandas It can be saved .xls or .xlsx file

6、 ... and 、 Get cell value

The basic premise of getting cell value is to be able to read the file , So basically around xlrd、xlwings、openpyxl、pandas Introduce .xlutils Because you can make a copy of .xls So it can also be used and xlrd Exactly the same way to read cells .

xlrd/xlutils Get the cell

xlutils Because it's a direct copy of xlrd Applicable objects , How to read cells and xlrd Exactly the same as .xlwt No ability to read cells

# xls = xlrd.open_workbook(xls_path)
# sheet = xlsx.sheet_by_name("Sheet1")
value = sheet.cell_value(4, 6) # The first 5 Xing di 7 Cell of column
print(value)
rows = table.row_values(4)
cols = table.col_values(6)
for cell in rows:
print(cell)

xlwings Get the cell

# app = xw.App(visible=True, add_book=False)
# app.display_alerts = False
# app.screen_updating = False
# wb = app.books.open(xls_path)
# sheet = wb.sheets.active
# Get the value of a single cell
A1 = sheet.range('A1').value
print(A1)
# Get the values of multiple cells horizontally or vertically , Returns a list of
A1_A3 = sheet.range('A1:A3').value
print(A1_A3)
# Gets the value of multiple cells in a given range , Return to nested list , By behavior list
A1_C4 = sheet.range('A1:C4').value
print(A1_C4)
# Get the value of a single cell
A1 = sheet.range('A1').value
print(A1)
# Get the values of multiple cells horizontally or vertically , Returns a list of
A1_A3 = sheet.range('A1:A3').value
print(A1_A3)
# Gets the value of multiple cells in a given range , Return to nested list , By behavior list
A1_C4 = sheet.range('A1:C4').value
print(A1_C4)

openpyxl Get the cell

# wb = openpyxl.load_workbook(xlsx_path)
# wb = Workbook()
# sheet = wb.active
# One 、 Specifies the value of the coordinate range
cells = sheet['A1:B5']
# Two 、 Specify the value of the column
cells = sheet['A']
cells = sheet['A:C']
# 3、 ... and 、 Specify the value of the row
cells = sheet[5]
cells = sheet[5:7]
# Get cell value
for cell in cells:
print(cell.value)

pandas Get cell value

pandas Read Excel File is converted to a data frame object , The way to analyze the content is basically pandas Knowledge points in the system , Such as .iloc() .loc() .ix() etc. :

print(df1.iloc[0:1, [1]])
print(df1.loc['b'])
print(df2.ix['a', 'a']) # Some versions have been canceled ix, It can be used iat

7、 ... and 、 Write data

xlwt/xlutils Write data

# xls = xlrd.open_workbook(xls_path)
# xls_xlutils = xlutils.copy.copy(xls)
# sheet = xls_xlutils.sheet_by_name("Sheet1")
# value = sheet.cell_value(4, 6)
# print(value)
sheet.write(4, 6, " New content ")

xlwings Write data

# app = xw.App(visible=True, add_book=False)
# app.display_alerts = False
# app.screen_updating = False
# wb = app.books.open(xls_path)
# sheet = wb.sheets.active
# write in 1 A cell
sheet.range('A2').value = ' Daming '
# Write to multiple cells in one row or column
# Write horizontally A1:C1
sheet.range('A1').value = [1,2,3]
# Write vertically A1:A3
sheet.range('A1').options(transpose=True).value = [1,2,3]
# Write multiple cells in the range
sheet.range('A1').options(expand='table').value = [[1,2,3], [4,5,6]]

XlsxWriter Write data

In code new_format It's the preset style , The following will introduce

# xlsx = xlsxwriter.Workbook()
# sheet = xlsx .add_worksheet('Sheet1')
# One 、 Write to a single cell
sheet.write(row, col, data, new_format)
# A1: from A1 Cells start inserting data , Insert by line
sheet.write_row('A1', data, new_format)
# A1: from A1 Cells start inserting data , Insert by column
sheet.write_column('A1', data, new_format)

openpyxl Write data

# wb = openpyxl.load_workbook(xlsx_path)
# wb = Workbook()
# sheet = wb.active
# One 、 Write cells
cell = sheet['A1']
cell.value = ' Business needs '
# Two 、 Write one or more lines of data
data1 = [1, 2, 3]
sheet.append(data1)
data2 = [[1, 2, 3], [4, 5, 6]]
sheet.append(data2)

A brief summary of Excel File write data situation :

  1. xlrd Unable to write data
  2. xlwt Can write data
  3. xlutils You can borrow xlwt Method write data
  4. xlwings Can write data
  5. XlsxWriter Can write data
  6. openpyxl Can write data
  7. pandas take Excel After the file is read as a data frame , It abstracts the data frame level for operation , No, right Excel The concept of writing and modifying cells

8、 ... and 、 Style adjustment

xlwt Adjust the style

xlwt Support for adjusting Fonts 、 Frame 、 Color, etc

# Font part
# Initialize style
style1 = xlwt.XFStyle()
# Create fonts for styles
font = xlwt.Font()
font.name = 'Times New Roman' # typeface
font.bold = True # In bold
font.underline = True # Underline
font.italic = True # Italics
# Set the style
style1.font = font
# Use styles
sheet.write(4, 6, " New content 1", style1)
# Border section
borders = xlwt.Borders()
# Set the line type
borders.left = xlwt.Borders.DASHED
borders.right = xlwt.Borders.DASHED
borders.top = xlwt.Borders.DASHED
borders.bottom = xlwt.Borders.DASHED
# Set the color swatch
borders.left_colour = 0x40
borders.right_colour = 0x40
borders.top_colour = 0x40
borders.bottom_colour = 0x40
#
style2 = xlwt.XFStyle()
style2.borders = borders
# Use styles
sheet.write(5, 8, " New content 2", style2)

xlwings Adjust the style

Brief introduction xlwings The adjustment of color :

# Get the color
print(sheet.range('C1').color)
# Set the color
sheet.range('C1').color = (255, 0, 120)
# Clear the color
sheet.range('C1').color = None

XlsxWriter Adjust the style

XlsxWriter Contains a lot of features , You can modify the style of the worksheet after creating it :

new_format = xlsx.add_format({
'bold': True, # Bold font
'border': 1, # Cell border width
'align': 'left', # Horizontal alignment
'valign': 'vcenter', # Vertical alignment
'fg_color': '#F4B084', # Cell background color
'text_wrap': True # Auto wrap or not
})
sheet.write(row, col, data, new_format)

openpyxl Adjust the style

openpyxl Styles mainly include fonts 、 Frame 、 Paragraph alignment styles, etc

# Font style
from openpyxl.styles import Font
cell = sheet['A1']
font = Font(name='Arial', size=12, bold=True, italic=True, color='FF0000')
cell.font = font
# Paragraph alignment
from openpyxl.styles import Alignment
cell = sheet['B2']
alignment = Alignment(horizontal='center', vertical='center', text_rotation=45, wrap_text=True)
cell.alignment = alignment
# Border style
from openpyxl.styles import Side, Border
cell = sheet['B2']
side1 = Side(style='thin', color='FF0000')
side2 = Side(style='dashed')
border = Border(left=side1, right=side1, top=side2, bottom=side2)
cell.border = border

Yes Excel File style adjustment :

  1. xlrd、xlutils You can't adjust the style ( It can also be said that xlutils Sure , It's just borrowing xlwt Methods )
  2. xlwt You can adjust the style
  3. xlwings You can adjust the style
  4. XlsxWriter You can adjust the style
  5. openpyxl You can adjust the style
  6. pandas You can't adjust the style

Nine 、 Insert a picture

xlwt Insert a picture

xlwt To insert a picture, the format of the picture must be .bmp Format can be inserted successfully

sheet.insert_bitmap("test.bmp", 2, 3, 2, 2, 0.5, 0.5)

insert_bitmap(img, x, y, x1, y1, scale_x, scale_y)``img Represents the address of the image to be inserted ,x Said line ,y The column x1 y1 Represents a pixel that is shifted down and right from its original position scale_x scale_y Represents the ratio of width to height relative to the original image , Pictures can be zoomed in and out

xlwings Insert a picture

The following is to use xlwings Insert the code for the picture , Location can be specified

sheet.pictures.add(r'C:\\xxx.jpg')
# You can also insert... At a given location
sheet.pictures.add(r'C:\\xxx.jpg', left=sheet.range('A2').left, top=sheet.range('A2').top, width=100, height=100)

XlsxWriter Insert a picture

The first parameter is the start cell of the insert , The second parameter is the absolute path of the image file

sheet.insert_image('A1', r'C:\\xxx.jpg')

openpyxl Insert a picture

openpyxl You can also give Excel Insert the specified picture into the image and change the size

from openpyxl.drawing.image import Image
img = Image('test.jpg')
newsize = (180, 360)
img.width, img.height = newsize # Set the width and height of the picture
sheet.add_image(img, 'A2') # Go to A2 Cell insert picture

A brief summary of Excel The situation of inserting a picture into a file :

  1. xlrd、xlutils You can't adjust the style ( It can also be said that xlutils Sure , It's just borrowing xlwt Methods )
  2. xlwt You can insert .bmp picture
  3. xlwings You can insert picture
  4. XlsxWriter You can insert pictures
  5. openpyxl You can insert pictures
  6. pandas Can't insert picture

Here is the reference

Conclusion figure :

The purpose of this article is not to evaluate the best Library , Just comparing different libraries from different perspectives , Hope to let you know what each library is good at . such as pandas Although it's easy to handle , But you can't add pictures, modify styles ,openpyxl Although all kinds of operations support , But the speed is relatively slow, etc .

Only by fully understanding the characteristics of different tools , Only in this way can we flexibly use different methods to effectively solve problems in different scenarios !


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved