
Operating Excel with Python's pandas library

Editor: Python

Using Python's pandas library to work with Excel

Recently I have been processing data in Excel spreadsheets. There are other ways to handle Excel file data as well; this article collects my study notes.

Excel 2003 and earlier: at most 256 columns (2 to the 8th power) and 65,536 rows (2 to the 16th power). Excel 2007 and later: at most 16,384 columns (2 to the 14th power) and 1,048,576 rows (2 to the 20th power).
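As a quick check of this arithmetic, and to show which column letters these limits correspond to, here is a small sketch. The helper `column_letter` is a hypothetical name of our own, not part of pandas or Excel; it converts a 1-based column number into Excel's A, B, ..., Z, AA, ... letters.

```python
def column_letter(n: int) -> str:
    """Convert a 1-based column number to Excel-style letters (1 -> 'A', 27 -> 'AA')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n > 0:
        # Excel letters are "bijective base 26": there is no zero digit,
        # so shift by one before each divmod.
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

# Excel 2003 and earlier (.xls)
print(2**8, 2**16, column_letter(2**8))     # 256 65536 IV
# Excel 2007 and later (.xlsx)
print(2**14, 2**20, column_letter(2**14))   # 16384 1048576 XFD
```

So the last column is IV in the old .xls format and XFD in .xlsx.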

How to find Excel's maximum row and column:

Open Excel and press Ctrl + an arrow key (←↑↓→) to jump to the leftmost, topmost, bottommost, or rightmost cell. On an empty sheet this shows the maximum row number and column.

Python has many libraries for working with Excel, such as pandas, xlrd, xlwt, xlutils, and openpyxl.

xlrd library: reads Excel files

xlwt library: writes Excel files

xlutils library: utilities for working with Excel files, such as copying, splitting, and filtering

The xlrd, xlwt, and xlutils libraries read and write Excel files with the .xls suffix.

openpyxl library: works with Excel files with the .xlsx suffix; pandas also uses this library.

This article mainly introduces pandas. Tip:

Pandas is built on NumPy, so NumPy must be available before pandas can be used (pip normally installs it automatically as a dependency of pandas). By default, pandas cannot read or write Excel files on its own: it relies on xlrd and xlwt to read and write .xls files, and on openpyxl to read and write .xlsx files. Note that pandas 2.0 and later no longer support writing .xls files with xlwt.
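Because the library pandas needs depends on the file suffix, one way to make this explicit is to choose it yourself and pass it through the `engine` parameter of `pd.read_excel()`. The sketch below covers only the two cases described above (.xls via xlrd, .xlsx via openpyxl); the helper names `required_engine` and `engine_installed` are hypothetical, not part of pandas.

```python
from importlib.util import find_spec
from pathlib import Path

# Suffix -> library pandas uses to read it (only the two cases discussed above)
ENGINE_BY_SUFFIX = {".xls": "xlrd", ".xlsx": "openpyxl"}

def required_engine(path: str) -> str:
    """Return the reader library needed for an Excel file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINE_BY_SUFFIX:
        raise ValueError(f"unsupported Excel suffix: {suffix!r}")
    return ENGINE_BY_SUFFIX[suffix]

def engine_installed(module_name: str) -> bool:
    """True if the module can be found (and so imported) by this interpreter."""
    return find_spec(module_name) is not None

if __name__ == "__main__":
    engine = required_engine(r"D:\Excel Tips for use \test1.xlsx")
    print(engine, "installed:", engine_installed(engine))
```

With the engine known, a read would look like `pd.read_excel(path, engine=required_engine(path))`; if `engine_installed(...)` is False, install that library first.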

Introduction to pandas

pandas official website: https://pandas.pydata.org/

pandas tutorial (in Chinese): https://www.gairuo.com/p/pandas-tutorial

Pandas is a core Python library for data analysis. It provides powerful handling of one-dimensional and two-dimensional arrays and is particularly good at two-dimensional tabular data, matrix-like data with row and column labels, and time-series data. Its two main data structures, the one-dimensional Series and the two-dimensional DataFrame, are widely used for data analysis in finance, statistics, the social sciences, engineering, and many other fields. With pandas you can easily add, query, modify, delete, merge, reshape, group, and statistically analyze data. Pandas also provides mature I/O tools for reading data from different sources such as text files, Excel files, and databases, and it can save and load data in the fast HDF5 format.
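To make the list of operations above concrete, here is a small in-memory sketch (no Excel file needed). The column names and values are made up for illustration; it shows one way to add, query, modify, delete, and group, not the only way.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "dept": ["A", "B", "A", "B"],
    "score": [90, 75, 82, 68],
})

df["passed"] = df["score"] >= 70              # add: a new computed column
high = df[df["score"] > 80]                   # query: boolean filtering
df.loc[df["name"] == "Bob", "score"] = 78     # modify: update matching cells
df = df.drop(columns="passed")                # delete: remove a column
by_dept = df.groupby("dept")["score"].mean()  # group + statistics

print(high)
print(by_dept)
```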

Correspondence between pandas data structures and Excel concepts

* A pandas DataFrame is similar to an Excel worksheet. However, an Excel workbook can contain multiple worksheets, whereas each pandas DataFrame stands on its own.

* A Series represents a single column of a DataFrame; working with a Series is similar to referencing a column in a spreadsheet.

* Every DataFrame and Series has an Index, which holds the labels of the data rows.

In pandas, if no index is specified, a RangeIndex is used by default (first row = 0, second row = 1, and so on), similar to the row numbers in a spreadsheet.

pandas can also set the index to one (or more) columns of unique values, much like a worksheet column that serves as a row identifier.

Index labels stay attached to their rows, so if you reorder the rows of a DataFrame, the row labels do not change.
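A short sketch of these correspondences, again with made-up data: the DataFrame plays the role of a worksheet, each column is a Series, and the index labels move with their rows when the rows are reordered.

```python
import pandas as pd

df = pd.DataFrame({"id": ["x1", "x2", "x3"], "qty": [5, 2, 9]})

print(type(df["qty"]).__name__)    # Series: one column of the DataFrame
print(df.index)                    # RangeIndex(start=0, stop=3, step=1)

# Use the unique "id" column as the row identifier
by_id = df.set_index("id")

# Reordering rows does not change the labels: each label moves with its row
sorted_df = by_id.sort_values("qty")
print(list(sorted_df.index))       # ['x2', 'x1', 'x3']
print(sorted_df.loc["x3", "qty"])  # 9
```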

Installing pandas

Command format for installing a Python module (library, package):

[py -X.Y -m] pip install [-i mirror_URL] module_name

The parts in [] are optional.

If multiple Python versions are installed, X.Y selects the Python version to install the module for (py is the Python launcher for Windows). Keep only the major and minor version, i.e. the part before the second dot: 3.8 for 3.8.1, 3.10 for 3.10.5. If only one Python version is installed, this part is not needed.
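To see which X.Y applies to the interpreter you are running, you can derive it from the version string. A small sketch; the helper name `major_minor` is our own:

```python
import sys

def major_minor(version: str) -> str:
    """Keep only the part before the second dot: '3.10.5' -> '3.10'."""
    return ".".join(version.split(".")[:2])

# X.Y of the interpreter running this script, e.g. for `py -3.10 -m pip ...`
current = f"{sys.version_info.major}.{sys.version_info.minor}"
print(current)
print(major_minor("3.8.1"), major_minor("3.10.5"))  # 3.8 3.10
```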

Common mirror URLs:

Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple

Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/

University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/

【See: https://blog.csdn.net/cnds123/article/details/104393385】

Pandas depends on NumPy. pip normally installs NumPy automatically as a dependency of pandas, but you can also install it first.

In CMD, enter:

py -3.10 -m pip install -i https://mirrors.aliyun.com/pypi/simple/ numpy

I already have NumPy installed, so I skip this step.

【To check whether a third-party Python module (library, package) is installed, and its version number:

[py -X.Y -m] pip list

The part in [] is optional. If multiple Python versions are installed, X.Y selects the Python version whose installed modules (libraries, packages) are listed.】
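From inside Python, a similar check can be done with the standard library's `importlib.metadata` (Python 3.8+). A minimal sketch; `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package: str):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for pkg in ["numpy", "pandas", "xlrd", "xlwt", "openpyxl"]:
    print(pkg, installed_version(pkg) or "not installed")
```

This reports on the interpreter that runs the script, which is the same one `py -X.Y` selects when you run the script with it.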

To install pandas, open a CMD window and enter:

py -3.10 -m pip install -i https://mirrors.aliyun.com/pypi/simple/ pandas

See the figure below.

A line beginning with "Successfully" means the installation succeeded.

The WARNING part generally says that a newer version of pip is available. You can upgrade pip with the command quoted in the prompt, or simply ignore it.

The xlrd, xlwt, xlutils, and openpyxl libraries can be installed the same way.

After installation succeeds, we can import pandas and use it.

Basic pandas operations

* Reading data

Example: reading an Excel file with pandas

The contents of test1.xlsx are as follows:

The source code is as follows:

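import pandas as pd
# r'' raw string: backslashes in the Windows path are not treated as escape sequences
file = r'D:\Excel Tips for use \test1.xlsx'
data = pd.read_excel(file)  # reads the first sheet into a DataFrame
print(data)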

Running result:

Tips:

The string in quotation marks is the path and file name of the Excel file. The "r" prefix makes it a raw string, so the Python interpreter does not process escape sequences in it. For example, if the string contains "\t" and there is no "r", "\t" is interpreted as a tab character (a single character, not four spaces); with "r", "\t" stays as a backslash followed by t.

file = r'D:\Excel Tips for use \test1.xlsx' works, but writing file = 'D:\Excel Tips for use \test1.xlsx' causes an error, because the "\t" in "\test1" becomes a tab and the file cannot be found. The path can instead be written as file = 'D:\\Excel Tips for use \\test1.xlsx' or file = 'D:/Excel Tips for use /test1.xlsx'.
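The sketch below checks these claims directly: the three working spellings of the path are the same string (after converting slashes), and without the r prefix "\t" is a single tab character. The path is just the example path from above; no file is opened.

```python
# The same Windows path written three ways
raw = r'D:\Excel Tips for use \test1.xlsx'       # raw string: backslashes kept as-is
escaped = 'D:\\Excel Tips for use \\test1.xlsx'  # each backslash doubled
forward = 'D:/Excel Tips for use /test1.xlsx'    # forward slashes, also accepted on Windows

print(raw == escaped)          # True
print(len('\t'), len(r'\t'))   # 1 2: '\t' is one tab character, r'\t' is backslash + t

# Without the r prefix, the '\t' of '\test1' silently becomes a tab
broken = 'C:\test1.xlsx'
print('\t' in broken)          # True
```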

The read_excel() method reads an Excel file into a pandas DataFrame.

It has many parameters; for details see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

Common parameters:

The first parameter is the file name, including its path (if the file is in the current working directory, you can omit the path and give only the file name).

The sheet_name parameter selects a sheet by name or position: strings refer to sheet names, and integers to zero-based sheet positions. The default is 0, i.e. the first sheet. For example (sheet_name can be omitted):

df = pd.read_excel(r'D:\Excel Tips for use \test1.xlsx', sheet_name='sheet1')
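A self-contained sketch of the sheet_name options, assuming openpyxl is installed: it first writes a small two-sheet workbook to a temporary folder (the sheet names "first" and "second" are made up), then reads it back by default position, by name, by position, and with sheet_name=None, which returns a dict of all sheets.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "demo.xlsx"
    # Build a small two-sheet workbook to read from
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="first", index=False)
        pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="second", index=False)

    df_default = pd.read_excel(path)                       # default sheet_name=0: first sheet
    df_by_name = pd.read_excel(path, sheet_name="second")  # by sheet name
    df_by_pos = pd.read_excel(path, sheet_name=1)          # by zero-based position
    all_sheets = pd.read_excel(path, sheet_name=None)      # None: dict {sheet name: DataFrame}

print(list(df_default.columns), list(df_by_name.columns), list(all_sheets))
```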

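Processing data

# Import the pandas library
import pandas as pd

# Read the Excel file (sheet_name is optional; the default 0 reads the first sheet)
df = pd.read_excel(r'D:\Excel Tips for use \test1.xlsx', sheet_name='sheet1')

# Get one column (returns a Series); 'column_name' is a placeholder for a real column name
df['column_name']

# Get multiple columns: the brackets of df[] contain a list
df[['columns_name1', 'columns_name2']]

# Get row data
Line_number = 0  # placeholder row label
df.loc[Line_number, 'column_name']  # one cell of that row
df.loc[Line_number]                 # the whole row

Here Line_number is the row label (with the default RangeIndex, the row number counting from 0) and column_name is the column name; if column_name is omitted, the whole row is returned.

# Sort all data by one column, in descending order (returns a new DataFrame)
df.sort_values(by='columns_name', ascending=False)

# Remove duplicate rows (also returns a new DataFrame; df itself is unchanged)
df.drop_duplicates()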
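Since the operations above use placeholder names, here is a runnable version on a small made-up DataFrame standing in for the sheet read from test1.xlsx (the columns "name", "city", and "sales" are assumptions for illustration).

```python
import pandas as pd

# Made-up stand-in for the data read from test1.xlsx
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "city": ["Rome", "Oslo", "Rome", "Lima"],
    "sales": [300, 150, 300, 420],
})

one_col = df["name"]                  # Series
two_cols = df[["name", "sales"]]      # DataFrame: note the inner list
row = df.loc[1]                       # whole row with label 1
cell = df.loc[1, "sales"]             # single value from that row
by_sales = df.sort_values(by="sales", ascending=False)
unique_rows = df.drop_duplicates()    # row 2 duplicates row 0 and is dropped

print(by_sales)
print(unique_rows)
```

Both sort_values() and drop_duplicates() return new DataFrames, so assign the result (or pass inplace=True) if you want to keep it.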

The to_excel() method saves the contents of a DataFrame to an Excel file.

to_excel() has many parameters; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html?highlight=to_excel

The most commonly used parameter is the first one, which specifies the file name including its path (if the file is to be written to the current working directory, you can omit the path and give only the file name).

A simple example is as follows:
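A minimal sketch of saving with to_excel(), assuming openpyxl is installed. To keep it self-contained it writes to a temporary folder instead of a fixed path, then reads the file back to confirm the round trip; the data and the sheet name "prices" are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"item": ["pen", "ink"], "price": [1.5, 3.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = Path(tmp_dir) / "output.xlsx"
    # index=False: do not write the row labels as an extra first column
    df.to_excel(out_file, sheet_name="prices", index=False)
    back = pd.read_excel(out_file, sheet_name="prices")

print(back)
```

Without index=False, the RangeIndex would be written as an unnamed first column, and reading the file back would produce an extra "Unnamed: 0" column.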


Copyright © 程式師世界 All Rights Reserved