
Officially recommended: 6 ways to read Excel with pandas, all documented in the source code


Hello everyone, this is Wang Feng, a programmer.

Many of you use the pandas library in Python to process Excel data. Broadly, data processing has 3 stages: reading the data, processing it, and writing it out.

Most newcomers get stuck at the very first stage, reading the data.

Today we will go through the 6 ways of reading Excel that pandas officially recommends.

This article has 3 parts: installing pandas and generating an Excel file, reading the source code, and the 6 ways to read Excel.
If you are an experienced Python user, you can jump straight to part 3.
If you are new to Python or have only just started with pandas, I suggest you start from part 1.

All the code below can be scrolled left and right, and you can also copy and paste it directly.

1. Preparation

  • First, you need to install the latest version of the pandas library, since you cannot use pandas without it.

  • Second, you need an Excel file to work with. So that your results match the ones in this article, I suggest you use the same Excel file.

How do we install pandas? How do we get the Excel file? Each takes just 1 line, run automatically. We are an office-automation community, after all, so it would be a shame if these steps could not be automated.

Run the lines below and you will get an Excel file with the same layout as the one in this article.

Install pandas with 1 command (version used here: 1.4.0)

Run the following command in your terminal. It installs python-office, the package used below to generate the sample file, and this article relies on it to bring in pandas as well. If import pandas still fails afterwards, pip install pandas openpyxl installs pandas together with the engine it uses to read .xlsx files.

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-office -U

Generate the Excel file with 1 line of code

There is no need to hunt around for an Excel file to download. We introduced a function earlier that comes in handy here: 1 line of code that automatically generates an Excel file filled with simulated data.

All the Excel examples and demonstrations that follow use this generation method. You could of course build a file by hand, but what if we later want to practice on an Excel file with 100,000 rows? Whether you create it manually or download it from Baidu Netdisk, that is an extremely slow process.

With the generation method below, creating an Excel file with 100,000+ rows takes only a moment. Be sure to try it.

import office

office.excel.fake2excel(columns=['name', 'company_prefix','job'], rows=5)

Run the code above in PyCharm and it generates an Excel file with the same columns as the one used in this article, as shown in the figure below.
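If python-office is not available, a similar file can be produced with pandas alone. The sketch below is our own assumption, not the method used in this article: the rows are hand-written placeholders rather than the random data fake2excel generates, and the file name sample_people.xlsx is hypothetical. Writing .xlsx files this way needs openpyxl installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder rows (hypothetical); fake2excel fills these columns with random data.
sample = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "company_prefix": ["Acme", "Globex", "Initech"],
        "job": ["Engineer", "Analyst", "Designer"],
    }
)

# Write to a temporary folder so nothing in the working directory is overwritten.
path = Path(tempfile.mkdtemp()) / "sample_people.xlsx"
sample.to_excel(path, index=False)  # index=False: do not write the row index

# Read the file back to confirm the columns survived the round trip.
round_trip = pd.read_excel(path)
print(round_trip)
```

Passing index=False matters here: without it, pandas writes the row numbers as an extra first column, which changes what the reading examples later in the article return.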

2. What does the pandas source code say?

Learning pandas is actually simple, and there is no need to search all over the internet: the developers have documented what the code does in the source itself, through comments and docstrings.

How do we find the pandas source code?

Once pandas is installed, we can open its source code and see which reading methods it recommends. The pandas source path is: D:\<your Python installation directory>\Lib\site-packages\pandas\

The pandas folder contains several subdirectories, as shown in the figure below. The Excel reading function we need is in pandas\io\excel\_base.py, at lines 290-350 in the version used here, as shown in the figure below.
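If you are not sure where Python is installed, pandas can report its own location. The sketch below prints the package folder and the file that defines read_excel, then shows the Examples section of the read_excel docstring, which is where the recommended calls are written. Note that pandas.io.excel._base is a private module, so its path is an implementation detail that may change between versions.

```python
import os

import pandas as pd
import pandas.io.excel._base as excel_base  # private module; location may change

# Folder of the installed pandas package (the ...\site-packages\pandas\ path).
print(os.path.dirname(pd.__file__))

# File that contains the read_excel implementation and its documentation.
print(excel_base.__file__)

# The docstring is available without opening the file at all.
doc = pd.read_excel.__doc__ or ""
start = doc.find("Examples")
examples = doc[start:] if start != -1 else doc
print(examples)
```

Reading the docstring this way has the advantage that it always matches the installed version, whereas line numbers in the source file shift from release to release.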

Now that we have found the source code, the question is: what does it tell us?

3. The 6 ways to read Excel

Using the pandas source code found above, we will now go through the 6 ways of reading Excel one by one.

Method 1: Read with a specified index column

This way of reading suits Excel data that has a column of serial numbers.

import pandas as pd

pd.read_excel('fake2excel.xlsx', index_col=0)

#  index_col=0 makes the first column the index column.

The result is shown in the figure below:

  • The column names look misaligned, but there is nothing wrong with the code: the first column is now being treated as the index column.

Our file has no serial-number column, so this method does not fit it. We can change the call so that no index column is specified.

The code and result are as follows:

pd.read_excel('fake2excel.xlsx', index_col=None)
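To see the difference between the two calls without the figures, the sketch below builds a tiny file and reads it both ways. The file name demo.xlsx and the sample rows are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a small throwaway workbook (hypothetical data).
path = Path(tempfile.mkdtemp()) / "demo.xlsx"
pd.DataFrame(
    {"name": ["Alice", "Bob"], "job": ["Engineer", "Analyst"]}
).to_excel(path, index=False)

# index_col=0: the first column ("name") becomes the row index.
with_index = pd.read_excel(path, index_col=0)

# index_col=None: every column stays a data column; rows are numbered 0, 1, ...
without_index = pd.read_excel(path, index_col=None)

print(with_index)
print(without_index)
```

With index_col=0 the name column no longer appears among the data columns, which is why the header looks shifted when the DataFrame is printed.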

Method 2: Read a specified sheet

The name says it all.

pd.read_excel(open('fake2excel.xlsx', 'rb'), sheet_name='Sheet2')

#  sheet_name='Sheet2' reads the contents of Sheet2. A 0-based position also works: sheet_name=1 is the second sheet.

We added a Sheet2 to the file. The result is shown in the figure below:

  • In this case, the contents of Sheet1 are not read.
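The sheet can be selected by name or by position, and sheet_name=None returns every sheet at once. The following sketch assumes a hypothetical two-sheet workbook written with pd.ExcelWriter.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a workbook with two sheets (hypothetical data).
path = Path(tempfile.mkdtemp()) / "two_sheets.xlsx"
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"name": ["Alice"]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"name": ["Bob"]}).to_excel(writer, sheet_name="Sheet2", index=False)

by_name = pd.read_excel(path, sheet_name="Sheet2")  # select by sheet name
by_position = pd.read_excel(path, sheet_name=1)     # select by 0-based position
all_sheets = pd.read_excel(path, sheet_name=None)   # dict: sheet name -> DataFrame

print(by_name)
print(list(all_sheets))
```

Selecting by name is usually safer than by position, because reordering the tabs in Excel does not change which data is read.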

Method 3: Read without a header

Read data that has no column names.

pd.read_excel('fake2excel.xlsx', index_col=None, header=None)

#  header=None tells pandas that the file has no header row.

The result is shown in the figure below:

  • This suits Excel sheets that have no column names.
  • If the file does have column names, as ours does, they are read as a row of data.
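The two bullet points above can be checked with a small sketch. It writes a hypothetical file that does have a header row and reads it with header=None, so the column names land in the first data row. It also shows the names parameter of read_excel, which supplies labels when the sheet has none; the labels col_a and col_b are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A throwaway workbook with a header row (hypothetical data).
path = Path(tempfile.mkdtemp()) / "demo.xlsx"
pd.DataFrame(
    {"name": ["Alice", "Bob"], "job": ["Engineer", "Analyst"]}
).to_excel(path, index=False)

# header=None: columns are labelled 0, 1, ... and the header row becomes data.
raw = pd.read_excel(path, header=None)
print(raw.columns.tolist())
print(raw.iloc[0].tolist())

# names= gives the positional columns readable labels.
labelled = pd.read_excel(path, header=None, names=["col_a", "col_b"])
print(labelled.columns.tolist())
```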

Method 4: Specify the data type when reading

This one is for advanced users, in cases where data processing has strict precision or speed requirements.

pd.read_excel('fake2excel.xlsx', index_col=0, dtype={'age': float})

#  dtype sets the data type of a column.

The result is shown in the figure below:

  • We added a column, age. It originally held integers, but after specifying float it is read as decimals.
  • This way of reading suits situations with special requirements on the data, for example the financial industry.
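The effect of dtype is easiest to see by comparing the column types with and without it. The sketch below uses a hypothetical file with an integer age column.

```python
import tempfile
from pathlib import Path

import pandas as pd

# A throwaway workbook with an integer column (hypothetical data).
path = Path(tempfile.mkdtemp()) / "ages.xlsx"
pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 41]}).to_excel(path, index=False)

inferred = pd.read_excel(path)                     # pandas infers an integer type
typed = pd.read_excel(path, dtype={"age": float})  # force the column to float

print(inferred["age"].dtype)
print(typed["age"].dtype)
print(typed["age"].tolist())
```

Only the columns named in the dtype dictionary are affected; the others keep their inferred types.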

Method 5: Custom missing values

When is this useful? For example, when collecting information you may find that someone entered a negative age. You can have that value cleared automatically on read and ask the person to fill it in again.

pd.read_excel('fake2excel.xlsx', index_col=None, na_values={'name': "Pang Qiang"})

#  na_values defines which values are treated as missing.

The result is shown in the figure below:

  • In our table there is a person named Pang Qiang, and we do not want to show this person's name.
  • So in na_values we specify that the value Pang Qiang in the name column is set to empty. In pandas, an empty value is represented by NaN.
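When na_values is a dictionary, each entry applies only to the column it names. The sketch below illustrates this with hypothetical data in which the same text appears in two columns but is masked in only one of them.

```python
import tempfile
from pathlib import Path

import pandas as pd

# "Bob" appears in both columns (hypothetical data).
path = Path(tempfile.mkdtemp()) / "people.xlsx"
pd.DataFrame(
    {"name": ["Alice", "Bob"], "nickname": ["Bob", "Ally"]}
).to_excel(path, index=False)

# Treat "Bob" as missing, but only in the "name" column.
masked = pd.read_excel(path, na_values={"name": ["Bob"]})

print(masked)
print(masked["name"].isna().tolist())
```

Passing a plain list instead of a dictionary, for example na_values=["Bob"], would apply the rule to every column.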

Method 6: Handle comment lines in Excel

Comments are not only for Python; an Excel file can contain them too. Many people have never used them. If you have, tell us in the comment section why you write comments in Excel.

pandas provides a way to handle comment lines in Excel: with comment='#', everything from the # to the end of that row is ignored.

pd.read_excel('fake2excel.xlsx', index_col=None, comment='#')

The result is shown in the figure below:
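The sketch below follows the pattern of the example in the read_excel docstring, with hypothetical data: one cell starts with #, so that cell and everything after it in the same row is dropped and shows up as missing. The row index is written to the file and read back with index_col=0, so the row itself is kept.

```python
import tempfile
from pathlib import Path

import pandas as pd

# One row contains a comment marker in the "Name" cell (hypothetical data).
path = Path(tempfile.mkdtemp()) / "commented.xlsx"
frame = pd.DataFrame(
    {"Name": ["string1", "string2", "#Comment"], "Value": [1, 2, 3]}
)
frame.to_excel(path)  # the default row index is written as the first column

plain = pd.read_excel(path, index_col=0)                 # comment text is kept
cleaned = pd.read_excel(path, index_col=0, comment="#")  # text after "#" is dropped

print(plain)
print(cleaned)
```

Because the dropped cells become missing values, the Value column can no longer be read as integers in the cleaned result.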

4. Closing words

As a Python programmer, you should make a habit of reading source code to understand the principles and logic behind it.

I have been using pandas a lot recently, and since pandas also handles Excel well, I will keep publishing articles on using pandas in the near future.

Let me know in the comments what you would like to see next.

