您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

Incorporated (Pandas data analysis of 13 - box the Vlookup function of excel)

編輯：Python

參考書目：《深入淺出Pandas：利用Python進行數據處理與分析》

pandasData frames sometimes we need to merge,Operates on multiple data frames together.pandasThere are many uses in it,了解一下

導入包

import numpy as np
import pandas as pd

append()

append和列表的用法一樣,Append directly after the data frame

df.append(self,other,ignore_index=False,#重置索引
verify_integrity=False,sort=False)

df1=pd.DataFrame({'x':[1,2],'y':[3,4]})
df2=pd.DataFrame({'x':[5,6],'y':[7,8]})
df3=pd.DataFrame({'z':[1,2],'y':[3,4]})
df1.append(df3,ignore_index=True)
#追加一條數據
df1.append({'x':31,'y':90},ignore_index=True)

數據連接 pd.concat()

concatThere will be more usage,Below are his parameters

pd.concat(obj ,axis=0,join='outer',#inner交集,outer並集
ignore_index=False,keys=None,#連接關系
levels=None,names=None,#索引名稱
sort=False,verify_integrity=False,copy=True)

基本用法

#行連接
pd.concat([df1,df2])
pd.concat([df1,df2],axis=0)

#列連接
df2=pd.DataFrame({'x':[5,6,0],'y':[7,8,80]})
pd.concat([df1,df2],axis=1)

取交集

pd.concat([df1,df2],axis=1,join='inner') #默認並集outer

#指定索引
pd.concat([df1,df2],keys=['a','b'])
pd.concat({'a':df1,'b':df2})
pd.concat([df1,df2],axis=1,keys=['a','b'])

多個文件合並

### 多文件合並 process_your_file處理文件為df
frames=[process_your_file(f) for f in files ]
result = pd.concat(frames)
#一個包含多個 Sheet 的 Excel 合並成一個 DataFrame
dfm = pd.read_excel('team.xlsx', sheet_name=None)
pd.concat(dfm.values())
pd.concat(dfm) # 保留 Sheet name as a primary index

File directory merge

#File directory merge
import glob # Remove everything in the directory xlsx 格式文件
files = glob.glob("data/*.xlsx")
cols = ['記錄ID', '開始時間', '名稱'] # Only take these columns
# List comprehension objects
dflist = [pd.read_excel(i, usecols=cols) for i in files]
df = pd.concat(dflist) # 合並

map操作

#map操作 # Read multiple files one by one,合並
pd.concat(map(pd.read_csv, ['data/d1.csv','data/d2.csv','data/d3.csv']))
pd.concat(map(pd.read_excel, ['data/d1.xlsx','data/d2.xlsx','data/d3.xlsx']))

# 目錄下所有文件
from os import listdir
filepaths = [f for f in listdir("./data") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, filepaths))
# 方法 2
import glob
df = pd.concat(map(pd.read_csv, glob.glob('data/*.csv')))
df = pd.concat(map(pd.read_excel, glob.glob('data/*.xlsx')))

數據合並pd.merge()

mergeMore usage,You can merge by keyword,並且可以實現excel表的VLOOKUP函數的功能.

基本參數說明

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=True,
suffixes=('_x', '_y'), copy=True, indicator=False,
validate=None)

'''how：連接方式,默認為inner,可設為inner/outer/left/right
on：根據某個字段進行連接,必須存在於兩個DateFrame中（若未同時存在,則需要分別使用left_on 和 right_on 來設置）
left_on：左連接,以DataFrame1中用作連接鍵的列
right_on：右連接,以DataFrame2中用作連接鍵的列
left_index：bool, default False,將DataFrame1行索引用作連接鍵
right_index：bool, default False,將DataFrame2行索引用作連接鍵
sort：根據連接鍵對合並後的數據進行排列,默認為True
suffixes：對兩個數據集中出現的重復列,新數據集中加上後綴 _x, _y 進行區別'''

基本用法

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left, right, on='key')

索引連接

#索引連接
pd.merge(left.set_index('key'), right.set_index('key'), left_index=True,right_index=True)

multi-key connection

#multi-key connection
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left, right, on=['key1', 'key2'])

連接方法——Can be based on a folder's key,You can also take intersection or union

#連接方法
pd.merge(left, right, how='left', on=['key1', 'key2'])# Take the left as the table base table
pd.merge(left, right, how='right', on=['key1', 'key2'])# The table on the right is the base table
pd.merge(left, right, how='outer', on=['key1', 'key2'])# 取兩個表的並集
pd.merge(left, right, how='inner', on=['key1', 'key2'])# 取兩個表的交集
#Below is an example with a repeating join key：
left = pd.DataFrame({'A': [1, 2], 'B': [1, 2]})
right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 3, 4]})
pd.merge(left, right, on='B', how='outer')

#連接指示 indicator

#連接指示 indicator
#如果設置 indicator 為 True, 則會增加名為 _merge 的一列,Show where this column comes from
pd.merge(left, right, on='B', how='outer', indicator=True)

案例

實現excel的vlookup功能,Query and merge the sales prices of different products sold on the sales records,and calculate the total sales
First, the order table and the price table are randomly generated

import random
sale=np.random.randint(2,20,size=(100))
sale_kind=[]
for i in range(100):
sale_kind.append(random.choice(['香蕉','蘋果','橘子','西瓜','葡萄','草莓','芒果','西柚']))
data=pd.DataFrame({'Type of sale':sale_kind,'售賣數量':sale})
data

整了8個水果,Randomly generated theirs100條銷售記錄

#Randomly generate price lists
price=pd.DataFrame(['香蕉','蘋果','橘子','西瓜','葡萄','草莓','芒果','西柚'],columns=['水果種類'])
price['水果價格']=abs(np.random.randn((len(price)),1))*10
price

Randomly generated price,,,I don't know how much a pound of fruit is now.

Calculate the sales for each record,On the left is the sales record,So when merging,Based on the data on the left.Add the prices on the right to the data frame one by one.The corresponding price is then multiplied by the quantity sold,Generate a new column of sales.

pd.merge(data, price ,how='left',left_on='Type of sale',right_on='水果種類').assign(銷售額=lambda d:d.售賣數量*d.水果價格)

Take the sales column,求和.

#計算銷售額
pd.merge(data, price ,how='left',left_on='Type of sale',right_on='水果種類').assign(銷售額=lambda d:d.售賣數量*d.水果價格)['銷售額'].sum(0)

還是很方便的.merge()Data merging rules are still very complex,To be used according to the specific situation

上一篇文章： Pandas數據分析13——數據框合並(實現excel的Vlookup功能)
下一篇文章： Question about #python#: Why cant I open the web page after setting the proxy ip? It keeps showing the web page error, please help

Python

強推10款Python常用的開發工具

要熟練使用python解決各種問題，開發工具必不可少，其中就

Python personnel information management system

1. Modules involved datetimeos

Master 8 examples of Python dictionary

Click on the above “ Xiaobai s

python語法基礎

目錄1.import 和 from … import 模塊的

無語——真的好用到不行的7個Python小技巧

本文總結了我幾個我在學習python過程中，用到的幾個超好用

Strange ways of calculating pi have increased -- Python uses the laws of physics to simulate the calculation of PI

Calculate the PI with the cons

Pandas uses the split function to split the specific string data column of dataframe into two new data columns and generate a new dataframe

51job crawler + data visualization Python

Python data structure problems

Introduction to Python data structure and algorithm

Django project - order module (next) and data statistics_ 11 [more readable version]

Python data analysis - pandas data structure (dataframe)

Python data analysis science library pandas (statistical analysis and decision)

Python -- data visualization using Matplotlib Library

I read a value from a file. How can I make this data value locate according to the value I read (Language Python)

Python implements the cell filling color of data required in Excel

熱門圖文

為什麼C#的StreamWriter.Close()會引起windows平台上的關閉程序的問題？深度優先遍歷，深度優先解讀.NET Framework中的COM+與MTS java之JMX PHP 5.3和PHP 5.4出現FastCGI Error解決方法 LeetCode Sort List hdu1547之BFS jquery ui-jQueryUI sortable，有一些疑惑希望的到大伙的幫助！

欄目導航