
Saving pandas (20): Counting a retail store's monthly orders



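Lately I have noticed that many people around me are reluctant to use pandas and are switching to other data manipulation libraries. As someone who works with data and talks about pandas all day long, I am writing this series in the hope that more people will come to love pandas.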

About the series titles:

Series name (article number): the specific need this article addresses

Platform:

windows 10
python 3.8
pandas >=1.2.4

/ Data requirements

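I have been reading a book on data processing with pandas, published in 2020. One section works through sales data from an online retailer in which every product in every order is recorded as its own row, so when only the orders matter, the same invoice number appears many times. This article discusses how to count the number of orders per month. The data is loaded as follows: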

import pandas as pd
df = pd.read_csv('Online_Retail.csv.zip', parse_dates=['InvoiceDate'])
df_new = df.dropna().copy()  
# Derive a year-month key, e.g. 201012 for December 2010
df_new['YearMonth'] = df_new['InvoiceDate'].map(lambda x: 100 * x.year + x.month)
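Since the CSV is not bundled with the article, here is a minimal sketch of the same preparation on a made-up frame. The rows and the name `toy` are hypothetical; the key is built with the vectorized `.dt` accessor rather than the lambda above, which should give the same values.

```python
import pandas as pd

# Hypothetical toy rows standing in for Online_Retail.csv (values are made up).
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "540001"],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-01 08:26",
        "2010-12-01 08:28", "2011-01-05 10:00",
    ]),
    "CustomerID": [17850.0, 17850.0, None, 13047.0],
})

# Drop rows with any missing value, as the article does.
toy_new = toy.dropna().copy()

# Same year-month key as the lambda version, computed column-wise.
toy_new["YearMonth"] = (
    toy_new["InvoiceDate"].dt.year * 100 + toy_new["InvoiceDate"].dt.month
)

print(toy_new[["InvoiceNo", "YearMonth"]])
```

Invoice 536365 appears twice here, which is exactly the duplication the rest of the article has to deal with.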

P.S. To obtain the data, reply 【零售】 ("retail") to the official account.

Shape of df_new (rows, columns): (406829, 9)

/ Processing

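Because only the invoice number matters, repeated invoice numbers would distort the statistics, so the invoice numbers must be deduplicated before counting.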

Method 1: the book applies unique and then counts.

df_new.groupby('InvoiceNo')['YearMonth'].unique().value_counts().sort_index()

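pandas has been updated many times since 2020, so the book's approach may no longer run. Here it raises an error: after unique(), each row holds an array-like object (a NumPy array) rather than a scalar, and value_counts cannot hash such values in order to count them.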

Changing the code as follows meets the requirement.

df_new.groupby('InvoiceNo')['YearMonth'].unique().explode().value_counts().sort_index()
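To see what explode changes, here is a small sketch on a made-up frame (the name `toy` and its values are hypothetical). It assumes, as on recent pandas versions, that calling value_counts on array-valued rows raises a TypeError; if your version behaves differently, the try block simply prints nothing.

```python
import pandas as pd

# Hypothetical line items: invoices A1 and A3 each contain two products.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A3"],
    "YearMonth": [201012, 201012, 201012, 201101, 201101],
})

# One row per invoice; each value is an array of that invoice's distinct months.
per_invoice = toy.groupby("InvoiceNo")["YearMonth"].unique()
print(type(per_invoice.iloc[0]))

# Arrays are unhashable, so counting them directly is expected to fail.
try:
    per_invoice.value_counts()
except TypeError as exc:
    print("value_counts failed:", exc)

# explode() turns every array element into its own scalar row,
# after which value_counts works as usual.
flat = per_invoice.explode()
monthly_orders = flat.value_counts().sort_index()
print(monthly_orders)
```

On this toy data the result counts two orders in 201012 and one in 201101.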

(Manual watermark: original work by 宿者朽命 on CSDN, https://blog.csdn.net/weixin_46281427?spm=1011.2124.3001.5343 , WeChat official account A11Dot派)

Method 2: use value_counts on the groupby result to deduplicate, then count.

df_new.groupby('InvoiceNo')['YearMonth'].value_counts().reset_index(name='count')['YearMonth'].value_counts().sort_index()

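The first value_counts deduplicates YearMonth within each invoice. At that point the columns we need sit in the index, so reset_index turns them back into regular columns, and a second value_counts on YearMonth gives the number of orders per month.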
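The chain is easier to follow when split into its intermediate steps. The sketch below does that on a made-up frame; `toy`, `step1`, `step2`, and `monthly` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical line items: invoices A1 and A3 each contain two products.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A3"],
    "YearMonth": [201012, 201012, 201012, 201101, 201101],
})

# Step 1: one entry per distinct (InvoiceNo, YearMonth) pair, held in a MultiIndex;
# the values are how many line items each pair had.
step1 = toy.groupby("InvoiceNo")["YearMonth"].value_counts()
print(step1)

# Step 2: move the index levels back into columns; the line-item counts
# get the explicit column name "count".
step2 = step1.reset_index(name="count")
print(step2)

# Step 3: each invoice now appears once per month, so counting YearMonth
# gives the number of orders per month.
monthly = step2["YearMonth"].value_counts().sort_index()
print(monthly)
```

Passing `name="count"` to reset_index keeps the values column from clashing with the YearMonth index level on pandas versions that name the counted Series after the column.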

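On the same machine, this method runs faster than the one from the book, possibly because unique takes a while to process. It does, however, make the logic more convoluted than it needs to be: pandas can deduplicate directly with drop_duplicates.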
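The article does not report timings, so the following is only a sketch of how one might compare the two methods. The synthetic data, the function names `via_unique` and `via_value_counts`, and the run count are all assumptions; the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 20,000 line items spread over up to 5,000 invoices.
rng = np.random.default_rng(0)
invoice = rng.integers(0, 5_000, size=20_000)
synthetic = pd.DataFrame({
    "InvoiceNo": invoice,
    # Tie the month to the invoice so each invoice falls in exactly one month.
    "YearMonth": 201101 + invoice % 12,
})


def via_unique(df):
    """Method 1 (with explode): unique months per invoice, then count."""
    return (
        df.groupby("InvoiceNo")["YearMonth"]
        .unique()
        .explode()
        .value_counts()
        .sort_index()
    )


def via_value_counts(df):
    """Method 2: value_counts per invoice, reset the index, then count."""
    return (
        df.groupby("InvoiceNo")["YearMonth"]
        .value_counts()
        .reset_index(name="count")["YearMonth"]
        .value_counts()
        .sort_index()
    )


for fn in (via_unique, via_value_counts):
    seconds = timeit.timeit(lambda: fn(synthetic), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per run")
```

A small synthetic frame is unlikely to reproduce the gap seen on the full dataset, so treat the printed times as indicative only.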

Method 3: deduplicate with drop_duplicates, then count.

df_new[['InvoiceNo', 'YearMonth']].drop_duplicates()['YearMonth'].value_counts().sort_index()

Compared with the first two methods, the code is considerably shorter and the processing time is lower as well.
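As a quick sanity check, the sketch below runs the drop_duplicates approach on a made-up frame and compares it with a cross-check that is not from the article: counting distinct invoice numbers per month with `groupby(...).nunique()`. The names `toy`, `pairs`, `by_drop_duplicates`, and `by_nunique` are hypothetical.

```python
import pandas as pd

# Hypothetical line items: invoices A1 and A3 each contain two products.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A3"],
    "YearMonth": [201012, 201012, 201012, 201101, 201101],
})

# Method 3: keep one row per (InvoiceNo, YearMonth) pair, then count months.
pairs = toy[["InvoiceNo", "YearMonth"]].drop_duplicates()
by_drop_duplicates = pairs["YearMonth"].value_counts().sort_index()

# Cross-check (not from the article): distinct invoice numbers per month.
by_nunique = toy.groupby("YearMonth")["InvoiceNo"].nunique()

print(by_drop_duplicates)
print(by_nunique)
```

Both results should agree whenever the goal is the number of distinct invoices seen in each month.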

/ Summary

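This article started from an example in the book, reproduced the book's code, and then improved it step by step with current pandas methods, explaining how the approaches differ while meeting the data requirement. The source data can be obtained as described near the beginning of the article.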

Watch the sky in stillness; listen to the wind and rain at dawn.

Written on June 22, 2022
