
Rescue pandas plan (21): getting the start and end of the month that a given date belongs to


    • / Data requirement
    • / Demand processing
    • / Summary

Recently I noticed that many people around me are reluctant to use pandas and are switching to other data-manipulation libraries. As a data worker who talks about pandas all day long, I wrote this series to help more people fall in love with pandas.

About the series:

Series name (article number in the series): the specific need this article addresses

Platform:

  • Windows 10
  • Python 3.8
  • pandas >=1.2.4

/ Data requirement

Recently I was reading a book, published in 2020, about data processing with pandas. One section deals with the sale dates of online retail goods and derives the month-start date for each value in the date column. The data is read as follows:

import pandas as pd
# As in the book, parse InvoiceDate as a datetime column
df = pd.read_csv('Online_Retail.csv.zip', parse_dates=['InvoiceDate'])
df = df.dropna().copy()

PS: the data can be downloaded from GitHub:
https://github.com/lk-itween/FunnyCodeRepository/raw/main/PandasSaved/data/Online_Retail.csv.zip


After dropping rows with missing values, df.shape is (406829, 9).

/ Demand processing

As everyone knows, in the current calendar every month begins on the 1st. There are many ways to obtain that date; this article lists a few.

  • Building the date with datetime

The code given in the book splits each date into year, month, and day, then combines the year and month with day 1 to build a new date.

from datetime import datetime

def get_month_start(x):
    return datetime(x.year, x.month, 1)

df['MonthStart'] = df['InvoiceDate'].map(get_month_start)
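As a minimal, self-contained sketch of the book's approach, the snippet below applies the same function to two timestamps. The sample values are made up for illustration and are not taken from the dataset.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample values standing in for df['InvoiceDate']
sample = pd.Series(pd.to_datetime(['2010-12-01 08:26:00', '2011-02-17 14:05:00']))

def get_month_start(x):
    # Keep the year and month, force the day to 1 (the time of day is dropped)
    return datetime(x.year, x.month, 1)

# map() calls the Python function once per element
month_start = sample.map(get_month_start)
print(month_start)
```

Because map calls the function once per row, the cost grows with the number of rows, which is what the offset-based approaches later in the article avoid.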

  • pandas…MonthBegin, MonthEnd

pandas also has built-in tools for date offsets, so you do not need to write your own logic to get the start and end of a month. There are some pitfalls, though; the cases below came up during the demonstration, together with their fixes.

from pandas.tseries.offsets import MonthBegin, MonthEnd
# Build a demo sample
df2 = pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-29', '2022-9-30',
                      '2022-10-1', '2022-10-2', '2022-10-30', '2022-10-31']).to_frame(name='date')

Set the offset parameter n to 0 to get the start and end of the current month. The month start comes out right only when the date is already the first of the month; every other date is rolled forward to the start of the next month. The month end is obtained correctly in all cases.
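The n=0 behaviour described above can be reproduced with a short sketch. The three dates are a subset of the demo sample, and the variable names are chosen here for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

dates = pd.Series(pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-30']))

# With n=0 the offset rolls forward to the nearest anchor date,
# leaving dates that already sit on the anchor unchanged
begin0 = dates + MonthBegin(0)
end0 = dates + MonthEnd(0)

print(pd.DataFrame({'date': dates, 'MonthBegin(0)': begin0, 'MonthEnd(0)': end0}))
```

Only 2022-09-01 keeps its own month start; 2022-09-02 and 2022-09-30 both land on 2022-10-01, while all three month ends are 2022-09-30.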

Next, set n to 1 to get next month's dates. MonthBegin now returns the start of the next month correctly for every date, whereas MonthEnd returns the end of the next month only when the date is itself a month end; any other date gets the end of the current month.
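The n=1 case can be checked the same way. This is a small sketch on a subset of the demo dates, with illustrative variable names.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

dates = pd.Series(pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-30']))

# With n=1 the offset moves to the next anchor date strictly after each date
begin1 = dates + MonthBegin(1)
end1 = dates + MonthEnd(1)

print(pd.DataFrame({'date': dates, 'MonthBegin(1)': begin1, 'MonthEnd(1)': end1}))
```

Every date maps to 2022-10-01 for the month start, but only 2022-09-30 reaches 2022-10-31 for the month end; the other two stop at 2022-09-30.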

How can the failing cases be fixed? The cases that already work need no further discussion: the start of the next month (MonthBegin with n=1) and the end of the current month (MonthEnd with n=0) are both correct. Adding or subtracting one more offset from those correct results yields the remaining targets.

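Start of this month: add one MonthBegin to reach the start of the next month, then subtract one MonthBegin.

End of next month: roll forward with MonthEnd(0) to the end of this month, then add one MonthEnd.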
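Both corrections can be sketched as follows. The expression for the start of the month matches the one used on df['InvoiceDate'] in this article; the expression for the end of the next month is an assumption derived from the reasoning above, not code taken from the original post.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

dates = pd.Series(pd.to_datetime(['2022-9-1', '2022-9-2', '2022-9-30', '2022-10-31']))

# Start of this month: jump to the start of next month, then step back one month start
this_month_start = dates + MonthBegin() - MonthBegin()

# End of next month: roll forward to the end of this month, then step forward one month end
next_month_end = dates + MonthEnd(0) + MonthEnd()

print(pd.DataFrame({'date': dates,
                    'this_month_start': this_month_start,
                    'next_month_end': next_month_end}))
```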

The values in df['InvoiceDate'] include a time of day, which the month-start and month-end offsets do not strip. Use .dt.floor('D') to truncate to the date first, then apply the offsets.

df['InvoiceDate'].dt.floor('D') + MonthBegin() - MonthBegin()
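To see why the floor step matters, here is a small sketch with made-up timestamps (not rows from the dataset) comparing the result with and without it.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

ts = pd.Series(pd.to_datetime(['2010-12-01 08:26:00', '2011-02-17 14:05:00']))

# Without floor, the time of day survives the round trip through the offsets
with_time = ts + MonthBegin() - MonthBegin()

# With floor, each value is truncated to midnight before the offsets are applied
date_only = ts.dt.floor('D') + MonthBegin() - MonthBegin()

print(with_time)
print(date_only)
```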

This conversion takes far less time than the method given in the book.
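A rough way to check the timing claim yourself is sketched below. It uses synthetic timestamps rather than the retail dataset, the function names by_map and by_offset are made up for this example, and the measured numbers will vary by machine and pandas version.

```python
import timeit
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Synthetic data: 50,000 timestamps one minute apart (an assumption, not the real data)
ts = pd.Series(pd.date_range('2010-12-01', periods=50_000, freq='min'))

def by_map():
    # The book's approach: build a datetime object for every row
    return ts.map(lambda x: datetime(x.year, x.month, 1))

def by_offset():
    # The vectorized approach with date offsets
    return ts.dt.floor('D') + MonthBegin() - MonthBegin()

print('map:   ', timeit.timeit(by_map, number=3))
print('offset:', timeit.timeit(by_offset, number=3))
```

Both functions return the same month-start values, so the comparison measures only the cost of the two techniques.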
(Manual watermark: original post on CSDN by The fate of the sleepers, https://blog.csdn.net/weixin_46281427?spm=1011.2124.3001.5343 , official account A11Dot send)

  • dt methods on period('M') data

pandas can generate period data, and dates can also be converted to periods. Since this article is after the start and end of the month, the dates need to be converted to monthly periods.

df3 = pd.period_range('2021-10', '2022-05', freq='M').to_frame(name='date')

This generates a set of period data with a monthly frequency. Data that is already of datetime type can be converted with the .dt.to_period method.
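For datetime data, the conversion might look like the sketch below; the two dates are illustrative values borrowed from the demo sample.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2022-9-2', '2022-10-31']))

# Convert each timestamp to the monthly period that contains it
periods = dates.dt.to_period('M')

print(periods)
```

The result has dtype period[M], so its dt accessor exposes the period-specific attributes discussed next.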

Compared with the dt accessor on datetime data, the dt accessor on period data additionally offers start_time and end_time, which return the month-start and month-end timestamps of each period.

Because end_time returns the very last instant of the period (23:59:59.999999999), use floor to truncate it to the date.

You can also use dt.asfreq to get the start and end of the month. It takes two parameters:
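A minimal sketch of start_time and end_time on monthly period data, using a shorter range than df3 in the article so the output stays small:

```python
import pandas as pd

df3 = pd.period_range('2021-10', '2021-12', freq='M').to_frame(name='date')

# start_time is already midnight on the first day of each month
month_start = df3['date'].dt.start_time

# end_time is the last instant of the month, so floor it to get a plain date
month_end = df3['date'].dt.end_time.dt.floor('D')

print(pd.DataFrame({'month_start': month_start, 'month_end': month_end}))
```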

freq : str  # Frequency, e.g. A for year, M for month, D for day
how : str {'E', 'S'}
    # End: 'E', 'END', or 'FINISH' for end,
    # Start: 'S', 'START', or 'BEGIN' for start.

To convert the monthly period to a daily period that falls on the start of the month, call it like this:

df3['date'].dt.asfreq('D', how='S')

To return the end of the month instead, call it like this. The parameter name is optional, and the end of the period is the default:

df3['date'].dt.asfreq('D', 'E')
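The two asfreq calls, plus the default behaviour, can be compared in one sketch. It uses a shorter period range than the article's df3, and the variable names are illustrative.

```python
import pandas as pd

df3 = pd.period_range('2021-10', '2021-12', freq='M').to_frame(name='date')

first_day = df3['date'].dt.asfreq('D', how='S')  # first day of each month
last_day = df3['date'].dt.asfreq('D', how='E')   # last day of each month
default = df3['date'].dt.asfreq('D')             # no `how` given: end of period

print(pd.DataFrame({'first_day': first_day, 'last_day': last_day, 'default': default}))
```

The results stay in period form (dtype period[D]) rather than becoming timestamps.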

Compare the running time of the two approaches on the sample data.

asfreq appears to take less time than start_time. Note, however, that the result types differ (asfreq returns period data, start_time returns datetime data), so the attributes and methods available through dt differ as well. If you need to convert a period type to a datetime type, replace asfreq with to_timestamp; the parameters are the same, it takes slightly longer, and the result is similar to that of start_time.
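The difference in result types can be made concrete with a short sketch (a shorter period range than the article's df3 is used, and the variable names are illustrative):

```python
import pandas as pd

df3 = pd.period_range('2021-10', '2021-12', freq='M').to_frame(name='date')

# asfreq keeps the data as periods, now at daily frequency
as_period = df3['date'].dt.asfreq('D', how='S')

# to_timestamp takes the same arguments but returns datetime data
as_stamp = df3['date'].dt.to_timestamp('D', how='S')

print(as_period.dtype)
print(as_stamp.dtype)
```

For the start of the month, as_stamp holds the same values as df3['date'].dt.start_time.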

Notes:

  1. The data used in this article is already of datetime type. If the dates are a Series of strings, convert them with pd.to_datetime(s, format=...), setting format to match the string layout, and then try the methods described in the article.

  2. The source data can be obtained from the link at the beginning of the article.
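For note 1, a conversion from strings might look like this sketch. The slash-separated layout and the sample values are assumptions made for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Hypothetical string dates in year/month/day layout
s = pd.Series(['2022/09/02', '2022/10/31'])

# Parse with an explicit format that matches the strings
dates = pd.to_datetime(s, format='%Y/%m/%d')

# Once parsed, the offset approach from the article applies unchanged
month_start = dates + MonthBegin() - MonthBegin()

print(month_start)
```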

/ Summary

Using examples, this article walked through several ways to get the start and end of a month with pandas; other ways certainly exist. pandas handles dates with vectorized operations, which are simpler and more efficient than initializing datetime objects one by one. As earlier articles in the series also noted, vectorized operations generally run faster than a custom function that initializes an object for each value.

The sudden rain and strong wind make people fall down, God must be impressed.


Made on June 24, 2002

