程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

10 quickstart Query function USES the Pandas Query examples

編輯:Python

pandas.的queryFunction provides us with a written query filter conditions easier way,Especially in the query conditions a lot,In this article to sort out10個示例,掌握著10The use of an instance you can easilyqueryFunction to solve the problem of any query.

首先,將數據集導入pandas DataFrame - df

import pandas as pd
df = pd.read_csv("Dummy_Sales_Data_v1.csv")
df.head()

它是一個簡單的9999 x 12數據集,是使用Faker創建的,I will provide at the end of this article all the source code.

在開始之前,先快速回顧一下pandas -The query functionquery.The query function is used to according to the specified expression to extract the record,並返回一個新的DataFrame.Expression is in the form of a string of conditions or a combination of.

PANDAS DATAFRAME(.loc和.iloc)Property is used to extract the data according to the row and column labels and index set a subset of the.因此,It doesn't have the flexibility of query.The bracket symbol[]Can be flexibly based on filter data frames,But if the conditions for a lot of code is tedious and error-prone.

pandas query()Functions can be flexibly according to one or more of the conditions of extracting subset,These conditions are written expression and does not need to consider nested parentheses

在後端pandas使用eval()Function to parse and evaluate the expression,And return the expression is evaluated asTRUESubset of data or records.So to filterpandas DataFrame,Need to do is specified in the query function requirement.

Using a single condition to filter

Under the condition of a single filter,在Query()Function of expression contains only one condition.Returns the output will contain the expression evaluation is really all the lines.

示例1

提取數量為95的所有行,So the conditions in the logical form can be written as -

Quantity == 95

Need to write it conditions the string,Is the packaging in double quotes“”中.query代碼如下

df.query("Quantity == 95")

看起來很簡單.It returns the number of95的所有行.If written in general query ways of:

df [df [“Quantity”] == 95]

但是,If you want to be in the same column contains a condition to do again?

It adds a pair of square brackets in bracket symbol,如果是3A condition or more?So he became difficult to manage.這就是Query的優勢了.

In more than one filter

Under the condition of one or more filter,query()Grammar is unchanged

But you need to specify two or more conditions to filter the way

  • and:Meet two conditions to all records
  • or:Return to satisfy any conditions all records

示例2

查詢數量為95&The unit price for182 ,Here include the unit price column is calledUnitPrice(USD)

因此,條件是 -

Quantity == 95
UnitPrice(USD) == 182

那麼代碼就是:

df.query("Quantity == 95 and UnitPrice(USD) == 182")

This query, an:

但是為什麼報錯?

這是因為query()Function of the column name has some limits.列名稱UnitPrice(USD)是無效的.We will use the quotation marks the column name contains up

df.query("Quantity == 95 and `UnitPrice(USD)` == 182")

當兩個條件滿足時,只有3個記錄.

Or we directly change the column to the format of the reasonable:

df.rename(columns={'UnitPrice(USD)':'UnitPrice',
'Shipping_Cost(USD)':'Shipping_Cost',
'Delivery_Time(Days)':'Delivery_Time'},
inplace=True)

Here there is no need to use the quotes:

df.query("Quantity == 95 and UnitPrice == 182")

示例3

Now we only need to meet one condition:

df.query("Quantity == 95 or UnitPrice == 182")

It returns to meet two conditions of a condition of all the columns.

我們也可以使用 | 替代 or關鍵字.

示例4

False assumptions for quantity is not equal to95的所有行.The simplest answer is in the condition before usingnotThe keyword or negation operator〜

df.query("not (Quantity == 95)")

Results it contains quantity is not95的所有行.

Not necessarily equal operator actually conditions here,可以從==,!=,>,<,≥,≤中選擇,例如

df.query("Quantity != 95")

Text filtering column

For text column filter,On condition that the column name with string comparison.

請Query()Expression is a string.So how do you write a string in another string?To pack text values in single quotation marks“”中,就可以了

示例5

Want to get the state“未發貨”所有記錄,可以在query()The form of written expression as follows:

df.query("Status == 'Not Shipped'")

It returns all the records,The status column contains the value - “未發貨”.

Similar to numerical value can be in the same or different columns using multiple conditions,And can be numerical and non-numerical column on the condition of combination.

除此以外, Pandas Query()Can also be used in the query expression math

In the query simple math

Mathematical operations can be listed in the,減,乘,除,And even values in a column or square, etc,如下所示:

示例6

df.query("Shipping_Cost*2 < 50")

Although the quadratic operation without any practical significance,But our sample returned all meet the requirements of line.

We can on one or more columns contains some complicated calculations.

示例7

Let's write a more complicated formula:

df.query("Quantity**2 + Shipping_Cost**2 < 500")

If you use the most primitive[]的形式,The formula of the query basically can't complete,但是使用query()Function was simple

In addition to the mathematical operations,Can also be used in the query expression built-in function.

In the query built-in function

Python內置函數,例如SQRT(),ABS(),Factorial(),EXP()等,Can also be used in the query expression.

示例8

Find the unit price of square root more than15的行

df.query("sqrt(UnitPrice) > 15")

query()Function can also be in the same query expression integrating function and mathematics using

示例9

df.query("sqrt(UnitPrice) < Shipping_Cost/2")

到目前為止,All query examples are about numerical and text columns.但是,query()Also is not limited to these data types,For the date/time values Query()Function can also be very flexible filter.

Date/time column filter

使用Query()Function in the query on the date/time values only requirement is that,Contains column shall be the data type of the valuesdateTime64 [ns]

在示例數據中,OrderDateColumn is the date/time,但是我們的dfIts resolve to strings,So we need to transform first:

df["OrderDate"] = pd.to_datetime(df["OrderDate"], format="%Y-%m-%d")

In order to extract useful information about the date and in theQuery()需要使用DT提取器,DTIs a kind of access object,Used to extract date/time,例如DateTime系列的屬性.

示例10

All the records for August

df.query("OrderDate.dt.month == 8")

All records are in August.OrderDate.dt.month顯示了如何使用DTVisitors to extract only the date value in value.

如果提取2021年8On order for15All orders or more,可以寫成這樣

df.query("OrderDate.dt.month == 8 and OrderDate.dt.year == 2021 and OrderDate.dt.day >=15")

DTUseful and can be in the same column is a combination of multiple conditions,But expression seems too long.So it can be very simple expression to filter through writing more:

df.query("OrderDate >= '2021-08-15' and OrderDate <= '2021-08-31'")

We passed directly a conform to the date format string,It will automatically convert and compare

All of the above content integration:

df.query("OrderDate >= '2021-08-15' and OrderDate <= '2021-08-31' and Status == 'Delivered'")

The query expression contains a date/time columns of text and conditions,It returns all the records of the accord with the query expression

替換

The query above will generate a newdf.這是因為:query()的第二個參數(inplace)默認false.

與一般的pandasProvide the function of,Inplace的默認值都是false,The query will not modify the original data set.If we want to cover the originaldf時,需要將intplace = true.But must be careful when you useintplace = true,Because it will cover the original data.

總結

我希望在閱讀本文後,You can be more frequent,Fluently usePandas Query()函數,因為QueryCan be convenient to filter data sets.The query function more or less every day I use.

In this paper, all the sample code here:

https://avoid.overfit.cn/post/2f07763913a948a5b074e2430b8b2b8e


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved