程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Pandas vs. SQL 1_ nanyangjx

編輯:Python

The present python It is widely used in data processing and quantitative analysis data mining , This article will focus on python in pandas Under bag DataFrame The use of this data type is described in detail , And its equivalent in data processing SQL The contrast in , Don't talk much , On and off !!!
https://blog.csdn.net/weixin_39818658/article/details/79944925

keyword :pandas、DataFrame、SQL

1、 About filtering data table columns

Yes DataFrame( That is, a two-dimensional data table )data:

Use

data=DataFrame(np.arange(9).reshape(3,3),columns=[‘a’,‘b’,‘c’])

Statement to generate the DataFrame

data :

a b c

0 0 1 2

1 3 4 5

2 6 7 8

If we only take data Of a Column Use a slicing operation directly ( notes : The selection slice operation on the column is a two-layer bracket ):

data[[‘a’]]

--------->

0 0

1 3

2 6

Corresponding SQL Statement for :

select a from data

If we want to take a and b The two fields are also slicing operations :

data[[‘a’,‘b’]]

------>

a b

0 0 1

1 3 4

2 6 7

Corresponding SQL Statement for :

select a,b from data

First, let's create two here DataFrame:

Basic information data sheet of the company

companydata=DataFrame({'ID':['aa','bb','cc'],
'name':[' company A',' company B',' company C'],
'establish_year':['2001','2002','2003'],
'city':['ShangHai','BeiJing','ShenZhen']})

----->

ID city establish_year name

0 aa ShangHai 2001 company A

1 bb BeiJing 2002 company B

2 cc ShenZhen 2003 company C

List of products issued by the company :

productdata=DataFrame({'ID':['aa','aa','bb','bb','cc'],
'productname':['stock_A','bond_A','stock_B','bond_B','stock_C']
})

--------->

ID productname

0 aa stock_A

1 aa bond_A

2 bb stock_B

3 bb bond_B

4 cc stock_C

2、 Index of data 、 selection 、 Filter

(1) Single condition screening

First select ’ company A’ All the information about , The operation is as follows :

companydata[companydata[‘name’]==‘ company A’]
---------------------------------------------------------------------------->
ID city establish_year name

0 aa ShangHai 2001 company A
amount to SQL:
select * from companydata where name=’ company A’

(2) Multi criteria screening ( notes : The multi criteria filter criteria should be written in brackets )

selection ’ company B’ also ’city’ by ’BeiJing’, The operation is as follows :

companydata[(companydata[‘name’]‘ company B’)&(companydata[‘city’]‘BeiJing’)]

-------->

ID city establish_year name

1 bb BeiJing 2002 company B

amount to SQL:

select * from companydata where name=’ company A’ and city=’BeiJing’

selection ’ID’ by ’aa’ perhaps ’cc’ All the information about , The operation is as follows :

companydata[(companydata[‘ID’]‘aa’)|(companydata[‘ID’]‘cc’)]

------>

ID city establish_year name

0 aa ShangHai 2001 company A

2 cc ShenZhen 2003 company C

amount to SQL:

select * from companydata where ID=’aa’ or ID=’cc’

perhaps

companydata[companydata[‘ID’].isin([‘aa’,‘cc’])]

------->

 ID city establish_year name

0 aa ShangHai 2001 company A

2 cc ShenZhen 2003 company C

amount to SQL:

select * from companydata where ID in (‘aa’,’cc’)

(3) Sometimes we don't need all the fields after filtering, but only one or several fields ,

for example :

View company A also ’ID’ by ’aa’ Of ’ Year of establishment ’ and ’ city ’

We need to apply it to DataFrame An index method of ix, The operation is as follows :

companydata.ix[(companydata[‘name’]‘ company A’)&(companydata[‘ID’]‘aa’),[‘city’,‘establish_year’]]

---------->

 city establish_year

0 ShangHai 2001

amount to SQL:

select city,establish_year from companydata where ID=’aa’ and name=’ company A’

3、 Label according to different data , That is to say SQL Medium case when … then … operation

for example : When the city where the company is located ’city’ ,’ShangHai’ When is ’SH’,’BeiJing’ When is ’BJ’,’ShenZhen’ When is ’SZ’, The operation is as follows :

fuc=lambda x:'SH' if x=='ShangHai' else 'BJ' if x=='BeiJing' else 'SZ' if x=='ShenZhen' else np.NaN
companydata['citylabel']=companydata['city'].apply(fuc)
print(companydata)

( I can only think of such a way of writing about this , If there is a clever way , Please grant me your advice )

------->

ID city establish_year name citylabel

0 aa ShangHai 2001 company A SH

1 bb BeiJing 2002 company B BJ

2 cc ShenZhen 2003 company C SZ

amount to SQL:

select
ID,
city,
establish_year,
name,
case ‘city’ when ‘'ShangHai' ’ then ‘SH’
when 'BeiJing' then ‘BJ’
when ‘'ShenZhen' ’ then ‘SZ’ else null
end as citylabel
from companydata

4、 Aggregation of data

example :

surface productdata Yes ID Grouping , And calculate each ID There are several financial products

productdata[productname].groupby(productdata[‘ID’]).count()

------>

ID

aa 2

bb 2

cc 1

Name: ID, dtype: int64

amount to SQL:

select count(productname) from productdata group by ID

productdata[productname].groupby([productdata[‘ID’],productdata[‘productname’]]).count()

----------->

ID productname

aa bond_A 1

 stock_A 1

bb bond_B 1

 stock_B 1

cc stock_C 1

Name: ID, dtype: int64

amount to SQL:

select count(productname) from productdata group by ID,‘productname’

5、 Multi table association operation

In general, data processing is not a single table operation , Generally, multiple tables are associated .

example :

Make statistics on what financial products each company has , The operation is as follows :

pd.merge(companydata,productdata,left_on=‘ID’,right_on=‘ID’,how=‘left’)[[‘ID’,‘name’,‘productname’]]

------>

ID name productname

0 aa company A stock_A

1 aa company A bond_A

2 bb company B stock_B

3 bb company B bond_B

4 cc company C stock_C

amount to SQL:

Select A.ID,A.name,B.productname from companydata A left join productdata on A.ID=B.ID

Here is a brief description of the method merge Parameters of

(1) The first two parameters that must be passed in are two DataFrame , I won't elaborate here

(2)left_on Is the left table as the associated foreign key

(3)right_on Is the right table as the associated foreign key

(4)how Is the way of association , There were left\right\inner\outer, Corresponding to SQL Medium left\right\inner\full Four kinds of join The way


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved