程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

[Python] 11 moves comparison pandas double column summation

編輯:Python

official account : Youer cottage

author :Peter
edit :Peter

Hello everyone , I am a Peter~

This paper introduces 11 Two ways to compare Pandas in DataFrame Sum of two columns

  • direct_add

  • for_iloc

  • iloc_sum

  • iat

  • apply( Specified field )

  • apply( For the whole DataFrame)

  • numpy_array

  • iterrows

  • zip

  • assign

  • sum

Send books at the end of the article , Send books at the end of the article , Send books at the end of the article !

The data simulation

In order to have a clear effect , Simulated a 5 Million pieces of data ,4 A field :

import pandas as pd
import numpy as np
data = pd.DataFrame({
    "A":np.random.uniform(1,1000,50000), 
    "B":np.random.uniform(1,1000,50000),
    "C":np.random.uniform(1,1000,50000),
    "D":np.random.uniform(1,1000,50000)
})
data

11 Functions

Here is the passage 11 Three different functions to implement A、C The data of two columns are added and summed E Column

Method 1: Direct additive

hold df The two columns of are added directly

In [3]:

def fun1(df):
    df["E"] = df["A"] + df["C"]

Method 2:for+iloc location

for sentence + iloc Method

In [4]:

def fun2(df):
    for i in range(len(df)):  
        df["E"] = df.iloc[i,0] + df.iloc[i, 2]  # iloc[i,0] location A Columns of data 

Method 3:iloc + sum

iloc Method specifies the sum of columns for all rows :

  • 0: First column A

  • 2: The third column C

In [5]:

def fun3(df):
    df["E"] = df.iloc[:,[0,2]].sum(axis=1)  # axis=1 Means to operate on a column 

Method 3:iat location

for sentence + iat location , Analogy to for + iloc

In [6]:

def fun4(df):
    for i in range(len(df)):
        df["E"] = df.iat[i,0] + df.iat[i, 2]

apply function ( Read only two columns )

apply Method , Just take out AC Two

In [7]:

def fun5(df):
    df["E"] = df[["A","C"]].apply(lambda x: x["A"] + x["C"], axis=1)

apply function ( All df)

For the front DataFrame Use apply Method

In [8]:

def fun6(df):
    df["E"] = df.apply(lambda x: x["A"] + x["C"], axis=1)

numpy Array

Use numpy Array resolution

In [9]:

def fun7(df):
    df["E"] = df["A"].values + df["C"].values

iterrows iteration

iterrows() Iterate over each row of data

In [10]:

def fun8(df):
    for _, rows in df.iterrows():
        rows["E"] = rows["A"] + rows["C"]

zip function

adopt zip The function will now AC Two columns of data are compressed

In [11]:

def fun9(df):
    df["E"] = [i+j for i,j in zip(df["A"], df["C"])]

assign function

Through derived functions assign Generate new fields E

In [12]:

def fun10(df):
    df.assign(E = df["A"] + df["C"])

sum function

At the designated A、C Use... On both columns sum function

In [13]:

def fun11(df):
    df["E"] = df[["A","C"]].sum(axis=1)

result

call 11 Functions , Compare their speed :

Count the mean value of each method , And put them into the same us:

Method result Unified (us) Direct additive 626us626for + iloc9.61s9610000iloc + sum1.42ms1420iat9.2s9200000apply( Only the specified column )666ms666000apply( All columns )697ms697000numpy216us216iterrows3.29s3290000zip17.9ms17900assign888us888sum(axis=1)1.33ms1330
result = pd.DataFrame({"methods":["direct_add","for_iloc","iloc_sum","iat","apply_part","apply_all",
                                  "numpy_arry","iterrows","zip","assign","sum"],
                      "time":[626,9610000,1420,9200000,666000,697000,216,3290000,17900,888,1330]})
result

Visualize in descending order :

result.sort_values("time",ascending=False,inplace=True)
import plotly_express as px
fig = px.bar(result, x="methods", y="time", color="time")
fig.show()

From the results we can see :

  • for Loops are the most time consuming , Use numpy Arrays save the most time , Difference between 4 More than ten thousand times ; Mainly because Numpy Vectorization operation used by array

  • sum function ( Specify axis axis=1) The effect is obviously improved

summary : If we save energy, we will save , Use... As much as possible Pandas perhaps numpy Built in functions to solve .

 Past highlights
It is suitable for beginners to download the route and materials of artificial intelligence ( Image & Text + video ) Introduction to machine learning series download Chinese University Courses 《 machine learning 》( Huang haiguang keynote speaker ) Print materials such as machine learning and in-depth learning notes 《 Statistical learning method 》 Code reproduction album machine learning communication qq Group 955171419, Please scan the code to join wechat group 


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved