
An in-depth look at the pandas sorting mechanism


Author: Peter; Editor: Peter

Hello everyone, I'm Peter.

A previous article described in detail how to sort data with the built-in pandas function sort_values. This article explains how to implement custom sorting in two ways:

  • Using a mapping
  • Using the CategoricalDtype type


Sample data

First, create a simple sample dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],  # nickname
    "math": [100, 120, 130, 111, 100, 128],  # mathematics score
    "english": [140, 80, 120, 90, 125, 116],  # English score
    "size": ["S", "M", "L", "XS", "XL", "L"]  # clothing size
})
df

sort_values

DataFrame.sort_values(
    by,
    axis=0,
    ascending=True,
    inplace=False,
    kind='quicksort',
    na_position='last',  # 'last' or 'first'; the default is 'last'
    ignore_index=False,
    key=None,
)

The parameters mean the following:

  • by: the field(s) or index level(s) to sort by; it can be one or more
  • axis: whether to sort along the rows or the columns; the default, axis=0, sorts the rows
  • ascending: whether to sort in ascending or descending order; the default is ascending
  • inplace: whether to modify the original data in place or return a new DataFrame
  • kind: the sorting algorithm to use: quicksort, mergesort, heapsort, or stable; the default is quicksort
  • na_position: where to place missing values; the default is last, and the other option is first
  • ignore_index: whether to renumber the index of the resulting data frame; the default is False (keep the index of the original data)
  • key: a function applied to the values before sorting
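As a quick illustration of na_position, ignore_index, and key, here is a minimal sketch. The toy frame and its column names are hypothetical and not part of the article's data; the key function is assumed to receive the whole column as a Series and return a Series of the same length.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the article's df), with one missing score
toy = pd.DataFrame({"name": ["b", "a", "C"], "score": [2.0, np.nan, 1.0]})

# Missing values are placed first, and the result gets a fresh 0..n-1 index
by_score = toy.sort_values("score", na_position="first", ignore_index=True)
print(by_score)

# key is applied to the column before sorting; here it makes the
# comparison case-insensitive, so "C" no longer sorts before "a"
by_name = toy.sort_values("name", key=lambda s: s.str.lower())
print(by_name)
```

Without the key, the uppercase "C" would come first, because uppercase letters have smaller character codes than lowercase ones.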

Here are a few simple examples to review how sort_values is used:

Sorting by a single field

Sort by the nick field. Strings are compared by the ASCII codes of their letters, in ascending order by default. If the first letters are the same, the second letters are compared, and so on:
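A minimal sketch of this sort, rebuilding the sample data so that the snippet runs on its own (the variable name by_nick is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# Ascending string sort: "aaa" < "aba" < "abc" < "bbb" < "cac" < "ccc"
by_nick = df.sort_values(by="nick")
print(by_nick)
```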

Sort in ascending order by numeric value:
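For example, assuming the math column is the one being sorted (the original screenshot is not available), a sketch could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# Numeric columns are sorted by value, smallest first
by_math = df.sort_values(by="math")
print(by_math)
```

The two rows with math equal to 100 are tied; with the default quicksort their relative order is not guaranteed, so pass kind="stable" if that matters.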

The sort order can be changed to descending:
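A sketch of the descending case, again assuming the math column is the sort field:

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# ascending=False puts the largest values first
by_math_desc = df.sort_values(by="math", ascending=False)
print(by_math_desc)
```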

Sorting by multiple fields

When sorting by several fields at once, the default is also ascending. Rows with the same value in the first field are then ordered by the second field, also ascending:
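As a sketch, assuming math is the first field and english the second (the article does not show which fields were used):

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# Rows 0 and 4 tie on math=100; english (140 vs 125) breaks the tie,
# so row 4 comes before row 0
by_math_english = df.sort_values(by=["math", "english"])
print(by_math_english)
```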

Different fields can be given different sort orders:
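A sketch with math ascending and english descending; the choice of fields is an assumption, and ascending takes one boolean per field in by:

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# math ascending, english descending: within math=100,
# english=140 (row 0) now comes before english=125 (row 4)
mixed = df.sort_values(by=["math", "english"], ascending=[True, False])
print(mixed)
```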

Then compare the results of the two approaches in full:
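One way to make this comparison, sketched here under the assumption that the two approaches are "both fields ascending" and "math ascending, english descending", is to put the resulting nick order of each sort next to the other (the comparison frame and its column names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

both_asc = df.sort_values(by=["math", "english"])
mixed = df.sort_values(by=["math", "english"], ascending=[True, False])

# Row order of each result, listed by nick
comparison = pd.DataFrame({
    "both_ascending": both_asc["nick"].tolist(),
    "math_asc_english_desc": mixed["nick"].tolist(),
})
print(comparison)
```

Only the two rows tied on math=100 (aaa and cac) swap places; every other row keeps the same position in both results.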

That covers the sort_values method.

Custom sorting

sort_values sorts by the built-in ordering of strings or numbers. What should we do in a situation like the following?

When we sort by the clothing size field size, this is what we get:
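A minimal sketch that reproduces the problem: the size values are plain strings, so they are sorted alphabetically.

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# Alphabetical order: L, L, M, S, XL, XS
by_size = df.sort_values(by="size")
print(by_size)
```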

Clearly, this is not the order we expect. As we understand them, the sizes run:

  • XS: extra small
  • S: small
  • M: medium
  • L: large
  • XL: extra large

How can this be solved? There are two ways:

Method 1: Using a mapping

1. Define a numeric rank for each size value that reflects its position in the desired order

2. Generate a new field, order

3. Sort by order
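The three steps above can be sketched as follows. The dictionary name size_order and the rank values 0 to 4 are assumptions for illustration; only the order column name comes from the article.

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

# 1. Rank of each size in the desired order (hypothetical name and values)
size_order = {"XS": 0, "S": 1, "M": 2, "L": 3, "XL": 4}

# 2. New field holding the rank of each row's size
df["order"] = df["size"].map(size_order)

# 3. Sort by the rank instead of the raw string
by_order = df.sort_values(by="order")
print(by_order)
```

Any size missing from the dictionary would map to NaN and, with the default na_position, end up at the bottom of the result.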

Method 2: Using CategoricalDtype

CategoricalDtype is a type for categorical data with a set of categories and an order, and it can be used to create a custom sortable data type. Official documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.CategoricalDtype.html

1. Define a categorical data type with CategoricalDtype

category_size = pd.CategoricalDtype(
    categories=['XS', 'S', 'M', 'L', 'XL'],
    ordered=True,
)
category_size

2. Convert the size field to the CategoricalDtype defined above
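A sketch of this conversion using astype, with the sample data and the dtype rebuilt so that the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

category_size = pd.CategoricalDtype(
    categories=["XS", "S", "M", "L", "XL"],
    ordered=True,
)

# Each string is replaced by a category; its integer code is the
# position of the value in the categories list
df["size"] = df["size"].astype(category_size)
print(df["size"])
print(df["size"].cat.codes.tolist())
```

A value that is not listed in the categories would become NaN after the conversion, so the category list should cover every size that appears in the data.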

3. Sorting size with sort_values now gives the order we want, the same result as the mapping approach above
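A self-contained sketch of the sort on the converted column (the variable name by_size is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

category_size = pd.CategoricalDtype(
    categories=["XS", "S", "M", "L", "XL"],
    ordered=True,
)
df["size"] = df["size"].astype(category_size)

# Categorical columns sort by category order, not alphabetically
by_size = df.sort_values(by="size")
print(by_size)
```

Unlike the mapping approach, no helper column is needed, and the order travels with the column itself.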

Checking the data types of df also shows that the type of size is now category:
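The check can be sketched with df.dtypes, again rebuilding the data so that the snippet stands alone:

```python
import pandas as pd

df = pd.DataFrame({
    "nick": ["aaa", "bbb", "aba", "abc", "cac", "ccc"],
    "math": [100, 120, 130, 111, 100, 128],
    "english": [140, 80, 120, 90, 125, 116],
    "size": ["S", "M", "L", "XS", "XL", "L"],
})

category_size = pd.CategoricalDtype(
    categories=["XS", "S", "M", "L", "XL"],
    ordered=True,
)
df["size"] = df["size"].astype(category_size)

# One dtype per column; size is reported as category
print(df.dtypes)
```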

