
Python 3: computing quartiles [filtering abnormal data with a box plot]

Editor: Python

1. Box plot: quartiles

  • To filter abnormal data with a box plot, first compute the lower and upper quartiles, then compute the lower threshold min and the upper threshold max. Together they give the range [min, max] used to judge outliers.
  • The quartiles are the three points (Q1, Q2, Q3) that divide the sorted data into four parts, each containing 25% of the data.
  • The middle quartile Q2 is the median.
  • Q1 is called the lower quartile: the value at the 25% position when the data are sorted from smallest to largest. Q3 is called the upper quartile: the value at the 75% position.
  • Formulas for the outlier threshold range: min = Q1 - 1.5 * (Q3 - Q1) and max = Q3 + 1.5 * (Q3 - Q1). Note that min and max here are thresholds, not the smallest and largest observed values.
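
As a quick illustration of the threshold formulas above, here is a minimal sketch. The quartile values Q1 = 10 and Q3 = 20, the sample list, and the helper name `outlier_bounds` are all assumptions chosen to keep the arithmetic easy; they do not come from the article.

```python
def outlier_bounds(q1, q3):
    """Return the (lower, upper) outlier thresholds computed from Q1 and Q3."""
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical quartiles, chosen only to keep the arithmetic simple
lower, upper = outlier_bounds(10, 20)
print(lower, upper)  # -5.0 35.0

# Values outside [lower, upper] are treated as outliers
sample = [12, 18, 40, -7, 25]
kept = [v for v in sample if lower <= v <= upper]
print(kept)  # [12, 18, 25]
```

With an interquartile range of 10, the thresholds sit 15 units below Q1 and 15 units above Q3, so 40 and -7 are dropped.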

2. Code implementation

Implementation logic:

  • First, get the median Q2 of the whole sample and its index;
  • Then use the index of Q2 to split the sample into two halves;
  • Apply the same logic to each half to get its median, which gives Q1 and Q3;
  • Return Q1, Q2, Q3;
  • Use Q1 and Q3 to compute the lower threshold min and the upper threshold max.
import numpy as np


def do_cal_min_max(q1, q3):
    """Calculate the lower and upper outlier thresholds."""
    # Named min_val/max_val to avoid shadowing the built-in min() and max()
    min_val = q1 - 1.5 * (q3 - q1)
    max_val = q3 + 1.5 * (q3 - q1)
    return min_val, max_val


def get_mid_idx(data):
    """Get the 0-based index of the median.

    For an even length, this is the average of the indexes of the two
    middle numbers (for example 1.5 for a list of 4 items).
    """
    length = len(data)
    if length % 2 == 0:
        idx1 = length // 2 - 1
        idx2 = idx1 + 1
        idx = (idx1 + idx2) / 2
    else:
        # Middle element of an odd-length list. math.ceil(length / 2)
        # would point one position past the median with 0-based indexes.
        idx = length // 2
    return idx


def do_cal_quarter(data):
    """Calculate the quartiles.

    The quartiles are the three points (Q1, Q2, Q3) that divide the sorted
    data into four parts, each containing 25% of the data.
    Q1: lower quartile, the value at the 25% position of the sorted sample.
    Q2: median, the value at the 50% position of the sorted sample.
    Q3: upper quartile, the value at the 75% position of the sorted sample.
    """
    # Sort from smallest to largest (on a copy, so the caller's list is untouched)
    data = sorted(data)
    # Get the median index and the median first
    idx = get_mid_idx(data)
    q2 = np.median(data)
    # Split into two halves around the median index
    part1 = [v for i, v in enumerate(data) if i < idx]
    part2 = [v for i, v in enumerate(data) if i > idx]
    # Get the lower quartile
    q1 = np.median(part1)
    # Get the upper quartile
    q3 = np.median(part2)
    return q1, q2, q3


def main():
    """Main function."""
    data = [-1, -2, -3, -4, -5]
    # Get Q1, Q2, Q3
    q1, q2, q3 = do_cal_quarter(data)
    print(q1, q2, q3)
    # Get the lower threshold and the upper threshold
    min_val, max_val = do_cal_min_max(q1, q3)
    print(min_val, max_val)


if __name__ == '__main__':
    main()
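
The script above prints the thresholds but stops short of actually filtering the data. A minimal sketch of that last step follows, assuming the median-of-halves quartile method described in this article, with the median left out of both halves when the sample length is odd. The function names `quartiles` and `filter_outliers` and the sample values are hypothetical, and the sketch uses the standard-library `statistics.median` instead of NumPy.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the medians of the lower and upper halves."""
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]
    # For an odd length, skip the middle element so it lands in neither half
    upper_half = values[half + 1:] if len(values) % 2 else values[half:]
    return median(lower_half), median(values), median(upper_half)


def filter_outliers(data):
    """Keep only the values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if lower <= v <= upper]


# Made-up sample with one obviously abnormal value
sample = [10, 12, 11, 13, 12, 95, 11]
print(quartiles(sample))        # (11, 12, 13)
print(filter_outliers(sample))  # [10, 12, 11, 13, 12, 11]
```

This sketch needs at least two values, since `statistics.median` raises an error on an empty half. Library routines such as `numpy.percentile` interpolate between data points by default, so their Q1 and Q3 may differ slightly from the median-of-halves result.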
