How to calculate correlation between all columns and remove highly correlated ones using pandas?

Editor: Python

Problem:


I have a huge data set, and before machine learning modeling it is commonly suggested that highly correlated descriptors (columns) be removed first. How can I calculate the column-wise correlation and remove columns above a threshold value, say all columns or descriptors with a correlation > 0.8? The headers should also be retained in the reduced data.

Example data set:

GA     PN      PC      MBP    GR     AP
0.033  6.652   6.681   0.194  0.874  3.177
0.034  9.039   6.224   0.194  1.137  3.4
0.035  10.936  10.304  1.015  0.911  4.9
0.022  10.11   9.603   1.374  0.848  4.566
0.035  2.963   17.156  0.599  0.823  9.406
0.033  10.872  10.244  1.015  0.574  4.871
0.035  21.694  22.389  1.015  0.859  9.259
0.035  10.936  10.304  1.015  0.911  4.5

Solution:
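One common way to approach this, sketched here as an illustration and not necessarily the method given in the referenced answer, is to take the absolute correlation matrix from `DataFrame.corr()`, look only at its upper triangle so each pair of columns is considered once, and drop every column that correlates above the threshold with a column to its left. The function name `drop_highly_correlated` and the `SAMPLE` string are assumptions introduced for this example; the data is the sample set from the question.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, kept whitespace-separated.
SAMPLE = """GA PN PC MBP GR AP
0.033 6.652 6.681 0.194 0.874 3.177
0.034 9.039 6.224 0.194 1.137 3.4
0.035 10.936 10.304 1.015 0.911 4.9
0.022 10.11 9.603 1.374 0.848 4.566
0.035 2.963 17.156 0.599 0.823 9.406
0.033 10.872 10.244 1.015 0.574 4.871
0.035 21.694 22.389 1.015 0.859 9.259
0.035 10.936 10.304 1.015 0.911 4.5
"""


def drop_highly_correlated(df, threshold=0.8):
    """Hypothetical helper: drop columns whose absolute Pearson
    correlation with an earlier column exceeds `threshold`."""
    corr = df.corr().abs()
    # Boolean mask of the strict upper triangle (diagonal excluded),
    # so each column pair is inspected exactly once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if any column to its left correlates too strongly.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    # DataFrame.drop keeps the remaining column headers intact.
    return df.drop(columns=to_drop)


df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
reduced = drop_highly_correlated(df, threshold=0.8)
print(reduced.columns.tolist())
```

In the sample, PC and AP correlate at roughly 0.96, so AP is among the columns removed while PC is kept. Which member of a correlated pair survives depends on column order, since this sketch always keeps the leftmost one; if a particular descriptor matters more, reorder the columns before calling the function.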

Reference: https://stackoom.com/en/question/1yuxj
