
Python-based intelligent financial algorithm: feature mining, data preprocessing, and feature extraction (design report + defense PPT + project source code and dataset)


Contents
1. Code organization and execution order
2. Problem description
3. Feature extraction
(1) dat_risk
(2) dat_symbol
(3) dat_app
(4) dat_edge
4. Feature extraction from the user association graph
(1) Centrality features
(2) Louvain community clustering
5. Label propagation
(1) First-degree contacts
(2) Second-degree contacts
(3) First-degree routing
6. Feature propagation
(1) Feature overview
(2) Data description
(3) Feature extraction
(4) Feature evaluation
7. Data summary and feature comparison
8. Feature sorting and discretization
9. Tunable parameters
10. The mystery of the test-set AUC drop
(1) Differences in the missing rate of user app data
(2) How closely the users' association data are connected
11. Room for improvement
1. Code organization and execution order
2. Problem description
This section establishes shared terminology and a unified notation; everything that follows relies on it. The user feature data is divided into four parts, listed from easiest to hardest to process:
• (1) dat_risk
• (2) dat_symbol
• (3) dat_app
• (4) dat_edge
The user labels and the training, validation, and test sets are provided as:
• (1) sample_train (two columns: id, label)
• (2) valid_id (one column: id)
• (3) test_id (one column: id)
Concatenate the id columns of sample_train, valid_id, and test_id to obtain every id. The resulting data is named all_id and is a 28959*1 DataFrame.
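The concatenation step can be sketched as follows. This is a minimal illustration under assumptions rather than the project's actual code: the three toy tables and their values are invented, and only the names sample_train, valid_id, test_id, and all_id come from the report.

```python
import pandas as pd

# Toy stand-ins for the real tables; the values are made up for illustration.
sample_train = pd.DataFrame({"id": [1, 2, 3], "label": [0, 1, 0]})
valid_id = pd.DataFrame({"id": [4, 5]})
test_id = pd.DataFrame({"id": [6]})

# Stack the id column of each table into a single one-column DataFrame.
all_id = pd.concat(
    [sample_train[["id"]], valid_id[["id"]], test_id[["id"]]],
    ignore_index=True,  # rebuild a clean 0..n-1 index after stacking
)

print(all_id.shape)
```

On the real data the same call should yield the 28959*1 shape stated above, provided the three id lists are stacked without deduplication.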
3. Feature extraction
(1) dat_risk
The output is all_id_dat_risk.
Inner-join dat_risk with all_id:
all_id_dat_risk = pd.merge(all_id, dat_risk, on='id', how='inner')
The goal is to look up the features of every id in all_id. Some ids may have no record in dat_risk; an inner join drops those ids from all_id_dat_risk, and they show up as missing values in the final data.
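To make the join behaviour concrete, here is a small sketch on invented data. The column name risk_score is hypothetical (the report does not list the columns of dat_risk), and the left join is shown only for comparison; the report itself uses the inner join.

```python
import pandas as pd

# Toy data: ids 2 and 4 have no record in dat_risk.
all_id = pd.DataFrame({"id": [1, 2, 3, 4]})
dat_risk = pd.DataFrame({"id": [1, 3], "risk_score": [0.2, 0.9]})  # hypothetical column

# Inner join, as in the report: only ids present in both tables survive.
all_id_dat_risk = pd.merge(all_id, dat_risk, on="id", how="inner")

# Left join, for comparison: every id is kept and absent features become NaN.
left_joined = pd.merge(all_id, dat_risk, on="id", how="left")

print(all_id_dat_risk)
print(left_joined)
```

With the inner join, the ids without a record only reappear as missing values once the per-table results are joined back onto all_id in a later step.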
Reprinted from :http://www.biyezuopin.vip/onews.asp?id=16293

























