
Overseas Financial Risk Control Algorithm Practice (Python)


1. The current state of overseas lending

Since the collapse of P2P lending in China, many small domestic lending institutions have moved into untapped markets in Southeast Asia, Africa and elsewhere, in countries such as Indonesia, India, the Philippines, Thailand, Vietnam and Nigeria.

Looking at the market characteristics of Southeast Asian and African countries: low financial inclusion (in 2017, 30.8% of people in Vietnam had a bank account), high demand for credit (in 2017, 49.0% of people borrowed), and growing Internet penetration (66% in 2018) and mobile connectivity together created very favorable conditions for fintech lending in Southeast Asia and set off a period of unchecked growth.

In these regional loan markets, the credit reporting infrastructure and the economy are usually weak, and most users have poor credentials (they would not meet a bank's lending criteria). As a result, institutions have weaker control over the credit and fraud risk of their borrowers, and microloan default rates are generally high (for some institutions the bad debt rate on loans to new users can reach 20~30%, while the bank bad debt rate is usually around 10%).

Microloan products in Southeast Asia are commonly of the "714 高炮" type: a loan term of 7-14 days, with high late fees or with interest deducted from the principal up front at disbursement (砍頭息, literally "beheading interest"). Some effective annualized rates have reached 300%.
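To make the effect of up-front interest deduction concrete, here is a minimal sketch of a simple (non-compounded) annualized rate. The function name and the loan figures are hypothetical and chosen only for illustration; they are not taken from the project data.

```python
def effective_annualized_rate(principal, upfront_fee, term_days, days_per_year=365):
    """Simple annualized rate of a loan whose fee is withheld at disbursement.

    The borrower receives principal - upfront_fee and repays principal
    after term_days, so the cost is measured against the cash received.
    """
    if not 0 <= upfront_fee < principal:
        raise ValueError("upfront_fee must be in [0, principal)")
    if term_days <= 0:
        raise ValueError("term_days must be positive")
    cash_received = principal - upfront_fee
    period_rate = upfront_fee / cash_received
    return period_rate * days_per_year / term_days


# Hypothetical loan: nominal 1000, 100 withheld up front, repaid after 14 days.
rate = effective_annualized_rate(principal=1000, upfront_fee=100, term_days=14)
print(f"{rate:.0%}")
```

Under these assumed numbers a nominal 10% fee on a 14-day loan already corresponds to a simple annualized rate close to 290%, because the borrower only ever receives 900.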

High interest rates inevitably come with high risk, and this business is also very exposed to being shut down by financial regulation.

2. Overview of the microloan risk control system

With bad debt this high, if a microlender cannot keep user credit risk under control when it lends, even higher interest rates may not cover the credit losses.
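A rough way to see why interest alone may not cover the losses is a break-even calculation. The sketch below assumes a one-period loan, that a defaulted loan returns only a fixed share of principal and no interest, and it ignores funding and operating costs; the function name is hypothetical.

```python
def break_even_period_rate(default_rate, recovery_rate=0.0):
    """Per-loan interest rate r at which expected repayments equal the amount lent.

    Solves (1 - d) * (1 + r) + d * recovery = 1 for r, where d is the
    default rate and recovery is the share of principal recovered on default.
    """
    if not 0 <= default_rate < 1:
        raise ValueError("default_rate must be in [0, 1)")
    if not 0 <= recovery_rate <= 1:
        raise ValueError("recovery_rate must be in [0, 1]")
    expected_loss = default_rate * (1 - recovery_rate)
    return expected_loss / (1 - default_rate)


# Default rates quoted in the text: 20~30% for new users, 4% for returning users.
for d in (0.25, 0.04):
    print(f"default rate {d:.0%}: break-even rate per loan {break_even_period_rate(d):.1%}")
```

With a 25% default rate and no recovery, every repaid loan has to earn about a third of its principal within a single 7-14 day term just to break even, while at 4% the requirement drops to roughly 4%.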

Clearly, risk control is the core of limiting losses in the microloan business. A risk control system usually consists of anti-fraud checks (identity verification, face recognition, blacklists) plus an application scoring model.

Whether risk control works well depends on acquiring and accumulating data. One obvious sign: the default rate for an institution's new users is 20~30% (a considerable share of which is probably fraud), while for its existing users (those who have borrowed before and borrow again) the bad debt rate is only 4%.

In other words, for users whose borrowing history the institution already holds, the bad debt rate is significantly lower. Differences in credit risk control capability are in practice a reflection of a data monopoly advantage. For a microlender, the key to profit is therefore: after acquiring new users through marketing, assess them as accurately as possible with the risk control model and grant a low initial limit; raise the limit once they have built up a good credit record; and retain and grow this group of repeat borrowers.

The main data sources for an overseas microlender's application scoring model are:

  • Internal history records: for example, the number of loan applications and the number of overdue events for the same mobile phone number. Where credit bureau coverage is incomplete, borrowing history within the institution (or an alliance of institutions) is often the most persuasive and effective data.
  • Basic customer information: identity information, contact details, occupation, income, loan purpose and so on. Because online applications are usually not reviewed manually, the reliability of this information is doubtful; it can be cross-checked against other data sources for consistency.
  • Credit bureau reports: the world's three largest commercial personal credit bureaus are Experian, Equifax and TransUnion. They can provide the number of loan applications, loan amounts, the number of credit accounts and similar information. The drawback is that where the credit reporting system is underdeveloped, coverage and record quality are poor (in this project, Experian's actual coverage was verified at about 80%).
  • SMS messages: text messages carry a lot of valuable information, such as unpaid phone bills, bank card income and spending, the number of contacts, everyday chat messages, collection messages from lenders and the number of loan advertisements. Key features can be extracted with simple keyword matching or a bag-of-words model; going further, SMS classification and information extraction (entity extraction) can produce statistics such as the number of collection messages, amounts owed, and income and spending amounts. (Note: collecting SMS data is certainly not compliant. Institutions simply want more data for assurance, and users who urgently need money rarely care about their private data. Some apps are now barred from reading SMS and call logs as regulation keeps tightening.)
  • Phone contacts: can be used for association features such as the number of contacts who are overdue, and for other social information.
  • App data: the number of installed lending apps and social apps, and app usage rates.
  • Login IP, GPS and device ID: can be used for association features, such as the number of overdue loans from the same IP, and for building IP and device blacklists.
  • Bank statement data: salary and other transaction records, which reflect the user's repayment ability more directly.
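Two of the feature types in the list above, keyword matching on SMS text and same-IP overdue counts, can be sketched with the standard library alone. The keyword groups, field names (`ip`, `overdue`) and sample records are hypothetical; a real keyword list would have to be built per language and market.

```python
import re
from collections import Counter

# Hypothetical keyword groups, for illustration only.
KEYWORD_GROUPS = {
    "collection": ("overdue", "repay immediately", "legal action"),
    "loan_ad": ("instant loan", "apply now", "low interest"),
    "income": ("salary", "credited"),
}


def sms_keyword_features(messages):
    """Count, per keyword group, the messages containing at least one keyword."""
    features = {f"sms_{group}_cnt": 0 for group in KEYWORD_GROUPS}
    for text in messages:
        lowered = text.lower()
        for group, keywords in KEYWORD_GROUPS.items():
            if any(keyword in lowered for keyword in keywords):
                features[f"sms_{group}_cnt"] += 1
    features["sms_total_cnt"] = len(messages)
    return features


def bag_of_words(messages):
    """Token counts over all messages (a plain bag-of-words representation)."""
    tokens = Counter()
    for text in messages:
        tokens.update(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens


def overdue_count_by_ip(applications):
    """Number of overdue applications per login IP (an association feature)."""
    counts = Counter()
    for application in applications:
        if application["overdue"]:
            counts[application["ip"]] += 1
    return counts


messages = [
    "Your loan is OVERDUE, repay immediately",
    "Salary of 5,000,000 credited to your account",
    "Instant loan! Apply now",
    "See you tomorrow",
]
applications = [
    {"ip": "10.0.0.1", "overdue": True},
    {"ip": "10.0.0.1", "overdue": True},
    {"ip": "10.0.0.2", "overdue": False},
]
sms_features = sms_keyword_features(messages)
word_counts = bag_of_words(messages)
ip_counts = overdue_count_by_ip(applications)
print(sms_features, ip_counts)
```

Substring matching is deliberately crude here; the classification and entity-extraction approaches mentioned in the list would replace it where amounts or senders need to be parsed.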

3. Application scoring model in practice

3.1 Credit report feature processing

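This project is based on about 500 recent microloan transactions from a Southeast Asian country (the data comes from the Internet and will be removed on request). It retrieves the corresponding Experian credit report data and uses Python to build sliding-window features from the reports, such as the number of loans in the last 30 days, the average amount, the interval since the most recent loan and the number of historical overdue events, and then builds an application scoring model with LightGBM.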

The raw Experian credit report message contains basic personal information, recent loan information, and historical performance on credit cards, loans and similar products. The following code slides time windows over the report and extracts the corresponding features.

# For the full code, follow the WeChat official account "算法進階" or visit https://github.com/aialgorithm/Blog
# The helpers con_list, diff_date, to_float_ornan and fun_dict come from the full code and are not shown here.
import datetime as ddt

import numpy as np


def add_fea_grids(fea_dict, mult_datas, apply_dt='20200101', dt_key='Open_Date',
                  calc_key="data['Amount_Past_Due']",
                  groupfun=('count', 'sum', 'median', 'mean', 'max', 'min', 'std'),
                  dt_grids=(7, 30, 60, 360, 9999)):
    """
    Build sliding-time-window features from a credit report: for the last N days,
    compute the count, mean, sum, etc. of field A.
    fea_dict: dictionary that stores the final features
    mult_datas: the report's repeated records (multiple values)
    calc_key: expression giving the field's position inside a record; evaluated with eval()
    """
    new_fea = {}  # raw values collected for each time window
    for _dt in dt_grids:
        new_fea.setdefault(_dt, [])  # initialize every time window
    fea_prefix = calc_key.split("'")[-2] + str(len(calc_key))  # feature-name prefix
    if mult_datas:
        mult_datas = con_list(mult_datas)
        for data in mult_datas:
            # Keep only records dated before the application date; the report
            # should match what a real-time call at application time returns.
            if len(data[dt_key]) >= 4 and data[dt_key] < apply_dt:
                for _dt in dt_grids:
                    # Keep records from the last N days; 9999 means no time filter.
                    if (_dt == 9999) or (
                            ddt.datetime.strptime(str(data[dt_key]), "%Y%m%d")
                            >= ddt.datetime.strptime(str(apply_dt), "%Y%m%d")
                            - ddt.timedelta(days=_dt)):
                        if "Date" in calc_key or "Year" in calc_key:
                            # Date-type field: use the interval to the application date.
                            fea_value = diff_date(apply_dt, eval(calc_key))
                        elif "mean" in groupfun:
                            # Numeric aggregation requested: convert to float or NaN.
                            fea_value = to_float_ornan(eval(calc_key))
                        else:
                            # Any other type is handled as a string value.
                            fea_value = eval(calc_key)
                        new_fea[_dt].append(fea_value)  # e.g. {30: [2767.0, 0.0]}
    for _k, data_list in new_fea.items():  # generate the final features
        for fun in groupfun:
            fea_name = fea_prefix + '_' + fun + '_' + str(_k)
            fea_dict.setdefault(fea_name, [])
            if len(data_list) > 0:
                final_value = fun_dict[fun](data_list)
            else:
                final_value = np.nan
            fea_dict[fea_name].append(final_value)
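The function above relies on helpers from the full repository, so it cannot be run from this page alone. As a rough, self-contained illustration of the same sliding-window idea, the sketch below uses only the standard library and avoids `eval`. The name `window_features`, the aggregation table and the sample records are assumptions made for this example, not the author's implementation.

```python
import datetime as dt
import math
import statistics

# Assumed aggregation table; the article's fun_dict is not shown.
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "mean": statistics.fmean,
    "max": max,
    "min": min,
}


def window_features(records, apply_dt, value_key, date_key="Open_Date",
                    windows=(7, 30, 9999)):
    """Aggregate value_key over records opened in the last N days before apply_dt."""
    apply_date = dt.datetime.strptime(apply_dt, "%Y%m%d")
    features = {}
    for days in windows:
        values = []
        for record in records:
            opened = dt.datetime.strptime(record[date_key], "%Y%m%d")
            if opened >= apply_date:
                continue  # not yet visible at application time
            # 9999 means "no time filter", as in the article's dt_grids.
            if days == 9999 or opened >= apply_date - dt.timedelta(days=days):
                values.append(float(record[value_key]))
        for name, func in AGGREGATIONS.items():
            key = f"{value_key}_{name}_{days}"
            features[key] = func(values) if values else math.nan
    return features


# Hypothetical records in the shape of a credit report's loan list.
records = [
    {"Open_Date": "20191228", "Amount_Past_Due": "2767"},
    {"Open_Date": "20191210", "Amount_Past_Due": "0"},
    {"Open_Date": "20180501", "Amount_Past_Due": "150"},
    {"Open_Date": "20200105", "Amount_Past_Due": "999"},  # after application date
]
feats = window_features(records, "20200101", "Amount_Past_Due")
print(feats["Amount_Past_Due_mean_30"])
```

Excluding records dated on or after the application date mirrors the filter in the original function and keeps information from after the lending decision out of the features.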

3.2 Feature selection

Because credit reports are private, this project provides only one sample report for the feature processing step. After the features are processed and selected, they are joined with the overdue label to form the final wide feature table below.

import pandas as pd

df2 = pd.read_pickle('filter_feas_df.pkl')
print(df2.label.value_counts())  # overdue samples are labelled label == 0
df2.head()

3.3 Model training

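# model_metrics2 is a helper from the full code and is not shown here.
import lightgbm
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(
    df2.drop('label', axis=1), df2.label, random_state=1)
lgb = lightgbm.LGBMClassifier()
# Early stopping and log silencing are passed as callbacks; the fit() keyword
# arguments early_stopping_rounds and verbose were removed in LightGBM 4.0.
lgb.fit(train_x, train_y,
        eval_set=[(test_x, test_y)],
        eval_metric='auc',
        callbacks=[lightgbm.early_stopping(50, verbose=False),
                   lightgbm.log_evaluation(0)])
print('train ', model_metrics2(lgb, train_x, train_y))
print('test ', model_metrics2(lgb, test_x, test_y))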
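The `model_metrics2` helper is not shown in the article. The sketch below is not its implementation; it is a plain-Python illustration of two metrics commonly reported for scoring models, AUC and KS, assuming binary labels where 1 marks the positive class and a higher score means "more likely positive". The function name `auc_and_ks` is hypothetical.

```python
def auc_and_ks(y_true, y_score):
    """AUC via the rank-sum identity and KS as the maximum |TPR - FPR|."""
    pairs = sorted(zip(y_score, y_true), key=lambda pair: pair[0])
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")

    # Sum of the positives' ranks, using average ranks for tied scores.
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    # Sweep thresholds from the highest score down, one tie group at a time.
    ks, true_pos, false_pos = 0.0, 0, 0
    i = len(pairs) - 1
    while i >= 0:
        j = i
        while j >= 0 and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            j -= 1
        ks = max(ks, abs(true_pos / n_pos - false_pos / n_neg))
        i = j
    return auc, ks


auc, ks = auc_and_ks([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, ks)
```

An AUC of 0.5 corresponds to random ranking, which puts a test AUC of about 0.6 in context: the model ranks a randomly chosen positive above a randomly chosen negative only about 60% of the time.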

Using only features from the credit report data, the model identifies overdue users only moderately well: the test AUC is only about 60% (adding SMS data, borrowing history and similar data later should help). An analysis of the model's important features shows that the main ones are:
