
Python 3.7 Machine Learning - Day 3


Source code: https://github.com/MLEveryday/100-Days-Of-ML-Code.git

Note: most of the Python code in this article comes from the GitHub repository above (a few tests were added while learning); the accompanying comments are my study notes.

Day 3: Multiple Linear Regression

Basic steps:

Data preprocessing -> train a model on the training set -> predict results

Learning notes (including the test parts)

# Day3:Multiple_Linear_Regression
# 2019.2.15
# coding=utf-8
import warnings
warnings.filterwarnings("ignore")
# Importing the libraries
import pandas as pd
import numpy as np
# Importing the dataset
dataset = pd.read_csv('C:/Users/Ymy/Desktop/100-Days-Of-ML-Code/datasets/50_Startups.csv')
# X takes columns 0 through 3 (":-1" runs from column 0 up to, but not
# including, the last column); Y is column 4
X = dataset.iloc[ : , :-1].values
Y = dataset.iloc[ : , 4 ].values
print("Import data")
print("X")
print(X)
print("Y")
print(Y)
print("----------------")
# Check for missing data: this dataset has no missing values, so that step is omitted
# Encoding categorical data (see Day 1 for a detailed explanation)
# 1. Import the LabelEncoder and OneHotEncoder classes from sklearn.preprocessing
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# 2. Create a LabelEncoder object and apply its fit_transform method
labelencoder = LabelEncoder()
# Label-encode column 3 of X with fit_transform
X[: , 3] = labelencoder.fit_transform(X[ : , 3])
print("labelEncoder")
print("X")
print(X)
# 3. Create a OneHotEncoder object and apply its fit_transform method
# The one-hot encoder encodes column 3 of the samples
onehotencoder = OneHotEncoder(categorical_features = [3])
# One-hot encode column 3 of X, convert the result to an array, and assign it back to X
X = onehotencoder.fit_transform(X).toarray()
print("OneHotEncoder")
print("X")
print(X)
print("----------------")
# Avoiding the Dummy Variable Trap (see Explanation 1 at the bottom)
# Delete column 0 of X, i.e. use two dummy variables to represent the three
# categories 0, 1, 2
X = X[: , 1:]
print("Avoid dummy variable traps")
print("X")
print(X)
print("----------------")
# Splitting the dataset into the Training set and Test set
# (the test set takes 20% of the data)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
print("X_train")
print(X_train)
print("X_test")
print(X_test)
print("Y_train")
print(Y_train)
print("Y_test")
print(Y_test)
print("----------------")
# Fitting Multiple Linear Regression to the Training set
# (same approach as the simple linear regression in Day 2)
# 1. Use the LinearRegression class from sklearn.linear_model
from sklearn.linear_model import LinearRegression
# 2. Create a LinearRegression object and call its fit() method
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
# Predicting the Test set results
# Use the regressor fitted in the previous step to predict the test set
y_pred = regressor.predict(X_test)
# Regression evaluation
# Use r2_score from sklearn.metrics
from sklearn.metrics import r2_score
'''
R^2 (coefficient of determination) regression score function:

def r2_score(y_true, y_pred, sample_weight=None, multioutput="uniform_average"):
    """R^2 (coefficient of determination) regression score function.

    Best possible score is 1.0 and it can be negative (because the model can
    be arbitrarily worse). A constant model that always predicts the expected
    value of y, disregarding the input features, would get a R^2 score of 0.0.
    """

Parameters
----------
y_true : array-like of shape = (n_samples) or (n_samples, n_outputs)
    Ground truth (correct) target values.
y_pred : array-like of shape = (n_samples) or (n_samples, n_outputs)
    Estimated target values.
sample_weight : array-like of shape = (n_samples), optional
    Sample weights.
multioutput : string in ['raw_values', 'uniform_average', 'variance_weighted']
    or None or array-like of shape (n_outputs)
    Defines aggregating of multiple output scores. Array-like value defines
    weights used to average scores. Default is "uniform_average".
    'raw_values' :
        Returns a full set of scores in case of multioutput input.
    'uniform_average' :
        Scores of all outputs are averaged with uniform weight.
    'variance_weighted' :
        Scores of all outputs are averaged, weighted by the variances of each
        individual output.

Returns
-------
z : float or ndarray of floats
'''
# Compute the regression R^2 score and print it
print("The R^2 score is:")
print(r2_score(Y_test,y_pred))
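The `categorical_features` keyword used above triggers the DeprecationWarning visible in the output: it was deprecated in scikit-learn 0.20 and removed in 0.22, which recommends `ColumnTransformer` instead. A minimal sketch of the equivalent preprocessing, shown on a small synthetic sample rather than the full 50_Startups.csv (the rows here are just the first three of the printed X for illustration):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Three illustrative rows in the same layout as the dataset:
# R&D Spend, Administration, Marketing Spend, State
X = np.array([
    [165349.2, 136897.8, 471784.1, 'New York'],
    [162597.7, 151377.59, 443898.53, 'California'],
    [153441.51, 101145.55, 407934.54, 'Florida'],
], dtype=object)

# One-hot encode column 3; pass the numeric columns through unchanged.
# ColumnTransformer places the encoded columns first, then the passthrough ones.
ct = ColumnTransformer([('state', OneHotEncoder(), [3])],
                       remainder='passthrough')
X_enc = ct.fit_transform(X)

# Drop the first dummy column to avoid the dummy variable trap,
# just as X = X[:, 1:] does above.
X_enc = X_enc[:, 1:]
print(X_enc.shape)  # (3, 5): 2 remaining dummy columns + 3 numeric columns
```

No separate LabelEncoder pass is needed in this version; OneHotEncoder handles the string column directly.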

Output

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
RESTART: C:\Users\Ymy\Desktop\100-Days-Of-ML-Code\study_code\Day3\Day 3.py
Import data
X
[[165349.2 136897.8 471784.1 'New York']
[162597.7 151377.59 443898.53 'California']
[153441.51 101145.55 407934.54 'Florida']
[144372.41 118671.85 383199.62 'New York']
[142107.34 91391.77 366168.42 'Florida']
[131876.9 99814.71 362861.36 'New York']
[134615.46 147198.87 127716.82 'California']
[130298.13 145530.06 323876.68 'Florida']
[120542.52 148718.95 311613.29 'New York']
[123334.88 108679.17 304981.62 'California']
[101913.08 110594.11 229160.95 'Florida']
[100671.96 91790.61 249744.55 'California']
[93863.75 127320.38 249839.44 'Florida']
[91992.39 135495.07 252664.93 'California']
[119943.24 156547.42 256512.92 'Florida']
[114523.61 122616.84 261776.23 'New York']
[78013.11 121597.55 264346.06 'California']
[94657.16 145077.58 282574.31 'New York']
[91749.16 114175.79 294919.57 'Florida']
[86419.7 153514.11 0.0 'New York']
[76253.86 113867.3 298664.47 'California']
[78389.47 153773.43 299737.29 'New York']
[73994.56 122782.75 303319.26 'Florida']
[67532.53 105751.03 304768.73 'Florida']
[77044.01 99281.34 140574.81 'New York']
[64664.71 139553.16 137962.62 'California']
[75328.87 144135.98 134050.07 'Florida']
[72107.6 127864.55 353183.81 'New York']
[66051.52 182645.56 118148.2 'Florida']
[65605.48 153032.06 107138.38 'New York']
[61994.48 115641.28 91131.24 'Florida']
[61136.38 152701.92 88218.23 'New York']
[63408.86 129219.61 46085.25 'California']
[55493.95 103057.49 214634.81 'Florida']
[46426.07 157693.92 210797.67 'California']
[46014.02 85047.44 205517.64 'New York']
[28663.76 127056.21 201126.82 'Florida']
[44069.95 51283.14 197029.42 'California']
[20229.59 65947.93 185265.1 'New York']
[38558.51 82982.09 174999.3 'California']
[28754.33 118546.05 172795.67 'California']
[27892.92 84710.77 164470.71 'Florida']
[23640.93 96189.63 148001.11 'California']
[15505.73 127382.3 35534.17 'New York']
[22177.74 154806.14 28334.72 'California']
[1000.23 124153.04 1903.93 'New York']
[1315.46 115816.21 297114.46 'Florida']
[0.0 135426.92 0.0 'California']
[542.05 51743.15 0.0 'New York']
[0.0 116983.8 45173.06 'California']]
Y
[192261.83 191792.06 191050.39 182901.99 166187.94 156991.12 156122.51
155752.6 152211.77 149759.96 146121.95 144259.4 141585.52 134307.35
132602.65 129917.04 126992.93 125370.37 124266.9 122776.86 118474.03
111313.02 110352.25 108733.99 108552.04 107404.34 105733.54 105008.31
103282.38 101004.64 99937.59 97483.56 97427.84 96778.92 96712.8
96479.51 90708.19 89949.14 81229.06 81005.76 78239.91 77798.83
71498.49 69758.98 65200.33 64926.08 49490.75 42559.73 35673.41
14681.4 ]
----------------
labelEncoder
X
[[165349.2 136897.8 471784.1 2]
[162597.7 151377.59 443898.53 0]
[153441.51 101145.55 407934.54 1]
[144372.41 118671.85 383199.62 2]
[142107.34 91391.77 366168.42 1]
[131876.9 99814.71 362861.36 2]
[134615.46 147198.87 127716.82 0]
[130298.13 145530.06 323876.68 1]
[120542.52 148718.95 311613.29 2]
[123334.88 108679.17 304981.62 0]
[101913.08 110594.11 229160.95 1]
[100671.96 91790.61 249744.55 0]
[93863.75 127320.38 249839.44 1]
[91992.39 135495.07 252664.93 0]
[119943.24 156547.42 256512.92 1]
[114523.61 122616.84 261776.23 2]
[78013.11 121597.55 264346.06 0]
[94657.16 145077.58 282574.31 2]
[91749.16 114175.79 294919.57 1]
[86419.7 153514.11 0.0 2]
[76253.86 113867.3 298664.47 0]
[78389.47 153773.43 299737.29 2]
[73994.56 122782.75 303319.26 1]
[67532.53 105751.03 304768.73 1]
[77044.01 99281.34 140574.81 2]
[64664.71 139553.16 137962.62 0]
[75328.87 144135.98 134050.07 1]
[72107.6 127864.55 353183.81 2]
[66051.52 182645.56 118148.2 1]
[65605.48 153032.06 107138.38 2]
[61994.48 115641.28 91131.24 1]
[61136.38 152701.92 88218.23 2]
[63408.86 129219.61 46085.25 0]
[55493.95 103057.49 214634.81 1]
[46426.07 157693.92 210797.67 0]
[46014.02 85047.44 205517.64 2]
[28663.76 127056.21 201126.82 1]
[44069.95 51283.14 197029.42 0]
[20229.59 65947.93 185265.1 2]
[38558.51 82982.09 174999.3 0]
[28754.33 118546.05 172795.67 0]
[27892.92 84710.77 164470.71 1]
[23640.93 96189.63 148001.11 0]
[15505.73 127382.3 35534.17 2]
[22177.74 154806.14 28334.72 0]
[1000.23 124153.04 1903.93 2]
[1315.46 115816.21 297114.46 1]
[0.0 135426.92 0.0 0]
[542.05 51743.15 0.0 2]
[0.0 116983.8 45173.06 0]]
Warning (from warnings module):
File "D:\python\lib\site-packages\sklearn\preprocessing\_encoders.py", line 390
"use the ColumnTransformer instead.", DeprecationWarning)
DeprecationWarning: The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
OneHotEncoder
X
[[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05
4.7178410e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05
4.4389853e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05
4.0793454e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05
3.8319962e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04
3.6616842e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04
3.6286136e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05
1.2771682e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05
3.2387668e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05
3.1161329e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05
3.0498162e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05
2.2916095e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04
2.4974455e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05
2.4983944e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05
2.5266493e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05
2.5651292e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05
2.6177623e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05
2.6434606e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05
2.8257431e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05
2.9491957e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05
0.0000000e+00]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05
2.9866447e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05
2.9973729e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05
3.0331926e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05
3.0476873e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04
1.4057481e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05
1.3796262e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05
1.3405007e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05
3.5318381e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05
1.1814820e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05
1.0713838e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05
9.1131240e+04]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05
8.8218230e+04]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05
4.6085250e+04]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05
2.1463481e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05
2.1079767e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04
2.0551764e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05
2.0112682e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04
1.9702942e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04
1.8526510e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04
1.7499930e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05
1.7279567e+05]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04
1.6447071e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04
1.4800111e+05]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05
3.5534170e+04]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05
2.8334720e+04]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05
1.9039300e+03]
[0.0000000e+00 1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05
2.9711446e+05]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05
0.0000000e+00]
[0.0000000e+00 0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04
0.0000000e+00]
[1.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05
4.5173060e+04]]
----------------
Avoid dummy variable traps
X
[[0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05 4.7178410e+05]
[0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05 4.4389853e+05]
[1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05 4.0793454e+05]
[0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05 3.8319962e+05]
[1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04 3.6616842e+05]
[0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04 3.6286136e+05]
[0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05 1.2771682e+05]
[1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05 3.2387668e+05]
[0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05 3.1161329e+05]
[0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05 3.0498162e+05]
[1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05 2.2916095e+05]
[0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04 2.4974455e+05]
[1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05 2.4983944e+05]
[0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05 2.5266493e+05]
[1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05 2.5651292e+05]
[0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05 2.6177623e+05]
[0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05 2.6434606e+05]
[0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05 2.8257431e+05]
[1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05 2.9491957e+05]
[0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05 2.9866447e+05]
[0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05 2.9973729e+05]
[1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05 3.0331926e+05]
[1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05 3.0476873e+05]
[0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04 1.4057481e+05]
[0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05 1.3796262e+05]
[1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05 1.3405007e+05]
[0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05 3.5318381e+05]
[1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05 1.1814820e+05]
[0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05 1.0713838e+05]
[1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05 9.1131240e+04]
[0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05 8.8218230e+04]
[0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05 4.6085250e+04]
[1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05 2.1463481e+05]
[0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05 2.1079767e+05]
[0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04 2.0551764e+05]
[1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05 2.0112682e+05]
[0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04 1.9702942e+05]
[0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04 1.8526510e+05]
[0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04 1.7499930e+05]
[0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05 1.7279567e+05]
[1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04 1.6447071e+05]
[0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04 1.4800111e+05]
[0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05 3.5534170e+04]
[0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05 2.8334720e+04]
[0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05 1.9039300e+03]
[1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05 2.9711446e+05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05 0.0000000e+00]
[0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05 4.5173060e+04]]
----------------
X_train
[[1.0000000e+00 0.0000000e+00 5.5493950e+04 1.0305749e+05 2.1463481e+05]
[0.0000000e+00 1.0000000e+00 4.6014020e+04 8.5047440e+04 2.0551764e+05]
[1.0000000e+00 0.0000000e+00 7.5328870e+04 1.4413598e+05 1.3405007e+05]
[0.0000000e+00 0.0000000e+00 4.6426070e+04 1.5769392e+05 2.1079767e+05]
[1.0000000e+00 0.0000000e+00 9.1749160e+04 1.1417579e+05 2.9491957e+05]
[1.0000000e+00 0.0000000e+00 1.3029813e+05 1.4553006e+05 3.2387668e+05]
[1.0000000e+00 0.0000000e+00 1.1994324e+05 1.5654742e+05 2.5651292e+05]
[0.0000000e+00 1.0000000e+00 1.0002300e+03 1.2415304e+05 1.9039300e+03]
[0.0000000e+00 1.0000000e+00 5.4205000e+02 5.1743150e+04 0.0000000e+00]
[0.0000000e+00 1.0000000e+00 6.5605480e+04 1.5303206e+05 1.0713838e+05]
[0.0000000e+00 1.0000000e+00 1.1452361e+05 1.2261684e+05 2.6177623e+05]
[1.0000000e+00 0.0000000e+00 6.1994480e+04 1.1564128e+05 9.1131240e+04]
[0.0000000e+00 0.0000000e+00 6.3408860e+04 1.2921961e+05 4.6085250e+04]
[0.0000000e+00 0.0000000e+00 7.8013110e+04 1.2159755e+05 2.6434606e+05]
[0.0000000e+00 0.0000000e+00 2.3640930e+04 9.6189630e+04 1.4800111e+05]
[0.0000000e+00 0.0000000e+00 7.6253860e+04 1.1386730e+05 2.9866447e+05]
[0.0000000e+00 1.0000000e+00 1.5505730e+04 1.2738230e+05 3.5534170e+04]
[0.0000000e+00 1.0000000e+00 1.2054252e+05 1.4871895e+05 3.1161329e+05]
[0.0000000e+00 0.0000000e+00 9.1992390e+04 1.3549507e+05 2.5266493e+05]
[0.0000000e+00 0.0000000e+00 6.4664710e+04 1.3955316e+05 1.3796262e+05]
[0.0000000e+00 1.0000000e+00 1.3187690e+05 9.9814710e+04 3.6286136e+05]
[0.0000000e+00 1.0000000e+00 9.4657160e+04 1.4507758e+05 2.8257431e+05]
[0.0000000e+00 0.0000000e+00 2.8754330e+04 1.1854605e+05 1.7279567e+05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.1698380e+05 4.5173060e+04]
[0.0000000e+00 0.0000000e+00 1.6259770e+05 1.5137759e+05 4.4389853e+05]
[1.0000000e+00 0.0000000e+00 9.3863750e+04 1.2732038e+05 2.4983944e+05]
[0.0000000e+00 0.0000000e+00 4.4069950e+04 5.1283140e+04 1.9702942e+05]
[0.0000000e+00 1.0000000e+00 7.7044010e+04 9.9281340e+04 1.4057481e+05]
[0.0000000e+00 0.0000000e+00 1.3461546e+05 1.4719887e+05 1.2771682e+05]
[1.0000000e+00 0.0000000e+00 6.7532530e+04 1.0575103e+05 3.0476873e+05]
[1.0000000e+00 0.0000000e+00 2.8663760e+04 1.2705621e+05 2.0112682e+05]
[0.0000000e+00 1.0000000e+00 7.8389470e+04 1.5377343e+05 2.9973729e+05]
[0.0000000e+00 1.0000000e+00 8.6419700e+04 1.5351411e+05 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 1.2333488e+05 1.0867917e+05 3.0498162e+05]
[0.0000000e+00 0.0000000e+00 3.8558510e+04 8.2982090e+04 1.7499930e+05]
[1.0000000e+00 0.0000000e+00 1.3154600e+03 1.1581621e+05 2.9711446e+05]
[0.0000000e+00 1.0000000e+00 1.4437241e+05 1.1867185e+05 3.8319962e+05]
[0.0000000e+00 1.0000000e+00 1.6534920e+05 1.3689780e+05 4.7178410e+05]
[0.0000000e+00 0.0000000e+00 0.0000000e+00 1.3542692e+05 0.0000000e+00]
[0.0000000e+00 0.0000000e+00 2.2177740e+04 1.5480614e+05 2.8334720e+04]]
X_test
[[1.0000000e+00 0.0000000e+00 6.6051520e+04 1.8264556e+05 1.1814820e+05]
[0.0000000e+00 0.0000000e+00 1.0067196e+05 9.1790610e+04 2.4974455e+05]
[1.0000000e+00 0.0000000e+00 1.0191308e+05 1.1059411e+05 2.2916095e+05]
[1.0000000e+00 0.0000000e+00 2.7892920e+04 8.4710770e+04 1.6447071e+05]
[1.0000000e+00 0.0000000e+00 1.5344151e+05 1.0114555e+05 4.0793454e+05]
[0.0000000e+00 1.0000000e+00 7.2107600e+04 1.2786455e+05 3.5318381e+05]
[0.0000000e+00 1.0000000e+00 2.0229590e+04 6.5947930e+04 1.8526510e+05]
[0.0000000e+00 1.0000000e+00 6.1136380e+04 1.5270192e+05 8.8218230e+04]
[1.0000000e+00 0.0000000e+00 7.3994560e+04 1.2278275e+05 3.0331926e+05]
[1.0000000e+00 0.0000000e+00 1.4210734e+05 9.1391770e+04 3.6616842e+05]]
Y_train
[ 96778.92 96479.51 105733.54 96712.8 124266.9 155752.6 132602.65
64926.08 35673.41 101004.64 129917.04 99937.59 97427.84 126992.93
71498.49 118474.03 69758.98 152211.77 134307.35 107404.34 156991.12
125370.37 78239.91 14681.4 191792.06 141585.52 89949.14 108552.04
156122.51 108733.99 90708.19 111313.02 122776.86 149759.96 81005.76
49490.75 182901.99 192261.83 42559.73 65200.33]
Y_test
[103282.38 144259.4 146121.95 77798.83 191050.39 105008.31 81229.06
97483.56 110352.25 166187.94]
----------------
The R^2 score is:
0.9347068473282989
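As a sanity check on what r2_score computes, R^2 can also be evaluated directly from its definition, R^2 = 1 - SS_res / SS_tot. A small sketch with made-up numbers (not the test set above):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up ground-truth values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # made-up predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)                 # about 0.9486
print(r2_score(y_true, y_pred))  # same value
```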

Explanations and references

Explanation 1:

Dummy variables and the dummy variable trap

Dummy variables (also called indicator variables, nominal variables, or dichotomous variables) are artificial variables used to represent qualitative attributes; they are quantified qualitative variables, usually taking the value 0 or 1.
Introducing dummy variables makes the linear regression model somewhat more complex, but the description of the problem more concise: one equation can do the work of two, and the model is closer to reality.

For example, a dummy variable for education level could take the value 1 for "bachelor's degree" and 0 for "no bachelor's degree".

As a general convention, the base or "positive" category of a dummy variable is coded 1, and the comparison or "negative" category is coded 0.

The dummy variable trap: when a qualitative variable has m categories, only m-1 dummy variables should be introduced into the model. Introducing all m dummy variables makes the model's explanatory variables perfectly collinear.

The situation where the model cannot be estimated because the number of dummy variables equals the number of qualitative categories is generally called the "dummy variable trap".

Applying this definition to the test above: column 3 of X (the State field) was label-encoded and then one-hot encoded. The label encoder maps the three values New York, California, and Florida to 2, 0, and 1 respectively; one-hot encoding then turns 2, 0, 1 into 001, 100, 010. At that point the 3 categories New York, California, and Florida are represented by 3 dummy variables, which is exactly the dummy variable trap. The "Avoiding Dummy Variable Trap" step therefore deletes the first one-hot column, so that 01, 00, 10 represent New York, California, and Florida respectively, avoiding the trap.
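The same drop-one-column idea can be sketched with pandas.get_dummies, whose drop_first=True option keeps m-1 dummies for m categories (a toy State column for illustration, not the code path used in the script above):

```python
import pandas as pd

states = pd.Series(['New York', 'California', 'Florida', 'New York'])
dummies = pd.get_dummies(states, drop_first=True)

# Categories are sorted alphabetically (California, Florida, New York);
# drop_first removes 'California', which becomes the baseline category.
print(list(dummies.columns))  # ['Florida', 'New York']
```

A row of all zeros in the two remaining columns then means "California", just as 00 means California after deleting the first one-hot column above.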

Explanation 2: since I am just starting to learn, the tests and the corresponding output are fairly detailed (output with many rows is shown in full).

References:
sklearn.metrics.r2_score
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html#sklearn-metrics-r2-score

