
Data preprocessing roundup: the Python machine learning library sklearn


Contents

  • Data acquisition
  • ① Min-max normalization: MinMaxScaler
    • 1.1 Default call
    • 1.2 Related attributes / parameters
  • ② Per-sample normalization: Normalizer
    • 2.1 Default call
    • 2.2 Related attributes / parameters
  • ③ Standardization: StandardScaler
    • 3.1 Default call
    • 3.2 Related attributes / parameters
  • ④ Binarization: Binarizer
    • 4.1 Default call
    • 4.2 Related attributes / parameters

Data acquisition


Taking the iris dataset as an example, first load the data.

from sklearn.datasets import load_iris
dataset = load_iris()
# print(dataset)
X = dataset.data
y = dataset.target

You can inspect the feature matrix X and the label vector y:

print(X)

print(y)
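
Printing the full arrays is noisy, so a quick shape check can be more convenient. The snippet below is only a sketch of such a check; the variable names mirror the ones used above.

```python
from sklearn.datasets import load_iris

dataset = load_iris()
X, y = dataset.data, dataset.target

print(X.shape)                  # (150, 4): 150 samples, 4 features
print(dataset.feature_names)    # names of the four measured features
print(sorted(set(y.tolist())))  # [0, 1, 2]: three class labels
```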


① Min-max normalization: MinMaxScaler

1.1 Default call

from sklearn.preprocessing import MinMaxScaler
X_transformed = MinMaxScaler().fit_transform(X)
print(X_transformed)

1.2 Related attributes / parameters

The following parameters can be passed in when instantiating MinMaxScaler():

MinMaxScaler(self, feature_range=(0, 1), *, copy=True, clip=False)

  • feature_range defaults to the tuple (0, 1). It is the range that each feature is scaled into.
  • copy defaults to True, which leaves the original X unchanged. With copy=False the original X may be modified in place.
  • clip defaults to False. With clip=True, transformed values of data that falls outside the range seen during fitting are clipped to feature_range.
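
To make feature_range and clip more concrete, here is a minimal sketch on made-up data (X_train and X_new are hypothetical, not part of the iris example). Each column is mapped with x' = (x - min) / (max - min), using the min and max learned in fit; clip only matters for new data outside that range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: one feature whose training range is [0, 10].
X_train = np.array([[0.0], [5.0], [10.0]])
X_new = np.array([[-5.0], [20.0]])  # values outside the training range

plain = MinMaxScaler(feature_range=(0, 1)).fit(X_train)
clipped = MinMaxScaler(feature_range=(0, 1), clip=True).fit(X_train)

train_scaled = plain.transform(X_train)  # 0.0, 0.5, 1.0
new_plain = plain.transform(X_new)       # -0.5, 2.0 (outside the range)
new_clipped = clipped.transform(X_new)   # 0.0, 1.0 (clipped to the range)

print(train_scaled.ravel(), new_plain.ravel(), new_clipped.ravel())
```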

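Example:

from sklearn.preprocessing import MinMaxScaler
# copy=False scales X in place (X is already a float NumPy array), so X itself now lies in [0, 0.5]
MinMaxScaler(feature_range=(0, 0.5), copy=False).fit_transform(X)
print(X)
# Reload the data (X = load_iris().data) before running the later examples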


② Per-sample normalization: Normalizer

2.1 Default call

from sklearn.preprocessing import Normalizer
X_transformed = Normalizer().fit_transform(X)
print(X_transformed)


2.2 Related attributes / parameters

Normalizer(self, norm='l2', *, copy=True)

norm defaults to 'l2' (the letter l, not the digit 1). The available values are 'l1', 'l2' and 'max'.

  • 'l2': each feature value of a sample is divided by the sample's Euclidean norm, that is, the square root of the sum of squares of all feature values of that sample:
    $\displaystyle X_i'=\frac{X_i}{\sqrt{\sum_j X_j^2}}$

  • 'l1': each feature value is divided by the sum of the absolute values of all feature values of the sample.

  • 'max': each feature value is divided by the largest absolute feature value in the sample (for non-negative data such as iris, simply the largest value).

copy has the same meaning as above: whether to work on a copy. The default True copies the data, so the original dataset is not changed.
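
The three norms can be checked by hand on a single made-up row. This is only a sketch; the sample [3, 4] is a hypothetical value chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

row = np.array([[3.0, 4.0]])  # hypothetical sample with two features

l2 = Normalizer(norm='l2').fit_transform(row)   # [[0.6, 0.8]]
l1 = Normalizer(norm='l1').fit_transform(row)   # [[3/7, 4/7]]
mx = Normalizer(norm='max').fit_transform(row)  # [[0.75, 1.0]]

# The same results computed directly with NumPy, row by row.
manual_l2 = row / np.sqrt((row ** 2).sum(axis=1, keepdims=True))
manual_l1 = row / np.abs(row).sum(axis=1, keepdims=True)
manual_max = row / np.abs(row).max(axis=1, keepdims=True)

print(l2, l1, mx)
```

Note that Normalizer works per sample (per row), unlike the scalers in the other sections, which work per feature (per column).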

from sklearn.preprocessing import Normalizer
X_transformed = Normalizer(norm='l1').fit_transform(X)
print(X_transformed)


③ Standardization: StandardScaler

3.1 Default call

from sklearn.preprocessing import StandardScaler
X_transformed = StandardScaler().fit_transform(X)
print(X_transformed)

3.2 Related attributes / parameters

StandardScaler(self, *, copy=True, with_mean=True, with_std=True)

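  • with_mean: whether to center each feature by subtracting its mean (default True).
  • with_std: whether to scale each feature by dividing by its standard deviation (default True).
  • copy: whether to work on a copy (same meaning as above).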
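
As a rough check of what with_mean and with_std do, the sketch below compares StandardScaler with a manual (x - mean) / std computation on the iris features. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Default: subtract the per-feature mean, divide by the per-feature std.
X_std = StandardScaler().fit_transform(X)

# Manual equivalent; NumPy's default std is the population std (ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_std, manual))

# with_std=False only centers the data and leaves the spread unchanged.
centered_only = StandardScaler(with_std=False).fit_transform(X)
print(np.allclose(centered_only, X - X.mean(axis=0)))
```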

④ Binarization: Binarizer

4.1 Default call

The default threshold is 0: values greater than 0 are converted to 1, and values less than or equal to 0 are converted to 0.
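
A small sketch on made-up values shows where the boundary falls. The array sample is hypothetical; a value exactly equal to the threshold maps to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

sample = np.array([[-1.5, 0.0, 0.2, 3.0]])  # hypothetical values

# Default threshold 0.0: only values strictly greater than 0 become 1.
default_out = Binarizer().fit_transform(sample)

# A custom threshold moves the boundary; here only 3.0 exceeds 1.0.
custom_out = Binarizer(threshold=1.0).fit_transform(sample)

print(default_out, custom_out)
```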

from sklearn.preprocessing import Binarizer
X_transformed = Binarizer().fit_transform(X)
print(X_transformed)

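4.2 Related attributes / parameters

Binarizer(self, *, threshold=0.0, copy=True)

  • threshold: the cut-off value. Values greater than it become 1, all others become 0.
  • copy: whether to work on a copy (same meaning as above).

from sklearn.preprocessing import Binarizer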
X_transformed = Binarizer(threshold=3).fit_transform(X)
print(X_transformed)

