
AdaBoost algorithm principle based on single-layer decision trees + Python implementation


Here I'll organize my lab-class implementation of the AdaBoost algorithm, using single-layer decision trees (decision stumps) as the weak classifiers.

As a beginner, when I looked for reference material for the lab I found other people's code full of English abbreviations that were hard to understand, which made it difficult to follow both the implementation details and the principle of the algorithm. The experience was not good.

So the code here avoids abbreviations, and I try to spell out my reasoning as clearly as possible.


Basic concepts

  • Ensemble learning: completes a learning task by combining multiple base classifiers; the base classifiers are usually weak learners.

  • Weak learner: a learner whose accuracy is only slightly better than random guessing. Through ensembling, weak learners can be combined into a strong learner.

  • Bagging and Boosting: the two main ensemble-learning methods for assembling weak classifiers into a strong classifier.

  • AdaBoost is short for adaptive boosting.

  • Bagging: training sets are sampled from the original data set and are independent of each other; each training set is used to train one model. For classification the models vote to produce the result; for regression the result is the mean of the models' outputs.

  • Boosting: every round uses the same training set, but the weight of each sample changes from round to round, and each round's weight distribution depends on the classification results of the previous round. Each weak classifier also has its own weight; classifiers with smaller classification error get larger weights.

  • For example: Bagging + decision trees = random forest; Boosting + decision trees = boosted trees.

  • AdaBoost: compute the sample weights -> compute the error rate -> compute the weak learner's weight -> update the sample weights -> ... repeat.


Principle + code implementation

The idea is to construct weak classifiers and then combine them in a certain way.

from numpy import *

Load data set

The data set provided in the lab class is used (horseColic, the horse colic data set)... I won't say much about this part. Variables with the _array suffix are Python lists; variables with the _matrix suffix are the corresponding arrays converted to NumPy matrices.

def load_dataset(filename):
    # number of columns; the last column is the +1/-1 class label, the rest are features
    num_feature = len(open(filename).readline().split('\t'))
    data_array = []
    label_array = []
    f = open(filename)
    for line in f.readlines():
        line_array = []
        current_line = line.strip().split('\t')
        for i in range(num_feature - 1):
            line_array.append(float(current_line[i]))
        data_array.append(line_array)
        label_array.append(float(current_line[-1]))
    return data_array, label_array
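
Usage is then simply (this snippet assumes the training file used later in the article; the printed values are just a quick check):

data_array, label_array = load_dataset('AdaBoost/horseColicTraining2.txt')
# each row of data_array holds one sample's feature values; label_array holds the +1/-1 labels
print(len(data_array), len(data_array[0]), label_array[:5])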

Constructing a weak classifier from a single-layer decision tree

1. To state the obvious first: a stump is a tree stump, and a single-layer decision tree (decision stump) is a decision tree with only one layer. It simply splits the data into two piles by comparing one feature against a threshold: samples on one side of the threshold are assigned to class -1, the other side to class +1.

def stump_classify(data_matrix, dimension, threshold, inequality_sign):
    # dimension is the index of the feature used for the split, i.e. column `dimension` of data_matrix
    return_array = ones((shape(data_matrix)[0], 1))
    if inequality_sign == 'less_than':
        return_array[data_matrix[:, dimension] <= threshold] = -1.0
    else:
        return_array[data_matrix[:, dimension] > threshold] = -1.0
    return return_array
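
A quick sanity check with hand-made values (toy data, not the article's data set; it assumes the from numpy import * and the function above):

toy = mat([[1.0], [2.0], [3.0]])
# split feature 0 at threshold 1.5: samples with value <= 1.5 go to class -1, the rest stay +1
print(stump_classify(toy, 0, 1.5, 'less_than'))   # column vector [-1, 1, 1]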

2. Traverse all possible inputs to the stump_classify function above and find the best single-layer decision tree (the best one for the given weight vector d). best_stump is that best single-layer decision tree: a dictionary storing the stump's 'dimension' (see the comment in the previous code block), 'threshold', and 'inequality_sign' ('less_than' or 'greater_than').

The code and its comments together should make this clear.

def build_stump(data_array, class_labels, d):
    data_matrix = mat(data_array)
    label_matrix = mat(class_labels).T
    m, n = shape(data_matrix)
    num_steps = 10.0
    best_stump = {}  # stores the information of the best single-layer decision tree for the given weight vector d
    best_class_estimate = mat(zeros((m, 1)))
    '''
    Set the minimum error rate min_error to inf
    For every feature in the data set (first loop):
        For every step (second loop):
            For every inequality sign (third loop):
                Build a single-layer decision tree and test it on the weighted data set
                If its error rate is lower than min_error, make it the best single-layer decision tree
    Return the best single-layer decision tree
    '''
    min_error = inf
    for i in range(n):
        range_min = data_matrix[:, i].min()
        range_max = data_matrix[:, i].max()
        step_size = (range_max - range_min) / num_steps
        for j in range(-1, int(num_steps) + 1):
            for inequality_sign in ['less_than', 'greater_than']:
                threshold = (range_min + float(j) * step_size)
                predicted_values = stump_classify(data_matrix, i, threshold, inequality_sign)  # call stump_classify to get the predictions
                error_array = mat(ones((m, 1)))  # column vector, one entry per sample: 1 where the prediction differs from the label, 0 otherwise
                error_array[predicted_values == label_matrix] = 0
                weighted_error = d.T * error_array  # weighted error rate; AdaBoost uses this to update the weight vector d
                if weighted_error < min_error:
                    min_error = weighted_error
                    best_class_estimate = predicted_values.copy()
                    best_stump['dimension'] = i
                    best_stump['threshold'] = threshold
                    best_stump['inequality_sign'] = inequality_sign
    return best_stump, min_error, best_class_estimate
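
And a quick check of build_stump on toy data (illustrative values, not the article's data set), starting from uniform weights:

toy_data = [[1.0], [2.0], [3.0], [4.0]]
toy_labels = [-1.0, -1.0, 1.0, 1.0]
d = mat(ones((4, 1)) / 4)   # uniform sample weights
stump, error, estimate = build_stump(toy_data, toy_labels, d)
print(stump, error)   # expect a 'less_than' stump with a threshold between 2 and 3 and weighted error 0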

The weighted error rate weighted_error is where the stump interacts with AdaBoost: each round it is used to update the weight vector d.

AdaBoost training and classification

Training

Training returns weak_classifier_array and aggregate_class_estimate: an array holding the trained weak classifiers, and the class estimate of the ensemble.

So the idea is, for each iteration: use build_stump to find the best single-layer decision tree; put it into the weak classifier array; use its error rate to compute the weak learner's weight α; update the weight vector d; update the ensemble's class estimate; and recompute the overall error rate.

The formulas involved:

  • Error rate: ε = (number of misclassified samples) / (total number of samples). In the code the weighted version is used, ε_t = Σ_i D_t(i) · I(h_t(x_i) ≠ y_i), which is exactly weighted_error = d.T * error_array.

  • Using the error rate to compute the weak learner's weight α: α_t = (1/2) · ln((1 − ε_t) / ε_t)

It can be seen that the higher the error rate ε_t, the smaller this classifier's weight. In the code:

alpha = float(0.5 * log((1.0 - error) / max([error, 1e-16])))

(Note: max is given a list here because of the earlier from numpy import *, which shadows the built-in max with NumPy's amax; calling it with two separate arguments would produce a warning.)

  • Update the weight vector d: this makes the misclassified samples receive more attention in the next round.

For the i-th sample, y_i · h_t(x_i) is +1 if it was classified correctly and −1 if it was misclassified, so the two cases merge into a single update formula: D_{t+1}(i) = D_t(i) · exp(−α_t · y_i · h_t(x_i)) / Z_t, where Z_t = sum(D) is the normalization factor. The code implements exactly this merged formula (a small numeric sketch follows after this list).

In the code:

the exponent of exp: exponent = multiply(-1 * alpha * mat(class_labels).T, class_estimate)

d: d = multiply(d, exp(exponent)) followed by d = d / d.sum()

  • The ensemble's class estimate (aggregate_class_estimate) is obtained by adding up the estimate given by each weak classifier, weighted by that classifier's α. In the code this happens inside the loop: aggregate_class_estimate += alpha * class_estimate
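
A small numeric sketch of the two formulas above (toy numbers, not from the article's experiment): one of three equally weighted samples is misclassified, so the weighted error is 1/3.

from numpy import mat, multiply, exp, ones, log
error = 1.0 / 3.0
alpha = float(0.5 * log((1.0 - error) / error))   # about 0.347; a lower error would give a larger alpha
labels = mat([1.0, 1.0, -1.0]).T                  # true labels
class_estimate = mat([1.0, -1.0, -1.0]).T         # the stump got the second sample wrong
d = mat(ones((3, 1)) / 3)                         # uniform weights before the update
exponent = multiply(-1 * alpha * labels, class_estimate)
d = multiply(d, exp(exponent))
d = d / d.sum()                                   # divide by Z_t = sum(D)
print(d.T)                                        # [[0.25 0.5 0.25]]: the misclassified sample's weight grows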

Putting the code directly below should make this clear:

def ada_boost_train_decision_stump(data_array, class_labels, num_iteration=40):
    weak_classifier_array = []
    m = shape(data_array)[0]
    d = mat(ones((m, 1)) / m)  # start with uniform sample weights
    aggregate_class_estimate = mat(zeros((m, 1)))
    for i in range(num_iteration):
        best_stump, error, class_estimate = build_stump(data_array, class_labels, d)  # use build_stump to find the best stump for the current weights
        alpha = float(0.5 * log((1.0 - error) / max([error, 1e-16])))  # compute the classifier weight alpha
        best_stump['alpha'] = alpha
        weak_classifier_array.append(best_stump)  # add the best stump to the weak classifier array
        exponent = multiply(-1 * alpha * mat(class_labels).T, class_estimate)  # update the weight vector d
        d = multiply(d, exp(exponent))
        d = d / d.sum()
        aggregate_class_estimate += alpha * class_estimate  # accumulate the ensemble's class estimate
        aggregate_errors = multiply(sign(aggregate_class_estimate) != mat(class_labels).T, ones((m, 1)))
        error_rate = aggregate_errors.sum() / m
        if error_rate == 0.0:  # stop early once the training error rate reaches 0
            break
    return weak_classifier_array, aggregate_class_estimate

Classification

Just write a loop: run each weak classifier in the weak classifier array obtained above through stump_classify, then accumulate the results weighted by each classifier's α.

def ada_boost_classify(data_array, classifier_array):
    data_matrix = mat(data_array)
    m = shape(data_matrix)[0]
    aggregate_class_estimate = mat(zeros((m, 1)))
    for i in range(len(classifier_array)):
        class_estimate = stump_classify(data_matrix,
                                        classifier_array[i]['dimension'],
                                        classifier_array[i]['threshold'],
                                        classifier_array[i]['inequality_sign'])
        aggregate_class_estimate += classifier_array[i]['alpha'] * class_estimate
    return sign(aggregate_class_estimate)
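
A quick end-to-end check on toy data (hand-made values, not the article's horseColic experiment; it assumes the functions defined above):

toy_data = [[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]]
toy_labels = [1.0, 1.0, -1.0, -1.0, 1.0]
classifiers, _ = ada_boost_train_decision_stump(toy_data, toy_labels, num_iteration=9)
print(ada_boost_classify(toy_data, classifiers).T)   # once the training error reaches 0, this reproduces toy_labels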

The above is the implementation process. The same procedure, written as more regular pseudocode (it mirrors ada_boost_train_decision_stump above):
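
Initialize every sample weight D(i) = 1/m
For each of num_iteration rounds:
    find the best single-layer decision tree for the current weights D (build_stump)
    compute its weight alpha from the weighted error
    append the stump, together with alpha, to the weak classifier array
    update D: increase the weights of misclassified samples, decrease the others, then normalize
    add alpha * (this stump's predictions) to the aggregate class estimate
    if the aggregate estimate already classifies every training sample correctly, stop early
Return the weak classifier array and the aggregate class estimate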

Complete code

from numpy import *


def load_dataset(filename):
    num_feature = len(open(filename).readline().split('\t'))
    data_array = []
    label_array = []
    f = open(filename)
    for line in f.readlines():
        line_array = []
        current_line = line.strip().split('\t')
        for i in range(num_feature - 1):
            line_array.append(float(current_line[i]))
        data_array.append(line_array)
        label_array.append(float(current_line[-1]))
    return data_array, label_array


def stump_classify(data_matrix, dimension, threshold, inequality_sign):
    return_array = ones((shape(data_matrix)[0], 1))
    if inequality_sign == 'less_than':
        return_array[data_matrix[:, dimension] <= threshold] = -1.0
    else:
        return_array[data_matrix[:, dimension] > threshold] = -1.0
    return return_array


def build_stump(data_array, class_labels, d):
    data_matrix = mat(data_array)
    label_matrix = mat(class_labels).T
    m, n = shape(data_matrix)
    num_steps = 10.0
    best_stump = {}
    best_class_estimate = mat(zeros((m, 1)))
    min_error = inf
    for i in range(n):
        range_min = data_matrix[:, i].min()
        range_max = data_matrix[:, i].max()
        step_size = (range_max - range_min) / num_steps
        for j in range(-1, int(num_steps) + 1):
            for inequality_sign in ['less_than', 'greater_than']:
                threshold = (range_min + float(j) * step_size)
                predicted_values = stump_classify(data_matrix, i, threshold, inequality_sign)
                error_array = mat(ones((m, 1)))
                error_array[predicted_values == label_matrix] = 0
                weighted_error = d.T * error_array
                if weighted_error < min_error:
                    min_error = weighted_error
                    best_class_estimate = predicted_values.copy()
                    best_stump['dimension'] = i
                    best_stump['threshold'] = threshold
                    best_stump['inequality_sign'] = inequality_sign
    return best_stump, min_error, best_class_estimate


def ada_boost_train_decision_stump(data_array, class_labels, num_iteration=40):
    weak_classifier_array = []
    m = shape(data_array)[0]
    d = mat(ones((m, 1)) / m)
    aggregate_class_estimate = mat(zeros((m, 1)))
    for i in range(num_iteration):
        best_stump, error, class_estimate = build_stump(data_array, class_labels, d)
        alpha = float(0.5 * log((1.0 - error) / max([error, 1e-16])))
        best_stump['alpha'] = alpha
        weak_classifier_array.append(best_stump)
        exponent = multiply(-1 * alpha * mat(class_labels).T, class_estimate)
        d = multiply(d, exp(exponent))
        d = d / d.sum()
        aggregate_class_estimate += alpha * class_estimate
        aggregate_errors = multiply(sign(aggregate_class_estimate) != mat(class_labels).T, ones((m, 1)))
        error_rate = aggregate_errors.sum() / m
        if error_rate == 0.0:
            break
    return weak_classifier_array, aggregate_class_estimate


def ada_boost_classify(data_array, classifier_array):
    data_matrix = mat(data_array)
    m = shape(data_matrix)[0]
    aggregate_class_estimate = mat(zeros((m, 1)))
    for i in range(len(classifier_array)):
        class_estimate = stump_classify(data_matrix,
                                        classifier_array[i]['dimension'],
                                        classifier_array[i]['threshold'],
                                        classifier_array[i]['inequality_sign'])
        aggregate_class_estimate += classifier_array[i]['alpha'] * class_estimate
    return sign(aggregate_class_estimate)


def main():
    data_array, label_array = load_dataset('AdaBoost/horseColicTraining2.txt')
    weak_classifier_array, aggregate_class_estimate = ada_boost_train_decision_stump(data_array, label_array, 40)
    test_array, test_label_array = load_dataset('AdaBoost/horseColicTest2.txt')
    print(weak_classifier_array)
    predictions = ada_boost_classify(data_array, weak_classifier_array)
    error_array = mat(ones((len(data_array), 1)))
    print('error rate of training set: %.3f%%'
          % float(error_array[predictions != mat(label_array).T].sum() / len(data_array) * 100))
    predictions = ada_boost_classify(test_array, weak_classifier_array)
    error_array = mat(ones((len(test_array), 1)))
    print('error rate of test set: %.3f%%'
          % float(error_array[predictions != mat(test_label_array).T].sum() / len(test_array) * 100))


if __name__ == '__main__':
    main()

The output: first the trained weak classifiers, then the error rates.

[{'dimension': 9, 'threshold': 3.0, 'inequality_sign': 'greater_than', 'alpha': 0.4616623792657674}, {'dimension': 17, 'threshold': 52.5, 'inequality_sign': 'greater_than', 'alpha': 0.31248245042467104}, {'dimension': 3, 'threshold': 55.199999999999996, 'inequality_sign': 'greater_than', 'alpha': 0.2868097320169572}, {'dimension': 18, 'threshold': 62.300000000000004, 'inequality_sign': 'less_than', 'alpha': 0.23297004638939556}, {'dimension': 10, 'threshold': 0.0, 'inequality_sign': 'less_than', 'alpha': 0.19803846151213728}, {'dimension': 5, 'threshold': 2.0, 'inequality_sign': 'greater_than', 'alpha': 0.1884788734902058}, {'dimension': 12, 'threshold': 1.2, 'inequality_sign': 'less_than', 'alpha': 0.1522736899747682}, {'dimension': 7, 'threshold': 1.2, 'inequality_sign': 'greater_than', 'alpha': 0.15510870821690584}, {'dimension': 5, 'threshold': 0.0, 'inequality_sign': 'less_than', 'alpha': 0.1353619735335944}, {'dimension': 4, 'threshold': 28.799999999999997, 'inequality_sign': 'less_than', 'alpha': 0.12521587326132164}, {'dimension': 11, 'threshold': 2.0, 'inequality_sign': 'greater_than', 'alpha': 0.13347648128207715}, {'dimension': 9, 'threshold': 4.0, 'inequality_sign': 'less_than', 'alpha': 0.14182243253771004}, {'dimension': 14, 'threshold': 0.0, 'inequality_sign': 'greater_than', 'alpha': 0.10264268449708018}, {'dimension': 0, 'threshold': 1.0, 'inequality_sign': 'less_than', 'alpha': 0.11883732872109554}, {'dimension': 4, 'threshold': 19.2, 'inequality_sign': 'greater_than', 'alpha': 0.09879216527106725}, {'dimension': 2, 'threshold': 36.72, 'inequality_sign': 'less_than', 'alpha': 0.12029960885056902}, {'dimension': 3, 'threshold': 92.0, 'inequality_sign': 'less_than', 'alpha': 0.10846927663989149}, {'dimension': 15, 'threshold': 0.0, 'inequality_sign': 'less_than', 'alpha': 0.09652967982091365}, {'dimension': 3, 'threshold': 73.6, 'inequality_sign': 'greater_than', 'alpha': 0.08958515309272022}, {'dimension': 18, 'threshold': 8.9, 'inequality_sign': 'less_than', 'alpha': 0.0921036196127237}, {'dimension': 16, 'threshold': 4.0, 'inequality_sign': 'greater_than', 'alpha': 0.10464142217079568}, {'dimension': 11, 'threshold': 3.2, 'inequality_sign': 'less_than', 'alpha': 0.09575457291711578}, {'dimension': 20, 'threshold': 0.0, 'inequality_sign': 'greater_than', 'alpha': 0.09624217440331542}, {'dimension': 17, 'threshold': 37.5, 'inequality_sign': 'less_than', 'alpha': 0.07859662885189661}, {'dimension': 9, 'threshold': 2.0, 'inequality_sign': 'less_than', 'alpha': 0.07142863634550768}, {'dimension': 5, 'threshold': 2.0, 'inequality_sign': 'greater_than', 'alpha': 0.07830753154662261}, {'dimension': 4, 'threshold': 28.799999999999997, 'inequality_sign': 'less_than', 'alpha': 0.07606159074712755}, {'dimension': 4, 'threshold': 19.2, 'inequality_sign': 'greater_than', 'alpha': 0.08306752811081908}, {'dimension': 7, 'threshold': 4.2, 'inequality_sign': 'greater_than', 'alpha': 0.08304167411411778}, {'dimension': 3, 'threshold': 92.0, 'inequality_sign': 'less_than', 'alpha': 0.08893356802801176}, {'dimension': 14, 'threshold': 3.0, 'inequality_sign': 'greater_than', 'alpha': 0.07000509315417822}, {'dimension': 7, 'threshold': 5.3999999999999995, 'inequality_sign': 'less_than', 'alpha': 0.07697582358566012}, {'dimension': 18, 'threshold': 0.0, 'inequality_sign': 'less_than', 'alpha': 0.08507457442866745}, {'dimension': 5, 'threshold': 3.2, 'inequality_sign': 'less_than', 'alpha': 0.06765903873020729}, {'dimension': 7, 'threshold': 3.0, 'inequality_sign': 'greater_than', 'alpha': 
0.08045680822237077}, {'dimension': 12, 'threshold': 1.2, 'inequality_sign': 'less_than', 'alpha': 0.05616862921969537}, {'dimension': 11, 'threshold': 2.0, 'inequality_sign': 'greater_than', 'alpha': 0.06454264376249834}, {'dimension': 7, 'threshold': 5.3999999999999995, 'inequality_sign': 'less_than', 'alpha': 0.053088884353829344}, {'dimension': 11, 'threshold': 0.0, 'inequality_sign': 'less_than', 'alpha': 0.07346058614788575}, {'dimension': 13, 'threshold': 0.0, 'inequality_sign': 'greater_than', 'alpha': 0.07872267320907471}]
error rate of training set: 19.732%
error rate of test set: 19.403%
