程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

Understand the working principle of genetic algorithm (with Python Implementation)

編輯:Python

 Datawhale dried food  

selected from :AnalyticsVidhya, Ed translate : Almost Human

In recent days, ,Analyticsvidhya A paper entitled 《Introduction to Genetic Algorithm & their application in data science》 The article , author Shubham Jain make a personal example as an effective means of convincing others , This paper gives a comprehensive and concise overview of genetic algorithm in easy to understand language , Its practical applications in many fields are listed , The data science application of genetic algorithm is emphasized . The heart of the machine compiles this article .

brief introduction

A few days ago , I set about solving a practical problem —— Sales problems in large supermarkets . After using several simple models to do some feature Engineering , I'm number one on the list 219 name .


Although the result is good , But I still want to do better . therefore , I began to study optimization methods that can improve scores . As a result, I found one , It is called genetic algorithm . After applying it to the supermarket sales problem , In the end, my score leaped to the top of the list .


you 're right , Genetic algorithm alone, I will start from 219 Name jumps directly to 15 name , Terrible! ! I believe that after reading this article , You can also use genetic algorithms freely , And will find out , When it comes to the problem you are dealing with , The effect will also be greatly improved .

Catalog

1、 The origin of genetic algorithm theory

2、 Inspired by Biology

3、 Genetic algorithm defines

4、 Specific steps of genetic algorithm

  • initialization

  • Fitness function

  • choice

  • cross

  • variation

5、 Application of genetic algorithm

  • Feature selection

  • Use  TPOT  Library implementation

6、 The practical application

7、 Conclusion

1、 The origin of genetic algorithm theory

Let's start with Charles · Darwin's famous saying begins :

It is often not the most powerful species that can survive , Not the smartest species , But the most adaptable species .

You may be thinking : What does this sentence have to do with genetic algorithm ? In fact, the whole concept of genetic algorithm is based on this sentence .

Let's use a basic example to explain  :

Let's start with a scenario , Now you are the king of a country , To save your country from disaster , You implemented a set of laws :

  • You choose all the good people , Ask them to expand the number of people through childbirth .

  • This process has been going on for generations .

  • You will find , You already have a whole bunch of good people .

This example is unlikely , But I use it to help you understand the concept . in other words , We changed the input value ( such as : Population ), You can get better output value ( such as : A better country ). Now? , I assume you have a general understanding of the concept , The meaning of genetic algorithm should be related to biology . So let's quickly look at some small concepts , In this way, it can be connected to understand .

2、 Inspired by Biology

I believe you still remember this sentence :「 Cells are the cornerstone of all living things .」 Thus we can see that , In any cell of an organism , All have the same set of chromosomes . So called chromosomes , Means by DNA Composed of Polymers .


Traditionally , These chromosomes can be represented by numbers  0  and 1 Composed of string expression .


A chromosome is made up of genes , These genes actually make up DNA Basic structure ,DNA Each gene in the genome encodes a unique trait , such as , Hair or eye color . I hope you will recall the biological concepts mentioned here before you continue reading . This concludes this section , Now let's look at what genetic algorithm actually means ?

3、 Genetic algorithm defines

First, let's go back to the example we discussed earlier , And sum up what we have done .

  1. First , We have set the initial population size of the people .

  2. then , We define a function , Use it to distinguish good people from bad people .

  3. Again , We choose good people , And let them breed their own offspring .

  4. Last , These descendants have replaced some of the bad guys from the original citizens , And repeat the process .

This is how genetic algorithms actually work , in other words , It basically tries to simulate the process of evolution to some extent .

therefore , To formally define a genetic algorithm , We can think of it as an optimization method , It can try to find some input , With these inputs, we can get the best output value or result . The way genetic algorithms work also comes from biology , See the figure below for the specific process :


Now let's understand the whole process step by step .

4、 Specific steps of genetic algorithm

In order to make the explanation easier , Let's first understand the famous combinatorial optimization problem 「 knapsack problem 」. If you don't understand , Here is a version of my explanation .

such as , You are going on a picnic 1 Months , But you can only carry one weight limit 30  Kilogram backpack . Now you have different necessities , Each of them has its own 「 Survival points 」( Details are given in the table below ). therefore , Your goal is to have a limited backpack weight , Maximize your 「 Survival points 」.


4.1  initialization

Here we use genetic algorithm to solve this knapsack problem . The first step is to define our overall . The population includes individuals , Each individual has its own set of chromosomes .

We know , Chromosomes can be expressed as binary strings , In this case ,1 Genes representing the next position exist ,0  It means losing .( translator's note : The author borrows chromosomes here 、 Genes to solve the knapsack problem in front , So the gene at a specific location represents the item in the knapsack problem table above , For example, the first position is Sleeping Bag, At this time, it is reflected in the chromosome 『 gene 』 The position is the first of the chromosome 『 gene 』.)


Now? , Let's take the 4 Chromosomes are considered as our overall initial value .

4.2  Fitness function

Next , Let's calculate the fitness scores of the first two chromosomes . about A1 Chromosome  [100110]  for , Yes :


Similarly , about A2 Chromosome  [001110]  Come on , Yes :


For this question , We think , When chromosomes contain more survival scores , That means it is more adaptable .

therefore , It can be seen from the picture that , Chromosome 1 Adaptability is stronger than chromosome 2.

4.3  choice

Now? , We can start selecting the right chromosome from the population , To make them mutual 『 mating 』, Have their own next generation . This is the general idea of the selection operation , However, this will lead to the reduction of chromosome differences after several generations , Lost diversity . therefore , We usually do 「 Roulette selection 」(Roulette Wheel Selection method).


Imagine having a roulette , Now let's split it into m Parts of , there m Represents the number of chromosomes in our population . The area occupied by each chromosome on the roulette will be expressed in proportion to the fitness score .


Based on the values in the above figure , We establish the following 「 Wheel disc 」.


Now? , This disc begins to rotate , We will be fixed by the pointer in the figure (fixed point) The area indicated was selected as the first parent . then , For the second parent , We do the same . Sometimes we mark two fixed pointers on the way , Here's the picture :


In this way , We can get two parents in one round . We call this method 「 Random universal selection 」(Stochastic Universal Selection method).

4.4  cross

In the last step , We have selected the parent chromosomes that can produce offspring . So in biological terms , So-called 「 cross 」, Actually, it means reproduction . Now let's look at the chromosomes 1 and 4( Selected in the previous step ) Conduct 「 cross 」, See the picture below :


This is the most basic form of crossover , We call it 「 A single point of intersection 」. Here we randomly select an intersection , then , The chromosome parts before and after the crossing point are cross transferred between chromosomes , So there are new descendants .

If you set two intersections , Then this method is called 「 Multi-point crossover 」, See the picture below :


4.5  variation

If we now look at this problem from a biological point of view , So, please. : Does the offspring produced by the above process have the same traits as their parents ? Is the answer . During the growth of offspring , There will be some changes in their genes , Make them different from their parents . This process we call 「 variation 」, It can be defined as a random change on a chromosome , It is because of variation , There will be diversity in the population .

The following figure is a simple example of variation :


After mutation , We get new individuals , Evolution is complete , The whole process is shown below :


After a round 「 Genetic variation 」 after , We use fitness function to verify these new offspring , If the function determines that they have sufficient Fitness , Then they will be used to replace those chromosomes with insufficient fitness from the population . Here's the problem , What criteria should we use to judge whether the offspring have reached the best fitness level ?

Generally speaking , There are several termination conditions :

  1. It's going on X After iterations , The overall situation has not changed much .

  2. We have defined the number of evolutions for the algorithm in advance .

  3. When our fitness function has reached the predefined value .

Okay , Now I assume that you have basically understood the essentials of genetic algorithm , Now let's use it in the scenario of data science .

5、 Application of genetic algorithm

5.1  Feature selection

Imagine every time you take part in a data science competition , What methods would you use to select those features that are important for the prediction of your target variables ? You often judge the importance of features in the model , Then manually set a threshold , Select the features whose importance is higher than this threshold .

that , Is there any better way to deal with this problem ? In fact, one of the most advanced algorithms for feature selection is genetic algorithm .

Our previous approach to the knapsack problem can be fully applied here . Now? , Let's start with the establishment of 「 Chromosome 」 Overall start , The chromosomes here are still binary strings ,「1」 The representation model contains this feature ,「0  The representation model excludes this feature 」.

however , There is one difference , That is, our fitness function needs to be changed . The fitness function here should be the standard of the accuracy of this competition . in other words , If the predicted value of chromosome is more accurate , Then it can be said that its adaptability is higher .

Now let me assume that you have some idea of this method . I will not explain the process of solving this problem immediately , But let's use it first TPOT Library to implement it .

5.2  use  TPOT  Library to implement

This part is believed to be the final goal you want to achieve when you first read this article . namely : Realization . So first let's take a quick look at TPOT library (Tree-based Pipeline Optimisation Technique, Tree transfer optimization technology ), The library is based on scikit-learn Library establishment . The following figure shows a basic transfer structure .


The gray areas in the figure are marked with TPOT The library implements automatic processing . Genetic algorithm is needed to realize the automatic processing of this part .

We won't go into details here , Instead, apply it directly . In order to be able to use TPOT library , You need to install some TPOT Based on it python library . Let's quickly install them :

# installing DEAP, update_checker and tqdm
pip install deap update_checker tqdm
# installling TPOT
pip install tpot

here , I used it Big Mart Sales( Dataset address :https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/) Data sets , Prepare for implementation , Let's start by downloading training and test files quickly , Here are python Code :

# import basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn import preprocessing
from sklearn.metrics import mean_squared_error
## preprocessing
### mean imputations
train['Item_Weight'].fillna((train['Item_Weight'].mean()), inplace=True)
test['Item_Weight'].fillna((test['Item_Weight'].mean()), inplace=True)
### reducing fat content to only two categories
train['Item_Fat_Content'] = train['Item_Fat_Content'].replace(['low fat','LF'], ['Low Fat','Low Fat'])
train['Item_Fat_Content'] = train['Item_Fat_Content'].replace(['reg'], ['Regular'])
test['Item_Fat_Content'] = test['Item_Fat_Content'].replace(['low fat','LF'], ['Low Fat','Low Fat'])
test['Item_Fat_Content'] = test['Item_Fat_Content'].replace(['reg'], ['Regular'])
train['Outlet_Establishment_Year'] = 2013 - train['Outlet_Establishment_Year']
test['Outlet_Establishment_Year'] = 2013 - test['Outlet_Establishment_Year']
train['Outlet_Size'].fillna('Small',inplace=True)
test['Outlet_Size'].fillna('Small',inplace=True)
train['Item_Visibility'] = np.sqrt(train['Item_Visibility'])
test['Item_Visibility'] = np.sqrt(test['Item_Visibility'])
col = ['Outlet_Size','Outlet_Location_Type','Outlet_Type','Item_Fat_Content']
test['Item_Outlet_Sales'] = 0combi = train.append(test)for i in col:
combi[i] = number.fit_transform(combi[i].astype('str'))
combi[i] = combi[i].astype('object')
train = combi[:train.shape[0]]
test = combi[train.shape[0]:]
test.drop('Item_Outlet_Sales',axis=1,inplace=True)
## removing id variables
tpot_train = train.drop(['Outlet_Identifier','Item_Type','Item_Identifier'],axis=1)
tpot_test = test.drop(['Outlet_Identifier','Item_Type','Item_Identifier'],axis=1)
target = tpot_train['Item_Outlet_Sales']
tpot_train.drop('Item_Outlet_Sales',axis=1,inplace=True)
# finally building model using tpot library
from tpot import TPOTRegressor
X_train, X_test, y_train, y_test = train_test_split(tpot_train, target,
train_size=0.75, test_size=0.25)
tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')


Once the code is running ,tpot_exported_pipeline.py Will be used for path optimization python Code . We can find out ,ExtraTreeRegressor Can best solve this problem .

## predicting using tpot optimised pipeline
tpot_pred = tpot.predict(tpot_test)
sub1 = pd.DataFrame(data=tpot_pred)
#sub1.index = np.arange(0, len(test)+1)
sub1 = sub1.rename(columns = {'0':'Item_Outlet_Sales'})
sub1['Item_Identifier'] = test['Item_Identifier']
sub1['Outlet_Identifier'] = test['Outlet_Identifier']
sub1.columns = ['Item_Outlet_Sales','Item_Identifier','Outlet_Identifier']
sub1 = sub1[['Item_Identifier','Outlet_Identifier','Item_Outlet_Sales']]
sub1.to_csv('tpot.csv',index=False)

If you submit this csv, Then you will find that what I promised at the beginning has not been fully realized . Is it true that I am lying to you ? Of course not. . actually ,TPOT The library has a simple rule . If you don't run TPOT Too long , Then it won't find the most likely way to deliver your problem .

therefore , You have to add evolutionary Algebra , Take a coffee out for a walk , Others to TPOT Just go . Besides , You can also use this library to deal with classification problems . Refer to this document for further information :http://rhiever.github.io/tpot/. Except for the game , In life, we also have many application scenarios where genetic algorithms can be used .

6、  The practical application

Genetic algorithm has many applications in the real world . Here I have listed some interesting scenes , But due to space constraints , I won't go into details one by one .

6.1  Engineering

Engineering design relies heavily on computer modeling and simulation , In this way, the design cycle process can be fast and economical . Genetic algorithm can be optimized here and give a good result .

Related resources :

  • The paper :Engineering design using genetic algorithms

  • Address :http://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=16942&context=rtd

6.2  Transportation and shipping routes (Travelling Salesman Problem, Traveling salesman problem )

This is a very famous question , It has been used by many trading companies to make transportation more time-saving 、 economic . Genetic algorithm is also used to solve this problem .

6.3  robot

Genetic algorithm is widely used in robot field . actually , At present, people are using genetic algorithms to create autonomous learning robots that can act like humans , Its task can be cooking 、 Washing clothes, etc .

Related resources :

  • The paper :Genetic Algorithms for Auto-tuning Mobile Robot Motion Control

  • Address :https://pdfs.semanticscholar.org/7c8c/faa78795bcba8e72cd56f8b8e3b95c0df20c.pdf

7、 Conclusion

I hope to introduce , You now have a good understanding of genetic algorithms , And I can also use TPOT Library to implement it . But if you don't do it yourself , The knowledge of this article is also very limited .

therefore , Please readers, be sure to try to realize it by yourself in the data science competition or life . 

Link to the original text :https://www.analyticsvidhya.com/blog/2017/07/introduction-to-genetic-algorithm/

Sorting is not easy to , spot Fabulous Three even


  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved