
Use a Python program to split a dataset proportionally (training set + test set)


There is a dataset of retail product images, goods, that has not yet been split. In the goods directory, each subdirectory is named after an image category and contains multiple images of that category. The file structure is shown in Figure 1. We now need to split it into a training set and a test set in a given proportion.

Figure 1. Schematic diagram of data set division

The idea is to create a new directory tree with the same structure, then randomly select a share of the images in each category of the original tree and move them to the matching category in the new tree. The complete code is as follows:

import os
import random
import shutil

# Source dataset path and target dataset path
path_source = './goods'
path_target = './goods_test'

# Parameters: source path, target path, and the proportion of the test set
def seperate(path_source, path_target, percent):
    # Generate a list of all directory names under path_source
    categories = os.listdir(path_source)
    for name in categories:
        # Create a subdirectory with the same name under path_target
        os.makedirs(os.path.join(path_target, name))
        # Generate a list of all images in the subdirectory
        nums = os.listdir(os.path.join(path_source, name))
        # Randomly select a share of the images according to the proportion
        nums_target = random.sample(nums, int(len(nums)*percent))
        # Move the selected images to the target path
        for pic in nums_target:
            shutil.move(os.path.join(path_source, name, pic), os.path.join(path_target, name, pic))

# After execution, path_source is the training set and path_target is the test set.
seperate(path_source, path_target, 0.3)

If the dataset has a different structure, the code can be adapted with minor changes.
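As one illustration of such a change, here is a minimal sketch for a hypothetical flat layout, where all images sit directly in one directory with no category subdirectories. The function name separate_flat, the seed parameter, and the flat layout itself are assumptions for illustration and are not part of the original code.

```python
import os
import random
import shutil


def separate_flat(path_source, path_target, percent, seed=None):
    """Move a random share of the files directly under path_source to path_target."""
    rng = random.Random(seed)
    # Keep only regular files; sorting makes the choice repeatable for a given seed
    files = sorted(
        f for f in os.listdir(path_source)
        if os.path.isfile(os.path.join(path_source, f))
    )
    chosen = rng.sample(files, int(len(files) * percent))
    # exist_ok=True lets the function run even if the target directory is already there
    os.makedirs(path_target, exist_ok=True)
    for f in chosen:
        shutil.move(os.path.join(path_source, f), os.path.join(path_target, f))
    return chosen
```

The only real difference from the per-category version is that the outer loop over categories disappears; the sampling and moving steps stay the same.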

The key functions used are:

os.listdir(path): list the names of all entries under path (both directories and files)

os.makedirs(path): recursively create the directory path, including any missing parent directories; if the directory already exists, an error (FileExistsError) is raised unless exist_ok=True is passed

os.mkdir(path): create a single directory only; its parent directory must already exist

random.sample(list, num): randomly select num distinct elements from list, without replacement, and return them as a new list

shutil.move(source, target): move the file or directory at source to target
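The behaviour of these functions can be checked safely in a temporary directory. The following is a small sketch; the category name drinks, the file names, and the seed value 42 are made up for illustration.

```python
import os
import random
import tempfile

second_call_failed = False
mkdir_needs_parent = False

with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "goods_test", "drinks")  # hypothetical category
    os.makedirs(nested)                 # creates goods_test and drinks in one call
    try:
        os.makedirs(nested)             # same path again: the directory exists
    except FileExistsError:
        second_call_failed = True
    os.makedirs(nested, exist_ok=True)  # no error when exist_ok=True
    try:
        os.mkdir(os.path.join(root, "a", "b"))  # parent "a" does not exist
    except FileNotFoundError:
        mkdir_needs_parent = True

# random.sample picks without replacement, so no image is selected twice
pics = [f"img_{i}.jpg" for i in range(10)]
rng = random.Random(42)                 # fixed seed makes the split repeatable
test_pics = rng.sample(pics, int(len(pics) * 0.3))
train_pics = [p for p in pics if p not in test_pics]
print(second_call_failed, mkdir_needs_parent, len(test_pics), len(train_pics))
```

Using a dedicated random.Random instance with a seed is optional; the original code calls random.sample directly, which gives a different split on every run.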

Note that shutil.move changes the original dataset on disk and is not easy to undo. While debugging, it is recommended to comment it out (or work on a copy of the data) and to run it only once the rest of the code is confirmed to be correct.
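An alternative to commenting the line out is to separate planning from moving, so the selection can be inspected before anything on disk changes. This is a sketch of that idea rather than the original author's code; the names plan_split and apply_split and the seed parameter are assumptions for illustration.

```python
import os
import random
import shutil


def plan_split(path_source, path_target, percent, seed=None):
    """Return a list of (source_file, target_file) pairs without touching the disk."""
    rng = random.Random(seed)
    plan = []
    for name in sorted(os.listdir(path_source)):
        src_dir = os.path.join(path_source, name)
        if not os.path.isdir(src_dir):
            continue  # ignore stray files next to the category directories
        pics = sorted(os.listdir(src_dir))
        for pic in rng.sample(pics, int(len(pics) * percent)):
            plan.append((os.path.join(src_dir, pic),
                         os.path.join(path_target, name, pic)))
    return plan


def apply_split(plan):
    """Carry out the planned moves; this is the only step that modifies the dataset."""
    for src, dst in plan:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
```

Calling plan_split('./goods', './goods_test', 0.3) and printing the result shows which files would move; apply_split(plan) is then called only once the plan looks right. Unlike the original function, this sketch creates a category directory in the target only when at least one file is moved into it.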

