
【Python】torch.utils.data.DataLoader


DataLoader

DataLoader is a data loading class in PyTorch that reads data in batches.

The steps for custom data reading with PyTorch are as follows:
1) Create a Dataset object.
2) Pass the Dataset object as a parameter to DataLoader.

DataLoader is an iterable. In its most basic use, you pass in a Dataset object and it generates batches of data according to the batch_size parameter, as in the minimal example below.
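Here is a minimal sketch, assuming a toy dataset of ten scalar values (the MyDataset class and its contents are made up for illustration, not taken from the article):

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):                  # hypothetical example dataset
    def __init__(self):
        self.data = torch.arange(10, dtype=torch.float32)  # ten toy samples

    def __len__(self):
        return len(self.data)              # total number of samples

    def __getitem__(self, idx):
        return self.data[idx]              # return one sample by index

loader = DataLoader(MyDataset(), batch_size=4)
for batch in loader:
    print(batch)                           # tensors of 4, 4, and 2 elements

Each iteration yields one batch; with batch_size=4 and ten samples, the last batch holds only two elements unless drop_last is set.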

Parameter description

Several important parameters of __init__ (the constructor) [3]:
1. dataset (type: Dataset)
The input data. The name sounds like a database table, and C# also has a DataSet class (with a lower-level DataTable), but that does not matter here; PyTorch has its own Dataset structure. All you need to know is that the input must be a Dataset object.
2. batch_size (type: int)
The number of samples loaded per batch; the default is 1. PyTorch does not feed the model one row at a time during training (that would be far too inefficient); data comes in batches. This parameter sets how many rows are fed to the network each time. Setting it to 1 means row by row (a personal preference; the PyTorch default is 1).
3. shuffle (type: bool)
Whether to shuffle the data at each epoch; the default is False. Shuffling the input order makes the samples more independent of each other, but if the data has a sequential structure, do not set it to True.
4. collate_fn (type: callable)
Merges a list of individual samples into a mini-batch; the default (None) uses the built-in collation, which simply stacks tensors of equal shape. Pass a custom function when samples cannot be stacked directly, for example variable-length sequences; see the sketch after this list.
5. batch_sampler (type: Sampler)
Batch sampling; the default is None. It returns the indices of one batch at a time (not the data itself). Because it already decides which indices form each batch, it is mutually exclusive with batch_size, shuffle, sampler, and drop_last.
6. sampler (type: Sampler)
Sampling; the default is None. Draws indices from the dataset according to a defined strategy. If you define a sampler, shuffle must stay False.
7. num_workers (type: int)
Number of worker processes; the default is 0. This is how many subprocesses are used to load data; 0 means the data is loaded in the main process. Note: the value must be greater than or equal to 0; a negative number raises an error.
8. pin_memory (type: bool)
Pinned memory; the default is False. If True, tensors are copied into CUDA pinned (page-locked) host memory before being returned, which speeds up later transfers to the GPU.
9. drop_last (type: bool)
Drop the last batch; the default is False. When batch_size does not divide the dataset size evenly, the last batch is smaller than the configured batch size; this flag decides whether to discard that batch.
10. timeout (type: numeric)
Timeout; the default is 0. Sets the time limit for collecting a batch from the workers; if the data is not read within this time, an error is raised. The value must be greater than or equal to 0.
11. worker_init_fn (type: callable)
Worker initialization function; the default is None. If not None, it is called in each worker subprocess with the worker ID as input, after seeding and before data loading.
12. multiprocessing_context (default None)
The multiprocessing context used to create the worker processes (for example 'fork' or 'spawn'); None means the platform default.
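The sketch below ties several of these parameters together. The VarLenDataset, the pad_collate function, and the seed value are assumptions made up for illustration; only the DataLoader arguments themselves come from the list above:

import torch
from torch.utils.data import Dataset, DataLoader

class VarLenDataset(Dataset):              # hypothetical variable-length dataset
    def __init__(self):
        self.seqs = [torch.randn(n) for n in (3, 5, 2, 7, 4, 6)]

    def __len__(self):
        return len(self.seqs)

    def __getitem__(self, idx):
        return self.seqs[idx]

def pad_collate(batch):
    # Merge a list of 1-D tensors of different lengths into one padded 2-D tensor.
    return torch.nn.utils.rnn.pad_sequence(batch, batch_first=True)

def seed_worker(worker_id):
    # Give every worker subprocess its own deterministic seed.
    torch.manual_seed(1234 + worker_id)

if __name__ == "__main__":                 # needed when num_workers > 0 on spawn platforms
    loader = DataLoader(
        VarLenDataset(),
        batch_size=2,
        shuffle=True,                      # reshuffle indices every epoch
        num_workers=2,                     # two loading subprocesses
        collate_fn=pad_collate,            # custom merge for ragged samples
        drop_last=True,                    # discard a trailing incomplete batch
        pin_memory=True,                   # page-locked memory for faster GPU copies
        worker_init_fn=seed_worker,
    )
    for batch in loader:
        print(batch.shape)                 # e.g. torch.Size([2, 7]), depending on padding

With six samples and batch_size=2 there are three full batches, so drop_last has nothing to discard here; it only matters when the division is uneven.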

About Dataset

Dataset is a wrapper class that encapsulates data as tensors. It can be passed to DataLoader as a parameter, and further preprocessing is then carried out on the tensor data.
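A quick sketch using the built-in TensorDataset wrapper; the feature and label tensors here are invented toy data:

import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(8, 3)               # 8 samples, 3 features each (toy data)
labels = torch.randint(0, 2, (8,))         # 8 binary labels

dataset = TensorDataset(features, labels)  # wrap the tensors as a Dataset
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for x, y in loader:
    print(x.shape, y.shape)                # torch.Size([4, 3]) torch.Size([4])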

