Loading a large dataset can be quite chaotic. This is where PyTorch comes into the picture: its Dataset and DataLoader classes make our task easy. By using these two classes we can load our data in batches and put less load on our system. Let’s get started……..
I have taken the Wine dataset from here
Before going ahead, let’s understand some of the confusing terms:
epoch:- one complete pass (forward pass and backward pass) over the entire training dataset.
batch:- the number of training samples processed together in a single forward/backward pass.
iterations (dataset size / batch size):- the number of batches needed to complete one epoch.
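As a quick sanity check of the arithmetic above, here is a tiny sketch. The dataset size of 178 is an assumption (the UCI Wine data has 178 rows); the batch size of 4 is just an example:

```python
import math

dataset_size = 178   # assumption: the UCI Wine dataset has 178 samples
batch_size = 4

# iterations per epoch = how many batches it takes to see every sample once;
# the last batch may be smaller, hence the ceiling
iterations = math.ceil(dataset_size / batch_size)
print(iterations)
```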
Importing necessary PyTorch libraries
from torch.utils.data import Dataset,DataLoader
torch.utils.data is the module that provides Dataset and DataLoader. Let’s discuss it in bits and pieces.
Dataset:- The Dataset class requires three methods to be implemented for our custom data. These three methods are
__init__() :- Here we load the data into memory using
df=pd.read_csv() , then convert the DataFrame into a NumPy array using
df.values . Since we are dealing with tensors, we must convert that matrix into a tensor using
torch.from_numpy() . The dependent and independent values arrive combined, so the next step is to separate the two. That is the whole lot of work we need to do in the
__init__() method.
__len__() :- The Dataset object should know the size of the data, so this method simply returns the number of samples. That is how our DataLoader can iterate through the whole dataset.
__getitem__(self,index) :- Each sample stores the independent features and the dependent feature together. So by passing an index to this getter method we can fetch the (features, label) pair for that sample.
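Putting the three methods together, here is a minimal sketch of a custom Dataset. The demo CSV written at the bottom is a tiny stand-in for the real wine file (an assumption, since the actual file name and path are not given), and I assume the label sits in the first column:

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset

class WineDataset(Dataset):
    def __init__(self, csv_path):
        # load the data into memory, then DataFrame -> NumPy -> tensor
        df = pd.read_csv(csv_path)
        xy = torch.from_numpy(df.values.astype(np.float32))
        # assumption: label in the first column, features in the rest
        self.x = xy[:, 1:]
        self.y = xy[:, [0]]
        self.n_samples = xy.shape[0]

    def __len__(self):
        # tells the DataLoader how many samples there are
        return self.n_samples

    def __getitem__(self, index):
        # fetch one (features, label) pair by index
        return self.x[index], self.y[index]

# tiny synthetic CSV standing in for the real wine data (assumption)
pd.DataFrame({"label": [1, 2], "f1": [0.1, 0.2], "f2": [1.0, 2.0]}).to_csv(
    "demo_wine.csv", index=False
)
ds = WineDataset("demo_wine.csv")
print(len(ds))           # number of samples
features, label = ds[0]  # one (features, label) pair
```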
DataLoader:- DataLoader helps us access the dataset in batches. It works like a generator, so after wrapping it with iter() we can call
next() to step through the batches one at a time, or simply loop over it.
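The generator-style access can be sketched like this. The small TensorDataset here is a stand-in (an assumption) so the example runs on its own; in the blog it would be the custom wine Dataset built above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in dataset (assumption): 6 samples, 2 features each
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.arange(6, dtype=torch.float32).reshape(6, 1)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)

# DataLoader behaves like a generator: wrap it in iter(), pull a batch with next()
batch_x, batch_y = next(iter(loader))
print(batch_x.shape)   # first batch: 4 samples x 2 features

# in a training loop you usually just iterate over it directly,
# one batch per iteration (the last batch may be smaller)
for xb, yb in loader:
    pass  # forward pass, loss, backward pass would go here
```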
If you have any doubts or suggestions regarding this blog, please comment below. Keep learning, keep exploring………