Dataset and Dataloader in PyTorch

akhil anand
2 min read · Feb 28, 2021

Loading a large dataset all at once can be chaotic. This is where PyTorch comes in: its Dataset and DataLoader classes make the task easier. By importing these two classes we can load our data in batches and put less load on our system. Let’s get started.

I have taken the Wine dataset from here

Before going ahead, let’s understand some commonly confused terms:

epoch:- One complete forward pass and backward pass over all of the training samples.

batch size:- The number of training samples used in one forward and backward pass.

iterations (data size / batch size):- The number of passes (batches) needed to complete one epoch.
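
To make the relationship between these terms concrete, here is a small arithmetic sketch. The sample count of 178, the batch size of 4 and the 2 epochs are assumed values chosen only for illustration; they are not taken from the post.

```python
import math

# Assumed values, for illustration only
n_samples = 178   # rows in the dataset (hypothetical size)
batch_size = 4    # samples processed in one forward/backward pass
n_epochs = 2      # full passes over the dataset

# The last batch may be smaller than batch_size, so round up
iterations_per_epoch = math.ceil(n_samples / batch_size)
total_iterations = iterations_per_epoch * n_epochs

print(iterations_per_epoch, total_iterations)
```

With these numbers, one epoch takes 45 iterations: 44 full batches plus one final batch holding the 2 leftover samples.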

Importing necessary PyTorch libraries

import torch
from torch.utils.data import Dataset,DataLoader
import torchvision

torch.utils.data provides both Dataset and DataLoader. Let’s discuss them one at a time.

Dataset:- To load custom data we subclass Dataset and implement three methods: __init__() , __len__() and __getitem__() .

i. __init__() :- Here we load the data into memory using df=pd.read_csv() and then convert it to a NumPy array using df.values . As we are dealing with tensors, we then convert the array into tensors using torch.from_numpy() . Because the dependent and independent values are combined in one table, we also need to separate them at this step. This is all the work we need to do in the __init__() method.

ii. __len__() :- The Dataset object should report the size of the data (the number of samples) so that our DataLoader knows how many samples there are and can iterate through the whole dataset.

iii. __getitem__(self,index) :- The independent features and the dependent feature are stored as tensors whose rows line up by position. By passing an index to this getter method we can fetch the features and the label of a single sample as a pair.

figure 1
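
The three methods described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the exact code from the figure: the class name WineDataset, the assumption that column 0 holds the label, and the random array standing in for pd.read_csv(...).values are all hypothetical.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class WineDataset(Dataset):
    """Toy dataset. Assumed layout: column 0 is the label, the rest are features."""

    def __init__(self, data):
        # `data` is a 2-D NumPy array, e.g. what pd.read_csv(...).values would return
        data = np.asarray(data, dtype=np.float32)
        self.x = torch.from_numpy(data[:, 1:])   # independent features
        self.y = torch.from_numpy(data[:, [0]])  # dependent value, shape (n, 1)
        self.n_samples = data.shape[0]

    def __len__(self):
        # Number of samples, so a DataLoader knows how far to iterate
        return self.n_samples

    def __getitem__(self, index):
        # One sample as a (features, label) pair
        return self.x[index], self.y[index]


# Synthetic stand-in for the CSV: 10 rows, 1 label column + 3 feature columns
rng = np.random.default_rng(0)
fake_data = rng.random((10, 4))

dataset = WineDataset(fake_data)
features, label = dataset[0]
print(len(dataset), features.shape, label.shape)
```

Indexing with dataset[0] calls __getitem__(0) behind the scenes, and len(dataset) calls __len__().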

DataLoader:- DataLoader helps us access the dataset in batches. It is an iterable, so we can loop over it with a for loop, or use iter() and next() to pull out one batch at a time.

figure 2
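
Here is a small sketch of both ways of reading batches from a DataLoader. To keep it self-contained it uses PyTorch's built-in TensorDataset with made-up tensors instead of a custom Dataset; the sizes (10 samples, batch size 4, 2 epochs) are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 samples with 3 features each, plus one label per sample
x = torch.arange(30, dtype=torch.float32).reshape(10, 3)
y = torch.arange(10, dtype=torch.float32).reshape(10, 1)
dataset = TensorDataset(x, y)

loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Option 1: pull a single batch manually
first_x, first_y = next(iter(loader))
print(first_x.shape, first_y.shape)

# Option 2: loop over every batch, once per epoch
batch_sizes = []
for epoch in range(2):
    for xb, yb in loader:
        batch_sizes.append(xb.shape[0])

print(len(loader), batch_sizes)
```

Because 10 is not divisible by 4, the last batch of each epoch holds only 2 samples. Passing drop_last=True to DataLoader would discard that incomplete batch instead.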

conclusion:-

If you have any doubts or suggestions regarding this blog, please comment below. Keep learning, keep exploring!

Reference:-

Python Engineer
