Dataset and DataLoader in PyTorch


Loading a large dataset all at once can be chaotic. This is where PyTorch comes into the picture, making the task easier with its Dataset and DataLoader classes. By importing these two classes we can load our data in batches, which puts less load on our system. Let’s get started.

I have taken the Wine dataset from here

Before going ahead let’s understand some of the confusing terms

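epoch:- One complete forward pass and backward pass over all of the training samples.

batch size:- The number of training samples used in one forward and backward pass.

iterations (data size / batch size):- The number of batches needed to complete one epoch. For example, 100 samples with a batch size of 20 take 5 iterations per epoch.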
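To make the relationship between these three terms concrete, here is a small sketch of the arithmetic. The sample count, batch size and epoch count are made-up numbers for illustration, not values from the Wine dataset. Rounding up is used because the last batch of an epoch may be smaller than the others.

```python
import math

# Illustrative numbers only (assumptions, not taken from the article's data)
total_samples = 100
batch_size = 8
num_epochs = 2

# One iteration processes one batch; a partial last batch still counts
iterations_per_epoch = math.ceil(total_samples / batch_size)
total_iterations = num_epochs * iterations_per_epoch

print(iterations_per_epoch, total_iterations)
```

With these numbers, 12 full batches cover 96 samples and a 13th batch holds the remaining 4.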

Importing necessary PyTorch libraries

import torch
from torch.utils.data import Dataset, DataLoader
import torchvision

torch.utils.data is the module that provides Dataset and DataLoader. Let’s discuss them piece by piece.

Dataset:- To wrap our custom data we subclass the Dataset class and implement three methods: __init__() , __len__() and __getitem__() .

i. __init__() :- Here we load the data into memory using df=pd.read_csv() , then convert it to a NumPy array using df.values . Since we are working with tensors, we then convert the array into a tensor using torch.from_numpy() . Because the dependent and independent values are combined in one table, in the next step we separate them. That is all the work we need to do in the __init__() method.

ii. __len__() :- Returns the number of samples. The Dataset object should know the size of the data so that the DataLoader can work out how many batches it takes to iterate through the whole dataset.

iii. __getitem__(self,index) :- Returns the sample at the given index, typically as a pair of the independent features and the dependent value. By passing an index to this getter method (for example dataset[0] ) we can fetch a single sample.
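The three methods above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the figure: the class name WineDataset, the tiny in-memory array, and the choice of the first column as the dependent value are all assumptions made for illustration. In practice the array would come from pd.read_csv(...).values as described above.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class WineDataset(Dataset):
    """Hypothetical custom dataset; assumes column 0 holds the dependent value."""

    def __init__(self, data):
        # `data` stands in for df.values; cast to float32 for PyTorch
        data = np.asarray(data, dtype=np.float32)
        # Separate independent features from the dependent value
        self.x = torch.from_numpy(data[:, 1:])
        self.y = torch.from_numpy(data[:, [0]])  # shape (n_samples, 1)
        self.n_samples = data.shape[0]

    def __len__(self):
        # Lets len(dataset) report the number of samples
        return self.n_samples

    def __getitem__(self, index):
        # Lets dataset[index] return one (features, label) pair
        return self.x[index], self.y[index]


# Made-up rows: label, feature 1, feature 2
toy = np.array([[1, 14.2, 1.7], [2, 13.1, 2.4], [3, 12.8, 3.0]])
dataset = WineDataset(toy)
features, label = dataset[0]
print(len(dataset), features, label)
```

Doing the loading and splitting once in __init__() keeps __getitem__() cheap, which matters because it is called for every sample in every epoch.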

figure 1

DataLoader:- DataLoader helps us access the dataset in batches. It is an iterable, so we can loop over it with a for loop, or call iter() on it and then next() to fetch one batch at a time.
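Here is a short sketch of both ways of reading batches. To keep it self-contained it uses PyTorch's built-in TensorDataset with made-up tensors rather than the Wine data. The batch size of 4 and shuffle=False are assumptions chosen so the result is easy to follow.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 samples with 3 features each, plus one label per sample
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Fetch a single batch with iter() and next()
first_x, first_y = next(iter(loader))

# Or walk through the whole dataset with a for loop
batch_sizes = [x.shape[0] for x, y in loader]

print(first_x.shape, batch_sizes)
```

Since 10 is not a multiple of 4, the last batch is smaller than the others. Passing drop_last=True to DataLoader would skip that partial batch instead.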

figure 2


If you have any doubts or suggestions regarding this blog, please comment below. Keep learning, keep exploring.


python engineer




In a process of becoming Doer.

akhil anand
