
Why do we need data manipulation?

Real-world data is messy. Data manipulation is the process of transforming that messy data into meaningful, insightful information through operations chosen to suit one's requirements. Many tools support data manipulation (e.g. SQL, R, Excel). In this blog we will focus on Pandas for data manipulation, and in this section I will use the Titanic dataset as a running example.

1. Load Dataset :

Seaborn can load example datasets that are hosted in its online repository.

2. Read first/last five rows :
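A minimal sketch of reading the first and last five rows. In the blog's setting the DataFrame would come from `sns.load_dataset("titanic")`, which needs an internet connection, so a small hand-made stand-in with a few Titanic-style columns is used here instead. The rows are invented for illustration.

```python
import pandas as pd

# Stand-in for sns.load_dataset("titanic"); the values are made up for illustration.
df = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "pclass": [3, 1, 3, 1, 3, 3, 1, 2],
    "sex": ["male", "female", "female", "female", "male", "male", "male", "female"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0, 27.0],
})

first_rows = df.head()  # first five rows by default
last_rows = df.tail()   # last five rows by default

print(first_rows)
print(last_rows)
```

Both methods accept a row count, so `df.head(3)` would return only the first three rows.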


What is Web Scraping?

Web scraping is the process of establishing a connection between a client and a server (the website) and parsing data out of that specific website. Suppose …
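The parsing half of this process can be sketched with Python's standard library alone. To keep the snippet runnable offline, the HTML below is a hard-coded stand-in for what a server would return (in practice it would be fetched over HTTP), and `LinkCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hard-coded stand-in for the HTML a server would return.
html = """
<html><body>
  <h1>Example page</h1>
  <a href="/first">First</a>
  <a href="/second">Second</a>
</body></html>
"""

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```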

Prediction of Disaster Tweets Using TensorFlow 2.0


Here we will apply a deep-learning-based NLP approach to predict disasters from tweets. I have taken the dataset from here. The dataset is in CSV format and has the following features: id, location, keyword, text and target. The target column is binary: 1 represents the abnormal condition (disaster) and 0 represents the normal condition (no disaster). Let's get started…

Step I :- Loading the dataset and importing necessary libraries
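A minimal sketch of the loading step with pandas only; the TensorFlow model itself is not shown. The real CSV file is not reproduced here, so a few invented rows with the same five columns are read from an in-memory string instead.

```python
import io

import pandas as pd

# Invented rows that mimic the described columns; not taken from the real dataset.
csv_text = """id,location,keyword,text,target
1,,,Forest fire near the town,1
2,,,What a lovely sunny day,0
3,Mumbai,flood,Heavy flood has blocked the main road,1
4,,,I love this new song,0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df["target"].value_counts())  # how many disaster vs. non-disaster tweets
```

With the real file, `io.StringIO(csv_text)` would simply be replaced by the path to the CSV.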

This section covers almost all data cleaning approaches.


In the real world, we rarely get data that is ready to use; we have to clean it up ourselves. This is where data cleaning comes in. A good data scientist needs strong data cleaning and data modification skills. I will be using pandas for the data cleaning operations, and in this blog I will create a toy dataset to walk through the process. Let's get started…

Step I :- Creating a toy dataset and importing necessary libraries
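A small sketch of what such a toy dataset and a first cleaning pass might look like. The column names, values and cleaning choices (median fill, an "Unknown" placeholder) are assumptions made for illustration, not the dataset used in the original post.

```python
import numpy as np
import pandas as pd

# Toy dataset with typical problems: missing values, a duplicate row, untidy text.
df = pd.DataFrame({
    "name": [" alice", "Bob ", "Bob ", "carol"],
    "age": [25.0, np.nan, np.nan, 31.0],
    "city": ["Delhi", "Mumbai", "Mumbai", None],
})

cleaned = (
    df.drop_duplicates()  # drop the repeated "Bob" row
    .assign(
        name=lambda d: d["name"].str.strip().str.title(),  # tidy whitespace and casing
        age=lambda d: d["age"].fillna(d["age"].median()),  # fill missing age with the median
        city=lambda d: d["city"].fillna("Unknown"),  # placeholder for a missing city
    )
    .reset_index(drop=True)
)

print(cleaned)
```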

How to handle a large dataset


Suppose you are dealing with an image dataset that has 100 classes, each containing 1,000 images. If we train the model on a low-configuration device, it will run out of memory. So what should we do? This is where Keras's ImageDataGenerator comes in. Instead of loading the whole dataset at once, ImageDataGenerator lets us divide the data into batches and feed those batches of image data into the network for image classification or other CNN applications. Please read the TensorFlow documentation; believe me, it will give more…
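The batching idea can be sketched without any framework. This is not Keras's ImageDataGenerator itself, only a plain Python generator that hands out one batch at a time; `batch_generator` and the file names are hypothetical.

```python
def batch_generator(items, batch_size):
    """Yield successive slices of `items`, each holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical file names standing in for images on disk.
image_paths = [f"img_{i:03d}.jpg" for i in range(10)]

# Collected into a list here only so the batches can be inspected;
# a training loop would consume the generator one batch at a time.
batches = list(batch_generator(image_paths, batch_size=4))
for batch in batches:
    print(len(batch), batch[0])
```

In a real pipeline each batch of paths would be loaded and decoded just before it is fed to the network, so only one batch of images needs to sit in memory at once.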


Loading a large dataset can be chaotic. Here PyTorch comes into the picture to make our task easy with its Dataset and DataLoader classes. By importing these two classes we can load our data in batches and put less load on our system. Let's get started…

I have taken the Wine dataset from here

Before going ahead, let's understand some of the confusing terms.

epoch :- One complete pass of the whole training dataset through the network (forward pass and backward pass).

batch size :- The number of training samples processed in one forward/backward pass.

iterations (data size / batch size) :- The number of batches needed to complete one epoch.
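A quick worked example of how these three terms relate, using hypothetical numbers. When the dataset size is not an exact multiple of the batch size, the last batch is smaller, so the count is rounded up.

```python
import math

dataset_size = 1000  # hypothetical number of training samples
batch_size = 32      # samples processed per forward/backward pass
epochs = 5           # full passes over the dataset

iterations_per_epoch = math.ceil(dataset_size / batch_size)
total_iterations = iterations_per_epoch * epochs

print(iterations_per_epoch, total_iterations)
```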

Importing necessary PyTorch libraries
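A minimal sketch of the imports and of how Dataset and DataLoader fit together. `ToyDataset` is a hypothetical stand-in filled with random tensors, not the Wine dataset from the post, and the batch size of 4 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the real data: random features and labels."""

    def __init__(self, n_samples=10, n_features=3):
        self.x = torch.randn(n_samples, n_features)
        self.y = torch.randint(0, 3, (n_samples,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# One pass over the loader is one epoch; each step yields one batch.
batch_shapes = [tuple(features.shape) for features, labels in loader]
print(len(loader), batch_shapes)
```

Here `len(loader)` is the number of iterations per epoch: 10 samples in batches of 4 give two full batches and one smaller final batch.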



This dataset contains information about used cars listed for sale. It has 9 columns, each describing a specific feature. Car_Name gives the car company. Year is the year in which the car was bought brand new. selling_price is the price at which the car is being sold; this will be the target label for price prediction. km_driven is the number of kilometres the car has been driven. fuel is the fuel type of the car (CNG, petrol, diesel, etc.). seller_type tells whether the seller is an individual or a dealer. transmission tells whether the car is automatic or manual. owner is the number of previous owners of the…


What is feature importance?

Feature importance assigns a score to each input feature based on how much it contributes to predicting the output. The more a feature contributes to the prediction, the higher its score. It can be used in both classification and regression problems. Suppose you have a bucket of 10 fruits from which you would like to pick the mango, lychee and orange; those fruits are the important ones for you, and feature importance works the same way in machine learning. In this blog we will look at various feature importance methods. Let's get started…

1. Permutation Feature Importance :

It is best suited to algorithms that do not natively support…
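A minimal sketch using scikit-learn's `permutation_importance`. The k-nearest-neighbours classifier is chosen here as one example of a model with no built-in importance attribute, and the synthetic data (where only feature 0 carries signal) is an assumption made so the expected ranking is obvious. For brevity the scores are computed on the training data; a held-out set is usually preferable.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)  # only feature 0 determines the label

model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Shuffle each feature in turn and measure how much the accuracy drops.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: {score:.3f}")
```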

Binary Class Classification Approach to Solve Multi Class Classification Problem


What is a multi-class classification problem?

Predicting one class out of several possible classes is known as multi-class classification. Suppose your mother has asked you to bring a mango from a basket holding a variety of fruits; indirectly, she has asked you to solve a multi-class classification problem.

But our main aim is to apply the binary classification approach to predict the result of a multi-class problem.

Why do we need One vs Rest and One vs One?

Some classification algorithms, such as LogisticRegression and SupportVectorClassifier, were not designed to solve multi-class classification problems directly. By applying a heuristic approach to these algorithms we can solve multi-class classification problems…
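The two heuristics can be sketched with scikit-learn's `OneVsRestClassifier` and `OneVsOneClassifier` wrappers. The synthetic 4-class data is an assumption for illustration; the point is how many binary models each strategy trains.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic 4-class problem (an assumption for illustration).
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
ovo = OneVsOneClassifier(SVC()).fit(X, y)

print(len(ovr.estimators_))  # one binary model per class
print(len(ovo.estimators_))  # one binary model per pair of classes
print(ovr.predict(X[:5]), ovo.predict(X[:5]))
```

With n classes, One vs Rest trains n binary models while One vs One trains n(n-1)/2, so One vs One grows faster as the number of classes increases.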


Overview :

The dataset I chose is the affairs dataset that comes with Statsmodels. It was derived from a survey of women conducted in 1974 by Redbook magazine, in which married women were asked about their participation in extramarital affairs. I decided to treat this as a classification problem by creating a new binary variable, affair (did the woman have at least one affair?), and trying to predict that class for each woman. The variables available for prediction are: rate_marriage (the woman's rating of her marriage), age (the woman's age), yrs_married (number of years married), children (no. …
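A small sketch of how the binary target could be derived. The rows below are invented, and `affairs` is an assumed name for the raw column measuring affair activity; only the thresholding step is the point.

```python
import pandas as pd

# Invented rows; `affairs` is an assumed name for the raw affair measure.
df = pd.DataFrame({
    "rate_marriage": [3, 5, 4, 2],
    "age": [32.0, 27.0, 22.0, 37.0],
    "yrs_married": [9.0, 2.5, 2.5, 16.5],
    "affairs": [0.11, 0.0, 0.0, 1.4],
})

# 1 if the woman reported at least one affair, otherwise 0.
df["affair"] = (df["affairs"] > 0).astype(int)

print(df[["affairs", "affair"]])
```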

akhil anand

In a process of becoming Doer.
