Why do we need data manipulation?
Real-world data is messy. By applying certain operations, we can turn it into something meaningful for a given requirement; this process of transforming messy data into insightful information is called data manipulation. Various languages and tools support data manipulation (e.g. SQL, R, Excel). In this blog we will broadly discuss Pandas for data manipulation. In this section I will use the Titanic dataset for a broader understanding.
Seaborn can load example datasets from its online repository.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("titanic")
What is Web Scraping?
Web scraping is the process of establishing a connection between a client and a server (website) and parsing data out of that website. Suppose …
Prediction of Disaster Tweets using TensorFlow 2.0
Here we will apply a deep-learning-based NLP approach to predict disasters from tweets. I have taken the dataset from here. The dataset is in CSV format and has the following features: id, location, keyword, text and target. The target column is binary, where 1 represents an abnormal condition (disaster) and 0 represents the normal condition (no disaster). Let’s get started…
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tensorflow import keras
from keras.layers import Embedding, Dense, Dropout, Bidirectional, LSTM
from keras.models import Sequential
from sklearn import metrics
from…
This section covers almost all data cleaning approaches.
In the real world, we rarely get data that is ready to use; we have to clean it up ourselves. This is where data cleaning comes in. A good data scientist is skilled at data cleaning and modification. I will be using pandas for the data cleaning operations. In this blog, I will create a toy dataset for the data cleaning process. Let’s get started…
Step I:- Creating a toy dataset and importing the necessary libraries
import pandas as pd
import numpy as np
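As a rough idea of what a toy dataset and a first cleaning pass might look like, here is a minimal sketch. The column names and values are made up for illustration and are not taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with typical problems: stray whitespace,
# inconsistent capitalisation, missing values and a duplicated row.
df = pd.DataFrame({
    "name": [" Alice", "bob ", "bob ", "Carol"],
    "age": [25.0, np.nan, np.nan, 40.0],
    "city": ["Delhi", "mumbai", "mumbai", None],
})

clean = (
    df.drop_duplicates()  # drop the repeated "bob" row
      .assign(
          # trim whitespace and normalise capitalisation
          name=lambda d: d["name"].str.strip().str.title(),
          # replace missing cities with a placeholder, then normalise
          city=lambda d: d["city"].fillna("Unknown").str.title(),
      )
      .reset_index(drop=True)
)

# Fill missing ages with the median of the known ages.
clean["age"] = clean["age"].fillna(clean["age"].median())

print(clean)
```

Filling with the median is only one option; depending on the data, dropping the rows or using a different fill value may be more appropriate.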
How to handle a large dataset
Suppose you are dealing with an image dataset that has 100 classes, each with 1,000 images. If we train the model on a low-configuration device, it will run out of memory. So what should we do? This is where the Keras ImageDataGenerator comes in. Instead of loading the whole dataset, ImageDataGenerator lets us divide the data into batches and feed those batches of image data into the network for image classification or other CNN applications. Please read the TensorFlow documentation; believe me, it will give more…
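The batching idea itself can be sketched in plain Python, without Keras. This is not the ImageDataGenerator API, only an illustration of the principle: a generator hands out one batch at a time, so the whole dataset never has to sit in memory. The file names and the helper `batch_generator` are hypothetical.

```python
from typing import Iterator, List, Sequence


def batch_generator(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches so only one batch is materialised at a time."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical file names standing in for images stored on disk.
image_paths = [f"img_{i}.jpg" for i in range(10)]

# Collected into a list here only so the batches can be inspected;
# in training you would loop over the generator directly.
batches = list(batch_generator(image_paths, batch_size=4))

for batch in batches:
    print(len(batch), batch)
```

The last batch is smaller than the others whenever the dataset size is not a multiple of the batch size.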
Loading a large dataset can be chaotic. This is where PyTorch comes in, making the task easier with its Dataset and DataLoader classes. By importing these two classes we can load our data in batches and put less load on our system. Let’s get started…
I have taken the Wine dataset from here.
Before going ahead, let’s understand some of the confusing terms:
epoch:- one complete cycle of forward and backward passes over the entire training set.
batch size:- the number of training samples used in one forward and backward pass.
iterations (data size / batch size):- the number of batches needed to complete one epoch.
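The relationship between these terms can be checked with a little arithmetic. The sketch below is plain Python, and the dataset size, batch size and epoch count are assumed values chosen for illustration.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches needed to see every sample once."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # An incomplete final batch still counts as one iteration.
    return math.ceil(num_samples / batch_size)


num_samples = 178  # assumed dataset size
batch_size = 4     # assumed batch size
epochs = 2         # assumed number of epochs

per_epoch = iterations_per_epoch(num_samples, batch_size)
total_iterations = per_epoch * epochs

print(per_epoch, total_iterations)
```

When the dataset size is not a multiple of the batch size, the count depends on whether the final partial batch is kept or dropped, which is why the helper takes a `drop_last` flag.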
import torch from…
This dataset contains information about used cars listed on cardekho.com. It has 9 columns, each describing a specific feature. Car_Name gives the car's company. Year is the year in which the car was bought brand new. selling_price is the price at which the car is being sold; this will be the target label for price prediction. km_driven is the number of kilometres the car has been driven. fuel is the fuel type of the car (CNG, petrol, diesel, etc.). seller_type tells whether the seller is an individual or a dealer. transmission tells whether the car is automatic or manual. owner is the number of previous owners of the…
What is feature importance?
It assigns a score to each input feature based on how important it is for predicting the output. The more a feature contributes to the prediction, the higher its score. We can use it in both classification and regression problems. Suppose you have a bucket of 10 fruits, out of which you would like to pick the mango, lychee and orange; these fruits are the important ones for you. Feature importance works the same way in machine learning. In this blog we will look at various feature importance methods. Let’s get started…
It is best for those algorithms that do not natively support…
Predicting one class out of several is known as multi-class classification. Suppose your mother asks you to bring a mango from a basket containing a variety of fruits; indirectly, she has asked you to solve a multi-class classification problem.
But our main aim is to apply the binary classification approach to predict the result for a multi-class problem.
Some classification algorithms were not designed to solve multi-class classification problems directly; these include LogisticRegression and SupportVectorClassifier. By applying a heuristic approach to these algorithms we can solve multi-class classification problems…
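One common heuristic of this kind is one-vs-rest: train one binary model per class (that class against all the others) and predict the class whose model gives the highest score. The excerpt is truncated, so the original post may use a different heuristic. The sketch below is plain Python with a deliberately simple centroid-based binary scorer standing in for a real binary classifier; the data points and helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]


def mean(points: List[Point]) -> List[float]:
    """Column-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fit_binary(X: List[Point], y: List[int]) -> Callable[[Point], float]:
    """Toy binary scorer: a higher score means closer to the positive class."""
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    return lambda x: dist(x, neg) - dist(x, pos)


def fit_one_vs_rest(X: List[Point], labels: List[str]) -> Dict[str, Callable[[Point], float]]:
    """Train one binary scorer per class: that class (1) versus the rest (0)."""
    models = {}
    for cls in sorted(set(labels)):
        binary_y = [1 if label == cls else 0 for label in labels]
        models[cls] = fit_binary(X, binary_y)
    return models


def predict(models: Dict[str, Callable[[Point], float]], x: Point) -> str:
    """Pick the class whose binary scorer is most confident."""
    return max(models, key=lambda cls: models[cls](x))


# Hypothetical 2-D features for three fruit classes.
X = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.0, 5.0), (0.1, 5.2)]
labels = ["mango", "mango", "orange", "orange", "lychee", "lychee"]

models = fit_one_vs_rest(X, labels)
print(predict(models, (0.1, 0.0)))
```

With a real library you would swap `fit_binary` for an actual binary classifier; the relabelling and the highest-score rule stay the same.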
The dataset I chose is the affairs dataset that comes with statsmodels. It was derived from a survey of women conducted in 1974 by Redbook magazine, in which married women were asked about their participation in extramarital affairs. I decided to treat this as a classification problem by creating a new binary variable affair (did the woman have at least one affair?) and trying to predict the classification for each woman. The variables present in the dataset for prediction are: rate_marriage (woman’s rating of her marriage), age (woman’s age), yrs_married (number of years married), children (no. …
In the process of becoming a doer.