
Why do we need Data Manipulation?

Real-world data is messy. By performing certain operations, we make the data meaningful for our requirements; this process of transforming messy data into insightful information is called data manipulation. Various languages and tools support data manipulation (e.g. SQL, R, Excel, etc.). In this blog we will broadly discuss Pandas for data manipulation, using the Titanic dataset for a broader understanding.

1. Load Dataset:

Seaborn can load example datasets that live in its online repository.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("titanic")  # fetch the Titanic example dataset from seaborn's online repository

2. Read the first/last five rows:

It is hectic to go through each and every row of a dataset, so for a cursory glance we look at the first/last five rows. …
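For instance, with the df we loaded above (a minimal sketch):

df.head()  # first five rows
df.tail()  # last five rows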



What is Multi-Collinearity?

Multi-collinearity occurs when two or more independent variables (features) are highly correlated with one another. Let's take an example.


Suppose you have a dataset for predicting a person's salary, with independent variables Age and Years of Service. These two independent variables are strongly correlated and form their own relationship, x1 = m*x2 + c. Because of this mutual dependency, each of these independent variables will appear less correlated with y.
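To see this concretely, here is a small synthetic sketch (the numbers and column names are made up for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = rng.uniform(22, 60, 500)                              # hypothetical Age feature
service = age - 22 + rng.normal(0, 1, 500)                  # Years of Service, almost a linear function of Age
salary = 30000 + 1500 * service + rng.normal(0, 5000, 500)  # target variable y

data = pd.DataFrame({"age": age, "service": service, "salary": salary})
print(data.corr())  # age and service correlate near 1.0 with each other

The near-1.0 correlation between age and service is exactly the multi-collinearity described above.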

Multi-collinearity won't necessarily hurt model performance, but the estimated effect of each multi-collinear independent variable on the output variable becomes unreliable, which reduces interpretability.

What are the causes of Multi-Collinearity?



Overview

The Boston dataset contains information collected by the U.S. Census Service concerning housing in the city of Boston. The data was originally published in 1978 and has roughly 500 samples (506, to be precise). We can easily access it with the help of the sklearn library. Our main aim will be to predict housing prices based on the features present in the dataset. Let's get started……

Step 1. Importing Libraries and Acquiring Dataset

Importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

Acquiring Dataset

boston = load_boston()
type(boston)
[Out]>> sklearn.utils.Bunch

The type of boston is sklearn.utils.Bunch. sklearn stores data in a dictionary-like object, and the Bunch consists of four keys through which we can understand the data more precisely, i.e. {data, target, feature_names, DESCR}. …
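For example, the keys can be combined into a single DataFrame (a minimal sketch; the column name PRICE is our own choice, not part of the Bunch):

df = pd.DataFrame(boston.data, columns=boston.feature_names)  # the 13 feature columns
df["PRICE"] = boston.target                                   # add the target as a new column
print(df.shape)                                               # (506, 14)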



What is Optimization?

During backward propagation, the optimizer updates the attributes (weights and biases) of the neural network.

It also determines how much data should be used for each back-propagation step and feeds only that amount of data to the network.


let’s get started…….

Non-Momentum-Based Optimization

In non-momentum-based optimization, each update is computed from the current gradient alone, with no memory of previous updates: every time we feed a new set of inputs we obtain a new weight that carries no momentum from earlier steps. Hang tight, you will understand all of this as you go further into this article.


1. Batch Gradient Descent

Suppose you have a dataset with n training inputs. When we send all n training inputs at once to calculate the attributes, it is known as batch gradient descent. …
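As a rough illustration, here is batch gradient descent for a simple linear model on synthetic data (a minimal sketch, not the exact training loop of any library):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))                    # n = 100 training inputs, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3    # synthetic targets

w = np.zeros(3)
b = 0.0
lr = 0.1
for epoch in range(200):
    y_hat = X @ w + b                       # predictions on the FULL training set
    grad_w = X.T @ (y_hat - y) / len(X)     # gradient averaged over all n samples
    grad_b = np.mean(y_hat - y)
    w -= lr * grad_w                        # one parameter update per full pass
    b -= lr * grad_b

Note that every update consumes all n samples, which is what distinguishes batch gradient descent from its stochastic and mini-batch variants.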



Activation Function

The function applied at each hidden layer is known as the activation function. Its task is to filter, normalize, and non-linearize the data, and it also fires the input on to the perceptrons of the next layer. In neural networks we update weights and biases with reference to the error at the output using back propagation, and back propagation is possible only because the activation function is differentiable.

Why do we introduce non-linearity in the dataset?

When approaching deep learning problems, most of the time we deal with complex datasets. A non-linear function captures such complex datasets better and can be differentiated multiple times, yielding distinct weights and biases for the different layers and hence a better learning opportunity for the neural network. …
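Two common activation functions, sketched in NumPy for illustration:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes any input into (0, 1)

def relu(x):
    return np.maximum(0.0, x)        # passes positives, zeroes out negatives

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x))  # smooth and differentiable everywhere
print(relu(x))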



Overview:

911 is the North American emergency helpline number. By analysing this dataset we will try to understand whether the emergency response team is well equipped to deal with emergencies. We will also get to know the frequency of emergencies due to natural causes (health issues, etc.) and due to human mistakes (fire accidents, road accidents, etc.).

Step 1: Importing libraries and acquiring dataset

a.> Importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# magic command that renders plots inside the notebook

b.> Acquiring dataset

You can get the dataset here.

df = pd.read_csv("911.csv")  # read the dataset from the local system
df.head()                    # view the first five rows

Step 2: A brief understanding of the data

a. Checking the shape of the dataset, i.e. the number of rows and columns present in the given dataset.
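A minimal sketch of the usual calls:

df.shape   # (number of rows, number of columns)
df.info()  # column names, dtypes and non-null counts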



What is Data Pre-Processing?

A raw dataset is rarely available in a machine-readable format. The process of transforming a raw dataset through operations like cleaning, manipulating, standardizing, encoding, organising, etc. so that a machine can read it is known as Data Pre-Processing.
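A tiny illustrative sketch of two such operations on a toy frame (the column names here are made up for illustration, not taken from any particular dataset):

import pandas as pd

toy = pd.DataFrame({"Gender": ["M", "F", "M"],
                    "Purchase": [8370.0, None, 1422.0]})
toy["Purchase"] = toy["Purchase"].fillna(toy["Purchase"].mean())  # cleaning: fill missing values
toy["Gender"] = toy["Gender"].map({"M": 0, "F": 1})               # encoding: categorical -> numeric
print(toy)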

In this blog we will perform data pre-processing on the Black Friday dataset.

Step 1: Importing useful libraries

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

Step 2: Acquiring Dataset

Now we will acquire the Black Friday dataset in CSV format from here for further operations.

df_train = pd.read_csv("blackFriday_train.csv") …


What is hypothesis testing?

Data is not interesting on its own; it becomes interesting when we interpret it and find some meaning in it. Hypothesis testing is a method of making data interesting: it attaches confidence and likelihood to an answer.

Parameters of hypothesis testing?

H(0), the null hypothesis, and H(1), the alternate hypothesis, play a major role in hypothesis testing.

H(0): represents the default assumption, or the assumption stated in the question.

H(1): when the default assumption is rejected, the alternate hypothesis comes into the picture.
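As a quick illustration, here is a one-sample t-test with scipy (a hypothetical sketch; scipy is an assumed tool choice, not used elsewhere in this post):

from scipy import stats
import numpy as np

# H(0): population mean = 50   vs   H(1): population mean != 50
sample = np.random.default_rng(1).normal(loc=52, scale=5, size=30)
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
if p_value < 0.05:  # common 5% significance level
    print("Reject H(0) in favour of H(1)")
else:
    print("Fail to reject H(0)")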

Critical Region, One-Tail Test & Two-Tail Test?

Before diving deep into hypothesis testing, we need to know about the parameters that play a major role in deciding whether the null hypothesis is retained or the alternate is accepted. …


Population, statistics and …


What is comparison of two populations?

We compare the means of two independent populations to obtain their difference for hypothesis testing. It plays a great role in deciding whether the null hypothesis should be rejected or accepted.
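A minimal sketch of such a comparison using a two-sample t-test in scipy (an assumed tool choice, with synthetic samples for illustration only):

from scipy import stats
import numpy as np

rng = np.random.default_rng(2)
sample_a = rng.normal(loc=100, scale=10, size=40)  # sample from population A
sample_b = rng.normal(loc=105, scale=10, size=40)  # sample from population B
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(t_stat, p_value)  # a small p-value -> reject the null of equal means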


Why is it so important?

Let's take a real-world scenario. Suppose you are a ration shop owner, a certain fluctuation has happened in the sugar price, and because of that you have sold sugar at two different prices, i.e. …



Standardization

What is Standardization and why is it important?

Suppose you purchased a laptop in India at a cost of ₹50,000. Some time later, your brother brought you the same product from Canada at a cost of 950 Canadian dollars (1 C.D. = ₹50). To compare the two deals, you have to convert both laptop prices into a single currency (either rupees or Canadian dollars). Now:

Price of the laptop you purchased = ₹50,000

Price of the laptop your brother brought for you = 950 × 50 = ₹47,500

By converting both laptop prices into a single currency, you can tell who paid the steeper amount. In statistics this process is known as standardization. We need it to convert data measured in different units onto a single scale so that we can draw inferences from it. …
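In code, the standard z-score version of this idea looks like the following sketch:

import numpy as np

prices = np.array([50000.0, 47500.0, 52000.0, 45000.0])
z = (prices - prices.mean()) / prices.std()  # z-score: (x - mean) / std
print(z)                                     # unit-free values on a common scale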
