# Data Manipulation Using Pandas

Why do we need data manipulation?

Real-world data is messy. By applying certain operations we make the data meaningful for a given requirement; this process of transforming messy data into insightful information is called data manipulation. Various languages and tools support data manipulation (e.g. SQL, R, Excel, etc.). In this blog we will broadly discuss pandas for data manipulation, and in this section I will use the Titanic dataset for a broader understanding.

Seaborn loads example datasets from an online repository.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset("titanic")
```

## 2. Read first/last five rows

Going through each and every row of a dataset is tedious, so for a cursory glance we look at the first/last five rows. …
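As a quick sketch of the head/tail idea (using a tiny hand-made frame instead of the Titanic data so it runs standalone; the seaborn frame behaves the same way):

```python
import pandas as pd

# Tiny stand-in frame; sns.load_dataset("titanic") returns a frame used identically
df = pd.DataFrame({"survived": [0, 1, 1, 0, 0, 1, 0],
                   "age": [22, 38, 26, 35, 35, 54, 2]})

print(df.head())   # first five rows
print(df.tail())   # last five rows
print(df.head(3))  # head/tail also accept a custom row count
```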

# Multi-collinearity Key Aspect of Regression Problem

What is Multi-Collinearity?

Multi-collinearity occurs when two or more independent variables (features) are highly correlated with one another. Let's take an example.

Suppose you have a dataset for predicting a person's salary, with the independent variables Age and Years of Service. These two independent variables are strongly correlated and form their own relationship, x1 = m*x2 + c. Because of this mutual dependency, each of them individually explains less about y.

Multi-collinearity does not necessarily hurt predictive performance, but the estimated effect of each collinear independent variable on the output variable becomes unreliable, which reduces interpretability.
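The salary example can be sketched with hypothetical numbers, showing how age and years of service end up almost perfectly correlated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = rng.uniform(25, 60, 200)              # hypothetical ages
service = age - 25 + rng.normal(0, 1, 200)  # roughly x1 = m*x2 + c, plus noise

df = pd.DataFrame({"age": age, "service": service})
print(df.corr())  # the off-diagonal entry is close to 1
```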

What are the causes of Multi-Collinearity?

# Overview

The Boston dataset contains information collected by the U.S. Census Service concerning housing in the city of Boston. The data was originally published in 1978 and contains nearly 500 samples. We can easily access it through the sklearn library. Our main aim is to predict housing prices from the features present in the dataset. Let's get started…

## Step 1. Importing Libraries and Acquiring Dataset

Importing libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
```

Acquiring Dataset

```python
boston = load_boston()
type(boston)
[Out]>> sklearn.utils.Bunch
```

The type of the boston object is `sklearn.utils.Bunch`. sklearn stores the data in a dictionary-like object, and the Bunch contains four keys through which we can understand the data more precisely: `data`, `target`, `feature_names`, and `DESCR`. …
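Since a Bunch behaves like a dictionary, its keys can be assembled into a DataFrame. A minimal sketch using a stand-in dict with the same four keys (the values below are illustrative only; the real `boston` object is used the same way):

```python
import pandas as pd

# Stand-in for the Bunch; values are illustrative, not real Boston rows
bunch = {
    "data": [[0.006, 18.0], [0.027, 0.0]],
    "target": [24.0, 21.6],
    "feature_names": ["CRIM", "ZN"],
    "DESCR": "Boston house prices (excerpt)",
}

df = pd.DataFrame(bunch["data"], columns=bunch["feature_names"])
df["PRICE"] = bunch["target"]  # append the target as a column
print(df)
```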

# Optimization in Neural Networks

What is Optimization?

An optimizer updates the attributes of a neural network (its weights and biases) during backward propagation.

It also determines how much of the training data is used for each update and feeds only that amount to the network.

Let's get started…

Non-Momentum Based Optimization

In non-momentum based optimization, the new weight has no dependency on the previous weight: every time we feed a new set of inputs, we obtain a new weight that has no relationship with the previous one. Hang tight; you will understand all of this as you go further in this article.

Suppose you have a dataset with n training examples. When we send all of the training examples through the network to compute the attribute updates, it is known as batch gradient descent. …
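Batch gradient descent can be sketched as follows, fitting y = w*x + b on toy data, with every update computed over the full batch (the learning rate and the data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = 3 * x + 2 + rng.normal(0, 0.01, 100)  # toy targets around y = 3x + 2

w, b, lr = 0.0, 0.0, 0.5
for _ in range(500):
    y_hat = w * x + b
    # gradients of the mean squared error, computed over ALL n examples
    grad_w = (2 / len(x)) * np.sum((y_hat - y) * x)
    grad_b = (2 / len(x)) * np.sum(y_hat - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # close to 3 and 2
```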

# Activation Functions in Neural Networks

## Activation Function

A function applied at a hidden layer is known as an activation function. The task of an activation function is to filter, normalize, and non-linearize the data, and it also fires its output to the perceptrons of the next layer. In neural networks we update weights and biases with reference to the error at the output using back propagation, and back propagation is only possible because of activation functions.

## Why do we introduce Non-linearity in the dataset?

When approaching deep learning problems, we mostly deal with complex datasets. A non-linear function captures such complex datasets more clearly and can be differentiated multiple times, yielding weights and biases for the different layers, and hence provides a better learning opportunity for the neural network. …
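Two common non-linear activation functions, sketched in NumPy:

```python
import numpy as np

def sigmoid(z):
    # squashes any input into (0, 1); differentiable everywhere
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # passes positive inputs through, zeroes out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
```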

# Exploratory Data Analysis on 911 Emergency calls

## Overview :

911 is the North American emergency helpline number. By analysing this dataset we will try to understand whether the emergency response teams are well equipped to deal with emergencies. We will also learn about the frequency of emergencies due to natural causes (health issues, etc.) and due to human error (fire accidents, road accidents, etc.).

## Step 1 : Importing libraries and acquiring dataset

a. Importing libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline   # magic for in-notebook display
```

b. Acquiring dataset

You can get the dataset here.

```python
df = pd.read_csv("911.csv")  # reading the dataset from the local system
df.head()                    # reading the first five rows
```

## Step 2 : A brief understanding of data

a. Checking the shape of the dataset, i.e. the number of rows and columns present in it.
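On any frame, `.shape` returns a (rows, columns) tuple. A tiny stand-in frame for illustration (the column names here are assumptions, not the full 911 schema):

```python
import pandas as pd

# Stand-in rows; df.shape works identically on the real 911 frame
df = pd.DataFrame({"lat": [40.29, 40.25],
                   "desc": ["EMS call", "Fire call"],
                   "zip": [19525, 19446]})

print(df.shape)  # (rows, columns)
```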

# Hands on Data Pre-Processing

What is Data Pre-Processing?

A raw dataset is rarely available in a machine-readable format. The process of transforming a raw dataset through operations like cleaning, manipulating, standardizing, encoding, and organising, so that a machine can read it, is known as data pre-processing.

In this blog we will perform data pre-processing on the Black Friday dataset.

## Step 1 : Importing useful library

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
```

## Step 2 : Acquiring Dataset

Now we will acquire the dataset, available here as the Black Friday dataset in CSV format, for further operations.

```python
df_train = pd.read_csv("blackFriday_train.csv")
```
…
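A minimal sketch of the kinds of operations the steps ahead perform, on a toy frame (the `Gender`/`Purchase` column names and values are assumptions for illustration):

```python
import pandas as pd

# Toy frame with a categorical column and a missing value
df = pd.DataFrame({"Gender": ["F", "M", "M"],
                   "Purchase": [8370.0, None, 1422.0]})

df["Gender"] = df["Gender"].map({"F": 0, "M": 1})              # encode categories
df["Purchase"] = df["Purchase"].fillna(df["Purchase"].mean())  # fill missing values
print(df)
```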

# Hypothesis Testing Explained

What is hypothesis testing?

Data by itself is not interesting; it becomes interesting when we interpret it and find meaning in it. Hypothesis testing is a method of making data interesting: it attaches confidence and likelihood to an answer.

Parameters of hypothesis testing?

H(0), the null hypothesis, and H(1), the alternate hypothesis, play a major role in hypothesis testing.

H(0):- Represents the default assumption, or the assumption stated in the question.

H(1):- When the default assumption is rejected, the alternate hypothesis comes into the picture.

Critical region, one-tailed test & two-tailed test?

Before diving deep into hypothesis testing, we need to know the parameters that play a major role in deciding whether to retain the null hypothesis or accept the alternate. …
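A minimal numeric sketch of a two-tailed test, with hypothetical numbers: H(0) claims the population mean is 100, and we observe a sample of 50 with mean 103 and a known standard deviation of 10:

```python
import numpy as np

mu0, xbar, sigma, n = 100, 103, 10, 50   # hypothetical values
z = (xbar - mu0) / (sigma / np.sqrt(n))  # test statistic
print(z)                                 # about 2.12

# at the 5% level, the two-tailed critical value is 1.96,
# so |z| > 1.96 means we reject H(0) in favour of H(1)
print(abs(z) > 1.96)
```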

# A step by step guide to population comparison

Population, statistics and …

What is comparison of two populations?

We compare the means of two independent populations to obtain their difference for hypothesis testing. This plays a great role in deciding whether the null hypothesis should be rejected or accepted.

Why is it so important ?

Let's take a real-world scenario. Suppose you are a ration-shop owner, there has been some fluctuation in the price of sugar, and because of that you have sold sugar at two different prices, i.e.:- …
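The comparison itself can be sketched with hypothetical summary numbers: two independent samples of sale prices, and a z-statistic for the difference of their means:

```python
import numpy as np

# hypothetical summary statistics for the two price periods
xbar1, s1, n1 = 42.0, 3.0, 100   # mean, std dev, size of sample 1
xbar2, s2, n2 = 40.0, 4.0, 100   # mean, std dev, size of sample 2

# z-statistic for the difference of two independent means
z = (xbar1 - xbar2) / np.sqrt(s1**2 / n1 + s2**2 / n2)
print(z)  # well past 1.96, so the two means genuinely differ
```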

# Standardization

What is Standardization and why is it important?

Suppose you purchased a laptop in India at a cost of 50,000/-. Some time later, your brother bought you the same product from Canada at a cost of 950 Canadian dollars (1 C.D. = 50/-). You want to compare the two deals, and for that you have to convert both laptops' prices into a single currency (either rupees or Canadian dollars). Now:

Price of the laptop you purchased = 50,000/-

Price of the laptop your brother bought for you = 950 × 50 = 47,500/-

Now, by converting both laptops' prices into a single currency, you can tell who paid the higher amount. In statistics this process is known as standardization. We need it to convert data expressed in different units onto a single scale, so that we can draw inferences from it. …
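In practice, standardization converts each value to a z-score: subtract the mean and divide by the standard deviation. A minimal sketch on an arbitrary series:

```python
import numpy as np

x = np.array([50.0, 60.0, 70.0, 80.0, 90.0])  # values in any one unit
z = (x - x.mean()) / x.std()                  # z-scores: mean 0, std 1

print(z)
```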