Tuesday, 22 January 2013

Exploring Regression and ANOVA

#This post was created as a solution to the assignments given on 22/01/2013 in the IT & Business Applications Lab, Spring Semester, VGSoM, IIT Kharagpur, Class of 2014.

Assignment #1: We believe that, for one kind of car, groove (independent variable) affects mileage (dependent variable). We have to fit a linear model with 'lm' and comment on whether 'lm' is applicable.

Solution: 

Steps:

1. Extract the data into a separate .csv file.
2. Assign Groove and Mileage to separate variables and regress Mileage on Groove.
3. Find the residuals and draw the Q-Q plot.
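A minimal R sketch of these three steps follows. The column names `Groove` and `Mileage` and the file name `cars.csv` are assumptions for illustration, and because the original data file is not reproduced here, the demo uses simulated numbers rather than the real car data.

```r
# Fit Mileage ~ Groove and return the model with its residuals.
# Column names 'Groove' and 'Mileage' are hypothetical.
fit_groove_model <- function(df) {
  model <- lm(Mileage ~ Groove, data = df)   # step 2: linear regression
  res <- residuals(model)                    # step 3: residuals
  list(model = model, res = res)
}

# Step 1 would normally read the extracted file, e.g.
# df <- read.csv("cars.csv", header = TRUE)   # hypothetical file name
# Simulated data, only to make the sketch runnable:
set.seed(1)
df <- data.frame(Groove = 1:20)
df$Mileage <- 30 - 0.5 * df$Groove + rnorm(20)

out <- fit_groove_model(df)
plot(df$Groove, out$res)   # residuals against the predictor
qqnorm(out$res)            # normal Q-Q plot of the residuals
qqline(out$res)            # reference line through the quartiles
```

If the linear model is appropriate, the residual plot should look like a patternless band around zero and the Q-Q points should hug the line.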
Results:


plot(k1, res):

qqnorm(res):

qqline(res):



The residual plot does not show a random scatter around zero, so the linearity assumption behind 'lm' does not appear to hold in this case.
Assignment #2:
Using the alpha and pluto data, do the following:
1. Fit the linear regression:
2. Calculate the residuals:
3. plot(p1, res1):
4. Standardized residuals:
5. qqnorm(res1):
6. qqline(res1):
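The post does not say which of alpha and pluto is the dependent variable, so the sketch below uses generic hypothetical vectors `x` and `y` with simulated values. It shows the raw residuals and the standardized residuals from `rstandard()`, and checks them against the textbook formula e_i / (sigma * sqrt(1 - h_ii)), where h_ii are the leverages from `hatvalues()`.

```r
# Fit y ~ x and compute raw and standardized residuals.
standardized_residuals <- function(x, y) {
  model <- lm(y ~ x)
  res1 <- residuals(model)
  # Internally standardized residuals: e_i / (sigma * sqrt(1 - h_ii))
  manual <- res1 / (summary(model)$sigma * sqrt(1 - hatvalues(model)))
  list(model = model, res1 = res1, std = rstandard(model), manual = manual)
}

set.seed(42)
x <- runif(15, 0, 10)          # hypothetical stand-in for one series
y <- 2 + 3 * x + rnorm(15)     # hypothetical stand-in for the other
out <- standardized_residuals(x, y)

plot(x, out$res1)              # residuals against the predictor
qqnorm(out$std)                # Q-Q plot of standardized residuals
qqline(out$std)
```

Standardized residuals put every point on a common scale, so values much beyond about plus or minus 2 are easy to spot.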


Assignment #3:
Justify the null hypothesis using ANOVA:
Answer:



The ANOVA result gives p = 0.687.
At the 5% significance level (95% confidence), p > 0.05.
So we cannot reject the null hypothesis, and we retain it.
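The ANOVA command itself is not shown in the post, so the sketch below assumes a one-way ANOVA on hypothetical simulated groups; its p-value will not match the 0.687 reported above. It shows how the p-value can be pulled out of `summary(aov(...))` and compared with 0.05.

```r
# One-way ANOVA: H0 says all group means are equal.
anova_p_value <- function(values, groups) {
  fit <- aov(values ~ groups)
  summary(fit)[[1]][["Pr(>F)"]][1]   # p-value for the group factor
}

set.seed(7)
groups <- factor(rep(c("A", "B", "C"), each = 10))  # hypothetical groups
values <- rnorm(30, mean = 50, sd = 5)               # same true mean in each group
p <- anova_p_value(values, groups)

if (p > 0.05) {
  cat("p =", round(p, 3), "> 0.05: cannot reject H0 at the 5% level\n")
} else {
  cat("p =", round(p, 3), "<= 0.05: reject H0 at the 5% level\n")
}
```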

Tuesday, 15 January 2013

The Matrix Revolutionised!!!

#This post was created as a solution to the assignments for the IT & Business Applications Lab, Spring Semester, VGSoM, IIT Kharagpur, Class of 2014.

As our learning progressed, we explored some more features of R (matrix representation, regression analysis, normal distribution).
Based on our learning, we are submitting the following assignments:
Assignment # 1
a. We have to create two matrices.
b. We have to select the highlighted columns.
c. We have to use the "cbind" command to join those two columns and create a new matrix.
The solution is given below: 
Sol -:
Matrix 1 assignment and generation
> mat1<-c(1:10)
> dim(mat1)<-c(2,5)
Matrix 2 assignment and generation
> mat2<-c(11:16)
> dim(mat2)<-c(2,3)
Taking the 3rd column of mat1 and the 2nd column of mat2, we use cbind (column binding) and rbind (row binding) to join them.
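The exact binding commands were in a screenshot that is not reproduced here. A minimal sketch of one way to do it follows, rebuilding mat1 and mat2 with `matrix()`, which fills column by column just like the `dim()` assignment above.

```r
# Rebuild the two matrices (filled column by column).
mat1 <- matrix(1:10, nrow = 2, ncol = 5)   # same as dim(mat1) <- c(2, 5)
mat2 <- matrix(11:16, nrow = 2, ncol = 3)  # same as dim(mat2) <- c(2, 3)

col_a <- mat1[, 3]   # 3rd column of mat1: 5 6
col_b <- mat2[, 2]   # 2nd column of mat2: 13 14

by_col <- cbind(col_a, col_b)   # 2 x 2: the two columns side by side
by_row <- rbind(col_a, col_b)   # 2 x 2: the two columns stacked as rows
print(by_col)
print(by_row)
```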



Assignment # 2
We have to multiply two matrices.
Sol -:
Command to multiply 2 matrices
> multip <- z1 %*% z2
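z1 and z2 are not defined in the post, so the sketch below assumes two small hypothetical matrices. `%*%` is the true matrix product and needs the number of columns of z1 to equal the number of rows of z2; plain `*` would instead multiply element by element.

```r
# Hypothetical matrices: z1 is 2 x 3, z2 is 3 x 2, so z1 %*% z2 is 2 x 2.
z1 <- matrix(1:6, nrow = 2)    # rows: 1 3 5 / 2 4 6
z2 <- matrix(1:6, nrow = 3)    # rows: 1 4 / 2 5 / 3 6

stopifnot(ncol(z1) == nrow(z2))  # inner dimensions must match
multip <- z1 %*% z2
print(multip)
# Entry [i, j] is the dot product of row i of z1 and column j of z2,
# e.g. [1, 1] = 1*1 + 3*2 + 5*3 = 22, giving rows 22 49 / 28 64.
```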


Assignment #3-:
1. Download NSE data from 1st Dec, 2012 to 31st Dec, 2012 as a .csv file.
2. Find the regression of the High Price on the Open Price and calculate the residuals.
Soln -:
Command for finding the regression:
> reg1 <- lm(HighPrice ~ OpenPrice, data = NSEData)
The arguments are explained below:
NSEData - data frame holding the historical data read from the file
HighPrice - dependent variable
OpenPrice - independent variable
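A self-contained sketch of the regression and residual step follows. The column names follow the `lm()` call above, but the values are simulated because the December 2012 file is not reproduced. Note that if the downloaded file's headers contain spaces, `read.csv()` converts them to syntactic names (for example `High.Price`), so the formula must use whatever names the data frame actually has.

```r
# Simulated stand-in for the NSE data (not the real December 2012 prices).
set.seed(2012)
open <- round(runif(20, 5800, 6000), 2)
NSEData <- data.frame(OpenPrice = open,
                      HighPrice = open + abs(rnorm(20, mean = 20, sd = 10)))

reg1 <- lm(HighPrice ~ OpenPrice, data = NSEData)  # HighPrice on OpenPrice
res <- residuals(reg1)                              # observed minus fitted
print(coef(reg1))
print(head(res))
```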
The snapshot of the data collected is given below:

The calculated residuals are given below:

Assignment # 4
We have to generate and plot a normal distribution, taking an arbitrary mean and standard deviation.
Soln -:
To generate normally distributed random numbers, the function used is -:
rnorm(n, mean, sd)
where n is the number of observations,
mean is the vector of means,
sd is the standard deviation.
(dnorm(x, mean, sd), by contrast, returns the density at x and is what traces the bell curve.)
The commands run are given below:
We have got the following normal distribution curve for the chosen mean and standard deviation:
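The commands themselves were in a screenshot, so here is a minimal sketch of one way to do it. The mean 50, standard deviation 10 and sample size 1000 are arbitrary choices for illustration. `rnorm()` draws the random numbers, and `dnorm()` draws the exact bell curve over the histogram of those draws.

```r
# Arbitrary parameters chosen for illustration.
mu <- 50
sigma <- 10
n <- 1000

set.seed(123)
samples <- rnorm(n, mean = mu, sd = sigma)   # n random draws

# Histogram on the density scale, with the exact normal curve on top.
hist(samples, freq = FALSE, main = "Normal distribution", xlab = "x")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)
```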


Tuesday, 8 January 2013

The exordium

Journey Begins.....

R, or rather the R statistical package, is, very simply put, the open-source equivalent of SAS. R can do pretty much everything SAS can do in terms of statistical analysis, and there are some pretty cool things R can do which SAS can’t. Say someone wants to build a predictive model using logistic regression: R can do it. An ARIMA model? Yes. Decision trees? Yes. Association rule mining? Yes. Many of R's standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. R is applied in insurance, finance, marketing and more.

In a nutshell, R is here to stay and to grow.

The R project for Statistical Computing
Assignment 1: Draw a histogram after concatenating 3 data points.
Soln : 
Commands used are as under -:
> x<-c(1,2,3)
> plot(x, type = "h")
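A short sketch of the distinction between the two plots: `type = "h"` draws one vertical line per value (a histogram-like plot whose heights are the values themselves), whereas `hist()` bins the values and draws bars whose heights are the counts in each bin.

```r
x <- c(1, 2, 3)       # c() concatenates the three values into one vector

# One vertical line per value; line height equals the value.
plot(x, type = "h")

# A true histogram: values are binned, bar heights are counts per bin.
h <- hist(x)
print(h$counts)
```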

Assignment 2: Draw a line graph with points, and name the graph and the axes.

Soln: We gathered the data from the National Stock Exchange (NSE) website. Let z be the variable that holds the data from the selected .csv file. Reading from the .csv file is done as under -:

> z<-read.csv(file.choose(), header=T)

This command prompts the user to select the data file from the saved location. 

Let zcol1 be the variable that holds column 3 of the data read from the .csv file.

The following commands were used:
> zcol1<-z[,3]
> plot(zcol1 , type="b" , main="NSE Graph" , xlab="Time" , ylab="indices")
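`file.choose()` only works in an interactive session; in a script the path can be passed directly instead. The sketch below writes a temporary .csv of simulated prices as a stand-in for the NSE download. The column layout (Date, Open, High, Low, Close) is an assumption, chosen so that column 3 is High and column 4 is Low, as in the assignments here.

```r
# Temporary file with simulated prices standing in for the NSE download.
path <- tempfile(fileext = ".csv")
set.seed(1)
close <- round(5800 + cumsum(rnorm(10, sd = 20)), 2)
write.csv(data.frame(Date = seq(as.Date("2012-12-03"), by = "day", length.out = 10),
                     Open = close - 5, High = close + 15, Low = close - 15,
                     Close = close),
          path, row.names = FALSE)

z <- read.csv(path, header = TRUE)   # a path instead of file.choose()
zcol1 <- z[, 3]                      # column 3 (High in this stand-in file)
plot(zcol1, type = "b", main = "NSE Graph", xlab = "Time", ylab = "indices")
```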

Assignment 3: Merge two columns from the table obtained. Create a scatter plot using the share HIGH and LOW values from the NSE historical data obtained from the .csv file.
Soln: The HIGH values are in column 3, as in the previous question:
> zcol1<-z[,3]
The LOW values are in column 4 of the .csv file:
> zcol2<-z[,4]
To draw the scatter plot:
> plot(zcol1,zcol2)
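The same scatter plot with labelled axes, sketched on simulated HIGH and LOW series (not the real NSE values). Since a day's LOW can never exceed its HIGH, every point must fall below the dashed y = x line, which makes a quick sanity check on the column order.

```r
# Simulated stand-in for the HIGH (column 3) and LOW (column 4) values.
set.seed(3)
mid <- 5800 + cumsum(rnorm(20, sd = 25))
zcol1 <- mid + runif(20, 5, 30)   # HIGH
zcol2 <- mid - runif(20, 5, 30)   # LOW

# Each point is one trading day: x = HIGH, y = LOW.
plot(zcol1, zcol2, xlab = "HIGH", ylab = "LOW", main = "NSE HIGH vs LOW")
abline(a = 0, b = 1, lty = 2)     # y = x reference line
```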


Assignment 4 :
Find the volatility of the merged values obtained from the NSE historical data, and obtain their range.
Soln :-
For this, we need the maximum of the HIGH values and the minimum of the LOW values.
We merge both columns into one vector variable 'y' to get the HIGH and LOW values together:
> y<-c(zcol1,zcol2)
> summary(y)
gives the min and the max values as under -:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4888    5660    5723    5758    5884    6021

> range(y)
gives the desired range of volatility:
[1] 4888.20 6020.75
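`range()` returns the two extremes; their difference gives the width of the range as a single volatility figure. A small sketch, where the helper name `volatility_range` is hypothetical, applied to the two extremes reported above:

```r
# Volatility range of merged HIGH and LOW values.
volatility_range <- function(high, low) {
  y <- c(high, low)          # merge both columns into one vector
  r <- range(y)              # same as c(min(y), max(y))
  c(min = r[1], max = r[2], width = diff(r))
}

# Using the extremes reported above: width = 6020.75 - 4888.20 = 1132.55
v <- volatility_range(high = 6020.75, low = 4888.20)
print(v)
```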