Acquaintance in the world of R: March 2013

#This post was created as a solution to the assignments given on 13/02/2013 in the IT & Business Applications Lab, Spring Semester, VGSoM, IIT Kharagpur, Class of 2014.

Panel (data) analysis is a statistical method, widely used in social science, epidemiology, and econometrics, which deals with two-dimensional panel data. The data are usually collected over time for the same individuals, and a regression is then run over these two dimensions. A common panel data regression model looks like $y_{it}=a+bx_{it}+\epsilon_{it}$, where $y$ is the dependent variable, $x$ is the independent variable, $a$ and $b$ are coefficients, and $i$ and $t$ are indices for individuals and time. The error $\epsilon_{it}$ is very important in this analysis: assumptions about the error term determine whether we speak of fixed effects or random effects. In a fixed effects model, the individual- or time-specific part of $\epsilon_{it}$ is assumed to vary non-stochastically over $i$ or $t$ (a fixed constant for each individual or period), making the fixed effects model analogous to a dummy variable model in one dimension. In a random effects model, that part is instead treated as random and uncorrelated with the regressors, which requires special treatment of the error variance matrix.
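One standard way to make these assumptions precise, sketched here for one-way individual effects (the default effect in plm), splits the error into an individual component $\mu_i$ and an idiosyncratic component $\nu_{it}$; the symbols $\mu_i$, $\nu_{it}$, $N$ (number of individuals) and $T$ (number of periods in a balanced panel) are introduced only for this sketch:

$$y_{it} = a + b x_{it} + \epsilon_{it}, \qquad \epsilon_{it} = \mu_i + \nu_{it}$$

The pooled model assumes $\mu_i = 0$ for every $i$, so ordinary least squares is run on all $NT$ stacked observations. The fixed effects model treats each $\mu_i$ as an unknown constant and removes it, together with $a$, by subtracting individual means, where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and similarly for $\bar{x}_i$ and $\bar{\nu}_i$:

$$y_{it} - \bar{y}_i = b\,(x_{it} - \bar{x}_i) + (\nu_{it} - \bar{\nu}_i)$$

The random effects model treats $\mu_i$ as random with variance $\sigma_\mu^2$, uncorrelated with $x_{it}$, and $\nu_{it}$ as having variance $\sigma_\nu^2$. Its GLS estimator is equivalent to OLS on quasi-demeaned data:

$$y_{it} - \theta\bar{y}_i = a(1-\theta) + b\,(x_{it} - \theta\bar{x}_i) + (\epsilon_{it} - \theta\bar{\epsilon}_i), \qquad \theta = 1 - \sqrt{\frac{\sigma_\nu^2}{\sigma_\nu^2 + T\sigma_\mu^2}}$$

so $\theta = 0$ gives back the pooled model and $\theta = 1$ gives the fixed effects (within) transformation.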

We will be using three models for this purpose:

Pooled model
Fixed effects model
Random effects model

Assignment #1:
Do panel data analysis on the data set "Produc" using the package "plm" with the three types of model, and then determine which model is best for this data set using the following functions:
pFtest: to choose between the fixed effects and pooled models
plmtest: to choose between the pooled and random effects models
phtest: to choose between the random effects and fixed effects models

Solution: the commands used are as follows.

First, we load the package and the data using the following commands:
> library(plm)
> data(Produc, package = "plm")
> head(Produc)

A snapshot of the commands and their output is given below:

The data set contains the following variables:

- state: the state

- year: the year

- pcap: public capital stock

- hwy: highway and streets

- water: water and sewer facilities

- util: other public buildings and structures

- pc: private capital stock

- gsp: gross state product

- emp: labor input measured by the employment in non-agricultural payrolls

- unemp: state unemployment rate
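Before fitting the models, it can be useful to check the panel structure that plm will build from index = c("state","year"). The helper below, panel_shape(), is hypothetical (it is not part of plm) and is demonstrated on a small toy data frame so that it runs without the package:

```r
# Hypothetical helper (not part of plm): summarise the panel structure of a
# data frame, given the names of its individual and time columns.
panel_shape <- function(df, id, time) {
  counts    <- table(as.character(df[[id]]))  # rows per individual
  n_periods <- length(unique(df[[time]]))     # distinct time periods
  list(
    n_individuals = length(counts),
    n_periods     = n_periods,
    n_obs         = nrow(df),
    # balanced: every individual is observed exactly once in every period
    balanced      = all(counts == n_periods) &&
                    anyDuplicated(df[c(id, time)]) == 0
  )
}

# Toy panel: 3 states observed over 4 years
toy <- data.frame(state = rep(c("A", "B", "C"), each = 4),
                  year  = rep(2001:2004, times = 3),
                  y     = 1:12)
str(panel_shape(toy, "state", "year"))

# Dropping one row makes the panel unbalanced
panel_shape(toy[-1, ], "state", "year")$balanced
```

With plm installed, panel_shape(Produc, "state", "year") should report 48 states, 17 years (1970-1986) and 816 rows; plm's own pdim() summarises the same dimensions.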

Here, we assume that "pcap" is the dependent variable and the other variables are independent variables, so we first estimate "pcap" using the pooled model.

The commands and a snapshot of the result are given below:
> pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "pooling", index = c("state", "year"))
> summary(pool)
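The pooled model is ordinary least squares on all stacked rows, ignoring which state and year each row belongs to. The sketch below uses simulated data (not Produc) and base R's lm() in place of plm to show why this can matter: when the individual effects are correlated with the regressor, the pooled slope is biased, whereas adding one dummy per individual (the least-squares dummy variable form of the fixed effects model) recovers the true slope. The sample sizes and the true slope of 0.5 are arbitrary choices for illustration:

```r
set.seed(42)
n_id <- 50                                    # individuals (e.g. states)
n_t  <- 10                                    # time periods (e.g. years)
id   <- rep(seq_len(n_id), each = n_t)
mu   <- rep(rnorm(n_id, sd = 2), each = n_t)  # individual effects
x    <- mu + rnorm(n_id * n_t)                # regressor correlated with mu
y    <- 1 + 0.5 * x + mu + rnorm(n_id * n_t)  # true slope b = 0.5

pooled <- lm(y ~ x)                # pooled OLS: the panel index is ignored
lsdv   <- lm(y ~ x + factor(id))   # fixed effects as one dummy per individual

# The pooled slope is pushed upward by the omitted effects; LSDV is near 0.5
round(c(pooled = coef(pooled)[["x"]], lsdv = coef(lsdv)[["x"]]), 3)
```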

Then we estimate "pcap" using the fixed effects model.
The commands and a snapshot of the result are given below:

> fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "within", index = c("state", "year"))
> summary(fixed)
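The "within" model removes the state effects by subtracting each state's mean from every variable. A minimal base-R sketch on simulated data (the names y_dm and x_dm are illustrative) shows that this demeaning gives exactly the same slope as the dummy variable regression mentioned in the introduction:

```r
set.seed(1)
n_id <- 20
n_t  <- 6
id <- rep(seq_len(n_id), each = n_t)
mu <- rep(rnorm(n_id), each = n_t)            # individual effects
x  <- mu + rnorm(n_id * n_t)
y  <- 2 + 0.8 * x + mu + rnorm(n_id * n_t)

# Within transformation: subtract each individual's mean (ave() uses mean)
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)

b_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]  # OLS on demeaned data
b_lsdv   <- coef(lm(y ~ x + factor(id)))[["x"]]  # dummy variable regression
all.equal(b_within, b_lsdv)                      # same estimator
```

Note that lm() on the demeaned data uses $NT - k$ residual degrees of freedom and ignores the $N$ individual means that were estimated, so its standard errors are too small; plm's "within" model accounts for them ($NT - N - k$ degrees of freedom).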

Then we estimate "pcap" using the random effects model.
The commands and a snapshot of the result are given below:

> random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = "random", index = c("state", "year"))
> summary(random)
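The "random" model sits between the other two: it subtracts only a fraction $\theta$ of each individual mean, with $\theta$ computed from the variance components $\sigma_\mu^2$ (individual effects) and $\sigma_\nu^2$ (idiosyncratic error). In the sketch below the variance components are simply assumed known ($\sigma_\mu^2 = \sigma_\nu^2 = 1$) rather than estimated, and re_theta() and re_slope() are hypothetical helpers; the point is that $\theta = 0$ reproduces the pooled slope and $\theta = 1$ the within slope:

```r
set.seed(7)
n_id <- 30
n_t  <- 5
id <- rep(seq_len(n_id), each = n_t)
mu <- rep(rnorm(n_id), each = n_t)        # individual effects, variance 1
x  <- rnorm(n_id * n_t)                   # uncorrelated with mu, as RE assumes
y  <- 1 + 0.5 * x + mu + rnorm(n_id * n_t)

# theta from the variance components (here ASSUMED known, not estimated)
re_theta <- function(s2_mu, s2_nu, n_t) {
  1 - sqrt(s2_nu / (s2_nu + n_t * s2_mu))
}

# Slope from OLS on quasi-demeaned data: z - theta * (individual mean of z)
re_slope <- function(y, x, id, theta) {
  y_q <- y - theta * ave(y, id)
  x_q <- x - theta * ave(x, id)
  coef(lm(y_q ~ x_q))[["x_q"]]
}

theta <- re_theta(s2_mu = 1, s2_nu = 1, n_t = n_t)   # about 0.59
c(theta = theta, slope = re_slope(y, x, id, theta))
```

In plm the variance components are estimated from the data (the Swamy-Arora method is the default, random.method = "swar"), and summary(random) reports the resulting theta.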

Comparison

The models are compared through hypothesis tests. In the first two tests (pooled vs fixed and pooled vs random), the null hypothesis corresponds to the pooled model:

H0 (Null Hypothesis): the individual (index) and time-based effects are all zero

H1 (Alternative Hypothesis): at least one of the individual or time-based effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model

Alternative Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)

Result:
data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16

The p-value is negligible (< 2.2e-16), so we reject the null hypothesis and accept the alternative: the fixed effects model is preferred over the pooled model.
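The F statistic from pFtest compares the residual sum of squares of the restricted pooled model with that of the unrestricted fixed effects model: $F = \frac{(RSS_{pooled} - RSS_{fixed})/(N-1)}{RSS_{fixed}/(NT - N - k)}$, where $k$ is the number of slope coefficients. For Produc this gives df1 = 48 - 1 = 47 and df2 = 816 - 48 - 7 = 761, matching the output above. The base-R sketch below recomputes the statistic by hand on simulated data (not Produc) and checks it against anova() on the two nested lm() fits:

```r
set.seed(3)
n_id <- 25
n_t  <- 8
k    <- 1                                   # number of slope coefficients
id <- rep(seq_len(n_id), each = n_t)
x  <- rnorm(n_id * n_t)
y  <- 1 + 0.5 * x + rep(rnorm(n_id), each = n_t) + rnorm(n_id * n_t)

pooled <- lm(y ~ x)                         # restricted: no individual effects
lsdv   <- lm(y ~ x + factor(id))            # unrestricted: one effect per id

rss_p <- sum(resid(pooled)^2)
rss_f <- sum(resid(lsdv)^2)
df1 <- n_id - 1                             # number of restrictions tested
df2 <- n_id * n_t - n_id - k                # residual df of the fixed model
F_stat <- ((rss_p - rss_f) / df1) / (rss_f / df2)
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)

c(F = F_stat, df1 = df1, df2 = df2, p = p_val)
```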

Pooled vs Random

Null Hypothesis: Pooled Model

Alternative Hypothesis: Random Effects Model

Command:

> plmtest(pool)

Result:

        Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the null hypothesis and accept the alternative: the random effects model is preferred over the pooled model.
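The Honda statistic reported by plmtest(pool) is built from the residuals $e_{it}$ of the pooled model: if individual effects exist, residuals of the same individual tend to share a sign, so their within-individual sums are large relative to the total sum of squares. The textbook one-way Honda statistic for a balanced panel is $\sqrt{\frac{NT}{2(T-1)}}\left(\frac{\sum_i (\sum_t e_{it})^2}{\sum_i \sum_t e_{it}^2} - 1\right)$, compared with a standard normal distribution. The sketch below implements that formula in base R on simulated data; honda_stat() is a hypothetical helper, and plm's internals may differ in details such as the exact p-value convention:

```r
set.seed(5)
n_id <- 30
n_t  <- 5
id <- rep(seq_len(n_id), each = n_t)
x  <- rnorm(n_id * n_t)
y  <- 1 + 0.5 * x + rep(rnorm(n_id), each = n_t) + rnorm(n_id * n_t)

# One-way Honda LM statistic from pooled OLS residuals (balanced panel)
honda_stat <- function(e, id, n_t) {
  n_id  <- length(unique(id))
  sum_i <- tapply(e, id, sum)             # residual sum for each individual
  a     <- sum(sum_i^2) / sum(e^2) - 1
  sqrt(n_id * n_t / (2 * (n_t - 1))) * a
}

e <- resid(lm(y ~ x))                     # residuals of the pooled model
h <- honda_stat(e, id, n_t)
p_val <- pnorm(h, lower.tail = FALSE)     # one-sided: large values signal effects
c(normal = h, p = p_val)
```

Squaring the Honda statistic gives the Breusch-Pagan LM statistic for individual effects in a balanced panel.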

Random vs Fixed

Null Hypothesis: No correlation between the individual effects and the regressors (Random Effects Model)

Alternative Hypothesis: Fixed Effects Model

Command:

> phtest(fixed,random)

Result:

        Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the null hypothesis and accept the alternative: the fixed effects model is preferred over the random effects model.
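phtest(fixed, random) is a Hausman test: under the null both estimators are consistent and the random effects estimator is efficient, so their coefficients should differ only by sampling noise; under the alternative only the fixed effects estimator is consistent. The statistic is $H = (\hat{b}_{FE} - \hat{b}_{RE})' (\hat{V}_{FE} - \hat{V}_{RE})^{-1} (\hat{b}_{FE} - \hat{b}_{RE})$, compared with a $\chi^2$ distribution whose degrees of freedom equal the number of compared coefficients (7 here, the slope coefficients of the model). The function hausman_stat() below is a hypothetical base-R sketch of that formula, fed with made-up numbers rather than the Produc estimates:

```r
# Hausman statistic: H = d' (V_fe - V_re)^{-1} d, with d = b_fe - b_re
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d      <- b_fe - b_re
  V_diff <- as.matrix(V_fe - V_re)
  stat   <- as.numeric(crossprod(d, solve(V_diff, d)))
  df     <- length(d)
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Illustrative (made-up) estimates for two coefficients
res <- hausman_stat(b_fe = c(1.0, 2.0), b_re = c(0.8, 2.1),
                    V_fe = diag(c(0.02, 0.05)), V_re = diag(c(0.01, 0.04)))
res$statistic   # 0.2^2 / 0.01 + 0.1^2 / 0.01, i.e. about 5
```

With the two plm models from this post, the inputs could be taken from coef() and vcov(), restricted to the slope coefficients shared by both models, e.g. b <- names(coef(fixed)); hausman_stat(coef(fixed), coef(random)[b], vcov(fixed), vcov(random)[b, b]). This is expected to be close to what phtest(fixed, random) reports, although plm's implementation may differ in details. In finite samples V_fe - V_re need not be positive definite, in which case the statistic can be negative or undefined.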

Conclusion:

After making all the comparisons, we can see that the fixed effects model is preferred over the pooled model, the random effects model is preferred over the pooled model, and finally the fixed effects model is preferred over the random effects model.

So we conclude that the fixed effects model is best suited for the panel data analysis of the "Produc" data set: significant state (index) effects exist, and the Hausman test indicates that they are correlated with the regressors, so the random effects estimator would be inconsistent.

Thursday, 14 March 2013

Panel data Analysis: An Inception
