Thursday, 28 March 2013

IT & BA LAB Session 10: 26/03/2013

Assignment 1: Create 3 vectors x, y and z with random values, ensuring they are of equal length, and bind them into a matrix:

T<- cbind(x,y,z)

Create 3-dimensional plots of the same (all 3 types as taught).

Commands :

> Random1<-rnorm(30,mean=0,sd=1)

> Random1

> x<-Random1[1:10]

> x

> y<-Random1[11:20]

> y

> z<-Random1[21:30]

> z

> T<-cbind(x,y,z)

> T

> library(rgl)   # plot3d() is provided by the rgl package

> plot3d(T[,1:3])

> plot3d(T[,1:3],col=rainbow(64))

> plot3d(T[,1:3],col=rainbow(64),type= 's')

Screenshots:

Assignment no 2:
Read the documentation of rnorm and pnorm,
Create 2 random variables
Create the following plots:
1. X-Y
2. X-Y|Z (introducing a variable z with 5 different categories and cbind it to x and y) Hint: ?factor
3. Color code and draw the graph
4. Smooth and best fit line for the curve

Commands :
> x<-rnorm(200,mean=5,sd=1)
> y<-rnorm(200,mean=3,sd=1)
> z1<-sample(letters,5)
> z2<-sample(z1,200,replace=TRUE)
> z<-as.factor(z2)
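> t<-data.frame(x,y,z)   # data.frame keeps z as a factor; cbind() would reduce it to integer codes
> library(ggplot2)   # qplot() is provided by the ggplot2 package
> qplot(x,y)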

> qplot(x,z,alpha=I(2/10))

> qplot(x,z)

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,colour=z)

> qplot(log(x),log(y),colour=z)
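
One point worth checking when combining x, y and a factor z: cbind() builds a numeric matrix, so the category labels are replaced by integer codes, while data.frame() keeps z as a factor. The following is a minimal base-R sketch; the seed and the five letters are arbitrary choices made only for illustration.

```r
set.seed(1)                                   # arbitrary seed, only for reproducibility
x <- rnorm(200, mean = 5, sd = 1)
y <- rnorm(200, mean = 3, sd = 1)
z <- factor(sample(c("a", "b", "c", "d", "e"), 200, replace = TRUE))

m  <- cbind(x, y, z)        # numeric matrix: z is stored as its integer codes
df <- data.frame(x, y, z)   # data frame: z stays a factor with its labels

is.numeric(m)               # TRUE
is.factor(df$z)             # TRUE
table(df$z)                 # counts per category
```

Keeping z as a factor is what lets a call such as qplot(x,y,colour=z) treat it as a discrete colour scale.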

Screenshots

Saturday, 23 March 2013

IT & BA Lab session 9:19/3/2013

zeebly.com

My Facebook profile summary is provided. This photo is my profile picture, and the summary talks about my likes and my interests.

This shows my total number of friends with the percentages of female and male friends. This is very helpful because it sorts the data out according to age groups.

Usage:

This tool is very interesting and attractive. The data is all presented graphically, in pie charts and bar graphs, and is very precise. People like this, and it is very easy to look at and find the required information.

Visual.ly

Visual.ly is a community platform for data visualization and infographics. It was founded by Stew Langille, Lee Sherman, Tal Siach, and Adam Breckler in 2011.

Visual.ly is structured both as a showcase for infographics and as a marketplace and community for publishers, designers, and researchers. The site allows users to search images through description, tags, and sources in a variety of categories, ranging from Education to Business or Politics. Users can publish infographics to their personal profile, which they can subsequently share through their social networks.

Visual.ly maintains a team of data analysts, journalists, and designers that create infographics and data visualizations using the Visual.ly tools. They are currently developing a tool that allows anyone to create and publish their own data visualizations. Through this tool, users will be able to gather information from databases and APIs in an automated service to produce an infographic.

By tapping into Visually's vibrant community of more than 35,000 designers, Marketplace is able to match infographic commissioners (brands, companies, agencies) with designers. Once matched, commissioners have direct access to the designers working on their projects and can communicate and transact with them in Visually's Project Center. Through such unique features as the Project Timeline, commissioners always know where their project stands and can ensure that it stays on time and on budget.

Visually partners with the world's leading publications and brands, bringing its tools, community, and talented team to bear on data visualization needs, wherever bespoke creation is needed.

Some points that I found wonderful about this tool:

UI is very user friendly

it is open source

numerous options regarding visual presentation of different types of data are available

the full tool is available online and it is not necessary to install any software on your PC

it is fast

the results are attractive and elegant

themes and options suiting everyone's style and taste are available.

once the visual presentation of data is ready, all possible options to save and share it are available.

Here is the picture of my resume, hope you will like it.......

Friday, 15 March 2013

Session #8 -12 Mar Assignment

Session #8 -12 Mar Assignment Submission

Problem:

Perform Panel Data Analysis of "Produc" data

Solution:

There are three types of models:
      Pooled effect model
      Fixed effect model
      Random effect model

We will determine which model is the best by using the functions:
       pFtest: for deciding between fixed and pooled
       plmtest: for deciding between pooled and random
       phtest: for deciding between random and fixed

The data can be loaded using the following commands
library(plm)
data(Produc , package ="plm")
head(Produc)

Pooled Effect Model

pool <-plm( log(pcap) ~log(hwy)+ log(water)+ log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data=Produc,model=("pooling"),index =c("state","year"))
summary(pool)

Fixed Effect Model:

fixed<-plm( log(pcap) ~log(hwy)+ log(water)+ log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data=Produc,model=("within"),index =c("state","year"))

summary(fixed)

Random Effect Model:

random <-plm( log(pcap) ~log(hwy)+ log(water)+ log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data=Produc,model=("random"),index =c("state","year"))

summary(random)

Testing of Model

This can be done through Hypothesis testing between the models as follows:

H0: Null Hypothesis: the individual index and time based parameters are all zero

H1: Alternate Hypothesis: at least one of the index and time based parameters is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Effect Model

Alternate Hypothesis: Fixed Effect Model

Command:

> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis, i.e. the Fixed Effect Model is preferred over the Pooled Effect Model.

Pooled vs Random

Null Hypothesis: Pooled Effect Model

Alternate Hypothesis: Random Effect Model

Command :

> plmtest(pool)

Result:

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis, i.e. the Random Effect Model is preferred over the Pooled Effect Model.

Random vs Fixed

Null Hypothesis: No correlation between the individual effects and the regressors, i.e. Random Effect Model

Alternate Hypothesis: Fixed Effect Model

Command:

> phtest(fixed,random)

Result:

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the Null Hypothesis and accept the Alternate Hypothesis, i.e. the Fixed Effect Model is preferred over the Random Effect Model.

Conclusion:

After running all the tests we come to the conclusion that the Fixed Effect Model is best suited for the panel data analysis of the "Produc" data set.

Hence we conclude that there are significant individual effects for each id, i.e. each "state" has its own intercept, and the coefficients are estimated from the variation within each state over time.
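
To make the idea of a fixed effect ("within") model more concrete, here is a small base-R sketch on simulated data. It does not use plm or the Produc data; the panel dimensions, coefficients and seed are assumptions chosen only for illustration. It shows that regressing on state dummies and regressing on state-demeaned variables give the same slope, while pooled OLS can differ when the state effect is correlated with the regressor.

```r
set.seed(42)                                  # arbitrary seed for reproducibility
n_states <- 10; n_years <- 8                  # hypothetical panel dimensions
state <- factor(rep(seq_len(n_states), each = n_years))
alpha <- rnorm(n_states, sd = 2)[as.integer(state)]    # state-specific intercepts
x     <- rnorm(n_states * n_years) + 0.5 * alpha       # regressor correlated with the state effect
y     <- alpha + 1.5 * x + rnorm(n_states * n_years, sd = 0.3)

# Pooled OLS ignores the state effects
b_pool <- coef(lm(y ~ x))[["x"]]

# Fixed effects, version 1: one dummy per state (LSDV)
b_lsdv <- coef(lm(y ~ x + state))[["x"]]

# Fixed effects, version 2: subtract state means, then regress without intercept
x_w <- x - ave(x, state)
y_w <- y - ave(y, state)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

c(pooled = b_pool, lsdv = b_lsdv, within = b_within)
```

The last two numbers coincide, which is why the "within" model can be read as giving every state its own intercept.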

Wednesday, 13 February 2013

session 6- assignment 6

Assignment 6

Assignment 1:

Create the log returns data. There are two ways to do it:
a> [log S(t) - log S(t-1)] / log S(t-1)
b> log[(S(t) - S(t-1)) / S(t-1)]

Take the log returns and calculate the historical volatility

Commands:
> stockprice=read.csv(file.choose(), header=T)
> head(stockprice)
> closingprice<-stockprice[,5]
> closingprice.ts<-ts(closingprice,frequency=252)
> lagtable<-cbind(closingprice.ts,lag(closingprice.ts,k=-1),(closingprice.ts-lag(closingprice.ts,k=-1)))
> head(lagtable)
> returns<-(closingprice.ts-lag(closingprice.ts,k=-1))/lag(closingprice.ts,k=-1)
> returns
> LogReturn1<-log(closingprice.ts)-log(lag(closingprice.ts,k=-1))
> LogReturn<-LogReturn1/log(lag(closingprice.ts,k=-1))
> LogReturn
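> T<-252^0.5   # annualisation factor: square root of 252 trading days
> historicalvolatility<-sd(returns)*T
> historicalvolatility
> acf(LogReturn)
> library(tseries)   # adf.test() is provided by the tseries package
> adf.test(returns)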

Augmented Dickey-Fuller Test

data: returns
Dickey-Fuller = -5.6265, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(returns) : p-value smaller than printed p-value

PIC 1:

Assignment 2:

Create Acf plot and interpret the output of above log returns data. Do ADF test and interpret.

Commands:

> acf(LogReturn)
> adf.test(LogReturn)

Augmented Dickey-Fuller Test

data: LogReturn
Dickey-Fuller = -5.6217, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(LogReturn) : p-value smaller than printed p-value

PLOT

Conclusion:
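From the ADF test, p-value < alpha, i.e. 0.01 < 0.05 (the warning says the actual p-value is even smaller than the printed 0.01). We therefore reject the null hypothesis of non-stationarity and accept the alternate hypothesis that the series is stationary. The ACF plot shows the autocorrelations lying within the confidence interval. Together these results support the stationarity of the time series.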
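
For comparison, the conventional definition of a log return is log S(t) - log S(t-1), which equals log(1 + simple return). The sketch below uses a handful of made-up prices (hypothetical values, not the file used in this post) and assumes 252 trading days per year for the annualisation.

```r
price <- c(100, 101.5, 100.8, 102.3, 101.9, 103.4)   # hypothetical closing prices

simple_ret <- diff(price) / head(price, -1)   # (S_t - S_{t-1}) / S_{t-1}
log_ret    <- diff(log(price))                # log(S_t) - log(S_{t-1})

# The two are linked exactly: log_ret = log(1 + simple_ret)
all.equal(log_ret, log1p(simple_ret))

# Annualised historical volatility, assuming 252 trading days per year
hist_vol <- sd(log_ret) * sqrt(252)
hist_vol
```

For small daily moves the simple and log returns are nearly equal, so either gives a similar volatility estimate.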

Thursday, 7 February 2013

Assignment 5- session 5

Assignment-5

Assignment1: To find and plot returns for NSE data of more than months.

sol:

> z<-read.csv(file.choose(),header=T)
> head(z)
         Date    Open    High     Low   Close Shares.Traded Turnover..Rs..Cr.
1 02-Jul-2012 5283.85 5302.15 5263.35 5278.60     126161441           4991.57
2 03-Jul-2012 5298.85 5317.00 5265.95 5287.95     133117055           5161.82
3 04-Jul-2012 5310.40 5317.65 5273.30 5302.55     155995887           5750.10
4 05-Jul-2012 5297.05 5333.65 5288.85 5327.30     118915392           4709.79
5 06-Jul-2012 5324.70 5327.20 5287.75 5316.95     113300726           4760.51
6 09-Jul-2012 5283.70 5300.60 5257.75 5275.15     101169926           4189.25
> open<-z$Open[10:95]
> open.ts<-ts(open,deltat=1/252)
> open.ts
Time Series:
Start = c(1, 1)
End = c(1, 86)
Frequency = 252
[1] 5242.75 5232.35 5228.05 5199.10 5249.85 5233.55 5163.25 5128.80 5118.40
[10] 5126.30 5124.30 5129.75 5214.85 5220.70 5233.10 5195.60 5260.85 5295.40
[19] 5345.25 5348.30 5308.20 5316.35 5343.25 5385.95 5368.60 5368.70 5395.75
[28] 5426.15 5392.60 5387.85 5348.05 5343.85 5268.60 5298.20 5276.50 5249.15
[37] 5243.90 5217.65 5309.45 5343.65 5361.90 5336.10 5404.45 5435.20 5528.35
[46] 5631.75 5602.40 5536.95 5577.00 5691.95 5674.90 5653.40 5673.75 5684.80
[55] 5704.75 5727.70 5751.55 5815.00 5751.85 5708.15 5671.15 5663.50 5681.70
[64] 5674.25 5705.60 5681.10 5675.30 5703.30 5667.60 5715.65 5688.80 5683.55
[73] 5665.20 5656.35 5596.75 5609.85 5696.35 5693.05 5694.10 5718.60 5709.00
[82] 5731.10 5688.45 5689.70 5650.35 5624.80
> summary(open.ts)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5118 5281 5431 5474 5682 5815
> z.diff<-diff(open.ts)
> z.diff
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
[1] -10.40 -4.30 -28.95 50.75 -16.30 -70.30 -34.45 -10.40 7.90 -2.00
[11] 5.45 85.10 5.85 12.40 -37.50 65.25 34.55 49.85 3.05 -40.10
[21] 8.15 26.90 42.70 -17.35 0.10 27.05 30.40 -33.55 -4.75 -39.80
[31] -4.20 -75.25 29.60 -21.70 -27.35 -5.25 -26.25 91.80 34.20 18.25
[41] -25.80 68.35 30.75 93.15 103.40 -29.35 -65.45 40.05 114.95 -17.05
[51] -21.50 20.35 11.05 19.95 22.95 23.85 63.45 -63.15 -43.70 -37.00
[61] -7.65 18.20 -7.45 31.35 -24.50 -5.80 28.00 -35.70 48.05 -26.85
[71] -5.25 -18.35 -8.85 -59.60 13.10 86.50 -3.30 1.05 24.50 -9.60
[81] 22.10 -42.65 1.25 -39.35 -25.55
> returns<-cbind(open.ts,z.diff,lag(open.ts,k=-1))
> returns
Time Series:
Start = c(1, 1)
End = c(1, 87)
Frequency = 252
open.ts z.diff lag(open.ts, k = -1)
1.000000 5242.75 NA NA
1.003968 5232.35 -10.40 5242.75
1.007937 5228.05 -4.30 5232.35
1.011905 5199.10 -28.95 5228.05
1.015873 5249.85 50.75 5199.10
1.019841 5233.55 -16.30 5249.85
1.023810 5163.25 -70.30 5233.55
1.027778 5128.80 -34.45 5163.25
1.031746 5118.40 -10.40 5128.80
1.035714 5126.30 7.90 5118.40
1.039683 5124.30 -2.00 5126.30
1.043651 5129.75 5.45 5124.30
1.047619 5214.85 85.10 5129.75
1.051587 5220.70 5.85 5214.85
1.055556 5233.10 12.40 5220.70
1.059524 5195.60 -37.50 5233.10
1.063492 5260.85 65.25 5195.60
1.067460 5295.40 34.55 5260.85
1.071429 5345.25 49.85 5295.40
1.075397 5348.30 3.05 5345.25
1.079365 5308.20 -40.10 5348.30
1.083333 5316.35 8.15 5308.20
1.087302 5343.25 26.90 5316.35
1.091270 5385.95 42.70 5343.25
1.095238 5368.60 -17.35 5385.95
1.099206 5368.70 0.10 5368.60
1.103175 5395.75 27.05 5368.70
1.107143 5426.15 30.40 5395.75
1.111111 5392.60 -33.55 5426.15
1.115079 5387.85 -4.75 5392.60
1.119048 5348.05 -39.80 5387.85
1.123016 5343.85 -4.20 5348.05
1.126984 5268.60 -75.25 5343.85
1.130952 5298.20 29.60 5268.60
1.134921 5276.50 -21.70 5298.20
1.138889 5249.15 -27.35 5276.50
1.142857 5243.90 -5.25 5249.15
1.146825 5217.65 -26.25 5243.90
1.150794 5309.45 91.80 5217.65
1.154762 5343.65 34.20 5309.45
1.158730 5361.90 18.25 5343.65
1.162698 5336.10 -25.80 5361.90
1.166667 5404.45 68.35 5336.10
1.170635 5435.20 30.75 5404.45
1.174603 5528.35 93.15 5435.20
1.178571 5631.75 103.40 5528.35
1.182540 5602.40 -29.35 5631.75
1.186508 5536.95 -65.45 5602.40
1.190476 5577.00 40.05 5536.95
1.194444 5691.95 114.95 5577.00
1.198413 5674.90 -17.05 5691.95
1.202381 5653.40 -21.50 5674.90
1.206349 5673.75 20.35 5653.40
1.210317 5684.80 11.05 5673.75
1.214286 5704.75 19.95 5684.80
1.218254 5727.70 22.95 5704.75
1.222222 5751.55 23.85 5727.70
1.226190 5815.00 63.45 5751.55
1.230159 5751.85 -63.15 5815.00
1.234127 5708.15 -43.70 5751.85
1.238095 5671.15 -37.00 5708.15
1.242063 5663.50 -7.65 5671.15
1.246032 5681.70 18.20 5663.50
1.250000 5674.25 -7.45 5681.70
1.253968 5705.60 31.35 5674.25
1.257937 5681.10 -24.50 5705.60
1.261905 5675.30 -5.80 5681.10
1.265873 5703.30 28.00 5675.30
1.269841 5667.60 -35.70 5703.30
1.273810 5715.65 48.05 5667.60
1.277778 5688.80 -26.85 5715.65
1.281746 5683.55 -5.25 5688.80
1.285714 5665.20 -18.35 5683.55
1.289683 5656.35 -8.85 5665.20
1.293651 5596.75 -59.60 5656.35
1.297619 5609.85 13.10 5596.75
1.301587 5696.35 86.50 5609.85
1.305556 5693.05 -3.30 5696.35
1.309524 5694.10 1.05 5693.05
1.313492 5718.60 24.50 5694.10
1.317460 5709.00 -9.60 5718.60
1.321429 5731.10 22.10 5709.00
1.325397 5688.45 -42.65 5731.10
1.329365 5689.70 1.25 5688.45
1.333333 5650.35 -39.35 5689.70
1.337302 5624.80 -25.55 5650.35
1.341270 NA NA 5624.80
> plot(returns)
> returns<-z.diff/lag(open.ts,k=-1)
> returns
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
[1] -1.983692e-03 -8.218105e-04 -5.537437e-03 9.761305e-03 -3.104851e-03
[6] -1.343256e-02 -6.672154e-03 -2.027765e-03 1.543451e-03 -3.901449e-04
[11] 1.063560e-03 1.658950e-02 1.121796e-03 2.375160e-03 -7.165925e-03
[16] 1.255870e-02 6.567380e-03 9.413831e-03 5.706001e-04 -7.497710e-03
[21] 1.535360e-03 5.059862e-03 7.991391e-03 -3.221344e-03 1.862683e-05
[26] 5.038464e-03 5.634064e-03 -6.183021e-03 -8.808367e-04 -7.386991e-03
[31] -7.853330e-04 -1.408161e-02 5.618191e-03 -4.095731e-03 -5.183360e-03
[36] -1.000162e-03 -5.005816e-03 1.759413e-02 6.441345e-03 3.415269e-03
[41] -4.811727e-03 1.280898e-02 5.689756e-03 1.713828e-02 1.870359e-02
[46] -5.211524e-03 -1.168249e-02 7.233224e-03 2.061144e-02 -2.995458e-03
[51] -3.788613e-03 3.599604e-03 1.947566e-03 3.509358e-03 4.022963e-03
[56] 4.163975e-03 1.103181e-02 -1.085985e-02 -7.597556e-03 -6.481960e-03
[61] -1.348933e-03 3.213561e-03 -1.311227e-03 5.524959e-03 -4.294027e-03
[66] -1.020929e-03 4.933660e-03 -6.259534e-03 8.478015e-03 -4.697628e-03
[71] -9.228660e-04 -3.228616e-03 -1.562169e-03 -1.053683e-02 2.340644e-03
[76] 1.541931e-02 -5.793183e-04 1.844354e-04 4.302699e-03 -1.678733e-03
[81] 3.871081e-03 -7.441852e-03 2.197435e-04 -6.916006e-03 -4.521844e-03
> plot(returns)

Assignment 2: Do logit analysis for 700 data points and then predict for 150 data points.

sol:

z<-read.csv(file.choose(),header=T)

head(z)

z.data<-z[1:700,1:9]

sapply(z.data,mean)

z.data$ed<-factor(z.data$ed)

logit.est<-glm(default~age+employ+address+income+debtinc+creddebt+othdebt,data=z.data,family="binomial")

summary(logit.est)

confint.default(logit.est)

# predict the default probability for each of the remaining 150 data points
logit.eg2<-z[701:850,1:8]

logit.eg2$prob<-predict(logit.est,newdata=logit.eg2,type="response")

head(logit.eg2)
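
The fit-on-700, predict-on-150 workflow can also be tried without the data file. The sketch below runs on simulated data: the two predictors, the coefficients of the data-generating process, the seed and the 0.5 cut-off are all assumptions made only so that the example is self-contained.

```r
set.seed(123)                                   # arbitrary seed
n <- 850
sim <- data.frame(
  age     = round(runif(n, 20, 60)),
  debtinc = runif(n, 0, 40)                     # debt-to-income ratio, in percent
)
# Hypothetical data-generating process, used only so the example runs
p_true <- plogis(-2 + 0.1 * sim$debtinc - 0.02 * sim$age)
sim$default <- rbinom(n, size = 1, prob = p_true)

train <- sim[1:700, ]                           # rows used to estimate the model
test  <- sim[701:850, ]                         # rows held back for prediction

fit  <- glm(default ~ age + debtinc, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")   # one probability per row

pred <- as.integer(prob > 0.5)                  # 0.5 cut-off is an assumption
table(predicted = pred, actual = test$default)
```

The table at the end compares the predicted class with the simulated outcome for the 150 held-back rows.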

Wednesday, 23 January 2013

Assignment 3- session 3

Assignment 3

Assignment 3-1) Do a regression analysis on the data provided (mileage vs grooves) and comment on the results derived.

sol:

> z<- read.csv(file.choose(),header=T)
> z
mileage grove
1 0 394.33
2 4 329.50
3 8 291.00
4 12 255.17
5 16 229.33
6 20 204.83
7 24 179.00
8 28 163.83
9 32 150.33

> x<-z$grove

> y<-z$mileage

> reg1<-lm(y~x)

> summary(reg1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.5577 -1.8696 -0.8322  1.4912  3.7249

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.94458    2.82389   16.98 6.03e-07 ***
x           -0.13084    0.01103  -11.86 6.87e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.549 on 7 degrees of freedom
Multiple R-squared: 0.9526, Adjusted R-squared: 0.9458
F-statistic: 140.7 on 1 and 7 DF, p-value: 6.871e-06

To plot the residuals of the above regression we first extract them from the model and then use the plotting functions:

res<-resid(reg1)
plot(x, res)
qqnorm(res)
qqline(res)

Comment: Since the residual plot is not randomly scattered but shows a parabolic pattern, we can say that a linear model is not appropriate in this case.
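
One possible follow-up to a curved residual pattern is to add a squared term and compare the fits. This is only a sketch of that idea, not part of the original assignment; the data are typed in from the table printed above, and the variable names are my own.

```r
groove  <- c(394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33)
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)

lin  <- lm(mileage ~ groove)                    # same straight-line model as reg1
quad <- lm(mileage ~ groove + I(groove^2))      # adds a squared term for curvature

coef(lin)                                       # slope agrees with the -0.13084 reported above
c(linear = summary(lin)$r.squared, quadratic = summary(quad)$r.squared)

res_quad <- resid(quad)
plot(groove, res_quad)                          # check whether the parabolic pattern is gone
abline(h = 0, lty = 2)
```

If the residuals of the second model scatter randomly around zero, the curvature has been captured by the squared term.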

Assignment 3-2: Do the regression analysis on the data provided and comment on the applicability of regression

The first picture shows the data being retrieved

The second graph is the plot of the regression

The above screenshot shows the residual values of the data

This plots the residual values of the data:

plot(x, res)

The command used for the normal Q-Q plot is as follows:

qqnorm(res)

This adds the straight reference line to the Q-Q plot:

qqline(res)

Comment: As the residual plot is random and satisfies linearity, regression can be applied

Assignment 3-3: To do the ANOVA test and comment on the output

sol::

To retrieve the data

> z<- read.csv(file.choose(),header=T)
> z

chair com chair1
1 1 2 a
2 1 3 a
3 1 5 a
4 1 3 a
5 1 2 a
6 1 3 a
7 2 5 b
8 2 4 b
9 2 5 b
10 2 4 b
11 2 1 b
12 2 3 b
13 3 3 c
14 3 4 c
15 3 4 c
16 3 5 c
17 3 1 c
18 3 2 c
> z.anova<- aov(z$com~z$chair1)
> summary(z.anova)

Then we get a plot:

Conclusion: Assume the significance level is 5% (confidence level 95%). Since p = 0.687 is greater than 0.05, we cannot reject the null hypothesis that all the means are the same.
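
The decision rule can be checked directly on the data printed above. The sketch below types the 18 observations in by hand and extracts the p-value from the ANOVA table; the variable names and the wording of the decision strings are my own.

```r
com   <- c(2, 3, 5, 3, 2, 3,   # chair a
           5, 4, 5, 4, 1, 3,   # chair b
           3, 4, 4, 5, 1, 2)   # chair c
chair <- factor(rep(c("a", "b", "c"), each = 6))

fit   <- aov(com ~ chair)
p_val <- summary(fit)[[1]][["Pr(>F)"]][1]   # p-value of the F test
alpha <- 0.05                                # 5% significance level

decision <- if (p_val < alpha) "reject H0" else "fail to reject H0"
p_val
decision
```

Rejecting the null hypothesis requires the p-value to fall below the significance level, not above it.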