Thursday, 28 March 2013

IT & BA LAB Session 10: 26/03/2013


Assignment 1: Create 3 vectors x, y and z with any random values, ensuring they are of equal length, and combine them with
T<- cbind(x,y,z)
Create a 3-dimensional plot of T (all 3 types taught in class).

Commands:
> library(rgl)   # provides plot3d()
> Random1<-rnorm(30,mean=0,sd=1)
> Random1
> x<-Random1[1:10]
> x
> y<-Random1[11:20]
> y
> z<-Random1[21:30]
> z
> T<-cbind(x,y,z)
> T
> plot3d(T[,1:3])

> plot3d(T[,1:3],col=rainbow(64))

> plot3d(T[,1:3],col=rainbow(64),type='s')
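The three equal-length vectors can also be built in one step by filling a 10 x 3 matrix from the same 30 draws. A minimal base-R sketch, assuming an arbitrary seed (not from the original session) and the illustrative names T1 and T2; plot3d() itself comes from the rgl package and is not re-run here.

```r
set.seed(42)                        # arbitrary seed so the draws are reproducible
Random1 <- rnorm(30, mean = 0, sd = 1)

# Splitting by index, as in the session
x <- Random1[1:10]
y <- Random1[11:20]
z <- Random1[21:30]
T1 <- cbind(x, y, z)                # named T1 to avoid masking the built-in T (TRUE)

# Equivalent: fill a matrix column by column from the same 30 draws
T2 <- matrix(Random1, ncol = 3, dimnames = list(NULL, c("x", "y", "z")))

print(dim(T1))                      # 10 rows, 3 columns
print(all(T1 == T2))                # same values in the same positions
```

Filling the matrix directly guarantees equal lengths, because matrix() splits the 30 values into three columns of 10.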


 Screenshots:
Assignment 2:
Read the documentation of rnorm and pnorm.
Create 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y; hint: ?factor)
3. Colour-coded graph
4. Smoothed curve and best-fit line

Commands:
> library(ggplot2)   # provides qplot()
> x<-rnorm(200,mean=5,sd=1)
> y<-rnorm(200,mean=3,sd=1)
> z1<-sample(letters,5)
> z2<-sample(z1,200,replace=TRUE)
> z<-as.factor(z2)
> t<-cbind(x,y,z)
> qplot(x,y)
> qplot(x,z,alpha=I(2/10))
> qplot(x,z)
> qplot(x,y,geom=c("point","smooth"))
> qplot(x,y,colour=z)
> qplot(log(x),log(y),colour=z)
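Note that cbind() stores the factor z as its integer codes, so t loses the category labels, while a data frame keeps them. The base-R sketch below (arbitrary seed; the names m, df, fit and sm are illustrative) shows this and also draws a straight best-fit line plus a lowess smoother, which is similar in spirit to the loess curve that ggplot2's "smooth" geom typically uses for small data sets.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
x <- rnorm(200, mean = 5, sd = 1)
y <- rnorm(200, mean = 3, sd = 1)
z <- factor(sample(sample(letters, 5), 200, replace = TRUE))  # 5 categories

m  <- cbind(x, y, z)              # numeric matrix: z is stored as codes 1..5
df <- data.frame(x, y, z)         # data frame: z stays a factor with its labels

str(m)
str(df)

fit <- lm(y ~ x, data = df)       # straight best-fit line
sm  <- lowess(df$x, df$y)         # locally weighted smoother

# X-Y scatter coloured by category, with both lines on top
plot(df$x, df$y, col = as.integer(df$z), pch = 19, xlab = "x", ylab = "y")
abline(fit, lwd = 2)
lines(sm, lty = 2, lwd = 2)
```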
Screenshots

Saturday, 23 March 2013

IT & BA Lab Session 9: 19/03/2013

DATA VISUALIZATION & INFOGRAPHICS.

Tools used: zeebly.com and Visual.ly
Our society has become very social: people are addicted to their Facebook pages and LinkedIn profiles. In this competitive world, where time is money, it is important for us to manage our time well. People like me who spend a long time on social networking sites get bored of the same old plain pages, and it takes a long time to explore a profile and read everything about a person; this is an old-fashioned approach. But now it is easy for me, because I have come across this site called

http://visual.ly/, which says "tell stories with data". In my opinion, any kind of data has always been an unwelcome guest! But this dogma of mine was challenged, and I was compelled to change it, when I went through this website. One can present one's resume in colourful, comical, cool and stylish pictographic designs. I may never use such a resume in a company interview, but even having it on my blog or Facebook page would be a shining medal, promising plenty of likes on my posts.

Another similar site I came across is zeebly.com.
Analyzing my Facebook page has never been as revealing as this: http://www.zeebly.com/social_me/369805/all3/479acc48363

zeebly.com
My Facebook profile summary is shown here: the photo is my profile picture, and the summary covers my likes and interests.






This shows my total number of friends with the percentages of female and male friends. It is very helpful because it also sorts the data by age group.



Usage:
This tool is very interesting and attractive: all the data is presented in pie charts and bar graphs. Because the data is graphical and precise, people like it, and it is very easy to look at it and find the required information.

Visual.ly


Visual.ly is a community platform for data visualization and infographics. It was founded by Stew Langille, Lee Sherman, Tal Siach, and Adam Breckler in 2011.

Visual.ly is structured both as a showcase for infographics and as a marketplace and community for publishers, designers, and researchers. The site allows users to search images by description, tags, and sources in a variety of categories, ranging from Education to Business or Politics. Users can publish infographics to their personal profile, which they can subsequently share through their social networks.

Visual.ly maintains a team of data analysts, journalists, and designers that create infographics and data visualizations using the Visual.ly tools. They are currently developing a tool that allows anyone to create and publish their own data visualizations. Through this tool, users will be able to gather information from databases and APIs in an automated service to produce an infographic.

By tapping into Visually's vibrant community of more than 35,000 designers, Marketplace is able to match infographic commissioners (brands, companies, agencies) with designers. Once matched, commissioners have direct access to the designers working on their projects and can communicate and transact with them in Visually's Project Center. Through unique features such as the Project Timeline, commissioners always know where their project stands and can ensure that it stays on time and on budget.

Visually partners with the world's leading publications and brands, bringing its tools, community, and talented team to bear on their data visualization needs wherever bespoke creation is needed.


Some points that I found wonderful about this tool:

the UI is very user-friendly
it is open source
numerous options for the visual presentation of different types of data are available
the full tool is available online, so there is no need to install any software on your PC
it is fast
the results are attractive and elegant
themes and options suiting everyone's style and taste are available
once the visual presentation of the data is ready, all the options needed to save and share it are available.
Here is the picture of my resume; I hope you like it.



Friday, 15 March 2013

Session #8 -12 Mar Assignment

Session #8 -12 Mar Assignment Submission



Problem: 

Perform Panel Data Analysis of "Produc" data

Solution:


There are three types of models:
      Pooled model
      Fixed effects model
      Random effects model

We determine which model is best using the following functions from the plm package:
       pFtest: chooses between the fixed effects and pooled models
       plmtest: chooses between the pooled and random effects models
       phtest: chooses between the random and fixed effects models (Hausman test)

The data can be loaded using the following commands:
library(plm)
data(Produc, package="plm")
head(Produc)

Pooled Model:

pool<-plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data=Produc, model="pooling", index=c("state","year"))
summary(pool)
Fixed Effects Model:

fixed<-plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data=Produc, model="within", index=c("state","year"))
summary(fixed)
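The "within" model effectively fits one intercept per state. A small base-R sketch on simulated data (not Produc; the seed, sizes and variable names are arbitrary) shows that demeaning each variable by state gives exactly the same slope as adding one dummy per state, which is why the within estimator removes time-invariant differences between states.

```r
set.seed(123)                         # arbitrary seed; simulated panel, not Produc
n_id <- 10; n_t <- 8                  # 10 "states" observed for 8 "years"
id    <- factor(rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id)[id]              # state-specific effect, constant over time
x     <- rnorm(n_id * n_t) + alpha    # regressor correlated with the state effect
y     <- 2 * x + alpha + rnorm(n_id * n_t, sd = 0.5)

# Within transformation: subtract each state's mean from y and x
y_w <- y - ave(y, id)
x_w <- x - ave(x, id)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# Least-squares dummy variables: one intercept per state
b_lsdv <- coef(lm(y ~ x + id))[["x"]]

print(c(within = b_within, lsdv = b_lsdv))
```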
Random Effects Model:

random<-plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data=Produc, model="random", index=c("state","year"))
summary(random)


Testing the Models

This can be done through hypothesis tests between pairs of models, as follows:

H0 (null hypothesis): the individual (state) and time-based effects are all zero
H1 (alternative hypothesis): at least one of the individual or time-based effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled model
Alternative Hypothesis: Fixed effects model

Command:

> pFtest(fixed,pool)


Result:
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) 
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects 
Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative: the fixed effects model is preferred over the pooled model.
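The pooled-vs-fixed comparison is a nested-model F test: F = [(RSS_pooled - RSS_fixed)/df1] / [RSS_fixed/df2]. For Produc (48 states over 17 years, so 816 rows, and 7 slopes), df1 = 48 - 1 = 47 and df2 = 816 - 48 - 7 = 761, which matches the output above. A base-R sketch of the same computation on simulated data (arbitrary seed and names, not Produc), cross-checked against anova():

```r
set.seed(123)                         # arbitrary seed; simulated panel
n_id <- 10; n_t <- 8
id    <- factor(rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id)[id]              # state-specific effect
x     <- rnorm(n_id * n_t) + alpha
y     <- 2 * x + alpha + rnorm(n_id * n_t, sd = 0.5)

pooled <- lm(y ~ x)                   # one common intercept
fixed  <- lm(y ~ x + id)              # one intercept per state

rss_p <- sum(resid(pooled)^2)
rss_f <- sum(resid(fixed)^2)
df1 <- df.residual(pooled) - df.residual(fixed)   # n_id - 1 restrictions
df2 <- df.residual(fixed)
F_stat <- ((rss_p - rss_f) / df1) / (rss_f / df2)
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)

print(c(F = F_stat, df1 = df1, df2 = df2, p = p_val))
anova(pooled, fixed)                  # same F from the nested-model comparison
```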

Pooled vs Random

Null Hypothesis: Pooled model
Alternative Hypothesis: Random effects model

Command:
> plmtest(pool)

Result:

  Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects 

Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative: the random effects model is preferred over the pooled model.
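For reference, the Honda statistic reported by plmtest() (its default type) is built from the pooled OLS residuals e_it (i = 1..n states, t = 1..T years). Under the null of no individual effect it is approximately standard normal, so a large positive value such as 57.17 points to an individual (random) effect; squaring it gives the Breusch-Pagan LM statistic.

```latex
\mathrm{LM}_{\text{Honda}}
  = \sqrt{\frac{nT}{2(T-1)}}
    \left[
      \frac{\sum_{i=1}^{n}\left(\sum_{t=1}^{T}\hat e_{it}\right)^{2}}
           {\sum_{i=1}^{n}\sum_{t=1}^{T}\hat e_{it}^{2}} - 1
    \right]
  \;\overset{a}{\sim}\; N(0,1)
```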

Random vs Fixed

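Null Hypothesis: no correlation between the individual effects and the regressors, so the random effects model is consistent
Alternative Hypothesis: the random effects model is inconsistent, so the fixed effects model is preferred

Command:
> phtest(fixed,random)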

Result:

 Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent 

Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative: the fixed effects model is preferred over the random effects model.
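For reference, the Hausman statistic behind phtest() compares the fixed and random effects slope estimates over the k regressors. Here k = 7 (the seven log regressors), which is why the output shows df = 7.

```latex
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)^{\top}
    \left[\widehat{\mathrm{Var}}\left(\hat\beta_{FE}\right) - \widehat{\mathrm{Var}}\left(\hat\beta_{RE}\right)\right]^{-1}
    \left(\hat\beta_{FE} - \hat\beta_{RE}\right)
    \;\overset{a}{\sim}\; \chi^{2}_{k}
```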

Conclusion: 

So, after running all the tests, we conclude that the fixed effects model is best suited for the panel data analysis of the "Produc" data set.

Hence we conclude that each state has its own effect that does not vary over time within that state (a state-specific intercept), and that this effect is correlated with the regressors.



Wednesday, 13 February 2013

session 6- assignment 6

Assignment 6


Assignment 1:

Create log returns of the data. There are two ways to do this:
a) [log S(t) - log S(t-1)] / log S(t-1)
b) log[(S(t) - S(t-1)) / S(t-1)]

Take the log returns and calculate the historical volatility.
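For comparison, the log return most commonly used in finance is log S(t) - log S(t-1) = log[S(t)/S(t-1)], with no division by log S(t-1). A base-R sketch on a short made-up price vector (the prices are illustrative, not from the session's CSV) computes it alongside the simple return and the annualized historical volatility, assuming 252 trading days a year as in the commands that follow.

```r
prices <- c(100, 101.5, 100.8, 102.3, 103.0, 102.1)   # illustrative closing prices

simple_ret <- diff(prices) / head(prices, -1)   # (S_t - S_{t-1}) / S_{t-1}
log_ret    <- diff(log(prices))                 # log S_t - log S_{t-1}

# The two are linked exactly by log_ret = log(1 + simple_ret)
print(all.equal(log_ret, log(1 + simple_ret)))

# Annualized historical volatility: daily standard deviation times sqrt(252)
hist_vol <- sd(log_ret) * sqrt(252)
print(hist_vol)
```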

Commands:
> library(tseries)   # provides adf.test()
> stockprice=read.csv(file.choose(), header=T)
> head(stockprice)
> closingprice<-stockprice[,5]
> closingprice.ts<-ts(closingprice,frequency=252)
> lagtable<-cbind(closingprice.ts,lag(closingprice.ts,k=-1),(closingprice.ts-lag(closingprice.ts,k=-1)))
> head(lagtable)
> returns<-(closingprice.ts-lag(closingprice.ts,k=-1))/lag(closingprice.ts,k=-1)
> returns
> LogReturn1<-log(closingprice.ts)-log(lag(closingprice.ts,k=-1))
> LogReturn<-LogReturn1/log(lag(closingprice.ts,k=-1))
> LogReturn
> T<-252^0.5   # annualization factor: sqrt(252 trading days)
> historicalvolatility<-sd(returns)*T
> historicalvolatility
> acf(LogReturn)
> adf.test(returns)

        Augmented Dickey-Fuller Test

data:  returns 
Dickey-Fuller = -5.6265, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary 

Warning message:
In adf.test(returns) : p-value smaller than printed p-value


PIC 1:







Assignment 2:
Create an ACF plot of the log-return data above and interpret the output. Run the ADF test and interpret it.

Commands:
> acf(LogReturn)
> adf.test(LogReturn)

        Augmented Dickey-Fuller Test

data:  LogReturn
Dickey-Fuller = -5.6217, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(LogReturn) : p-value smaller than printed p-value
PLOT
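Conclusion:
From the ADF test, p-value < alpha, i.e. 0.01 < 0.05 (the warning shows that the true p-value is even smaller). We reject the null hypothesis of a unit root and accept the alternative hypothesis that the series is stationary. In the ACF plot, the autocorrelations lie within the confidence interval. Together, these results support the stationarity of the time series.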
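The confidence interval in an ACF plot is the pair of dashed lines at about ±1.96/sqrt(n), the approximate 95% bounds for a white-noise series. A base-R sketch (simulated returns with an arbitrary seed, not the stock data) reads the bound and the lags that fall outside it numerically instead of from the picture.

```r
set.seed(7)                           # arbitrary seed; simulated, not the stock data
r <- rnorm(250, sd = 0.01)            # stand-in for a daily return series

a <- acf(r, lag.max = 20, plot = FALSE)
bound <- qnorm(0.975) / sqrt(a$n.used)    # the dashed 95% lines drawn by plot(a)

rho <- a$acf[-1]                      # drop lag 0, which is always 1
outside <- which(abs(rho) > bound)    # lags whose autocorrelation exceeds the bound
print(round(bound, 4))
print(outside)
```

Under the white-noise null, roughly 1 lag in 20 is expected to cross the bound by chance, so an isolated exceedance is not by itself evidence of dependence.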

Thursday, 7 February 2013

Assignment 5- session 5

Assignment-5



Assignment 1: Find and plot returns for NSE data of more than     months.

sol:

> z<-read.csv(file.choose(),header=T)
> head(z)
         Date    Open    High     Low   Close Shares.Traded Turnover..Rs..Cr.
1 02-Jul-2012 5283.85 5302.15 5263.35 5278.60      126161441           4991.57
2 03-Jul-2012 5298.85 5317.00 5265.95 5287.95      133117055           5161.82
3 04-Jul-2012 5310.40 5317.65 5273.30 5302.55     155995887           5750.10
4 05-Jul-2012 5297.05 5333.65 5288.85 5327.30     118915392           4709.79
5 06-Jul-2012 5324.70 5327.20 5287.75 5316.95     113300726           4760.51
6 09-Jul-2012 5283.70 5300.60 5257.75 5275.15     101169926           4189.25
> open<-z$Open[10:95]
> open.ts<-ts(open,deltat=1/252)
> open.ts
Time Series:
Start = c(1, 1)
End = c(1, 86)
Frequency = 252
 [1] 5242.75 5232.35 5228.05 5199.10 5249.85 5233.55 5163.25 5128.80 5118.40
[10] 5126.30 5124.30 5129.75 5214.85 5220.70 5233.10 5195.60 5260.85 5295.40
[19] 5345.25 5348.30 5308.20 5316.35 5343.25 5385.95 5368.60 5368.70 5395.75
[28] 5426.15 5392.60 5387.85 5348.05 5343.85 5268.60 5298.20 5276.50 5249.15
[37] 5243.90 5217.65 5309.45 5343.65 5361.90 5336.10 5404.45 5435.20 5528.35
[46] 5631.75 5602.40 5536.95 5577.00 5691.95 5674.90 5653.40 5673.75 5684.80
[55] 5704.75 5727.70 5751.55 5815.00 5751.85 5708.15 5671.15 5663.50 5681.70
[64] 5674.25 5705.60 5681.10 5675.30 5703.30 5667.60 5715.65 5688.80 5683.55
[73] 5665.20 5656.35 5596.75 5609.85 5696.35 5693.05 5694.10 5718.60 5709.00
[82] 5731.10 5688.45 5689.70 5650.35 5624.80
> summary(open.ts)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5118    5281    5431    5474    5682    5815
> z.diff<-diff(open.ts)
> z.diff
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
 [1] -10.40  -4.30 -28.95  50.75 -16.30 -70.30 -34.45 -10.40   7.90  -2.00
[11]   5.45  85.10   5.85  12.40 -37.50  65.25  34.55  49.85   3.05 -40.10
[21]   8.15  26.90  42.70 -17.35   0.10  27.05  30.40 -33.55  -4.75 -39.80
[31]  -4.20 -75.25  29.60 -21.70 -27.35  -5.25 -26.25  91.80  34.20  18.25
[41] -25.80  68.35  30.75  93.15 103.40 -29.35 -65.45  40.05 114.95 -17.05
[51] -21.50  20.35  11.05  19.95  22.95  23.85  63.45 -63.15 -43.70 -37.00
[61]  -7.65  18.20  -7.45  31.35 -24.50  -5.80  28.00 -35.70  48.05 -26.85
[71]  -5.25 -18.35  -8.85 -59.60  13.10  86.50  -3.30   1.05  24.50  -9.60
[81]  22.10 -42.65   1.25 -39.35 -25.55
> returns<-cbind(open.ts,z.diff,lag(open.ts,k=-1))
> returns
Time Series:
Start = c(1, 1)
End = c(1, 87)
Frequency = 252
         open.ts z.diff lag(open.ts, k = -1)
1.000000 5242.75     NA                   NA
1.003968 5232.35 -10.40              5242.75
1.007937 5228.05  -4.30              5232.35
1.011905 5199.10 -28.95              5228.05
1.015873 5249.85  50.75              5199.10
1.019841 5233.55 -16.30              5249.85
1.023810 5163.25 -70.30              5233.55
1.027778 5128.80 -34.45              5163.25
1.031746 5118.40 -10.40              5128.80
1.035714 5126.30   7.90              5118.40
1.039683 5124.30  -2.00              5126.30
1.043651 5129.75   5.45              5124.30
1.047619 5214.85  85.10              5129.75
1.051587 5220.70   5.85              5214.85
1.055556 5233.10  12.40              5220.70
1.059524 5195.60 -37.50              5233.10
1.063492 5260.85  65.25              5195.60
1.067460 5295.40  34.55              5260.85
1.071429 5345.25  49.85              5295.40
1.075397 5348.30   3.05              5345.25
1.079365 5308.20 -40.10              5348.30
1.083333 5316.35   8.15              5308.20
1.087302 5343.25  26.90              5316.35
1.091270 5385.95  42.70              5343.25
1.095238 5368.60 -17.35              5385.95
1.099206 5368.70   0.10              5368.60
1.103175 5395.75  27.05              5368.70
1.107143 5426.15  30.40              5395.75
1.111111 5392.60 -33.55              5426.15
1.115079 5387.85  -4.75              5392.60
1.119048 5348.05 -39.80              5387.85
1.123016 5343.85  -4.20              5348.05
1.126984 5268.60 -75.25              5343.85
1.130952 5298.20  29.60              5268.60
1.134921 5276.50 -21.70              5298.20
1.138889 5249.15 -27.35              5276.50
1.142857 5243.90  -5.25              5249.15
1.146825 5217.65 -26.25              5243.90
1.150794 5309.45  91.80              5217.65
1.154762 5343.65  34.20              5309.45
1.158730 5361.90  18.25              5343.65
1.162698 5336.10 -25.80              5361.90
1.166667 5404.45  68.35              5336.10
1.170635 5435.20  30.75              5404.45
1.174603 5528.35  93.15              5435.20
1.178571 5631.75 103.40              5528.35
1.182540 5602.40 -29.35              5631.75
1.186508 5536.95 -65.45              5602.40
1.190476 5577.00  40.05              5536.95
1.194444 5691.95 114.95              5577.00
1.198413 5674.90 -17.05              5691.95
1.202381 5653.40 -21.50              5674.90
1.206349 5673.75  20.35              5653.40
1.210317 5684.80  11.05              5673.75
1.214286 5704.75  19.95              5684.80
1.218254 5727.70  22.95              5704.75
1.222222 5751.55  23.85              5727.70
1.226190 5815.00  63.45              5751.55
1.230159 5751.85 -63.15              5815.00
1.234127 5708.15 -43.70              5751.85
1.238095 5671.15 -37.00              5708.15
1.242063 5663.50  -7.65              5671.15
1.246032 5681.70  18.20              5663.50
1.250000 5674.25  -7.45              5681.70
1.253968 5705.60  31.35              5674.25
1.257937 5681.10 -24.50              5705.60
1.261905 5675.30  -5.80              5681.10
1.265873 5703.30  28.00              5675.30
1.269841 5667.60 -35.70              5703.30
1.273810 5715.65  48.05              5667.60
1.277778 5688.80 -26.85              5715.65
1.281746 5683.55  -5.25              5688.80
1.285714 5665.20 -18.35              5683.55
1.289683 5656.35  -8.85              5665.20
1.293651 5596.75 -59.60              5656.35
1.297619 5609.85  13.10              5596.75
1.301587 5696.35  86.50              5609.85
1.305556 5693.05  -3.30              5696.35
1.309524 5694.10   1.05              5693.05
1.313492 5718.60  24.50              5694.10
1.317460 5709.00  -9.60              5718.60
1.321429 5731.10  22.10              5709.00
1.325397 5688.45 -42.65              5731.10
1.329365 5689.70   1.25              5688.45
1.333333 5650.35 -39.35              5689.70
1.337302 5624.80 -25.55              5650.35
1.341270      NA     NA              5624.80
> plot(returns)
> returns<-z.diff/lag(open.ts,k=-1)
> returns
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
 [1] -1.983692e-03 -8.218105e-04 -5.537437e-03  9.761305e-03 -3.104851e-03
 [6] -1.343256e-02 -6.672154e-03 -2.027765e-03  1.543451e-03 -3.901449e-04
[11]  1.063560e-03  1.658950e-02  1.121796e-03  2.375160e-03 -7.165925e-03
[16]  1.255870e-02  6.567380e-03  9.413831e-03  5.706001e-04 -7.497710e-03
[21]  1.535360e-03  5.059862e-03  7.991391e-03 -3.221344e-03  1.862683e-05
[26]  5.038464e-03  5.634064e-03 -6.183021e-03 -8.808367e-04 -7.386991e-03
[31] -7.853330e-04 -1.408161e-02  5.618191e-03 -4.095731e-03 -5.183360e-03
[36] -1.000162e-03 -5.005816e-03  1.759413e-02  6.441345e-03  3.415269e-03
[41] -4.811727e-03  1.280898e-02  5.689756e-03  1.713828e-02  1.870359e-02
[46] -5.211524e-03 -1.168249e-02  7.233224e-03  2.061144e-02 -2.995458e-03
[51] -3.788613e-03  3.599604e-03  1.947566e-03  3.509358e-03  4.022963e-03
[56]  4.163975e-03  1.103181e-02 -1.085985e-02 -7.597556e-03 -6.481960e-03
[61] -1.348933e-03  3.213561e-03 -1.311227e-03  5.524959e-03 -4.294027e-03
[66] -1.020929e-03  4.933660e-03 -6.259534e-03  8.478015e-03 -4.697628e-03
[71] -9.228660e-04 -3.228616e-03 -1.562169e-03 -1.053683e-02  2.340644e-03
[76]  1.541931e-02 -5.793183e-04  1.844354e-04  4.302699e-03 -1.678733e-03
[81]  3.871081e-03 -7.441852e-03  2.197435e-04 -6.916006e-03 -4.521844e-03
> plot(returns)
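As a check on the return formula, the sketch below recomputes the first few returns from the opening prices printed above, both on a plain vector and with the same ts lag approach; the first value should agree with the -1.983692e-03 in the output. The names open4, r_vec, o.ts and r_ts are illustrative.

```r
# First four opening prices from open.ts above
open4 <- c(5242.75, 5232.35, 5228.05, 5199.10)

# Plain-vector version: change divided by the previous value
r_vec <- diff(open4) / head(open4, -1)

# Time-series version, mirroring z.diff / lag(open.ts, k = -1);
# ts arithmetic keeps only the overlapping time points
o.ts <- ts(open4, deltat = 1/252)
r_ts <- diff(o.ts) / stats::lag(o.ts, k = -1)

print(r_vec)
print(all.equal(as.numeric(r_ts), r_vec))
```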





Assignment 2: Do logit analysis for 700 data points and then predict for 150 data points.

sol:

z<-read.csv(file.choose(),header=T)

head(z)

# estimate the model on the first 700 rows
z.data<-z[1:700,1:9]

sapply(z.data,mean)

z.data$ed<-factor(z.data$ed)

logit.est<-glm(default~age+employ+address+income+debtinc+creddebt+othdebt,data=z.data,family="binomial")

summary(logit.est)

confint.default(logit.est)

# predict the probability of default for each of the remaining 150 rows
z.test<-z[701:850,1:9]

z.test$prob<-predict(logit.est,newdata=z.test,type="response")

head(z.test)
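The fit-then-score workflow can be tried without the CSV. A base-R sketch on simulated data (arbitrary seed; the two predictors, their true coefficients and the 0.5 cut-off are all assumptions, not the bank data) fits a logit model on 700 rows and predicts probabilities for the next 150.

```r
set.seed(2013)                         # arbitrary seed; simulated, not the bank data
n <- 850
sim <- data.frame(
  debtinc = runif(n, 0, 40),           # hypothetical debt-to-income ratio
  employ  = rpois(n, 8)                # hypothetical years employed
)
lin <- -1.5 + 0.1 * sim$debtinc - 0.15 * sim$employ   # assumed true log-odds
sim$default <- rbinom(n, 1, plogis(lin))

train <- sim[1:700, ]                  # estimation sample
test  <- sim[701:850, ]                # 150 records to score

fit <- glm(default ~ debtinc + employ, data = train, family = binomial)

# Predicted probability of default for each of the 150 records
test$prob <- predict(fit, newdata = test, type = "response")
test$pred <- as.integer(test$prob > 0.5)   # 0.5 cut-off is an assumption

head(test)
table(actual = test$default, predicted = test$pred)
```

type = "response" returns probabilities; without it, predict() returns values on the log-odds scale.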







Wednesday, 23 January 2013

Assignment 3- session 3

Assignment 3

Assignment 3-1) Do a regression analysis on the data provided (mileage vs grooves) and comment on the results derived.

sol:
> z<- read.csv(file.choose(),header=T)
> z
  mileage  grove
1       0 394.33
2       4 329.50
3       8 291.00
4      12 255.17
5      16 229.33
6      20 204.83
7      24 179.00
8      28 163.83
9      32 150.33

> x<-z$grove
> y<-z$mileage
> reg1<-lm(y~x)
> summary(reg1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5577 -1.8696 -0.8322  1.4912  3.7249 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 47.94458    2.82389   16.98 6.03e-07 ***
x           -0.13084    0.01103  -11.86 6.87e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.549 on 7 degrees of freedom
Multiple R-squared: 0.9526,     Adjusted R-squared: 0.9458 
F-statistic: 140.7 on 1 and 7 DF,  p-value: 6.871e-06 

To check the fit of the above regression, we store the residuals, plot them against x, and draw a normal Q-Q plot:
res<-resid(reg1)
plot(x, res)
qqnorm(res)
qqline(res)




Comment: Although R-squared is high (0.9526), the residual plot is not randomly scattered but shows a parabolic pattern, so the linearity assumption does not hold and a straight-line model is not appropriate in this case.
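The parabolic pattern can be confirmed numerically. A base-R sketch that types in the nine rows from the table above, refits the model, and prints the residuals, whose signs run positive at both ends and negative in the middle. The quadratic fit at the end is one possible way (an illustration, not part of the original assignment) to test the curvature formally.

```r
# Tread data from the table above
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)
grove   <- c(394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33)

reg1 <- lm(mileage ~ grove)
res  <- resid(reg1)

print(coef(reg1))                  # intercept about 47.94, slope about -0.1308
print(round(res, 3))               # + at both ends, - in the middle: a U shape

# Adding a squared term is one way to test the curvature seen in the residual plot
reg2 <- lm(mileage ~ grove + I(grove^2))
print(anova(reg1, reg2))
```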




Assignment 3-2: Do a regression analysis on the data provided and comment on the applicability of regression.
The first picture shows the data being retrieved.




The second graph is the plot of the regression.



The screenshot above shows the residual values of the data.





This plots the residual values of the data:
plot(x, res)



The command used for the normal Q-Q plot is as follows:
qqnorm(res)




This adds the Q-Q reference line:
qqline(res)



Comment: As the residual plot is randomly scattered, the linearity assumption is satisfied, and hence regression can be applied.



Assignment 3-3: Do an ANOVA test and comment on the output.


sol:
To retrieve the data:
> z<- read.csv(file.choose(),header=T)
> z

   chair com chair1
1      1   2      a
2      1   3      a
3      1   5      a
4      1   3      a
5      1   2      a
6      1   3      a
7      2   5      b
8      2   4      b
9      2   5      b
10     2   4      b
11     2   1      b
12     2   3      b
13     3   3      c
14     3   4      c
15     3   4      c
16     3   5      c
17     3   1      c
18     3   2      c
> z.anova<-aov(com~chair1, data=z)
> summary(z.anova)

This gives the ANOVA summary table, from which we read the p-value.

Conclusion: Assuming a 5% significance level (95% confidence), since p = 0.687 > 0.05 we fail to reject the null hypothesis that all the group means are the same; the data show no significant difference in com between the three chair groups.
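To double-check the decision, a base-R sketch re-runs the ANOVA on the 18 observations listed above (typed in by hand rather than read from the CSV) and applies the 5% rule. By hand, the group means are 3, 3.67 and 3.17, giving F of about 0.385 on 2 and 15 degrees of freedom and p of about 0.687, so H0 is not rejected.

```r
# Data from the table above: com scores for the three chair groups
z <- data.frame(
  com    = c(2, 3, 5, 3, 2, 3,  5, 4, 5, 4, 1, 3,  3, 4, 4, 5, 1, 2),
  chair1 = factor(rep(c("a", "b", "c"), each = 6))
)

z.anova <- aov(com ~ chair1, data = z)
tab <- summary(z.anova)[[1]]       # the ANOVA table as a data frame
print(tab)

p <- tab[["Pr(>F)"]][1]
if (p < 0.05) {
  cat("Reject H0: at least one group mean differs\n")
} else {
  cat("Fail to reject H0: no evidence that the group means differ\n")
}
```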