関東第2回r勉強会 (2nd Kanto R Study Group)
3. for
> for (i in 1:10) cat(pnorm(i)," ")
0.8413447 0.9772499 0.9986501 0.9999683 0.9999997 1 1 1 1 1
for (name in expr_1) expr_2
> for (i in 1:ncol(faithful)){
+   print(c(min(faithful[,i]),max(faithful[,i]),mean(faithful[,i]),median(faithful[,i])))
+ }
> xyz = list(42,c(1,2,3),matrix(c(1:4),2,2))
> for (i in 1:length(xyz)) xyz[[i]] = xyz[[i]]+1
> for (x in xyz) print(x)
[1] 43
[1] 2 3 4
     [,1] [,2]
[1,]    2    4
[2,]    3    5
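One caveat with the `1:length(x)` idiom: when `x` is empty, `1:0` evaluates to `c(1, 0)` and the loop body still runs twice. The following is a minimal sketch of the safer `seq_along()` form; the variable names `empty`, `runs_colon` and `runs_seq` are illustrative only.

```r
# An empty list: length(empty) is 0, so 1:length(empty) is c(1, 0)
empty = list()
runs_colon = 0
for (i in 1:length(empty)) runs_colon = runs_colon + 1

# seq_along() returns integer(0) for an empty object, so the body is skipped
runs_seq = 0
for (i in seq_along(empty)) runs_seq = runs_seq + 1

# The same update as on the slide, written with seq_along()
xyz = list(42, c(1, 2, 3), matrix(1:4, 2, 2))
for (i in seq_along(xyz)) xyz[[i]] = xyz[[i]] + 1
```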
4. while
> i = 0
> while (i < 5){
+   print(dnorm(i))
+   i = i+1
+ }
[1] 0.3989423
[1] 0.2419707
[1] 0.05399097
[1] 0.004431848
[1] 0.0001338302
while (condition) expr
5. repeat
> x = 2
> repeat{
+   print(x)
+   x = x^2
+ }
...
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1]
...
repeat expr
6. repeat
x = 2
repeat{
print(x)
x = x^2
if(x == Inf ) break
}
[1] 2
[1] 4
[1] 16
[1] 256
[1] 65536
[1] 4294967296
[1] 1.844674e+19
[1] 3.402824e+38
[1] 1.157921e+77
[1] 1.340781e+154
Besides break, there is also next (known as "continue" in other languages).
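A small sketch of how `next` and `break` differ inside the same loop. The task (collecting squares of odd numbers up to a limit) is made up for illustration.

```r
# Collect the squares of the odd numbers in 1..10,
# stopping as soon as a square exceeds 50
squares = c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers and move to the next iteration
  if (i^2 > 50) break     # leave the loop entirely
  squares = c(squares, i^2)
}
print(squares)
```

Here `next` only skips the current iteration, while `break` ends the loop at i = 9, so the result holds the squares of 1, 3, 5 and 7.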
7. Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
An Introduction to R
(http://cran.r-project.org)
But...
8. 1. Vectorization
> A = matrix(1:4,nrow=2,ncol=2)
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> B = matrix(2,nrow=2,ncol=2)
> B
     [,1] [,2]
[1,]    2    2
[2,]    2    2
> A+B
     [,1] [,2]
[1,]    3    5
[2,]    4    6
> A*B
     [,1] [,2]
[1,]    2    6
[2,]    4    8
Matrix multiplication:
> A %*% B
     [,1] [,2]
[1,]    8    8
[2,]   12   12
> A = 1:4
> A
[1] 1 2 3 4
> B = c(2,2,2,2)
> B
[1] 2 2 2 2
> A+B
[1] 3 4 5 6
> A*B
[1] 2 4 6 8
> A^2
[1] 1 4 9 16
R functions rely on vectorization: they take a vector-like object and process all of its elements together in a single call.
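To make the idea concrete, here is a minimal sketch comparing an explicit loop with the vectorized call. The names `roots_loop`, `roots_vec` and `doubled` are illustrative.

```r
x = c(1, 4, 9, 16)

# Loop version: one element at a time
roots_loop = numeric(length(x))
for (i in seq_along(x)) roots_loop[i] = sqrt(x[i])

# Vectorized version: the function is applied to the whole vector at once
roots_vec = sqrt(x)

# Shorter operands are recycled: the scalar 2 is reused for every element
doubled = x * 2
```

Both versions give the same values; the vectorized form simply leaves the looping to R's internals.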
10. 1. Vectorization – MSE
mse = function(Q0,Q1,X,Y){
  # accumulate the squared residuals one observation at a time
  total = 0
  for(i in seq_along(X)){
    total = total + (Y[i] - (Q0+X[i]*Q1))^2
  }
  return(total / length(X))
}
Vectorized equivalent:
mse = 1/n*sum(((Q0 + Q1*X)-Y)^2)   # n = length(X)
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2$$
$$\hat{Y}_i = Q_0 + Q_1 \times X_i$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left((Q_0 + Q_1 \times X_i) - Y_i\right)^2$$
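The formula can be checked on a tiny data set. This is a sketch only: `mse_loop` and `mse_vec` are names chosen here so that both versions can coexist, and the four data points are made up.

```r
# Loop version, following the slide
mse_loop = function(Q0, Q1, X, Y) {
  total = 0
  for (i in seq_along(X)) total = total + (Y[i] - (Q0 + Q1 * X[i]))^2
  total / length(X)
}

# Vectorized version: mean() of the squared residuals is (1/n) * sum(...)
mse_vec = function(Q0, Q1, X, Y) mean(((Q0 + Q1 * X) - Y)^2)

X = c(1, 2, 3, 4)
Y = c(2, 4, 6, 8)
mse_vec(0, 2, X, Y)   # perfect fit, so the error is 0
mse_vec(0, 1, X, Y)   # squared residuals 1, 4, 9, 16 -> mean 7.5
```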
11.
> y = sort(rnorm(10e5,mean=20,sd=10),decreasing=TRUE)
> x = seq(y[1],y[length(y)],length.out=10e5)
> q0 = -1.5; q1 = 1
Vectorized version:
> system.time(sum((y-(q0 + x*q1))^2)/length(x))
   user  system elapsed
   0.00    0.02    0.01
Function (loop) version:
> system.time(mse(-1.5,1,x,y))
   user  system elapsed
   2.57    0.00    2.57
> X = iris[,1]
> Y = iris[,3]
Function (loop) version:
> system.time(mse(-1.5,1,X,Y))
   user  system elapsed
      0       0       0
Vectorized version:
> system.time(sum((Y-(q0 + X*q1))^2)/length(X))
   user  system elapsed
      0       0       0
12. 2. Built in functions
> set.seed(42)
> abc = matrix(rnorm(10e5),nrow=10e2)
The matrix has only 10e5 elements.
maximum = function(mtrx){
  mx = mtrx[1,1]
  for(i in 1:nrow(mtrx)){
    for(j in 1:ncol(mtrx)){
      if(mtrx[i,j]>mx) mx = mtrx[i,j]
    }
  }
  return(mx)
}
> maximum(abc)
> max(abc)
> system.time(maximum(abc))
   user  system elapsed
   0.90    0.00    0.91
> system.time(max(abc))
   user  system elapsed
   0.02    0.00    0.01
13. 2. Built in functions
Mathematical operation    R function
Sum                       sum
Mean                      mean
Median                    median
Variance                  var
Covariance                cov
Correlation               cor
Logarithm                 log
Range of values           range
Scaling                   scale
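A quick sketch of these built-ins on a small made-up vector. `x` and `y` are illustrative data, with `y` chosen as an exact linear function of `x` so the results are easy to verify by hand.

```r
x = c(2, 4, 4, 4, 5, 5, 7, 9)
y = 2 * x + 1

sum(x)      # 40
mean(x)     # 5
median(x)   # 4.5
var(x)      # sample variance (divides by n - 1)
cov(x, y)   # equals 2 * var(x) here because y = 2x + 1
cor(x, y)   # 1, since y is an exact increasing linear function of x
log(x)      # natural logarithm, element by element
range(x)    # minimum and maximum: 2 and 9
scale(x)    # centers to mean 0 and rescales to standard deviation 1
```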
14. 3. apply, lapply, sapply...
Apply – apply function over array margins
myMatrix:
      [,1]  [,2] [,3]
[1,]  55.5  57.0 58.5
[2,] 156.0 157.5  0.0
[3,]  56.5  58.0 59.5
Mean (by row):
> apply(myMatrix,1,mean)
[1]  57.0 104.5  58.0
Sum (by column):
> apply(myMatrix,2,sum)
[1] 268.0 272.5 118.0
> apply(myMatrix,c(1,2),sqrt)
          [,1]      [,2]     [,3]
[1,]  7.449832  7.549834 7.648529
[2,] 12.489996 12.549900 0.000000
[3,]  7.516648  7.615773 7.713624
> sqrt(myMatrix)   # same result, because sqrt is already vectorized
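The margin argument can be tried out with the matrix from this slide. This is a sketch; the values are the ones implied by the outputs shown above, and `rowMeans()` / `colSums()` are base R helpers added for comparison.

```r
# The 3 x 3 matrix from the slide, entered row by row
myMatrix = matrix(c( 55.5,  57.0, 58.5,
                    156.0, 157.5,  0.0,
                     56.5,  58.0, 59.5), nrow = 3, byrow = TRUE)

row_means = apply(myMatrix, 1, mean)   # MARGIN = 1: one result per row
col_sums  = apply(myMatrix, 2, sum)    # MARGIN = 2: one result per column

# Base R also ships dedicated helpers for these common cases
rowMeans(myMatrix)
colSums(myMatrix)
```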
15. 3. apply, lapply, sapply...
> set.seed(42)
> abc = matrix(rnorm(10e5),nrow=10e2)
my.mean = function(mat){
  # column means computed by hand, one element at a time
  res.vec = vector()
  for (i in 1:ncol(mat)){
    my.sum = 0
    for (j in 1:length(mat[,i])){
      my.sum = my.sum + mat[j,i]
    }
    my.col.mean = my.sum / length(mat[,i])
    res.vec = append(res.vec,my.col.mean)
  }
  return (res.vec)
}
> apply(abc,2,mean)
> system.time(apply(abc,2,mean))
   user  system elapsed
   0.01    0.01    0.03
> system.time(my.mean(abc))
   user  system elapsed
   0.92    0.00    0.92
16. 3. apply, lapply, sapply...
● lapply – apply a function over a list or vector
● sapply – user-friendly version of lapply
● vapply – similar to sapply, but with a pre-specified return value type
● rapply – recursive apply
● tapply – apply a function to a ragged array
● mapply – apply a function to multiple list or vector arguments
Input: ???
Output: ???
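The functions listed above differ mainly in what they accept and what they return. A minimal sketch using only base R; the list `scores` and the other inputs are made up for illustration.

```r
scores = list(a = c(1, 2, 3), b = c(10, 20), c = 5)

lapply(scores, mean)               # always a list: a = 2, b = 15, c = 5
sapply(scores, mean)               # simplified to a named numeric vector
vapply(scores, mean, numeric(1))   # like sapply, but the result type is checked

# tapply: group a vector by a factor ("ragged" because group sizes differ)
tapply(c(1, 2, 3, 4, 5), c("x", "y", "x", "y", "x"), sum)   # x = 9, y = 6

# mapply: walk several vectors in parallel
mapply(function(n, v) n * v, 1:3, c(10, 20, 30))            # 10 40 90

# rapply: apply a function to every leaf of a nested list
rapply(list(1, list(2, 3)), function(v) v * 10, how = "unlist")   # 10 20 30
```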
17. 4. The plyr package
About the package name:
AP PLY + R
Input        Output       Function name
data frame   data frame   ddply
list         array        laply
array        none         a_ply
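The naming scheme can be tried out as follows. This is a sketch that assumes the plyr package is installed from CRAN (it is not part of base R), so everything is wrapped in a `requireNamespace()` check; the inputs are made up for illustration.

```r
# plyr is a CRAN package, not part of base R; run this only if it is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  # dd: data frame in, data frame out (mean petal length per species)
  per_species = plyr::ddply(iris, "Species", function(d) {
    data.frame(mean_petal = mean(d$Petal.Length))
  })

  # la: list in, array out (two length-3 results become a 2 x 3 matrix)
  doubled = plyr::laply(list(1:3, 4:6), function(v) v * 2)

  # a_: array in, nothing out (called only for the side effect of printing)
  plyr::a_ply(matrix(1:4, nrow = 2), 1, function(r) print(sum(r)))
}
```

The first letter of each function name gives the input type and the second gives the output type, with `_` meaning the result is discarded.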
18. 4. The plyr package
GET https://api.twitter.com/1.1/statuses/home_timeline.json
[
{
"coordinates": null,
"truncated": false,
"favorited": false,
"created_at": "Mon Jun 27 19:32:19 +0000 2011",
"id_str": "85430275915526144",
"user": {
"profile_sidebar_border_color": "0094C2",
"profile_background_tile": false,
"profile_sidebar_fill_color": "a9d9f1",
"name": "Twitter API",
},
…
...
}
]
20. 4. The plyr package
nestedParser = function(element,data){
  # one row per record, one column per requested field path
  resultMat = matrix(nrow=length(data),ncol=length(element))
  colnames(resultMat) = element
  for (i in 1:length(element)){
    # split a path such as "user$name" into its nesting levels
    levels = strsplit(element[i],"$",fixed=TRUE)
    for(k in 1:length(data)){
      for(j in 1:(length(levels[[1]]))){
        if(j == 1){
          temp = data[[k]][levels[[1]][j]]
        }else{
          temp = temp[[1]][levels[[1]][j]]
        }
      }
      resultMat[k,i] = as.character(temp)
    }
  }
  return(resultMat)
}
> nestedParser(c("user$name","text","lang","created_at","id"),tw.data$statuses)
21. 4. The plyr package
Extract the columns one by one:
tw.data = laply(tw.data, .fun = function(x){x[c("text","id",...)]})
Avoid nested (recursive) lists:
tw.data = laply(tw.data, function(x) laply(x, identity))
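For comparison, the same kind of field extraction can be sketched in base R with `vapply()`. This is not the plyr approach from the slide: the `tweets` list below is hypothetical, hand-made data that only mimics the nesting of the JSON response, and `texts`, `user_names` and `result` are names chosen for this example.

```r
# Hypothetical, hand-made records with the same nesting as the JSON response
tweets = list(
  list(text = "first",  id = "1", user = list(name = "Twitter API")),
  list(text = "second", id = "2", user = list(name = "someone"))
)

# One vector per field; vapply checks that every record yields one string
texts      = vapply(tweets, function(x) x[["text"]], character(1))
user_names = vapply(tweets, function(x) x[["user"]][["name"]], character(1))

# Bind the fields into a character matrix, one row per record
result = cbind(text = texts, user_name = user_names)
```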