Kanto R Study Group, 2nd Meeting

3rd R Study Session
Loops in the R Language

As in other programming languages, R has loops for repeating execution:
● for statement
● while statement
● repeat statement

for
> for (i in 1:10) cat(pnorm(i)," ")
0.8413447 0.9772499 0.9986501 0.9999683 0.9999997 1 1 1 1 1
for (name in expr_1) expr_2
> for (i in 1:ncol(faithful)){
+   print(c(min(faithful[,i]),max(faithful[,i]),mean(faithful[,i]),median(faithful[,i])))
+ }
> xyz = list(42,c(1,2,3),matrix(c(1:4),2,2))
> for (i in 1:length(xyz)){
+   xyz[[i]] = xyz[[i]]+1
+   print(xyz[[i]])   # print each updated element
+ }
[1] 43
[1] 2 3 4
     [,1] [,2]
[1,]    2    4
[2,]    3    5
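
The loops above iterate over `1:length(xyz)`. As a side note that is not on the slides, here is a minimal sketch of why `seq_along()` is often preferred for this: on an empty object `1:length(x)` evaluates to `c(1, 0)`, so the loop body still runs, while `seq_along(x)` is empty. The variable names are illustrative only.

```r
# Illustrative only: compare 1:length(x) and seq_along(x) on an empty list
empty = list()

count_colon = 0
for (i in 1:length(empty)) count_colon = count_colon + 1   # iterates over 1, 0

count_seq = 0
for (i in seq_along(empty)) count_seq = count_seq + 1      # iterates over nothing

print(c(count_colon, count_seq))
```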

while
> i = 0
> while (i < 5){
+   print(dnorm(i))
+   i = i+1
+ }
[1] 0.3989423
[1] 0.2419707
[1] 0.05399097
[1] 0.004431848
[1] 0.0001338302
while (condition) expr

repeat
repeat expr
> x = 2
> repeat{
+   print(x)
+   x = x^2
+ }
...
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
...
(without a break, this loop never ends)
x = 2
repeat{
print(x)
x = x^2
if(x == Inf ) break
}
[1] 2
[1] 4
[1] 16
[1] 256
[1] 65536
[1] 4294967296
[1] 1.844674e+19
[1] 3.402824e+38
[1] 1.157921e+77
[1] 1.340781e+154
Besides break, there is also next (a.k.a. "continue" in other languages).
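
The slide mentions `next` without an example. A minimal sketch of how it skips the rest of the current iteration, using made-up values:

```r
# Collect the odd numbers from 1 to 10, skipping the even ones with next
odds = c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # jump straight to the next iteration
  odds = c(odds, i)
}
print(odds)
```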

Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
An Introduction to R
(http://cran.r-project.org)
However...

1. Vectorization
> A = matrix(1:4,nrow=2,ncol=2)
> A
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> B = matrix(2,nrow=2,ncol=2)
> B
     [,1] [,2]
[1,]    2    2
[2,]    2    2
> A+B
     [,1] [,2]
[1,]    3    5
[2,]    4    6
> A*B
     [,1] [,2]
[1,]    2    6
[2,]    4    8
> A %*% B
     [,1] [,2]
[1,]    8    8
[2,]   12   12
Array multiplication:
> A = 1:4
> A
[1] 1 2 3 4
> B = c(2,2,2,2)
> B
[1] 2 2 2 2
> A+B
[1] 3 4 5 6
> A*B
[1] 2 4 6 8
> A^2
[1] 1 4 9 16
R functions are vectorized: they process a vector-like object as a whole, handling all of its elements in one call.
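
To make the contrast concrete, here is a minimal sketch that computes the same squares once with an explicit loop and once with a vectorized expression. The data and variable names are made up for illustration.

```r
x = c(1.5, 2.0, 3.5, 4.0)

# Loop version: fill a preallocated result one element at a time
squares_loop = numeric(length(x))
for (i in seq_along(x)) squares_loop[i] = x[i]^2

# Vectorized version: the operator is applied to every element at once
squares_vec = x^2

print(all.equal(squares_loop, squares_vec))
```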

1. Vectorization – MSE
# Loop version
mse = function(Q0,Q1,X,Y){
  total = 0                            # named total to avoid shadowing sum()
  for(i in 1:length(X)){
    temp_sum = (Y[i] - (Q0+X[i]*Q1))^2
    total = total + temp_sum
  }
  return(total / length(X))
}
# Vectorized version, with n = length(X)
1/n*sum(((Q0 + Q1*X)-Y)^2)
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2$$
$$\hat{Y}_i = Q_0 + Q_1 \times X_i$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left((Q_0 + Q_1 \times X_i) - Y_i\right)^2$$
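
As a sanity check on the formula, the sketch below compares a loop implementation and a vectorized implementation of the MSE on a tiny made-up data set. The names `mse_loop` and `mse_vec` and the data are assumptions for illustration, not taken from the slides.

```r
# Loop implementation of the MSE for the line Q0 + Q1 * X
mse_loop = function(Q0, Q1, X, Y) {
  total = 0
  for (i in seq_along(X)) total = total + ((Q0 + Q1 * X[i]) - Y[i])^2
  total / length(X)
}

# Vectorized implementation of the same formula
mse_vec = function(Q0, Q1, X, Y) mean(((Q0 + Q1 * X) - Y)^2)

X = c(1, 2, 3, 4)
Y = c(0, 1, 1, 3)
# Predictions are -0.5, 0.5, 1.5, 2.5, so every squared residual is 0.25
print(c(mse_loop(-1.5, 1, X, Y), mse_vec(-1.5, 1, X, Y)))
```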

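> y = sort(rnorm(10e5,mean=20,sd=10),decreasing=TRUE)
> x = seq(y[1],y[length(y)],length.out=10e5)
> q0 = -1.5; q1 = 1
Vectorized version:
> system.time(sum((y-(q0 + x*q1))^2)/length(x))
   user  system elapsed
   0.00    0.02    0.01
Function (loop) version:
> system.time(mse(-1.5,1,x,y))
   user  system elapsed
   2.57    0.00    2.57

> X = iris[,1]
> Y = iris[,3]
Function (loop) version:
> system.time(mse(-1.5,1,X,Y))
   user  system elapsed
      0       0       0
Vectorized version:
> system.time(sum((Y-(q0 + X*q1))^2)/length(X))
   user  system elapsed
      0       0       0
iris has only 150 rows, so both versions finish too quickly to measure.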

2. Built in functions
> set.seed(42)
> abc = matrix(rnorm(10e5),nrow=10e2)   # 1000 x 1000 matrix
abc has only 10e5 (one million) elements.
maximum = function(mtrx){
  mx = mtrx[1,1]
  for(i in 1:nrow(mtrx)){
    for(j in 1:ncol(mtrx)){
      if(mtrx[i,j]>mx) mx = mtrx[i,j]
    }
  }
  return(mx)
}
> maximum(abc)
> max(abc)
> system.time(maximum(abc))
   user  system elapsed
   0.90    0.00    0.91
> system.time(max(abc))
   user  system elapsed
   0.02    0.00    0.01

2. Built in functions
Mathematical operation   R function
Sum (total)              sum
Mean                     mean
Median                   median
Variance                 var
Covariance               cov
Correlation              cor
Logarithm                log
Range of values          range
Scaling                  scale
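
A short sketch of a few of these built-in functions applied to small made-up vectors, with no explicit loop:

```r
x = c(2, 4, 4, 5, 10)
y = c(1, 3, 2, 5, 9)

# Each function takes the whole vector at once
summary_stats = c(sum = sum(x), mean = mean(x), median = median(x), var = var(x))
print(summary_stats)
print(range(x))      # smallest and largest value
print(cor(x, y))     # correlation between x and y
```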

3. apply, lapply, sapply...
apply – apply a function over array margins

myMatrix, with row means (right) and column sums (bottom):
       [,1]   [,2]   [,3]    Mean
[1,]   55.5   57.0   58.5    57.0
[2,]  156.0  157.5    0.0   104.5
[3,]   56.5   58.0   59.5    58.0
Sum   268.0  272.5  118.0

> apply(myMatrix,1,mean)
[1]  57.0 104.5  58.0
> apply(myMatrix,2,sum)
[1] 268.0 272.5 118.0
> apply(myMatrix,c(1,2),sqrt)
          [,1]      [,2]     [,3]
[1,]  7.449832  7.549834 7.648529
[2,] 12.489996 12.549900 0.000000
[3,]  7.516648  7.615773 7.713624
> sqrt(myMatrix)
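
The apply calls on this slide can be reproduced with the sketch below. The slide never shows how `myMatrix` was built, so the values here are reconstructed from the outputs shown (the squares of the `sqrt` results) and should be read as an assumption.

```r
# Values reconstructed from the slide's output; matrix() fills column by column
myMatrix = matrix(c(55.5, 156.0, 56.5,
                    57.0, 157.5, 58.0,
                    58.5,   0.0, 59.5), nrow = 3)

row_means = apply(myMatrix, 1, mean)   # margin 1 = rows
col_sums  = apply(myMatrix, 2, sum)    # margin 2 = columns

print(row_means)
print(col_sums)
# sqrt() is already vectorized, so apply() over both margins gives the same matrix
print(all.equal(apply(myMatrix, c(1, 2), sqrt), sqrt(myMatrix)))
```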

> set.seed(42)
> abc = matrix(rnorm(10e5),nrow=10e2)   # 1000 x 1000 matrix
# Column means with explicit loops
my.mean = function(mat){
  res.vec = vector()
  for (i in 1:ncol(mat)){
    my.sum = 0
    for (j in 1:nrow(mat)){
      my.sum = my.sum + mat[j,i]
    }
    my.col.mean = my.sum / nrow(mat)
    res.vec = append(res.vec,my.col.mean)
  }
  return (res.vec)
}
> apply(abc,2,mean)
> system.time(apply(abc,2,mean))
   user  system elapsed
   0.01    0.01    0.03
> system.time(my.mean(abc))
   user  system elapsed
   0.92    0.00    0.92
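
For column means specifically, base R also provides `colMeans()`. The slides do not mention it, so the following is only a supplementary sketch, on a small made-up matrix, showing that it agrees with `apply(m, 2, mean)`:

```r
set.seed(42)
m = matrix(rnorm(20), nrow = 5)   # small 5 x 4 example matrix

means_apply    = apply(m, 2, mean)   # apply over columns
means_builtin  = colMeans(m)         # dedicated built-in function

print(all.equal(means_apply, means_builtin))
```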

● lapply – apply a function over a list or vector
● sapply – user-friendly version of lapply
● vapply – similar to sapply, but with a pre-specified type of return value
● rapply – recursive version of lapply
● tapply – apply a function over a ragged array
● mapply – apply a function to multiple list or vector arguments
Input: ???
Output: ???
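
One way to explore the input and output question is to run each function on small made-up data and inspect the result. This is a minimal sketch with illustrative variable names:

```r
scores = list(a = c(1, 2, 3), b = c(10, 20))

as_list   = lapply(scores, mean)               # list in, list out
as_vector = sapply(scores, mean)               # list in, simplified to a named vector
checked   = vapply(scores, mean, numeric(1))   # like sapply, result type declared up front

# mapply walks over several vectors in parallel
pair_sums = mapply(function(x, y) x + y, 1:3, 4:6)

# tapply applies a function per group defined by a second vector
group_means = tapply(c(1, 2, 3, 4), c("u", "v", "u", "v"), mean)

print(as_list)
print(as_vector)
print(pair_sums)
print(group_means)
```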

4. The plyr package
About the package name: APPLY + R
Function names encode the input and output types:
Input        Output       Function name
data frame   data frame   ddply
list         array        laply
array        none         a_ply

GET https://api.twitter.com/1.1/statuses/home_timeline.json
[
{
"coordinates": null,
"truncated": false,
"favorited": false,
"created_at": "Mon Jun 27 19:32:19 +0000 2011",
"id_str": "85430275915526144",
"user": {
"profile_sidebar_border_color": "0094C2",
"profile_background_tile": false,
"profile_sidebar_fill_color": "a9d9f1",
"name": "Twitter API",
},
…
...
}
]

in_reply_to_status_id_s | in_reply_to_user_id | ... | user.id | user.name | ...
$statuses[[99]]$in_reply_to_status_id_str
NULL
$statuses[[99]]$in_reply_to_user_id
NULL
$statuses[[99]]$in_reply_to_user_id_str
NULL
$statuses[[99]]$user
$statuses[[99]]$user$id
[1] 375912401
$statuses[[99]]$user$name
[1] "Super Villain "

# Extract the fields named in element from every entry of data into a
# character matrix. Nested fields are written as paths such as "user$name".
nestedParser = function(element,data){
  resultMat = matrix(nrow=length(data),ncol=length(element))
  colnames(resultMat) = element
  for (i in 1:length(element)){
    levels = strsplit(element[i],"$",fixed=TRUE)   # split the path into its levels
    for(k in 1:length(data)){
      for(j in 1:(length(levels[[1]]))){
        if(j == 1){
          temp = data[[k]][levels[[1]][j]]         # top-level field
        }else{
          temp = temp[[1]][levels[[1]][j]]         # descend one level
        }
      }
      resultMat[k,i] = as.character(temp)
    }
  }
  return(resultMat)
}
> nestedParser(c("user$name","text","lang","created_at","id"),tw.data$statuses)

Take the needed columns from each element:
tw.data = laply(tw.data, .fun = function(x){x[c("text","id",...)]})
Remove the list nesting:
tw.data = laply(tw.data, function(x) laply(x, identity))
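
For comparison, the same kind of nested extraction can be sketched in base R without explicit loops. This is not the approach from the slides: the `statuses` data, the `get_field` helper, and all values below are hypothetical and only mimic the shape of the parsed JSON.

```r
# Hypothetical data shaped like the parsed JSON (all values are made up)
statuses = list(
  list(text = "first",  lang = "en", user = list(id = 1, name = "Alice")),
  list(text = "second", lang = "ja", user = list(id = 2, name = "Bob"))
)

# Hypothetical helper: follow a path such as c("user", "name") into one status
get_field = function(status, path) {
  value = Reduce(function(v, key) v[[key]], path, status)
  if (is.null(value)) NA_character_ else as.character(value)
}

fields = c("user$name", "text", "lang")
paths  = strsplit(fields, "$", fixed = TRUE)

# One column per field, one row per status
result = sapply(paths, function(path) {
  vapply(statuses, get_field, character(1), path = path)
})
colnames(result) = fields
print(result)
```

Declaring `character(1)` in `vapply` makes the sketch fail loudly if a field ever holds more than one value, instead of silently producing a ragged result.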

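5. Summary
● R has loops (for, while, repeat), but it is better to avoid them whenever possible.
● Vectorized functions and methods are faster and easier to read than loops.
● Loops are bug-prone and hard to debug.
● Use R's built-in functions, the apply family, the plyr package, and so on!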
