3rd R Study Group (第三回R勉強会)
Loops in R

As in other programming languages, R provides loops for repeating execution:
● for statement
● while statement
● repeat statement

for
for (name in expr_1) expr_2
> for (i in 1:10) cat(pnorm(i)," ")
0.8413447 0.9772499 0.9986501 0.9999683 0.9999997 1 1 1 1 1
> for (i in 1:ncol(faithful)){
+   print(c(min(faithful[,i]),max(faithful[,i]),mean(faithful[,i]),median(faithful[,i])))
+ }
> xyz = list(42,c(1,2,3),matrix(c(1:4),2,2))
> for (i in 1:length(xyz)){
+   xyz[[i]] = xyz[[i]]+1
+   print(xyz[[i]])
+ }
[1] 43
[1] 2 3 4
     [,1] [,2]
[1,]    2    4
[2,]    3    5
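One caveat with the 1:length(x) pattern used above: when the object is empty, 1:0 counts down (1, 0) instead of producing no iterations. The following is a minimal sketch of the difference; the object `empty` and the counters are made up for this illustration and do not come from the slides.

```r
# 'empty' is a hypothetical example object, not from the slides.
empty <- list()

# 1:length(empty) is 1:0, i.e. the vector c(1, 0): the loop body runs twice.
n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1

# seq_along(empty) is integer(0): the loop body never runs.
n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1

cat(n_colon, n_seq, "\n")  # 2 0
```

For non-empty objects both forms iterate over the same indices, so seq_along can be used as a drop-in replacement.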

while
while (condition) expr
> i = 0
> while (i < 5){
+   print(dnorm(i))
+   i = i+1
+ }
[1] 0.3989423
[1] 0.2419707
[1] 0.05399097
[1] 0.004431848
[1] 0.0001338302
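while fits best when the number of iterations is not known in advance. As a small sketch (the variable names and the 1e-3 threshold are chosen only for illustration), this loop halves a value until it falls to the threshold or below and counts the steps it needed:

```r
x <- 1
steps <- 0
while (x > 1e-3) {     # keep halving while x is still above 0.001
  x <- x / 2
  steps <- steps + 1
}
steps  # 10, because 2^-9 is about 0.00195 and 2^-10 is about 0.00098
```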

repeat
repeat expr
> x = 2
> repeat{
+   print(x)
+   x = x^2
+ }
...
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
[1] Inf
...
Without a break, this loop never stops on its own.
repeat (with break)
> x = 2
> repeat{
+   print(x)
+   x = x^2
+   if(x == Inf) break
+ }
[1] 2
[1] 4
[1] 16
[1] 256
[1] 65536
[1] 4294967296
[1] 1.844674e+19
[1] 3.402824e+38
[1] 1.157921e+77
[1] 1.340781e+154
Besides break, there is also next (known as "continue" in other languages).
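As a small illustration of next alongside break, the sketch below skips even numbers and leaves the loop at the first odd value above 7. The name `odds` is made up for this example.

```r
odds <- c()                  # collects the values that were not skipped
for (i in 1:10) {
  if (i %% 2 == 0) next      # even i: jump straight to the next iteration
  if (i > 7) break           # first odd value above 7: leave the loop
  odds <- c(odds, i)
}
print(odds)                  # 1 3 5 7
```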

However...
"Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a 'whole object' view is likely to be both clearer and faster in R."
An Introduction to R (http://cran.r-project.org)
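To make the "whole object" idea concrete, here is a sketch that computes the same ten pnorm values as the for example earlier, once element by element and once in a single vectorized call. The variable names are illustrative only.

```r
# Loop version: fill a preallocated vector one element at a time.
loop_result <- numeric(10)
for (i in 1:10) loop_result[i] <- pnorm(i)

# Whole-object version: pnorm() accepts the entire vector at once.
vec_result <- pnorm(1:10)

all.equal(loop_result, vec_result)  # TRUE
```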

1. Built-in functions
> set.seed(42)
> abc = matrix(rnorm(10e5),nrow=10e2)
This is only 10e5 (one million) elements.

maximum = function(mtrx){
  # scan every element, keeping the largest value seen so far
  mx = mtrx[1,1]
  for(i in 1:nrow(mtrx)){
    for(j in 1:ncol(mtrx)){
      if(mtrx[i,j]>mx) mx = mtrx[i,j]
    }
  }
  return(mx)
}

> maximum(abc)
> max(abc)
> system.time(maximum(abc))
   user  system elapsed
   0.90    0.00    0.91
> system.time(max(abc))
   user  system elapsed
   0.02    0.00    0.01

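1. Built-in functions
Mathematical operation    R function
Sum                       sum
Mean                      mean
Median                    median
Variance                  var
Covariance                cov
Correlation               cor
Logarithm                 log
Range of values           range
Scaling                   scale
R functions are vectorized: they take a vector-like object and process all of its elements together.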
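The functions listed above can be tried on a small vector. This is only a sketch with made-up data (`x` and `y` are not from the slides); each call works on the whole vector without an explicit loop.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up sample data
y <- 2 * x + 1                   # a second vector, linearly related to x

sum(x)      # 40
mean(x)     # 5
median(x)   # 4.5
var(x)      # sample variance (denominator n - 1)
cov(x, y)   # equals 2 * var(x) because y = 2x + 1
cor(x, y)   # 1 (up to rounding) for an exact increasing linear relation
range(x)    # 2 9
log(x)      # element-wise natural logarithm, same length as x
z <- scale(x)   # centers and scales x; returns a one-column matrix
```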

2. apply, lapply, sapply...
apply – apply a function over array margins

myMatrix:
        ,1      ,2     ,3
1,    55.5    57.0   58.5
2,   156.0   157.5    0.0
3,    56.5    58.0   59.5

Mean of each row:
> apply(myMatrix,1,mean)
[1] 57.0 104.5 58.0
Sum of each column:
> apply(myMatrix,2,sum)
[1] 268.0 272.5 118.0
Every element (same result as the vectorized sqrt(myMatrix)):
> apply(myMatrix,c(1,2),sqrt)
          [,1]      [,2]     [,3]
[1,]  7.449832  7.549834 7.648529
[2,] 12.489996 12.549900 0.000000
[3,]  7.516648  7.615773 7.713624
> sqrt(myMatrix)
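For these common row and column summaries, base R also provides the dedicated functions rowMeans and colSums. The sketch below rebuilds myMatrix from the values shown on the slide (the matrix() call itself is an assumption, since the slides show only the contents) and checks that the results agree with apply.

```r
# Values taken from the slide; the matrix() call is a reconstruction.
myMatrix <- matrix(c(55.5, 156.0, 56.5,
                     57.0, 157.5, 58.0,
                     58.5,   0.0, 59.5), nrow = 3)  # filled column by column

row_means <- apply(myMatrix, 1, mean)   # 57.0 104.5 58.0
col_sums  <- apply(myMatrix, 2, sum)    # 268.0 272.5 118.0

all.equal(row_means, rowMeans(myMatrix))                    # TRUE
all.equal(col_sums, colSums(myMatrix))                      # TRUE
all.equal(apply(myMatrix, c(1, 2), sqrt), sqrt(myMatrix))   # TRUE
```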

> set.seed(42)
> abc = matrix(rnorm(10e5),nrow=10e2)
my.mean = function(mat){
  res.vec = vector()
  for (i in 1:ncol(mat)){
    # add up column i element by element
    my.sum = 0
    for (j in 1:length(mat[,i])){
      my.sum = my.sum + mat[j,i]
    }
    my.col.mean = my.sum / length(mat[,i])
    res.vec = append(res.vec,my.col.mean)
  }
  return (res.vec)
}
> apply(abc,2,mean)
> system.time(apply(abc,2,mean))
   user  system elapsed
   0.01    0.01    0.03
> system.time(my.mean(abc))
   user  system elapsed
   0.92    0.00    0.92
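Column means are common enough that base R ships a dedicated function, colMeans. The following sketch compares it with apply and with vapply on a small matrix; `m` is a made-up stand-in for abc so that the example runs instantly, and the variable names are illustrative.

```r
set.seed(42)
m <- matrix(rnorm(200), nrow = 20)   # small stand-in for abc (20 x 10)

via_apply    <- apply(m, 2, mean)    # one mean per column
via_vapply   <- vapply(seq_len(ncol(m)), function(j) mean(m[, j]), numeric(1))
via_colMeans <- colMeans(m)

all.equal(via_apply, via_vapply)     # TRUE
all.equal(via_apply, via_colMeans)   # TRUE
```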

● lapply – apply a function over a list or vector
● sapply – user-friendly version of lapply
● vapply – similar to sapply, but with a pre-specified type of return value
● rapply – recursive version of lapply
● tapply – apply a function over a ragged array
● mapply – apply a function to multiple list or vector arguments
input: ???
output: ???
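One way to fill in the question marks is to try each function on small inputs. The sketch below uses made-up data; the comments describe what goes in and what kind of object comes out.

```r
lst <- list(a = 1:3, b = 4:6)          # made-up input list

l_out <- lapply(lst, sum)              # list in, list out
s_out <- sapply(lst, sum)              # list in, simplified to a named vector
v_out <- vapply(lst, sum, integer(1))  # like sapply, but the result type is declared

# tapply: a vector plus a grouping vector in, one value per group out
values <- c(10, 20, 30, 40)
groups <- c("x", "y", "x", "y")
t_out <- tapply(values, groups, mean)  # x = 20, y = 30

# mapply: several vectors in, function applied to parallel elements
m_out <- mapply(function(p, q) p + q, 1:3, 4:6)   # 5 7 9

# rapply: applies a function to every leaf of a nested list
r_out <- rapply(list(1, list(2, 3)), function(z) z * 10, how = "unlist")  # 10 20 30
```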

3. The plyr package
AP PLY + R
How the function names are formed:
Input        Output       Function name
data frame   data frame   ddply
list         array        laply
array        none         a_ply
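The naming scheme can be tried out on toy data. This sketch assumes the plyr package is installed (the plyr calls are skipped otherwise); the data frame `df` and the list `lst` are made up for illustration.

```r
df  <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))  # made-up data
lst <- list(1:3, 4:6)

if (requireNamespace("plyr", quietly = TRUE)) {
  # ddply: data frame in, data frame out (one row per group of g)
  dd <- plyr::ddply(df, "g", function(d) data.frame(m = mean(d$x)))
  print(dd)

  # laply: list in, array out (here a plain vector of sums)
  la <- plyr::laply(lst, sum)
  print(la)

  # a_ply: array in, nothing returned; used only for its side effects
  plyr::a_ply(matrix(1:4, 2), 1, function(r) cat(sum(r), "\n"))
}
```

The first letter of each name gives the input type and the second the output type, with an underscore meaning that the result is discarded.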

GET https://api.twitter.com/1.1/statuses/home_timeline.json
[
  {
    "coordinates": null,
    "truncated": false,
    "favorited": false,
    "created_at": "Mon Jun 27 19:32:19 +0000 2011",
    "id_str": "85430275915526144",
    "user": {
      "profile_sidebar_border_color": "0094C2",
      "profile_background_tile": false,
      "profile_sidebar_fill_color": "a9d9f1",
      "name": "Twitter API",
    },
    ...
  }
]

Table columns: in_reply_to_status_id_s | in_reply_to_user_id | ... | user.id | user.name | ...
$statuses[[99]]$in_reply_to_status_id_str
NULL
$statuses[[99]]$in_reply_to_user_id
NULL
$statuses[[99]]$in_reply_to_user_id_str
NULL
$statuses[[99]]$user
$statuses[[99]]$user$id
[1] 375912401
$statuses[[99]]$user$name
[1] "Super Villain "

nestedParser = function(element,data){
  # one row per record in data, one column per requested field
  resultMat = matrix(nrow=length(data),ncol=length(element))
  colnames(resultMat) = element
  for (i in seq_along(element)){
    # split a path such as "user$name" into its levels
    levels = strsplit(element[i],"$",fixed=TRUE)
    for(k in seq_along(data)){
      for(j in seq_along(levels[[1]])){
        if(j == 1){
          temp = data[[k]][levels[[1]][j]]
        }else{
          # descend one level into the nested list
          temp = temp[[1]][levels[[1]][j]]
        }
      }
      resultMat[k,i] = as.character(temp)
    }
  }
  return(resultMat)
}
> nestedParser(c("user$name","text","lang","created_at","id"),tw.data$statuses)
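For comparison, here is a sketch of the same kind of extraction written with base R's sapply instead of nested for loops. It runs on a tiny hand-made list shaped like the parsed statuses; the data in `statuses` and the helper `get_field` are hypothetical, and the real API response has many more fields.

```r
# Hand-made stand-in for tw.data$statuses (hypothetical values).
statuses <- list(
  list(text = "hello", lang = "en", user = list(id = 1, name = "Alice")),
  list(text = "hola",  lang = "es", user = list(id = 2, name = "Bob"))
)

# Hypothetical helper: follow a path such as "user$name" into one status.
get_field <- function(status, path) {
  for (key in strsplit(path, "$", fixed = TRUE)[[1]]) status <- status[[key]]
  if (is.null(status)) NA_character_ else as.character(status)
}

fields <- c("user$name", "text", "lang")
# One column per field, one row per status.
result <- sapply(fields, function(f) sapply(statuses, get_field, path = f))
print(result)
```

Unlike nestedParser, this sketch returns NA for a field that is missing, which is a design choice of the example rather than something the slides specify.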

Take the columns one at a time:
tw.data = laply(tw.data, .fun = function(x){x[c("text","id",...)]})
Avoid nested (recursive) lists:
tw.data = laply(tw.data, function(x) laply(x, identity))

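4. Summary
● R has loops (for, while, repeat), but it is better to avoid them wherever possible
● Vectorized functions and methods are faster and easier to read than loops
● Loops are bug-prone and hard to debug
● Use R's built-in functions, the apply family, the plyr package, and so on!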

Slides uploaded by Paweł Rusin