111015 tokyo scipy2_discussionquestionaire_i_python
Q3: Sum with NaN and Inf
Given a sequence of values containing NaN and Inf, x = numpy.array([[1,0,nan,1,Inf,1,....]]), can you immediately think of a way to compute the sum of the elements of x that are neither NaN nor Inf?
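In [1]: # Setup
# Bare names such as nan, Inf, isnan and sum are used unqualified below;
# this presumably relies on IPython's pylab mode (from numpy import *).
import numpy
import numpy as np
x = numpy.array([[1,0,nan,1,Inf,1]])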
x
Out[1]: array([[ 1., 0., nan, 1., inf, 1.]])
In [2]: # Respondent 1
x[isnan(x)]=0
x[isinf(x)]=0
sum(x)
Out[2]: 3.0
In [3]: %timeit x[isnan(x)]=0
%timeit x[isinf(x)]=0
%timeit sum(x)
100000 loops, best of 3: 4.47 us per loop
100000 loops, best of 3: 4.31 us per loop
100000 loops, best of 3: 3.46 us per loop
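Respondent 1's assignments overwrite x in place, so every later cell in this transcript operates on the already cleaned array. A minimal non-mutating sketch follows; the helper name `finite_sum` is hypothetical and not part of the session, and NumPy is assumed to be imported as np.

```python
import numpy as np

def finite_sum(values):
    """Sum the finite entries of `values` without modifying the input."""
    arr = np.asarray(values, dtype=float)
    # Replace NaN and +/-Inf by 0 in a temporary array, then sum.
    return np.where(np.isfinite(arr), arr, 0.0).sum()

x = np.array([[1, 0, np.nan, 1, np.inf, 1]])
total = finite_sum(x)
print(total)                # 3.0
print(np.isnan(x).any())    # True: x itself is unchanged
```

Using np.where builds a temporary array rather than assigning into x, which keeps the original data available for the other approaches.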
In [4]: # Respondent 2
# I am not sure when one would want to exclude Inf, but
sum_with_finite = x[numpy.isfinite(x)].sum()
sum_with_finite
# To exclude only NaN: sum_without_nan = numpy.nansum(x)
Out[4]: 3.0
In [5]: %timeit x[numpy.isfinite(x)].sum()
100000 loops, best of 3: 7.47 us per loop
In [6]: # Respondent 4
sum(filter(lambda x: x != float('inf') and x == x, x[0]))
Out[6]: 3.0
In [7]: %timeit sum(filter(lambda x: x != float('inf') and x == x, x[0]))
10000 loops, best of 3: 39 us per loop
In [8]: # Respondent 5
np.nansum(x[x != np.inf])
Out[8]: 3.0
In [9]: %timeit np.nansum(x[x != np.inf])
10000 loops, best of 3: 29.3 us per loop
In [10]: # Respondent 6
x[numpy.isfinite(x)].sum()
Out[10]: 3.0
In [11]: %timeit x[numpy.isfinite(x)].sum()
100000 loops, best of 3: 8.33 us per loop
In [12]: # Respondent 8
numpy.sum(x[numpy.isfinite(x)])
Out[12]: 3.0
In [13]: %timeit numpy.sum(x[numpy.isfinite(x)])
100000 loops, best of 3: 10.2 us per loop
In [14]: # Respondent 9
np.sum(x[np.isfinite(x)])
Out[14]: 3.0
In [15]: %timeit np.sum(x[np.isfinite(x)])
100000 loops, best of 3: 10.1 us per loop
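The answers agree on the sample data, but they are not equivalent in general: the `isfinite` mask drops NaN, +Inf and -Inf, whereas a comparison against `inf` removes only +Inf. The sketch below illustrates the difference on a hypothetical input that also contains -Inf (this input is an assumption, not part of the original question).

```python
import numpy as np

# Hypothetical data: same values as the question, plus one -Inf.
x = np.array([1.0, 0.0, np.nan, 1.0, np.inf, -np.inf, 1.0])

# Mask-based sum: isfinite is False for NaN, +Inf and -Inf.
sum_isfinite = x[np.isfinite(x)].sum()

# Comparison-based sum: only +Inf is removed, so -Inf survives.
sum_not_posinf = np.nansum(x[x != np.inf])

print(sum_isfinite)    # 3.0
print(sum_not_posinf)  # -inf
```

If negative infinities can occur in the data, the `isfinite` variants are probably the safer choice.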
Q4: Missing values in ndarray
Given a 2x4 matrix containing nan, m = numpy.array([[1,nan,-1,0],[0,0,nan,1]]), can you immediately think of a way to delete the columns that contain nan so that a 2x2 matrix remains?
In [16]: # Setup
import numpy
import numpy as np
m = numpy.array([[1,nan,-1,0],[0,0,nan,1]])
m
Out[16]: array([[ 1., nan, -1., 0.],
[ 0., 0., nan, 1.]])
In [17]: # Respondent 1
delete(m,argmax(isnan(m),axis=1),axis=1)
Out[17]: array([[ 1., 0.],
[ 0., 1.]])
In [18]: %timeit delete(m,argmax(isnan(m),axis=1),axis=1)
10000 loops, best of 3: 110 us per loop
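Respondent 1's `argmax` approach works for this m because each row contains exactly one NaN. As a hedged illustration of where it can go wrong, the sketch below uses a hypothetical matrix `m2` (not from the session) whose second row has no NaN, and compares the result with a boolean column mask.

```python
import numpy as np

# Hypothetical input: the second row has no NaN at all.
m2 = np.array([[1, np.nan, -1, 0],
               [0, 0,       2, 1]])

# argmax over an all-False row returns 0, so column 0 is deleted as well.
cols_argmax = np.argmax(np.isnan(m2), axis=1)
via_argmax = np.delete(m2, cols_argmax, axis=1)

# Boolean mask selecting the columns that contain no NaN.
via_mask = m2[:, ~np.isnan(m2).any(axis=0)]

print(cols_argmax.tolist())  # [1, 0]
print(via_argmax.shape)      # (2, 2)
print(via_mask.shape)        # (2, 3)
```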
In [19]: # Respondent 2
selected_m = m[:,numpy.isfinite(m.sum(axis=0))]
selected_m
Out[19]: array([[ 1., 0.],
[ 0., 1.]])
In [20]: %timeit m[:,numpy.isfinite(m.sum(axis=0))]
100000 loops, best of 3: 11.2 us per loop
In [21]: # Respondent 4
index = [xx for xx in range(len(m[0])) if sum(m[:,xx]) == sum(m[:,xx])]
m[:,index]
Out[21]: array([[ 1., 0.],
[ 0., 1.]])
In [22]: %timeit index = [xx for xx in range(len(m[0])) if sum(m[:,xx]) == sum(m[:,xx])]
%timeit m[:,index]
10000 loops, best of 3: 38.9 us per loop
100000 loops, best of 3: 12.8 us per loop
In [23]: # Respondent 6
nans = logical_or(isnan(m[0]), isnan(m[1]))
mask = tile(logical_not(nans), (2,1))
res = m[mask].reshape(2,2)
res
Out[23]: array([[ 1., 0.],
[ 0., 1.]])
In [24]: %timeit nans = logical_or(isnan(m[0]), isnan(m[1]))
%timeit mask = tile(logical_not(nans), (2,1))
%timeit res = m[mask].reshape(2,2)
100000 loops, best of 3: 5.46 us per loop
100000 loops, best of 3: 13.2 us per loop
100000 loops, best of 3: 4.46 us per loop
In [25]: # Respondent 8
m[:,numpy.apply_along_axis(numpy.all,0,numpy.isfinite(m))]
Out[25]: array([[ 1., 0.],
[ 0., 1.]])
In [26]: %timeit m[:,numpy.apply_along_axis(numpy.all,0,numpy.isfinite(m))]
1000 loops, best of 3: 160 us per loop
In [27]: # Respondent 9
m[:, np.isfinite(np.sum(m, axis=0))]
Out[27]: array([[ 1., 0.],
[ 0., 1.]])
In [28]: %timeit m[:, np.isfinite(np.sum(m, axis=0))]
100000 loops, best of 3: 12.8 us per loop
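All answers above drop columns. As a rough sketch of how the same masking idea covers both directions, the helper below removes either rows or columns that contain NaN. The name `drop_nan` and its `axis` convention are hypothetical, chosen only for this illustration.

```python
import numpy as np

def drop_nan(a, axis):
    """Remove rows (axis=0) or columns (axis=1) of a 2-D array that contain NaN."""
    a = np.asarray(a, dtype=float)
    if axis == 0:
        # Keep the rows in which no entry is NaN.
        return a[~np.isnan(a).any(axis=1)]
    # Keep the columns in which no entry is NaN.
    return a[:, ~np.isnan(a).any(axis=0)]

m = np.array([[1, np.nan, -1, 0],
              [0, 0, np.nan, 1]])

cols_dropped = drop_nan(m, axis=1)
rows_dropped = drop_nan(m, axis=0)
print(cols_dropped.tolist())  # [[1.0, 0.0], [0.0, 1.0]]
print(rows_dropped.shape)     # (0, 4): both rows contain a NaN
```

Unlike the `isfinite(m.sum(axis=0))` trick, this mask tests only for NaN, so columns containing Inf are kept.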
Q5: 1-of-K representation
Can you immediately think of a way to convert numpy.array([[1,3,2]]) into its 1-of-K representation, numpy.array([[1,0,0],[0,0,1],[0,1,0]])?
In [29]: # Setup
import numpy
import numpy as np
y = numpy.array([[1,3,2]])
y
Out[29]: array([[1, 3, 2]])
In [30]: # Respondent 1
t = numpy.array([1,3,2])
# pattern 1
z = numpy.fromfunction(lambda i,j: j == t[i]-1, (t.size,t.max()), dtype=int) + 0
print(z)
# pattern 2
z = numpy.array([numpy.identity(t.max())[x-1,:] for x in t])
print(z)
# pattern 3 (NumPy 1.6 or later)
z = numpy.array([numpy.bincount([x-1],minlength=t.max()) for x in t])
print(z)
[[1 0 0]
[0 0 1]
[0 1 0]]
[[ 1. 0. 0.]
[ 0. 0. 1.]
[ 0. 1. 0.]]
[[1 0 0]
[0 0 1]
[0 1 0]]
In [31]: %timeit z = numpy.fromfunction(lambda i,j: j == t[i]-1, (t.size,t.max()), dtype=int) + 0
%timeit z = numpy.array([numpy.identity(t.max())[x-1,:] for x in t])
%timeit z = numpy.array([numpy.bincount([x-1],minlength=t.max()) for x in t])
10000 loops, best of 3: 61.7 us per loop
10000 loops, best of 3: 55.3 us per loop
10000 loops, best of 3: 65.6 us per loop
In [32]: # Respondent 2
N=y.shape[1]
yy=zeros(N**2)
yy[N*arange(N)+y-1]=1 # Editor's note: yy[N*arange(N)+y-1].reshape(N,N) did not work
yy.reshape(N,N)
Out[32]: array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.]])
In [33]: %timeit N=y.shape[1]
%timeit yy=zeros(N**2)
%timeit yy[N*arange(N)+y-1]=1 # Editor's note: yy[N*arange(N)+y-1].reshape(N,N) did not work
%timeit yy.reshape(N,N)
10000000 loops, best of 3: 203 ns per loop
1000000 loops, best of 3: 1.37 us per loop
10000 loops, best of 3: 14.6 us per loop
1000000 loops, best of 3: 846 ns per loop
In [34]: # Respondent 4
K=3
def my_func(i):
    z = numpy.zeros(K,dtype=int)
    z[i-1] = 1
    return z
numpy.array(map(my_func,y[0]))
Out[34]: array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]])
In [35]: %timeit numpy.array(map(my_func,y[0]))
10000 loops, best of 3: 29.8 us per loop
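The output of Respondent 4's cell is consistent with Python 2, where `map` returns a list. On Python 3, `map` returns a lazy iterator, and passing it directly to `numpy.array` would not produce the 3x3 array shown. A minimal sketch of the same approach for Python 3, assuming the setup values from the cells above:

```python
import numpy as np

K = 3  # number of classes, as in the cell above
y = np.array([[1, 3, 2]])

def my_func(i):
    z = np.zeros(K, dtype=int)
    z[i - 1] = 1  # labels are 1-based
    return z

# On Python 3, map() is lazy, so materialize it before building the array.
one_hot = np.array(list(map(my_func, y[0])))
print(one_hot.tolist())  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```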
In [36]: # Respondent 6
res = zeros((3, 3))
indices = [i*3+c-1 for i, c in enumerate(y[0])]
res.put(indices, 1)
res
Out[36]: array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.]])
In [37]: %timeit res = zeros((3, 3))
%timeit indices = [i*3+c-1 for i, c in enumerate(y[0])]
%timeit res.put(indices, 1)
1000000 loops, best of 3: 814 ns per loop
100000 loops, best of 3: 10.2 us per loop
100000 loops, best of 3: 10.3 us per loop
In [38]: # Respondent 8
numpy.fromfunction(lambda i, j: numpy.array(y[0][i] == j+1, dtype=int), (3, 3), dtype
Out[38]: array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]])
In [39]: %timeit numpy.fromfunction(lambda i, j: numpy.array(y[0][i] == j+1, dtype=int), (3, 3
10000 loops, best of 3: 44.3 us per loop
In [40]: # Respondent 9
# The reverse direction is the harder problem...
ans = np.zeros((3, 3))
ans[np.arange(3, dtype=np.int), y-1] = 1 # Editor's note: changed to y-1 to account for 0-origin indexing
ans
Out[40]: array([[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.]])
In [41]: %timeit ans = np.zeros((3, 3))
%timeit ans[np.arange(3, dtype=np.int), y-1] = 1
1000000 loops, best of 3: 761 ns per loop
100000 loops, best of 3: 7.97 us per loop
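Respondent 9 remarks that the reverse conversion is the harder problem. The sketch below shows one possible way to do both directions; it is not taken from the session. It assumes the largest label equals the number of classes K, and it avoids `np.int`, which was removed in NumPy 1.24.

```python
import numpy as np

y = np.array([[1, 3, 2]])   # 1-based class labels, shape (1, 3)
K = int(y.max())            # assumption: the largest label equals the number of classes

# Forward: pick row (label - 1) of the K x K identity matrix for each label.
one_hot = np.eye(K, dtype=int)[y[0] - 1]

# Reverse: the position of the 1 in each row, shifted back to 1-based labels.
labels = one_hot.argmax(axis=1) + 1

print(one_hot.tolist())  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(labels.tolist())   # [1, 3, 2]
```

Indexing the identity matrix avoids an explicit Python loop, and `argmax` recovers the labels as long as each row contains exactly one 1.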
Q6: Useful snippets
In [45]: def _main():
    pass
if __name__ == '__main__':
    _main()