Introduction to
Python for Scientific Computing
João Machado • Ricardo Cruz
Introduction
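$$\begin{pmatrix} a & d & g \\ b & e & h \\ c & f & i \end{pmatrix} \times \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} ? & ? & ? \\ ? & ? & ? \\ ? & ? & ? \end{pmatrix}$$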
What is the result of this operation?
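$$\begin{pmatrix} a & d & g \\ b & e & h \\ c & f & i \end{pmatrix} \times \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} g & d & a \\ h & e & b \\ i & f & c \end{pmatrix}$$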
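Right-multiplying by the anti-identity matrix (ones on the anti-diagonal) reverses the order of the columns; left-multiplying by it would reverse the rows instead. A small NumPy check of this, as a sketch that uses the numbers 1 to 9 in place of the letters a to i:

```python
import numpy as np

# Columns (1, 2, 3), (4, 5, 6), (7, 8, 9) play the roles of (a, b, c), (d, e, f), (g, h, i)
A = np.arange(1, 10).reshape(3, 3).T
J = np.fliplr(np.eye(3))   # anti-identity: the identity with its columns reversed

C = A @ J                  # right-multiplication: the columns come out in reverse order
R = J @ A                  # left-multiplication: the rows are reversed instead

print(C)
print(R)
```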
from numpy import *
cout = print

A = random.random((3, 3));
B = fliplr(eye(3));
C = dot(A, B);
cout(C);
What programming language is this?
It’s Python!
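#include <armadillo>
#include <iostream>
using namespace arma;
using namespace std;

int main() {
    mat A(3, 3), B(3, 3);
    A.randu();              // uniform random entries in [0, 1]
    B = fliplr(B.eye());    // anti-identity: identity with its columns reversed
    mat C = A * B;          // matrix product
    cout << C << endl;
    return 0;
}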
What about this programming language?
Why use Python?
More important than the programming language is the ecosystem, and Python has a great scientific community
Python has good interoperability with other systems
The entire stack can be developed in Python: machine learning, web back ends (e.g. Flask), etc.
Heavy computations do not run in pure Python: the slow parts are implemented in C and Fortran
Figure: data mining ecosystems
Python: matplotlib, numpy, sklearn, pandas
R: ggplot2, rpart, foreign, dplyr, survival, ggmap, zoo
MATLAB: Statistics Toolbox, Biostatistics Toolbox, Neural Network Toolbox
Why Python?
Good data mining ecosystem:
Not as centralized/monopolistic as MATLAB's
Not as decentralized and messy as R's :P
Figure: most popular languages for machine learning and data science (source: http://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html)
Some notes on Numpy
Numpy Notes
Let A and B be matrices:

Python/Numpy   MATLAB   R
A.dot(B)       A * B    A %*% B    (matrix product)
A * B          A .* B   A * B      (elementwise product)

In Numpy, arithmetic operators are elementwise by default (as in R, unlike MATLAB).
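A quick sketch of the difference between the two products on small integer matrices (the values are arbitrary; `@` is the matrix-multiplication operator available since Python 3.5 and NumPy 1.10):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matrix_product = A.dot(B)   # MATLAB: A * B, R: A %*% B  -> [[19, 22], [43, 50]]
same_product = A @ B        # operator form of A.dot(B)
elementwise = A * B         # MATLAB: A .* B, R: A * B   -> [[5, 12], [21, 32]]
```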
Python/Numpy                   MATLAB            R
A.shape                        size(A)           dim(A) (or nrow(A), ncol(A))
A[0:4,:] or A[0:4] or A[:4]    A(1:4,:)          A[1:4,]
A[0:10:2]                      A(1:2:10,:)       A[seq(1, 9, 2),]
A[-4:]                         A(end-3:end,:)    A[(nrow(A)-3):nrow(A),]
A.T                            A.'               t(A)

Numpy in general allows for more succinct code. Furthermore:
Indexing starts at zero.
Slices are half-open: A[i:j] covers indices i, ..., j-1, i.e. the interval [i, j). Negative indices count from the end.
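The rows of the table can be checked on a small array. This sketch uses a hypothetical 10-by-2 array whose entries make it easy to see which rows were selected (row i holds 2i and 2i+1):

```python
import numpy as np

A = np.arange(20).reshape(10, 2)   # row i is [2*i, 2*i + 1]

shape = A.shape          # (10, 2)
first4 = A[:4]           # rows 0, 1, 2, 3: the stop index 4 is excluded
every_other = A[0:10:2]  # rows 0, 2, 4, 6, 8
last4 = A[-4:]           # rows 6, 7, 8, 9: negative indices count from the end
At = A.T                 # transpose, shape (2, 10)
```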
This is further aided by the fact that Numpy supports arithmetic broadcasting (unlike R, and unlike MATLAB before R2016b, which added a similar "implicit expansion").
That is, you can do the elementwise multiplication (6,3) * (6,1): the single column is automatically stretched across the three columns, so row i of the first array is multiplied by entry i of the second. In MATLAB you would have to use bsxfun(@times,r,A) or first replicate r with repmat().
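A sketch of the (6,3) * (6,1) example with hypothetical values, comparing broadcasting against an explicit copy made with `np.tile` (the NumPy counterpart of MATLAB's `repmat`):

```python
import numpy as np

A = np.arange(18).reshape(6, 3)     # shape (6, 3)
r = np.arange(1, 7).reshape(6, 1)   # shape (6, 1): one factor per row

scaled = A * r                      # r is stretched across the 3 columns

# What repmat() would do in MATLAB: replicate r into 3 columns first
scaled_explicit = A * np.tile(r, (1, 3))

# A 1-D array of length 3 is treated as shape (1, 3), so it lines up with the columns
col_scaled = A * np.array([1, 10, 100])
```

Broadcasting compares shapes from the last axis backwards; an axis of length 1 (or a missing leading axis) is stretched to match the other operand, without actually copying the data.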
Something like the following is valid in Numpy. Take two images of different shapes:
import skimage.data
img1 = skimage.data.astronaut()
img2 = skimage.data.moon()
print(img1.shape)  # (512, 512, 3): RGB image
print(img2.shape)  # (512, 512): greyscale image

import matplotlib.pyplot as plt
plt.subplot(1, 2, 1)
plt.imshow(img1)
plt.subplot(1, 2, 2)
plt.imshow(img2, cmap='gray')
plt.show()
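Arithmetic mean
import numpy as np
img2 = img2[:, :, np.newaxis]   # (512, 512) -> (512, 512, 1), broadcastable against (512, 512, 3)
img1 = img1.astype(np.uint32)   # widen from uint8 so that the sum cannot overflow
img2 = img2.astype(np.uint32)
img3 = (img1 + img2) // 2       # the moon is broadcast to all 3 colour channels
img3 = img3.astype(np.uint8)
plt.imshow(img3)
plt.show()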
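Why the casts to `uint32`? Images are stored as `uint8`, and NumPy integer arrays wrap around silently on overflow. A minimal illustration with a single hypothetical pair of pixel values:

```python
import numpy as np

a = np.array([200], dtype=np.uint8)
b = np.array([100], dtype=np.uint8)

wrapped = (a + b) // 2   # 200 + 100 = 300 wraps to 300 - 256 = 44, so the "mean" is 22
widened = ((a.astype(np.uint32) + b.astype(np.uint32)) // 2).astype(np.uint8)  # 150
```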
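Geometric mean
import numpy as np
import skimage.data
# Start again from the original images (the arithmetic-mean code already added the new axis to img2)
img1 = skimage.data.astronaut().astype(np.uint32)
img2 = skimage.data.moon()[:, :, np.newaxis].astype(np.uint32)
img3 = np.sqrt(img1 * img2)     # products fit easily in uint32 (at most 255 * 255)
img3 = img3.astype(np.uint8)
plt.imshow(img3)
plt.show()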
Pandas and Data Visualization
Pandas
What is Pandas?
A package for data manipulation and analysis, modeled on the data frame concept of the R language
Optimized for performance, with critical code paths written in Cython and C
Originally developed by Wes McKinney while working at AQR Capital (a quantitative finance firm)
Given the previous point, it makes sense to demonstrate some of the functionality of Pandas with a dataset of financial stock prices :)
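The stock dataset of the original demo is not reproduced here. As a stand-in, the sketch below builds hypothetical prices for two made-up tickers (`STOCK_A` and `STOCK_B` are placeholders; the prices are a seeded random walk, not market data) and shows a few typical Pandas operations on time-indexed data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2017-01-02", periods=60)   # 60 business days (Mon-Fri)

# Hypothetical prices: geometric random walks starting at 100
log_steps = rng.normal(0, 0.01, size=(60, 2))
prices = pd.DataFrame(100 * np.exp(np.cumsum(log_steps, axis=0)),
                      index=dates, columns=["STOCK_A", "STOCK_B"])

returns = prices.pct_change().dropna()        # daily relative change
ma10 = prices.rolling(window=10).mean()       # 10-day moving average (first 9 rows are NaN)
weekly = prices.resample("W").last()          # last price of each week
corr = returns.corr()                         # 2x2 correlation matrix of the returns
print(weekly.head())
```

The date index is what makes `rolling` and `resample` convenient: windows and groups are defined by time rather than by row position.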
Data Mining
Models
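Let us produce fake data (in $\mathcal{N}(\mu, \sigma)$ the second parameter is the standard deviation, as in the code below):
$$y(x) = 2x + 10 + \varepsilon_1 + \varepsilon_2, \qquad \varepsilon_1 \sim \mathcal{N}(0, 2), \qquad \varepsilon_2 \sim \begin{cases} |\mathcal{N}(0, 25)| & \text{with } p = 0.1, \\ 0 & \text{otherwise.} \end{cases}$$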
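More precisely, the code below draws the spike switch b from a binomial distribution, b ∼ B(2, 0.1), so b ∈ {0, 1, 2} and a spike occurs with probability 1 − 0.9² = 0.19:
$$y(x) = 2x + 10 + \varepsilon_1 + b\,\varepsilon_2, \qquad \varepsilon_1 \sim \mathcal{N}(0, 2), \qquad b \sim \mathcal{B}(2, 0.1), \qquad \varepsilon_2 \sim |\mathcal{N}(0, 25)|$$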
Translation to numpy:
import numpy as np
N = 50
x = np.linspace(0, 25, N)
y = 2*x + 10
y += np.random.randn(N)*2                                            # epsilon_1
y += np.random.binomial(2, 0.10, N)*np.abs(np.random.randn(N)*25)    # b * epsilon_2
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.title('Data')
plt.show()
What model could we create to explain this
data?
Linear Regression
Model: $\hat{y} = \beta_0 + \beta_1 x$
Minimize: $\sum_i (y_i - \hat{y}_i)^2$
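For a straight line the least-squares problem has a direct solution. As a sketch (not a description of what scikit-learn does internally), `np.linalg.lstsq` recovers the coefficients of noiseless data generated from the same true line as above:

```python
import numpy as np

x = np.linspace(0, 25, 50)
y = 2 * x + 10                         # the true trend, without noise or spikes

X = np.c_[np.ones_like(x), x]          # design matrix: a column of ones for beta_0, then x
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimises sum_i (y_i - yhat_i)^2
beta0, beta1 = beta                    # approximately 10 and 2
```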
from sklearn.linear_model import LinearRegression
m = LinearRegression()
m.fit(x[:, np.newaxis], y)          # scikit-learn expects a 2-D X of shape (n_samples, n_features)
yp = m.predict(x[:, np.newaxis])

plt.plot(x, y)
plt.plot(x, yp)
plt.title('Linear regression')
plt.text(0, 70, 'm=%.1f b=%.1f' % (m.coef_[0], m.intercept_))   # slope m, intercept b
plt.show()
Models
y(x) = 2x + 10 + ε1 + bε2
ˆy(x) = 2x + 18
What if I want to explain only the trend?
How can I avoid the impact of these spikes?
What would a statistician do? Look at the residuals:
res = yp - y          # residuals of the least-squares fit
plt.boxplot(res)
plt.show()
q1 = np.percentile(res, 25)
q3 = np.percentile(res, 75)
t = np.logical_and(res > q1, res < q3)   # keep points whose residual lies between the quartiles
x2 = x[t]
y2 = y[t]

m = LinearRegression()
m.fit(x2[:, np.newaxis], y2)             # refit on the central half of the data only
yp = m.predict(x[:, np.newaxis])
Approach #2: What would a statistician
with some computer science knowledge do?
Model: $\hat{y} = \beta_0 + \beta_1 x$
Minimize: $\sum_i |y_i - \hat{y}_i|$ (least absolute deviations, i.e. median regression)
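Why does the absolute error resist the spikes? For the simplest model, a constant $\hat{y} = \beta_0$, the sum of squared errors is minimised by the mean and the sum of absolute errors by the median. A brute-force sketch on five hypothetical values with one spike:

```python
import numpy as np

y = np.array([10.0, 11.0, 9.0, 10.0, 60.0])      # one spike at the end
candidates = np.linspace(0, 70, 7001)            # constant predictions, step 0.01

residuals = y[None, :] - candidates[:, None]     # one row per candidate beta_0
sq_loss = (residuals ** 2).sum(axis=1)
abs_loss = np.abs(residuals).sum(axis=1)

best_sq = candidates[sq_loss.argmin()]    # close to mean(y) = 20, dragged up by the spike
best_abs = candidates[abs_loss.argmin()]  # close to median(y) = 10, unaffected by it
```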
from statsmodels.regression.quantile_regression import QuantReg

m = QuantReg(y, np.c_[np.ones(N), x])   # design matrix: intercept column, then x
m = m.fit(q=0.5)                        # q=0.5 is the median: minimises sum |y_i - yhat_i|
yp = m.predict()
Approach #3: What would a crazy computer scientist do? Fit many lines on tiny random subsets of the data and look at where they agree:
m = LinearRegression()   # m held the QuantReg results above; use a linear model again
plt.plot(x, y)
for it in range(10):
    t = np.random.choice(N, N//10, replace=False)   # 5 random points out of 50
    x2 = x[t]
    y2 = y[t]
    m.fit(x2[:, np.newaxis], y2)
    yp = m.predict(x[:, np.newaxis])
    plt.plot(x, yp, color='black', alpha=0.4)
plt.show()
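The loop above is the core of RANSAC (RANdom SAmple Consensus): fit on tiny random subsets and trust the fit that most points agree with. A simplified sketch of that idea in plain NumPy; the function name `ransac_line`, the 2-point samples and the fixed threshold are illustrative choices, not scikit-learn's implementation:

```python
import numpy as np


def ransac_line(x, y, n_trials=100, threshold=5.0, seed=0):
    """Simplified RANSAC for a line y = b0 + b1 * x (illustrative sketch).

    Repeatedly fits a line through 2 random points, counts the points lying
    within `threshold` of it (the inliers), and finally refits by least
    squares on the largest inlier set found.
    """
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # a vertical pair does not define y as a function of x
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        inliers = np.abs(y - (b0 + b1 * x)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on the consensus set only
    X = np.c_[np.ones(best_inliers.sum()), x[best_inliers]]
    beta, *_ = np.linalg.lstsq(X, y[best_inliers], rcond=None)
    return beta, best_inliers


# Usage on noiseless hypothetical data with spikes on every 10th point
x = np.linspace(0, 25, 50)
y = 2 * x + 10
y[::10] += 40
beta, inliers = ransac_line(x, y)   # beta is close to [10, 2]; the 5 spikes are excluded
```

Two points is the smallest sample that determines a line, which maximises the chance that a sample contains no spike at all.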
scikit-learn already comes with a principled version of this crazy model, RANSAC (RANdom SAmple Consensus):
from sklearn.linear_model import RANSACRegressor
m = RANSACRegressor()
m.fit(x[:, np.newaxis], y)

plt.plot(x, y)
plt.plot(x, m.predict(x[:, np.newaxis]))
plt.title('RANSAC')
plt.show()
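After fitting, `RANSACRegressor` exposes which points it treated as inliers (`inlier_mask_`) and the final linear model refitted on them (`estimator_`). A sketch on regenerated data (seeded so the run is reproducible; the spike positions are hypothetical):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Regenerate data in the spirit of the example above
rng = np.random.default_rng(0)
x = np.linspace(0, 25, 50)
y = 2 * x + 10 + rng.normal(0, 2, 50)
y[::10] += 40                               # spikes on every 10th point

m = RANSACRegressor(random_state=0)
m.fit(x[:, np.newaxis], y)

inliers = m.inlier_mask_                    # boolean mask: points in the consensus set
slope = m.estimator_.coef_[0]               # final LinearRegression, refitted on the inliers
intercept = m.estimator_.intercept_
print('slope=%.2f intercept=%.2f inliers=%d' % (slope, intercept, inliers.sum()))
```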
What kind of things can we use data mining /
machine learning for?
Data Mining Problems
Regression: predict a continuous variable,
e.g. House Price = 100 + 20 × Land Size
In scikit-learn: LinearRegression, GradientBoostingRegressor, etc. (:: RegressorMixin)
.fit(X, y)
.predict(X) -> yp
Classification: predict a discrete variable,
e.g. House Price = Expensive if in the city center, Cheap if outside the city
In scikit-learn: LogisticRegression, GradientBoostingClassifier, etc. (:: ClassifierMixin)
.fit(X, y)
.predict(X) -> yp
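Clustering: no target to predict; the goal is to aggregate similar observations into groups
In scikit-learn: KMeans, etc. (:: ClusterMixin); related unsupervised models such as LatentDirichletAllocation (:: TransformerMixin) share the same interface
.fit(X)
.transform(X) -> X'
.fit_transform(X) -> X'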
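The shared interface is what makes scikit-learn models interchangeable. A sketch on random data (the targets are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y_continuous = 3 * X[:, 0] - X[:, 1]          # regression target
y_discrete = (X[:, 0] > 0).astype(int)        # classification target

reg = LinearRegression().fit(X, y_continuous)                 # RegressorMixin
clf = LogisticRegression().fit(X, y_discrete)                 # ClassifierMixin
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)   # ClusterMixin: no y

yp_reg = reg.predict(X)        # continuous predictions
yp_clf = clf.predict(X)        # class labels, 0 or 1
labels = km.predict(X)         # cluster index of each row
distances = km.transform(X)    # distance of each row to each of the 3 cluster centres
```

Because `fit` returns the estimator itself, construction, fitting and prediction chain naturally, and swapping one model for another usually changes a single line.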
Reinforcement learning: predict the best move (action) in a given situation
Use Cases
Signal processing:
Packages:
numpy
pandas
scipy
matplotlib
Text Mining with Twitter
Packages:
tweepy
numpy
matplotlib
scikit-learn
Text Mining
import tweepy
# api_key, api_secret, access_token, access_secret: credentials of a Twitter developer account
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

timeline = api.user_timeline(screen_name='realDonaldTrump', count=100)   # up to 100 recent tweets
texts = [tweet.text for tweet in timeline]
from sklearn.feature_extraction.text import CountVectorizer
# Ignore English stop words, and words found in fewer than 5 or more than 16 tweets
m = CountVectorizer(stop_words='english', min_df=5, max_df=16)
X = m.fit_transform(texts)                              # sparse matrix: tweets x words
words = sorted(m.vocabulary_, key=m.vocabulary_.get)    # words in column order

import pandas as pd
print(pd.DataFrame(X.toarray(), columns=words).iloc[:5, :5].to_latex())

   america  big  comey  day  dems
0        0    0      0    0     0
1        0    1      0    0     0
2        1    0      0    0     0
import numpy as np
import matplotlib.pyplot as plt
counts = np.asarray(X.sum(0))[0]      # total occurrences of each word over all tweets
plt.barh(range(len(counts)), counts)
plt.xticks(range(0, 14, 2))
plt.yticks(range(len(counts)), words)
plt.show()
from sklearn.decomposition import LatentDirichletAllocation
lda = LatentDirichletAllocation(n_components=2, learning_method='online')   # 2 topics
lda.fit(X)
topics = lda.components_                                                    # shape (2, number of words)
$\text{newword}_1 = \beta_{11}\,\text{word}_1 + \beta_{12}\,\text{word}_2 + \dots$
$\text{newword}_2 = \beta_{21}\,\text{word}_1 + \beta_{22}\,\text{word}_2 + \dots$
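Each row of `lda.components_` holds the β weights of one topic over the vocabulary, and `transform` gives each document's mixture of topics. A sketch on a small hypothetical count matrix, in which the first three documents use mostly the first two words and the last three mostly the other two:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical document-word counts: 6 documents, 4 words
X = np.array([
    [5, 4, 0, 0],
    [6, 3, 0, 1],
    [4, 5, 1, 0],
    [0, 0, 5, 6],
    [1, 0, 4, 5],
    [0, 1, 6, 4],
])
lda = LatentDirichletAllocation(n_components=2, learning_method="online", random_state=0)
doc_topics = lda.fit_transform(X)   # shape (6, 2): topic mixture of each document
topics = lda.components_            # shape (2, 4): beta weight of each word in each topic
print(doc_topics.argmax(axis=1))    # dominant topic of each document
```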
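topics = topics / topics.max(1)[:, np.newaxis]   # scale each topic's weights to a maximum of 1
topics += np.random.randn(*topics.shape) * 0.02  # small jitter so that overlapping words stay readable
for i, word in enumerate(words):
    plt.text(topics[0, i], topics[1, i], word, ha='center')   # x: topic 1 weight, y: topic 2 weight
plt.xlim(-0.1, 1.1)                              # plt.text does not rescale the axes by itself
plt.ylim(-0.1, 1.1)
plt.show()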
The same analysis for another account:
timeline = api.user_timeline(screen_name='marcelorebelo_', count=100)
Traditional Learning vs Deep Learning
Traditionally, hand-crafted features are extracted from the dataset and learning happens on top of those features. Deep learning instead learns the features themselves from the raw data (e.g. image pixels).
Packages:
scikit-image
numpy
keras
Traditional Learning
Cats vs Dogs, Kaggle competition: https://www.kaggle.com/c/dogs-vs-cats
25,000 images of cats and dogs
Feature #1: Extract a histogram of intensities (grey levels)
import os
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray

for filename in os.listdir('train'):
    im = imread(os.path.join('train', filename))
    im = rgb2gray(im)                                         # grey levels in [0, 1]
    f1 = np.histogram(im.flatten(), 10, range=(0, 1))[0]      # same 10 bins for every image
    f1 = (f1/f1.sum()).cumsum()                               # normalised, cumulative
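The feature is the empirical cumulative distribution of grey levels: 10 numbers, non-decreasing and ending at 1, whatever the image size. A sketch on a random hypothetical image (values in [0, 1), like the output of `rgb2gray`), with the bin range fixed to [0, 1] so that the bins mean the same thing for every image:

```python
import numpy as np

rng = np.random.default_rng(0)
im = rng.random((64, 48))                     # stand-in for a grey-level image

counts = np.histogram(im.flatten(), bins=10, range=(0, 1))[0]   # pixels per grey-level bin
f1 = (counts / counts.sum()).cumsum()         # fraction of pixels at or below each bin
```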
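Feature #2: Histogram of Oriented Gradients
from skimage.transform import resize
from skimage.feature import hog

# Inside the same loop, for each grey-level image im:
im2 = resize(im, (32, 32), mode='reflect')   # small, fixed size for every image
im2 = np.sqrt(im2)                           # power-law (gamma) compression
f2 = hog(im2, block_norm='L2-Hys')           # histogram of oriented gradients features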
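from sklearn.tree import DecisionTreeClassifier, export_graphviz
# X: one row of features (e.g. f1 and f2) per image; y: the cat/dog labels
m = DecisionTreeClassifier(max_depth=3)
m.fit(X, y)
export_graphviz(m, out_file='tree.dot')   # render e.g. with: dot -Tpng tree.dot -o tree.png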
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
# 3-fold cross-validated accuracy (cv=3 was the default in older scikit-learn versions)
print(cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=3))
Output: [0.69642429 0.70086393 0.69851176]
Deep Learning
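Linear Regression
$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$
Multilayer perceptron / neural network
$\hat{y} = \beta_{00}\,\sigma(\beta_{10} + \beta_{11} x_1 + \beta_{12} x_2 + \dots) + \beta_{01}\,\sigma(\beta_{20} + \beta_{21} x_1 + \beta_{22} x_2 + \dots) + \dots$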
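The multilayer perceptron formula can be evaluated directly. A sketch of the forward pass for one hidden layer with the logistic σ; the weights are hypothetical and chosen so that both hidden units receive 0, and therefore output 0.5:

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, the sigma of the formula."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_predict(x, hidden_bias, hidden_weights, output_weights):
    """Forward pass of a one-hidden-layer perceptron.

    hidden_bias[k]       -> beta_{k+1,0}   (intercept of hidden unit k+1)
    hidden_weights[k, j] -> beta_{k+1,j+1} (weight of input x_{j+1} in unit k+1)
    output_weights[k]    -> beta_{0,k}     (outer coefficient of unit k+1)
    """
    hidden = sigmoid(hidden_bias + hidden_weights @ x)   # one sigma(...) per hidden unit
    return output_weights @ hidden                        # weighted sum of the hidden units


x = np.array([1.0, 2.0])
hidden_bias = np.array([0.0, -1.0])
hidden_weights = np.array([[0.5, -0.25], [1.0, 0.0]])
output_weights = np.array([2.0, -1.0])
y_hat = mlp_predict(x, hidden_bias, hidden_weights, output_weights)   # 2*0.5 - 1*0.5 = 0.5
```

Compared with linear regression, each inner linear combination passes through σ before being combined, which is what lets the model represent non-linear relationships.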
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

# X: the images themselves (32x32 grey levels), shape (n, 32, 32, 1); tr, ts: train/test indices of one fold
model = Sequential()
model.add(Conv2D(8, 3, strides=1, activation='relu', input_shape=(32, 32, 1)))   # 8 filters of 3x3
model.add(MaxPooling2D())
model.add(Conv2D(16, 3, strides=1, activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))   # probability of the positive class

model.compile(optimizer=SGD(), loss='binary_crossentropy')

model.fit(X[tr], y[tr], validation_data=(X[ts], y[ts]),
          epochs=10, batch_size=100)
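from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

for tr, ts in StratifiedKFold(n_splits=3).split(X, y):   # 3 folds, as in the results below
    model = ...   # build, compile and fit the network as above
    ...
    yp = (model.predict(X[ts])[:, -1] > 0.5).astype(int)   # threshold the sigmoid output
    print(accuracy_score(y[ts], yp))

Accuracy per fold: [0.57, 0.57, 0.63]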
Overview of the Python deep learning landscape (figure):
Backends: Theano, TensorFlow, PyTorch
Higher-level libraries: Keras, Lasagne
Deep learning architectures:
Fully connected perceptrons
Convolutional neural networks
Recurrent neural networks
Neural Turing Machines
Autoencoders
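To connect the "Convolutional neural networks" entry to the Conv2D and MaxPooling2D layers used in the Keras model earlier, the NumPy sketch below computes what a single filter and a single pooling step do to a single-channel image. It assumes 'valid' padding and stride 1 (the Conv2D defaults), ignores biases and multiple channels, and the kernel values are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation, stride 1, no bias
    (what deep learning libraries usually call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def relu(z):
    return np.maximum(z, 0)

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 image with a left-to-right ramp
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)         # hypothetical horizontal-gradient filter
fmap = relu(conv2d_valid(image, kernel))          # 3x3 feature map
pooled = max_pool2x2(fmap)                        # 1x1 after pooling
print(fmap.shape, pooled.shape)
```

A Conv2D layer applies many such filters (8 and 16 in the model above) across all input channels and learns their values during training instead of fixing them by hand.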
Conclusions – Python for Scientific Computing
João Machado • Ricardo Cruz
Conclusions
Packages to know:
Numpy: basic linear algebra
Scipy: extensions to Numpy (sparse matrices, probability distributions, hypothesis tests)
Statsmodels: statistical models, including time series
Pandas: data frames built on top of Numpy
Matplotlib, seaborn: plotting
scikit-learn: complete machine learning toolkit
xgboost: popular gradient boosting library
Keras: deep learning (see also TensorFlow, Theano, Lasagne)
OpenCV, scikit-image: image processing
NLTK: Natural Language Toolkit
Gensim: natural language models
Final remarks
Python is a “jack of all trades” language;
its speed and ease of development make it well suited to scientific computing;
it is increasingly adopted by scientists and engineers, thanks to the third-party scientific libraries contributed by a large community;
it has become the de facto language in some fields, such as deep learning.
About us
João Machado
machadojpf@gmail.com
Fraunhofer Portugal research engineer
Masters in Electrical and Computer Engineering
http://www.linkedin.com/in/machadojpf
Ricardo Cruz
rpcruz@inesctec.pt
INESC TEC researcher
Computer Science & Applied Mathematics graduate
https://rpmcruz.github.io/
Subscribe to our workshops:
http://tinyurl.com/cruz-workshops

Python for Scientific Computing -- Ricardo Cruz

  • 1. Introduction into Python for Scientific Computing Jo˜ao Machado • Ricardo Cruz
  • 2. Introduction a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = ? ? ? ? ? ? ? ? ? What is the result of this operation?
  • 3. Introduction a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = ? ? ? ? ? ? ? ? ? What is the result of this operation? a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = g d a h e b i f c
  • 4. Introduction a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = ? ? ? ? ? ? ? ? ? What is the result of this operation? a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = g d a h e b i f c 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); What programming language is this?
  • 5. Introduction a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = ? ? ? ? ? ? ? ? ? What is the result of this operation? a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = g d a h e b i f c 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); It’s Python! 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); What programming language is this?
  • 6. Introduction 1 #i n c l u d e <armadillo > 2 using namespace arma; 3 using namespace std; 4 5 mat A(3 ,3), B(3 ,3); 6 A.randu (); 7 B = fliplr(B.eye ()); 8 M3 = M1 * M2; 9 cout << M3 << endl; What about this programming language? a d g b e h c f i × 0 0 1 0 1 0 1 0 0 = g d a h e b i f c 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); It’s Python! 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); What programming language is this?
  • 7. Introduction 1 #i n c l u d e <armadillo > 2 using namespace arma; 3 using namespace std; 4 5 mat A(3 ,3), B(3 ,3); 6 A.randu (); 7 B = fliplr(B.eye ()); 8 M3 = M1 * M2; 9 cout << M3 << endl; What about this programming language? Why use Python? More important than the programming language is the ecosystem – and Python has a great scientific community Python has good interoperability with other systems The entire stack can be developed in Python: machine learning, flask, etc Computations do not run in Python; the slow stuff is implemented in Fortran and C 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); It’s Python! 1 from numpy import * 2 cout = p r i n t 3 4 A = random.random ((3, 3)); 5 B = fliplr(eye (3)); 6 C = dot(A, B); 7 cout(C); What programming language is this?
  • 9. Why Python? Good data mining ecosystem. Not as centralized/monopolistic as Matlab’s Not as decentralized and messy as R :P
  • 11. Some notes on Numpy
  • 12. Numpy Notes Let A and B be matrices, Python/Numpy MATLAB R A.dot(B) A * B A %*% B A * B A .* B A * B Operations are elementwise by default (like R)
  • 13. Numpy Notes Let A and B be matrices, Python/Numpy MATLAB R A.dot(B) A * B A %*% B A * B A .* B A * B Operations are elementwise by default (like R) Python/Numpy MATLAB R A.shape size(A) length, nrow, ncol A[0:4,:] or A[0:4] or A[:4] A(1:4,:) A[1:4,] A[0:10:2] A[seq(0, 9, 2)] A[-4:] A(end-4:end,:) A[nrow(A)-4:nrow(A),] A.T A.’ t(A) Numpy in general allows for more succinct writing. Furthermore: Indexing starts at zero. Intervals are of the form [i, j[
  • 14. Numpy Notes Let A and B be matrices, Python/Numpy MATLAB R A.dot(B) A * B A %*% B A * B A .* B A * B Operations are elementwise by default (like R) Python/Numpy MATLAB R A.shape size(A) length, nrow, ncol A[0:4,:] or A[0:4] or A[:4] A(1:4,:) A[1:4,] A[0:10:2] A[seq(0, 9, 2)] A[-4:] A(end-4:end,:) A[nrow(A)-4:nrow(A),] A.T A.’ t(A) Numpy in general allows for more succinct writing. Furthermore: Indexing starts at zero. Intervals are of the form [i, j[ This is further aided by the fact that Numpy supports arithmetic broadcasting. (unlike MATLAB or R.) That is, you can do the following element- wise multiplication: (6,3) * (6,1). It auto- matically assumes you want to multiply by column. In MATLAB, you would have to use bsxfun(@times,r,A) or first use repmat().
  • 15. Numpy Notes Let A and B be matrices, Python/Numpy MATLAB R A.dot(B) A * B A %*% B A * B A .* B A * B Operations are elementwise by default (like R) Python/Numpy MATLAB R A.shape size(A) length, nrow, ncol A[0:4,:] or A[0:4] or A[:4] A(1:4,:) A[1:4,] A[0:10:2] A[seq(0, 9, 2)] A[-4:] A(end-4:end,:) A[nrow(A)-4:nrow(A),] A.T A.’ t(A) Numpy in general allows for more succinct writing. Furthermore: Indexing starts at zero. Intervals are of the form [i, j[ Something like the following is valid in Numpy... 1 import skimage.data 2 img1 = skimage.data.astronaut () 3 img2 = skimage.data.moon () 4 p r i n t (img1.shape) # (512 , 512 , 3) 5 p r i n t (img2.shape) # (512 , 512) 6 7 import matplotlib.pyplot as plt 8 plt.subplot (1, 2, 1) 9 plt.imshow(img1) 10 plt.subplot (1, 2, 2) 11 plt.imshow(img2 , cmap=’gray ’) 12 plt.show () This is further aided by the fact that Numpy supports arithmetic broadcasting. (unlike MATLAB or R.) That is, you can do the following element- wise multiplication: (6,3) * (6,1). It auto- matically assumes you want to multiply by column. In MATLAB, you would have to use bsxfun(@times,r,A) or first use repmat().
  • 16. Numpy Notes Python/Numpy MATLAB R A.shape size(A) length, nrow, ncol A[0:4,:] or A[0:4] or A[:4] A(1:4,:) A[1:4,] A[0:10:2] A[seq(0, 9, 2)] A[-4:] A(end-4:end,:) A[nrow(A)-4:nrow(A),] A.T A.’ t(A) Numpy in general allows for more succinct writing. Furthermore: Indexing starts at zero. Intervals are of the form [i, j[ Something like the following is valid in Numpy... 1 import skimage.data 2 img1 = skimage.data.astronaut () 3 img2 = skimage.data.moon () 4 p r i n t (img1.shape) # (512 , 512 , 3) 5 p r i n t (img2.shape) # (512 , 512) 6 7 import matplotlib.pyplot as plt 8 plt.subplot (1, 2, 1) 9 plt.imshow(img1) 10 plt.subplot (1, 2, 2) 11 plt.imshow(img2 , cmap=’gray ’) 12 plt.show () This is further aided by the fact that Numpy supports arithmetic broadcasting. (unlike MATLAB or R.) That is, you can do the following element- wise multiplication: (6,3) * (6,1). It auto- matically assumes you want to multiply by column. In MATLAB, you would have to use bsxfun(@times,r,A) or first use repmat().
  • 17. Numpy Notes Arithmetic mean 1 img2 = img2[:, :, np.newaxis] #(512 ,512 ,1) 2 img1 = img1.astype(np.uint32) 3 img2 = img2.astype(np.uint32) 4 img3 = (img1 + img2)//2 5 img3 = img3.astype(np.uint8) 6 plt.imshow(img3) 7 plt.show () Something like the following is valid in Numpy... 1 import skimage.data 2 img1 = skimage.data.astronaut () 3 img2 = skimage.data.moon () 4 p r i n t (img1.shape) # (512 , 512 , 3) 5 p r i n t (img2.shape) # (512 , 512) 6 7 import matplotlib.pyplot as plt 8 plt.subplot (1, 2, 1) 9 plt.imshow(img1) 10 plt.subplot (1, 2, 2) 11 plt.imshow(img2 , cmap=’gray ’) 12 plt.show () This is further aided by the fact that Numpy supports arithmetic broadcasting. (unlike MATLAB or R.) That is, you can do the following element- wise multiplication: (6,3) * (6,1). It auto- matically assumes you want to multiply by column. In MATLAB, you would have to use bsxfun(@times,r,A) or first use repmat().
  • 18. Numpy Notes Arithmetic mean 1 img2 = img2[:, :, np.newaxis] #(512 ,512 ,1) 2 img1 = img1.astype(np.uint32) 3 img2 = img2.astype(np.uint32) 4 img3 = (img1 + img2)//2 5 img3 = img3.astype(np.uint8) 6 plt.imshow(img3) 7 plt.show () Something like the following is valid in Numpy... 1 import skimage.data 2 img1 = skimage.data.astronaut () 3 img2 = skimage.data.moon () 4 p r i n t (img1.shape) # (512 , 512 , 3) 5 p r i n t (img2.shape) # (512 , 512) 6 7 import matplotlib.pyplot as plt 8 plt.subplot (1, 2, 1) 9 plt.imshow(img1) 10 plt.subplot (1, 2, 2) 11 plt.imshow(img2 , cmap=’gray ’) 12 plt.show ()
  • 19. Numpy Notes Arithmetic mean 1 img2 = img2[:, :, np.newaxis] #(512 ,512 ,1) 2 img1 = img1.astype(np.uint32) 3 img2 = img2.astype(np.uint32) 4 img3 = (img1 + img2)//2 5 img3 = img3.astype(np.uint8) 6 plt.imshow(img3) 7 plt.show () Geometric mean 1 img2 = img2[:, :, np.newaxis] 2 img1 = img1.astype(np.uint32) 3 img2 = img2.astype(np.uint32) 4 img3 = np.sqrt(img1 * img2) 5 img3 = img3.astype(np.uint8) 6 plt.imshow(img3) 7 plt.show ()
  • 20. Numpy Notes Arithmetic mean 1 img2 = img2[:, :, np.newaxis] #(512 ,512 ,1) 2 img1 = img1.astype(np.uint32) 3 img2 = img2.astype(np.uint32) 4 img3 = (img1 + img2)//2 5 img3 = img3.astype(np.uint8) 6 plt.imshow(img3) 7 plt.show () Geometric mean 1 img2 = img2[:, :, np.newaxis] 2 img1 = img1.astype(np.uint32) 3 img2 = img2.astype(np.uint32) 4 img3 = np.sqrt(img1 * img2) 5 img3 = img3.astype(np.uint8) 6 plt.imshow(img3) 7 plt.show ()
  • 21. Pandas and Data Visualization – Python for Scientific Computing Jo˜ao Machado • Ricardo Cruz
  • 22. Pandas What is Pandas? A package for data manipulation and analysis, based on the concept of data frame in the R language Optimized for performance, with critical code paths written in C Originally developed by Wes McKinney, while working for AQR Capital (a quantitative finance firm)
  • 23. Pandas What is Pandas? A package for data manipulation and analysis, based on the concept of data frame in the R language Optimized for performance, with critical code paths written in C Originally developed by Wes McKinney, while working for AQR Capital (a quantitative finance firm) Given the previous point, it makes sense to demonstrate some of the functionalities of Pandas with a dataset comprised of financial stocks :)
  • 24. Data Mining
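  • 29. Models
Let us produce fake data:
$y(x) = 2x + 10 + \varepsilon_1 + \varepsilon_2$, with $\varepsilon_1 \sim N(0, 2)$, and $\varepsilon_2 \sim |N(0, 25)|$ with probability $p = 0.1$, $\varepsilon_2 = 0$ otherwise.
The code below implements a close variant: $y(x) = 2x + 10 + \varepsilon_1 + b\,\varepsilon_2$, with $\varepsilon_1 \sim N(0, 2)$, $b \sim B(2, 0.1)$ (binomial) and $\varepsilon_2 \sim |N(0, 25)|$. The second argument of $N$ is a standard deviation, as in the code.
Translation to numpy:
```python
import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.linspace(0, 25, N)
y = 2*x + 10
y += np.random.randn(N)*2                                           # eps1 ~ N(0, 2)
y += np.random.binomial(2, 0.10, N)*np.abs(np.random.randn(N)*25)  # occasional spikes b * |N(0, 25)|

plt.plot(x, y)
plt.title('Data')
plt.show()
```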
  • 30. Models
What model could we create to explain this data?
Linear regression model: $\hat{y} = \beta_0 + \beta_1 x$; minimize $\sum_i (y_i - \hat{y}_i)^2$.
```python
from sklearn.linear_model import LinearRegression

m = LinearRegression()
m.fit(x[:, np.newaxis], y)           # scikit-learn expects a 2-D feature matrix
yp = m.predict(x[:, np.newaxis])

plt.plot(x, y)
plt.plot(x, yp)
plt.title('Linear regression')
plt.text(0, 70, 'm=%.1f b=%.1f' % (m.coef_[0], m.intercept_))   # fitted slope and intercept
plt.show()
```
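What `LinearRegression` minimizes can also be checked by hand. A minimal sketch, assuming noise-free synthetic data (not the spiky data above) so that the exact answer is known: solving the least-squares problem with `np.linalg.lstsq` on the design matrix $[1, x]$ recovers $\beta_0 = 10$ and $\beta_1 = 2$.

```python
import numpy as np

x = np.linspace(0, 25, 50)
y = 2*x + 10                                   # noise-free line, so the fit is exact

A = np.c_[np.ones_like(x), x]                  # design matrix: a column of ones, then x
beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes sum_i (y_i - A_i @ beta)^2
print(beta)                                    # approximately [10, 2]
```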
  • 34. Models
$y(x) = 2x + 10 + \varepsilon_1 + b\,\varepsilon_2$, but the fitted line is $\hat{y}(x) = 2x + 18$: the positive spikes pull the line up.
What if I want to explain only the trend? How can I avoid the impact of these spikes?
What would a statistician do? Look at the residuals:
```python
res = yp - y
plt.boxplot(res)
plt.show()
```
Then keep only the points whose residual lies strictly between the first and third quartiles, and refit:
```python
q1 = np.percentile(res, 25)
q3 = np.percentile(res, 75)
t = np.logical_and(res > q1, res < q3)   # boolean mask of "typical" points
x2 = x[t]
y2 = y[t]

m = LinearRegression()
m.fit(x2[:, np.newaxis], y2)
yp = m.predict(x[:, np.newaxis])         # predict over the full range of x
```
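The statistician's recipe above can be packaged as a small function. This sketch uses `np.polyfit` instead of scikit-learn (a substitution, for brevity) on synthetic data with a few hand-placed positive spikes; the function name `iqr_trimmed_fit` is hypothetical.

```python
import numpy as np

def iqr_trimmed_fit(x, y):
    """Fit a line, drop points whose residual is outside (Q1, Q3), refit."""
    slope, intercept = np.polyfit(x, y, 1)        # ordinary least squares
    res = (slope*x + intercept) - y                # same sign convention as res = yp - y
    q1, q3 = np.percentile(res, [25, 75])
    keep = (res > q1) & (res < q3)
    return np.polyfit(x[keep], y[keep], 1)        # refit on the central half

# Synthetic data: true line 2x + 10, Gaussian noise, five made-up spikes
rng = np.random.default_rng(0)
x = np.linspace(0, 25, 50)
y = 2*x + 10 + rng.normal(0, 2, 50)
y[::10] += 60

print(np.polyfit(x, y, 1))     # [slope, intercept]: intercept pulled up by the spikes
print(iqr_trimmed_fit(x, y))   # closer to slope 2, intercept 10
```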
  • 38. Models
Approach #2: What would a statistician with some computer science knowledge do?
Least absolute deviations (median regression). Model: $\hat{y} = \beta_0 + \beta_1 x$; minimize $\sum_i |y_i - \hat{y}_i|$.
```python
from statsmodels.regression.quantile_regression import QuantReg

qr = QuantReg(y, np.c_[np.ones(N), x])   # design matrix with an explicit intercept column
qr_res = qr.fit(q=0.5)                   # the 0.5 quantile (median) gives the absolute loss above
yp = qr_res.predict()                    # fitted values at the training x
```
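Why does the absolute-value loss resist the spikes? A small numerical check on made-up numbers, for the intercept-only model $\hat{y} = \beta_0$: the squared loss is minimized by the mean, which a single outlier drags away, while the absolute loss is minimized by the median, which the outlier barely moves.

```python
import numpy as np

y = np.array([9.0, 10.0, 10.5, 11.0, 60.0])   # made-up data with one large spike
grid = np.linspace(0, 70, 70001)               # candidate values of beta0, step 0.001

sq_loss = ((y[:, None] - grid) ** 2).sum(axis=0)
abs_loss = np.abs(y[:, None] - grid).sum(axis=0)

best_sq = grid[sq_loss.argmin()]     # close to the mean, 20.1
best_abs = grid[abs_loss.argmin()]   # close to the median, 10.5
print(best_sq, y.mean())
print(best_abs, np.median(y))
```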
  • 42. Models
Approach #3: What would a crazy computer scientist do?
Fit many lines, each on a small random subset of the points, and look at how they spread:
```python
m = LinearRegression()
plt.plot(x, y)
for it in range(10):
    t = np.random.choice(N, N//10, replace=False)   # 5 random points out of 50
    x2 = x[t]
    y2 = y[t]
    m.fit(x2[:, np.newaxis], y2)
    yp = m.predict(x[:, np.newaxis])
    plt.plot(x, yp, color='black', alpha=0.4)
plt.show()
```
Scikit-learn already comes with this crazy model too: RANSAC, which repeatedly fits on random subsets and keeps the model with the most inliers.
```python
from sklearn.linear_model import RANSACRegressor

m = RANSACRegressor()
m.fit(x[:, np.newaxis], y)

plt.plot(x, y)
plt.plot(x, m.predict(x[:, np.newaxis]))
plt.title('RANSAC')
plt.show()
```
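To see roughly what `RANSACRegressor` does internally, here is a minimal from-scratch sketch of RANSAC for a straight line. It is a simplified illustration, not scikit-learn's implementation: the function name `ransac_line`, the iteration count, the 2-point samples and the inlier threshold `thresh` are assumptions chosen for this synthetic data.

```python
import numpy as np

def ransac_line(x, y, n_iter=200, thresh=5.0, seed=0):
    """Hypothetical helper: return [slope, intercept] refit on the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iter):
        i, j = rng.choice(len(x), 2, replace=False)   # minimal sample for a line
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (slope * x + intercept)) < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares fit on the consensus set
    return np.polyfit(x[best_inliers], y[best_inliers], 1)

# Synthetic data: true line 2x + 10, noise, and a few made-up positive spikes
rng = np.random.default_rng(1)
x = np.linspace(0, 25, 50)
y = 2*x + 10 + rng.normal(0, 2, 50)
y[::7] += 50

slope, intercept = ransac_line(x, y)
print(slope, intercept)
```

The final refit on the consensus set mirrors the idea that, once outliers are identified, ordinary least squares on the remaining points is reliable again.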
  • 47. What kind of things can we use data mining / machine learning for?
  • 51. Data Mining Problems
Regression: predict a continuous variable, e.g. House Price = 100 + 20 × Land Size.
In scikit-learn: LinearRegression, GradientBoostingRegressor, etc. (RegressorMixin): .fit(X, y), .predict(X) -> yp
Classification: predict a discrete variable, e.g. House Price = Expensive if in the city center, Cheap if outside the city.
In scikit-learn: LogisticRegression, GradientBoostingClassifier, etc. (ClassifierMixin): .fit(X, y), .predict(X) -> yp
Clustering: rather than predicting a target, group (aggregate) similar observations.
In scikit-learn: KMeans (ClusterMixin), LatentDirichletAllocation (TransformerMixin), etc.: .fit(X), .transform(X) -> X', .fit_transform(X) -> X'
Reinforcement learning: predict the best move.
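A quick sketch of the shared scikit-learn interface listed above, on tiny made-up data: a regressor and a classifier both use `.fit(X, y)` / `.predict(X)`, while KMeans uses `.fit(X)`, and its `.fit_transform(X)` returns each point's distance to every cluster center.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Regression: House Price = 100 + 20 x Land Size (made-up sizes)
land = np.array([[1.0], [2.0], [3.0], [4.0]])
price = 100 + 20 * land[:, 0]
reg = LinearRegression().fit(land, price)
print(reg.predict([[5.0]]))                     # about 200

# Classification: 1 = in the city center (made-up labels)
center = np.array([0, 0, 1, 1])
label = np.array(['Cheap', 'Cheap', 'Expensive', 'Expensive'])
clf = LogisticRegression().fit(center[:, None], label)
print(clf.predict([[1]]))                       # ['Expensive']

# Clustering: no target, just group the points
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0)
dist = km.fit_transform(pts)                    # shape (4, 2): distance to each center
print(km.labels_)
```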
  • 52. Use Cases
  • 54. Text Mining with Twitter
Packages: tweepy, numpy, matplotlib, scikit-learn
  • 57. Text Mining
Fetch the 100 most recent tweets of an account (api_key, api_secret, access_token and access_secret are your own Twitter API credentials):
```python
import tweepy

auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

timeline = api.user_timeline(screen_name='realDonaldTrump', count=100)
texts = [tweet.text for tweet in timeline]
```
Build a bag-of-words count matrix, keeping only words that appear in at least 5 and at most 16 tweets:
```python
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

m = CountVectorizer(stop_words='english', min_df=5, max_df=16)
X = m.fit_transform(texts)                             # sparse (tweets x words) count matrix
words = sorted(m.vocabulary_, key=m.vocabulary_.get)   # vocabulary in column order

print(pd.DataFrame(X.toarray(), columns=words).iloc[:5, :5].to_latex())
```
First rows of the resulting table:
```
   america  big  comey  day  dems
0        0    0      0    0     0
1        0    1      0    0     0
2        1    0      0    0     0
```
Plot the total count of each word:
```python
import numpy as np
import matplotlib.pyplot as plt

counts = np.asarray(X.sum(0))[0]   # column sums as a 1-D array
plt.barh(range(len(counts)), counts)
plt.xticks(range(0, 14, 2))
plt.yticks(range(len(counts)), words)
plt.show()
```
  • 59. Text Mining
Topic modeling with Latent Dirichlet Allocation, using two topics:
```python
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=2, learning_method='online')
lda.fit(X)
topics = lda.components_   # shape (2, number of words): one weight per topic and word
```
Each topic acts as a new "word" that is a weighted combination of the original words:
$\text{newword}_1 = \beta_{11}\,\text{word}_1 + \beta_{12}\,\text{word}_2 + \dots$
$\text{newword}_2 = \beta_{21}\,\text{word}_1 + \beta_{22}\,\text{word}_2 + \dots$
Place each word at its (normalized) weight in the two topics, with a little jitter so that overlapping labels stay readable:
```python
topics = topics / topics.max(1)[:, np.newaxis]   # scale each topic so its largest weight is 1
topics += np.random.randn(*topics.shape)*0.02    # small jitter to reduce label overlap
for i, word in enumerate(words):
    plt.text(topics[0, i], topics[1, i], word, ha='center')
plt.xlim(-0.1, 1.1)                              # plt.text does not rescale the axes
plt.ylim(-0.1, 1.1)
plt.show()
```
The same analysis can be repeated for another account:
```python
timeline = api.user_timeline(screen_name='marcelorebelo_', count=100)
```
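A self-contained sketch of the LDA step on a made-up corpus with two obvious themes; the documents and `random_state=0` are assumptions for reproducibility. `components_` holds one row of word weights per topic, and `fit_transform` returns each document's topic mixture, whose rows sum to 1.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Made-up documents: two about finance, two about football
docs = ['stocks market trading stocks', 'market trading prices stocks',
        'football match goal', 'goal football team match']

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, learning_method='online', random_state=0)
doc_topics = lda.fit_transform(X)   # shape (4 documents, 2 topics)
topics = lda.components_            # shape (2 topics, vocabulary size)
print(np.round(doc_topics, 2))
```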
  • 63. Traditional Learning vs Deep Learning
Traditionally, hand-crafted features are extracted from the dataset and a model is learned on top of those features. Deep learning learns directly from the raw data.
Packages: scikit-image, numpy, keras
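  • 67. Traditional Learning
Cats vs Dogs – Kaggle Competition – https://www.kaggle.com/c/dogs-vs-cats
25,000 images of cats and dogs
Feature #1: histogram of (grayscale) intensities. Feature #2: Histogram of Oriented Gradients (HOG).
```python
import os
import numpy as np
from skimage.io import imread
from skimage.color import rgb2gray        # rgb2gray lives in skimage.color
from skimage.transform import resize
from skimage.feature import hog

X, y = [], []
for filename in os.listdir('train'):
    im = imread(os.path.join('train', filename))
    im = rgb2gray(im)
    # Feature #1: cumulative, normalized 10-bin intensity histogram
    f1 = np.histogram(im.flatten(), 10)[0]
    f1 = (f1/f1.sum()).cumsum()
    # Feature #2: HOG of a downscaled, square-rooted image
    im2 = resize(im, (32, 32), mode='reflect')
    im2 = np.sqrt(im2)
    f2 = hog(im2, block_norm='L2-Hys')
    # Assumption: features concatenated; label from the filename (cat.N.jpg / dog.N.jpg)
    X.append(np.r_[f1, f2])
    y.append(filename.startswith('dog'))
X = np.array(X)
y = np.array(y, dtype=int)
```
A shallow decision tree on these features:
```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

m = DecisionTreeClassifier(max_depth=3)
m.fit(X, y)
export_graphviz(m, out_file='tree.dot')   # Graphviz file for drawing the tree
```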
  • 68. Traditional Learning
Cross-validated accuracy (3 folds) of a 100-tree random forest on these features:
```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

print(cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=3))
```
```
[ 0.69642429 0.70086393 0.69851176]
```
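The image files are not bundled here, so this sketch reproduces only the evaluation step, on synthetic features from `make_classification` (an assumption standing in for the histogram and HOG features). With `cv=3` it returns one accuracy per fold, as on the slide; the numbers themselves will differ from the slide's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the extracted image features
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                         X, y, cv=3)
print(scores, scores.mean())   # one accuracy per fold, then their mean
```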
  • 69. Deep Learning
Linear regression: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots$
Multilayer perceptron / neural network: $\hat{y} = \beta_{00}\,\sigma(\beta_{10} + \beta_{11} x_1 + \beta_{12} x_2 + \dots) + \beta_{01}\,\sigma(\beta_{20} + \beta_{21} x_1 + \beta_{22} x_2 + \dots) + \dots$, where $\sigma$ is a nonlinear activation function.
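The neural-network formula above is essentially nested linear regressions. A minimal NumPy sketch of its forward pass with two hidden units, using the logistic sigmoid for $\sigma$ and made-up weights (both assumptions); the helper name `mlp_predict` is hypothetical. With all hidden weights zero, each $\sigma(0) = 0.5$, so $\hat{y} = 0.5(\beta_{00} + \beta_{01})$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_predict(x, hidden, out):
    """hidden: rows [beta_k0, beta_k1, beta_k2, ...]; out: [beta_00, beta_01]."""
    z = hidden[:, 0] + hidden[:, 1:] @ x   # beta_k0 + beta_k1*x1 + beta_k2*x2 + ...
    return out @ sigmoid(z)                # beta_00*sigma(z_1) + beta_01*sigma(z_2)

x = np.array([1.0, 2.0])                   # made-up input
hidden = np.zeros((2, 3))                  # all hidden weights zero, so sigma(0) = 0.5
out = np.array([3.0, 1.0])
print(mlp_predict(x, hidden, out))         # 2.0
```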
  • 71. Deep Learning
A small convolutional network trained on the raw images (X must have shape (n, 32, 32, 1)):
```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(8, 3, strides=1, activation='relu', input_shape=(32, 32, 1)))
model.add(MaxPooling2D())
model.add(Conv2D(16, 3, strides=1, activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))   # probability of class 1

sgd = SGD()
model.compile(sgd, 'binary_crossentropy')

# tr, ts: train/test indices of one cross-validation fold (see the loop below)
model.fit(X[tr], y[tr], validation_data=(X[ts], y[ts]),
          epochs=10, batch_size=100)
```
Evaluated with stratified 3-fold cross-validation:
```python
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

for tr, ts in StratifiedKFold(n_splits=3).split(X, y):
    model = ...
    ...
    yp = (model.predict(X[ts])[:, -1] > 0.5).astype(int)   # threshold the sigmoid output
    print(accuracy_score(y[ts], yp))
```
Per-fold accuracies:
```
[0.57, 0.57, 0.63]
```
Overview of the Python deep learning landscape: Theano, TensorFlow, PyTorch, Keras, Lasagne.
Deep learning architectures: fully connected perceptrons, convolutional neural networks, recurrent neural networks, Neural Turing Machines, autoencoders.
  • 75. Conclusions
  • 77. Conclusions
Packages to know:
Numpy: n-dimensional arrays and basic linear algebra
Scipy: extensions to numpy: sparse matrices, probability distributions (pdfs), hypothesis tests
Statsmodels: many statistical models, including time series
Pandas: extension to numpy for data frame support
Matplotlib, seaborn: plotting
scikit-learn: complete machine learning toolkit
xgboost: popular gradient boosting library
Keras: deep learning (and TensorFlow, Theano, Lasagne)
OpenCV, scikit-image: image processing
NLTK: natural language toolkit
Gensim: natural language models
  • 78. Final remarks
Python is a “jack of all trades” kind of language.
The speed and ease of development it offers make it very well suited to scientific computing.
It is increasingly adopted by scientists and engineers, thanks to the third-party scientific libraries contributed by a large community.
It has become a de facto standard language in some fast-moving fields, such as deep learning.
  • 79. About us
João Machado, machadojpf@gmail.com
Fraunhofer Portugal research engineer
Master's in Electrical and Computer Engineering
http://www.linkedin.com/in/machadojpf
Ricardo Cruz, rpcruz@inesctec.pt
INESC TEC researcher
Computer Science & Applied Mathematics graduate
https://rpmcruz.github.io/
Subscribe to the workshops: http://tinyurl.com/cruz-workshops