4. Python: Data Types, Assignment, Operators
Both " and ' can be used to denote strings. If the apostrophe character should be part of the
string, use " as outer boundaries:
"Barack's last name is Obama"
Alternatively, the backslash \ can be used as an escape character: 'Barack\'s last name is Obama'
Integers (int)
In [1]:
# Integers
a = 2
b = 239
Floating point numbers (float)
In [2]:
# Floats
c = 2.1
d = 239.0
Strings (str)
In [3]:
e = 'Hello world!'
my_text = 'This is Future Lab'
Boolean (bool)
In [4]:
x = True
y = False
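The built-in type() function shows which of these types a variable has; a quick sketch using the variables defined above:

```python
# Check the type of each variable with the built-in type() function
a = 2
c = 2.1
e = 'Hello world!'
x = True

print(type(a))  # <class 'int'>
print(type(c))  # <class 'float'>
print(type(e))  # <class 'str'>
print(type(x))  # <class 'bool'>
```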
Standard calculator-like operations
Most basic operations on integers and
floats such as addition, subtraction,
multiplication work as one would expect:
In [5]: 2 * 4
Out[5]: 8
In [6]: 2 / 5
Out[6]: 0.4
In [7]: 3.1 + 7.4
Out[7]: 10.5
Exponents
Exponents are denoted by **:
In [8]: 2**3
Out[8]: 8
Floor division
Floor division is denoted by //. It returns
the integer part of a division result
(removes decimals after division):
In [9]: 10 // 3
Out[9]: 3
Modulo
Modulo is denoted by %. It returns the
remainder after a division:
In [10]: 10 % 3
Out[10]: 1
Operations on strings
Strings can be added (concatenated) by
use of the addition operator +:
In [11]: 'Bruce' + ' ' + 'Wayne'
Out[11]: 'Bruce Wayne'
Multiplication is also allowed:
In [12]: 'a' * 3
Out[12]: 'aaa'
5. Python: Control Flow
In Python, code blocks are separated by use of indentation. See the definition of an if-statement below:
Syntax of conditional blocks
if condition:
    # Code goes here (must be indented!)
    # Otherwise, IndentationError will be thrown
# Code placed here is outside of the if-statement
Where evaluation of condition must return a boolean (True or False).
Remember:
1. The : must be present after condition.
2. The line immediately after : must be indented.
3. The if-statement is exited by reverting the indentation as shown
above.
This is how Python interprets the code as a block.
The same indentation rules are required for all types of code blocks, the if-block above is just an
example. Examples of other types of code blocks are for and while loops, functions etc.
Most editors will automatically add the indentation upon hitting Enter after the :, so it doesn't take long to get used to this.
if-statements
An if-statement has the following syntax:
In [13]:
x = 2
if x > 1:
    print('x is larger than 1')
if / else-statements
In [14]:
y = 1
if y > 1:
    print('y is larger than 1')
else:
    print('y is less than or equal to 1')
if / elif / else
In [15]:
z = 0
if z > 1:
    print('z is larger than 1')
elif z < 1:
    print('z is less than 1')
else:
    print('z is equal to 1')
6. Python: Data Structures
Data structures are constructs that can contain one or more variables. They are containers that can store many values in a single entity.
Python's four basic data structures are:
● Lists
● Dictionaries
● Tuples
● Sets
Lists
Lists are defined by square brackets [] with elements separated by commas. They
can have elements of any data type.
Lists are arguably the most used data structure in Python.
List syntax
L = [item_1, item_2, ..., item_n]
Mutability
Lists are mutable. They can be changed after creation.
List
In [1]:
# List with integers
a = [10, 20, 30, 40]
# Multiple data types in the same list
b = [1, True, 'Hi!', 4.3]
# List of lists
c = [['Nested', 'lists'], ['are', 'possible']]
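Since lists are mutable, items can be replaced, added and removed after creation; a small sketch:

```python
# Lists are mutable: items can be replaced, appended and removed
a = [10, 20, 30, 40]
a[1] = 99          # replace the second element
a.append(50)       # add an element at the end
a.remove(10)       # remove the first occurrence of 10
print(a)           # [99, 30, 40, 50]
```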
7. Python: Data Structures
Dictionaries
Dictionaries have key/value pairs which are enclosed in curly brackets {}. A value can be fetched by querying the corresponding key. Referring to the data via logically named keys instead of list indexes makes the code more readable.
Dictionary syntax
d = {key1: value1, key2: value2, ..., key_n: value_n}
Note that values can be of any data type like floats, strings etc., but
they can also be lists or other data structures.
Keys must be unique within the dictionary. Otherwise it would be
hard to extract the value by calling out a certain key, see the section
about indexing and slicing below.
Keys also must be of an immutable type.
Mutability
Dictionaries are mutable. They can be changed after creation.
# Strings as keys and numbers as values
d1 = {'axial_force': 319.2, 'moment': 74, 'shear': 23}
# Strings as keys and lists as values
d2 = {'Point1': [1.3, 51, 10.6], 'Point2': [7.1, 11, 6.7]}
# Keys of different types (int and str, don't do this!)
d3 = {1: True, 'hej': 23}
The first two dictionaries above follow a consistent pattern. For d1 the keys are strings and the values are numbers. For d2 the keys are strings and the values are lists. These are well-structured dictionaries.
However, d3 has keys of mixed types! The first key is an integer and the second is a string. This is valid syntax, but not a good idea.
As with much in Python, the flexibility is nice, but it can also be confusing to have many different types mixed in the same data structure. To make code more readable, it is often preferable to keep the same pattern throughout the dictionary, i.e. all keys of the same type and all values of the same type, as in d1 and d2.
The keys and values can be extracted separately by the methods dict.keys() and
dict.values():
In [3]: d1.keys()
Out[3]: dict_keys(['axial_force', 'moment', 'shear'])
In [4]: d1.values()
Out[4]: dict_values([319.2, 74, 23])
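A minimal sketch of fetching and adding values by key, using the d1 dictionary from above:

```python
d1 = {'axial_force': 319.2, 'moment': 74, 'shear': 23}

# Fetch a value by its key
print(d1['moment'])        # 74

# Add or overwrite a key/value pair (dictionaries are mutable)
d1['torsion'] = 12.5
print(d1['torsion'])       # 12.5

# .get() returns a chosen default instead of raising KeyError for missing keys
print(d1.get('deflection', 0.0))   # 0.0
```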
8. Python: Data Structures
Tuples
Tuples are very comparable to lists, but they are defined by parentheses (). The most notable difference from lists is that tuples are immutable.
Tuple syntax
t = (item_1, item_2, ..., item_n)
Mutability
Tuples are immutable. They cannot be changed after creation.
Tuple examples
In [5]:
# Simple tuple of integers
t1 = (1, 24, 56)
# Multiple types as tuple elements
t2 = (1, 1.62, '12', [1, 2, 3])
# Tuple of tuples
points = ((4, 5), (12, 6), (14, 9))
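A short sketch of tuple immutability: reading by index works just like for lists, but assignment raises a TypeError:

```python
t1 = (1, 24, 56)

# Reading by index works just like for lists
print(t1[0])   # 1

# But assignment fails because tuples are immutable
try:
    t1[0] = 99
except TypeError as err:
    print(err)   # 'tuple' object does not support item assignment
```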
Sets
Sets are defined with curly brackets {}. They are unordered and don't have an index (see the description of indexing further down). Set items are also unique.
Set syntax
s = {item_1, item_2, ..., item_n}
The primary idea about sets is the ability to perform set operations.
These are known from mathematics and can determine the union,
intersection, difference etc. of two given sets.
A list, string or tuple can be converted to a set by set(sequence_to_convert). Since sets only hold unique items, the resulting set has the same values as the input sequence, but with duplicates removed. This can be a way to create a list with only unique elements.
For example:
# Convert list to set and back to list again, now with only unique elements
list_uniques = list(set(list_with_duplicates))
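A minimal sketch of the set operations mentioned above (union, intersection, difference), plus the duplicate-removal trick:

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

print(s1 | s2)   # union: {1, 2, 3, 4, 5, 6}
print(s1 & s2)   # intersection: {3, 4}
print(s1 - s2)   # difference: {1, 2}

# Removing duplicates from a list via a set
list_with_duplicates = [1, 1, 2, 3, 3, 3]
list_uniques = list(set(list_with_duplicates))
print(sorted(list_uniques))   # [1, 2, 3]
```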
9. Python: Functions
A function is a block of code that is first defined, and thereafter can be
called to run as many times as needed. A function might have
arguments, some of which can be optional if a default value is
specified.
A function is called by parentheses: function_name(). Arguments are placed inside the parentheses and comma-separated if there is more than one, similar to f(x, y) from mathematics.
A function can return one or more values to the caller. The values to return are put in the return statement. When the code hits a return statement, the function terminates. If no return statement is given, the function returns None.
def function_name(arg1, arg2, default_arg1=0, default_arg2=None):
    '''This is the docstring.

    The docstring explains what the function does, so it is like a multiline comment. It does not
    have to be here, but it is good practice to use docstrings to document the code. They are
    especially useful for more complicated functions, although functions should in general be
    kept as simple as possible.
    Arguments could be explained together with their types (e.g. strings, lists, dicts etc.).
    '''
    # Function code goes here

    # Possible 'return' statement terminating the function. If 'return' is not specified,
    # the function returns None.
    return return_val1, return_val2
If multiple values are to be returned, they can be separated by commas as shown. The returned entity will by default be a tuple.
Note that when using default arguments, it is good practice to only use immutable types. An example further below will demonstrate why this is recommended.
In [5]:
def say_hello_to(name):
    ''' Say hello to the input name '''
    print(f'Hello {name}')

say_hello_to('Anders')      # <--- Calling the function prints 'Hello Anders'

r = say_hello_to('Anders')  # <--- Prints 'Hello Anders' and assigns None to r
print(r)                    # <--- Prints None, since the function has no return statement
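The note above recommends immutable types for default arguments. A minimal sketch of why, using a hypothetical append_item function (the names are made up for illustration): a mutable default is created once, at function definition, and shared between all calls.

```python
def append_item(item, target=None):
    '''Append item to target; create a new list if none is given.'''
    if target is None:
        target = []        # a fresh list on every call
    target.append(item)
    return target

# The None-default version behaves as expected
print(append_item(1))   # [1]
print(append_item(2))   # [2]

def append_item_bad(item, target=[]):
    '''The mutable default is created ONCE and shared between calls!'''
    target.append(item)
    return target

print(append_item_bad(1))   # [1]
print(append_item_bad(2))   # [1, 2]  <-- surprise: the previous call leaked in
```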
14. Numpy: Data Types
Numerical types:
• integers (int)
• unsigned integers (uint)
• floating point (float)
• complex
Other data types:
• booleans (bool)
• string
• datetime
• Python object
Data Type Description
bool_ Boolean (True or False) stored as a byte
int8 Byte (-128 to 127)
int16 Integer (-32768 to 32767)
int32 Integer (-2.15E+9 to 2.15E+9)
int64 Integer (-9.22E+18 to 9.22E+18)
uint8 Unsigned integer (0 to 255)
uint16 Unsigned integer (0 to 65535)
uint32 Unsigned integer (0 to 4.29E+9)
uint64 Unsigned integer (0 to 1.84E+19)
float16 Half precision signed float
float32 Single precision signed float
float64 Double precision signed float
complex64 Complex number: two 32-bit floats (real and imaginary components)
complex128 Complex number: two 64-bit floats (real and imaginary components)
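A short sketch of choosing and converting dtypes; the overflow example at the end shows why the value ranges in the table matter:

```python
import numpy as np

# The dtype can be chosen explicitly when creating an array
a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([1, 2, 3], dtype=np.float64)
print(a.dtype)   # int32
print(b.dtype)   # float64

# astype() converts an existing array (returns a copy)
c = a.astype(np.float32)
print(c.dtype)   # float32

# Beware of overflow with small integer types: uint8 wraps around at 256
d = np.array([250], dtype=np.uint8)
print(d + np.uint8(10))   # [4]
```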
16. Numpy: Indexing & Slicing
One-dimensional arrays are simple; on the surface they act similarly to Python lists:
import numpy as np

arr = np.arange(10)
print(arr)       # [0 1 2 3 4 5 6 7 8 9]
print(arr[5])    # 5
print(arr[5:8])  # [5 6 7]
arr[5:8] = 12
print(arr)       # [ 0  1  2  3  4 12 12 12  8  9]
As you can see, if you assign a scalar value to a slice, as in
arr[5:8] = 12, the value is propagated (or broadcasted) to
the entire selection.
An important first distinction from Python’s built-in lists is
that array slices are views on the original array.
This means that the data is not copied, and any
modifications to the view will be reflected in the source
array.
arr = np.arange(10)
print(arr)        # [0 1 2 3 4 5 6 7 8 9]

arr_slice = arr[5:8]
print(arr_slice)  # [5 6 7]

arr_slice[1] = 12345
print(arr)        # [    0     1     2     3     4     5 12345     7     8     9]

arr_slice[:] = 64
print(arr)        # [ 0  1  2  3  4 64 64 64  8  9]
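If a view is not wanted, an explicit copy() detaches the slice from the source array; a minimal sketch:

```python
import numpy as np

arr = np.arange(10)

# An explicit copy detaches the slice from the source array
arr_copy = arr[5:8].copy()
arr_copy[:] = 64
print(arr)        # [0 1 2 3 4 5 6 7 8 9]  <-- unchanged
print(arr_copy)   # [64 64 64]
```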
19. Pandas: Dataframe Methods & Attributes
df.attribute description
dtypes list the types of the columns
columns list the column names
axes list the row labels and column names
ndim number of dimensions
size number of elements
shape return a tuple representing the dimensionality
values numpy representation of the data
df.method() description
head( [n] ), tail( [n] ) first/last n rows
describe() generate descriptive statistics (for numeric columns only)
max(), min() return max/min values for all numeric columns
mean(), median() return mean/median values for all numeric columns
std() standard deviation
sample([n]) returns a random sample of the data frame
dropna() drop all the records with missing values
Unlike attributes, Python methods have parentheses.
All attributes and methods can be listed with the dir() function: dir(df)
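A quick sketch of the attribute/method distinction on a small made-up DataFrame (the column names and values are invented for the example):

```python
import pandas as pd

# A small illustrative DataFrame
df = pd.DataFrame({'name': ['Ann', 'Bob', 'Eva'],
                   'salary': [95_000, 120_000, 130_000]})

print(df.shape)              # (3, 2)  <-- attribute: no parentheses
print(df.columns.tolist())   # ['name', 'salary']
print(df.head(2))            # method: parentheses, first 2 rows
print(df['salary'].mean())   # 115000.0
```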
21. Pandas: Dataframe Data Types
Pandas Type Native Python Type Description
object string The most general dtype. Will be assigned to your column if the column has mixed types (numbers and strings).
int64 int Numeric values. 64 refers to the number of bits allocated to hold this value.
float64 float Numeric values with decimals. If a column contains numbers and NaNs (see below), pandas will default to float64, in case your missing value has a decimal.
datetime64, timedelta[ns] N/A (but see the datetime module in Python’s standard library) Values meant to hold time data. Look into these for time series experiments.
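A short sketch of these dtypes in practice, using a small made-up DataFrame; note how a NaN forces the column to float64:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [1.5, np.nan, 3.0],
                   'c': ['x', 'y', 'z']})

print(df.dtypes)
# a      int64    <- whole numbers
# b    float64    <- the NaN forces float64
# c     object    <- strings

# astype() converts a column explicitly
df['a'] = df['a'].astype('float64')
print(df['a'].dtype)   # float64
```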
22. Pandas: Dataframe Group By
Using the "group by" method we can:
- Split the data into groups based on some criteria
- Calculate statistics (or apply a function) for each group
Once a groupby object is created, we can calculate various statistics for each group.
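A minimal groupby sketch on made-up data (the dept/salary columns are invented for illustration):

```python
import pandas as pd

# Hypothetical data: employees, their department and salary
df = pd.DataFrame({'dept':   ['IT', 'IT', 'HR', 'HR', 'HR'],
                   'salary': [100, 120, 80, 90, 85]})

grouped = df.groupby('dept')

print(grouped['salary'].mean())
# dept
# HR     85.0
# IT    110.0

print(grouped.size())
# dept
# HR    3
# IT    2
```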
23. Pandas: Dataframe Filtering
To subset the data we can apply Boolean indexing.
This indexing is commonly known as a filter.
For example, if we want to subset the rows in which the salary value is greater than $120K:
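A minimal sketch of such a filter on a made-up DataFrame (the names and salary figures are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'name':   ['Ann', 'Bob', 'Eva'],
                   'salary': [95_000, 125_000, 130_000]})

# Boolean mask: True where the condition holds
mask = df['salary'] > 120_000
print(mask.tolist())   # [False, True, True]

# Using the mask as a filter keeps only the matching rows
print(df[mask])
#   name  salary
# 1  Bob  125000
# 2  Eva  130000
```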
Any Boolean operator can be used to subset the data:
> greater; >= greater or equal;
< less; <= less or equal;
== equal; != not equal;
24. Pandas: Dataframe Selecting Rows
If we need to select a range of rows, we can specify the range using ":"
Notice that the first row has position 0, and the last value in the range is excluded:
So for the 0:10 range, the first 10 rows are returned, with positions starting at 0 and ending at 9.
The iloc method
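A short iloc sketch on a made-up single-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'x': range(100, 120)})   # 20 rows, positions 0..19

# iloc selects by integer position; the end of the range is excluded
print(df.iloc[0:10].shape)   # (10, 1)  <-- rows at positions 0..9
print(df.iloc[0]['x'])       # 100      <-- first row
print(df.iloc[9]['x'])       # 109      <-- last row of the 0:10 range
```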
25. Pandas: Dataframe Common Aggregates
Aggregation - computing a summary statistic about each group, e.g.:
- compute group sums or means
- compute group sizes/counts
Common aggregation functions:
- min, max
- count, sum, prod
- mean, median, mode, mad
- std, var
df.method() description
describe Basic statistics (count, mean, std, min, quantiles, max)
min, max Minimum and maximum values
mean, median, mode Arithmetic average, median and mode
var, std Variance and standard deviation
sem Standard error of mean
skew Sample skewness
kurt Sample kurtosis
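A minimal sketch combining groupby with several of the aggregation methods above via agg() (the data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'dept':   ['IT', 'IT', 'HR', 'HR'],
                   'salary': [100, 120, 80, 90]})

# agg() applies several aggregation functions at once per group
summary = df.groupby('dept')['salary'].agg(['min', 'max', 'mean', 'count'])
print(summary)
#       min  max   mean  count
# dept
# HR     80   90   85.0      2
# IT    100  120  110.0      2
```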
28. Intro ML
Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term “Machine Learning”, defining it as the “field of study that gives computers the capability to learn without being explicitly programmed”.
How it differs from traditional programming:
- In traditional programming, we feed in the input and the program logic, and run the program to get the output.
- In machine learning, we feed in the input and the output and run it on the machine during training; the machine creates its own logic, which is then evaluated during testing.
29. Intro ML: Terminology
Terminologies that one should know before starting Machine Learning:
- Model: A model is a specific representation learned from data by applying some
machine learning algorithm. A model is also called a hypothesis.
- Feature: A feature is an individual measurable property of our data. A set of numeric
features can be conveniently described by a feature vector. Feature vectors are fed as
input to the model. For example, in order to predict a fruit, there may be features like
color, smell, taste, etc.
- Target(Label): A target variable or label is the value to be predicted by our model. For
the fruit example discussed in the features section, the label with each set of input
would be the name of the fruit like apple, orange, banana, etc.
- Training: The idea is to give a set of inputs (features) and their expected outputs (labels), so after training, we will have a model (hypothesis) that will then map new data to one of the categories trained on.
- Prediction: Once our model is ready, it can be fed a set of inputs to which it will provide
a predicted output(label).
30. Intro ML: Type of Learning
- Supervised Learning
- Unsupervised Learning
- Semi-Supervised Learning
1. Supervised Learning:
Supervised learning is when the model is trained on a labelled dataset. A labelled dataset is one which has both input and output parameters. In this type of learning, both training and validation datasets are labelled, as shown in the figures below.
Types of Supervised Learning:
- Classification
- Regression
31. Intro ML: Type of Learning
2. Unsupervised Learning:
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. Unsupervised machine learning is more challenging than supervised learning due to the absence of labels.
Types of Unsupervised Learning:
- Clustering
- Association
3. Semi-supervised machine learning:
To counter the disadvantages of the previous approaches, the concept
of Semi-Supervised Learning was introduced.
In this type of learning, the algorithm is trained
upon a combination of labeled and unlabeled
data. Typically, this combination will contain a
very small amount of labeled data and a very
large amount of unlabeled data.
38. Intro ML: Classification
Formally, given a training set (x_i, y_i) for i = 1…n, we want to create a
classification model f that can predict the label y for a new x.
The machine learning algorithm will create the function f.
The predicted label y for a new x is sign(f(x)).
Classification?
- Yes/No questions – binary classification
- automatic handwriting recognition, speech recognition, biometrics, document
classification, spam detection, predicting credit default risk, detecting credit
card fraud, predicting customer churn, predicting medical outcomes (strokes,
side effects, etc.)
41. Intro Scikit-Learn: Why?
A. Simple and efficient tools for predictive data analysis
- Machine Learning methods
- Data processing
- Visualization
B. Accessible to everybody, and reusable in various contexts
- Documented API with lots of examples
- Not bound to Training frameworks (e.g. Tensorflow, Pytorch)
- Building blocks for your data analysis
C. Built on NumPy, SciPy, and matplotlib
- No own data types (unlike Pandas)
- Benefit from NumPy and SciPy optimizations
- Extends the most common visualization tool
Open source, commercially usable - BSD license
Version 1.0 since September 2021
•https://scikit-learn.org/stable/
42. Intro Scikit-Learn: Tools
A. Classification:
Categorizing objects to one or more classes.
- Support Vector Machines (SVM)
- Nearest Neighbors
- Random Forest
- . . .
B. Regression:
Prediction of one (uni-) or more (multi-variate) continuous-valued attributes.
- Support Vector Regression (SVR)
- Nearest Neighbors
- Random Forest
- . . .
C. Clustering:
Group objects of a set.
- k-Means
- Spectral Clustering
- Mean-Shift
- . . .
D. Dimensionality reduction:
Reducing the number of random variables.
- Principal Component Analysis (PCA)
- Feature Selection
- Non-negative Matrix Factorization
- . . .
E. Model selection:
Compare, validate and choose parameters/models.
- Grid Search
- Cross Validation
- . . .
F. Pre-Processing:
Prepare/transform data before training models.
- Conversion
- Normalization
- Feature Extraction
43. Intro Scikit-Learn: Supervised ML Flow
Easy install via pip or conda for Windows, macOS and Linux, e.g.:
$ pip install scikit-learn or
$ conda install -c intel scikit-learn
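A minimal fit/predict sketch of the supervised flow, using scikit-learn's k-nearest-neighbours classifier on the bundled iris dataset (the model choice is just for illustration):

```python
# Minimal supervised ML flow with scikit-learn: split, fit, score
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)              # train on the labelled training set
accuracy = model.score(X_test, y_test)   # evaluate on the held-out test set
print(round(accuracy, 2))                # typically well above 0.9 on iris
```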
•In semi-supervised learning, labelled data is used to learn a model; that model is then used to label the unlabeled data (called pseudo-labelling), and finally the whole dataset is used to train the model for further use.
If we want to teach the computer to recognize images of chairs, then we give the computer a whole bunch of images, and tell it which ones are chairs and which are not, and then it’s supposed to learn to recognize chairs, even ones it hasn’t seen before. It’s not like we tell the computer how to recognize a chair, we don’t tell it “a chair has 4 legs and a back and a flat surface to sit on and so on”, we just give it a lot of examples.
Machine learning has close ties to statistics, in fact it’s hard to say what’s different about predictive statistics and machine learning, and these fields are very closely linked right now.
The problem I just told you about is a classification problem where we are trying to identify chairs. The way we set the problem up is that we have a training set and a test set.
We use the training set to learn a model of what a chair is. The test set are images that are not in the training set, and we want to be able to make predictions on those, as to whether or not each image is a chair.
It could be that some of the labels on the training set are noisy. That could happen. In fact, one of these labels is noisy. That’s ok, because as long as there isn’t too much noise, we should still be able to learn a model for a chair. It just won’t be able to classify perfectly, and that happens. Some prediction problems are harder than others, but that’s ok, we just do the best we can from the training data. And in terms of the size of the training data, the more the merrier. We want as much data as we can to train these models.
How do we represent an image of a chair, or a flower, or whatever, in the training set? I just zoomed in on a piece of this image over here, and you can see the pixels in the image. We can represent each pixel according to its RGB values (red, green, blue), so we get three numbers representing each pixel. So you can represent the whole image as a collection of RGB values. So the image becomes this very large vector of numbers. And in general, when doing machine learning, we need to represent each observation in the training and test sets as a vector of numbers. The label is also represented by a number. Here the number is -1 because the image is not a chair. The chairs would all get label +1.
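The pixel-to-vector idea can be sketched with NumPy (the image here is random, just to show the shapes):

```python
import numpy as np

# A tiny hypothetical 4x4 RGB "image": 4 x 4 pixels, 3 channel values each
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Flattening turns it into one long feature vector: 4 * 4 * 3 = 48 numbers
x = image.reshape(-1)
print(x.shape)   # (48,)
```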
Here’s another example. This is a problem that comes from NYC’s power company, where they wanted to predict which manholes were going to have a fire. So we would represent each manhole as a vector, and here are the components in the vector. The first component might be, say, the number of cables in the manhole.
In general, the first step is to figure out how to represent your data as a vector. You can make the vector very large, you can include lots of factors if you like, that’s fine. Computationally things are easier if you use fewer features, but then you risk leaving out information. So there’s a tradeoff right there that you will have to worry about, and we’ll talk more about that later. But in any case, you can’t do ML if you don’t have your data represented this way, so that’s the first step.
You’d think that manholes with more cables, more recent serious events, etc. would be more prone to explosions and fires in the future. But what combination of them would give you the best predictor? How do you combine them together? You could add them all up but that might not be the best thing. You could give them all weights and add them up, but how do you know the weights? That’s what ML does for you. It tells you what combinations to use to get the best predictors.
The features are also called attributes, predictors or covariates, so you can choose whatever terminology you like.
Let’s take a simple version of the manhole example where we have only two features, say the number of cables and the number of past serious events. So each observation can be represented as a point on a 2D graph, which means I can plot the whole dataset.