3. Hyperparameter
● The configurable parameters of Transformers & Estimators are called
hyperparameters.
● Configuring Transformers & Estimators with the best hyperparameters gives
the best model.
● Finding the best hyperparameters is a tricky task.
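A minimal sketch of what "configurable parameters" means in scikit-learn. The specific values (alpha=0.001, max_iter=500, with_mean=False) are arbitrary choices for illustration, not recommendations.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Hyperparameters are passed to the constructor, before any data is seen.
clf = SGDClassifier(alpha=0.001, max_iter=500)
scaler = StandardScaler(with_mean=False)

# get_params() returns a dict of every hyperparameter and its current value.
clf_params = clf.get_params()
scaler_params = scaler.get_params()
print(clf_params["alpha"], clf_params["max_iter"])
print(scaler_params["with_mean"])

# set_params() changes a hyperparameter on an existing object.
clf.set_params(alpha=0.01)
```

Both estimators and transformers expose the same get_params() / set_params() interface, which is what lets search tools vary their settings automatically.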
4. GridSearch
● Takes a set of candidate values for each hyperparameter
● Creates a model for every possible combination
● Trains & validates all the models
● Returns the best model
● Also returns the best-performing hyperparameters
● We may then narrow the search range & repeat.
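The steps above can be sketched with GridSearchCV. The iris dataset, the SVC estimator and the candidate values in param_grid are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values: 3 values of C x 2 kernels = 6 combinations.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each combination is trained and validated with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_model = search.best_estimator_   # refitted on all of X, y
best_params = search.best_params_     # the winning combination
n_candidates = len(search.cv_results_["params"])
print(best_params, n_candidates)
```

To narrow down and repeat, one would build a new param_grid centred on best_params and run the search again.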
5. Transformers
● Objects capable of transforming data are called transformers
● Classes in the sklearn.preprocessing module create Transformers
● They support fit(), transform() & fit_transform() methods
● StandardScaler, MinMaxScaler etc. are built in
● Using FunctionTransformer we can create our own transformers
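A short sketch of the transformer interface, using a built-in scaler and a custom transformer built with FunctionTransformer. The toy array and the choice of np.log1p as the custom function are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# fit() learns the column means and standard deviations; transform() applies them.
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# fit_transform() does both steps in one call.
X_scaled_again = StandardScaler().fit_transform(X)

# FunctionTransformer wraps a plain function as a transformer.
log_transformer = FunctionTransformer(np.log1p)
X_log = log_transformer.fit_transform(X)
```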
6. Estimators
● Data that has gone through the right transformations needs to be passed to
an estimator for training or prediction
● Objects that implement a learning algorithm are known as estimators
● Examples: LinearRegression, KMeans etc.
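As an illustration, both estimators named above are trained with fit(). The toy data is made up for this sketch: a noiseless line y = 2x for the regression, and two well-separated groups of points for the clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised estimator: learns from features X and targets y.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # y = 2x
reg = LinearRegression().fit(X, y)
pred = reg.predict(np.array([[5.0]]))

# Unsupervised estimator: learns from features only.
points = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = km.labels_
```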
7. Pipeline
● Sequentially apply a list of transforms and a final estimator.
● Intermediate steps of the pipeline must be ‘transforms’, that is, they must
implement fit and transform methods.
● The final estimator only needs to implement fit.
● The transformers in the pipeline can be cached using the memory argument.
Example pipeline: Imputer → StandardScaler → PCA → SGDClassifier
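The four-step chain can be sketched as follows. This assumes a recent scikit-learn, where the imputer class is SimpleImputer in sklearn.impute. The step names, the iris dataset and the temporary cache directory are illustrative choices.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Temporary directory used to cache the fitted transformers.
cache_dir = mkdtemp()

pipe = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="mean")),  # transformer
        ("scaler", StandardScaler()),                 # transformer
        ("pca", PCA(n_components=2)),                 # transformer
        ("clf", SGDClassifier(random_state=0)),       # final estimator
    ],
    memory=cache_dir,
)

# fit() runs fit_transform on each transformer in order, then fits the classifier.
pipe.fit(X, y)
predictions = pipe.predict(X)

# Clean up the cache directory once it is no longer needed.
rmtree(cache_dir)
```

Caching pays off mainly when the same transformers are refitted repeatedly on the same data, for example during a grid search over the final estimator only.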
8. Pipeline
● Allows us to quickly build a model with all the pre-processing and imputation
chains as a single scikit-learn object, with the fit and transform methods
that usually come with these objects.
● In addition, we can wrap the object in a grid search to find the optimal
hyperparameters across all the steps in the pipeline chain.
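A sketch of wrapping a pipeline in a grid search. Hyperparameters of individual steps are addressed as step name, double underscore, parameter name. The step names, the LogisticRegression classifier and the candidate values are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("pca", PCA()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# "<step name>__<parameter>" reaches into a specific step of the pipeline.
param_grid = {
    "pca__n_components": [2, 3],
    "clf__C": [0.1, 1.0, 10.0],
}

# The whole pipeline is refitted for every combination and every CV fold,
# so the preprocessing never sees the validation fold during fitting.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```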
9. FeatureUnion
● Concatenates the results of multiple transformer objects.
● This estimator applies a list of transformer objects in parallel to the
input data, then concatenates the results.
● This is useful to combine several feature extraction mechanisms into a
single transformer.
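A minimal sketch of FeatureUnion combining two feature extraction steps. PCA and SelectKBest, and the number of components or features kept, are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

X, y = load_iris(return_X_y=True)

# Each transformer sees the same input; their outputs are stacked column-wise.
union = FeatureUnion(
    [
        ("pca", PCA(n_components=2)),   # contributes 2 columns
        ("kbest", SelectKBest(k=1)),    # contributes 1 column
    ]
)

X_combined = union.fit_transform(X, y)
print(X_combined.shape)
```

Because a FeatureUnion is itself a transformer, it can be used as a step inside a Pipeline.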
11. Advantages
● Integrates very well with GridSearchCV for hyper-parameter tuning
● Code is highly modular & reusable
● Caching of transformers can be enabled
● Perhaps one of the best features of scikit-learn
12. Limitations
● Pipeline doesn’t support the partial_fit API.
● That’s because not all transformers are capable of out-of-core processing
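One possible workaround, sketched under the assumption that every step you need does implement partial_fit (StandardScaler and SGDClassifier both do): chain the steps by hand and feed the data in batches. The random batches below are synthetic data made up for this example.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A Pipeline object does not expose partial_fit.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", SGDClassifier())])
pipeline_has_partial_fit = hasattr(pipe, "partial_fit")

# Manual out-of-core loop: update the scaler, then the classifier, per batch.
rng = np.random.RandomState(0)
scaler = StandardScaler()
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all classes must be declared up front

for _ in range(5):
    X_batch = rng.normal(size=(20, 3))
    y_batch = (X_batch[:, 0] > 0).astype(int)
    scaler.partial_fit(X_batch)
    clf.partial_fit(scaler.transform(X_batch), y_batch, classes=classes)

batch_predictions = clf.predict(scaler.transform(X_batch))
```

Note that the scaler's statistics keep changing as batches arrive, so early batches are scaled with less accurate estimates than later ones.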
14. Visit www.zekeLabs.com for more details