Hamilton
A General Purpose Micro-Framework for Scalable
Feature Engineering (in python!)
Elijah ben Izzy
Data Platform Engineer, Stitch Fix
Stitch Fix
Shop at your personal curated store. Check out what you like.
Data Science @ Stitch Fix
algorithms-tour.stitchfix.com
- Data Science powers the user experience
- 145+ data scientists and data platform engineers
Hamilton is Open Source code
> pip install sf-hamilton
Get started in <15 minutes!
Documentation
https://hamilton-docs.gitbook.io/
Lots of examples:
https://github.com/stitchfix/hamilton/tree/main/examples
The Agenda
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
What is Hamilton?
A declarative dataflow paradigm.
What is Hamilton?
declarative: naturally express as readable code
dataflow: represent transformation of any data
paradigm: a novel approach to writing python
Code:
def holidays(year: pd.Series, week: pd.Series) -> pd.Series:
    """Some docs"""
    return some_library(year, week)

def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Some docs"""
    return spend.rolling(3).mean()

def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Some docs"""
    return spend / signups

def spend_shift_3weeks(spend: pd.Series) -> pd.Series:
    """Some docs"""
    return spend.shift(3)

def spend_shift_3weeks_per_signup(spend_shift_3weeks: pd.Series, signups: pd.Series) -> pd.Series:
    """Some docs"""
    return spend_shift_3weeks / signups
[Diagram] User code → Hamilton builds a Directed Acyclic Graph → Result (e.g. DataFrame)
Old Way vs Hamilton Way:

Instead of:

df['c'] = df['a'] + df['b']
df['d'] = transform(df['c'])

You declare:

def c(a: pd.Series, b: pd.Series) -> pd.Series:
    """Sums a with b"""
    return a + b

def d(c: pd.Series) -> pd.Series:
    """Transforms C to ..."""
    new_column = _transform_logic(c)
    return new_column
Inputs == Function Arguments
Outputs == Function Name
(same "Instead of / You declare" example as above)
Full Hello World

Functions:

# feature_logic.py
def c(a: pd.Series, b: pd.Series) -> pd.Series:
    """Sums a with b"""
    return a + b

def d(c: pd.Series) -> pd.Series:
    """Transforms C to ..."""
    new_column = _transform_logic(c)
    return new_column

"Driver" - this actually says what and when to execute:

# run.py
from hamilton import driver
import feature_logic

dr = driver.Driver({'a': ..., 'b': ...}, feature_logic)
df_result = dr.execute(['c', 'd'])
print(df_result)
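To make this concrete, here is a minimal runnable sketch of the hello world above; the Series values and the _transform_logic body are made up for illustration (Hamilton skips underscore-prefixed functions, so the helper is not a node):

# feature_logic.py
import pandas as pd

def _transform_logic(series: pd.Series) -> pd.Series:
    return series * 2  # illustrative helper; not part of the DAG

def c(a: pd.Series, b: pd.Series) -> pd.Series:
    """Sums a with b"""
    return a + b

def d(c: pd.Series) -> pd.Series:
    """Transforms C to ..."""
    return _transform_logic(c)

# run.py
from hamilton import driver
import feature_logic

dr = driver.Driver({'a': pd.Series([1, 2, 3]),
                    'b': pd.Series([10, 20, 30])}, feature_logic)
print(dr.execute(['c', 'd']))
#     c   d
# 0  11  22
# 1  22  44
# 2  33  66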
Hamilton TL;DR:
1. For each transform (=), you write a function.
2. Functions declare a DAG.
3. Hamilton handles DAG execution.
[DAG diagram: a, b → c → d]
# feature_logic.py
def c(a: pd.Series, b: pd.Series) -> pd.Series:
    """Replaces c = a + b"""
    return a + b

def d(c: pd.Series) -> pd.Series:
    """Replaces d = transform(c)"""
    new_column = _transform_logic(c)
    return new_column

# run.py
from hamilton import driver
import feature_logic

dr = driver.Driver({'a': ..., 'b': ...}, feature_logic)
df_result = dr.execute(['c', 'd'])
print(df_result)
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
Backstory: Time Series Forecasting
[Diagram: time-series forecasting pipeline, annotated "What Hamilton helped solve!" and "Biggest problems here"]
Backstory: TS → Dataframe Creation
Columns are functions of other columns.
Backstory: TS → Dataframe Creation
19
g(f(A,B), 
)
h(g(f(A,B), 
), 
)
etcđŸ”„
Backstory: TS → DF → 🍝 Code
df = load_dates()  # load date ranges
df = load_actuals(df)  # load actuals, e.g. spend, signups
df['holidays'] = is_holiday(df['year'], df['week'])  # holidays
df['avg_3wk_spend'] = df['spend'].rolling(3).mean()  # moving average of spend
df['spend_per_signup'] = df['spend'] / df['signups']  # spend per person signed up
df['spend_shift_3weeks'] = df['spend'].shift(3)  # shift spend because ...
df['spend_shift_3weeks_per_signup'] = df['spend_shift_3weeks'] / df['signups']

def my_special_feature(df: pd.DataFrame) -> pd.Series:
    return (df['A'] - df['B'] + df['C']) * weights

df['special_feature'] = my_special_feature(df)
# ...
Now scale this code to 1000+ columns & a growing team 😬
Human scaling 😔:
○ Testing / Unit testing 👎
○ Documentation 👎
○ Code Reviews 👎
○ Onboarding 📈 👎
○ Debugging 📈 👎
Underrated problem!
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
Hamilton @ Stitch Fix
● Running in production for 2.5+ years
● Initial use-case manages 4000+ feature definitions
● All feature definitions are:
○ Unit testable
○ Documentation friendly
○ Centrally curated, stored, and versioned in git
● Data science teams ❀ it:
○ Enabled a monthly task to be completed 4x faster
○ Easy to onboard new team members
○ Code reviews are simpler
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
Hamilton + Feature Engineering: Overview
● Can model this all in Hamilton (if you wanted to)
● We’ll just focus on featurization
○ FYI: Hamilton works for any object type.
■ Here we’ll assume pandas for simplicity.
○ E.g. use Hamilton within Airflow, Dagster, Prefect, Flyte, Metaflow, Kubeflow, etc.
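As one hedged example of that embedding, here is a sketch of running a Hamilton driver inside an Airflow task (the DAG and task names are made up; feature_logic is the hello-world module from earlier):

# airflow_featurize.py - hypothetical sketch of Hamilton inside Airflow
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def featurize():
    from hamilton import driver
    import feature_logic  # the hello-world module from earlier
    dr = driver.Driver({'a': ..., 'b': ...}, feature_logic)
    return dr.execute(['c', 'd'])

with DAG('featurization', start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    PythonOperator(task_id='featurize', python_callable=featurize)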
[Diagram: Load Data → Transform into Features (featurization) → Fit Model(s) (training) → Use Model(s) (inference)]
data loading & feature code:
(the same feature functions shown earlier: holidays, avg_3wk_spend, spend_per_signup, spend_shift_3weeks, spend_shift_3weeks_per_signup)
[Slide: Modeling Featurization - features.py → via driver (run.py) → feature dataframe]
Code that needs to be written:
1. Functions to load data
a. normalize/create common index to join on
2. Transform/feature functions
a. Optional: model functions
3. Drivers materialize data
a. DAG is walked for only what’s needed
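A hedged sketch of 1-3 together (the module, file, and column names are illustrative; spend/signups follow the earlier example). Note how 3a plays out: requesting only spend_per_signup computes just its upstream nodes.

# data_loaders.py - loaders that normalize onto a common index
import pandas as pd

def spend(data_location: str) -> pd.Series:
    """Loads spend, indexed by week."""
    return pd.read_csv(data_location, index_col='week')['spend']

def signups(data_location: str) -> pd.Series:
    """Loads signups, indexed by week."""
    return pd.read_csv(data_location, index_col='week')['signups']

# run.py - a driver materializes only what's requested
from hamilton import driver
import data_loaders
import features

dr = driver.Driver({'data_location': 'actuals.csv'}, data_loaders, features)
# only spend, signups, and spend_per_signup execute; holidays etc. are skipped
df = dr.execute(['spend_per_signup'])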
[Diagram: Data Loaders → Feature Functions → Drivers]
Q: Why is feature engineering hard?
A: Scaling!
Scaling: Served Two Ways

> Human/team (Hamilton helps here!):
● Highly coupled code
● Difficulty reusing/understanding work
● Broken/unhealthy production pipelines

> Machine (Hamilton has integrations here!):
● Data is too big to fit in memory
● Cannot easily parallelize computation
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
How Hamilton Helps with Human/Team Scaling

Highly coupled code → Decouples "functions" from use (driver code).

Difficulty reusing/understanding work → Functions are curated into modules.
Everything is unit testable.
Documentation is natural.
Forced to align on naming.

Broken/unhealthy production pipelines → Debugging is straightforward.
Easy to version features via git/packaging.
Runtime data quality checks.
Scaling Humans/Teams

Hamilton functions:
# client_features.py
@tag(owner='Data-Science', pii='False')
@check_output(data_type=np.float64, range=(-5.0, 5.0), allow_nans=False)
def height_zero_mean_unit_variance(height_zero_mean: pd.Series,
                                   height_std_dev: pd.Series) -> pd.Series:
    """Zero mean unit variance value of height"""
    return height_zero_mean / height_std_dev
Hamilton Features:
● Unit testing ✅ always possible (see the test sketch below)
● Documentation ✅ tags, visualization, function doc
● Modularity/reuse ✅ module curation & drivers
● Central feature definition store ✅ naming, curation, versioning
● Data quality ✅ runtime checks
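Since every feature is a plain function of its inputs, a unit test is just a direct call. A minimal pytest-style sketch for the function above (the test values are made up):

# test_client_features.py
import pandas as pd

from client_features import height_zero_mean_unit_variance

def test_height_zero_mean_unit_variance():
    zero_mean = pd.Series([-1.0, 0.0, 1.0])
    std_dev = pd.Series([2.0, 2.0, 2.0])
    actual = height_zero_mean_unit_variance(zero_mean, std_dev)
    pd.testing.assert_series_equal(actual, pd.Series([-0.5, 0.0, 0.5]))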
Code base implications:
1. Functions are always in modules.
2. Driver script, i.e. the execution script, is decoupled from functions.
Scaling Humans/Teams

[Diagram: modules spend_features.py, marketing_features.py, customer_features.py are shared by driver scripts 1, 2, and 3]

> Code reuse from day one!
> Low maintenance to support many driver scripts
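In practice that looks something like this: a hedged sketch of two driver scripts reusing the same feature modules (the requested feature names are illustrative):

# driver_script_1.py - one team's materialization
from hamilton import driver
import spend_features
import marketing_features

config = {...}
dr = driver.Driver(config, spend_features, marketing_features)
marketing_df = dr.execute(['spend_per_signup', 'campaign_ctr'])

# driver_script_2.py - another team reuses spend_features unchanged
from hamilton import driver
import spend_features
import customer_features

config = {...}
dr = driver.Driver(config, spend_features, customer_features)
customer_df = dr.execute(['avg_3wk_spend', 'customer_ltv'])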
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
Scaling Compute/Data with Hamilton
Hamilton has the following integrations out of the box:
● Ray
○ Single process -> Multiprocessing -> Cluster
● Dask
○ Single process -> Multiprocessing -> Cluster
● Pandas on Spark
○ Enables using the pandas API on Spark with your pandas code easily
● Switching to run on Ray/Dask/Pandas on Spark requires:
> Only changing driver.py code*
> *Pandas on Spark also needs changing how data is loaded.
Scaling Hamilton: Driver Only Change
# run.py
from hamilton import driver
import data_loaders
import date_features
import spend_features

config = {...}  # config, e.g. data_location
dr = driver.Driver(config,
                   data_loaders,
                   date_features,
                   spend_features)
features_wanted = [...]  # choose subset wanted
feature_df = dr.execute(features_wanted)
save(feature_df, 'prod.features')
Hamilton + Ray: Driver Only Change
# run.py: unchanged from the previous slide
# run_on_ray.py
import ray
from hamilton import base, driver
from hamilton.experimental import h_ray
import data_loaders, date_features, spend_features  # same modules as run.py

ray.init()
config = {...}
rga = h_ray.RayGraphAdapter(
    result_builder=base.PandasDataFrameResult())
dr = driver.Driver(config,
                   data_loaders, date_features, spend_features,
                   adapter=rga)
features_wanted = [...]  # choose subset wanted
feature_df = dr.execute(features_wanted)
save(feature_df, 'prod.features')
ray.shutdown()
Hamilton + Dask: Driver Only change
# run.py: unchanged from the previous slide
# run_on_dask.py
from dask.distributed import Client
from hamilton import base, driver
from hamilton.experimental import h_dask
import data_loaders, date_features, spend_features  # same modules as run.py

client = Client(...)  # connect to your dask cluster/client
config = {...}
dga = h_dask.DaskGraphAdapter(client,
    result_builder=base.PandasDataFrameResult())
dr = driver.Driver(config,
                   data_loaders, date_features, spend_features,
                   adapter=dga)
features_wanted = [...]  # choose subset wanted
feature_df = dr.execute(features_wanted)
save(feature_df, 'prod.features')
client.shutdown()
Hamilton + Spark: Driver Change + loader
# run.py: unchanged from the previous slide
# run_on_pandas_on_spark.py
import pyspark.pandas as ps
from pyspark.sql import SparkSession
from hamilton import base, driver
from hamilton.experimental import h_spark
import date_features, spend_features  # same modules as run.py
import spark_data_loaders  # loaders that return pyspark.pandas objects

spark = SparkSession.builder.getOrCreate()
ps.set_option(...)
config = {...}
skga = h_spark.SparkKoalasGraphAdapter(spark, spine_column='COLUMN_NAME',
    result_builder=base.PandasDataFrameResult())
dr = driver.Driver(config,
                   spark_data_loaders, date_features, spend_features,
                   adapter=skga)
features_wanted = [...]  # choose subset wanted
feature_df = dr.execute(features_wanted)
save(feature_df, 'prod.features')
spark.stop()
Hamilton + Ray/Dask: How Does it Work?
# FUNCTIONS
def c(a: pd.Series, b: pd.Series) -> pd.Series:
    """Sums a with b"""
    return a + b

def d(c: pd.Series) -> pd.Series:
    """Transforms C to ..."""
    new_column = _transform_logic(c)
    return new_column

# DRIVER: run_on_ray.py, as shown above

# DAG: [diagram]
Hamilton + Ray/Dask: How Does it Work?
(same FUNCTIONS and DRIVER as the previous slide)

# Delegate to Ray/Dask: each node's function is handed off
ray.remote(node.callable).remote(**kwargs)
# or
dask.delayed(node.callable)(**kwargs)

# DAG: [diagram]
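Conceptually (a simplified illustration, not Hamilton's actual internals): the driver walks the DAG in topological order and asks the graph adapter to execute each node, so swapping the adapter swaps where computation happens:

# Simplified sketch of graph-adapter delegation (illustrative only)
import ray

def execute_dag(nodes_in_topological_order, inputs, adapter):
    results = dict(inputs)
    for node in nodes_in_topological_order:
        kwargs = {dep: results[dep] for dep in node.dependencies}
        # local call, ray.remote, dask.delayed, ... depending on the adapter
        results[node.name] = adapter.execute_node(node, kwargs)
    return results

class RayAdapter:
    def execute_node(self, node, kwargs):
        # Ray resolves any ObjectRef arguments before running node.callable
        return ray.remote(node.callable).remote(**kwargs)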
Hamilton + Spark: How Does it Work?
(same FUNCTIONS and DRIVER as the previous slide)

# With Spark: change the data loaders to return the Spark "pandas"
# equivalent (pyspark.pandas) objects instead; Spark will take care of the rest.

# DAG: [diagram]
Hamilton + Ray/Dask/Pandas on Spark: Caveats
Things to think about:
1. Serialization:
a. Hamilton defaults to the serialization methodology of these frameworks.
2. Memory:
a. Defaults should work, but fine-tuning memory on a per-"function" basis is not exposed.
3. Python dependencies:
a. You need to manage them.
4. Looking to graduate these APIs from experimental status.
>> Looking for contributions here to extend support in Hamilton! <<
What is Hamilton?
Why was Hamilton created?
Hamilton @ Stitch Fix
Hamilton for feature engineering
↳ Scaling humans/teams
↳ Scaling compute/data
Plans for the future
Hamilton: Plans for the Future
● Async support for an online context
● Improvements to/configurability around runtime data checks
● Integration with external data sources
● Integrations with more open-source compute frameworks:
Modin · Ibis · FlyteKit · Metaflow · Dagster · Prefect
● <your idea here>
Summary: Hamilton for Feature/Data Engineering
● Hamilton is a declarative paradigm to describe data/feature
transformations.
○ Embeddable anywhere that runs python.
● It grew out of a need to tame a feature code base.
○ It’ll make yours better too!
● The Hamilton paradigm scales humans/teams through software
engineering best practices.
● Hamilton + Ray/Dask/Pandas on Spark enables one to:
scale humans/teams and scale data/compute.
● Hamilton is going exciting places!
Give Hamilton a Try!
We’d love your feedback
> pip install sf-hamilton
⭐ on github (https://github.com/stitchfix/hamilton)
☑ create & vote on issues on github
📣 join us on Slack
(https://join.slack.com/t/hamilton-opensource/shared_invite/zt-1bjs72asx-wcUTgH7q7QX1igiQ5bbdcg)
Thank you.
Questions?
https://twitter.com/elijahbenizzy
https://www.linkedin.com/in/elijahbenizzy/
https://github.com/stitchfix/hamilton
elijah.benizzy@stitchfix.com