GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
Data Science
Accelerator Program
How we teach
Each session we teach will be interactive. Every month we
give you one interactive lecture and one hackersession.
Both the lecture and hackersession require students to code.
The main distinction is that the lectures have more focus on
theory whereas the hackersessions have more focus on
getting your hands dirty with code.
The hackersessions are meant to be fun and engaging while
giving the students much freedom. We often notice that the
hackersessions end up being the most educational part.
Lecture 1: RStudio stack
The first session reintroduces programming by introducing students to the RStudio stack. It
will immediately teach them to use the new dplyr syntax and introduce them to a proper work
environment. The session will conclude with a visit to GitHub and show students how to create
shareable documents.
- intro to programming
- explain the concept of a dataframe
- r ggplot
- r dplyr
- rmarkdown
- ChickWeight
- git
hackersession: webscraping with R
- rvest + html
- type casting
- ggplot
- dplyr
Use R to scrape some video game websites and figure out which Heroes of the Storm character makes
the most sense. When this is done we scrape Funda and task people with finding the best house in their home
town.
Lecture 2: Simulation
The second session will focus on a review of statistics and probability. We
keep the math light and invest much time in simulation exercises.
• explain basic probability distributions (normal)
• what is wrong with the mean of a distribution?
• what is a correlation?
• PCA/Covariance
• explain simulation cases (birthday problem, casino)
• explain bayesian thinking
• basic hypothesis testing
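One of the simulation cases listed above, the birthday problem, can be sketched in a few lines. This is only an illustrative sketch, not the course material itself: the function names, the number of trials and the fixed seed are assumptions made for this example, and the course may well use R instead.

```python
import random


def birthday_collision_rate(group_size, trials=10_000, seed=0):
    """Estimate by simulation the chance that two people share a birthday."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs repeatable
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(365) for _ in range(group_size)]
        # Fewer distinct birthdays than people means at least one shared day.
        if len(set(birthdays)) < group_size:
            hits += 1
    return hits / trials


def birthday_collision_exact(group_size):
    """Exact probability, assuming 365 equally likely birthdays."""
    p_all_unique = 1.0
    for k in range(group_size):
        p_all_unique *= (365 - k) / 365
    return 1 - p_all_unique


simulated = birthday_collision_rate(23)
exact = birthday_collision_exact(23)
```

Comparing `simulated` with `exact` shows how close a light-on-math simulation gets to the closed-form answer.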
hackersession: bandit problem
The users get both an offline and an online version of the bandit problem, which
we turn into a game. We give users keys and we ask them to figure out which
banner is best.
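As a rough illustration of the online version, the sketch below uses epsilon-greedy, one common bandit strategy. It is not necessarily the approach used in the session: the click rates, the `epsilon` value and the function name are all hypothetical, and the "banners" are simulated rather than served through real keys.

```python
import random


def epsilon_greedy(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate showing banners and learning which one gets the most clicks.

    click_rates holds the (hypothetical) true click probability per banner;
    the strategy itself never reads them directly, it only observes clicks.
    """
    rng = random.Random(seed)
    shows = [0] * len(click_rates)
    clicks = [0] * len(click_rates)

    def estimate(i):
        # Banners that were never shown get priority so each is tried once.
        return clicks[i] / shows[i] if shows[i] else float("inf")

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(click_rates))  # explore a random banner
        else:
            arm = max(range(len(click_rates)), key=estimate)  # exploit best so far
        shows[arm] += 1
        if rng.random() < click_rates[arm]:
            clicks[arm] += 1
    return shows, clicks


shows, clicks = epsilon_greedy([0.1, 0.5, 0.9])
```

With rates this far apart, most impressions should end up on the banner with the highest click rate.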
Lecture 3: IPython stack
During this session we explore the Python stack. Python is an all-purpose
language that does more than just data science. We first cover how to
write concise, elegant code before we delve further into how to do data
science with it.
• command line
• notebooks
• python
• jupyter
• pandas + numpy
• matplotlib
hackersession: build a Flask app with pandas
The idea is to build a website that you can query, with pandas as the backend.
Lecture 4: Linear Models
In this session students will be exposed to the theory behind classical linear
models as well as more modern machine learning models. The focus will be to
understand how these models work and to get a feeling of when to use which
model. We will show how to run all the models in both R and Python so people
understand that you can work independently of the language.
• reminder of statistics + assumptions
• how to measure models: training + test
• linear regression
• logistic regression
• tree models
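To make the "training + test" idea concrete, here is a minimal sketch of a one-variable linear regression fitted on a training split and scored on a held-out test split. The synthetic data (true slope 3, intercept 2, unit noise), the 150/50 split and all names are assumptions chosen for illustration only.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: an assumed "true" line plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]

# Fit on the first 150 points, measure the error on the 50 unseen points.
slope, intercept = fit_line(xs[:150], ys[:150])
test_error = mean_squared_error(xs[150:], ys[150:], slope, intercept)
```

The error on the held-out points, not the training error, is the number to compare across models.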
hackersession: automation in regression + code review
Build your own automation script that applies many algorithms to many datasets
and benchmarks them through brute force. You can use a tool that automates this, like
caret, or just build it yourself. The goal is to find the most robust algorithm.
Lecture 5: Optimisation Science
Any self-respecting course on data science should spend at least a day on the
science of optimisation. Operational research is the backbone of many machine
learning algorithms as well as a tool to automate decision making. Today will be
a day we talk about decision engineering.
• the maths behind optimisation
• closed form optimisation
• gradient descent
• linear programming
• stochastic gradient descent
• heuristic/genetic approach
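Gradient descent from the list above can be sketched on a toy problem whose closed-form answer is known, which makes the two approaches easy to compare. The function f(x) = (x - 3)^2, the learning rate and the step count are assumptions picked for this illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def grad_f(x):
    # Derivative of f(x) = (x - 3)^2; the closed-form minimum is x = 3.
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, start=0.0)
```

Here each step shrinks the distance to the minimum by a constant factor of 0.8, so `x_min` ends up very close to the closed-form solution 3.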
hackersession: TSP
We'll play a game. Whoever gets the best travelling salesman solution wins. Pick
any tool you like. Just do whatever and fix this problem within a day. We give
multiple instances of TSP such that students can learn that not every algorithm
will work all the time.
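A possible starting point for the game is the nearest-neighbour heuristic sketched below. It is a simple baseline and not a guaranteed optimum, and the helper names and the tiny four-city instance are made up for this example.

```python
import math


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour_tour(points, start=0):
    """Greedy heuristic: always travel to the closest city not yet visited."""
    unvisited = [i for i in range(len(points)) if i != start]
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(here, points[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


cities = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical instance: a unit square
order = nearest_neighbour_tour(cities)
length = tour_length(cities, order)
```

On other instances the greedy tour can be noticeably longer than the best one, which is the kind of behaviour the multiple TSP instances are meant to expose.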
Lecture 6: Non-Linear Models
Although very popular, linear models often fail. The main reason is that not
every dataset is linearly separable and in this lecture we will delve very
deeply into this.
• linear separability problem
• support vector machines
• factorization machines
• neural network
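The linear separability problem can be illustrated with a single perceptron, a linear classifier, which learns the AND function but cannot learn XOR. This is a sketch with assumed names and a fixed number of epochs, not code taken from the lecture.

```python
def predict(weights, bias, x):
    """Linear decision rule: output 1 when the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, epochs=50):
    """Classic perceptron update rule on two binary inputs."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


def accuracy(samples, weights, bias):
    return sum(predict(weights, bias, x) == t for x, t in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_accuracy = accuracy(AND, *train_perceptron(AND))  # separable: reaches 1.0
xor_accuracy = accuracy(XOR, *train_perceptron(XOR))  # not separable: stays below 1.0
```

No straight line separates the XOR classes, so no amount of extra training fixes the second case; that gap is what the non-linear models in this lecture address.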
hackersession: automation in regression + code review
Build your own automation script that applies many algorithms to many
datasets and benchmarks them through brute force. You can use a tool that
automates this, like caret, or just build it yourself. The goal is to find the
most robust algorithm. At the end of this
Lecture 7: Clustering & Ensemble
In this session we discuss two distinct but important methods in machine learning:
clustering and ensemble models. Clustering involves grouping unlabelled data so that
we can predict phenomena without having labels. Ensemble models combine multiple
models to create a better one. A random forest is an example of an ensemble
model, but it is easy to create your own.
• hierarchical clustering
• kmeans clustering
• HMM clustering
• normalisation
• ensemble theory
• random forests
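As an illustration of the kmeans item above, here is a minimal sketch of Lloyd's algorithm on two-dimensional points. The function name, the random initialisation from the data, the iteration count and the six sample points are assumptions for this example; in practice a library implementation would normally be used.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Two hypothetical, well-separated groups of unlabelled points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
```

Because the distances depend on the scale of each coordinate, the normalisation item in the list above matters before running a method like this on real features.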
hackersession: outperforming ensembles
We will repeat the exercise on the basic algorithms dataset, but now it is your job to build an
ensemble that outperforms a random forest. More difficult datasets will also be handed out;
the goal is to have the students realise that just looking at the data is equally important.
Lecture 8: Natural Language
During this session the students will learn the basics of text mining and
NLP algorithms. Text is a very different data structure, with many use cases that differ
from the ones we usually have. The end goal of this session is to explain how to make a
language detector with basic techniques.
• cleaning text
• regular expressions
• nltk
• tf/idf
• bayesian filter
• word2vec
• clustering documents
hackersession: Markov Generator of Lyrics
We will scrape different websites containing song text and we will then try to train a
markov chain to create random sentences. We will also have other texts available for
the students to play with.
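The Markov chain idea can be sketched with a word-level chain: record which words follow each word, then walk the chain at random. The function names and the toy "lyrics" string are placeholders invented for this example; the scraping step is left out.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map every word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=10, seed=0):
    """Random walk through the chain, stopping at a word with no followers."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


lyrics = "la la land la di da"  # placeholder text, not real scraped lyrics
chain = build_chain(lyrics)
sentence = generate(chain, start="la")
```

Words that follow a given word more often appear more often in its list, so `rng.choice` samples them in proportion to their observed frequency.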
Lecture 9: Time series
The problem of prediction in data science becomes different if time is involved. During this
session we discuss how to benefit from taking a time series approach and what the common
methods are, and we try to create models that can change over time such that they can fit a
real-time setting.
• lag variables
• log transform
• autocorrelation
• moving average (window models)
• moving variance
• arma/arima
• real-time algorithms
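Two of the items above, lag variables and a moving average over a window, can be sketched on a plain list. The helper names and the sample series are made up for this illustration; pandas offers similar operations on real data.

```python
def lag(series, k=1):
    """Shift the series k steps back in time; the first k values are unknown."""
    return [None] * k + series[: len(series) - k]


def moving_average(series, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


prices = [1, 2, 3, 4, 5]  # hypothetical series
lagged = lag(prices)  # [None, 1, 2, 3, 4]
smoothed = moving_average(prices, 3)  # [2.0, 3.0, 4.0]
```

A lagged column lets yesterday's value be used as a feature for today's prediction, while the moving average smooths out short-term noise.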
hackersession: stock prediction
We're gonna play a game. Whoever can predict the stock market best wins. Use whatever
method you like; you can only pick a portfolio of at most 3 stocks.
Lecture 10: Visualisation
Being able to communicate data clearly is important for a data scientist.
The goal of this session is to explain what makes good visualisations
informative and bad visualisations feel like clutter. We will also discuss
how to make interactive visualisations with d3 and how to connect it to
your own backend.
• review of ggplot2 + tufte theory
• interactivity with Shiny
• basics of front end webdev
• d3
• connecting frontend + backend
hackersession: building a custom dashboard
We will give you four interesting datasets and we will leave it up to you to
turn the dataset into an interesting app.
Lecture 11: Making things scale
So far we have only discussed how to handle files that fit on one computer. In
this session we will discuss Hadoop and Spark and how to use them to handle
big datasets. We will focus more on the Spark API because it is more relevant
for data scientists and we will spend a significant amount of time explaining
when to handle something as a big data problem and when you want to avoid it.
• when to refer to big data
• bootstrapping techniques
• hadoop ecosystem/tools (briefly)
• big data spark syntax
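The slides do not spell out which bootstrapping techniques are meant. One common reading is resampling with replacement to judge how uncertain a statistic is, which can show whether a sample already answers the question without the full dataset. The sketch below follows that reading; the function name, resample count and sample data are assumptions.

```python
import random
import statistics


def bootstrap_mean_interval(data, resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `data`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(resamples)
    )
    lower = means[int((alpha / 2) * resamples)]
    upper = means[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper


sample = list(range(1, 101))  # hypothetical sample with mean 50.5
lower, upper = bootstrap_mean_interval(sample)
```

If the interval from a modest sample is already narrow enough for the decision at hand, that is one signal the problem may not need big data tooling.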
hackersession
We let the students datamine a large dataset on a cluster that they will have
• provisioning
• R syntax
• python syntax
Lecture 12: The group determines
There are many possible advanced topics that could be discussed, but we would like to leave
the subject of the last lecture open. Preferably the students will reach consensus on a new
technology (surely there is one by now). Otherwise, one of the following subjects can be
chosen:
advanced topics
• feature creation
• computer vision
• bayesian graphical models
• neo4j vs sql vs nosql
• deep learning
• ethical considerations
• legal considerations
• julia
hackersession:
For the last week, each student can work on any project. We are there to help them with
anything.
End Goal
After the course the following tasks should be a no-brainer for students:
• get basic insights out of a .csv within a day with either Python or R, even if it is a dirty
dataset. This includes things like outlier detection and type casting
• when given a clean dataset, run three different algorithms in a day
with train/test for regression, clustering or classification
• recognise when a dataset is too big to handle
• estimate when a project will take a week (Shiny app) vs a few months (Django
app)
• attend a PyData conference and understand half the talks well enough to
summarize each in a few sentences
• turn a .csv file into a dashboard served as a microservice with an API within a day

 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019GoDataDriven
 

More from GoDataDriven (20)

Streamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature CatalogStreamlining Data Science Workflows with a Feature Catalog
Streamlining Data Science Workflows with a Feature Catalog
 
Visualizing Big Data in a Small Screen
Visualizing Big Data in a Small ScreenVisualizing Big Data in a Small Screen
Visualizing Big Data in a Small Screen
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
 
Training Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organizationTraining Taster: Leading the way to become a data-driven organization
Training Taster: Leading the way to become a data-driven organization
 
My Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics EngineerMy Path From Data Engineer to Analytics Engineer
My Path From Data Engineer to Analytics Engineer
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
How to create a Devcontainer for your Python project
How to create a Devcontainer for your Python projectHow to create a Devcontainer for your Python project
How to create a Devcontainer for your Python project
 
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
Using Graph Neural Networks To Embrace The Dependency In Your Data by Usman Z...
 
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
Common Issues With Time Series by Vadim Nelidov - GoDataFest 2022
 
MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022MLOps CodeBreakfast on AWS - GoDataFest 2022
MLOps CodeBreakfast on AWS - GoDataFest 2022
 
MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022MLOps CodeBreakfast on Azure - GoDataFest 2022
MLOps CodeBreakfast on Azure - GoDataFest 2022
 
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
Tableau vs. Power BI by Juan Manuel Perafan - GoDataFest 2022
 
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
Deploying a Modern Data Stack by Lasse Benninga - GoDataFest 2022
 
AWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de HaanAWS Well-Architected Webinar Security - Ben de Haan
AWS Well-Architected Webinar Security - Ben de Haan
 
The 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven CompaniesThe 7 Habits of Effective Data Driven Companies
The 7 Habits of Effective Data Driven Companies
 
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
DevOps for Data Science on Azure - Marcel de Vries (Xpirit) and Niels Zeilema...
 
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...Artificial intelligence in actions: delivering a new experience to Formula 1 ...
Artificial intelligence in actions: delivering a new experience to Formula 1 ...
 
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't HofSmart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
Smart application on Azure at Vattenfall - Rens Weijers & Peter van 't Hof
 
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
Democratizing AI/ML with GCP - Abishay Rao (Google) at GoDataFest 2019
 

Recently uploaded

Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

DataDrivenAccellerator

  • 1. GoDataDriven PROUDLY PART OF THE XEBIA GROUP Data Science Accellerator Program
  • 2. How we teach Each session we teach will be interactive. Every month we give you one interactive lecture and one hackersession. Both the lecture and hackersession require students to code. The main distinction is that the lectures have more focus on theory whereas the hackersessions have more focus on getting your hands dirty with code. The hackersessions are meant to be fun and engaging while giving the students much freedom. We often notice that the hackersessions end up being the most educational part.
  • 3. Lecture 1: RStudio stack The first session is meant to reintroduce programming by introducing students to the RStudio stack. It will immediately teach them to use the new dplyr syntax and introduce them to a proper work environment. The session will conclude with a visit to GitHub and showing students how to create shareable documents. - intro to programming - explain the concept of a dataframe - r ggplot - r dplyr - rmarkdown - ChickWeight - git hackersession: webscraping with R - rvest + html - type casting - ggplot - dplyr Use R to scrape some video game websites and figure out which Heroes of the Storm character makes the most sense. When this is done we scrape Funda and task people to find the best house in their home town.
  • 4. Lecture 2: Simulation The second session will focus on a review of statistics and probability. We keep the math light and invest much time in simulation exercises. • explain basic probability distributions (normal) • what is wrong with the mean of a distribution? • what is a correlation? • PCA/Covariance • explain simulation cases (birthday problem, casino) • explain Bayesian thinking • basic hypothesis testing hackersession: bandit problem The users get two versions of the bandit problem, one offline and one online, and we turn it into a game. We give users keys and we ask them to figure out which banner is best.
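The birthday problem named among the simulation cases can be sketched in a few lines. This is only an illustration, not course material: the function names, the trial count and the fixed seed are assumptions. It estimates by simulation the chance that at least two people in a group share a birthday, and compares the estimate with the closed-form answer.

```python
import random


def birthday_collision_probability(group_size, trials=20_000, seed=0):
    """Estimate the chance that at least two people in a group share a birthday."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(365) for _ in range(group_size)]
        if len(set(birthdays)) < group_size:  # fewer distinct days than people: a shared birthday
            hits += 1
    return hits / trials


def exact_probability(group_size):
    """Closed-form answer, used here only to check the simulation."""
    p_all_distinct = 1.0
    for k in range(group_size):
        p_all_distinct *= (365 - k) / 365
    return 1.0 - p_all_distinct


estimate = birthday_collision_probability(23)
exact = exact_probability(23)
print(f"simulated: {estimate:.3f}, exact: {exact:.3f}")
```

The simulation ignores leap years and assumes every day is equally likely, which is the usual simplification for this exercise.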
  • 5. Lecture 3: IPython stack During this session we explore the Python stack. Python is an all-purpose language that does more than just data science. We first cover how to write concise, elegant code before we delve further into how to do data science with it. • command line • notebooks • python • jupyter • pandas + numpy • matplotlib hackersession: build a flask app with pandas The idea is to build a website that you can query, with pandas as the backend.
  • 6. Lecture 4: Linear Models In this session students will be exposed to the theory behind classical linear models as well as more modern machine learning models. The focus will be to understand how these models work and to get a feeling of when to use which model. We will show how to run all the models in both R and Python so people understand that you can work independently of the language. • reminder of statistics + assumptions • how to measure models: training + test • linear regression • logistic regression • tree models hackersession: automation in regression + codereview Build your own automation script that applies many algorithms to many datasets and benchmark it through brute force. You can use a tool that automates this like caret or just build it yourself. The goal is to find the most robust algorithm.
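The "training + test" way of measuring a model can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the synthetic data (y = 2x + 1 plus noise), the 75/25 split and the helper names. It fits a one-variable linear regression on the training part only and reports the error on both parts.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given data."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(42)
# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# Shuffle the row indices and hold out the last 25% as a test set.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:150], indices[150:]
train_x, train_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
test_x, test_y = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

slope, intercept = fit_line(train_x, train_y)  # the model never sees the test rows
train_error = rmse(train_x, train_y, slope, intercept)
test_error = rmse(test_x, test_y, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"train RMSE={train_error:.2f} test RMSE={test_error:.2f}")
```

The test error is the number to compare across algorithms, since the training error rewards models that merely memorise the data.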
  • 7. Lecture 5: Optimisation Science Any self-respecting course on data science should spend at least a day on the science of optimisation. Operational Research is the backbone of many machine learning algorithms as well as a tool to automate decision making. Today will be a day we talk about decision engineering. • the maths behind optimisation • closed form optimisation • gradient descent • linear programming • stochastic gradient descent • heuristic/genetic approach hackersession: TSP We'll play a game. Whoever gets the best travelling salesman solution wins. Pick any tool you like. Just do whatever and fix this problem within a day. We give multiple instances of TSP such that students can learn that not every algorithm will work all the time.
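The contrast between closed form optimisation and gradient descent can be shown on a toy function. The function f(x) = (x - 3)^2 + 2, the learning rate and the step count are assumptions picked for the example. Setting the derivative 2(x - 3) to zero gives the minimum at x = 3 directly, and gradient descent reaches the same point iteratively.

```python
def f(x):
    """Toy objective (an assumption for illustration): a parabola with its minimum at x = 3."""
    return (x - 3.0) ** 2 + 2.0


def gradient(x):
    """Derivative of f, worked out by hand."""
    return 2.0 * (x - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient; returns the final position."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


closed_form_minimum = 3.0  # solves gradient(x) == 0 exactly
numerical_minimum = gradient_descent(start=0.0)
print(f"closed form: {closed_form_minimum}, gradient descent: {numerical_minimum:.6f}")
```

With this learning rate each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make the iterates diverge on this function, which is a useful thing to try.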
  • 8. Lecture 6: Non-Linear Models Although very popular, linear models often fail. The main reason is that not every dataset is linearly separable and in this lecture we will delve very deeply into this. • linear separability problem • support vector machines • factorization machines • neural networks hackersession: automation in regression + codereview Build your own automation script that applies many algorithms to many datasets and benchmark it through brute force. You can use a tool that automates this like caret or just build it yourself. The goal is to find the most robust algorithm. At the end of this
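The linear separability problem can be made concrete with the XOR dataset, a standard textbook example that is an assumption here rather than something the slides name. In the sketch below a perceptron, which is a linear classifier, cannot label all four XOR points correctly, but it succeeds once a hand-made product feature x1*x2 is added.

```python
def train_perceptron(samples, labels, epochs=1000):
    """Classic perceptron rule with labels +1/-1; stops once an epoch has no mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): nudge towards y
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def accuracy(samples, labels, weights, bias):
    """Fraction of samples whose predicted sign matches the label."""
    correct = 0
    for x, y in zip(samples, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        predicted = 1 if score > 0 else -1
        correct += predicted == y
    return correct / len(samples)


# XOR: the label is +1 exactly when the two inputs differ.
raw = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

# Same points with an extra product feature x1 * x2, which makes them separable.
expanded = [(x1, x2, x1 * x2) for x1, x2 in raw]

raw_accuracy = accuracy(raw, labels, *train_perceptron(raw, labels))
expanded_accuracy = accuracy(expanded, labels, *train_perceptron(expanded, labels))
print(f"raw features: {raw_accuracy:.2f}, with product feature: {expanded_accuracy:.2f}")
```

The non-linear models listed in the lecture can be seen as ways of obtaining such extra features without writing them by hand.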
  • 9. Lecture 7: Clustering & Ensemble In this session we discuss two distinct but important methods in machine learning: clustering and ensemble models. Clustering involves grouping unlabelled data so that we can predict phenomena without having labels. Ensemble models are models that combine multiple models together to create a better one. An example of an ensemble model is a random forest but it is easy to create your own. • hierarchical clustering • kmeans clustering • HMM clustering • normalisation • ensemble theory • random forests hackersession: outperforming ensembles We will do a repeat of the basic algorithms dataset but now it is your job to build an ensemble that outperforms a random forest. More difficult datasets will also be handed out; the goal is to have the students realise that just looking at the data is equally important.
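As a rough illustration of kmeans clustering, the sketch below implements the usual assign-then-update loop (Lloyd's algorithm) with the standard library only. The two synthetic blobs, the seed and the function names are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """For each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iterations=50, seed=0):
    """Alternate between assigning points and moving centroids until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Synthetic data (an assumption for illustration): two blobs around (0, 0) and (10, 10).
rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]

centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
```

No labels are used anywhere: the algorithm only sees coordinates, which is what distinguishes clustering from the classification models of the earlier lectures.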
  • 10. Lecture 8: Natural Language During this session the students will learn about the basics of data mining texts and NLP algorithms. Texts are a very different data structure with a lot of different use cases than we usually have. The end goal of this session is to explain how to make a language detector with basic techniques. • cleaning text • regular expressions • nltk • tf-idf • Bayesian filter • word2vec • clustering documents hackersession: Markov Generator of Lyrics We will scrape different websites containing song text and we will then try to train a Markov chain to create random sentences. We will also have other texts available for the students to play with.
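A word-level Markov chain of the kind the hackersession describes can be sketched as follows. The tiny corpus is made up for the example and stands in for scraped song text, and the function names are assumptions. The chain records which words follow each word, then generates text by repeatedly sampling a follower.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain


def generate(chain, start, length=8, seed=0):
    """Walk the chain from a start word, picking a random follower at each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: the word only appears at the end of the corpus
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Placeholder corpus (an assumption); real input would be the scraped lyrics.
corpus = "the sun goes down and the night comes in and the stars come out and the sun comes up"

chain = build_chain(corpus)
generated = generate(chain, start="the")
print(generated)
```

Because followers are stored with repetition, frequent word pairs are sampled more often, so no explicit probability table is needed.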
  • 11. Lecture 9: Time series The problem of prediction in data science becomes different if time is involved. During this session we discuss how to benefit from taking a time series approach, what common methods are and we try to create models that can change over time such that they can fit a real time setting. • lag variables • log transform • autocorrelation • moving average (window models) • moving variance • ARMA/ARIMA • real time algorithms hackersession: stock prediction We're gonna play a game. Whoever can predict the stock market the best wins. Use whatever method, you can only pick a portfolio of max 3 stocks.
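Two of the listed building blocks, lag variables and the moving average, are small enough to sketch directly. The price series and the helper names below are made up for illustration.

```python
def lag(series, k):
    """Shift a series k steps back in time; the first k entries have no history."""
    return [None] * k + series[: len(series) - k]


def moving_average(series, window):
    """Mean of each run of `window` consecutive values (no padding at the start)."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical daily closing prices.
prices = [10, 12, 11, 13, 15, 14]

yesterday = lag(prices, 1)            # [None, 10, 12, 11, 13, 15]
smoothed = moving_average(prices, 3)  # [11.0, 12.0, 13.0, 14.0]

# Pair each day with its lagged value: the lag becomes a feature for predicting today.
rows = [(prev, today) for prev, today in zip(yesterday, prices) if prev is not None]
print(rows)
print(smoothed)
```

A lag variable only ever uses past values, so a model trained on such rows can also be applied in a real-time setting where the future is not yet known.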
  • 12. Lecture 10: Visualisation Being able to communicate data clearly is important for a data scientist. The goal of this session is to explain what makes good visualisations informative and bad visualisations feel like clutter. We will also discuss how to make interactive visualisations with d3 and how to connect it to your own backend. • review of ggplot2 + Tufte theory • interactivity with Shiny • basics of front-end webdev • d3 • connecting frontend + backend hackersession: building a custom dashboard We will give you four interesting datasets and we will leave it up to you to turn the dataset into an interesting app.
  • 13. Lecture 11: Making things scale So far we have only discussed how to handle files that fit on one computer. In this session we will discuss Hadoop and Spark and how to use them to handle big datasets. We will focus more on the Spark API because it is more relevant for data scientists and we will spend a significant amount of time explaining when to handle something as a big data problem and when you want to avoid it. • when to refer to big data • bootstrapping techniques • Hadoop ecosystem/tools (briefly) • big data Spark syntax hackersession We let the students datamine a large dataset on a cluster that they will have • provisioning • R syntax • Python syntax
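One reading of "bootstrapping techniques" is statistical: resample a modest sample with replacement to judge how uncertain an estimate is, rather than processing the full dataset. That reading is an assumption, as are the sample values and function names below. The sketch computes a percentile bootstrap interval for a mean.

```python
import random
import statistics


def bootstrap_mean_interval(sample, resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of a sample."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))  # resample with replacement
        for _ in range(resamples)
    )
    lower_index = int((1 - confidence) / 2 * resamples)
    upper_index = int((1 + confidence) / 2 * resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical sample drawn from a much larger dataset.
rng = random.Random(7)
sample = [rng.gauss(50, 10) for _ in range(200)]

lower, upper = bootstrap_mean_interval(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}, 95% interval: ({lower:.2f}, {upper:.2f})")
```

If the interval from a small sample is already narrow enough for the decision at hand, the question may not need to be treated as a big data problem at all.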
  • 14. Lecture 12: The group determines There are many possible advanced topics that could be discussed but we would like to leave the subject of the last lecture open. Preferably the students will find consensus in a new technology (surely there is one by now). Otherwise, one of the following subjects can be chosen: advanced topics • feature creation • computer vision • Bayesian graphical models • Neo4j vs SQL vs NoSQL • deep learning • ethical considerations • legal considerations • Julia hackersession: For the last week, each student can work on any project. We are there to help them with anything.
  • 15. End Goal After the course the following tasks should be a no-brainer for students: • get basic insights out of a .csv within a day, even if it is a dirty dataset, with either Python or R. This includes things like outlier detection and type casting • when given a clean dataset, the candidate will be able to run three different algorithms in a day with train/test for regression, clustering or classification • recognise when a dataset is too big to handle • be able to estimate when a project will take a week (Shiny app) vs a few months (Django app) • be able to attend a PyData conference and understand half the talks to the degree that they can summarize them in a few sentences • turn a .csv file into a dashboard as a microservice with an API within a day