SAME DATA.
BETTER RESULTS.
PAUL SALAZAR
PAUL@SKYTREE.NET
SKYTREE’S FOCUS
PRODUCTION GRADE MACHINE LEARNING
Machine learning: the modern science of finding patterns and making predictions from data.
Also known as: multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.
Machine Learning Use Cases
Predict categories and classes: Classification
Predict values and numbers: Regression
Grouping and segmentation: Clustering
Detection and characterization: Density Estimation
Visualization and reduction: Dimension Reduction
Find similar items: Multidimensional Querying
Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
Recommendations, Predictions, Outlier Detection
What are the current options for ML for Big Data?
1.  Just use a subset of the data!
–  e.g. just take the first 1,000 rows. Result to expect: Capture only the broadest patterns. → Lower accuracy.
2.  Just use a simple ML method!
–  e.g. use logistic regression instead of nonlinear SVM. Result to expect: Entire types of patterns cannot be found. → Lower accuracy.
3.  Just use simple parallelism/MapReduce!
–  i.e. replace all the for-loops with parallel ones. Result to expect: Only the simplest of ML methods (not O(N^2)/O(N^3)) can be significantly sped up this way. → See #2.
4.  Just throw it in the cloud!
–  i.e. somehow use the large compute power of the cloud. Result to expect: The cost of sending it to the cloud is even greater than the compute cost. → See #1. See also #3.
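To make options 1 and 3 concrete, here is a rough operation-count sketch. It assumes a method that performs about N^2 pairwise operations and perfectly efficient parallelism with no communication cost; both are idealizations, and the helper names `pairwise_ops` and `ideal_parallel_ops` are hypothetical, not taken from the slides.

```python
def pairwise_ops(n: int) -> int:
    """Approximate operation count for a method that touches every pair of rows."""
    return n * n


def ideal_parallel_ops(n: int, workers: int) -> float:
    """Per-worker operation count under perfect, overhead-free parallelism (an idealization)."""
    return pairwise_ops(n) / workers


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(f"N={n:>9,}: {pairwise_ops(n):.1e} pairwise operations")
    # Going from 1,000 rows to 1,000,000 rows multiplies the work by a million,
    # so even 1,000 ideal workers leave each one with 1,000x the original job.
    print(ideal_parallel_ops(1_000_000, 1_000) / pairwise_ops(1_000))
```

Under these assumptions, adding workers divides the cost by a constant, while growing N multiplies it quadratically, which is why the slide sends option 3 back to option 2.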
Skytree’s Unique Differentiation: Fundamental Technology Breakthrough
Complexity of State-of-the-Art Machine Learning methods:
1.  Querying: all-nearest-neighbors O(N^2)
2.  Density estimation: kernel density estimation O(N^2), kernel conditional density estimation O(N^3)
3.  Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N^2), kernel discriminant O(N^2), support vector machine O(N^3)
4.  Regression: linear regression, LASSO, kernel regression O(N^2), regression tree, Gaussian process regression O(N^3)
5.  Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3); Gaussian graphical models, discrete graphical models
6.  Clustering: k-means, mean-shift O(N^2), hierarchical clustering O(N^3)
7.  Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), n=2, 3, 4, …
►  Unfortunately O(N^2), O(N^3) are computationally prohibitive for big data
Skytree has invented a way to reduce the complexity of the above methods from O(N^2) and O(N^3) to O(N) or O(N log N).
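The slides do not say how this reduction is achieved. As a purely illustrative sketch (not Skytree's algorithm), the one-dimensional all-nearest-neighbors problem shows how restructuring a computation can turn O(N^2) into O(N log N): once the points are sorted, each point's nearest neighbor must be one of its two sorted neighbors. The function names below are hypothetical, and both functions assume at least two points.

```python
import random
from typing import List


def all_nn_brute(xs: List[float]) -> List[float]:
    """O(N^2): compare every point with every other point (needs len(xs) >= 2)."""
    out = []
    for i, x in enumerate(xs):
        out.append(min(abs(x - y) for j, y in enumerate(xs) if j != i))
    return out


def all_nn_sorted(xs: List[float]) -> List[float]:
    """O(N log N): sort once, then look only at adjacent sorted values (needs len(xs) >= 2)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    for rank, i in enumerate(order):
        gaps = []
        if rank > 0:
            gaps.append(xs[i] - xs[order[rank - 1]])  # gap to the left neighbor
        if rank < len(order) - 1:
            gaps.append(xs[order[rank + 1]] - xs[i])  # gap to the right neighbor
        out[i] = min(gaps)
    return out


if __name__ == "__main__":
    points = [random.random() for _ in range(500)]
    # Both functions return each point's distance to its nearest neighbor.
    assert all_nn_brute(points) == all_nn_sorted(points)
    print("brute-force and sorted versions agree")
```

In higher dimensions sorting alone no longer works; space-partitioning trees are one commonly used substitute, though whether Skytree uses them is not stated in this deck.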
Performance
Up to 10,000x speedups (on one CPU)
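For a sense of scale, the ratio between an O(N^2) and an O(N log N) operation count can be worked out directly. This is a back-of-the-envelope estimate that ignores constant factors and assumes a base-2 logarithm; it is not a reconstruction of how the 10,000x figure was measured.

```latex
\frac{N^2}{N \log_2 N} = \frac{N}{\log_2 N},
\qquad
N = 10^6:\quad \frac{10^6}{\log_2 10^6} \approx \frac{10^6}{19.93} \approx 5.0 \times 10^4 .
```

So a purely algorithmic speedup in the tens of thousands is plausible on a single CPU once N reaches the millions, before any parallelism is applied.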
How Does Skytree Do This?
Deep knowledge of algorithms: drawing from the latest from academia
Smart programming: efficient ways to compute O(N^2) and O(N^3) methods
Distributed systems: take advantage of parallel computing speed
Team
EXECUTIVE TEAM
Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
Alexander Gray, PhD, CTO & Co-Founder: Leading Light for Large-Scale, Fast Algorithms
Paul Salazar, VP Sales: RedHat, Greenplum
Leland Wilkinson, PhD, VP Data Visualization: Creator of SYSTAT (SPSS/IBM)
Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle
BOARD OF DIRECTORS
Rick Lewis, USVP
Noah Doyle, Javelin Venture Partners
David Toth, Founder and CEO NetRatings (Nielsen)
TECH ADVISORY BOARD
Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
Prof. James Demmel, UC Berkeley: high-performance computing
INVESTORS
USVP, Javelin Venture Partners, Scott McNealy, UPS
Product Overview
Skytree Adviser for Desktop: Data Science for Everyone
Skytree Server for Enterprises: Enterprise Machine Learning
Advanced Analytics:
•  Predict Categories/Classes
•  Detect Anomalies
•  Find Trends
•  Predict Values/Numbers
•  Identify Patterns
•  Find Outliers
Thank you for learning about Skytree
Read more at www.skytree.net
•  We’re hiring: check out our careers page.
•  Download Skytree Adviser for Free.
•  Pick up a T-Shirt.

Skytree big data london meetup - may 2013
