New Directions for Mahout
MapR Technologies
1.
New Directions in Mahout
©MapR Technologies - Confidential
2.
Cut Out Bloat
4.
Bloat is Leaving in 0.7
Lots of abandoned code in Mahout
  – average code quality is poor
  – no users
  – no maintainers
  – why do we care?
Examples
  – old LDA
  – old Naïve Bayes
  – genetic algorithms
If you care, get on the mailing list
0.7 is about to be released
5.
Integration of Collections
6.
Nobody Cares about Collections
We need it, math is built on it
Pull it into math
Broke the build (battle of the code expanders)
Fixed now (thanks Grant)
7.
K-nearest Neighbor with Super Fast k-means
8.
What’s that?
Find the k nearest training examples
Use the average value of the target variable from them
This is easy … but hard
  – easy because it is so conceptually simple and you don’t have knobs to turn or models to build
  – hard because of the stunning amount of math
  – also hard because we need top 50,000 results
Initial prototype was massively too slow
  – 3K queries x 200K examples takes hours
  – needed 20M x 25M in the same time
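The two steps above (find the k nearest training examples, then average their target values) can be sketched as a brute-force baseline. This is only an illustration, not code from the deck: the function name, the Euclidean distance, and the toy data are assumptions. It also shows why the naive approach is slow, since every query scans every example.

```python
import heapq
import math


def knn_predict(query, examples, k):
    """Average the targets of the k training examples nearest to query.

    examples is a list of (vector, target) pairs. Euclidean distance is an
    assumption made for this sketch; the slides do not name a metric.
    """
    if k <= 0 or not examples:
        raise ValueError("need k > 0 and at least one example")
    # Brute force: one distance computation per training example per query.
    nearest = heapq.nsmallest(k, examples, key=lambda ex: math.dist(query, ex[0]))
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical toy data: three points near the origin and one far away.
training = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((10.0, 10.0), 100.0),
]
prediction = knn_predict((0.1, 0.1), training, k=3)  # averages 1.0, 2.0, 3.0
```

With q queries and n examples this costs q x n distance computations, which is the cost the faster searchers on the following slides try to avoid.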
10.
How We Did It
2-week hackathon with 6 developers from customer bank
Agile-ish development
To avoid IP issues
  – all code is Apache Licensed (no ownership question)
  – all data is synthetic (no question of private data)
  – all development done on individual machines, hosting on GitHub
  – open is easier than closed (in this case)
Goal is new open technology to facilitate new closed solutions
Ambitious goal of ~ 1,000,000 x speedup
  – well, really only 100-1000x after basic hygiene
11.
What We Did
Mechanism for extending Mahout Vectors
  – DelegatingVector, WeightedVector, Centroid
Searcher interface
  – ProjectionSearch, KmeansSearch, LshSearch, Brute
Super-fast clustering
  – Kmeans, StreamingKmeans
12.
Projection Search
java.util.TreeSet!
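The slide gives only the name of the technique and the sorted-set hint, so the following is a rough sketch of one common way to do projection search, not the Mahout implementation. It assumes that points are projected onto a few random directions, that each set of projections is kept in sorted order (a sorted list with `bisect` stands in for the TreeSet), and that candidates near the query's projection are re-ranked by true distance. The class name, the `window` parameter, and the Gaussian directions are all hypothetical.

```python
import bisect
import math
import random


class ProjectionSearchSketch:
    """Approximate nearest-neighbor search via sorted random projections."""

    def __init__(self, points, num_projections=3, seed=0):
        rng = random.Random(seed)
        dim = len(points[0])
        self.points = list(points)
        # Random directions; Gaussian components are an assumption.
        self.directions = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_projections)
        ]
        # One sorted list of (projection, point index) per direction,
        # playing the role of the sorted set mentioned on the slide.
        self.sorted_projections = [
            sorted((self._dot(d, p), i) for i, p in enumerate(self.points))
            for d in self.directions
        ]

    @staticmethod
    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def search(self, query, k, window=4):
        """Return indices of up to k candidate points, nearest first."""
        candidates = set()
        for d, proj in zip(self.directions, self.sorted_projections):
            pos = bisect.bisect_left(proj, (self._dot(d, query), -1))
            lo, hi = max(0, pos - window), min(len(proj), pos + window)
            candidates.update(i for _, i in proj[lo:hi])
        # Re-rank the small candidate set by true distance.
        ranked = sorted(candidates, key=lambda i: math.dist(query, self.points[i]))
        return ranked[:k]
```

More projections or a wider window raise the chance of finding the true neighbors at the cost of more distance computations, which is the trade-off behind the question on the next slide.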
13.
How Many Projections?
14.
K-means Search
Simple Idea
  – pre-cluster the data
  – to find the nearest points, search the nearest clusters
Recursive application
  – to search a cluster, use a Searcher!
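As a minimal sketch of the idea above, assume the centroids have already been computed by some clustering step. Points are bucketed by nearest centroid, and a query scans only the buckets of its nearest few centroids. The function names and the `probes` parameter are hypothetical, and the brute-force scan inside each bucket is where a recursive Searcher could be plugged in.

```python
import math


def build_index(points, centroids):
    """Assign every point to its nearest centroid; return one bucket per centroid."""
    buckets = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        buckets[nearest].append(p)
    return buckets


def kmeans_search(query, centroids, buckets, k, probes=2):
    """Return up to k approximate nearest points, scanning only `probes` clusters."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    # Brute-force scan inside the chosen clusters; another searcher could
    # replace this step for the recursive variant.
    candidates = [p for c in order[:probes] for p in buckets[c]]
    return sorted(candidates, key=lambda p: math.dist(query, p))[:k]


# Hypothetical toy data: two well-separated groups and their assumed centroids.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = [(0.3, 0.3), (10.3, 10.3)]
buckets = build_index(points, centroids)
result = kmeans_search((9.0, 9.0), centroids, buckets, k=2, probes=1)
```

The result is approximate: a true neighbor that sits in an unprobed cluster is missed, so `probes` trades accuracy for speed.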
20.
But This Requires k-means!
Need a new k-means algorithm to get speed
  – Hadoop is very slow at iterative map-reduce
  – Maybe Pregel clones like Giraph would be better
  – Or maybe not
Streaming k-means is
  – One pass (through the original data)
  – Very fast (20 μs per data point with threads)
  – Very parallelizable
21.
How It Works
For each point
  – Find approximately nearest centroid (distance = d)
  – If d > threshold, new centroid
  – Else possibly new cluster
  – Else add to nearest centroid
If centroids > K ~ C log N
  – Recursively cluster centroids with higher threshold
Result is large set of centroids
  – these provide approximation of original distribution
  – we can cluster centroids to get a close approximation of clustering original
  – or we can just use the result directly
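The steps above can be sketched in a few lines. This is an illustrative reading of the slide, not the Mahout code. Several details are assumptions: "possibly new cluster" is modeled as a split with probability d / threshold, the threshold grows by a fixed factor when the centroid count exceeds its cap, `max_centroids` stands in for K ~ C log N, and the nearest centroid is found by brute force rather than by an approximate searcher.

```python
import math
import random


def _insert(centroids, vector, weight, threshold, rng):
    """Add one weighted point: start a new centroid or merge into the nearest."""
    if centroids:
        nearest = min(centroids, key=lambda c: math.dist(vector, c[0]))
        d = math.dist(vector, nearest[0])
        # Assumed rule: beyond the threshold always start a new centroid;
        # within it, start one with probability d / threshold.
        if d <= threshold and rng.random() >= d / threshold:
            total = nearest[1] + weight
            nearest[0] = [
                (a * nearest[1] + b * weight) / total
                for a, b in zip(nearest[0], vector)
            ]
            nearest[1] = total
            return
    centroids.append([list(vector), weight])


def streaming_kmeans(points, max_centroids, threshold, seed=0, growth=1.5):
    """One pass over points; returns a list of [centroid_vector, weight]."""
    if threshold <= 0 or max_centroids < 1:
        raise ValueError("need threshold > 0 and max_centroids >= 1")
    rng = random.Random(seed)
    centroids = []
    for p in points:
        _insert(centroids, p, 1.0, threshold, rng)
        # Too many centroids: raise the threshold and re-cluster the
        # centroids themselves, carrying their weights along.
        while len(centroids) > max_centroids:
            threshold *= growth
            old, centroids = centroids, []
            for vector, weight in old:
                _insert(centroids, vector, weight, threshold, rng)
    return centroids
```

The weights record how many original points each centroid absorbed, which is what lets the centroid set stand in for the original distribution or be clustered again afterwards.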
22.
Parallel Speedup?
[Figure: time per point (μs) versus number of threads, comparing the threaded version, the non-threaded version, and perfect scaling]
24.
Warning, Recursive Descent
Inner loop requires finding nearest centroid
With lots of centroids, this is slow
But wait, we have classes to accelerate that!
(Let’s not use k-means searcher, though)
25.
Pig Vector
27.
What is it?
Supports Pig access to Mahout functions
So far text vectorization
And classification
And model saving
Kind of works (see pigML from Twitter for better function)
28.
Compile and Install
Start by compiling and installing mahout in your local repository:
    cd ~/Apache
    git clone https://github.com/apache/mahout.git
    cd mahout
    mvn install -DskipTests
Then do the same with pig-vector:
    cd ~/Apache
    git clone git@github.com:tdunning/pig-vector.git
    cd pig-vector
    mvn package
29.
Tokenize and Vectorize Text
Tokenization is done using a text encoder
  – the dimension of the resulting vectors (typically 100,000-1,000,000)
  – a description of the variables to be included in the encoding
  – the schema of the tuples that pig will pass together with their data types
Example:
    define EncodeVector org.apache.mahout.pig.encoders.EncodeVector('10', 'x+y+1', 'x:numeric, y:word, z:text');
You can also add a Lucene 3.1 analyzer in parentheses if you want something fancier
31.
The Formula
Not normal arithmetic
Describes which variables to use, whether offset is included
Also describes which interactions to use
  – but that doesn’t do anything yet!
32.
Load and Encode Data
Load the data
    a = load '/Users/tdunning/Downloads/NNBench.csv' using PigStorage(',') as (x1:int, x2:int, x3:int);
And encode it
    b = foreach a generate 1 as key, EncodeVector(*) as v;
Note that the true meaning of * is very subtle
Now store it
    store b into 'vectors.dat' using com.twitter.elephantbird.pig.store.SequenceFileStorage (
        '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
        '-c com.twitter.elephantbird.pig.util.GenericWritableConverter -t org.apache.mahout.math.VectorWritable');
33.
Train a Model
Pass previously encoded data to a sequential model trainer
    define train org.apache.mahout.pig.LogisticRegression(
        'iterations=5, inMemory=true, features=100000, categories=alt.atheism comp.sys.mac.hardware rec.motorcycles sci.electronics talk.politics.guns comp.graphics comp.windows.x rec.sport.baseball sci.med talk.politics.mideast comp.os.ms-windows.misc misc.forsale rec.sport.hockey sci.space talk.politics.misc comp.sys.ibm.pc.hardware rec.autos sci.crypt soc.religion.christian talk.religion.misc');
Note that the argument is a string with its own syntax
34.
Reservations and Qualms
Pig-vector isn’t done
And it is ugly
And it doesn’t quite work
And it is hard to build
But there seems to be promise
35.
Potential
Add Naïve Bayes Model?
Somehow simplify the syntax?
Try a recent version of elephant-bird?
Switch to pigML?
36.
Contact:
  – tdunning@maprtech.com
  – @ted_dunning
Slides and such:
  – http://info.mapr.com/ted-bbuzz-2012
Hash tags: #bbuzz #mahout