© 2014 MapR Technologies 1
A typical encounter with a potential Mahout user
Which leads us to the Mahout 1.0 vision
Example: Cooccurrence Analysis
How often do items co-occur?
// load distributed matrix
val A = drmFromHDFS(...)
// compute co-occurrences
val C = A.t %*% A
Under the covers:
• Optimizer rewrites the matrix multiplication and transpose operations to a TransposeSelf operator
• Optimizer chooses from two physical operators for TransposeSelf
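
To illustrate why A.t %*% A can be fused into one operator, here is a minimal sketch in plain Scala arrays. It is not Mahout's DSL and not one of its physical operators; the object and function names are hypothetical. It assumes rows are users and columns are items, so entry (i, j) of the result counts the users who interacted with both item i and item j. The fused version relies on the identity that A' * A is the sum, over the rows of A, of the outer product of each row with itself, so no transposed copy is needed.

```scala
object CooccurrenceSketch {
  // Rows are users, columns are items; 1.0 marks an interaction.
  type Matrix = Array[Array[Double]]

  // Naive route, step 1: materialize the transpose.
  def transpose(a: Matrix): Matrix =
    Array.tabulate(a(0).length, a.length)((j, i) => a(i)(j))

  // Naive route, step 2: ordinary dense matrix product.
  def multiply(x: Matrix, y: Matrix): Matrix =
    Array.tabulate(x.length, y(0).length) { (i, j) =>
      y.indices.map(k => x(i)(k) * y(k)(j)).sum
    }

  // Fused route: accumulate the outer product of every row with itself.
  // Zero entries are skipped, which is what makes this cheap for sparse rows.
  def transposeSelf(a: Matrix): Matrix = {
    val n = a(0).length
    val c = Array.ofDim[Double](n, n)
    for (row <- a; i <- 0 until n if row(i) != 0.0; j <- 0 until n)
      c(i)(j) += row(i) * row(j)
    c
  }

  def main(args: Array[String]): Unit = {
    val a: Matrix = Array(
      Array(1.0, 1.0, 0.0),
      Array(1.0, 0.0, 1.0),
      Array(1.0, 1.0, 0.0)
    )
    // prints:
    // 3.0 2.0 1.0
    // 2.0 2.0 0.0
    // 1.0 0.0 1.0
    transposeSelf(a).foreach(r => println(r.mkString(" ")))
  }
}
```

The diagonal holds the number of interactions per item, and the off-diagonal entries are the co-occurrence counts. Because each row contributes independently, the sum can be computed per partition and then combined, which is one plausible reason a distributed optimizer would prefer the fused form.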
Which items co-occur anomalously?
// compute & broadcast number
// of interactions per item
val numInteractions = drmBroadcast(A.colSums)
// create indicator matrix
val I = C.mapBlock() {
  case (keys, block) =>
    // allocate sparse block of indicator matrix
    val indicatorBlock = sparse(block.nrow, block.ncol)
    // compute indicators with loglikelihood ratio test
    for (row <- block)
      indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
    keys -> indicatorBlock
}
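
The slide does not show computeLLR. The following is a sketch of what such a function might compute, assuming the standard log-likelihood ratio test on a 2x2 contingency table, written with entropies of raw counts. All names here are hypothetical, and numUsers is an extra parameter that the slide's signature does not have; it is needed to fill in the fourth cell of the table.

```scala
object LlrSketch {
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized entropy of a list of counts: N ln N minus the sum of k ln k.
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // k11: both items together, k12: first item only,
  // k21: second item only,   k22: neither item.
  def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb floating-point round-off.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Scores one row of the co-occurrence matrix C for the given item.
  // numInteractions(j) is the interaction count of item j (the column sums of A).
  def llrRow(cRow: Array[Long], item: Int,
             numInteractions: Array[Long], numUsers: Long): Array[Double] =
    cRow.indices.map { j =>
      if (j == item) 0.0
      else {
        val k11 = cRow(j)
        val k12 = numInteractions(item) - k11
        val k21 = numInteractions(j) - k11
        val k22 = numUsers - numInteractions(item) - numInteractions(j) + k11
        logLikelihoodRatio(k11, k12, k21, k22)
      }
    }.toArray
}
```

A score near zero means the two items co-occur about as often as independence would predict. A large score marks an anomalous pair, which is what the indicator matrix is meant to keep; choosing the cutoff is left to the caller.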
Runtime
• prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell
• future: add Stratosphere
  – project proposed to Apache Incubator recently
  – similar to Apache Spark, adds data flow optimization and efficient out-of-core execution
How Does This Apply?
How Can I Start?
Q&A
@ted_dunning @mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Possible Visions for Mahout 1.0


Editor's notes

  1. I just have 5 minutes for this talk. Given the short time I thought I’d share with you some of the more interesting things you can do with Hadoop in 5 minutes or less…
  2. In 1 minute you can perform 4.73 million concurrent authentications in the largest biometric database in the world. In India, there is no social security card. It’s difficult for the average citizen to set up a bank account, access benefit programs, and enjoy economic mobility. It’s difficult for the government as well, with over $1B of government aid classified as leakage, the result of fraud and corruption. The Aadhaar program is poised to change all that by leveraging the unique IDs that all people are born with. The program aims to get fingerprints and retina scans for all 1.2 billion citizens. The scale of this project required an integrated in-Hadoop database that was capable of 200 millisecond response times while supporting millions of concurrent look-ups.