Strata 2012 Million Monkeys


The document discusses randomness and the infinite monkey theorem through three key points: 1) With enough random combinations, even unlikely events become probable, like monkeys randomly typing Shakespeare. 2) Hadoop has near-linear scalability, allowing computational power and storage to increase predictably by simply adding more nodes, unlike relational databases. 3) This scalability provides business value by enabling applications to expand without massive engineering efforts or code rewrites.


1. Given Enough Monkeys
Some Thoughts on Randomness
Jesse Anderson | Cloudera, Instructor

2. Infinite Monkey Theorem

3. Million Monkeys Algorithm

Randomly generate a 9-character group:

TOBEORNOT

Does it exist in Shakespeare?
To be, or not to be- that is the question
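The algorithm on this slide is "generate, then compare." A minimal Python sketch of that loop follows. It assumes that only the letters A-Z count (case, spaces and punctuation are stripped before comparing) and tracks which characters of the text have been matched so far, in the spirit of the notes' "every work of Shakespeare created." The names `normalize`, `random_group`, `mark_matches` and `run_monkeys` are hypothetical. This is not the project's actual Hadoop implementation.

```python
import random
import string

# Assumption: only the 26 letters A-Z count; case, spaces and punctuation
# are ignored when comparing against the text (the slide does not say).
ALPHABET = string.ascii_uppercase
GROUP_LENGTH = 9  # the slide generates 9-character groups


def normalize(text: str) -> str:
    """Uppercase the text and keep only the letters A-Z."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)


def random_group(rng: random.Random, length: int = GROUP_LENGTH) -> str:
    """Generate one random group of uppercase letters (one 'monkey' attempt)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mark_matches(group: str, corpus: str, covered: list[bool]) -> bool:
    """Mark every corpus position covered by an occurrence of `group`.

    Returns True if `group` occurs anywhere in `corpus`.
    """
    found = False
    start = corpus.find(group)
    while start != -1:
        found = True
        for i in range(start, start + len(group)):
            covered[i] = True
        start = corpus.find(group, start + 1)  # overlapping matches count too
    return found


def run_monkeys(text: str, attempts: int, seed: int = 0) -> float:
    """Try `attempts` random groups; return the fraction of the text covered."""
    corpus = normalize(text)
    covered = [False] * len(corpus)
    rng = random.Random(seed)
    for _ in range(attempts):
        mark_matches(random_group(rng), corpus, covered)
    return sum(covered) / len(corpus) if corpus else 0.0


if __name__ == "__main__":
    line = "To be, or not to be- that is the question"
    corpus = normalize(line)
    covered = [False] * len(corpus)
    print(corpus)                                      # TOBEORNOTTOBETHATISTHEQUESTION
    print(mark_matches("TOBEORNOT", corpus, covered))  # True
    print(sum(covered))                                # 9
```

Stripping punctuation is what lets a random group like TOBEORNOT match "To be, or not"; a stricter comparison that kept spaces and commas would need a larger alphabet and far more attempts.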

4. Exponential Growth (aka Big Data)

The odds of finding a group of contiguous characters are 1 in 26 (characters) raised to the power of the number of contiguous characters, i.e. 1 in 26^n.

Contiguous characters (n)    Combinations (26^n)
 8                           208,827,064,576
 9                           5,429,503,678,976
10                           141,167,095,653,376
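The table above is just 26^n. The short Python sketch below reproduces it and shows how "unlikely" turns into "probable" as the number of random attempts grows. It assumes every random group is drawn independently and uniformly from A-Z. The function names are hypothetical.

```python
import math

ALPHABET_SIZE = 26  # letters A-Z, as on the slide


def combinations(n: int) -> int:
    """Number of distinct groups of n contiguous characters: 26 ** n.

    The odds of one random group matching a specific target are 1 in this number.
    """
    return ALPHABET_SIZE ** n


def chance_of_hit(n: int, attempts: int) -> float:
    """Probability that at least one of `attempts` independent random
    n-character groups equals one specific target group.

    Computes 1 - (1 - p) ** attempts with p = 1 / 26 ** n, using log1p/expm1
    because p is far too small for plain floating-point subtraction.
    """
    p = 1 / combinations(n)
    return -math.expm1(attempts * math.log1p(-p))


if __name__ == "__main__":
    # Reproduce the slide's table.
    for n in (8, 9, 10):
        print(f"{n:>2}  {combinations(n):,}")
    # A single 9-character target becomes likely once attempts approach 26 ** 9.
    for attempts in (10**12, 10**15, combinations(9), 10 * combinations(9)):
        print(f"{attempts:,} attempts: {chance_of_hit(9, attempts):.4f}")
```

With exactly 26^9 attempts the chance of hitting one specific 9-character group is about 1 - 1/e, roughly 63 percent. Each extra character multiplies the work by 26, which is why the slide frames this as exponential growth.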

5. Data Bias?

6. Hadoop Scalability

[Chart: percent of linear scalability (y-axis, 0 to 100 percent) versus number of nodes (x-axis, 1 to 20) for RDBMS (relational database) and Hadoop.]
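The slide does not define "percent of linear scalability." One common definition, assumed here, is parallel efficiency: the speedup on n nodes (one-node run time divided by n-node run time) as a percentage of the ideal speedup, which is n. The minimal Python sketch below uses hypothetical function names and applies the rounded figures from the speaker notes (1.5 months on one computer, 3.5 days on 20 nodes), assuming a 30-day month.

```python
def speedup(time_one_node: float, time_n_nodes: float) -> float:
    """How many times faster the job ran on n nodes than on one node."""
    return time_one_node / time_n_nodes


def percent_of_linear(time_one_node: float, time_n_nodes: float, nodes: int) -> float:
    """Speedup as a percentage of ideal linear speedup (which equals `nodes`).

    100 means doubling the nodes halves the run time every time.
    """
    return 100 * speedup(time_one_node, time_n_nodes) / nodes


if __name__ == "__main__":
    # Rounded figures from the speaker notes; the 30-day month is an assumption.
    one_node_days = 1.5 * 30
    twenty_node_days = 3.5
    print(f"speedup: {speedup(one_node_days, twenty_node_days):.1f}x")
    print(f"percent of linear: {percent_of_linear(one_node_days, twenty_node_days, 20):.1f}%")
```

Because the notes' run times are rounded, this is only a rough check. The chart's own per-node measurements cannot be recovered from this page.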

7. Business Value of Scalability

Scaling does not require massive re-engineering and complete rewrites of code.
Adding more computers to the cluster gets a predictable increase in computational power and storage.

8. Going Viral (and taking over the world)

Covered internationally in BBC, Wall Street Journal, Wired and Slashdot.
26,000 unique visits from 119 countries in one day.

9. @jessetanderson



Editor's Notes

Interesting statistical question, thought about since Aristotle. Randomness + Resources + Time = Anything Possible. No real monkeys; we need virtual monkeys.
Lucky monkey. The monkey wears a lot of hats: he generates and then compares. Every work of Shakespeare was created. The first was A Lover's Complaint and the last was The Taming of the Shrew. Visualization to find your favorite line from Shakespeare.
Shakespeare was lazy. He heavily influenced English literature. Big Data isn't always a huge file; it can be high computation.
Creating Shakespeare is not a business, and you don't have Shakespeare in your data. If you look hard enough, you will find it. Humans are not random. You want to be looking for what's actually there, so check your assumptions. Operate with the scientific method: form a hypothesis, then test the hypothesis against the data. Offer what customers are looking for, not what you think they want, your favorite, or a new product; offer only what your data shows.
This is not a map of MT and ID. 1- to 20-node testing. Keep efficiency up; RDBMS efficiency is in the gutter.
Engineers are not spending time coding to scale; they are busy adding new features. No code changes for scaling. It took 1.5 months on one computer and 3.5 days on 20 nodes. Spending on new computers gives a consistent, linear increase. Compare spending on RDBMS and Hadoop.
We like to ask bigger questions. I asked whether Shakespeare could be randomly recreated by a bunch of virtual monkeys. The answer is yes.
