Big Data Science: Intro and Benefits

What is Big Data? What is Data Science? What are the benefits? How will they evolve in my organisation?

Built around the premise that investing in big data costs far less than not having it, this presentation, given at a tech media industry event, explores the nuances of Big Data and Data Science and the synergy between them that forms Big Data Science. It highlights the benefits of investing in both and outlines a path for their evolution within most organisations.

  1. BIG DATA SCIENCE. Chandan Rajah, CEO, Parallel AI. “The price of light is far less than the cost of darkness”
  2. BENEFITS OF BIG DATA: COST, SPEED, AGILITY, CAPABILITY
  3. BIG DATA JOURNEY: WHERE, WHAT, WHY, HOW
  4. What is Big Data? Big Data ≠ Data Volume. Big Data = Crude Oil. Think of data like ‘crude oil’: Big Data is about extracting the ‘crude oil’, transporting it in ‘pipelines’ and storing it in ‘mega tanks’. Source: Data Science London
  5. What is Data Science? Data Science ≠ Statistical Analysis. Data Science = Oil Refinery. Data science is about ‘treating’ the data, applying ‘science’ to it, refining the ‘results’ and combining them to form ‘insight’. Source: Data Science London
  6. What is the Big Data Science Toolkit?
     • Scala, Java, Python, R… (bonus: Clojure, Haskell, Erlang)
     • Hadoop, HDFS, MapReduce… (bonus: Spark, Storm, Tez)
     • Scalding, HBase, Hive… (bonus: Shark, Titan, Giraph)
     • Flume, Sqoop, ETL, Webscrapers… (bonus: Hume)
     • SQL, RDBMS, DW, OLAP… (bonus: SOLR, ElasticSearch)
     • Knime, Weka, RapidMiner… (bonus: SciPy, NumPy, Pandas)
     • D3.js, Kibana, ggplot2, Flare… (bonus: Shiny, Flare, Datameer)
     • NoSQL, MongoDB, Cassandra, CouchDB
     • And sometimes… MS Excel
     Source: Data Science London
  7. Knowns, Unknowns & DIKUW FTW!
     • known knowns: we know we know
     • known unknowns: we know we don’t know
     • unknown unknowns: we don’t know we don’t know
     DIKUW, running from PAST to FUTURE:
     • DATA (raw): numbers, letters, symbols, signals
     • INFORMATION (what): description, context, relationship, reports
     • KNOWLEDGE (how to): experience, tested, instruction, programs
     • UNDERSTANDING (why): cause & effect, proven, models
     • WISDOM (when): prediction, what’s best
     Roles: Data Engineer, Data Analyst, Data Miner, Data Scientist
     Source: Data Science London
  8. Business Intelligence to Data Discovery? Axes: data you know vs. data you don’t know; questions you’re asking vs. questions you’re not asking. Data Analyst: Business Intelligence. Data Scientist: Data Discovery. DATA MODELLING: Y ← F(X, random noise, parameters). ALGORITHMIC MODELLING: Y ← [BLACK BOX] ← X. Source: Applied Data Labs & Leo Breiman
  9. BIG DATA JOURNEY: WHERE, WHAT, WHY, HOW
  10. Why is Big Data needed?
     • VOLUME: exponential growth; 2x in 2 yrs; PB (1000 TB) is now common
     • VELOCITY: event streams; never at rest; 640k GB per internet minute
     • VARIETY: 100s of data sources; 85% not in a table
  11. BIG DATA JOURNEY: WHERE, WHAT, WHY, HOW
  12. Big Data Heat Map – Gartner 2012
  13. Big Data Potential by Sector – McKinsey for USBLS, 2011
  14. Big Data Investment by Industry – Gartner, 2012
  15. Top Big Data Challenges – Gartner, 2012
  16. CIO Survey on Big Data Investments – IDG Survey, 2013
  17. CIO Survey on Main Drivers to Invest – IDG Survey, 2014
  18. BIG DATA JOURNEY: WHERE, WHAT, WHY, HOW
  19. How will Big Data Evolve? Axes: EXTERNAL ALIGNMENT and INTERNAL COHERENCE.
     • Align with Existing BI; Maximise Value
     • Exploit Capability; Respond Rapidly
     • Focus; Innovate; Stay Ahead
     • Repeat; Stabilize; Governance
  20. RECAP OF BENEFITS: COST, SPEED, AGILITY, CAPABILITY
  21. LAST WORDS OF WISDOM: NOT ALL ROADS LEAD TO ROME; TIME VALUE OF DATA; KNOWLEDGE IS POWER; I AM AN INDIVIDUAL
  22. “The price of light is far less than the cost of darkness”
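
Slide 8 contrasts data modelling, where a formula for Y in terms of X is assumed, with algorithmic modelling, where the mechanism is treated as a black box. The sketch below is one illustrative way to see the difference and is not taken from the deck: the toy data, the straight-line assumption, the nearest-neighbour predictor and the names `fit_line` and `knn_predict` are all assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Data modelling: assume Y = a + b*X + noise, estimate a and b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=3):
    """Algorithmic modelling: assume no formula, average the k nearest observed outcomes."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy data: a noisy straight line, generated with a fixed seed.
rng = random.Random(0)
xs = [i / 2 for i in range(21)]  # 0.0, 0.5, ..., 10.0
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]

a, b = fit_line(xs, ys)
x_new = 4.2
print("data modelling prediction:       ", a + b * x_new)
print("algorithmic modelling prediction:", knn_predict(xs, ys, x_new))
```

The fitted line yields interpretable parameters (`a`, `b`) but relies on the assumed formula being right. The nearest-neighbour predictor makes no such assumption and offers only predictions, which is the trade-off the slide attributes to Leo Breiman.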

Editor's notes

  • COST: 20x less per TB vs. Teradata, Netezza, Oracle; 75% less average marginal cost per capacity. SPEED: 10x faster than Teradata, Netezza. AGILITY: 115% lower average cost per data source vs. Oracle. SCIENCE: machine learning, prediction.
  • WHAT: What is Big Data Science? WHY: Why is it needed? WHERE: Where is it being used? HOW: How will it evolve?
  • COST: 20x less per TB vs. Teradata, Netezza, Oracle; 75% less average marginal cost per capacity. SPEED: 10x faster than Teradata, Netezza. AGILITY: 115% lower average cost per data source vs. Oracle. SCIENCE: machine learning, prediction.
  • TIME VALUE: Yesterday’s data is less valuable than today’s data, yet historical data combined with current data is more valuable than current data alone. POWER: Getting from unknown unknowns to known unknowns or known knowns is powerful. LEAD TO ROME: Exploring with no direct business impact is not a bad thing. INDIVIDUAL: Treat every customer as an individual, not an aggregate, and analyse accordingly; aggregate only individual insights.
