Big data and data science study
dspadawan
1.
BIG DATA AND DATA SCIENCE
Study materials and online courses by @dspadawan
Background © Jim Kaskade: Big Data
2.
WHAT IS DATA SCIENCE
THE DATA SCIENCE VENN DIAGRAM
Copyright © 2013-2014 by Teradata. All rights reserved.
3.
DATA SCIENCE DOMAINS
All links go to Wiki. If you are not sure what a term means, you can look it up there.
1. Data Science (Fundamentals)
2. Statistics
3. Programming languages
4. Machine Learning / Data Mining
5. Text Mining / Natural Language Processing
6. Data Visualization
7. Big Data (Hadoop, MapReduce, NoSQL)
8. Data Ingestion
9. Data Munging or Data Wrangling
10. Toolbox (Weka, …, Spark, Storm, …, Sqoop, RHIPE, etc.)
4.
DATA SCIENCE METRO MAP
BECOMING A DATA SCIENTIST
5.
MASSIVE OPEN ONLINE COURSES (MOOC)
• Aggregator
> http://www.mooc-list.com
• Platforms
> https://www.coursera.org
> https://www.edx.org
> https://www.open2study.com
> https://www.udacity.com
> https://www.udemy.com
> http://online.stanford.edu
• Interactive platforms
> http://www.codecademy.com
> https://www.datacamp.com
6.
WANT TO WORK AS A DATA SCIENTIST?
7.
DATA SCIENCE & ANALYTICS (domain 1)
• Coursera
> Core Concepts in Data Analysis: https://www.coursera.org/course/datan
> Introduction to Data Science: https://www.coursera.org/course/datasci
> Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1
– 9 courses + 1 capstone project
– Each course or capstone takes 4 weeks
– You can do it for free or you can pay 49 USD for certification
> Welcome To Process Mining: Data science in Action! https://www.coursera.org/course/procmin
8.
DATA SCIENCE & ANALYTICS (domain 1)
• edX
> The Analytics Edge: http://www.edx.org/course/mitx/mitx-15-071x-analytics-edge-1416
> Data, Analytics and Learning: http://www.edx.org/course/utarlingtonx/utarlingtonx-link5-10x-data-analytics-2186
• Udacity $
> Intro to Data Science: https://www.udacity.com/course/ud359
9.
MATH DANCE
10.
STATISTICS COURSES (domain 2)
• Coursera
> Data analysis and statistical inference: https://www.coursera.org/course/statistics
> Statistical inference and exploratory data analysis: https://www.coursera.org/specialization/jhudatascience/1/courses
• edX
> Introduction to Statistics: Descriptive Statistics http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-1x-introduction-1138
> Introduction to Statistics: Probability http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-2x-introduction-1534
> Introduction to Statistics: Inference http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-3x-introduction-1533
11.
STATISTICS COURSES CONT. (domain 2)
• Udacity $
> Intro to statistics: https://www.udacity.com/course/st101
> Exploratory data analysis: https://www.udacity.com/course/ud651
> Intro to Inferential Statistics: https://www.udacity.com/course/ud201
• Mathematical monk
> https://www.youtube.com/playlist?list=PL17567A1A3F5DB5E4
12.
PROGRAMMING LANGUAGES (domain 3)
• Analysis/Data mining:
> R language
> Python
> SQL
> (Perl)
> (Octave)
• Big Data (Hadoop)
> Java (!)
> Python
• Visualization
> JavaScript
13.
R LANGUAGE (domain 3)
• Basic info and SW
> R Language: http://www.r-project.org
> RStudio (IDE): http://www.rstudio.com
• Courses
> R Programming: https://www.coursera.org/course/rprog
• Practice
> Interactive courses: https://www.datacamp.com/courses
> Data mining examples in R: http://www.rdatamining.com
14.
PYTHON (domain 3)
• Basic info and SW:
> Python language: https://www.python.org
> Eclipse Python: http://pydev.org
• Python for Java developers:
> http://www.sthurlow.com/python
• Google's Python Class
> https://developers.google.com/edu/python
• Codecademy Python
> http://www.codecademy.com/tracks/python
15.
OCTAVE (domain 3)
Octave is mostly compatible with MATLAB.
• Basic info and SW:
> http://octave.sourceforge.net
> https://gnu.org/software/octave
> http://en.wikipedia.org/wiki/GNU_Octave
• Coursera:
> Machine learning: https://www.coursera.org/course/ml
16.
MACHINE LEARNING COURSES (domain 4A)
A subfield of computer science and artificial intelligence concerned with learning from data.
• Coursera
> Machine Learning (Stanford): https://www.coursera.org/course/ml
> Machine Learning (University of Washington): https://www.coursera.org/course/machlearning
> Practical Machine Learning (Johns Hopkins): https://www.coursera.org/course/predmachlearn
– part of Data Science Specialization
• Udacity $
> Machine Learning (Supervised, Reinforcement, Unsupervised)
https://www.udacity.com/course/ud675
https://www.udacity.com/course/ud820
https://www.udacity.com/course/ud741
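To make "learning from data" concrete, here is a minimal sketch in plain Python: it fits a straight line to a handful of points by ordinary least squares and then predicts an unseen value. The function names and the toy numbers are made up for illustration and do not come from any of the courses listed above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Toy training data (hypothetical): hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)
print(model, predict(model, 6))
```

The "learning" step is `fit_line`: the parameters are estimated from examples rather than written by hand. The courses above cover far richer models, but they follow the same fit-then-predict pattern.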
17.
MACHINE LEARNING VIDEOS (domain 4A)
• Udemy $
> Hilary Mason: An Intro to Machine Learning with Web Data https://www.udemy.com/hilary-mason-an-intro-to-machine-learning-with-web-data
> Hilary Mason: Advanced Machine Learning https://www.udemy.com/hilary-mason-advanced-machine-learning/
• Mathematical monk
> https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA
• Videolectures.net
> http://blog.videolectures.net/100-most-popular-machine-learning-talks-at-videolectures-net/
18.
DATA MINING COURSES (domain 4B)
The process of discovering patterns in large data sets using machine learning or statistics.
• Coursera
> Mining Massive Datasets (Stanford): https://www.coursera.org/course/mmds
• Udemy $
> Matthew Russell on Mining the Social Web: https://www.udemy.com/matthew-russell-on-mining-the-social-web/
> Data Mining: https://www.udemy.com/data-mining
• Web page
> http://www.rdatamining.com
19.
DATA MINING COURSES & TOOLS (domain 4B)
• Courses:
> Data Mining with Weka: https://weka.waikato.ac.nz/dataminingwithweka/preview
> More Data Mining with Weka: https://weka.waikato.ac.nz/moredataminingwithweka
• Weka
> SW: http://www.cs.waikato.ac.nz/ml/weka
• Knime
> SW: https://www.knime.org/downloads/overview
• RapidMiner
> Official site: http://rapidminer.com
> SW: http://sourceforge.net/projects/rapidminer
20.
TEXT MINING (domain 5A)
• R Data Mining (Word Cloud)
> http://www.rdatamining.com/examples/text-mining
• Videolectures.net
> http://videolectures.net/Top/Computer_Science/Text_Mining
• Tool (Word Cloud)
> Wordle.net
[Word cloud image: TOP RECURRING THEMES ABOUT BIG DATA]
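A word cloud is essentially a picture of word frequencies. The following is a minimal sketch of the counting step in plain Python; the stop-word list is a tiny illustrative assumption, and the function name is hypothetical rather than taken from the R example linked above.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in"}


def word_frequencies(text, top_n=3):
    """Return the top_n most frequent non-stop-words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


sample = "Big data is big. Data science is the science of data."
print(word_frequencies(sample))
```

A word cloud tool would then scale each word's font size by its count.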
21.
NATURAL LANGUAGE PROCESSING COURSES (domain 5B)
A subfield of computer science, artificial intelligence, and linguistics.
• Coursera
> Natural Language Processing (Columbia University): https://www.coursera.org/course/nlangp
> Natural Language Processing (Stanford): https://www.coursera.org/course/nlp
• Deeper Learning MOOC
> http://dlmooc.deeper-learning.org/
• Wikipedia
> http://en.wikipedia.org/wiki/Natural_language_processing
22.
VISUALIZATION TOOLS (domain 6)
• Tableau
> http://www.tableausoftware.com
> Commercial visualization software
• D3.js
> http://d3js.org
> Data-driven documents visualization library
• GraphViz
> http://www.graphviz.org
> Graph visualization tools
• Gephi
> https://gephi.github.io
> Visualization platform
23.
TABLEAU (domain 6)
• Training
> http://www.tableausoftware.com/learn/training
> On demand
> Live online sessions scheduled for specific topics
• Download
> Tableau Public: http://www.tableausoftware.com/public
> Tableau Trial: http://www.tableausoftware.com/products/trial
• Certification
> Desktop (Qualified Associate, Certified Professional)
> Server (Qualified Associate, Certified Professional)
> http://www.tableausoftware.com/support/certification
24.
HOW BIG IS BIG ENOUGH?
25.
BIG DATA STUDY (domain 7)
• MOOC
> http://bigdatauniversity.com
> http://bigdatacourse.appspot.com
• Coursera
> Web Intelligence and Big Data: https://www.coursera.org/course/bigdata
• Udemy $
> Big Data and Hadoop Essentials: https://www.udemy.com/big-data-and-hadoop-essentials-free-tutorial
• Open2Study
> Big Data for Better Performance: http://www.open2study.com/courses/big-data-for-better-performance
26.
BIG DATA TOOLS (domain 7)
• Hadoop – Big Data framework
• Hive – DWH infrastructure built on top of Hadoop
• HBase – Non-relational, distributed DB
• Pig – Hadoop programming tool
• Storm – Real-time computation system for Hadoop
• Solr – Search platform
• Falcon – Data management and processing for Hadoop
• Sqoop – Command-line application for transferring data into Hadoop
• Flume – Large-scale log aggregation framework
• Oozie – Workflow scheduler for Hadoop
• Ambari – Simpler management for Hadoop clusters
• Mahout – Machine Learning algorithms implemented on Hadoop
• ZooKeeper – Coordination service for distributed applications
• Knox – REST API gateway for interacting with Hadoop clusters
27.
HADOOP STUDY (domain 7)
• Hadoop providers
> http://www.cloudera.com
> http://hortonworks.com
> http://www.mapr.com
> http://www.teradata.com/aster
There are more Hadoop providers: IBM, Pivotal, etc.
• Udacity $
> Intro to Hadoop and MapReduce: https://www.udacity.com/course/ud617
• Udemy $
> Become a Certified Hadoop Developer | Training | Tutorial: https://www.udemy.com/hadoop-tutorial
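For readers new to the MapReduce model named in the course titles on this slide, the classic word-count example can be sketched in a few lines of plain Python. This is an in-process illustration of the map, shuffle, and reduce steps only; it does not use the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big deal"]))
```

On a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle; the sketch only shows the data flow.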
28.
NOT ONLY SQL DATABASES (domain 7)
• MongoDB – JSON document store
> http://www.mongodb.com
> https://university.mongodb.com
• CouchDB – JSON document store
> http://couchdb.apache.org
• Cassandra – High-performance column-oriented DB
> http://cassandra.apache.org
• VoltDB – In-memory database
> http://voltdb.com
• Redis – In-memory key-value store
> http://redis.io
• NuoDB – Distributed SQL DB
> http://www.nuodb.com
29.
BIG DATA UNIVERSITY (domain 7)
• Big Data Courses path:
> Big Data Fundamentals
> Hadoop Fundamentals
> Moving Data into Hadoop (Sqoop and Flume tools)
> Query languages for Hadoop (Hive, Pig and Jaql)
> SQL Access for Hadoop
> Using HBase for Real-time Access to your Big Data
> Accessing Hadoop Data Using Hive
> Introduction to Pig
> Controlling Hadoop Jobs using Oozie
> Hadoop Reporting and Analysis
> Introduction to MapReduce Programming
• Courses are provided by IBM
30.
IT IS EVEN BETTER, DON’T YOU THINK?
31.
CLOUDERA HADOOP (domain 7)
• Tutorials
> 8 different paths
> On demand and free
> Taught together with Udacity (paid on a monthly basis)
> http://cloudera.com/content/cloudera/en/training/courses.html
> http://cloudera.com/content/cloudera/en/training/library.html
• Sandbox
> http://cloudera.com/content/support/en/downloads/quickstart_vms/cdh-5-1-x1.html
• Certification
> 200 USD per exam
> http://cloudera.com/content/cloudera/en/training/certification.html
32.
HORTONWORKS HADOOP (domain 7)
• Tutorials
> http://hortonworks.com/tutorials
> 3 paths for
– Developers
– Administrators
– Data Scientists
• Sandbox
> http://hortonworks.com/hdp/downloads
• Certifications
> 200 USD per exam
> http://hortonworks.com/training/certification
33.
MAPR HADOOP (domain 7)
• Tutorials
> https://www.mapr.com/services/mapr-academy/training-videos
> 3 paths for
– Developers
– Administrators
– Business users
• Sandbox
> https://www.mapr.com/products/mapr-sandbox-hadoop
• Certification
> For administrators only
> You must pass the Hadoop Cluster Administration on MapR course
> https://www.mapr.com/services/mapr-academy/certification
34.
STREAMING – NO BIG DEAL
35.
STREAMING DATA PROCESSING (domain 7)
• Storm (https://storm.incubator.apache.org)
> Open source (ASF) real-time Hadoop
> Twitter project
• Spark (https://spark.apache.org)
> Open source (ASF) in-memory Hadoop
> Apache project
• S4 (http://incubator.apache.org/s4)
> Open source (ASF) processing of stream data
> Yahoo project
• Samza (http://samza.incubator.apache.org)
> Open source processing of messaging data
> LinkedIn project
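What the tools above have in common is that they process events one at a time as they arrive, instead of loading a complete data set first. A minimal single-machine sketch of that idea in plain Python follows: a rolling count over the most recent events. It does not use the API of Storm, Spark, S4, or Samza, and the function name and window size are illustrative assumptions.

```python
from collections import Counter, deque


def rolling_counts(events, window=3):
    """Yield per-event counts over the last `window` events of a stream."""
    recent = deque()
    counts = Counter()
    for event in events:
        recent.append(event)
        counts[event] += 1
        if len(recent) > window:
            # Forget the oldest event so memory use stays bounded.
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        yield dict(counts)


for snapshot in rolling_counts(["a", "b", "a", "c"], window=3):
    print(snapshot)
```

Because `events` can be any iterable, including an endless generator, the function never needs the whole stream in memory.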
36.
DATA INGESTION (domain 8)
The process of obtaining, importing and processing data for later use or storage.
• Techniques
> Data import and export
> Data fusion – integration of multiple data sources
> Data sampling – selection of data subset (rows)
> Data discovery – detection of patterns in data
> Exploratory data analysis – summarize main data characteristics
> Feature extraction – selection of data subset (columns)
> Data scrubbing – data error correction
> Missing data values – data correction
> Etc.
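Three of the techniques listed above (sampling rows, selecting columns, and correcting missing values) can be sketched in plain Python. The records, field names, and the choice to fill gaps with the column mean are illustrative assumptions, not something the slide prescribes.

```python
import random
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Prague"},
    {"id": 2, "age": None, "city": "Brno"},
    {"id": 3, "age": 28, "city": "Ostrava"},
    {"id": 4, "age": 40, "city": "Plzen"},
]


def sample_rows(rows, k, seed=0):
    """Data sampling: pick k rows at random (seeded for repeatability)."""
    return random.Random(seed).sample(rows, k)


def select_columns(rows, columns):
    """Selection of a column subset: keep only the named fields."""
    return [{c: row[c] for c in columns} for row in rows]


def fill_missing(rows, column):
    """Missing data values: replace None with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    fill = mean(known)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


print(sample_rows(records, 2))
print(select_columns(records, ["id", "city"]))
print(fill_missing(records, "age"))
```

Mean imputation is only one option; dropping incomplete rows or using the median are equally common, depending on the data.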
37.
DATA WRANGLING / DATA MUNGING (domain 9)
Converting or mapping data from one "raw" form into another format.
• Coursera
> Getting and Cleaning Data, part of Data Science Specialization: https://www.coursera.org/course/getdata
• Udacity $
> Data Wrangling with MongoDB: https://www.udacity.com/course/ud032
• School of Data
> Many different courses: http://schoolofdata.org
• Tools
> OpenRefine, DataWrangler – clean up and transform tools
> Talend, Pentaho – integration
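As a small illustration of mapping data from a "raw" form into another format, the sketch below turns messy CSV text into clean JSON using only the Python standard library. The sample data, the column names, and the assumption that `score` holds integers are made up for the example.

```python
import csv
import io
import json

# Hypothetical raw input with inconsistent whitespace.
RAW = """name , score
 Alice , 90
Bob,  85
"""


def wrangle(raw_csv):
    """Strip stray whitespace, convert score to int, and emit JSON."""
    reader = csv.reader(io.StringIO(raw_csv))
    header = [h.strip() for h in next(reader)]
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        row = dict(zip(header, (f.strip() for f in fields)))
        row["score"] = int(row["score"])  # assumes a "score" column exists
        rows.append(row)
    return json.dumps(rows)


print(wrangle(RAW))
```

Tools such as OpenRefine apply the same kind of trim-and-convert steps interactively and on much larger files.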
38.
TOOLBOX (domain 10)
• Hadoop and realtime
> Scribe
• Machine Learning
> H2O – In-memory machine learning
• Data Mining
> Rattle – GUI for DM using R
• Python and NLP
> NLTK = Natural Language ToolKit for Python
• R and Hadoop
> RHIPE = R + Hadoop Integrated Programming Environment
• Visualization
> Many Eyes – Online visualization system from IBM
39.
ONLINE SOURCES
• Data Science Servers:
> http://www.datasciencecentral.com
> http://www.hadoop360.com
> http://www.datascienceweekly.org
• Aggregators
> https://trello.com/b/rbpEfMld/data-science
• Blogs
> http://datasciencemasters.org
> http://www.kdnuggets.com
> http://www.zipfianacademy.com/blog/post/46864003608/a-practical-intro-to-data-science
> http://datascience101.wordpress.com
> http://fivethirtyeight.blogs.nytimes.com
40.
FREE BOOKS
• Data Science
> Doing Data Science
> Agile Data Science
> Data Science for Business
• Statistics
> Think Stats
• Programming
> R language
– 25 Recipes for Getting Started with R
– Learning R
> Python
– Learning Python, 5th Edition
– Think Python
41.
FREE BOOKS CONTINUED
• Machine Learning / Data Mining
> Machine Learning for Hackers
> Mining the Social Web
• Visualization
> Visualizing Data
> Getting Started with D3
> Communicating Data with Tableau
• Text mining / Natural Language Processing
> 21 Recipes for Mining Twitter
> Natural Language Processing with Python
> Natural Language Annotation for Machine Learning
42.
FREE BOOKS CONTINUED
• Big Data
> Hadoop: The Definitive Guide, 3rd Edition
> Ethics of Big Data
> Big Data Analytics with R and Hadoop
• Data Ingestion
> Data Analysis with Open Source Tools
> Python for Data Analysis
• Data Wrangling and Munging
> Using OpenRefine
• Toolbox $
> Getting Started with Storm
> Fast Data Processing with Spark
43.
QUESTIONS AND ANSWERS
By Tara Laskowski
Contact me at datasciencepadawan@gmail.com
Follow me on Twitter: @dspadawan
Read my blog: http://datasciencepadawan.blogspot.com
Editor's notes
K-Mart predictive analysis