LEN CHANG
• MACHINE LEARNING & DATA MINING
• DISTRIBUTED SYSTEMS & NOSQL
• CRAWLER & CHINESE MINING
• Communication Engineering, General Study - CCU
• Software Engineering, Master Study - NCU
• Pixnet Hackathon 2014 – EXIF MINING
• Pixnet Hackathon 2015 – Spam User Detection
• Taipei Open Data Hackathon 2015
– The relation between Religion and Taipei City
• BI SYSTEM & DATA VISUALIZATION
• FINANCE & EDUCATION & ART & SPORT
• THE PLAYER OF BLIZZARD GAMES
AGENDA
• A GOOD STORY
• TOOL 1 : DATABASE
• TOOL 2 : COLLECTION AND REPLICATION
• TOOL 3 : VISUALIZATION.
• TOOL 4: MACHINE LEARNING
• SAMPLE
• SUMMARY
A GOOD STORY
DIGITAL CUSTOMER EXPERIENCE
How much are you willing to pay?
45 NT / Latte vs. 95 NT / Latte
WHY ?
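If home is the "first place" for human interaction and the workplace is the
"second place," then a public space such as a coffee shop (like Starbucks) is
what I often call the "third place." The coffee-shop environment sits between
home and the office: you can socialize there or be alone, connect with other
people or reconnect with yourself. Starbucks was founded to offer ordinary
people this valuable opportunity.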
~Howard Schultz
• the loyalty card
• pay in advance on mobile
• wireless device charging
Digital customer experience
Chief Digital Officer: Adam Brotman
Location
Mobile pay
loyalty card
A Good
Digital Customer Experience
Social network
BI System, Data
warehousing…etc
A GOOD STORY TELLS US…
• FIND YOUR “UNIQUE CUSTOMER DATA”.
• USE “CUSTOMER DATA” TO IMPROVE “DIGITAL CUSTOMER EXPERIENCE"
• USE “DIGITAL CUSTOMER EXPERIENCE” TO HELP ORGANIZATION “MAKE MONEY”.
TOOL 1: DATABASE
OLAP AND NOSQL
BI System, Data
warehousing…etc
Relational DB vs. NOSQL
How to choose?
THE PURPOSE IS IMPORTANT
CDC
ETL
SQL
100 % accurate answer when I see the report
THE PURPOSE IS IMPORTANT
Machine Learning
Real time feedback
Real-time dashboard
less accurate, faster response when I need a rough answer
THE PURPOSE IS IMPORTANT
Machine Learning
Powerful at full-text search, weak at number computing.
THE PURPOSE IS IMPORTANT
High frequency
Real-time dashboard
To ensure both accuracy and speed; cost is not the main concern.
DATABASE
• 100% ACCURATE
  – RELATIONAL DATABASE
• LESS ACCURATE, FASTER
  – HBASE, SPARK, CASSANDRA, MONGODB, OTHERS…
• SPECIAL CASES
  – FULL-TEXT SEARCH: ELASTICSEARCH
  – ACCURACY AND SPEED: REDIS OR ANOTHER IN-MEMORY DB
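The selection rule on this slide can be written down as a small decision helper. This is only a sketch of the slide's rule of thumb: the `Workload` fields and the `suggest_store` function are hypothetical names introduced here, and a real choice would weigh many more factors (cost, team skills, data volume).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what a workload needs from its store."""
    needs_exact_answers: bool = False  # e.g. financial reports
    needs_low_latency: bool = False    # e.g. high-frequency, real-time dashboards
    full_text_search: bool = False     # e.g. searching article bodies


def suggest_store(w: Workload) -> str:
    """Map a workload to the store category named on the slide."""
    if w.full_text_search:
        return "Elasticsearch"
    if w.needs_exact_answers and w.needs_low_latency:
        return "Redis or another in-memory DB"
    if w.needs_exact_answers:
        return "relational database"
    # A rough answer is acceptable, so favor speed and scale.
    return "NoSQL (HBase, Cassandra, MongoDB, ...)"


if __name__ == "__main__":
    print(suggest_store(Workload(needs_exact_answers=True)))
    print(suggest_store(Workload(needs_low_latency=True)))
```

The order of the checks encodes the slide's priorities: special cases first, then accuracy, then speed.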
COLLECTION AND REPLICATION
LOGSTASH AND FLUENTD
REPLICATION TOOL
Collection: Any Data in, Any Data out
FLUENTD: BUILD YOUR UNIFIED LOGGING LAYER
LOGSTASH: COLLECT, ENRICH & TRANSPORT DATA
COMPARISON
FLUENTD
• LANG: RUBY, WITH PERFORMANCE-CRITICAL PARTS IN C
• PLATFORM: LINUX
• MAJOR OUTPUT DB: MONGODB
LOGSTASH
• LANG: JRUBY (RUNS ON THE JVM)
• PLATFORM: LINUX AND WINDOWS
• MAJOR OUTPUT DB: ELASTICSEARCH
• ELK ARCH.
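The "any data in, any data out" idea behind both collectors can be sketched in a few lines: parse each input into one common event shape, then fan every event out to all outputs. This is not the Fluentd or Logstash API; the parsers, the `collect` function, and the log formats below are hypothetical stand-ins for their input and output plugins.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]
Parser = Callable[[str], Event]
Output = Callable[[Event], None]


def parse_access_log(line: str) -> Event:
    """Parse a hypothetical 'ip method path' web log line."""
    ip, method, path = line.split()
    return {"source": "web", "ip": ip, "method": method, "path": path}


def parse_json_line(line: str) -> Event:
    """Parse a JSON log line emitted by a hypothetical mobile app."""
    event = json.loads(line)
    event["source"] = "app"
    return event


def collect(inputs: Iterable[Tuple[Parser, Iterable[str]]],
            outputs: List[Output]) -> int:
    """Normalize every input line to an Event and send it to every output."""
    count = 0
    for parser, lines in inputs:
        for line in lines:
            event = parser(line)
            for emit in outputs:
                emit(event)
            count += 1
    return count


if __name__ == "__main__":
    sink: List[Event] = []
    collect(
        [(parse_access_log, ["10.0.0.1 GET /latte"]),
         (parse_json_line, ['{"user": "len", "action": "pay"}'])],
        [sink.append, print],
    )
```

Because every event has the same shape after parsing, adding a new output (a database writer, for example) does not require touching any parser.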
Replicate: replicate data from DB_A to DB_B
Transaction DB
Case 1: RDB → RDB
Case 2: NOSQL → NOSQL
Case 3: NOSQL → RDB
ETL: Extract-Transform-Load
Case 1: RDB → RDB
Case 2: NOSQL → NOSQL
Case 3: NOSQL → RDB
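As a concrete picture of Case 1 (RDB → RDB), here is a minimal Extract-Transform-Load sketch using two in-memory SQLite databases. The table names, columns, and sample rows are assumptions made up for illustration; a production pipeline would add incremental loading, error handling, and scheduling.

```python
import sqlite3
from typing import List, Tuple


def extract(src: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Extract: read raw rows from the source (transaction) database."""
    return src.execute("SELECT city, amount_cents FROM orders").fetchall()


def transform(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Transform: clean city names and aggregate sales per city."""
    totals = {}
    for city, cents in rows:
        key = city.strip().title()
        totals[key] = totals.get(key, 0) + cents
    return sorted(totals.items())


def load(dst: sqlite3.Connection, rows: List[Tuple[str, int]]) -> None:
    """Load: write the aggregated rows into the reporting database."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS city_sales "
        "(city TEXT PRIMARY KEY, total_cents INTEGER)"
    )
    dst.executemany("INSERT OR REPLACE INTO city_sales VALUES (?, ?)", rows)
    dst.commit()


def run_demo() -> List[Tuple[str, int]]:
    """Build a toy source DB, run the pipeline, and return the loaded rows."""
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, city TEXT, amount_cents INTEGER)"
    )
    src.executemany(
        "INSERT INTO orders (city, amount_cents) VALUES (?, ?)",
        [("taipei", 4500), (" Taipei ", 9500), ("tainan", 4500)],
    )
    dst = sqlite3.connect(":memory:")
    load(dst, transform(extract(src)))
    return dst.execute(
        "SELECT city, total_cents FROM city_sales ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    print(run_demo())
```

Keeping the three steps as separate functions means the same `transform` could sit between a different source and target, which is what Cases 2 and 3 change.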
[Diagram: a mongo cluster of Nodes, with Postgres attached as one Node]
COMPARISON
RDB TO RDB
• TRADITIONAL MECHANISM
• ENSURES “DATA CONSISTENCY”
• FINANCIAL INDUSTRY
NOSQL TO NOSQL
• HUGE DATA ANALYSIS
• LOW-COST HARDWARE, POWERFUL AND FAST COMPUTATION
• NEEDS PROGRAMMING SKILL, NOT ONLY SQL
NOSQL TO RDB
• MAKES AN RDB A NODE OF A NOSQL CLUSTER
• MAYBE A BALANCE BETWEEN NOSQL AND RDB
VISUALIZATION
VISUALIZE YOUR DATA
1,999 USD
MACHINE LEARNING
GENETIC ALGORITHM
Genetic algorithm
Travelling salesman problem
Self-guided tour scheduling
Genetic Algorithm
System
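To make the genetic algorithm and travelling salesman pairing concrete, here is a minimal sketch, assuming cities are 2-D points and a tour is a permutation of city indices. The operators (elitist selection, order crossover, swap mutation) and all parameter values are choices made for this illustration, not the design of the scheduling system mentioned on the slide.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], cities: Sequence[Point]) -> float:
    """Total length of the closed tour that visits cities in the given order."""
    n = len(tour)
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % n]]) for i in range(n)
    )


def order_crossover(a: List[int], b: List[int], rng: random.Random) -> List[int]:
    """Copy a slice from parent a, then fill the gaps in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    child: List[Optional[int]] = [None] * n
    child[i:j + 1] = a[i:j + 1]
    kept = set(a[i:j + 1])
    fill = iter(c for c in b if c not in kept)
    return [c if c is not None else next(fill) for c in child]


def mutate(tour: List[int], rng: random.Random, rate: float) -> None:
    """With probability `rate`, swap two cities in place."""
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]


def solve_tsp(cities: Sequence[Point], pop_size: int = 60,
              generations: int = 200, mutation_rate: float = 0.2,
              seed: int = 0) -> List[int]:
    """Evolve a population of tours and return the shortest one found."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, cities))
        elite = population[: pop_size // 4]  # the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = order_crossover(p1, p2, rng)
            mutate(child, rng, mutation_rate)
            children.append(child)
        population = elite + children
    return min(population, key=lambda t: tour_length(t, cities))


if __name__ == "__main__":
    rng = random.Random(1)
    cities = [(rng.random(), rng.random()) for _ in range(12)]
    best = solve_tsp(cities)
    print(best, round(tour_length(best, cities), 3))
```

A genetic algorithm gives no optimality guarantee; it trades exactness for a good tour found quickly, which suits scheduling problems where a reasonable itinerary is enough.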
Linear algebra and Probability are important
Bayesian probability Decision Tree
Regression
Support Vector Machine
SAMPLE
SOME INTERESTING APPLICATION SAMPLE ….
FINANCIAL DISTRESS PREDICTION
SYSTEM
financial index
Company Share price
Genetic Algorithm
3000 financial indices
20 financial indices
Support Vector Machine
Matlab & C# & ASP.NET
GAME TREND MONITOR SYSTEM
Crawler System (×4) → DB → Text Mining System → DB
Article => Emotional Value
C# & MSSQL & SSRS
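The "Article => Emotional Value" step can be illustrated with a lexicon-based score. This is a toy sketch in Python, not the C# system described on the slide: the word list, its weights, and the English-only tokenizer are assumptions, and Chinese articles would first need a word segmenter.

```python
import re
from typing import Dict

# Hypothetical toy lexicon; a real system would use a curated or learned one.
LEXICON: Dict[str, int] = {
    "great": 2, "love": 2, "fun": 1,
    "boring": -1, "bug": -1, "terrible": -2,
}


def emotional_value(article: str, lexicon: Dict[str, int] = LEXICON) -> float:
    """Average lexicon weight per word; 0.0 for text with no words."""
    words = re.findall(r"[a-z']+", article.lower())
    if not words:
        return 0.0
    return sum(lexicon.get(w, 0) for w in words) / len(words)


if __name__ == "__main__":
    for text in ("Great game, love it", "Terrible bug", ""):
        print(repr(text), emotional_value(text))
```

Averaging per word keeps long and short articles comparable, so scores from many crawled posts can be tracked over time on one chart.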
APP BEHAVIOR ANALYSIS SYSTEM
RDB
s3fs
[Diagram: a mongo cluster of Nodes, with Postgres attached as one Node]
Pentaho
R
R & RUBY & MONGODB & POSTGRES & Pentaho & MOSQL & FLUENTD & s3fs
SUMMARY
FOR THE SAME PROBLEM, YOU WILL BUILD A BETTER SOLUTION OR MECHANISM WHEN YOU ARE AN EXPERT IN
MULTIPLE DOMAINS.
Crawler System
Text Mining
System
Article => Emotional Value
8 years up…
Shortcut?
What’s the fastest way to understand zombies?
Hadoop Con2015 - The Data Scientist’s Toolbox

Editor's Notes

  1. Introduction (5 min) TOOL 1 (5 min) TOOL 2 (10 min) TOOL 3 (10 min) SUMMARY (5 min)
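  2. Question 1: Why do we feel it is reasonable that a Starbucks latte costs more? Question 2: Can anyone tell the difference between an ordinary latte and a Starbucks latte? Question 3: So what does Starbucks do about this? (Animation)
  3. Starbucks 1.0: the relationship between person and person. Starbucks 2.0: give customers a good digital experience.
  4. There is no perfect database; every database has things it is good at and things it is not. In banking, data errors are not tolerated, even in reports. That makes NOSQL with "weak" consistency unsuitable; an RDB with "strong" consistency is required. Because banks read their reports at fixed times, the remaining time can be used to run heavy batch jobs, and even to build many cubes for the reports to use.
  5. Take a mobile voice assistant as an example. The biggest difference from the banking case above is that small deviations in the data have little effect on the analysis. In that case we can use the large-scale computing power of NOSQL to get the answer we need quickly.
  6. Sometimes a NOSQL store is designed to handle a special case, such as full-text search. Elasticsearch's full-text search is very powerful and fast; you could say the whole database exists for text search. It is not very good at numeric processing, however.
  7. The data format is simple, but the data needs large-scale, frequent updates and consistency checks against the front end and back end. Use an in-memory database.
  8. Whether you analyze data out of personal interest, work as a data consultant, or run a startup of 5 to 10 people, this tool can substantially improve the accuracy of your judgments and greatly reduce in-house development of visualization tools. It is a worthwhile investment.
  9. For monitoring only.
  10. Don't waste the nearly limitless computing power that distributed systems give us.
  11. There are no shortcuts.
  12. Domain knowledge isn't a tool. It's common sense.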