SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Big Data in Action
Ngon Pham, Lana Engineer
Introduction
●
●
●
●
●

Introduction
Problem
Approach
Demo
Big Data in Vietnam
Introduction
● Internet-enabled devices
○ Tons of data generated every second

● Hardware becomes much cheaper
○ We can now store and process much more data
Problem
● How to process 10TB, how long and how
much?
○ Assume
■ Amazon EC2
■ HDD read at 50MB/s
■ Computation time is less than I/O time
Problem
● 1 machine, 1 core, 1 HDD
○ Time: 55.56 hours
○ Amazon Cost: $0.12 x 55.56 = $6.67

● 10 machines, 40 cores, 40 HDD
○ Time: 1.39 hours
○ Amazon Cost: $0.48 x 10 x 1.39 = $6.67
⇒ The same cost but 40x faster
Question
● How to divide data/process between
machines?
● How to make each process read data inside
the machine directly instead of another?
● How to replicate data, restore the process if
there is failure?
● Lots of task management questions...
Approach
● Hadoop
● MongoDB
● Spark
Hadoop Approach
● Storage
○ HDFS
Hadoop Approach
● Computation
○ MapReduce
MongoDB Approach
● Storage
○ Document
MongoDB Approach
● Computation
○ SQL
○ Aggregation
○ MapReduce
Spark Approach
● Storage
○ Resilient distributed
dataset (RDD)
○ Persistent backed by
HDFS / HBase...
Spark Approach
● Computation
○ Mixed
○ In-memory
computing
Demo
● Hadoop
○ Run script to create Amazon cluster
○ Play with Hadoop / HDFS / Spark
○ Process Wikipedia data

● MongoDB
○ Collect data from different sources and analyze
Big Data in Vietnam
Big Data in Vietnam
● Why is MongoDB popular?
○ Lots of PHP developers prefer
○ Simple to setup and use
○ Similar to MySQL
Big Data in Vietnam
● Hadoop is used by a few big local online
companies & international startups
○ Analyze tons of data
○ Create new competitive advantage
⇒ But there is a big shortage of skilled engineers
Q&A

Q&A

Weitere ähnliche Inhalte

Ähnlich wie Big Data in Action

The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in productionPingCAP
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingMartinStrycek
 
Big data on google platform dev fest presentation
Big data on google platform   dev fest presentationBig data on google platform   dev fest presentation
Big data on google platform dev fest presentationPrzemysław Pastuszka
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingErik Osterman
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
Scalable, good, cheap
Scalable, good, cheapScalable, good, cheap
Scalable, good, cheapMarc Cluet
 
Bandwidth, Throughput, Iops, And Flops
Bandwidth, Throughput, Iops, And FlopsBandwidth, Throughput, Iops, And Flops
Bandwidth, Throughput, Iops, And Flopsbillmenger
 
TRHUG 2015 - Veloxity Big Data Migration Use Case
TRHUG 2015 - Veloxity Big Data Migration Use CaseTRHUG 2015 - Veloxity Big Data Migration Use Case
TRHUG 2015 - Veloxity Big Data Migration Use CaseHakan Ilter
 
Beyond 1000 bosh Deployments
Beyond 1000 bosh DeploymentsBeyond 1000 bosh Deployments
Beyond 1000 bosh Deploymentsanynines GmbH
 
Unit 1 - Computer memory.pptx
Unit 1 - Computer memory.pptxUnit 1 - Computer memory.pptx
Unit 1 - Computer memory.pptxMrAdhit1
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Web performance optimization - MercadoLibre
Web performance optimization - MercadoLibreWeb performance optimization - MercadoLibre
Web performance optimization - MercadoLibrePablo Moretti
 
From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)
From nothing to a video under 2 seconds / Mikhail Sychev  (YouTube)From nothing to a video under 2 seconds / Mikhail Sychev  (YouTube)
From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)Ontico
 
Harnessing the cloud_for_saa_s_hosted_platfor
Harnessing the cloud_for_saa_s_hosted_platforHarnessing the cloud_for_saa_s_hosted_platfor
Harnessing the cloud_for_saa_s_hosted_platforLuke Summerfield
 
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebula Project
 
Scaling up with Aerospike!
Scaling up with Aerospike!Scaling up with Aerospike!
Scaling up with Aerospike!Anshu Prateek
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud ComputingBryan Bende
 

Ähnlich wie Big Data in Action (20)

The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processing
 
Big data on google platform dev fest presentation
Big data on google platform   dev fest presentationBig data on google platform   dev fest presentation
Big data on google platform dev fest presentation
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
Speeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using StarlingSpeeding up Page Load Times by Using Starling
Speeding up Page Load Times by Using Starling
 
Cloud accounting software uk
Cloud accounting software ukCloud accounting software uk
Cloud accounting software uk
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Scalable, good, cheap
Scalable, good, cheapScalable, good, cheap
Scalable, good, cheap
 
Bandwidth, Throughput, Iops, And Flops
Bandwidth, Throughput, Iops, And FlopsBandwidth, Throughput, Iops, And Flops
Bandwidth, Throughput, Iops, And Flops
 
TRHUG 2015 - Veloxity Big Data Migration Use Case
TRHUG 2015 - Veloxity Big Data Migration Use CaseTRHUG 2015 - Veloxity Big Data Migration Use Case
TRHUG 2015 - Veloxity Big Data Migration Use Case
 
Beyond 1000 bosh Deployments
Beyond 1000 bosh DeploymentsBeyond 1000 bosh Deployments
Beyond 1000 bosh Deployments
 
Unit 1 - Computer memory.pptx
Unit 1 - Computer memory.pptxUnit 1 - Computer memory.pptx
Unit 1 - Computer memory.pptx
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Web performance optimization - MercadoLibre
Web performance optimization - MercadoLibreWeb performance optimization - MercadoLibre
Web performance optimization - MercadoLibre
 
From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)
From nothing to a video under 2 seconds / Mikhail Sychev  (YouTube)From nothing to a video under 2 seconds / Mikhail Sychev  (YouTube)
From nothing to a video under 2 seconds / Mikhail Sychev (YouTube)
 
Harnessing the cloud_for_saa_s_hosted_platfor
Harnessing the cloud_for_saa_s_hosted_platforHarnessing the cloud_for_saa_s_hosted_platfor
Harnessing the cloud_for_saa_s_hosted_platfor
 
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.comOpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
OpenNebulaConf2018 - Our Journey to OpenNebula - Germán Gutierrez - Booking.com
 
Scaling up with Aerospike!
Scaling up with Aerospike!Scaling up with Aerospike!
Scaling up with Aerospike!
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud Computing
 
MapReduce
MapReduceMapReduce
MapReduce
 

Kürzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Big Data in Action