Big Data & Hadoop
Semantic Web Meetup

Jean-Pierre König
3 October 2013
COMPANY
PROFILE
WE ARE HERE
From its base in Kreuzlingen, Switzerland,
YMC has served well-known national and
international clients since 2001.
WE WORK WITH
Customers
WE WORK WITH
Partners
WE CREATE

WEB SOLUTIONS
Hosting & support
Web strategies
Social media applications (e.g. corporate blogs, wikis, Facebook apps, etc.)
Shop systems, websites, intranets
Custom individual solutions for the web

MOBILE APPLICATIONS
Mobile strategies
Apps for tablets and smartphones (iPhone, Android)
Integration of social networks such as Facebook and Twitter
Geolocation for location-specific services

BIG DATA ANALYTICS
Recommendation systems (e.g. for apps, web shops, websites and intranets)
Tailor-made web analytics systems (e.g. with real-time metrics and effects in social networks)
Predictive models (e.g. for the interests of app users)
Training (Apache Hadoop)
Integrated search systems (e.g. also for unstructured data)
WHAT IS
BIG DATA
WHAT IS BIG DATA
§  More general
§  Data sets that become so large and complex that they are
difficult to process, including capture, curation,
storage, search, sharing, transfer, analysis, and
visualization
§  They are difficult to work with using most RDBMS, statistics and
visualization systems
§  They require massively parallel software running on tens,
hundreds, or even thousands of servers

§  The 3 V’s by Gartner
§  Big data is high volume, high velocity, and/or high variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery and
process optimization. (2012)
WHAT DRIVES BIG DATA
§  Human-generated data
§  Documents, transaction data, CRM, social media
- your working life is devoted to looking at
screens and typing more data into some system.

§  Sensor-generated data
§  The trend is that a large part of the physical
world around us will eventually be
online: the Internet of Things.

§  Machine-generated data will quickly top
human-generated data
BUSINESS DRIVERS
Fraud protection
Risk management
Environment Safety

Increase
Revenue

Risk
Prevention

360° Customer Experience Management
Digital Security
Social Media Analysis
Infrastructure Observation
(Mass) Personalization
Recommendation Engines
Data as a Service
Research

Improve
Decision-Making

Data Aggregation
Sampling
Web Archives
Predictive Analytics
Data Pre-processing
Video, Audio & Image Processing
Infrastructure Management
THE EMERGING SOLUTIONS
§  NoSQL* Movement
§  NoSQL databases are finding significant and growing
industry use in big data and real-time web applications.

§  Hadoop and its ecosystem
§  Enterprise-grade solutions, consulting, support
§  Top 3 vendors: Cloudera, Hortonworks, MapR
§  Adoption throughout the software industry, e.g. IBM
BigInsights, Microsoft HDInsight, Oracle Big Data
Appliance, EMC/Spring/VMware Pivotal HD, HP HAVEn,
Intel Distribution, Dell w/Cloudera

* Also referred to as "Not only SQL"
HADOOP
IN A NUTSHELL
WHAT IS HADOOP
§  An open-source implementation of frameworks
for reliable, scalable, distributed computing and
data storage (Official Hadoop website)
§  A reliable shared storage and analysis system
(O'Reilly: Hadoop: The Definitive Guide)

§  A free, Java-based programming framework that
supports the processing of large data sets in a
distributed computing environment (Margaret Rouse)
§  A complete, open-source ecosystem for
capturing, organizing, storing, searching, sharing,
analyzing, visualizing, and ... (Jack Norris)
A BRIEF HISTORY OF HADOOP
§  In 2002 Doug Cutting* started Nutch, an open-source web
search engine
§  Fortunately, Google published papers that
§  described the architecture of its distributed file system, GFS
(2003)
§  introduced MapReduce (2004)

§  In 2005 Nutch released a new version with NDFS and
MapReduce; in 2006 these moved out of Nutch to form an
independent subproject called Hadoop
§  Cutting joined Yahoo! to build and run Hadoop at web scale
§  In 2008 Hadoop became a top-level Apache project and was
used at Yahoo! (10k cores), Last.fm, Facebook and the New York
Times
*Doug Cutting is also the creator of Apache Lucene
HADOOP IN A NUTSHELL
§  HDFS
§  A distributed file system for storage
§  Is highly fault-tolerant and is designed to be
deployed on low-cost/commodity hardware
§  1 Master called NameNode, many DataNodes (10+)

§  MapReduce
§  A batch query processor to run an ad hoc query
against your whole dataset and get the results in a
reasonable time
§  1 Master called JobTracker, many TaskTrackers (10+)
HADOOP FACT-SHEET
HDFS/distributed storage
§  Economical
§  Commodity hardware

§  Scalable
§  Rebalances data on new nodes

§  Fault Tolerant
§  Detects faults and auto recovers

§  Reliable
§  Maintains multiple copies of data

§  High throughput
§  Because data is distributed

MapReduce/distributed processing
§  Economical
§  Commodity hardware

§  Scalable
§  Add nodes to increase parallelism

§  Fault tolerant
§  Auto-recover job failures

§  Data locality
§  Process where the data resides
HADOOP PRINCIPLES
§  Schema on read
§  Data locality
§  No shared memory or disks
§  Scales out to thousands of servers
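The "schema on read" principle can be illustrated with a minimal sketch. It assumes raw events are stored as JSON lines without validation; the sample records, the field names and the function `read_with_schema` are hypothetical and not part of Hadoop. The point is only that structure is applied when data is read, not when it is written.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
# Nothing is validated or rejected at write time.
RAW_LINES = [
    '{"user": "anna", "action": "click", "ms": 120}',
    '{"user": "ben", "action": "view"}',
    'not valid json',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: parse each line and project the wanted fields."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # this reader skips lines it cannot interpret
        if not isinstance(record, dict):
            continue
        # Missing fields become None instead of failing the whole read.
        yield {name: record.get(name) for name in fields}


rows = list(read_with_schema(RAW_LINES, ["user", "ms"]))
print(rows)
```

A different reader could apply a different field list to the same stored lines, which is the flexibility the principle refers to.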
HADOOP
HADOOP SYSTEM COMPONENTS

            Masters                         Slaves (many of them)
HDFS        NameNode, Secondary NameNode    DataNode
MapReduce   JobTracker                      TaskTracker
WRITING FILES ON HDFS*
Client: Hey, I want to write blocks A, B and C of my File.txt.
NameNode: OK, write to DataNodes 1, 5 and 9.

[Diagram: the Client splits File.txt into Block A, Block B and Block C. The blocks and their replicas (A`, B`, C`) are stored on DataNode 1, DataNode 5 and DataNode 6 in Rack 1 and on DataNode 9 ... DataNode N in Rack 2.]

* Replication factor of 3
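The write path on this slide can be sketched roughly as follows. This is a toy model, not HDFS code: the block size, the DataNode names and the round-robin placement in `place_replicas` are assumptions made for illustration, and the real NameNode uses its own rack-aware placement policy.

```python
BLOCK_SIZE = 4          # bytes, tiny on purpose; real HDFS blocks are tens of megabytes or more
REPLICATION_FACTOR = 3  # as stated on the slide


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the file content into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION_FACTOR):
    """Toy placement: round-robin over the DataNodes (NOT the real HDFS policy)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


# Hypothetical file content and DataNode names.
blocks = split_into_blocks(b"ABCDEFGHIJ")
placement = place_replicas(len(blocks), ["dn1", "dn5", "dn6", "dn9"])
print(blocks)
print(placement)
```

Each block ends up on three distinct nodes, so losing one DataNode still leaves two copies of every block.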
READING FILES FROM HDFS
Client: Tell me the block locations of File.txt.
NameNode:
A → DataNode 1, 5, 6
B → DataNode 1, 5, N
C → DataNode 5, 9, 6

[Diagram: NameNode, Client, and the blocks and replicas (A, B, C, A`, B`, C`) stored on DataNode 1, DataNode 5 and DataNode 6 in Rack 1 and on DataNode 9 ... DataNode N in Rack 2.]
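The lookup on this slide can be mimicked with a small sketch. The location table is copied from the slide; the function `choose_datanodes` and its "first reachable replica" rule are assumptions for illustration, and the sketch ignores any notion of network distance between client and DataNode.

```python
# Block locations as listed on the slide ("N" stands for the last DataNode).
BLOCK_LOCATIONS = {
    "File.txt": {
        "A": ["1", "5", "6"],
        "B": ["1", "5", "N"],
        "C": ["5", "9", "6"],
    }
}


def choose_datanodes(filename, unavailable=frozenset()):
    """Pick, per block, the first listed DataNode that is not marked unavailable."""
    plan = {}
    for block, nodes in BLOCK_LOCATIONS[filename].items():
        candidates = [n for n in nodes if n not in unavailable]
        if not candidates:
            raise RuntimeError(f"no replica of block {block} is reachable")
        plan[block] = candidates[0]
    return plan


healthy_plan = choose_datanodes("File.txt")
degraded_plan = choose_datanodes("File.txt", unavailable={"1"})
print(healthy_plan)
print(degraded_plan)
```

With DataNode 1 marked unavailable, every block can still be read from another replica, which is what the replication factor of 3 buys.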
MAPREDUCE IN A NUTSHELL
Word Count Example

Input
Deer Bear River
Car Car River
Deer Car Bear

Split
Deer Bear River
Car Car River
Deer Car Bear

Map
Deer, 1 | Bear, 1 | River, 1
Car, 1 | Car, 1 | River, 1
Deer, 1 | Car, 1 | Bear, 1

Shuffle
Bear, 1 | Bear, 1
Car, 1 | Car, 1 | Car, 1
Deer, 1 | Deer, 1
River, 1 | River, 1

Reduce
Bear, 2
Car, 3
Deer, 2
River, 2

Result
Bear, 2
Car, 3
Deer, 2
River, 2
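The word count stages can be reproduced in a few lines of plain Python. This is a single-process sketch of the map, shuffle and reduce steps, not Hadoop API code; the function names are chosen here for illustration.

```python
from collections import defaultdict

# The three input splits from the slide.
SPLITS = ["Deer Bear River", "Car Car River", "Deer Car Bear"]


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one split."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


mapped = [pair for line in SPLITS for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
result = dict(reduce_phase(key, values) for key, values in grouped.items())
print(result)
```

In a cluster, each call to `map_phase` would run on the node holding that split and each `reduce_phase` call could run on a different node; here they simply run one after another.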
MAPREDUCE VS. RDBMS
§  RDBMS
§  In a centralized database system, you’ve got one big disk connected to
4 or 8 or 16 big processors.

§  MapReduce
§  In a Hadoop cluster, every server has 2 or 4 or 8 CPUs. You can run
your job by sending your code to each of the dozens of servers in your
cluster, and each server operates on its own little piece of the data.
Results are then delivered back to you in a unified whole. You map the
operation out to all of those servers and then you reduce the results
back into a single result set.

§  Architecturally, the reason you’re able to deal with lots of data is
because Hadoop spreads it out. And the reason you’re able to
ask complicated computational questions is because you’ve got
all of these processors, working in parallel, harnessed together.
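The idea of sending the code to each server and reducing the partial results can be sketched as follows. The "servers" are simulated with threads in one process, and the partition contents and function names are hypothetical; it only illustrates that each worker touches its own slice and that only small summaries travel back.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands for the slice of data held by one server (hypothetical values).
PARTITIONS = [[3, 5, 8], [13, 21], [34, 55, 89, 144]]


def local_summary(values):
    """Runs 'on the server': summarise only the local slice as (sum, count)."""
    return sum(values), len(values)


def combine(summaries):
    """Reduce step: merge the per-server summaries into one global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count


# Map the same function out to all partitions, one worker per partition.
with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
    summaries = list(pool.map(local_summary, PARTITIONS))

mean = combine(summaries)
print(summaries, mean)
```

Shipping a (sum, count) pair per server instead of the raw values is what keeps the final reduce step cheap even when the slices are large.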
ECOSYSTEM
HADOOP
HADOOP ECOSYSTEM
HADOOP’S DATABASE HBASE*
§  Unlike RDBMS
§  No secondary indexes
§  No transactions
§  De-normalized, schema-less

§  Random read/write access to big data
§  Billions of rows and millions of columns
§  Automatic data sharding
§  Integrates with MapReduce
* Modeled after Google’s BigTable
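The data model described here, sparse schema-less rows with random access by key and rows spread over shards, can be pictured with a toy in-memory class. `ToyWideTable` and its fixed `split_keys` are assumptions for illustration; this is not the HBase client API, and HBase chooses and moves its row-key ranges automatically rather than taking them as a fixed argument.

```python
import bisect


class ToyWideTable:
    """Sparse, schema-less rows keyed by row key; rows are partitioned by key range."""

    def __init__(self, split_keys):
        # N split keys define N + 1 key ranges, one shard per range.
        self.split_keys = sorted(split_keys)
        self.shards = [dict() for _ in range(len(self.split_keys) + 1)]

    def _shard_for(self, row_key):
        """Find the index of the shard whose key range contains row_key."""
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, column, value):
        """Random write: any row may carry any set of columns."""
        shard = self.shards[self._shard_for(row_key)]
        shard.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Random read by row key; unknown rows yield an empty dict."""
        return self.shards[self._shard_for(row_key)].get(row_key, {})


# Hypothetical rows; column names follow a "family:qualifier" convention.
table = ToyWideTable(split_keys=["m"])
table.put("anna", "info:email", "anna@example.org")
table.put("zoe", "stats:logins", 7)
print(table.get("anna"), table.get("zoe"))
```

The two rows have no columns in common, which is fine because no schema is declared up front, and they land in different shards because their keys fall on either side of the split key.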
USE CASES
HADOOP
USE CASES
Data Warehousing

§  Complementary ETL process

[Diagram. Sources: File Server, OLTP, CRM, ERP, ..., Logs, Social Media, Sensors, ... Warehouse side: ETL, Data Warehouse, Data Marts, Data Cubes. Hadoop side: Sqoop, Flume, Java API; Pig, Hive, MapReduce, HDFS. Outputs: Analytics, Visualization, Reports.]
USE CASES
Data Warehousing

§  Substitutive ETL process

[Diagram. Sources: File Server, OLTP, CRM, ERP, ..., Logs, Social Media, Sensors, ... Processing: Hadoop, Data Warehouse. Outputs: Analytics, Visualization, Reports.]
USE CASES
Data Warehousing

§  (Predictive) Analytics at scale

[Diagram. Sources: File Server, OLTP, CRM, ERP, ..., Logs, Social Media, Sensors, ... Processing: Hadoop, Data Warehouse. Outputs: Analytics, Visualization, Reports.]
USE CASES
Data Warehousing

§  Machine learning, natural language processing, sentiment analysis at scale

[Diagram. Sources: File Server, OLTP, CRM, ERP, ..., Logs, Social Media, Sensors, ... Processing: Hadoop with ML + NLP*, Data Warehouse. Outputs: Analytics, Visualization, Reports.]

* Personalized recommendations
§  content, products, services …
THANK
YOU!
CONTACT US
jean-pierre.koenig@ymc.ch
Tel. +41 (0)71 508 24 86
www.ymc.ch
@YMC_Big_Data

YMC AG
Sonnenstrasse 4
CH-8280 Kreuzlingen
Switzerland

Photo Credits:
Slide 03: Matterhorn and Lake by Noel Reynolds
Slide 24: Hadoop Ecosystem by Rishu Shrivastava

Semantic web meetup 14.november 2013

  • 1. Big Data & Hadoop Semantic Web Meetup Jean-Pierre König 03. Oktober 2013
  • 3. WE ARE HERE Vom Standort Kreuzlingen / Schweiz bedient YMC seit 2001 namhafte nationale und internationale Kunden.
  • 6. WE CREATE Hosting & Support Web-Strategien Social-Media-Anwendungen (z.B. Corporate Blogs, Wikis, Facebook-Apps etc.) Shop-Systeme, Websites, Intranets Kundenspezifische Individuallösungen fürs Web WEB SOLUTIONS Empfehlungssysteme (z.B. für Apps, Webshops, Websites und Intranet) Mobile Strategien MOBILE APPLICATIONS BIG DATA ANALYTICS Apps für Tablets und Smartphones (iPhone, Android) Massgeschneiderte Web Analytics Systeme (z.B. mit Echtzeit-Metriken und Effekten in Sozialen Netzwerken) Integration von Sozialen Netzwerken wie Facebook und Twitter Geolokalisierung für ortsspezifische Services Vorhersagemodelle (z.B. für Interessen von App-Usern) Training (Apache Hadoop) Integrierte Suchsysteme (z.B. auch für unstrukturierte Daten)
  • 8. WHAT IS BIG DATA §  More general §  When data sets become so large and complex that it becomes difficult to process, including capture, curation, storage, search, sharing, transfer, analysis, and visualization §  It is difficult to work with using most RDBMS, statistic and visualization systems §  It requires massively parallel software running on tens, hundreds, or even thousands of servers §  The 3 V’s by Gartner §  Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. (2012)
  • 9. WHAT DRIVES BIG DATA §  Human-generated data §  Documents, transaction data, CRM, social media - your working life is devoted to looking at screens and typing more data into some system. §  Sensor-generated data §  There is the trend that a large part of the physical world around us will eventually somehow be online – The Internet of Things. §  Machine-generated data will quickly top human-generated data
  • 10. DRIVERS BUSINESS DRIVES Fraud protection Risk management Environment Safety Increase Revenue Risk Prevention 360° Customer Experience Management Digital Security Social Media Analysis Infrastructure Observation (Mass) Personalization Recommendation Engines Data as a Service Research Improve Decision-Making Data Aggregation Sampling Web Archives Predictive Analytics Data Pre-processing Video, Audio & Image Processing Infrastructure Management
  • 11. THE EMERGING SOLUTIONS § NoSQL* Movement § NoSQL databases are finding significant and growing industry use in big data and real-time web applications. § Hadoop and its ecosystem § Enterprise-grade solutions, consulting, support § Top 3 vendors: Cloudera, Hortonworks, MapR § Adoption throughout the software industry, e.g. IBM BigInsights, Microsoft HDInsight, Oracle Big Data Appliance, EMC/Spring/VMware Pivotal HD, HP HAVEn, Intel Distribution, Dell w/Cloudera * Also referred to as "Not only SQL"
  • 13. WHAT IS HADOOP § An open-source implementation of frameworks for reliable, scalable, distributed computing and data storage (official Hadoop website) § A reliable shared storage and analysis system (O'Reilly, Hadoop: The Definitive Guide) § A free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment (Margaret Rouse) § A complete, open-source ecosystem for capturing, organizing, storing, searching, sharing, analyzing, visualizing, and ... (Jack Norris)
  • 14. A BRIEF HISTORY OF HADOOP § In 2002 Doug Cutting* started Nutch, an open-source web search engine § Fortunately, Google published papers that § described the architecture of its distributed file system, called GFS (2003) § introduced MapReduce (2004) § In 2005 Nutch released a new version with NDFS and MapReduce; in 2006 these moved out of Nutch to form an independent subproject called Hadoop § Cutting joined Yahoo! to build and run Hadoop at web scale § In 2008 Hadoop became a top-level Apache project and was in use at Yahoo! (10k cores), Last.fm, Facebook and the New York Times * Doug Cutting is also the creator of Apache Lucene
  • 15. HADOOP IN A NUTSHELL § HDFS § A distributed file system for storage § Is highly fault-tolerant and is designed to be deployed on low-cost/commodity hardware § 1 master called NameNode, many DataNodes (10+) § MapReduce § A batch query processor to run an ad hoc query against your whole dataset and get the results in a reasonable time § 1 master called JobTracker, many TaskTrackers (10+)
  • 16. HADOOP FACT-SHEET HDFS/distributed storage § Economical § Commodity hardware § Scalable § Rebalances data on new nodes § Fault tolerant § Detects faults and auto-recovers § Reliable § Maintains multiple copies of data § High throughput § Because data is distributed MapReduce/distributed processing § Economical § Commodity hardware § Scalable § Add nodes to increase parallelism § Fault tolerant § Auto-recovers from job failures § Data locality § Process where the data resides
  • 17. HADOOP PRINCIPLES §  Schema on read §  Data locality §  No shared memory or disks §  Scales out to thousands of servers
  • 18. HADOOP SYSTEM COMPONENTS Masters: NameNode and Secondary NameNode (HDFS), JobTracker (MapReduce). Slaves (many of them): DataNode (HDFS), TaskTracker (MapReduce).
  • 19. WRITING FILES ON HDFS* Client: "Hey, I want to write blocks A, B and C of my File.txt." NameNode: "OK, write to DataNodes 1, 5 and 9." [Diagram: blocks A, B and C of File.txt and their replicas A`, B` and C` spread over DataNode 1, DataNode 5, DataNode 6, DataNode 9, ..., DataNode N in Rack 1 and Rack 2.] * Replication factor of 3
  • 20. READING FILES FROM HDFS Client: "Tell me the block locations of File.txt." NameNode: A → DataNode 1, 5, 6; B → DataNode 1, 5, N; C → DataNode 5, 9, 6. [Diagram: blocks A, B and C and their replicas A`, B` and C` on DataNode 1, DataNode 5, DataNode 6, DataNode 9, ..., DataNode N in Rack 1 and Rack 2.]
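The dialogue between client and NameNode on slides 19 and 20 can be sketched as a small in-memory model. This is only an illustration of the bookkeeping idea, not HDFS code: the class `ToyNameNode`, its methods `allocate` and `locate`, and the simple round-robin placement are all assumptions made for this example. Real HDFS uses a rack-aware placement policy, so the DataNodes chosen here will not match the ones in the slide diagram.

```python
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: it stores only metadata
    (which DataNodes hold which block), never the block contents."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}   # filename -> {block_id: [datanode, ...]}
        self._next = 0        # round-robin cursor (assumed policy, not HDFS's)

    def allocate(self, filename, block_ids):
        """Write path: tell the client which DataNodes to write each block to."""
        placement = {}
        n = len(self.datanodes)
        for block_id in block_ids:
            # Pick `replication` distinct, consecutive DataNodes.
            placement[block_id] = [
                self.datanodes[(self._next + i) % n]
                for i in range(self.replication)
            ]
            self._next = (self._next + 1) % n
        self.block_map[filename] = placement
        return placement

    def locate(self, filename):
        """Read path: return the block locations of a file."""
        return self.block_map[filename]


namenode = ToyNameNode(datanodes=[1, 5, 6, 9], replication=3)
namenode.allocate("File.txt", ["A", "B", "C"])
for block, nodes in namenode.locate("File.txt").items():
    print(block, "->", nodes)
```

The client then reads or writes the block data directly from or to the listed DataNodes; the NameNode only answers the question of where the blocks live.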
  • 21. MAPREDUCE IN A NUTSHELL Word Count Example. Input: "Deer Bear River", "Car Car River", "Deer Car Bear". Split: one line per mapper. Map: (Deer, 1) (Bear, 1) (River, 1); (Car, 1) (Car, 1) (River, 1); (Deer, 1) (Car, 1) (Bear, 1). Shuffle: Bear → (1, 1); Car → (1, 1, 1); Deer → (1, 1); River → (1, 1). Reduce: (Bear, 2) (Car, 3) (Deer, 2) (River, 2). Result: Bear, 2; Car, 3; Deer, 2; River, 2
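The word count on slide 21 can be imitated in a few lines of plain Python. This is a single-process sketch of the map, shuffle and reduce phases, not Hadoop API code; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for this illustration. In a real cluster each split would be mapped on a different server and the shuffle would move data across the network.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(mapped_splits):
    """Shuffle: group all emitted values by key, sorted by key."""
    groups = defaultdict(list)
    for pairs in mapped_splits:
        for word, count in pairs:
            groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the list of counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(splits):
    mapped = [map_phase(split) for split in splits]  # one map call per split
    return reduce_phase(shuffle_phase(mapped))


# The three input lines from the slide, one split each.
splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
print(word_count(splits))
# {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

Because each `map_phase` call looks only at its own split and each reduce step looks only at one key, both phases can run in parallel on separate machines, which is the property the slide illustrates.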
  • 22. MAPREDUCE VS. RDBMS §  RDBMS §  In a centralized database system, you’ve got one big disk connected to 4 or 8 or 16 big processors. §  MapReduce §  In a Hadoop cluster, every server has 2 or 4 or 8 CPUs. You can run your job by sending your code to each of the dozens of servers in your cluster, and each server operates on its own little piece of the data. Results are then delivered back to you in a unified whole. You map the operation out to all of those servers and then you reduce the results back into a single result set. §  Architecturally, the reason you’re able to deal with lots of data is because Hadoop spreads it out. And the reason you’re able to ask complicated computational questions is because you’ve got all of these processors, working in parallel, harnessed together.
  • 25. HADOOP’S DATABASE HBASE* § Unlike an RDBMS § No secondary indexes § No transactions § De-normalized, schema-less § Random read/write access to big data § Billions of rows and millions of columns § Automatic data sharding § Integrates with MapReduce * Modeled after Google’s BigTable
  • 27. USE CASES Data Warehousing § Complementary ETL process [Diagram labels: File Server, OLTP, CRM, ERP, Logs, Social Media, Sensors, ...; ETL; Sqoop, Flume, Java API; Pig, Hive, MapReduce, HDFS; Data Warehouse, Data Marts, Data Cubes; Analytics, Visualization, Reports]
  • 28. USE CASES Data Warehousing § Substitutive ETL process [Diagram labels: File Server, OLTP, CRM, ERP, Logs, Social Media, Sensors, ...; Hadoop; Data Warehouse; Analytics, Visualization, Reports]
  • 29. USE CASES Data Warehousing § (Predictive) Analytics at scale [Diagram labels: File Server, OLTP, CRM, ERP, Logs, Social Media, Sensors, ...; Hadoop; Data Warehouse; Analytics, Visualization, Reports]
  • 30. USE CASES Data Warehousing § Machine learning, natural language processing, sentiment at scale [Diagram labels: File Server, OLTP, CRM, ERP, Logs, Social Media, Sensors, ...; Hadoop; ML + NLP*; Data Warehouse; Analytics, Visualization, Reports] * Personalized recommendations § content, products, services …
  • 32. CONTACT US jean-pierre.koenig@ymc.ch Tel. +41 (0)71 508 24 86 www.ymc.ch @YMC_Big_Data YMC AG, Sonnenstrasse 4, CH-8280 Kreuzlingen, Switzerland. Photo Credits: Slide 03: Matterhorn and Lake by Noel Reynolds; Slide 24: Hadoop Ecosystem by Rishu Shrivastava