Hands On Big Data:
Getting Started With
NoSQL And Hadoop
Mario Cartia
mario@big-data.ninja
Big Data Facts
•  Google processes about 20 PB (petabytes,
10^15 bytes each) of data each day
•  About 5 EB (exabytes, 10^18 bytes each) of data
in the world, 90% of it generated over the last 2
years
•  Wearable computing and IoT…
Big Data: 3V Model
•  Big Data is not only about volume
– Volume
>= Petabytes, not Gigabytes
– Variety
Structured and unstructured data
– Velocity
Real-time or near real-time
Big Data
Risk
Big Data
Opportunity
Big Data Facts
Big Data Success Stories
Amazon.com, a pioneer of targeted
advertising, became a big data user when Greg
Linden, one of its software engineers, realized
the potential of data-derived book recommendations
compared with the results of the in-house review project
When Amazon compared the sales driven by the
computer-generated recommendations against those
driven by the in-house reviews, the results were much
better for the data-derived material, and this
revolutionized e-commerce
Big Data Success Stories
Google Flu Trends is a web service
operated by Google. It provides
estimates of influenza activity for more
than 25 countries. By aggregating Google
search queries, it attempts to make
accurate predictions about flu activity
In the 2009 flu pandemic Google Flu
Trends tracked information about flu in
the United States. In February 2010, the
CDC identified influenza cases spiking in
the mid-Atlantic region of the United
States. However, Google’s data of search
queries about flu symptoms was able to
show that same spike two weeks prior to
the CDC report being released
Big Data Success Stories
reCAPTCHA is a user-dialogue system originally
developed by Luis von Ahn, Ben Maurer, Colin
McMillen, David Abraham and Manuel Blum at
Carnegie Mellon University's main Pittsburgh
campus, and acquired by Google in September
2009
The reCAPTCHA service supplies subscribing
websites with images of words that optical
character recognition (OCR) software has been
unable to read. The subscribing websites present
these images for humans to decipher as
CAPTCHA words, as part of their normal
validation procedures. They then return the results
to the reCAPTCHA service, which sends the
results to the digitization projects
Secondary data usage
Big Data Techniques
•  Statistics
•  Data Warehouse
•  Data Visualization
•  Data Mining
•  Prediction
•  Machine Learning
•  Advanced Analytics
•  Correlation Analysis
•  Business Intelligence
The Traditional Approach
ETL: Extract, Transform, Load
•  Extracts data from outside sources
•  Transforms it to fit operational needs,
which can include quality levels
•  Loads it into the end target (database,
operational data store, data mart or data
warehouse)
Does it fit “big data” needs?
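The three ETL steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name EtlSketch, the "name,amount" line format, the quality rule and the in-memory lists standing in for the source and the data warehouse are all invented for the example and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical ETL sketch: class, method names and data format are assumptions.
class EtlSketch {

    // Extract: pull raw "name,amount" lines from an outside source.
    // An in-memory list stands in for a file, API or feed here.
    static List<String> extract(List<String> source) {
        return new ArrayList<>(source);
    }

    // Transform: enforce a quality level (drop malformed rows)
    // and normalize names so they fit the target's conventions.
    static List<String[]> transform(List<String> rawLines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : rawLines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // wrong number of fields
            }
            String name = parts[0].trim().toUpperCase(Locale.ROOT);
            String amount = parts[1].trim();
            if (name.isEmpty() || !amount.matches("\\d+")) {
                continue; // missing name or non-numeric amount
            }
            rows.add(new String[] {name, amount});
        }
        return rows;
    }

    // Load: append the cleaned rows to the end target
    // (a list standing in for a database or data warehouse table).
    static void load(List<String[]> rows, List<String[]> target) {
        target.addAll(rows);
    }

    public static void main(String[] args) {
        List<String[]> warehouse = new ArrayList<>();
        List<String> raw = extract(Arrays.asList("alice, 10", "broken-row", "bob, x", " carol ,7"));
        load(transform(raw), warehouse);
        for (String[] row : warehouse) {
            System.out.println(row[0] + " -> " + row[1]);
        }
    }
}
```

Every step here runs on one machine and materializes the whole data set in memory, which is the part that stops scaling once the input reaches the volumes discussed earlier.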
Hadoop Basics
Apache Hadoop is an open-source
software framework for distributed
storage and distributed processing
of Big Data on clusters of
commodity hardware
Hadoop Basics
Hadoop was created by Doug
Cutting and Mike Cafarella in 2005.
Cutting, who was working at
Yahoo! at the time, named it after
his son's toy elephant
Hadoop 1 vs. Hadoop 2
Hadoop Distributions
Hadoop Market
Hadoop vs. RDBMS
From RDBMS to NoSQL
A NoSQL (often interpreted as Not
Only SQL) database provides a
mechanism for storage and
retrieval of data that is modeled in
means other than the tabular
relations used in relational
databases
From RDBMS to NoSQL
Motivations for this approach include
simplicity of design, horizontal scaling
and finer control over availability. The
data structure (e.g. key-value, graph, or
document) differs from the RDBMS,
and therefore some operations are
faster in NoSQL and some in RDBMS
NoSQL Approaches
Most popular NoSQL database types
•  Document (MongoDB, CouchDB, Clusterpoint,
Couchbase, MarkLogic, etc.)
•  Key-value (Redis, MemcacheDB, Dynamo,
FoundationDB, Riak, FairCom c-treeACE,
Aerospike, etc.)
•  Column (Accumulo, Cassandra, Druid, HBase,
Vertica, etc.)
•  Graph (Allegro, Neo4j, InfiniteGraph,
OrientDB, Virtuoso, Stardog, etc.)
NoSQL Approaches
NoSQL: How To Choose (Brewer)
CAP theorem (Brewer)
Hadoop Architecture Overview
Hadoop Core Components
MapReduce Model
•  MapReduce is a programming model, and an
associated implementation, for processing and
generating large data sets with a parallel,
distributed algorithm on a cluster
•  The model is inspired by the map and reduce
functions commonly used in functional
programming, although their purpose in the
MapReduce framework is not the same as in their
original forms
MapReduce Paper
MapReduce Overview
•  Map step: Each worker node applies the map()
function to the local data, and writes the output to
temporary storage. A master node ensures that,
for redundant copies of input data, only one copy is
processed
•  Shuffle step: Worker nodes redistribute data based
on the output keys (produced by the map()
function), such that all data belonging to one key is
located on the same worker node
•  Reduce step: Worker nodes now process each
group of output data, per key, in parallel
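The three steps above can be simulated in a single Java process. This is a rough sketch, not Hadoop code: the "worker nodes" are plain in-memory lists, the input is assumed to be log lines of the form "<LEVEL> <message>", and the class and method names are made up for this example. The reducers argument is assumed to be at least 1.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Single-process simulation; "worker nodes" are plain lists (an assumption of this sketch).
class MapShuffleReduceSketch {

    // Map step: one worker turns each of its local log lines into a (level, 1) pair.
    static List<Map.Entry<String, Integer>> mapStep(List<String> localLines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : localLines) {
            out.add(new SimpleEntry<>(line.split(" ")[0], 1));
        }
        return out;
    }

    // Shuffle step: route every pair to the partition that owns its key,
    // so that all values belonging to one key end up in the same place.
    static List<Map<String, List<Integer>>> shuffleStep(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int reducers) {
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                int target = Math.floorMod(pair.getKey().hashCode(), reducers);
                partitions.get(target)
                        .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                        .add(pair.getValue());
            }
        }
        return partitions;
    }

    // Reduce step for one partition: sum the values of each key group.
    static Map<String, Integer> reducePartition(Map<String, List<Integer>> partition) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : partition.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            sums.put(group.getKey(), sum);
        }
        return sums;
    }

    // Runs map and reduce with parallel streams to mimic independent workers.
    static Map<String, Integer> run(List<List<String>> splits, int reducers) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = splits.parallelStream()
                .map(MapShuffleReduceSketch::mapStep)
                .collect(Collectors.toList());
        List<Map<String, Integer>> partials = shuffleStep(mapOutputs, reducers).parallelStream()
                .map(MapShuffleReduceSketch::reducePartition)
                .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            result.putAll(partial); // keys never overlap across partitions
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("ERROR disk full", "INFO started"),
                Arrays.asList("INFO stopped", "ERROR timeout", "WARN slow"));
        System.out.println(run(splits, 2));
    }
}
```

Because the shuffle assigns each key to exactly one partition, the per-partition results can be merged without any further coordination.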
Map Reduce: A really simple
introduction
Dear <Your Name>,
As you know, we are building the blogging platform
blogger2.com, and I need some statistics. I need to find out,
across all blogs ever written on blogger.com, how many times
1-character words occur (like 'a', 'I'), how many times
2-character words occur (like 'be', 'is'), and so on up to how
many times 10-character words occur.
I know it's a really big job. So I will assign all 50,000
employees working in our company to work with you on this for
a week. I am going on vacation for a week, and it's really
important that I have this when I return. Good luck.
Regards,
The CEO
(src: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/)
Map Reduce: A really simple
introduction
The next day, you stand with a mic on the dais
before the 50,000 and proclaim: for a week, you will all
be divided into several groups:
•  The Mappers (tens of thousands of people will be
in this group)
•  The Grouper (assume just one person for now)
•  The Reducers (around 10 of them)
•  The Master (that’s you)
Map Reduce: A really simple
introduction
•  Each mapper will get a set of 50 blog URLs and a really
big sheet of paper. Each one of you needs to go to
each of those URLs and, for each word in those blogs,
write one line on the paper. The format of that line
should be the number of characters in the word, then
a comma, and then the actual word
•  For example, if you find the word “a”, you write “1,a”
on a new line of your paper, since the word “a” has
only 1 character. If you find the word “hello”, you
write “5,hello” on the next line
Map Reduce: A really simple
introduction
Each of you has 4 days. After 4 days, your sheet might
look like this
•  “1,a”
•  “5,hello”
•  “2,if”
•  … and a million more lines
At the end of the 4th day, each one of you will give
your completely filled sheet to the Grouper
Map Reduce: A really simple
introduction
•  Grouper, I will give you 10 papers. The first paper will be marked
1, the second paper will be marked 2, and so on up to 10
•  You collect the output from the mappers and, for each line in
a mapper’s sheet, if it says “1,”, you write the word on
sheet 1; if it says “2,”, you write it on sheet 2
•  For example, if the first line of a mapper’s sheet says
“1,a”, you write “a” on sheet 1. If it says “2,if”, you
write “if” on sheet 2. If it says “5,hello”, you write “hello”
on sheet 5
Map Reduce: A really simple
introduction
So at the end of your work, the 10 sheets you have might look like
this
•  Sheet 1: a, a, a, I, I, i, a, i, i, i … millions more
•  Sheet 2: if, of, it, of, of, if, at, im, is, is, of, of … millions more
•  Sheet 3: the, the, and, for, met, bet, the, the, and … millions
more
•  …
•  Sheet 10: …
Once you are done, you distribute each sheet to one reducer. For
example, sheet 1 goes to reducer 1, sheet 2 goes to reducer 2, and
so on.
Map Reduce: A really simple
introduction
•  Reducers, each one of you gets one sheet from the Grouper. For each
sheet, you count the number of words written on it and write the count
in big bold letters on the back of the paper
•  For example, if you are reducer 2, you get sheet 2 from the Grouper,
which looks like this:
“Sheet 2: if, of, it, of, of, if, at, im,
is, is, of, of …”
•  You count the number of words on that sheet; say the number
of words is 28838380044. You write it on the back of the
paper in big bold letters and give it to the Master
Map Reduce: A really simple
introduction
You essentially did MapReduce. The greatest advantages
of your approach were these:
•  The mappers can work independently
•  The reducers can work independently
•  The grouper can work really fast because he didn’t
have to count any words; all he had to do
was look at the first number and put the word on the
appropriate sheet
The process can be easily applied to other kinds of
problems
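The story can be replayed in plain Java, with one method per role. This is a toy sketch under assumptions: blogs are given as strings rather than URLs, words are split on whitespace only (so punctuation counts as part of a word), and all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy replay of the CEO story; names and input format are assumptions.
class WordLengthCountSketch {

    // Mapper: for every word, write one "length,word" line on the sheet.
    static List<String> mapper(String blogText) {
        List<String> sheet = new ArrayList<>();
        for (String word : blogText.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                sheet.add(word.length() + "," + word);
            }
        }
        return sheet;
    }

    // Grouper: look only at the number before the first comma and copy
    // the word onto the sheet with that number. No counting happens here.
    static Map<Integer, List<String>> grouper(List<List<String>> mapperSheets) {
        Map<Integer, List<String>> sheets = new TreeMap<>();
        for (List<String> mapperSheet : mapperSheets) {
            for (String line : mapperSheet) {
                int comma = line.indexOf(',');
                int length = Integer.parseInt(line.substring(0, comma));
                if (length < 1 || length > 10) {
                    continue; // the CEO only asked about 1 to 10 characters
                }
                sheets.computeIfAbsent(length, k -> new ArrayList<>())
                        .add(line.substring(comma + 1));
            }
        }
        return sheets;
    }

    // Reducer: count the words on one sheet.
    static long reducer(List<String> sheet) {
        return sheet.size();
    }

    // Master: hand out the work and collect one count per sheet.
    static Map<Integer, Long> run(List<String> blogs) {
        List<List<String>> mapperSheets = new ArrayList<>();
        for (String blog : blogs) {
            mapperSheets.add(mapper(blog));
        }
        Map<Integer, Long> counts = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> sheet : grouper(mapperSheets).entrySet()) {
            counts.put(sheet.getKey(), reducer(sheet.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a hello if", "I am a robot")));
    }
}
```

The loops over blogs and over sheets are written sequentially for brevity; since no iteration depends on another, each could be handed to a separate worker.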
Map Reduce: formal definition
The Map and Reduce functions of
MapReduce are both defined with respect
to data structured in (key, value) pairs.
Map takes one pair of data with a type in
one data domain, and returns a list of pairs
in a different domain:
•  Map(k1, v1) → list(k2, v2)
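One way to write this signature down in Java is a generic interface. The interface name MapFunction and the tokenizing example are assumptions made for illustration; they are not Hadoop's Mapper API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative typing of Map(k1, v1) -> list(k2, v2); names are assumptions.
class MapSignatureSketch {

    // One input pair (K1, V1) goes in, a list of pairs in another domain (K2, V2) comes out.
    interface MapFunction<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> map(K1 key, V1 value);
    }

    // Example instance: (line number, line text) -> list of (word, 1).
    static final MapFunction<Long, String, String, Integer> TOKENIZE = (lineNo, line) -> {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    };

    public static void main(String[] args) {
        System.out.println(TOKENIZE.map(0L, "to be or not to be"));
    }
}
```

Note that the input key (the line number) is ignored by this particular function, which is common: the output domain (word, count) has nothing to do with the input domain (position, text).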
Map Reduce: formal definition
The Map function is applied in parallel to every
pair in the input dataset
This produces a list of pairs for each call
After that, the MapReduce framework collects
all pairs with the same key from all lists and
groups them together, creating one group for
each key
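The grouping performed by the framework can be pictured as follows. This is a simplified in-memory sketch with invented names; a real framework does this across machines and usually sorts by key as well.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the framework's grouping phase; names are assumptions.
class GroupByKeySketch {

    // Takes the list of pairs produced by each Map call and
    // builds one group of values per distinct key.
    static <K, V> Map<K, List<V>> groupByKey(List<List<Map.Entry<K, V>>> mapOutputs) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (List<Map.Entry<K, V>> output : mapOutputs) {
            for (Map.Entry<K, V> pair : output) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                        .add(pair.getValue());
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Output of a first Map call.
        List<Map.Entry<String, Integer>> call1 = new ArrayList<>();
        call1.add(new SimpleEntry<>("to", 1));
        call1.add(new SimpleEntry<>("be", 1));
        // Output of a second Map call.
        List<Map.Entry<String, Integer>> call2 = new ArrayList<>();
        call2.add(new SimpleEntry<>("to", 1));

        List<List<Map.Entry<String, Integer>>> outputs = Arrays.asList(call1, call2);
        System.out.println(groupByKey(outputs));
    }
}
```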
Map Reduce: formal definition
The Reduce function is then applied in parallel to
each group, which in turn produces a collection of
values in the same domain:
•  Reduce(k2, list(v2)) → list(v3)
Each Reduce call typically produces either one value
v3 or an empty return, though one call is allowed to
return more than one value. The returns of all calls
are collected as the desired result list
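The Reduce signature, including the "one value or an empty return" behaviour, might be typed in Java as shown here. The interface name ReduceFunction, the two example reducers and the reduceAll helper are assumptions for this sketch and are not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative typing of Reduce(k2, list(v2)) -> list(v3); names are assumptions.
class ReduceSignatureSketch {

    interface ReduceFunction<K2, V2, V3> {
        List<V3> reduce(K2 key, List<V2> values);
    }

    // Typical case: each call returns exactly one value (the sum of the group).
    static final ReduceFunction<String, Integer, Integer> SUM = (key, values) -> {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return Collections.singletonList(sum);
    };

    // A call may also return nothing: keep only keys that occurred at least twice.
    static final ReduceFunction<String, Integer, String> REPEATED_ONLY = (key, values) -> {
        if (values.size() >= 2) {
            return Collections.singletonList(key);
        }
        return Collections.emptyList();
    };

    // The returns of all calls are collected into one result list.
    static <K2, V2, V3> List<V3> reduceAll(Map<K2, List<V2>> groups,
                                           ReduceFunction<K2, V2, V3> fn) {
        List<V3> result = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            result.addAll(fn.reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        groups.put("be", Arrays.asList(1, 1));
        groups.put("or", Arrays.asList(1));
        System.out.println(reduceAll(groups, SUM));
        System.out.println(reduceAll(groups, REPEATED_ONLY));
    }
}
```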
MapReduce job example
package org.myorg;
import java.io.IOException;
…
public class WordCount {
    // Mapper: emits (word, 1) for every token of every input line
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }
MapReduce job example
    // Reducer (also used as the combiner): sums the counts collected for each word
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
MapReduce job example
    // Driver: args[0] is the input path, args[1] is the output path
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        // Types of the (key, value) pairs written by the job
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        // Plain text in, plain text out
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
Machine Learning
Machine learning is a scientific discipline
that deals with the construction and study
of algorithms that can learn from data.
Such algorithms operate by building a
model based on inputs and using that to
make predictions or decisions, rather
than following only explicitly
programmed instructions
Machine Learning
Machine learning can be
considered a subfield of computer
science and statistics. It has strong
ties to artificial intelligence and
optimization, which deliver
methods, theory and application
domains to the field
Machine Learning
Example applications include
spam filtering, optical character
recognition (OCR), search engines
and computer vision. Machine
learning is sometimes conflated
with data mining
Machine Learning Examples
Machine Learning Tools
Apache Mahout is a project of the
Apache Software Foundation to produce
free implementations of distributed or
otherwise scalable machine learning
algorithms focused primarily in the areas
of collaborative filtering, clustering and
classification
Machine Learning Tools
Data Visualization
An often-quoted figure says the brain
processes images 60,000x
faster than text. As the final
step in your big data
analytics workflow, data
visualization is a visual
representation of
the insights gained from
your analysis
Data Visualization Tools

Presentation: Big Data, a space adventure - Mario Cartia - Codemotion Rome 2015

Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

Mehr von Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Kürzlich hochgeladen

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Kürzlich hochgeladen (20)

Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Big Data, a space adventure - Mario Cartia - Codemotion Rome 2015

  • 1. Hands On Big Data: Getting Started With NoSQL And Hadoop Mario Cartia mario@big-data.ninja
  • 4. Big Data Facts • Google processes about 20 PB (petabytes, 10^15 bytes) of data each day • About 5 EB (exabytes, 10^18 bytes) of data exist in the world, 90% of it generated over the last 2 years • Wearable computing and IoT…
  • 6. Big Data: 3V Model • Big Data is not only about volume – Volume: petabytes or more, not gigabytes – Variety: structured and unstructured data – Velocity: real-time or near real-time
  • 10. Big Data Success Stories Amazon.com, a pioneer of targeted advertising, became a big data user when Greg Linden, one of its software engineers, realized the potential of book reviewing from the average results of their in-house review project. When Amazon compared the results of the computer sales against the in-house reviews, the results were much better for the data-derived material, and revolutionized e-commerce
  • 11. Big Data Success Stories Google Flu Trends is a web service operated by Google. It provides estimates of influenza activity for more than 25 countries. By aggregating Google search queries, it attempts to make accurate predictions about flu activity. In the 2009 flu pandemic, Google Flu Trends tracked information about flu in the United States. In February 2010, the CDC identified influenza cases spiking in the mid-Atlantic region of the United States. However, Google’s data of search queries about flu symptoms was able to show that same spike two weeks prior to the CDC report being released
  • 12. Big Data Success Stories reCAPTCHA is a user-dialogue system originally developed by Luis von Ahn, Ben Maurer, Colin McMillen, David Abraham and Manuel Blum at Carnegie Mellon University's main Pittsburgh campus, and acquired by Google in September 2009. The reCAPTCHA service supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects. Secondary data usage
  • 13. Big Data Techniques: Statistics, Data Warehouse, Data Visualization, Data Mining, Prediction, Machine Learning, Advanced Analytics, Correlation Analysis, Business Intelligence
  • 14. The Traditional Approach ETL: Extract, Transform, Load • Extracts data from outside sources • Transforms it to fit operational needs, which can include quality levels • Loads it into the end target (database, operational data store, data mart or data warehouse). Does it fit “big data” needs?
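
As a rough illustration of the three ETL stages listed above, here is a minimal Java sketch. The class name `EtlSketch`, the "name,amount" row layout and the in-memory map standing in for the target store are all assumptions made for this example; none of them come from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: names and data layout are illustrative only.
class EtlSketch {

    // Extract: a real pipeline would read from an outside source;
    // here the "source" is a hard-coded list of raw "name,amount" lines.
    static List<String> extract() {
        return List.of("alice, 10", "BOB,5", "broken line", "alice,7");
    }

    // Transform: drop rows that fail a basic quality check, normalize the rest.
    static List<String[]> transform(List<String> rawLines) {
        List<String[]> clean = new ArrayList<>();
        for (String line : rawLines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // quality gate: skip malformed rows
            }
            String name = parts[0].trim().toLowerCase();
            String amount = parts[1].trim();
            if (!amount.matches("\\d+")) {
                continue; // quality gate: amount must be a whole number
            }
            clean.add(new String[] {name, amount});
        }
        return clean;
    }

    // Load: write into the end target, here a map of total amount per name.
    static Map<String, Integer> load(List<String[]> rows) {
        Map<String, Integer> warehouse = new LinkedHashMap<>();
        for (String[] row : rows) {
            warehouse.merge(row[0], Integer.parseInt(row[1]), Integer::sum);
        }
        return warehouse;
    }

    public static void main(String[] args) {
        System.out.println(load(transform(extract())));
    }
}
```

Every row passes through a single transform stage before it reaches the target, which is one reason the slide asks whether this approach fits big data volumes.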
  • 16. Hadoop Basics Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data on clusters of commodity hardware
  • 17. Hadoop Basics Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant
  • 18. Hadoop 1 vs. Hadoop 2
  • 23. From RDBMS to NoSQL A NoSQL (often interpreted as Not Only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases
  • 24. From RDBMS to NoSQL Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. The data structure (e.g. key-value, graph, or document) differs from the RDBMS, and therefore some operations are faster in NoSQL and some in RDBMS
  • 26. NoSQL Approaches Most popular NoSQL database types • Document (MongoDB, CouchDB, Clusterpoint, Couchbase, MarkLogic, etc.) • Key-value (Redis, MemcacheDB, Dynamo, FoundationDB, Riak, FairCom c-treeACE, Aerospike, etc.) • Column (Accumulo, Cassandra, Druid, HBase, Vertica, etc.) • Graph (Allegro, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog, etc.)
  • 28. NoSQL, How To Choose: CAP theorem (Brewer)
  • 34. MapReduce Model •  MapReduce is a programming model, and an associated implementation, for processing and generating large data sets with a parallel, distributed algorithm on a cluster •  The model is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms
  • 36. MapReduce Overview • Map step: Each worker node applies the map() function to the local data, and writes the output to temporary storage. A master node ensures that, of the redundant copies of input data, only one is processed • Shuffle step: Worker nodes redistribute data based on the output keys (produced by the map() function), such that all data belonging to one key is located on the same worker node • Reduce step: Worker nodes now process each group of output data, per key, in parallel
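
The shuffle step above only works if every mapper sends a given key to the same reducer. One common way to achieve that is hash partitioning, sketched below in plain Java. The class and method names are illustrative assumptions, not Hadoop's API, although the hash-and-modulo idea is similar in spirit to what Hadoop's default partitioner does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash partitioning for the shuffle step.
class ShuffleSketch {

    // Map a key to one of numReducers partitions. Masking with
    // Integer.MAX_VALUE clears the sign bit, so the result is never negative.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Redistribute keys emitted by any mapper into per-reducer buckets.
    static List<List<String>> shuffle(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("hello", "big", "data", "hello", "data");
        // Both occurrences of "hello" end up in the same bucket.
        System.out.println(shuffle(keys, 3));
    }
}
```

Because the partition depends only on the key, no coordination between mappers is needed to keep all values for one key together.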
  • 38. Map Reduce: A really simple introduction Dear <Your Name>, As you know, we are building the blogging platform blogger2.com, and I need some statistics. I need to find out, across all blogs ever written on blogger.com, how many times one-character words occur (like 'a', 'I'), how many times two-character words occur (like 'be', 'is'), and so on, up to how many times ten-character words occur. I know it's a really big job, so I will assign all 50,000 employees working in our company to work with you on this for a week. I am going on vacation for a week, and it's really important that I have this when I return. Good luck. Regards, The CEO (src: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/)
  • 39. Map Reduce: A really simple introduction The next day, you stand with a mike on the dais before the 50,000 and proclaim: for a week, you will all be divided into many groups: • The Mappers (tens of thousands of people will be in this group) • The Grouper (assume just one guy for now) • The Reducers (around 10 of 'em) and • The Master (that’s you)
  • 40. Map Reduce: A really simple introduction • Each mapper will get a set of 50 blog URLs and a really big sheet of paper. Each one of you needs to go to each of those URLs and, for each word in those blogs, write one line on the paper. The format of that line should be the number of characters in the word, then a comma, and then the actual word • For example, if you find the word “a”, you write “1,a” on a new line of your paper, since the word “a” has only 1 character. If you find the word “hello”, you write “5,hello” on a new line
  • 41. Map Reduce: A really simple introduction Each of you takes 4 days. So, after 4 days, your sheet might look like this: • “1,a” • “5,hello” • “2,if” • ... and a million more lines. At the end of the 4th day, each one of you will give your completely filled sheet to the Grouper
  • 42. Map Reduce: A really simple introduction • I will give you 10 papers. The first paper will be marked 1, the second paper will be marked 2, and so on, up to 10 • You collect the output from the mappers and, for each line in a mapper’s sheet, if it says “1,”, you write the word on sheet 1; if it says “2,”, you write it on sheet 2 • For example, if the first line of a mapper’s sheet says “1,a”, you write “a” on sheet 1. If it says “2,if”, you write “if” on sheet 2. If it says “5,hello”, you write “hello” on sheet 5
  • 43. Map Reduce: A really simple introduction So at the end of your work, the 10 sheets you have might look like this: • Sheet 1: a, a, a, I, I, i, a, i, i, i … millions more • Sheet 2: if, of, it, of, of, if, at, im, is, is, of, of … millions more • Sheet 3: the, the, and, for, met, bet, the, the, and … millions more • ... • Sheet 10: … Once you are done, you distribute each sheet to one reducer. For example, sheet 1 goes to reducer 1, sheet 2 goes to reducer 2, and so on.
  • 44. Map Reduce: A really simple introduction • Each one of you gets one sheet from the grouper. For each sheet, you count the number of words written on it and write that number in big bold letters on the back side of the paper • For example, if you are reducer 2, you get sheet 2 from the grouper, which looks like this: “Sheet 2: if, of, it, of, of, if, at, im, is, is, of, of …” • You count the number of words on that sheet; say the number of words is 28838380044. You write it on the back side of the paper in big bold letters and give it to the Master
  • 45. Map Reduce: A really simple introduction You essentially did map reduce. The greatest advantages of your approach were these: • The mappers can work independently • The reducers can work independently • The grouper can work really fast because he didn’t have to do any counting of words; all he had to do was look at the first number and put the word on the appropriate sheet. The process can be easily applied to other kinds of problems
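
The story above can be replayed in a few lines of single-machine Java. This is only a sketch: the class name `WordLengthStory`, the sample sentences and the whitespace-based word splitting are assumptions made for illustration, and it does not cap words at ten characters as the CEO's letter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical replay of the mappers / grouper / reducers story.
class WordLengthStory {

    // Mapper: for every word in a blog post, write one "length,word" line.
    static List<String> mapper(String blogText) {
        List<String> sheet = new ArrayList<>();
        for (String word : blogText.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                sheet.add(word.length() + "," + word);
            }
        }
        return sheet;
    }

    // Grouper: look only at the number before the comma and copy the word
    // onto the sheet with that number. No counting happens here.
    static Map<Integer, List<String>> grouper(List<List<String>> mapperSheets) {
        Map<Integer, List<String>> sheets = new TreeMap<>();
        for (List<String> sheet : mapperSheets) {
            for (String line : sheet) {
                int comma = line.indexOf(',');
                int length = Integer.parseInt(line.substring(0, comma));
                sheets.computeIfAbsent(length, k -> new ArrayList<>())
                      .add(line.substring(comma + 1));
            }
        }
        return sheets;
    }

    // Reducer: count the words on one sheet.
    static int reducer(List<String> sheet) {
        return sheet.size();
    }

    public static void main(String[] args) {
        List<List<String>> mapperSheets = List.of(
                mapper("a big hello"),
                mapper("if I say hello"));
        Map<Integer, List<String>> sheets = grouper(mapperSheets);
        // The master collects one count per sheet.
        sheets.forEach((length, sheet) ->
                System.out.println(length + " -> " + reducer(sheet)));
    }
}
```

Each `mapper` call touches only its own text and each `reducer` call touches only its own sheet, which is what lets the 50,000 employees in the story work in parallel.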
  • 46. Map Reduce: formal definition The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs. Map takes one pair of data with a type in one data domain, and returns a list of pairs in a different domain: • Map(k1, v1) → list(k2, v2)
  • 47. Map Reduce: formal definition The Map function is applied in parallel to every pair in the input dataset. This produces a list of pairs for each call. After that, the MapReduce framework collects all pairs with the same key from all lists and groups them together, creating one group for each key
  • 48. Map Reduce: formal definition The Reduce function is then applied in parallel to each group, which in turn produces a collection of values in the same domain: • Reduce(k2, list(v2)) → list(v3). Each Reduce call typically produces either one value v3 or an empty return, though one call is allowed to return more than one value. The returns of all calls are collected as the desired result list
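
The signatures Map(k1, v1) → list(k2, v2) and Reduce(k2, list(v2)) → list(v3) translate fairly directly into Java generics. The sketch below is a sequential, single-machine illustration under that reading; `MapReduceTypes`, `run` and `wordCount` are hypothetical names, and a real framework would execute the map and reduce calls in parallel across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// Hypothetical, sequential illustration of the formal definition.
class MapReduceTypes {

    // Applies Map(k1, v1) -> list(k2, v2), groups the output by k2,
    // then applies Reduce(k2, list(v2)) -> list(v3) once per group.
    static <K1, V1, K2 extends Comparable<K2>, V2, V3> List<V3> run(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> map,
            BiFunction<K2, List<V2>, List<V3>> reduce) {

        // Group step: collect every v2 under its k2 (sorted for stable output).
        Map<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> pair : input) {
            for (Map.Entry<K2, V2> out : map.apply(pair.getKey(), pair.getValue())) {
                groups.computeIfAbsent(out.getKey(), k -> new ArrayList<>())
                      .add(out.getValue());
            }
        }

        // Reduce step: one call per key; all returns are concatenated.
        List<V3> result = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            result.addAll(reduce.apply(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Word count: k1 = line number, v1 = line, k2 = word, v2 = 1, v3 = "word=count".
    static List<String> wordCount(List<String> lines) {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(Map.entry(i, lines.get(i)));
        }
        BiFunction<Integer, String, List<Map.Entry<String, Integer>>> map = (lineNo, line) -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(Map.entry(word, 1));
                }
            }
            return out;
        };
        BiFunction<String, List<Integer>, List<String>> reduce = (word, ones) ->
                List.of(word + "=" + ones.stream().mapToInt(Integer::intValue).sum());
        return run(input, map, reduce);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "big deal")));
    }
}
```

The type parameters make the two data domains visible: the input pairs live in (K1, V1), the intermediate pairs in (K2, V2), and the result list in V3.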
  • 49. MapReduce job example

        package org.myorg;

        import java.io.IOException;
        …

        public class WordCount {

            // Mapper: emits (word, 1) for every token in each input line
            public static class Map extends MapReduceBase
                    implements Mapper<LongWritable, Text, Text, IntWritable> {
                private final static IntWritable one = new IntWritable(1);
                private Text word = new Text();

                public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                        throws IOException {
                    String line = value.toString();
                    StringTokenizer tokenizer = new StringTokenizer(line);
                    while (tokenizer.hasMoreTokens()) {
                        word.set(tokenizer.nextToken());
                        output.collect(word, one);
                    }
                }
            }

  • 50. MapReduce job example

            // Reducer: sums the counts collected for each word
            public static class Reduce extends MapReduceBase
                    implements Reducer<Text, IntWritable, Text, IntWritable> {
                public void reduce(Text key, Iterator<IntWritable> values,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                        throws IOException {
                    int sum = 0;
                    while (values.hasNext()) {
                        sum += values.next().get();
                    }
                    output.collect(key, new IntWritable(sum));
                }
            }

  • 51. MapReduce job example

            // Driver: configures the job and submits it
            public static void main(String[] args) throws Exception {
                JobConf conf = new JobConf(WordCount.class);
                conf.setJobName("wordcount");

                conf.setOutputKeyClass(Text.class);
                conf.setOutputValueClass(IntWritable.class);

                conf.setMapperClass(Map.class);
                conf.setCombinerClass(Reduce.class);
                conf.setReducerClass(Reduce.class);

                conf.setInputFormat(TextInputFormat.class);
                conf.setOutputFormat(TextOutputFormat.class);

                FileInputFormat.setInputPaths(conf, new Path(args[0]));
                FileOutputFormat.setOutputPath(conf, new Path(args[1]));

                JobClient.runJob(conf);
            }
        }
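
The driver above registers the same Reduce class as both combiner and reducer. That is safe for word count because addition is associative and commutative, so summing per-mapper partial sums gives the same totals as summing the raw pairs. The plain-Java sketch below illustrates the point without any Hadoop dependency; `CombinerSketch` and its sample data are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of why a summing reducer can double as a combiner.
class CombinerSketch {

    // Same logic as the Reduce class: sum the counts per word.
    static Map<String, Integer> sumCounts(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // No combiner: every raw (word, 1) pair is shuffled to the reducer.
    static Map<String, Integer> withoutCombiner(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            all.addAll(output);
        }
        return sumCounts(all);
    }

    // With combiner: each mapper's output is pre-summed locally,
    // so fewer pairs need to be shuffled before the final reduce.
    static Map<String, Integer> withCombiner(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            combined.addAll(sumCounts(output).entrySet());
        }
        return sumCounts(combined);
    }

    // Sample output of two mappers (illustrative data).
    static List<List<Map.Entry<String, Integer>>> sample() {
        return List.of(
                List.of(Map.entry("big", 1), Map.entry("data", 1), Map.entry("big", 1)),
                List.of(Map.entry("big", 1)));
    }

    public static void main(String[] args) {
        System.out.println(withoutCombiner(sample()));
        System.out.println(withCombiner(sample()));
    }
}
```

An operation such as a plain average would not have this property, so reusing the reducer as a combiner is a per-job decision rather than a default.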
  • 52. Machine Learning Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions
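
To make "building a model based on inputs and using that to make predictions" concrete, here is a minimal sketch that fits a straight line to a few points by ordinary least squares and then predicts an unseen value. The class name and the toy data are invented for illustration; they are not from the slides.

```java
// Hypothetical example: one-variable linear regression by ordinary least squares.
class LeastSquaresSketch {

    // Learn y = slope * x + intercept from the data. Returns {slope, intercept}.
    static double[] fit(double[] xs, double[] ys) {
        int n = xs.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += xs[i];
            meanY += ys[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (xs[i] - meanX) * (ys[i] - meanY);
            sxx += (xs[i] - meanX) * (xs[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    // Use the learned model to predict y for a new x.
    static double predict(double[] model, double x) {
        return model[0] * x + model[1];
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4};
        double[] ys = {3, 5, 7, 9}; // toy data lying on y = 2x + 1
        double[] model = fit(xs, ys);
        System.out.println(predict(model, 5));
    }
}
```

The line's slope and intercept are never written into the program; they are estimated from the data, which is the distinction the slide draws against explicitly programmed instructions.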
  • 53. Machine Learning Machine learning can be considered a subfield of computer science and statistics. It has strong ties to artificial intelligence and optimization, which deliver methods, theory and application domains to the field
  • 54. Machine Learning Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Machine learning is sometimes conflated with data mining
  • 57. Machine Learning Tools Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on the areas of collaborative filtering, clustering and classification
  • 59. Data Visualization It is often claimed that the brain processes images 60,000x faster than text. The final step in your big data analytics workflow, the visualization, is a visual representation of the insights gained from your analysis