Introduction to Big Data and NoSQL.
This presentation was given to the Master DBA course at John Bryce Education in Israel.
Work is based on presentations by Michael Naumov, Baruch Osoveskiy, Bill Graham and Ronen Fidel.
2. Agenda
• What is Big Data and the 3 Vs
• Introduction to Hadoop
• Who Handles Big Data and Data Science
• NoSQL
3. Who am I?
• Zohar Elkayam, CTO at Brillix
• Oracle ACE Associate
• DBA, team leader, instructor and senior consultant for over 16 years
• Editor (and manager) of ilDBA – Israel Database Community
• Blogger – www.realdbamagic.com
7. So, What is Big Data?
• When the data is too big or moves too fast to handle in a
sensible amount of time.
• When the data doesn’t fit conventional database structure.
• When the solution becomes part of the problem.
8. Big Problems with Big Data
• Unstructured
• Unprocessed
• Un-aggregated
• Un-filtered
• Repetitive
• Low quality
• And generally messy
• Oh, and there is a lot of it
10. Sample of Big Data Use Cases Today
• Media / Entertainment – viewers / advertising effectiveness
• Communications – location-based advertising
• Education & Research – experiment sensor analysis
• Consumer Packaged Goods – sentiment analysis of what's hot, problems
• Health Care – patient sensors, monitoring, EHRs, quality of care
• Life Sciences – clinical trials, genomics
• High Technology / Industrial Mfg. – manufacturing quality, warranty analysis
• Oil & Gas – drilling exploration sensor analysis
• Financial Services – risk & portfolio analysis, new products
• Automotive – auto sensors reporting location, problems
• Retail – consumer sentiment, optimized marketing
• Law Enforcement & Defense – threat analysis: social media monitoring, photo analysis
• Travel & Transportation – sensor analysis for optimal traffic flows, customer sentiment
• Utilities – smart meter analysis for network capacity
• On-line Services / Social Media – people & career matching, web-site optimization
11. Most Requested Uses of Big Data
• Log Analytics & Storage
• Smart Grid / Smarter Utilities
• RFID Tracking & Analytics
• Fraud / Risk Management & Modeling
• 360° View of the Customer
• Warehouse Extension
• Email / Call Center Transcript Analysis
• Call Detail Record Analysis
14. Big Data: Challenge to Value
• Challenges today: high variety, high volume, high velocity
• Business value tomorrow: deep analytics, high agility, massive scalability, real time
15. Volume
• Big Data comes in one size: big. Size is measured in terabytes, petabytes, and even exabytes and zettabytes.
• The storing and handling of the data becomes an issue.
• Producing value out of the data in a reasonable time is also
an issue.
16. Velocity
• The speed in which the data is being generated and collected.
• Streaming data and large-volume data movement.
• High velocity of data capture – requires rapid ingestion.
• What happens on downtime (the backlog problem).
17. Variety
• Big Data extends beyond structured data to include semi-structured and unstructured information: logs, text, audio and video.
• Wide variety of rapidly evolving data types requires highly
flexible stores and handling.
18. Big Data is ANY data
Unstructured, Semi-Structure and Structured
• Some has fixed structure
• Some is “bring your own structure”
• We want to find value in all of it
22. Big Data in Practice
• Big data is big: technological infrastructure solutions needed.
• Big data is messy: data sources must be cleaned before use.
• Big data is complicated: need developers and system admins
to manage intake of data.
23. Big Data in Practice (cont.)
• Data must be broken out of silos in order to be mined,
analyzed and transformed into value.
• The organization must learn how to communicate and
interpret the results of analysis.
25. Infrastructure Challenges – Cont.
• Storage:
• Efficient and cost-effective enough to capture and store terabytes, if
not petabytes, of data
• With intelligent capabilities to reduce your data footprint such as:
• Data compression
• Automatic data tiering
• Data deduplication
26. Infrastructure Challenges – Cont.
• Network infrastructure that can quickly import large data sets
and then replicate it to various nodes for processing
• Security capabilities that protect highly-distributed
infrastructure and data
28. Apache Hadoop
• Open source project run by Apache (2006).
• Hadoop brings the ability to cheaply process large amounts
of data, regardless of its structure.
• Apache Hadoop has been the driving force behind the growth
of the Big Data industry.
30. Key points
• An open-source framework that uses a simple programming model
to enable distributed processing of large data sets on clusters of
computers.
• The complete technology stack includes
• common utilities
• a distributed file system
• analytics and data storage platforms
• an application layer that manages distributed processing, parallel
computation, workflow, and configuration management
• More cost-effective for handling large unstructured data sets than
conventional approaches, and it offers massive scalability and
speed
31. Why use Hadoop?
• Cost: leverages commodity hardware and open source software
• Scalability: near-linear performance up to thousands of nodes
• Flexibility: versatility with data, analytics and operation
32. Really, Why use Hadoop?
• Need to process Multi Petabyte Datasets
• Expensive to build reliability in each application.
• Nodes fail every day
• Failure is expected, rather than exceptional.
• The number of nodes in a cluster is not constant.
• Need common infrastructure
• Efficient, reliable, Open Source Apache License
• The above goals are the same as Condor's, but
• Workloads are IO bound and not CPU bound
33. Hadoop Benefits
• Reliable solution based on unreliable hardware
• Designed for large files
• Load data first, structure later
• Designed to maximize throughput of large scans
• Designed to leverage parallelism
• Designed to scale
• Flexible development platform
• Solution Ecosystem
34. Hadoop Limitations
• Hadoop is scalable but not fast
• Some assembly required
• Batteries not included
• Instrumentation not included either
• DIY mindset (remember Linux/MySQL?)
• On the larger scale – Hadoop is not cheap (but still cheaper
than using old solutions)
35. Example Comparison: RDBMS vs. Hadoop
• Data size: gigabytes (typical traditional RDBMS) vs. petabytes (Hadoop)
• Access: interactive and batch vs. batch only (not interactive)
• Updates: read/write many times vs. write once, read many times
• Structure: static schema vs. dynamic schema
• Scaling: nonlinear vs. linear
• Query response time: can be near immediate vs. has latency (due to batch processing)
36. Hadoop and Relational Database
Best when used together:
• Relational database best used for: interactive OLAP analytics (<1 sec), multistep transactions, 100% SQL compliance
• Hadoop best used for: structured or not (flexibility), scalability of storage/compute, complex data processing, cheaper compared to RDBMS
38. Hadoop Main Components
• HDFS: Hadoop Distributed File System – distributed file
system that runs in a clustered environment.
• MapReduce – programming paradigm for running processes
over clustered environments.
39. HDFS is...
• A distributed file system
• Redundant storage
• Designed to reliably store data using commodity hardware
• Designed to expect hardware failures
• Intended for large files
• Designed for batch inserts
• The Hadoop Distributed File System
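• Example (a minimal sketch, not from the original slides: it assumes a working Hadoop installation with the hdfs command on the PATH, and simply wraps a few hdfs dfs calls from Python; the paths and file names are illustrative):
import subprocess

def hdfs(*args):
    # Run an "hdfs dfs" sub-command and fail loudly if it returns an error
    subprocess.run(["hdfs", "dfs"] + list(args), check=True)

hdfs("-mkdir", "-p", "/user/demo/logs")         # create a directory in HDFS
hdfs("-put", "access.log", "/user/demo/logs/")  # upload a large local file
hdfs("-ls", "/user/demo/logs")                  # list the directory
hdfs("-cat", "/user/demo/logs/access.log")      # stream the file back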
40. HDFS Node Types
HDFS has three types of nodes:
• NameNode (master node)
• Distributes files across the cluster
• Responsible for replication between the DataNodes and for file block locations
• DataNodes
• Responsible for the actual file storage
• Serve file data to clients
• BackupNode (version 0.23 and up)
• A backup of the NameNode
41. Typical implementation
• Nodes are commodity PCs
• 30-40 nodes per rack
• Uplink from racks is 3-4 gigabit
• Rack-internal is 1 gigabit
42. MapReduce is...
• A programming model for expressing distributed
computations at a massive scale
• An execution framework for organizing and performing such
computations
• An open-source implementation called Hadoop
43. MapReduce
Example: $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar
-input myInputDirs
-output myOutputDir
-mapper /bin/cat
-reducer /bin/wc
• Runs programs (jobs) across many computers
• Protects against single-server failure by re-running failed steps
• MR jobs can be written in Java, C, Python, Ruby, etc.
• Users only write Map and Reduce functions
• MAP – takes a large problem and divides it into sub-problems; performs the same function on all sub-problems
• REDUCE – combines the output from all sub-problems
44. Typical large-data problem
• Iterate over a large number of records
• Extract something of interest from each (Map)
• Shuffle and sort intermediate results
• Aggregate intermediate results (Reduce)
• Generate final output
(Dean and Ghemawat, OSDI 2004)
45. MapReduce paradigm
• Implement two functions:
• Map(k1, v1) -> list(k2, v2)
• Reduce(k2, list(v2)) -> list(v3)
• Framework handles everything else*
• Values with the same key go to the same reducer
47. MapReduce - word count example
function map(String name, String document):
  for each word w in document:
    emit(w, 1)

function reduce(String word, Iterator partialCounts):
  totalCount = 0
  for each count in partialCounts:
    totalCount += count
  emit(word, totalCount)
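• Example (a hedged sketch of how the same word count could run on Hadoop Streaming with plain Python scripts; the file names mapper.py and reducer.py are illustrative, and the streaming jar is the one shown on the earlier MapReduce slide):
# --- mapper.py: emit (word, 1) for every word read from stdin ---
import sys
for line in sys.stdin:
    for word in line.split():
        print("%s\t%d" % (word, 1))

# --- reducer.py: sum counts per word; streaming sorts by key before the reducer runs ---
import sys
current_word, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current_word and current_word is not None:
        print("%s\t%d" % (current_word, total))
        total = 0
    current_word = word
    total += int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, total))
The pipeline can be tested locally without a cluster: cat document.txt | python mapper.py | sort | python reducer.py. On a cluster it would be passed to the streaming jar with something like -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py.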
49. MapReduce is good for...
• Embarrassingly parallel algorithms
• Summing, grouping, filtering, joining
• Off-line batch jobs on massive data sets
• Analyzing an entire large dataset
50. MapReduce is ok for...
• Iterative jobs (e.g., graph algorithms)
• Each iteration must read/write data to disk
• IO and latency cost of an iteration is high
51. MapReduce is NOT good for...
• Jobs that need shared state/coordination
• Tasks are shared-nothing
• Shared-state requires scalable state store
• Low-latency jobs
• Jobs on small datasets
• Finding individual records
53. Improving Hadoop
Core Hadoop is complicated, so tools were created to make things easier.
Improving programmability:
• Pig: Programming language that simplifies Hadoop actions: loading,
transforming and sorting data
• Hive: enables Hadoop to operate as data warehouse using SQL-like
syntax.
54. Pig
• Data flow processing
• Uses Pig Latin query language
• Highly parallel in order to distribute data processing across many servers
• Combining multiple data sources (files, HBase, Hive)
55. Hive
• Built on the MapReduce framework, so it generates MR jobs behind the scenes
• Hive is a data warehouse that enables easy data summarization and ad-hoc queries via
an SQL-like interface for large datasets stored in HDFS/HBase.
• Has partitioning and partition swapping
• Good for random sampling
• Example: CREATE EXTERNAL TABLE vs_hdfs (
site_id string,
session_id string,
time_stamp bigint,
visitor_id bigint,
row_unit string,
evts string,
biz string,
plne string,
dims string)
partitioned by (site string,day string)
ROW FORMAT DELIMITED FIELDS TERMINATED
BY '001'
STORED AS SEQUENCEFILE LOCATION
'/home/data/';
select session_id,
get_json_object(concat(tttt, "}"), '$.BY'),
get_json_object(concat(tttt, "}"), '$.TEXT') from
(
select session_id,concat("{",
regexp_replace(event, "[{|}]", ""), "}") tttt
from (
select session_id,get_json_object(plne,
'$.PLine.evts[*]') pln
from vs_hdfs_v1 where site='6964264'
and day='20120201' and plne!='{}' limit 10 ) t
LATERAL VIEW explode(split(pln, "},{"))
adTable AS event )t2
57. Improving Hadoop (cont.)
For improving access:
• HBase: column-oriented database that runs on top of HDFS.
• Sqoop: a tool designed to import data from relational
databases into Hadoop (HDFS or Hive).
58. HBase
What is HBase and why should you use it?
• Huge volumes of randomly accessed data.
• There are no restrictions on the number of columns per row – it's dynamic.
• Consider HBase when you're loading data by key, searching data by key (or range),
serving data by key, querying data by key, or when storing data by row that doesn't
conform well to a schema.
HBase don'ts:
• It doesn't talk SQL, have an optimizer, or support transactions or joins. If you don't use
any of these in your database application, then HBase could very well be the perfect fit.
Example:
create 'blogposts', 'post', 'image' ---create table
put 'blogposts', 'id1', 'post:title', 'Hello World' ---insert value
put 'blogposts', 'id1', 'post:body', 'This is a blog post' ---insert value
put 'blogposts', 'id1', 'image:header', 'image1.jpg' ---insert value
get 'blogposts', 'id1' ---select records
59. Sqoop
What is Sqoop?
• It’s a command line tool for moving data between HDFS and relational database systems.
• You can download drivers for Sqoop from Microsoft and then:
• Import Data/Query results from SQL Server to Hadoop.
• Export Data from Hadoop to SQL Server.
• It’s like BCP
• Example:
$bin/sqoop import --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch'
--table lineitem --hive-import
$bin/sqoop export --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --export-dir
/data/lineitemData
60. Improving Hadoop (cont.)
• For improving coordination: Zookeeper
• For improving scheduling/orchestration: Oozie
• For improving UI: Hue
• Machine learning: Mahout
66. Big Data Market Survey
• 3 major groups for rolling your own Big Data:
• Integrated Hadoop providers.
• Analytical database with Hadoop connectivity.
• Hadoop-centered companies.
• Big Data on the Cloud.
71. Hadoop Centered Companies
• Cloudera – longest-established Hadoop distribution.
• Hortonworks – major contributor to the Hadoop code and core
components.
• MapR.
72. Big Data and Cloud
• Some Big Data solutions can be provided using IaaS:
Infrastructure as a service.
• Private clouds can be constructed using Hadoop orchestration
tools.
• Public clouds provided by Rackspace or Amazon EC2 can be
used to start a Hadoop cluster.
73. Big Data and Cloud (cont.)
• PaaS: Platform as a Service can be used to remove the need to
configure or scale things.
• The major PaaS Providers are Amazon, Google and Microsoft.
74. PaaS Services: Amazon
• Amazon:
• Elastic Map Reduce (EMR): MapReduce programs submitted to a
cluster managed by Amazon. Good for EC2/S3 combinations.
• DynamoDB: a NoSQL database service provided by Amazon as an alternative to HBase.
75. PaaS Services: Google
• Google:
• BigQuery: analytical database suitable for interactive analysis over
datasets of the order of 1TB.
• Prediction API: machine learning platform for classification and
sentiment analysis, done with Google's tools on customer data.
76. PaaS Services: Microsoft
• Microsoft:
• Windows Azure: a cloud computing platform and infrastructure that
can be used as PaaS and as IaaS.
78. Big Data Readiness
• The R&D Prototype Stage
• Skills needed:
• Distributed data deployment (e.g. Hadoop)
• Python or Java programming with MapReduce
• Statistical analysis (e.g. R)
• Data integration
• Ability to formulate business hypotheses
• Ability to convey business value of Big Data
79. Data Science
• A discipline that combines math, statistics, programming and
scientific instinct with the goal of extracting meaning from
data.
• Data scientists combine technical expertise, curiosity,
storytelling and cleverness to find and deliver the signal in the
noise.
80. The Rise of the Data Scientist
• Data scientists are responsible for
• modeling complex business problems
• discovering business insights
• identifying opportunities.
• Demand is high for people who can help make sense of the
massive streams of digital information pouring into
organizations
81. New Roles and Skills
• Big Data Scientists: industry expertise, analytics skills
• Big Data Engineers: Hadoop/Java, non-relational databases
• Agility and focus on value
83. Predictive Analytics
• Predictive analytics looks into the future to provide insight into
what will happen and includes what-if scenarios and risk
assessment. It can be used for
• Forecasting
• hypothesis testing
• risk modeling
• propensity modeling
84. Prescriptive analytics
• Prescriptive analytics is focused on understanding what would
happen based on different alternatives and scenarios, and then
choosing best options, and optimizing what’s ahead. Use
cases include
• Customer cross-channel optimization
• best-action-related offers
• portfolio and business optimization
• risk management
85. How Predictive Analytics Works
• Traditional BI tools use a deductive approach to data, which
assumes some understanding of existing patterns and
relationships.
• An analytics model approaches the data based on this
knowledge.
• For obvious reasons, deductive methods work well with
structured data.
86. Inductive approach
• An inductive approach makes no presumptions of patterns or
relationships and is more about data discovery. Predictive
analytics applies inductive reasoning to big data using
sophisticated quantitative methods such as
• machine learning
• neural networks
• robotics
• computational mathematics
• artificial intelligence
• The goal is to explore all the data and discover interrelationships and
patterns
87. Inductive approach – Cont.
• Inductive methods use algorithms to perform complex
calculations specifically designed to run against highly varied
or large volumes of data
• The result of applying these techniques to a real-world
business problem is a predictive model
• The ability to know what algorithms and data to use to test and
create the predictive model is part of the science and art of
predictive analytics
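• Example (a toy sketch of what “training a predictive model” looks like in code; the scikit-learn library, the tiny in-memory data set and the churn scenario are assumptions for illustration only, not from the original slides):
# Learn from labelled history, then score unseen cases
from sklearn.linear_model import LogisticRegression

# Toy training data: [monthly_usage, support_calls] -> churned (1) or not (0)
X = [[10, 0], [12, 1], [2, 5], [3, 4], [11, 0], [1, 6]]
y = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X, y)                       # inductive step: learn the pattern from data
print(model.predict([[2, 3]]))        # class prediction for a new customer
print(model.predict_proba([[2, 3]]))  # risk / propensity score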
88. Share Nothing vs. Share Everything
• Share nothing: many processing engines; data is spread across many nodes; joins are problematic; very scalable
• Share everything: many servers; data is located on a single storage; efficient joins; limited scalability
90. The Challenge
• We want scalable, durable, high volume, high velocity,
distributed data storage that can handle non-structured data
and that will fit our specific need
• RDBMS is too generic and doesn't cut it any more – it can do
the job but it is not cost-effective for our usage
91. The Solution: NoSQL
• Let's take some parts of the standard RDBMS out and
design the solution for our specific uses
• NoSQL databases have been around for ages under different
names/solutions
92. The NOSQL Movement
• NOSQL is not a technology – it’s a concept.
• We need high performance, scale out abilities or an agile
structure.
• We are now willing to sacrifice our sacred cows: consistency,
transactions.
• Over 150 different brands and solutions
(http://nosql-database.org/).
93. NoSQL or NOSQL
• NoSQL is not No to SQL
• NoSQL is not Never SQL
• NOSQL = Not Only SQL
94. Why NoSQL?
• Some applications need very few database features, but need
high scale.
• Desire to avoid data/schema pre-design altogether for simple
applications.
• Need for a low-latency, low-overhead API to access data.
• Simplicity -- do not need fancy indexing – just fast lookup by
primary key.
95. Why NoSQL? (cont.)
• Developer friendly, DBAs not needed (?).
• Schema-less.
• Agile: non-structured (or semi-structured).
• In Memory.
• No (or loose) Transactions.
• No joins.
97. Is NoSQL an RDBMS Replacement?
NO
Well… sometimes it is…
98. RDBMS vs. NoSQL
Rationale for choosing a persistent store:
• Data: high value, high density, complex (relational) vs. low value, low density, simple (NoSQL)
• Relationships: complex data relationships vs. very simple relationships
• Schema: schema-centric vs. schema-free, unstructured or semi-structured data
• Scaling: designed to scale up & out vs. distributed storage and processing
• Features: lots of general-purpose features/functionality vs. stripped-down, special-purpose data store
• Overhead: high ($ per operation) vs. low ($ per operation)
100. Scalability
• NoSQL is sometimes very easy to scale out
• Most have dynamic data partitioning and easy data distribution
• But distributed systems always come with a price: the CAP
Theorem and its impact on ACID transactions
101. ACID Transactions
Most DBMS are built with ACID transactions in mind:
• Atomicity: All or nothing, performs write operations as a single
transaction
• Consistency: Any transaction will take the DB from one
consistent state to another with no broken constraints,
ensures replicas are identical on different nodes
• Isolation: Other operations cannot access data that has been
modified during a transaction that has not been completed yet
• Durability: Ability to recover the committed transaction
updates against any kind of system failure (transaction log)
102. ACID Transactions (cont.)
• ACID is usually implemented by a locking mechanism/manager
• In distributed systems, central locking can become a bottleneck
• Most NoSQL databases do not use (or limit) ACID transactions and
replace them with something else…
103. CAP Theorem
• The CAP theorem states that in a distributed/partitioned
application, you can only pick two of the following
three characteristics:
• Consistency.
• Availability.
• Partition Tolerance.
105. NoSQL BASE
• NoSQL usually provide BASE characteristics instead of ACID.
BASE stands for:
• Basically Available
• Soft State
• Eventual Consistency
• It means that when an update is made in one place, the other
partitions will see it over time - there might be an inconsistency
window
• Read and write operations complete more quickly, lowering latency
110. Key Value Store
• Distributed hash tables.
• Very fast to get a single value.
• Examples:
• Amazon DynamoDB
• Berkeley DB
• Redis
• Riak
• Cassandra
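• Example (a minimal sketch of key/value access with Redis from Python; it assumes a locally running Redis server and the redis-py client package, and the key names are illustrative):
import redis

r = redis.Redis(host="localhost", port=6379)

r.set("session:1234", "zohar")   # store a value under a key
print(r.get("session:1234"))     # very fast single-key lookup -> b'zohar'
r.expire("session:1234", 3600)   # optional TTL, common for cache-style usage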
111. Document Store
• Similar to Key/Value, but value is a document.
• JSON or something similar, flexible schema
• Agile technology.
• Examples:
• MongoDB
• CouchDB
• CouchBase
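• Example (a minimal sketch of the same idea with MongoDB through PyMongo; it assumes a locally running mongod and the pymongo package, and the database, collection and field names are illustrative):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
posts = client.blog.posts   # database "blog", collection "posts"

# Flexible schema: each document is just JSON-like data
posts.insert_one({"title": "Hello World",
                  "body": "This is a blog post",
                  "tags": ["intro", "example"]})

print(posts.find_one({"title": "Hello World"}))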
112. Column Store
• One key, multiple attributes.
• Hybrid row/column.
• Examples:
• Google BigTable
• HBase
• Amazon’s SimpleDB
• Cassandra
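• Example (a hedged sketch of column-family access against HBase using the happybase Python client; it assumes a running HBase Thrift server and the happybase package, and mirrors the blogposts table from the HBase slide earlier):
import happybase

connection = happybase.Connection("localhost")   # HBase Thrift server
table = connection.table("blogposts")

# One row key; values are grouped into column families (post:, image:)
table.put(b"id1", {b"post:title": b"Hello World",
                   b"post:body": b"This is a blog post",
                   b"image:header": b"image1.jpg"})

print(table.row(b"id1"))   # fetch every column stored for that key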
113. How Are Records Organized?
• This is a logical table in RDBMS systems
• Its physical organization is just like the logical one: column by
column, row by row
(Figure: a table with rows 1–4 and columns 1–4)
114. Query Data
• When we query data, records are read in the
order they are organized in the physical structure
• Even when we query a single
column, we still need to read the
entire table and extract the column
(Figure: the same table with rows 1–4 and columns 1–4)
Example queries:
Select Col2 From MyTable
Select * From MyTable
115. How Does a Column Store Save Data?
(Figure: organization in a row store vs. organization in a column store)
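• Example (a purely illustrative Python sketch of the difference between the two layouts; the data is made up, and the point is only which values sit next to each other):
# Row store: all values of one record sit together
row_store = [
    {"id": 1, "name": "Dana", "city": "Haifa"},
    {"id": 2, "name": "Omer", "city": "Eilat"},
]

# Column store: all values of one column sit together
column_store = {
    "id":   [1, 2],
    "name": ["Dana", "Omer"],
    "city": ["Haifa", "Eilat"],
}

# Reading a single column touches only that column's data...
print(column_store["city"])
# ...while the row store must scan every record for the same answer
print([record["city"] for record in row_store])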
116. Graph Store
• Inspired by Graph Theory.
• Data model: Nodes, relationships, properties on both.
• Relational databases have a very hard time representing a graph
in the database.
• Examples:
• Neo4j
• InfiniteGraph
• RDF
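• Example (a hedged sketch of creating and querying a small graph in Neo4j from Python; it assumes a local Neo4j instance, the official neo4j driver package, and illustrative credentials and data):
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes with properties, connected by a typed, dated relationship
    session.run("CREATE (a:Person {name: 'Alice'})-[:KNOWS {since: 2015}]->"
                "(b:Person {name: 'Bob'})")
    result = session.run("MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name")
    for record in result:
        print(record["a.name"], "knows", record["b.name"])

driver.close()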
117. What is a Graph?
• An abstract representation of a set of objects where some
pairs are connected by links.
• Object (Vertex, Node) – can have attributes like name and
value
• Link (Edge, Arc, Relationship) – can have attributes like type
and name or date
122. Conclusion
• Big Data is one of the hottest buzzwords of the last few years – we
should all know what it's all about
• DBAs are often called upon to help with Big Data problems – today DBAs
need to know what to ask in order to provide good solutions, even if
it's not a database-related issue
• NoSQL doesn't have to be a Big Data solution, but Big Data
often uses NoSQL solutions