SlideShare ist ein Scribd-Unternehmen logo
1 von 84
Downloaden Sie, um offline zu lesen
Robert	Hryniewicz
Developer	Advocate
@RobertH8z
Apache	Spark	
Crash	Course	- DataWorks Summit	- Munich	2017
2 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
“Big	Data”
à Internet	of	Anything	(IoT)
– Wind	Turbines,	Oil	Rigs
– Beacons,	Wearables
– Smart	Cars
à User	Generated	Content	(Social,	Web	&	Mobile)
– Twitter,	Facebook,	Snapchat
– Clickstream
– Paypal,	Venmo
44ZB	in	2020
3 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Visualizing	44ZB	
100	pixels	=	1M	TB
100	px ->	1M	TB		assumes	5M	pixel	resolution	screen
4 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
5 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
The	“Big	Data”	Problem
à A	single	machine	cannot	process	or	even	store	all	the	data!
Problem
Solution
à Distribute	data	over	large	clusters
Difficulty
à How	to	split	work	across	machines?
à Moving	data	over	network	is	expensive
à Must	consider	data	&	network	locality
à How	to	deal	with	failures?
à How	to	deal	with	slow	nodes?
6 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Background
7 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What	Is	Apache	Spark?
à Apache	open	source	project	
originally	developed	at	AMPLab
(University	of	California	Berkeley)
à Unified	data	processing	engine	that	
operates	across	varied	data	
workloads	and	platforms
8 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Why	Apache	Spark?
à Elegant	Developer	APIs
– Single	environment	for	data	munging,	data	wrangling,	and	Machine	Learning	(ML)
à In-memory	computation	model	– Fast!
– Effective	for	iterative	computations	and	ML
à Machine	Learning
– Implementation	of	distributed	ML	algorithms
– Pipeline	API	(Spark	ML)
9 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
Structured	Data
Spark	Streaming
Near	Real-time
Spark	MLlib
Machine	Learning
GraphX
Graph	Analysis
10 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Basics
11 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
SparkSession
à Main	entry	point	for	Spark	functionality
à Allows	programming	with	DataFrame and	Dataset	APIs
– Fewer	concepts	and	constructs	a	developer	has	to	juggle	while	interacting	with	Spark
à Represented	as	spark	and	auto-initialized	in	Zeppelin	env.
What	is	it?
12 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
Structured	Data
Spark	Streaming
Near	Real-time
Spark	MLlib
Machine	Learning
GraphX
Graph	Analysis
13 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
More	Flexible Better	Storage	and	Performance///
14 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL	Overview
à Spark	module	for	structured	data	processing	(e.g.	DB	tables,	JSON	files,	CSV)
à Three	ways	to	manipulate	data:
– DataFrames API
– SQL	queries
– Datasets	API
15 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
DataFrames
à Distributed collection of	data organized into	named	
columns
à Conceptually	equivalent	to	a	table	in	relational	DB	or	
a	data	frame	in	R/Python
à API	available	in	Scala,	Java,	Python,	and	R
Col1 Col2 … … ColN
DataFrame
Column
Row
Data	is	described	as	a	DataFrame
with	rows,	columns,	and	a	schema
16 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Sources
CSVAvro
HIVE
Spark	SQL
Col1 Col2 … … ColN
DataFrame
Column
Row
JSON
17 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Create	a	DataFrame
val path = "examples/flights.json"
val flights = spark.read.json(path)
Example
18 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Register	a	Temporary	View	(SQL	API)
Example
flights.createOrReplaceTempView("flightsView")
19 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Two	API	Examples:	DataFrame and	SQL	APIs
flights.select("Origin", "Dest", "DepDelay”)
.filter($"DepDelay" > 15).show(5)
Results
+------+----+--------+
|Origin|Dest|DepDelay|
+------+----+--------+
| IAD| TPA| 19|
| IND| BWI| 34|
| IND| JAX| 25|
| IND| LAS| 67|
| IND| MCO| 94|
+------+----+--------+
SELECT Origin, Dest, DepDelay
FROM flightsView
WHERE DepDelay > 15 LIMIT 5
SQL	API
DataFrame API
20 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
Structured	Data
Spark	Streaming
Near	Real-time
Spark	MLlib
Machine	Learning
GraphX
Graph	Analysis
21 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What	is	Stream	Processing?
Batch	Processing
• Ability	to	process	and	analyze	data	at-rest	(stored	data)
• Request-based,	bulk	evaluation	and	short-lived	processing
• Enabler	for	Retrospective,	Reactive	and	On-demand	Analytics
Stream	Processing
• Ability	to	ingest,	process	and	analyze	data	in-motion	in	real- or	near-real-time
• Event	or	micro-batch	driven,	continuous	evaluation	and	long-lived	processing
• Enabler	for	real-time	Prospective,	Proactive	and	Predictive	Analytics	 for	Next	Best	
Action
Stream	Processing	 +		Batch	Processing	 =			All	Data	Analytics
real-time (now) historical (past)
22 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Next Generation Analytics
Iterative & Exploratory
Data is the structure
Traditional Analytics
Structured & Repeatable
Structure built to store data
22
Modern	Data	Applications	approach	to	Insights
Start with hypothesis
Test against selected data
Data leads the way
Explore all data, identify correlations
Analyze after landing… Analyze in motion…
23 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
à Extension	of	Spark	Core	API
à Stream	processing	of	live	data	streams
– Scalable
– High-throughput
– Fault-tolerant
Overview
ZeroMQ
MQTT
No	longer	
supported	
in
Spark	2.x
24 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
25 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
Discretized	Streams	(DStreams)
à High-level	abstraction	representing	continuous	stream	of	data
à Internally	represented	as	a	sequence	of	RDDs
à Operation	applied	on	a	DStream translates	to	operations	on	the	underlying	RDDs
26 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
Example:	flatMap operation
27 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	Streaming
à Apply	transformations	over	a	sliding	window	of	data,	e.g.	rolling	average
Window	Operations
28 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Challenges	in	Streaming	Data
à Consistency
à Fault	tolerance
à Out-of-order	data
29 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Structured	Streaming:	Basics
30 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Structured	Streaming:	Model
31 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Handling	late	arriving	data
32 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
Structured	Data
Spark	Streaming
Near	Real-time
Spark	MLlib
Machine	Learning
GraphX
Graph	Analysis
33 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
AI	in	Media	&	Pop	Culture
34 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Machine Learning use cases
Healthcare
• Predict	diagnosis
• Prioritize	screenings
• Reduce	re-admittance	rates
Financial	services
• Fraud	Detection/prevention
• Predict	underwriting	risk
• New	account	risk	screens
Public	Sector
• Analyze	public	sentiment
• Optimize	resource	allocation
• Law	enforcement	&	security	
Retail
• Product	recommendation
• Inventory	management
• Price	optimization
Telco/mobile
• Predict	customer	churn
• Predict	equipment	failure
• Customer	behavior	analysis
Oil	&	Gas
• Predictive	maintenance
• Seismic	data	management
• Predict	well	production	levels
35 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Scatter 2D Data Visualized
scatterData ç DataFrame
+-----+--------+
|label|features|
+-----+--------+
|-12.0| [-4.9]|
| -6.0| [-4.5]|
| -7.2| [-4.1]|
| -5.0| [-3.2]|
| -2.0| [-3.0]|
| -3.1| [-2.1]|
| -4.0| [-1.5]|
| -2.2| [-1.2]|
| -2.0| [-0.7]|
| 1.0| [-0.5]|
| -0.7| [-0.2]|
...
...
...
36 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Linear Regression Model Training (one feature)
Coefficients:	2.81				Intercept:	3.05
y	=	2.81x	+	3.05
Training
Result
37 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Linear Regression (two features)
Coefficients: [0.464, 0.464]
Intercept: 0.0563
38 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark API for building ML pipelines
Feature	
transform	
1
Feature	
transform	
2
Combine	
features
Linear
Regression
Input
DataFrame
Input
DataFrame
Output
DataFrame
Pipeline
Pipeline	Model
Train
Predict
Export	Model
39 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
Structured	Data
Spark	Streaming
Near	Real-time
Spark	MLlib
Machine	Learning
GraphX
Graph	Analysis
40 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
41 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
à Page	Rank
à Topic	Modeling	(LDA)
à Community	Detection
Source:	ampcamp.berkeley.edu
42 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Zeppelin	&	HDP
43 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What’s Apache Zeppelin?
Web-based notebook
that enables interactive
data analytics.
You can make beautiful
data-driven, interactive
and collaborative
documents with SQL,
Python, Scala and more
44 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
imple	line	chart
45 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
orizontal	plot	of	three	line	charts
46 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
treaming	data	into	a	line	chart
47 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
lotting	Iris	data	features	in	one	plot
48 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
omparing	Iris	data	distributions
49 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What is a Note/Notebook?
• A	web	based	GUI	for	small	code	snippets
• Write	code	snippets	in	browser
• Zeppelin	sends	code	to	backend	for	execution
• Zeppelin	gets	data	back	from	backend
• Zeppelin	visualizes	data
• Zeppelin	Note	=	Set	of	(Paragraphs/Cells)
• Other	Features	- Sharing/Collaboration/Reports/Import/Export
50 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
How	does	Zeppelin	work?
Notebook	
Author
Collaborators/
Report	viewers
Zeppelin
Cluster
Spark	|	Hive	|	HBase
Any	of	30+	back	ends
51 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Big	Data	Lifecycle
Collect
ETL	/
Process
Analysis
Report
Data
Product
Business	user
Customer
Data	ScientistData	Engineer
All	in	Zeppelin!
52 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
à Zeppelin	è Interactive	notebook
à Spark
à YARN	è Resource	Management
à HDFS	è Distributed	Storage	Layer
YARN
Scala
Java
Python
R
APIs
Spark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
°
N
HDFS
53 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Access patterns enabled by YARN
YARN: Data Operating System
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° °
°
°N
HDFS
Hadoop Distributed File System
Interactive Real-TimeBatch
Applications Batch
Needs to happen but, no
timeframe limitations
Interactive
Needs to happen at
Human time
Real-Time
Needs to happen at
Machine Execution time.
54 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Why	Apache	Spark	on	YARN?
à Resource	management	
à Utilizes	existing	HDP	cluster	
infrastructure
à Scheduling	and	queues
Spark	Driver
Client
Spark
Application	Master
YARN	container
Spark	Executor
YARN	container
Task Task
Spark	Executor
YARN	container
Task Task
Spark	Executor
YARN	container
Task Task
55 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Why HDFS?
Fault Tolerant Distributed Storage
• Divide	files	into	big	blocks	and	distribute	3	copies	randomly across	the	cluster
• Processing	Data	Locality
• Not	Just	storage	but	computation
10110100101
00100111001
11111001010
01110100101
00101100100
10101001100
01010010111
01011101011
11011011010
10110100101
01001010101
01011100100
11010111010
0
Logical File
1
2
3
4
Blocks
1
Cluster
1
1
2
2
2
3
3
34
4
4
56 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark and HDP
57 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
HDCloud
58 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Hortonworks	Cloud	Solutions
Microsoft AWS Google
Managed Azure	HDInsight
Non-Managed	/
Marketplace
Hortonworks	Data	
Cloud	for	AWS
Cloud	IaaS
Hortonworks	Data	Platform
(via	Ambari	and	via	Cloudbreak)
59 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Hortonworks	Cloud	Solutions:	Flexibility	and	Choice
Hortonworks	Data	
Cloud	for	AWS
Cloudbreak
HDP	on	Cloud	IaaS
More	Prescriptive
More	Ephemeral
More	Options
More	Long	Running
60 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
HDP	2.6	and	New	Cluster	Types
Spark	
2.1
Druid
TP
Interactive	
Hive
61 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
62 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
63 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Multitenancy	with	Zeppelin
64 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Livy
à Livy	is	the	open	source	REST	interface	for	interacting	with	Apache	Spark	from	anywhere	
à Installed	as	Spark	Ambari Service
Livy Client
HTTP HTTP	(RPC)
Spark	Interactive	Session
SparkContext
Spark	Batch	Session
SparkContext
Livy Server
65 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Security	Across	Zeppelin-Livy-Spark
Shiro
Ispark	Group	Interpreter
SPNego:	Kerberos Kerberos
Livy	APIs
Spark	on	YARN
Zeppelin
Driver
LDAP
Livy Server
66 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Reasons	to	Integrate	with	Livy
à Bring	Sessions	to	Apache	Zeppelin
– Isolation
– Session	sharing	
à Enable	efficient	cluster	resource	utilization
– Default	Spark	interpreter	keeps	YARN/Spark	job	running	forever
– Livy	interpreter	recycled	after	60	minutes	of	inactivity	
(controlled	by	livy.server.session.timeout )
à To	Identity	Propagation
– Send	user	identity	from	Zeppelin		>	Livy		>	Spark	on	YARN
67 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Livy Server
SparkSession Sharing
Session-2
Session-1
SparkSession-1
SparkContext
SparkSession-2
SparkContext
Client	1
Client	2
Client	3
Session-1
Session-1
Session-2
68 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Apache	Zeppelin	Security:	Authentication	+	SSL
Tommy	Callahan
Zeppelin Spark	on	YARN
LDAP
SSL
Firewall
1
2
3
69 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Apache	Zeppelin	+	Livy	End-to-End	Security
Ispark	Group	Interpreter
SPNego:	Kerberos Kerberos/RPC
Livy	APIs
Spark	on	YARN
Zeppelin
LDAP
Livy Server
Job	runs	as
Tommy	Callahan
Tommy	Callahan
70 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Sample	Architecture
71 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Modern	Data	Apps
à HDP	2.6
– Batch	Processing
à HDF	2.1
– Streaming	Apps
DATA	AT
REST
DATA	IN	
MOTION
ACTIONABLE
INTELLIGENCE
Modern	Data	Applications
72 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Modern	Data	Applications
Custom	or	Off	the	Shelf
Real-Time	Cyber	Security
protects	systems	with	superior	threat	
detection
Smart	Manufacturing
dramatically	improves	yields	by	managing	
more	variables	in	greater	detail
Connected,	Autonomous	Cars
drive	themselves	and	improve	road	safety
Future	Farming
optimizing	soil,	seeds	and	equipment	to	
measured	conditions	on	each	square	foot
Automatic	Recommendation	Engines
match	products	to	preferences	in	milliseconds
DATA	AT
REST
DATA	IN	
MOTION
ACTIONABLE
INTELLIGENCE
Modern	Data	Applications
Hortonworks	
DataFlow
Hortonworks	
Data	Platform
73 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Managed	Dataflow
SOURCES
REGIONAL	
INFRASTRUCTURE
CORE	
INFRASTRUCTURE
74 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
75 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
High-Level	Overview
IoT Edge
(single	node)
IoT Edge
(single	node)
IoT Devices
IoT Devices
NiFi Hub Data	Broker
Column	
DB
Data	
Store
Live	Dashboard
Data	Center
(on	prem/cloud)
HDFS/S3 HBase/Cassandra
76 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Labs	/	Tutorials
77 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Future	Tutorials
à Deploying	Models	with	Spark	Structured	Streaming
à Predicting	Airline	Delays	with	SparkR
à Sentiment	Analysis	with	Apache	Spark	(Gradient	Boosting)
à Auto	Text	Classification	(Naïve	Bayes)
78 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Hortonworks	Community	Connection
Read access for everyone, join to participate and be recognized
• Full	Q&A	Platform	(like	StackOverflow)
• Knowledge	Base	Articles
• Code	Samples	and	Repositories
79 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Community	Engagement
Participate now at: community.hortonworks.com©	Hortonworks	Inc.	2011	– 2015.	All	Rights	Reserved
12,000+
Registered	Users
35,000+
Answers
55,000+
Technical	Assets
One Website!
80 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
www.futureofdata.io
Future	of	Data	Meetups
81 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
FB	Sort
à Spark	job	that	reads	60	TB	of	compressed	data	
and	performs	a	90	TB	shuffle	and	sort.
à Largest	real-world	Spark	job	to	date!
– Databricks’	PetaByte sort	was	on	synthetic	data.
à Multiple	reliability	fixes.
à Spark	job	that	reads	60	TB	of	compressed	data	
and	performs	a	90	TB	shuffle	and	sort.
à Largest	real-world	Spark	job	to	date!
– Databricks’	PetaByte sort	was	on	synthetic	data.
à Multiple	reliability	fixes.
“Spark	could	reliably	shuffle	and	sort	90	TB+	intermediate	data	and	run	250,000	tasks	in	a	
single	job	[...]	and	it	has	been	running	in	production	for	several	months.”
82 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Spark	SQL
Structured	Data
Spark	Streaming
Near	Real-time
Spark	MLlib
Machine	Learning
GraphX
Graph	Analysis
Robert	Hryniewicz
@robertH8z
83 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
What’s	new	in	HDP	2.6	– Spark	&	Zeppelin
à Spark	1.6.3	GA
à Spark	2.1	GA
à REST	API	(Livy)	GA
à Spark	Thrift	Server	doAS GA
à SparkSQL – Row/Column	Security	(GA)
à Spark	Streaming	+	Kafka	over	SSL
à Multi	Cluster	HBase support	for	SHC
à Package	support	in	PySpark &	SparkR
Spark
à Spark	2.x	support
à Improved	Livy	integration
à No	password	in	clear
à JDBC	interpreter	improvements
à Smart	Sense	integration
à Knox	proxy	Zeppelin	UI
Zeppelin	0.7.x
Robert	Hryniewicz
@RobertH8z
Thanks!

Weitere ähnliche Inhalte

Was ist angesagt?

Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for DummiesRodney Joyce
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBernard Marr
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop DataWorks Summit/Hadoop Summit
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big DataSeval Çapraz
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Extending Druid Index File
Extending Druid Index FileExtending Druid Index File
Extending Druid Index FileNavis Ryu
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Designing Scalable Data Warehouse Using MySQL
Designing Scalable Data Warehouse Using MySQLDesigning Scalable Data Warehouse Using MySQL
Designing Scalable Data Warehouse Using MySQLVenu Anuganti
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data StreamsSujaAldrin
 

Was ist angesagt? (20)

Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
Big Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must KnowBig Data - The 5 Vs Everyone Must Know
Big Data - The 5 Vs Everyone Must Know
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Data Streaming For Big Data
Data Streaming For Big DataData Streaming For Big Data
Data Streaming For Big Data
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Extending Druid Index File
Extending Druid Index FileExtending Druid Index File
Extending Druid Index File
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Designing Scalable Data Warehouse Using MySQL
Designing Scalable Data Warehouse Using MySQLDesigning Scalable Data Warehouse Using MySQL
Designing Scalable Data Warehouse Using MySQL
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
 

Ähnlich wie Apache Spark Crash Course

Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXKirk Haslbeck
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationAbdelkrim Hadjidj
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseHortonworks
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin DataWorks Summit/Hadoop Summit
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming dataCarolyn Duby
 
Make Streaming IoT Analytics Work for You
Make Streaming IoT Analytics Work for YouMake Streaming IoT Analytics Work for You
Make Streaming IoT Analytics Work for YouHortonworks
 
IBM Cloud Paris meetup 20180213 - Hortonworks
IBM Cloud Paris meetup   20180213 - HortonworksIBM Cloud Paris meetup   20180213 - Hortonworks
IBM Cloud Paris meetup 20180213 - HortonworksIBM France Lab
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...DataWorks Summit
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at ScaleDataWorks Summit
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureMats Johansson
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceThiago Santiago
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 

Ähnlich wie Apache Spark Crash Course (20)

Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Spark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWXSpark-Zeppelin-ML on HWX
Spark-Zeppelin-ML on HWX
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
Crash Course HS16Melb - Hands on Intro to Spark & Zeppelin
 
Unlocking insights in streaming data
Unlocking insights in streaming dataUnlocking insights in streaming data
Unlocking insights in streaming data
 
Make Streaming IoT Analytics Work for You
Make Streaming IoT Analytics Work for YouMake Streaming IoT Analytics Work for You
Make Streaming IoT Analytics Work for You
 
IBM Cloud Paris meetup 20180213 - Hortonworks
IBM Cloud Paris meetup   20180213 - HortonworksIBM Cloud Paris meetup   20180213 - Hortonworks
IBM Cloud Paris meetup 20180213 - Hortonworks
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern ArchitectureData in Motion - Data at Rest - Hortonworks a Modern Architecture
Data in Motion - Data at Rest - Hortonworks a Modern Architecture
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data Science
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 

Mehr von DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 

Mehr von DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Kürzlich hochgeladen

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 

Kürzlich hochgeladen (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Apache Spark Crash Course