SlideShare ist ein Scribd-Unternehmen logo
1 von 71
In-memory Caching in HDFS: Lower
Latency, Same Great Taste
Andrew Wang and Colin McCabe
Cloudera
2
In-memory Caching in HDFS
Lower latency, same great taste
Andrew Wang | awang@cloudera.com
Colin McCabe | cmccabe@cloudera.com
Alice
Hadoop
cluster
Query
Result set
Alice
Fresh data
Fresh data
Alice
Rollup
Problems
• Data hotspots
• Everyone wants to query some fresh data
• Shared disks are unable to handle high load
• Mixed workloads
• Data analyst making small point queries
• Rollup job scanning all the data
• Point query latency suffers because of I/O contention
• Same theme: disk I/O contention!
7
How do we solve I/O issues?
• Cache important datasets in memory!
• Much higher throughput than disk
• Fast random/concurrent access
• Interesting working sets often fit in cluster memory
• Traces from Facebook’s Hive cluster
• Increasingly affordable to buy a lot of memory
• Moore’s law
• 1TB RAM server is 40k on HP’s website
8
Alice
Page cache
Alice
Repeated
query
?
Alice
Rollup
Alice
Extra copies
Alice
Checksum
verification
Extra copies
Design Considerations
1. Placing tasks for memory locality
• Expose cache locations to application schedulers
2. Contention for page cache from other users
• Explicitly pin hot datasets in memory
3. Extra copies when reading cached data
• Zero-copy API to read cached data
14
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Microbenchmarks
• MapReduce
• Impala
• Future work
15
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Microbenchmarks
• MapReduce
• Impala
• Future work
16
Cache Directives
• A cache directive describes a file or directory that
should be cached
• Path
• Cache replication factor: 1-N
• Stored permanently on the NameNode
• Also have cache pools for access control and quotas,
but we won’t be covering that here
17
Architecture
18
DataNode
DataNode DataNode
NameNode
DFSClient
Cache /foo
Cache commandsCache Heartbeats
DFSClient
getBlockLocations
mlock
• The DataNode pins each cached block into the page
cache using mlock, and checksums it.
• Because we’re using the page cache, the blocks don’t
take up any space on the Java heap.
19
DataNode
Page Cache
DFSClient read
mlock
Zero-copy read API
• Clients can use the zero-copy read API to map the
cached replica into their own address space
• The zero-copy API avoids the overhead of the
read() and pread() system calls
• However, we don’t verify checksums when using the
zero-copy API
• The zero-copy API can be only used on cached data, or
when the application computes its own checksums.
20
Zero-copy read API
New FSDataInputStream methods:
ByteBuffer read(ByteBufferPool pool,
int maxLength, EnumSet<ReadOption> opts);
void releaseBuffer(ByteBuffer buffer);
21
Skipping Checksums
• We would like to skip checksum verification when
reading cached data
• DataNode already checksums when caching the block
• Enables more efficient SCR, ZCR
• Requirements
• Client needs to know that the replica is cached
• DataNode needs to notify the client if the replica is
uncached
22
Skipping Checksums
• The DataNode and DFSClient use shared memory
segments to communicate which blocks are cached.
23
DataNode
Page Cache
DFSClient read
mlock
Shared
Memory
Segment
Skipping Checksums
24
Block 123
DataNode DFSClient
Can Skip Csums
In Use
Zero-Copy
MappedByteBuffer
Skipping Checksums
25
Block 123
DataNode DFSClient
Can Skip Csums
In Use
Zero-Copy
MappedByteBuffer
Architecture Summary
• The Cache Directive API provides per-file control over
what is cached
• The NameNode tracks cached blocks and coordinates
DataNode cache work
• The DataNodes use mlock to lock page cache blocks
into memory
• The DFSClient can determine whether it is safe to
skip checksums via the shared memory segment
• Caching makes it possible to use the efficient Zero-
Copy API on cached data
26
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Single-Node Microbenchmarks
• MapReduce
• Impala
• Future work
27
Test Node
• 48GB of RAM
• Configured 38GB of HDFS cache
• 11x SATA hard disks
• 2x4 core 2.13 GHz Westmere Xeon processors
• 10 Gbit/s full-bisection bandwidth network
28
Single-Node Microbenchmarks
• How much faster are cached and zero-copy reads?
• Introducing vecsum (vector sum)
• Computes sums of a file of doubles
• Highly optimized: uses SSE intrinsics
• libhdfs program
• Can toggle between various read methods
• Terminology
• SCR: short-circuit reads
• ZCR: zero-copy reads
29
Throughput Reading 1G File 20x
30
0.8 0.9
1.9
2.4
5.9
0
1
2
3
4
5
6
7
TCP TCP no
csums
SCR SCR no
csums
ZCR
GB/s
ZCR 1GB vs 20GB
31
5.9
2.7
0
1
2
3
4
5
6
7
1GB 20GB
GB/s
Throughput
• Skipping checksums matters more when going faster
• ZCR gets close to bus bandwidth
• ~6GB/s
• Need to reuse client-side mmaps for maximum perf
• page_fault function is 1.16% of cycles in 1G
• 17.55% in 20G
32
Client CPU cycles
33
57.6
51.8
27.1
23.4
12.7
0
10
20
30
40
50
60
70
TCP TCP no
csums
SCR SCR no
csums
ZCR
CPUcycles(billions)
Why is ZCR more CPU-efficient?
34
Why is ZCR more CPU-efficient?
35
Remote Cached vs. Local Uncached
• Zero-copy is only possible for local cached data
• Is it better to read from remote cache, or local disk?
36
Remote Cached vs. Local Uncached
37
841
1092
125 137
0
200
400
600
800
1000
1200
TCP iperf SCR dd
MB/s
Microbenchmark Conclusions
• Short-circuit reads need less CPU than TCP reads
• ZCR is even more efficient, because it avoids a copy
• ZCR goes much faster when re-reading the same
data, because it can avoid mmap page faults
• Network and disk may be bottleneck for remote or
uncached reads
38
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Microbenchmarks
• MapReduce
• Impala
• Future work
39
MapReduce
• Started with example MR jobs
• Wordcount
• Grep
• 5 node cluster: 4 DNs, 1 NN
• Same hardware configuration as single node tests
• 38GB HDFS cache per DN
• 11 disks per DN
• 17GB of Wikipedia text
• Small enough to fit into cache at 3x replication
• Ran each job 10 times, took the average
40
wordcount and grep
41
275
52
280
55
0
50
100
150
200
250
300
350
400
wordcount wordcount
cached
grep grep cached
wordcount and grep
42
275
52
280
55
0
50
100
150
200
250
300
350
400
wordcount wordcount
cached
grep grep cached
Almost no
speedup!
wordcount and grep
43
275
52
280
55
0
50
100
150
200
250
300
350
400
wordcount wordcount
cached
grep grep cached
~60MB/s
~330MB/s
Not I/O bound
wordcount and grep
• End-to-end latency barely changes
• These MR jobs are simply not I/O bound!
• Best map phase throughput was about 330MB/s
• 44 disks can theoretically do 4400MB/s
• Further reasoning
• Long JVM startup and initialization time
• Many copies in TextInputFormat, doesn’t use zero-copy
• Caching input data doesn’t help reduce step
44
Introducing bytecount
• Trivial version of wordcount
• Counts # of occurrences of byte values
• Heavily CPU optimized
• Each mapper processes an entire block via ZCR
• No additional copies
• No record slop across block boundaries
• Fast inner loop
• Very unrealistic job, but serves as a best case
• Also tried 2GB block size to amortize startup costs
45
bytecount
46
52
39 35
55
45
58
0
10
20
30
40
50
60
70
bytecount
47
52
39 35
55
45
58
0
10
20
30
40
50
60
70
1.3x faster
bytecount
48
52
39 35
55
45
58
0
10
20
30
40
50
60
70
Still only
~500MB/s
MapReduce Conclusions
49
• Many MR jobs will see marginal improvement
• Startup costs
• CPU inefficiencies
• Shuffle and reduce steps
• Even bytecount sees only modest gains
• 1.3x faster than disk
• 500MB/s with caching and ZCR
• Nowhere close to GB/s possible with memory
• Needs more work to take full advantage of caching!
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Microbenchmarks
• MapReduce
• Impala
• Future work
50
Impala Benchmarks
• Open-source OLAP database developed by Cloudera
• Tested with Impala 1.3 (CDH 5)
• Same 4 DN cluster as MR section
• 38GB of 48GB per DN configured as HDFS cache
• 152GB aggregate HDFS cache
• 11 disks per DN
51
Impala Benchmarks
• 1TB TPC-DS store_sales table, text format
• count(*) on different numbers of partitions
• Has to scan all the data, no skipping
• Queries
• 51GB small query (34% cache capacity)
• 148GB big query (98% cache capacity)
• Small query with concurrent workload
• Tested “cold” and “hot”
• echo 3 > /proc/sys/vm/drop_caches
• Lets us compare HDFS caching against page cache
52
Small Query
53
19.8
5.8
4.0 3.0
0
5
10
15
20
25
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
Small Query
54
19.8
5.8
4.0 3.0
0
5
10
15
20
25
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
2550 MB/s 17 GB/s
I/O bound!
Small Query
55
0
5
10
15
20
25
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
3.4x faster,
disk vs. memory
Small Query
56
0
5
10
15
20
25
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
1.3x after warmup, still
wins on CPU efficiency
Big Query
57
48.2
11.5
40.9
9.4
0
10
20
30
40
50
60
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
Big Query
58
0
10
20
30
40
50
60
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
4.2x
faster, disk
vs mem
Big Query
59
0
10
20
30
40
50
60
Uncached
cold
Cached cold Uncached
hot
Cached hot
Averageresponsetime(s)
4.3x
faster, does
n’t fit in
page cache
Cannot schedule for
page cache locality
Small Query with Concurrent Workload
60
0
10
20
30
40
50
60
Uncached Cached Cached (not
concurrent)
Averageresponsetime(s)
Small Query with Concurrent Workload
61
0
10
20
30
40
50
60
Uncached Cached Cached (not
concurrent)
Averageresponsetime(s)
7x faster when small query
working set is cached
Small Query with Concurrent Workload
62
0
10
20
30
40
50
60
Uncached Cached Cached (not
concurrent)
Averageresponsetime(s)
2x slower than
isolated, CPU
contention
Impala Conclusions
• HDFS cache is faster than disk or page cache
• ZCR is more efficient than SCR from page cache
• Better when working set is approx. cluster memory
• Can schedule tasks for cache locality
• Significantly better for concurrent workloads
• 7x faster when contending with a single background query
• Impala performance will only improve
• Many CPU improvements on the roadmap
63
Outline
• Implementation
• NameNode and DataNode modifications
• Zero-copy read API
• Evaluation
• Microbenchmarks
• MapReduce
• Impala
• Future work
64
Future Work
• Automatic cache replacement
• LRU, LFU, ?
• Sub-block caching
• Potentially important for automatic cache replacement
• Columns in Parquet
• Compression, encryption, serialization
• Lose many benefits of zero-copy API
• Write-side caching
• Enables Spark-like RDDs for all HDFS applications
65
Conclusion
• I/O contention is a problem for concurrent workloads
• HDFS can now explicitly pin working sets into RAM
• Applications can place their tasks for cache locality
• Use zero-copy API to efficiently read cached data
• Substantial performance improvements
• 6GB/s for single thread microbenchmark
• 7x faster for concurrent Impala workload
66
bytecount
70
52
39 35
55
45
58
0
10
20
30
40
50
60
70
Less disk parallelism
In-memory Caching in HDFS: Lower Latency, Same Great Taste

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
 
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-FormatsBig Data, Data Lake, Fast Data - Dataserialiation-Formats
Big Data, Data Lake, Fast Data - Dataserialiation-Formats
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph RBD Update - June 2021
Ceph RBD Update - June 2021
 

Ähnlich wie In-memory Caching in HDFS: Lower Latency, Same Great Taste

Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
marvin herrera
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
saipriyacoool
 

Ähnlich wie In-memory Caching in HDFS: Lower Latency, Same Great Taste (20)

In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation August 2013 HUG: Removing the NameNode's memory limitation
August 2013 HUG: Removing the NameNode's memory limitation
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2
 
Caching Methodology & Strategies
Caching Methodology & StrategiesCaching Methodology & Strategies
Caching Methodology & Strategies
 
Caching methodology and strategies
Caching methodology and strategiesCaching methodology and strategies
Caching methodology and strategies
 
OSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin CharlesOSDC 2016 - Tuning Linux for your Database by Colin Charles
OSDC 2016 - Tuning Linux for your Database by Colin Charles
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 

Mehr von DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

In-memory Caching in HDFS: Lower Latency, Same Great Taste