SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Securely explore your data

APPACHE
ACCUMULO
Adam Fuchs and John Vines
sqrrldata, inc.
January 9, 2013
APACHE ACCUMULO

Sorted, Distributed Key/Value Store
Based on Google’s Big Table Design
Built on Top of Apache Hadoop and Apache Zookeeper
Augments and Integrates With the Hadoop ecosystem

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

2
TODAY’S TALK
Overview of the Accumulo Project
Accumulo Design
Table Design Strategy
Live Demonstration

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

3
ACCUMULO TIMELINE
NSA open
sources
Accumulo into
incubation at
Apache

Google
publishes
Bigtablepaper

2005

2006

Google Publishes
Papers:
GFS (2003)
Map Reduce (2004)

2007

2008

2009

NSA begins
development of
Accumulo

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

Accumulo becomes
a top-level Apache
project

2010

2011

First sqrrl
release planned

2012

2013

sqrrl is founded

4
ACCUMULO’S STRENGTHS
Apache Accumulo excels at:
- Security
Cell-level security reduces the cost of application development in
the presence of complex legal or policy restrictions on data use
Mandatory access control keeps your data safe
- Scalability
Proven reliability and performance at the multi-petabyte scale
High-performance parallel I/O library
- Adaptability
Flexible schema support to quickly ingest new data sources
Sorted key/value paradigm supports a multitude of search and
analysis applications
Server-side programming framework “iterator trees” support bestin-class aggregation, filtering, and complex query semantics
© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

5
BASIC SCHEMA
Accumulo stores sorted key/value pairs (entries).

An Accumulo key is a 5-tuple, consisting of:
- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning

Keys are sorted:
-Hierarchically: Row first, then column family, and so on.
- Lexicographically: Compare first byte, then second, and so on.

Values are byte arrays.

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

6
KEY/VALUE EXAMPLES
Row

Col. Fam.

Col. Qual.

John Doe

Visibility

JD

Timesta
mp

Value

Jane Doe

Friends

20121130

Jane Doe

PhoneNumbe
555-1212
r

John Doe

Friends

Jane Doe

JD

20121201

John Doe

Notes

PCP

PCP_JD

20120912

Patient suffers
from an acute …

John Doe

Test Results

Cholesterol

JD|PCP_JD

20120912

183

John Doe

Test Results

Mental Health

JD|PSYCH_JD

20120801

Pass

John Doe

Test Results

Mental Health

PSYCH_JD

20120801

Crazy!

John Doe

Test Results

X-Ray

JD|PHYS_JD

20120513

1010110110100
…

20090115

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

7
VISIBILITY SYNTAX & SEMANTICS

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

8
TABLET ORGANIZATION
Well-Known
Location
(zookeeper)

Collections of entries from tables
Tables are partitioned into Tablets
Metadata tablets hold info about other tablets,
forming a 3-level hierarchy
A Tablet is a unit of work for a Tablet Server

Root Tablet
-∞ to ∞

Metadata Tablet 1

Metadata Tablet 2

-∞ to
“Encyclopedia:Ocelot”

“Encyclopedia:Ocelot” to ∞

Table: Adam’s Table
Data Tablet
-∞ : thing

Data Tablet
thing : ∞

Table: Encyclopedia
Data Tablet
-∞ : Ocelot

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

Data Tablet
Ocelot : Yak

Data Tablet
Yak : ∞

Table: Foo
Data Tablet
-∞ to ∞

9
ACCUMULO ARCHITECTURE
Zookeeper

Zookeeper

Delegate
Authority,
Configs

Zookeeper
Delegate
Authority,
Configs

Tablet Server

Tablet
Read/Write
Assign/Balance

Tablet Server

Master

Application

Application
Tablet
Store/Replicate

Tablet Server

Application

HDFS
Scan

Delete

Tablet

Garbage
Collector
© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

10
TABLET DATA FLOW
Tablet
Writes

In-Memory
Map

Scan

Iterator
Tree

Iterator
Tree

Minor
Compaction

Sorted,
Indexed
File

Sorted,
Indexed
File
Write Ahead
Log
(For Recovery)

Iterator
Major Tree

Merging /
Compaction

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

Reads

Sorted,
Indexed
File

11
ITERATOR FRAMEWORK
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

12
CLIENT API
new ZooKeeperInstance(...)

Instance

new MockInstance()

getConnector(auth info...)

Range
IteratorOption

Connector

TableOperations

Authorizations

InstanceOperations

createScanner(...)

createBatchScanner(...) createBatchWriter(...)

SecurityOperations

Scanner

BatchScanner

BatchWriter

iterator()
addMutation(...)

Map.Entry
Key

Mutation

Value

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

13
TABLE DESIGN
Table:

Graphs
Document-distributed indexing
Multi-dimensional index
Custom index

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

Inverted Index

Row:

<UUID>

<Term>

Column
Family:

<Type>

<Type> +
<Field>

Column
Qualifier:

<Field>

<UUID>

Value:

No built-in secondary indices
Sort Order  Index
Basic design pattern: forward and
inverted index tables
Additional table design patterns

Forward Index

<Term>

<Digest of
Event>

14
DEMO TIME!

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

15
OPEN SOURCE PROJECT
Apache Software Foundation project since October 2011
site: http://accumulo.apache.org
jira: https://issues.apache.org/jira/browse/ACCUMULO
lists: http://accumulo.apache.org/mailing_list.html

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

16
CURRENT CONTRIBUTORS

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

17
CONTACT

Adam Fuchs
CTO
adam@sqrrl.com

John Vines
Director of Ecosystems
john@sqrrl.com

sqrrl data, Inc.
www.sqrrl.com
@sqrrl_inc
info@sqrrl.com
© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

18

Weitere ähnliche Inhalte

Was ist angesagt?

Cloud Accelerated Genomics
Cloud Accelerated GenomicsCloud Accelerated Genomics
Cloud Accelerated GenomicsIdan Tohami
 
Achieving HIPAA on GCP
Achieving HIPAA on GCPAchieving HIPAA on GCP
Achieving HIPAA on GCPIdan Tohami
 
Search Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSearch Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSematext Group, Inc.
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoopNiel Dunnage
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 

Was ist angesagt? (7)

Cloud Accelerated Genomics
Cloud Accelerated GenomicsCloud Accelerated Genomics
Cloud Accelerated Genomics
 
Achieving HIPAA on GCP
Achieving HIPAA on GCPAchieving HIPAA on GCP
Achieving HIPAA on GCP
 
AcceleTest
AcceleTestAcceleTest
AcceleTest
 
Using Graphs for Data Analysis
Using Graphs for Data AnalysisUsing Graphs for Data Analysis
Using Graphs for Data Analysis
 
Search Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSearch Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL Backend
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 

Ähnlich wie Accumulo meetup 20130109

Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
Adam Fuchs' Accumulo Talk at NoSQL Now! 2013Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
Adam Fuchs' Accumulo Talk at NoSQL Now! 2013Sqrrl
 
Sqrrl Overview for Stac Research
Sqrrl Overview for Stac ResearchSqrrl Overview for Stac Research
Sqrrl Overview for Stac ResearchSqrrl
 
206590 mobilizing your primavera workforce
206590 mobilizing your primavera workforce206590 mobilizing your primavera workforce
206590 mobilizing your primavera workforcep6academy
 
Harnessing the Power of Apache Hadoop Series
Harnessing the Power of Apache Hadoop SeriesHarnessing the Power of Apache Hadoop Series
Harnessing the Power of Apache Hadoop SeriesCloudera, Inc.
 
Sqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data SilosSqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data SilosSqrrl
 
OSI_MySQL_Performance Schema
OSI_MySQL_Performance SchemaOSI_MySQL_Performance Schema
OSI_MySQL_Performance SchemaMayank Prasad
 
dlux splunk>live! 2012 Beginners Session
dlux splunk>live! 2012 Beginners Sessiondlux splunk>live! 2012 Beginners Session
dlux splunk>live! 2012 Beginners SessionDavid Lutz
 
Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02Sqrrl
 
Getting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout SessionGetting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout SessionSplunk
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTableSqrrl
 
3. EMC Storage for future Surveillance.pdf
3. EMC Storage for future Surveillance.pdf3. EMC Storage for future Surveillance.pdf
3. EMC Storage for future Surveillance.pdfPawachMetharattanara
 
Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewPaulo Fagundes
 
High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2Mario Redón Luz
 
Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.Alexandre (Shura) Iline
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataInfiniteGraph
 
Efficient content structures and queries in CRX/CQ
Efficient content structures and queries in CRX/CQEfficient content structures and queries in CRX/CQ
Efficient content structures and queries in CRX/CQconnectwebex
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias StorageFran Navarro
 
Hadoop Design Patterns
Hadoop Design PatternsHadoop Design Patterns
Hadoop Design PatternsEMC
 
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Glen Hawkins
 

Ähnlich wie Accumulo meetup 20130109 (20)

Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
Adam Fuchs' Accumulo Talk at NoSQL Now! 2013Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
 
Sqrrl Overview for Stac Research
Sqrrl Overview for Stac ResearchSqrrl Overview for Stac Research
Sqrrl Overview for Stac Research
 
206590 mobilizing your primavera workforce
206590 mobilizing your primavera workforce206590 mobilizing your primavera workforce
206590 mobilizing your primavera workforce
 
Harnessing the Power of Apache Hadoop Series
Harnessing the Power of Apache Hadoop SeriesHarnessing the Power of Apache Hadoop Series
Harnessing the Power of Apache Hadoop Series
 
Sqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data SilosSqrrl February Webinar: Breaking Down Data Silos
Sqrrl February Webinar: Breaking Down Data Silos
 
OSI_MySQL_Performance Schema
OSI_MySQL_Performance SchemaOSI_MySQL_Performance Schema
OSI_MySQL_Performance Schema
 
dlux splunk>live! 2012 Beginners Session
dlux splunk>live! 2012 Beginners Sessiondlux splunk>live! 2012 Beginners Session
dlux splunk>live! 2012 Beginners Session
 
Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02Hugaccumulo 121018192044-phpapp02
Hugaccumulo 121018192044-phpapp02
 
Getting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout SessionGetting Started with Splunk Enterprise Hands-On Breakout Session
Getting Started with Splunk Enterprise Hands-On Breakout Session
 
What's Next for Google's BigTable
What's Next for Google's BigTableWhat's Next for Google's BigTable
What's Next for Google's BigTable
 
3. EMC Storage for future Surveillance.pdf
3. EMC Storage for future Surveillance.pdf3. EMC Storage for future Surveillance.pdf
3. EMC Storage for future Surveillance.pdf
 
con9578-2088758.pdf
con9578-2088758.pdfcon9578-2088758.pdf
con9578-2088758.pdf
 
Oracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overviewOracle NoSQL Database release 3.0 overview
Oracle NoSQL Database release 3.0 overview
 
High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2High Availability And Oracle Data Guard 11g R2
High Availability And Oracle Data Guard 11g R2
 
Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.Java code coverage with JCov. Implementation details and use cases.
Java code coverage with JCov. Implementation details and use cases.
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big Data
 
Efficient content structures and queries in CRX/CQ
Efficient content structures and queries in CRX/CQEfficient content structures and queries in CRX/CQ
Efficient content structures and queries in CRX/CQ
 
Tendencias Storage
Tendencias StorageTendencias Storage
Tendencias Storage
 
Hadoop Design Patterns
Hadoop Design PatternsHadoop Design Patterns
Hadoop Design Patterns
 
Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive Oracle Active Data Guard: Best Practices and New Features Deep Dive
Oracle Active Data Guard: Best Practices and New Features Deep Dive
 

Mehr von Sqrrl

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government TechnologySqrrl
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsSqrrl
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkSqrrl
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedSqrrl
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Sqrrl
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphSqrrl
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Sqrrl
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivitySqrrl
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingSqrrl
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Sqrrl
 
Leveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivityLeveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivitySqrrl
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert TriageSqrrl
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to KnowSqrrl
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data AdvantageSqrrl
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreSqrrl
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelSqrrl
 

Mehr von Sqrrl (20)

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government Technology
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar Users
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control Activity
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led Training
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
 
Leveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivityLeveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker Activity
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert Triage
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to Know
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use Case
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data Advantage
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, Analyze
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber Hunting
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value Store
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with Pregel
 

Kürzlich hochgeladen

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Accumulo meetup 20130109

  • 1. Securely explore your data APPACHE ACCUMULO Adam Fuchs and John Vines sqrrldata, inc. January 9, 2013
  • 2. APACHE ACCUMULO Sorted, Distributed Key/Value Store Based on Google’s Big Table Design Built on Top of Apache Hadoop and Apache Zookeeper Augments and Integrates With the Hadoop ecosystem © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 2
  • 3. TODAY’S TALK Overview of the Accumulo Project Accumulo Design Table Design Strategy Live Demonstration © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 3
  • 4. ACCUMULO TIMELINE NSA open sources Accumulo into incubation at Apache Google publishes Bigtablepaper 2005 2006 Google Publishes Papers: GFS (2003) Map Reduce (2004) 2007 2008 2009 NSA begins development of Accumulo © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Accumulo becomes a top-level Apache project 2010 2011 First sqrrl release planned 2012 2013 sqrrl is founded 4
  • 5. ACCUMULO’S STRENGTHS Apache Accumulo excels at: - Security Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use Mandatory access control keeps your data safe - Scalability Proven reliability and performance at the multi-petabyte scale High-performance parallel I/O library - Adaptability Flexible schema support to quickly ingest new data sources Sorted key/value paradigm supports a multitude of search and analysis applications Server-side programming framework “iterator trees” support bestin-class aggregation, filtering, and complex query semantics © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 5
  • 6. BASIC SCHEMA Accumulo stores sorted key/value pairs (entries). An Accumulo key is a 5-tuple, consisting of: - Row: Controls Atomicity - Column Family: Controls Locality - Column Qualifier: Controls Uniqueness - Visibility Label: Controls Access - Timestamp: Controls Versioning Keys are sorted: -Hierarchically: Row first, then column family, and so on. - Lexicographically: Compare first byte, then second, and so on. Values are byte arrays. © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 6
  • 7. KEY/VALUE EXAMPLES Row Col. Fam. Col. Qual. John Doe Visibility JD Timesta mp Value Jane Doe Friends 20121130 Jane Doe PhoneNumbe 555-1212 r John Doe Friends Jane Doe JD 20121201 John Doe Notes PCP PCP_JD 20120912 Patient suffers from an acute … John Doe Test Results Cholesterol JD|PCP_JD 20120912 183 John Doe Test Results Mental Health JD|PSYCH_JD 20120801 Pass John Doe Test Results Mental Health PSYCH_JD 20120801 Crazy! John Doe Test Results X-Ray JD|PHYS_JD 20120513 1010110110100 … 20090115 © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 7
  • 8. VISIBILITY SYNTAX & SEMANTICS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 8
  • 9. TABLET ORGANIZATION Well-Known Location (zookeeper) Collections of entries from tables Tables are partitioned into Tablets Metadata tablets hold info about other tablets, forming a 3-level hierarchy A Tablet is a unit of work for a Tablet Server Root Tablet -∞ to ∞ Metadata Tablet 1 Metadata Tablet 2 -∞ to “Encyclopedia:Ocelot” “Encyclopedia:Ocelot” to ∞ Table: Adam’s Table Data Tablet -∞ : thing Data Tablet thing : ∞ Table: Encyclopedia Data Tablet -∞ : Ocelot © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Data Tablet Ocelot : Yak Data Tablet Yak : ∞ Table: Foo Data Tablet -∞ to ∞ 9
  • 10. ACCUMULO ARCHITECTURE Zookeeper Zookeeper Delegate Authority, Configs Zookeeper Delegate Authority, Configs Tablet Server Tablet Read/Write Assign/Balance Tablet Server Master Application Application Tablet Store/Replicate Tablet Server Application HDFS Scan Delete Tablet Garbage Collector © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 10
  • 11. TABLET DATA FLOW Tablet Writes In-Memory Map Scan Iterator Tree Iterator Tree Minor Compaction Sorted, Indexed File Sorted, Indexed File Write Ahead Log (For Recovery) Iterator Major Tree Merging / Compaction © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Reads Sorted, Indexed File 11
  • 12. ITERATOR FRAMEWORK Iterator Operations: - File Reads - Block Caching - Merging - Deletion - Isolation - Locality Groups - Range Selection - Column Selection - Cell-level Security - Versioning - Filtering - Aggregation - Partitioned Joins © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 12
  • 13. CLIENT API new ZooKeeperInstance(...) Instance new MockInstance() getConnector(auth info...) Range IteratorOption Connector TableOperations Authorizations InstanceOperations createScanner(...) createBatchScanner(...) createBatchWriter(...) SecurityOperations Scanner BatchScanner BatchWriter iterator() addMutation(...) Map.Entry Key Mutation Value © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 13
  • 14. TABLE DESIGN Table: Graphs Document-distributed indexing Multi-dimensional index Custom index © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential Inverted Index Row: <UUID> <Term> Column Family: <Type> <Type> + <Field> Column Qualifier: <Field> <UUID> Value: No built-in secondary indices Sort Order  Index Basic design pattern: forward and inverted index tables Additional table design patterns Forward Index <Term> <Digest of Event> 14
  • 15. DEMO TIME! © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 15
  • 16. OPEN SOURCE PROJECT Apache Software Foundation project since October 2011 site: http://accumulo.apache.org jira: https://issues.apache.org/jira/browse/ACCUMULO lists: http://accumulo.apache.org/mailing_list.html © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 16
  • 17. CURRENT CONTRIBUTORS © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 17
  • 18. CONTACT Adam Fuchs CTO adam@sqrrl.com John Vines Director of Ecosystems john@sqrrl.com sqrrl data, Inc. www.sqrrl.com @sqrrl_inc info@sqrrl.com © 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 18

Hinweis der Redaktion

  1. Sort order across all keys.Columns can differ across rows.Value can have many different types.Entry can convey information with an empty value.Each key has a visibility – a given read will see a subset of the data.Visibility is part of key uniqueness – PSYCH_JD is withholding information from JD.
  2. Tablet Servers have 4 primary functions:Hosting RPCs (read, write, etc.)Managing resources (RAM, CPU, File I/O, etc.)Scheduling background tasks (compactions, caching, etc.)Handling key/value pairs (via Iterators)BecauseAccumulodoesn’t use hashing to assign key-value pairs to servers, we need:We need to store the mapping of TabletServer-to-Tablet . This mapping is stored in another Tablet in Accumulo called the Metadata Table. A client need only scan a portion of the Metdata table to find which TabletServers have the Tablets it wants. (binary search through the metadata hierarchy (NEED input, is this correct) The Metadata table’sTabletServer-to-Tablet assignments must also be stored somewhere. These are written to the first Tablet of the Metadata table, called the Root Tablet. However, you may notice that the Root Tablet itself is stored in Accumulo! Somewhat of a circular dependency. That’s what we use Zookeeper for: The location of the Root Tablet is always known to ZooKeeper.
  3. ApachAccumulo runs on top of Hadoop and ZooKeeperIt relies on HDFS for storage of data and Zookeeper for:- storage of config data (location of metadata Root Tablet) - locking of tabletsZookeeper uses Quorum consistency algorithms for High Availability to prevent Single Point of FailureZookeeper itself is actually a K/V datastore, but holds very little dataAlthough we’ll be running ZK on the same machines as Accumulo and Hadoop, it’s recommended that the Quorum of servers be separate.