SlideShare a Scribd company logo
1 of 22
Use Cases For Cassandra in
Federal and State
Government
Chris Bradford and Matt Overstreet
Matt Overstreet
● Software Architect
● Search relevancy engineer
● Has worked on systems ranging
from Tractor Trailer weigh stations
to celebrity websites
● Likes Cassandra
GitHub: omnifroodle
● DataStax Cassandra Architect
● Contributor to CQLEngine -
Python C* ORM
● Developed Trireme -
a C* migration engine
● Created the world’s smallest C*
cluster
Chris Bradford
Twitter: @bradfordcp
GitHub: bradfordcp
Who we are
● Consulting firm based in Charlottesville
Virginia
● Founded in 2005
● 30 consultants delivering projects
● Focused on Search in 2010, specifically Solr
and Lucene
● Delivering Cassandra Consulting since 2012
● Datastax Gold partner
● Great with Search, Analytics and Discovery
Blog & Publications
● Blog: http://o19s.com/blog/
● Twitter: @o19s
● Books
o Relevant Search
(Manning)
o Building a Search
Server with
Elasticsearch (Packt)
o Apache Solr
Enterprise Search
Server (Packt)
How we got here
OpenSource Connections started with a deep
expertise in full text search.
As the size and velocity of the data we interact
with grew, so did our toolset for storing,
presenting and processing that data.
OSC Toolkit
Some Use Cases
- Analytics Workloads
- Welfare Fraud Detection
- Intrusion Detection
- Distributed Data Warehousing
- Data Warehouse/Sink
- Replication & Recovery
Analytics Workloads
Look for patterns of user error, fraud and abuse
in forms submitted to an agency.
Requires the ability to compare submissions to
look for similar identifiers such like name, street
address, etc
Welfare Fraud Detection
● Massive amounts of data
● Hard to compare and find patterns
● Difficult to incorporate human analysis
Welfare Fraud Detection
● Ingest data into the system or work on data
in place
● Fraud Score Generation
o Automated rules
o Manually
● Employees can now focus on reviewing the
flagged records
Intrusion Detection
● Stream log data in to C* from applications
● Surface metrics through a security
dashboard
● Perform analysis on records looking for
anomalies (Optional) CREATE TABLE ids (
window TIMESTAMP,
route VARCHAR,
status_code VARCHAR,
request_id TIMEUUID,
PRIMARY KEY ((window, route,
Intrusion Detection
Distributed Data Warehouse
● Cassandra is designed in a peer
to peer architecture. There are no
“masters” or “slaves”.
● True distributed load, write anywhere, read
anywhere.
● Built-in replication between data centers.
Simple Distributed Applications
Data Warehouse
● Cassandra is used to house case data from
disparate systems
● Data is then pushed into a full text search
index
● Cases may now be searched through an
intuitive web interface
Operations
● Widely compatible with programming
languages used in enterprise development
● OpsCenter monitoring tool
● Cassandra scales predictably
● Fault-tolerant
Use Case Review
● Analytics Workloads
○ Welfare Fraud Detection
○ Intrusion Detection
● Distributed Data Warehousing
○ Data Warehouse/Sink
○ Replication & Recovery
Q & A

More Related Content

What's hot

Introducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing FrameworkIntroducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing Frameworklucenerevolution
 
Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)MC+A
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
Webinar: Fusion for Data Science
Webinar: Fusion for Data ScienceWebinar: Fusion for Data Science
Webinar: Fusion for Data ScienceLucidworks
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineGrant Ingersoll
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchRuslan Zavacky
 
So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?Bram Luyten
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaAvinash Ramineni
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stackVikrant Chauhan
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureObjectRocket
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Grant Ingersoll
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...Edureka!
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseRobert Lujo
 
Webinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with FusionWebinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with FusionLucidworks
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012Amazon Web Services
 
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Caserta
 

What's hot (20)

Introducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing FrameworkIntroducing Hydra – An Open Source Document Processing Framework
Introducing Hydra – An Open Source Document Processing Framework
 
Intro to Search
Intro to SearchIntro to Search
Intro to Search
 
Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)Search Analytics with ELK (Elastic Stack)
Search Analytics with ELK (Elastic Stack)
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
Webinar: Fusion for Data Science
Webinar: Fusion for Data ScienceWebinar: Fusion for Data Science
Webinar: Fusion for Data Science
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search Engine
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?So we all have ORCID integrations, now what?
So we all have ORCID integrations, now what?
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and Kibana
 
Log analysis with the elk stack
Log analysis with the elk stackLog analysis with the elk stack
Log analysis with the elk stack
 
Elasticsearch Introduction
Elasticsearch IntroductionElasticsearch Introduction
Elasticsearch Introduction
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the future
 
Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4Data IO: Next Generation Search with Lucene and Solr 4
Data IO: Next Generation Search with Lucene and Solr 4
 
Elastic Stack Roadmap
Elastic Stack RoadmapElastic Stack Roadmap
Elastic Stack Roadmap
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
 
Webinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with FusionWebinar: Rapid Solr Development with Fusion
Webinar: Rapid Solr Development with Fusion
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
 
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
Big Data Warehousing Meetup: Developing a super-charged NoSQL data mart using...
 

Viewers also liked

Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesSylvain Wallez
 
Lessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark OfficeLessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark OfficeOpenSource Connections
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Core Techs Et Lucene
Core Techs Et LuceneCore Techs Et Lucene
Core Techs Et LuceneCore-Techs
 
User Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum CollaborationsUser Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum CollaborationsDesign for Context
 
Presentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUGPresentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUGfrancelabs
 

Viewers also liked (6)

Lucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiquesLucene - 10 ans d'usages plus ou moins classiques
Lucene - 10 ans d'usages plus ou moins classiques
 
Lessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark OfficeLessons Learned with Spark at the US Patent & Trademark Office
Lessons Learned with Spark at the US Patent & Trademark Office
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Core Techs Et Lucene
Core Techs Et LuceneCore Techs Et Lucene
Core Techs Et Lucene
 
User Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum CollaborationsUser Experience Design Considerations for Multi-Museum Collaborations
User Experience Design Considerations for Multi-Museum Collaborations
 
Presentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUGPresentation Lucene / Solr / Datafari - Nantes JUG
Presentation Lucene / Solr / Datafari - Nantes JUG
 

Similar to Use cases for cassandra in federal and state government

Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucFraugster
 
Big data at scrapinghub
Big data at scrapinghubBig data at scrapinghub
Big data at scrapinghubDana Brophy
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dan Lynn
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discoverymarkgrover
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricCambridge Semantics
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Neo4j
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"Rob Winters
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSDataStax Academy
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Databricks
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...VMware Tanzu
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learnconfluent
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerStratio
 
Key Skills Required for Data Engineering
Key Skills Required for Data EngineeringKey Skills Required for Data Engineering
Key Skills Required for Data EngineeringFibonalabs
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 

Similar to Use cases for cassandra in federal and state government (20)

Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 
Big data at scrapinghub
Big data at scrapinghubBig data at scrapinghub
Big data at scrapinghub
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
Dirty Data? Clean it up! - Rocky Mountain DataCon 2016
 
DataStax
DataStaxDataStax
DataStax
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data FabricUsing Cloud Automation Technologies to Deliver an Enterprise Data Fabric
Using Cloud Automation Technologies to Deliver an Enterprise Data Fabric
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
 
Building data "Py-pelines"
Building data "Py-pelines"Building data "Py-pelines"
Building data "Py-pelines"
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBSCassandra Summit 2014: Apache Cassandra at Telefonica CBS
Cassandra Summit 2014: Apache Cassandra at Telefonica CBS
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learn
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
 
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio AlcacerApache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
Apache Spark & Cassandra use case at Telefónica Cbs by Antonio Alcacer
 
Key Skills Required for Data Engineering
Key Skills Required for Data EngineeringKey Skills Required for Data Engineering
Key Skills Required for Data Engineering
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 

More from OpenSource Connections

How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessOpenSource Connections
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019OpenSource Connections
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullOpenSource Connections
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonOpenSource Connections
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...OpenSource Connections
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajOpenSource Connections
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...OpenSource Connections
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlOpenSource Connections
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesOpenSource Connections
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...OpenSource Connections
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...OpenSource Connections
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...OpenSource Connections
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...OpenSource Connections
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...OpenSource Connections
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...OpenSource Connections
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah ViaOpenSource Connections
 

More from OpenSource Connections (20)

Encores
EncoresEncores
Encores
 
Test driven relevancy
Test driven relevancyTest driven relevancy
Test driven relevancy
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
 
The right path to making search relevant - Taxonomy Bootcamp London 2019
The right path to making search relevant  - Taxonomy Bootcamp London 2019The right path to making search relevant  - Taxonomy Bootcamp London 2019
The right path to making search relevant - Taxonomy Bootcamp London 2019
 
Payloads and OCR with Solr
Payloads and OCR with SolrPayloads and OCR with Solr
Payloads and OCR with Solr
 
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie HullHaystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull
 
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim AllisonHaystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison
 
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...
 
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj BharadwajHaystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj
 
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...
 
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan KohlHaystack 2019 - Search-based recommendations at Politico - Ryan Kohl
Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...
 
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...
 
Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...Haystack 2019 - Architectural considerations on search relevancy in the conte...
Haystack 2019 - Architectural considerations on search relevancy in the conte...
 
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...
 
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...Haystack 2019 - Establishing a relevance focused culture in a large organizat...
Haystack 2019 - Establishing a relevance focused culture in a large organizat...
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
 
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via
 

Recently uploaded

Yellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdfYellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdfAmir Saranga
 
Panet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRILPanet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRILChristina Parmionova
 
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...
Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...
Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...narwatsonia7
 
High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...
High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...
High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...Christina Parmionova
 
WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.Christina Parmionova
 
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfMonastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfCharlynTorres1
 
2024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 262024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 26JSchaus & Associates
 
Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...yalehistoricalreview
 
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdf2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdfilocosnortegovph
 
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
If there is a Hell on Earth, it is the Lives of Children in Gaza.pdf
If there is a Hell on Earth, it is the Lives of Children in Gaza.pdfIf there is a Hell on Earth, it is the Lives of Children in Gaza.pdf
If there is a Hell on Earth, it is the Lives of Children in Gaza.pdfKatrina Sriranpong
 
Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.Christina Parmionova
 
2024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 252024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 25JSchaus & Associates
 
Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170Sonam Pathan
 
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...narwatsonia7
 
In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...ResolutionFoundation
 
Club of Rome: Eco-nomics for an Ecological Civilization
Club of Rome: Eco-nomics for an Ecological CivilizationClub of Rome: Eco-nomics for an Ecological Civilization
Club of Rome: Eco-nomics for an Ecological CivilizationEnergy for One World
 

Recently uploaded (20)

Yellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdfYellow is My Favorite Color By Annabelle.pdf
Yellow is My Favorite Color By Annabelle.pdf
 
Panet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRILPanet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRIL
 
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in moti bagh DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...
Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...
Premium Call Girls Btm Layout - 7001305949 Escorts Service with Real Photos a...
 
High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...
High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...
High-Level Thematic Event on Tourism - SUSTAINABILITY WEEK 2024- United Natio...
 
WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.
 
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfMonastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
 
2024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 262024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 26
 
Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...Jewish Efforts to Influence American Immigration Policy in the Years Before t...
Jewish Efforts to Influence American Immigration Policy in the Years Before t...
 
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdf2023 Ecological Profile of Ilocos Norte.pdf
2023 Ecological Profile of Ilocos Norte.pdf
 
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Narela DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
If there is a Hell on Earth, it is the Lives of Children in Gaza.pdf
If there is a Hell on Earth, it is the Lives of Children in Gaza.pdfIf there is a Hell on Earth, it is the Lives of Children in Gaza.pdf
If there is a Hell on Earth, it is the Lives of Children in Gaza.pdf
 
Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.Earth Day 2024 - AMC "COMMON GROUND'' movie night.
Earth Day 2024 - AMC "COMMON GROUND'' movie night.
 
2024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 252024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 25
 
Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170Call Girls Near Surya International Hotel New Delhi 9873777170
Call Girls Near Surya International Hotel New Delhi 9873777170
 
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
 
In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...In credit? Assessing where Universal Credit’s long rollout has left the benef...
In credit? Assessing where Universal Credit’s long rollout has left the benef...
 
Club of Rome: Eco-nomics for an Ecological Civilization
Club of Rome: Eco-nomics for an Ecological CivilizationClub of Rome: Eco-nomics for an Ecological Civilization
Club of Rome: Eco-nomics for an Ecological Civilization
 

Use cases for cassandra in federal and state government

  • 1. Use Cases For Cassandra in Federal and State Government Chris Bradford and Matt Overstreet
  • 2. Matt Overstreet ● Software Architect ● Search relevancy engineer ● Has worked on systems ranging from Tractor Trailer weigh stations to celebrity websites ● Likes Cassandra GitHub: omnifroodle
  • 3. ● DataStax Cassandra Architect ● Contributor to CQLEngine - Python C* ORM ● Developed Trireme - a C* migration engine ● Created the world’s smallest C* cluster Chris Bradford Twitter: @bradfordcp GitHub: bradfordcp
  • 4. Who we are ● Consulting firm based in Charlottesville Virginia ● Founded in 2005 ● 30 consultants delivering projects ● Focused on Search in 2010, specifically Solr and Lucene ● Delivering Cassandra Consulting since 2012 ● Datastax Gold partner ● Great with Search, Analytics and Discovery
  • 5. Blog & Publications ● Blog: http://o19s.com/blog/ ● Twitter: @o19s ● Books o Relevant Search (Manning) o Building a Search Server with Elasticsearch (Packt) o Apache Solr Enterprise Search Server (Packt)
  • 6. How we got here OpenSource Connections started with a deep expertise in full text search. As the size and velocity of the data we interact with grew, so did our toolset for storing, presenting and processing that data.
  • 8. Some Use Cases - Analytics Workloads - Welfare Fraud Detection - Intrusion Detection - Distributed Data Warehousing - Data Warehouse/Sink - Replication & Recovery
  • 9. Analytics Workloads Look for patterns of user error, fraud and abuse in forms submitted to an agency. Requires the ability to compare submissions to look for similar identifiers such like name, street address, etc
  • 10. Welfare Fraud Detection ● Massive amounts of data ● Hard to compare and find patterns ● Difficult to incorporate human analysis
  • 11. Welfare Fraud Detection ● Ingest data into the system or work on data in place ● Fraud Score Generation o Automated rules o Manually ● Employees can now focus on reviewing the flagged records
  • 12.
  • 13. Intrusion Detection ● Stream log data in to C* from applications ● Surface metrics through a security dashboard ● Perform analysis on records looking for anomalies (Optional) CREATE TABLE ids ( window TIMESTAMP, route VARCHAR, status_code VARCHAR, request_id TIMEUUID, PRIMARY KEY ((window, route,
  • 15. Distributed Data Warehouse ● Cassandra is designed in a peer to peer architecture. There are no “masters” or “slaves”. ● True distributed load, write anywhere, read anywhere. ● Built-in replication between data centers.
  • 17.
  • 18. Data Warehouse ● Cassandra is used to house case data from disparate systems ● Data is then pushed into a full text search index ● Cases may now be searched through an intuitive web interface
  • 19.
  • 20. Operations ● Widely compatible with programming languages used in enterprise development ● OpsCenter monitoring tool ● Cassandra scales predictably ● Fault-tolerant
  • 21. Use Case Review ● Analytics Workloads ○ Welfare Fraud Detection ○ Intrusion Detection ● Distributed Data Warehousing ○ Data Warehouse/Sink ○ Replication & Recovery
  • 22. Q & A

Editor's Notes

  1. Matt - We are based in Charlottesville Virginia. (and big fans of the amtrak line to DC) We’ve always been interested in search, (one of our founders wrote the book on it - see next slide). In 2010 we really made search our focus and have been adding related technologies to really help deliver on full text search. In 2012 we also started delivering Cassandra consulting, and we are currently a Datastax Gold Partner.
  2. Relevant search will be out soon, great book about the art of tuning search results. Building a search server with ElasticSearch -> is a great video introduction to both the Angular javascript framework and ElasticSearch. Apache Solr Enterprise is the definitive guide for planning, building and maintaining Apache Solr
  3. OpenSource connections started with a deep expertise in full text search. As the size and velocity of the data we interact with grew, so did our toolset for storing and processing that data. The size of the documents we needed to search over grew, as did the demands for better pre-processing of those documents. As we were storing and searching increasing millions of documents we needed a better place to store and process them. Apache Cassandra has been a great tool for that purpose, particularly with Datastax Enterprise. DSE brings along Apache Spark and Apache Solr, both of which we’ll talk about a bit here.
  4. Here is an idea of the breadth of knowledge we have in the “Search, Analytics and Discovery” stack. This includes multiple search systems (Elasticsearch, Solr), Big Data stores (Cassandra, Spark), and frontend systems (Angular, Ember)
  5. We’ll cover a few cases where Cassandra has been a great solution. Loosely we can break the examples down into two categories, Analytics Workloads like Fraud Detection Intrusion Detection and Distributed Data Warehousing
  6. Why is Cassandra a good choice for analytics workloads? Great for time series data, which is often the core of analytics data. Cassandra is incredibly fast at writing data, which is often an issue with analytics data. Cassandra has no single point of failure, which means analytics data isn’t dropped. It scales linearly. Also, Datastax has create an Apache Spark connector. Apache Spark is data processing engine. It is capable of running on a cluster of machines, and smartely scheduling work accross them. It also supports processing “streaming” data, which is great when dealing with analytics data.
  7. Data may be ingested in batches or streamed in as data is acquired Automated rules may be run during ingestion or periodic batch jobs Manually flagged entries may be used to tune and generate automated rules Look for patterns in new data including existing data
  8. Velocity and data locality are the big stories here Spark performs some automated rule checks in both streaming and batch configurations Streaming - good for small window based checks Batch - ideal for larger jobs against the bigger dataset Machine learning may be used to develop new classifications and groups of records
  9. Why Cassandra for Intrusion Detection: Blazing fast write speed. No single point of failure. How it works: data is streamed to Cassandra into a wide row based timeslice/route/status_code data can then be monitored by timeslice to look for spikes Warning, make sure someone attends the datamodeling talk before trying this at home, you’ll need to understand how cassandra stores and access data to get the most out of this approach
  10. --- Repeat from last slide ---- Why Cassandra for Intrusion Detection: Blazing fast write speed. No single point of failure. How it works: data is streamed to Cassandra into a wide row based timeslice/route/status_code data can then be monitored by timeslice to look for spikes Warning, make sure someone attends the datamodeling talk before trying this at home, you’ll need to understand how cassandra stores and access data to get the most out of this approach
  11. Why Cassandra for this: Data replication, both locally in the data center on between data centers “tunable” consistency Cassandra is highly available as soon as you have two nodes. Data is automatically copied between nodes. Other solutions require special configuration for multi-master configurations or are only available as a commercial product. Cassandra gives you true multi-master out of the box.
  12. Netflix Example: They set up a Cassandra Cluster with nodes in Oregon and Northern Virginia. Load was simulated to a production level. To test the speed of replication they wrote 1 million records in one region. 500ms later they read all records from the data center in VA.
  13. Within the scope of a datacenter application developers interact with the cluster as though it’s a local data store. Should the local cluster go down the driver automatically routes requests to another datacenter if available.
  14. 225 YEARS of data spanning tens of millions documents Each document has over 250 fields Note that columns without data do not consume storage space Compare this to dealing with distributed Master-Slave in MS SQL or other
  15. Source documents are coming from various systems with information about part of the claim. In this case there were 10 different types of source documents including metadata about the cases.
  16. Drivers in C# .Net C++ Java Node PHP Ruby