SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Friday, August 10, 12
Friday, August 10, 12
Friday, August 10, 12
Friday, August 10, 12
Friday, August 10, 12
Flexible schema
                        Easily to scale, increased redundancy
                        Fast enough for web requests
                        Consolidate existing services
                        Hadoop support



Friday, August 10, 12
Friday, August 10, 12
Friday, August 10, 12
FUD
                        No Ad-hoc queries

                        No Indexes

                        No range queries

                        Limited tooling

                        Code complexity

Friday, August 10, 12
Friday, August 10, 12
Friday, August 10, 12
REST

                        CQL

                        Thrift




Friday, August 10, 12
SOLR Schema
                    <?xml version="1.0" encoding="UTF-8" ?>
                 <schema name="my_column_family" version="1.0">

                        <types>
                          <fieldType name="string" class="solr.StrField"/>
                          <fieldType name="date" class="solr.DateField"/>
                        </types>

                        <fields>
                          <field name="id" type="string" indexed="true" stored="true"/>
                          <field name="name" type="string" indexed="true" stored="true"/>
                          <field name="released_at" type="date" indexed="true" stored="true"/>
                        </fields>

                   <uniqueKey>id</uniqueKey>
                   <defaultSearchField>name</defaultSearchField>
                 </schema>




Friday, August 10, 12
Basic Queries

      http://localhost:8983/solr/my_keyspace.my_column_family/select?q=name:foo



               SELECT * FROM my_column_family WHERE solr_query='name:foo';




Friday, August 10, 12
Wide Rows
                <?xml version="1.0" encoding="UTF-8" ?>
             <schema name="my_column_family" version="1.0">

                 <types>
                   <fieldType name="string" class="solr.StrField"/>
                   <fieldType name="date" class="solr.DateField"/>
                 </types>

                 <fields>
                   <field name="id" type="string" indexed="true" stored="true"/>
                   <field name="name" type="string" indexed="true" stored="true"/>
                   <field name="released_at" type="date" indexed="true" stored="true"/>
                   <dynamicField name="wide_*" type="string" indexed="true" stored="true"/>
                 </fields>

               <uniqueKey>id</uniqueKey>
               <defaultSearchField>name</defaultSearchField>
             </schema>




Friday, August 10, 12
Fuzzy Search
     <schema name="my_column_family" version="1.0">
    <types>
      <fieldType name="string" class="solr.StrField"/>
      <fieldType name="ngram" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                              generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                              catenateAll="1" preserveOriginal="1"/>
          <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        </analyzer>
      </fieldType>
    </types>
    <fields>
      <field name="id" type="string" indexed="true" stored="true" />
      <field name="name" type="string" indexed="true" stored="true" />
      <field name="name_fuzzy" type="ngram" indexed="true" stored="true" />
    </fields>
    <copyField source="name" dest="name_fuzzy"/>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>name</defaultSearchField>
  </schema>


Friday, August 10, 12
• Full-text indexing
                        • Trigrams
                        • Rich data formats (PDF, Word, HTML)
                        • Easy interop (REST,CSV, XML, JSON)
                        • Geo-spatial search
                        • Highlighting
                        • Auto-suggest
                        • Faceted search and filtering

Friday, August 10, 12
Friday, August 10, 12
Storm




Friday, August 10, 12
Storm




Friday, August 10, 12
Increased performance
                   by 700% while growing
                        data by 500%



Friday, August 10, 12
Reduced operational
                           costs by 40%



Friday, August 10, 12
Deleted 15,000 lines of code




Friday, August 10, 12
Friday, August 10, 12

Weitere ähnliche Inhalte

Ähnlich wie Cassandra summit

Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010
Rob Windsor
 
Solr integration in Magento Enterprise
Solr integration in Magento EnterpriseSolr integration in Magento Enterprise
Solr integration in Magento Enterprise
Tobias Zander
 

Ähnlich wie Cassandra summit (20)

Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
Tutorial, Part 3: SharePoint 101: Jump-Starting the Developer by Rob Windsor ...
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
Solr Anti Patterns
Solr Anti PatternsSolr Anti Patterns
Solr Anti Patterns
 
Solr Anti - patterns
Solr Anti - patternsSolr Anti - patterns
Solr Anti - patterns
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Open Source Search: An Analysis
Open Source Search: An AnalysisOpen Source Search: An Analysis
Open Source Search: An Analysis
 
Linked Data Presentation at TDWI Mpls
Linked Data Presentation at TDWI MplsLinked Data Presentation at TDWI Mpls
Linked Data Presentation at TDWI Mpls
 
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UNSolr vs. Elasticsearch,  Case by Case: Presented by Alexandre Rafalovitch, UN
Solr vs. Elasticsearch, Case by Case: Presented by Alexandre Rafalovitch, UN
 
Solr features
Solr featuresSolr features
Solr features
 
PostgreSQL's Secret NoSQL Superpowers
PostgreSQL's Secret NoSQL SuperpowersPostgreSQL's Secret NoSQL Superpowers
PostgreSQL's Secret NoSQL Superpowers
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax: Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
 
Tagattr is it
Tagattr is itTagattr is it
Tagattr is it
 
Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010Data Access Options in SharePoint 2010
Data Access Options in SharePoint 2010
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Eric Redmond – Distributed Search on Riak 2.0 - NoSQL matters Barcelona 2014
Eric Redmond – Distributed Search on Riak 2.0 - NoSQL matters Barcelona 2014Eric Redmond – Distributed Search on Riak 2.0 - NoSQL matters Barcelona 2014
Eric Redmond – Distributed Search on Riak 2.0 - NoSQL matters Barcelona 2014
 
A noobs lesson on solr (configuration)
A noobs lesson on solr (configuration)A noobs lesson on solr (configuration)
A noobs lesson on solr (configuration)
 
Solr integration in Magento Enterprise
Solr integration in Magento EnterpriseSolr integration in Magento Enterprise
Solr integration in Magento Enterprise
 
GreenDao Introduction
GreenDao IntroductionGreenDao Introduction
GreenDao Introduction
 
Cassandra 2.1 boot camp, Protocol, Queries, CQL
Cassandra 2.1 boot camp, Protocol, Queries, CQLCassandra 2.1 boot camp, Protocol, Queries, CQL
Cassandra 2.1 boot camp, Protocol, Queries, CQL
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Cassandra summit

  • 6. Flexible schema Easily to scale, increased redundancy Fast enough for web requests Consolidate existing services Hadoop support Friday, August 10, 12
  • 9. FUD No Ad-hoc queries No Indexes No range queries Limited tooling Code complexity Friday, August 10, 12
  • 12. REST CQL Thrift Friday, August 10, 12
  • 13. SOLR Schema <?xml version="1.0" encoding="UTF-8" ?> <schema name="my_column_family" version="1.0"> <types> <fieldType name="string" class="solr.StrField"/> <fieldType name="date" class="solr.DateField"/> </types> <fields> <field name="id" type="string" indexed="true" stored="true"/> <field name="name" type="string" indexed="true" stored="true"/> <field name="released_at" type="date" indexed="true" stored="true"/> </fields> <uniqueKey>id</uniqueKey> <defaultSearchField>name</defaultSearchField> </schema> Friday, August 10, 12
  • 14. Basic Queries http://localhost:8983/solr/my_keyspace.my_column_family/select?q=name:foo SELECT * FROM my_column_family WHERE solr_query='name:foo'; Friday, August 10, 12
  • 15. Wide Rows <?xml version="1.0" encoding="UTF-8" ?> <schema name="my_column_family" version="1.0"> <types> <fieldType name="string" class="solr.StrField"/> <fieldType name="date" class="solr.DateField"/> </types> <fields> <field name="id" type="string" indexed="true" stored="true"/> <field name="name" type="string" indexed="true" stored="true"/> <field name="released_at" type="date" indexed="true" stored="true"/> <dynamicField name="wide_*" type="string" indexed="true" stored="true"/> </fields> <uniqueKey>id</uniqueKey> <defaultSearchField>name</defaultSearchField> </schema> Friday, August 10, 12
  • 16. Fuzzy Search <schema name="my_column_family" version="1.0"> <types> <fieldType name="string" class="solr.StrField"/> <fieldType name="ngram" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/> <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> </analyzer> </fieldType> </types> <fields> <field name="id" type="string" indexed="true" stored="true" /> <field name="name" type="string" indexed="true" stored="true" /> <field name="name_fuzzy" type="ngram" indexed="true" stored="true" /> </fields> <copyField source="name" dest="name_fuzzy"/> <uniqueKey>id</uniqueKey> <defaultSearchField>name</defaultSearchField> </schema> Friday, August 10, 12
  • 17. • Full-text indexing • Trigrams • Rich data formats (PDF, Word, HTML) • Easy interop (REST,CSV, XML, JSON) • Geo-spatial search • Highlighting • Auto-suggest • Faceted search and filtering Friday, August 10, 12
  • 21. Increased performance by 700% while growing data by 500% Friday, August 10, 12
  • 22. Reduced operational costs by 40% Friday, August 10, 12
  • 23. Deleted 15,000 lines of code Friday, August 10, 12