What’s New in Solr
     3.x/4.0
  Charlottesville Lucene/Solr Meetup
           August 15, 2011

            Erik Hatcher
          Lucid Imagination
What is Solr?
•   Solr is the popular, blazing fast open source
    enterprise search platform from the Apache Lucene
    project. Its major features include powerful full-text
    search, hit highlighting, faceted search, dynamic
    clustering, database integration, rich document (e.g.,
    Word, PDF) handling, and geospatial search. Solr is
    highly scalable, providing distributed search and
    index replication, and it powers the search and
    navigation features of many of the world's largest
    internet sites.
What is Lucene?

• Apache Lucene is a high-performance, full-
  featured text search engine library written
  entirely in Java. It is a technology suitable
  for nearly any application that requires full-
  text search, especially cross-platform.
Solr History
• November 2009: Solr 1.4 (Lucene 2.9.1)
• June 2010: Solr 1.4.1 (Lucene 2.9.3)
• 2011
 • March - Solr 3.1
 • May - Solr 3.2
 • July - Solr 3.3
Solr 3.1
•   Improved geospatial support

•   Sorting by function queries

•   Range faceting on all numeric fields

•   Example Velocity driven search UI at
    http://localhost:8983/solr/browse

•   A new termvector-based highlighter

•   Improved spellchecking capabilities

•   Improved integration with Apache Lucene

•   New autosuggest component

•   Distributed support for more components

•   JSON document indexing and CSV response format

•   Apache UIMA integration for metadata extraction

•   Many other bugfixes, improvements and optimizations
Major components

• Apache Lucene 3.1.0
• Apache Tika 0.8
• Carrot2 3.4.2
• Velocity 1.6.1 and Velocity Tools 2.0-beta3
• Apache UIMA 2.3.1-SNAPSHOT
Schema / Cong
•   SOLR-1131: FieldTypes can now output multiple
    Fields per Type and still be searched. This can be
    handy for hiding the details of a particular
    implementation such as in the spatial case.

•   SOLR-1379: Add RAMDirectoryFactory for non-
    persistent in memory index storage.

•   SOLR-2059: Add "types" attribute to
    WordDelimiterFilterFactory, which allows you to
    customize how WordDelimiterFilter tokenizes text
    with a conguration le.
Indexing


• SOLR-945: JSON update handler that
  accepts add, delete, commit commands in
  JSON format.
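
A minimal sketch of what such a JSON update body might look like, assuming the add/delete/commit command shape described on the Solr wiki for the JSON update handler. The field names ("id", "title") and the helper name build_update_body are hypothetical, and the sketch only builds the body; it does not send it to a server.

```python
import json

def build_update_body(doc, delete_id=None, commit=True):
    """Return a JSON string carrying add, delete, and commit commands.

    Assumed command shape: {"add": {"doc": {...}}, "delete": {"id": ...}, "commit": {}}.
    """
    body = {"add": {"doc": doc}}
    if delete_id is not None:
        body["delete"] = {"id": delete_id}
    if commit:
        body["commit"] = {}
    return json.dumps(body)

# Hypothetical document; real field names depend on your schema.
payload = build_update_body({"id": "doc1", "title": "What's New in Solr"},
                            delete_id="doc0")
print(payload)
```

The resulting string would be posted to the JSON update handler with a JSON content type; the exact handler path depends on how it is registered in solrconfig.xml.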
Geospatial
•   SOLR-1302: Added several new distance based functions,
    including Great Circle (haversine), Manhattan, Euclidean
    and String (using the StringDistance methods in the Lucene
    spellchecker). Also added geohash(), deg() and rad()
    convenience functions. See http://wiki.apache.org/solr/
    FunctionQuery

•   SOLR-1568: Added "native" filtering support for PointType,
    GeohashField. Added LatLonType with filtering support
    too. See http://wiki.apache.org/solr/SpatialSearch and the
    example. Refactored some items in Lucene spatial.
    Removed SpatialTileField as the underlying CartesianTier is
    broken beyond repair and is going to be moved.
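
For readers unfamiliar with the distance measures named in SOLR-1302, here is a small Python sketch of the underlying math. It is an illustration only, not Solr's implementation; the function names and the Earth radius constant are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def manhattan(p, q):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(haversine_km(38.03, -78.48, 38.90, -77.04))  # two arbitrary sample points
print(manhattan((1, 2), (4, 6)), euclidean((1, 2), (4, 6)))
```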
Query Parsing
•   SOLR-1553: New dismax parser implementation (accessible as "edismax") that supports full
    lucene syntax, improved reserved char escaping, fielded queries, improved proximity
    boosting, and improved stopword handling. Note: status is experimental for now.

•   SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField.
    autoGeneratePhraseQueries="true" (the default) causes the query parser to generate
    phrase queries if multiple tokens are generated from a single non-quoted analysis string.
    For example WordDelimiterFilter splitting text:pdp-11 will cause the parser to generate
    text:"pdp 11" rather than (text:PDP OR text:11). Note that
    autoGeneratePhraseQueries="true" tends to not work well for non whitespace delimited
    languages.

•   SOLR-2128: Full parameter substitution for function queries. Example: q=add($v1,$v2)
    &v1=mul(popularity,5)&v2=20.0

•   SOLR-2133: Function query parser can now parse multiple comma separated value sources.
    It also now fails if there is extra unexpected text after parsing the functions, instead of
    silently ignoring it. This allows expressions like q=dist(2,vector(1,2),$pt)&pt=3,4
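
The parameter substitution in SOLR-2128 can be pictured with a short Python sketch. The helper names are hypothetical. Note that Solr resolves $v1 and $v2 while parsing the function, not by text replacement; the textual expansion below only illustrates which expression the request is equivalent to.

```python
import re
from urllib.parse import urlencode, parse_qs

def build_query_string(expr, **refs):
    """URL-encode a function query together with the parameters it references."""
    params = {"q": expr}
    params.update(refs)
    return urlencode(params)

def expand_references(expr, refs):
    """Inline each $name reference as text (illustration only)."""
    return re.sub(r"\$(\w+)", lambda m: refs[m.group(1)], expr)

refs = {"v1": "mul(popularity,5)", "v2": "20.0"}
query_string = build_query_string("add($v1,$v2)", **refs)
decoded = parse_qs(query_string)          # round-trips to the original values
expanded = expand_references("add($v1,$v2)", refs)
print(query_string)
print(expanded)
```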
Functions
• SOLR-1574: Add many new functions from
  java Math (e.g. sin, cos)
• SOLR-1569: Allow functions to take in
  literal strings by modifying the
  FunctionQParser and adding
  LiteralValueSource
• SOLR-1297: Add sort by Function capability
Analysis
•   SOLR-1923: PhoneticFilterFactory now has support for the Caverphone
    algorithm.

•   SOLR-1571: Added Unicode collation support through Lucene's
    CollationKeyFilter

•   SOLR-1653: Add PatternReplaceCharFilter

•   SOLR-1677: Add support for choosing the Lucene Version for Lucene
    components within Solr.

•   SOLR-1984: Add HyphenationCompoundWordTokenFilterFactory.

•   SOLR-2188: Added "maxTokenLength" argument to the factories for
    ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer.

•   ICU integration
Analysis (cont.)
•   SOLR-1857: Synced Solr analysis with Lucene 3.1. Added
    KeywordMarkerFilterFactory and StemmerOverrideFilterFactory, which can
    be used to tune stemming algorithms.

•   Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia
    analysis. Improved the performance of SnowballPorterFilterFactory.

•   SOLR-1657: Converted remaining TokenStreams to the Attributes-based API.
    All Solr TokenFilters now support custom Attributes, and some have
    improved performance: especially WordDelimiterFilter and
    CommonGramsFilter.

•   SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and
    "tokenSeparator" parameters for controlling the minimum shingle size
    produced by the filter, and the separator string that it uses,
    respectively.

•   SOLR-744: ShingleFilterFactory supports the
    "outputUnigramsIfNoShingles" parameter, to output unigrams if the
    number of input tokens is fewer than minShingleSize, and no shingles can
    be generated.

•   SOLR-1974: Add LimitTokenCountFilterFactory.

•   SOLR-1057: Add PathHierarchyTokenizerFactory.
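
To make the shingle parameters concrete, here is a rough Python sketch of how minShingleSize, tokenSeparator, and outputUnigramsIfNoShingles might interact. It is not the Lucene ShingleFilter: it ignores token positions and offsets, and the max_size and output_unigrams arguments are assumptions added so the example is complete.

```python
def shingles(tokens, min_size=2, max_size=2, separator=" ",
             output_unigrams=True, output_unigrams_if_no_shingles=False):
    """Return word n-grams (shingles) built from a list of tokens."""
    out = []
    produced_shingle = False
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
                produced_shingle = True
    # Fall back to plain tokens only when nothing else would be emitted.
    if (not produced_shingle and not output_unigrams
            and output_unigrams_if_no_shingles):
        out = list(tokens)
    return out

print(shingles(["please", "divide", "this"]))
print(shingles(["a", "b", "c"], min_size=2, max_size=3, separator="_",
               output_unigrams=False))
print(shingles(["solo"], output_unigrams=False,
               output_unigrams_if_no_shingles=True))
```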
Faceting
•   SOLR-1240: "Range Faceting" has been added. This is a generalization
    of the existing "Date Faceting" logic so that it now supports all
    stock numeric field types that support range queries in addition to
    dates. facet.date is now deprecated in favor of this generalized
    mechanism.

•   SOLR-397: Date Faceting now supports a "facet.date.include" param
    for specifying when the upper & lower end points of computed date
    ranges should be included in the range. Legal values are: "all", "lower",
    "upper", "edge", and "outer". For backwards compatibility the default
    value is the set: [lower,upper,edge], so that all ranges between start
    and end are inclusive of their endpoints, but the "before" and "after"
    ranges are not.

•   SOLR-2325: Allow tagging and exclusion of main query for faceting.
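
A simplified Python sketch of how gap-based facet ranges and their endpoint inclusion could be computed. The function names are hypothetical. The hardend argument mirrors my understanding of the facet.range.hardend option, and only the "lower" and "upper" include values are modeled; "edge" and "outer" are left out.

```python
def range_buckets(start, end, gap, hardend=False):
    """Split [start, end) into consecutive ranges that are gap wide.

    If gap does not divide the interval evenly, the last range either
    overshoots end (hardend=False) or is cut off at end (hardend=True).
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            high = end
        buckets.append((low, high))
        low = high
    return buckets

def contains(value, low, high, include=("lower",)):
    """Check membership, honoring the 'lower' and 'upper' include values."""
    above = value >= low if "lower" in include else value > low
    below = value <= high if "upper" in include else value < high
    return above and below

print(range_buckets(0, 100, 25))
print(range_buckets(0, 10, 4), range_buckets(0, 10, 4, hardend=True))
```

With include set to ("lower", "upper"), a value sitting exactly on a boundary is counted in both neighboring ranges, which is the behavior the backwards-compatible date faceting default implies.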
SolrJ

• SOLR-1139: Add TermsComponent Query
  and Response Support in SolrJ
• SOLR-1815: SolrJ now preserves the order
  of facet queries.
Solr Components
•   SOLR-1316: Create autosuggest component

•   SOLR-2010: Added ability to verify that spell checking collations have
    actual results in the index.

•   SOLR-2157: Suggester should return alpha-sorted results when
    onlyMorePopular=false

•   SOLR-1625: Add regexp support for TermsComponent

•   SOLR-1556: TermVectorComponent now supports per field overrides.
    Also, it now throws an error if the fields passed in do not exist, and
    issues warnings for fields whose term vector options (termVectors,
    offsets, positions) do not align with the schema declaration.

•   SOLR-860: Add debug output for MoreLikeThis.
Highlighting
•   SOLR-1268: Incorporate FastVectorHighlighter

•   SOLR-2021: Add SolrEncoder plugin to Highlighter.

•   SOLR-2030: Make FastVectorHighlighter use of
    SolrEncoder.

•   SOLR-2053: Add support for custom comparators
    in Solr spellchecker, per LUCENE-2479

•   SOLR-2049: Add hl.multiValuedSeparatorChar for
    FastVectorHighlighter, per LUCENE-2603.
Distributed

• SOLR-785: Distributed Search support for
  SpellCheckComponent
• SOLR-1177: Distributed Search support for
  TermsComponent
Misc.

•   SOLR-1957: The VelocityResponseWriter contrib moved to core. Example search UI now
    available at http://localhost:8983/solr/browse

•   SOLR-1966: QueryElevationComponent can now return just the included results in the
    elevation le

•   SOLR-1925: Add CSVResponseWriter (use wt=csv) that returns the list of documents in
    CSV format.

•   SOLR-2263: Add ability for RawResponseWriter to stream binary files as well as text files.

•   SOLR-1750: SolrInfoMBeanHandler added for simpler programmatic access to info
    currently available from registry.jsp and stats.jsp

•   SOLR-2099: Add ability to throttle rsync based replication using rsync option --bwlimit.
UIMA
•   UIMA - Unstructured Information Management
    Architecture - http://uima.apache.org/

•   Enables UIMA components to augment
    documents

•   Entity extraction, automated categorization,
    language detection, etc

•   "contrib" plugin - SOLR-2129

•   http://wiki.apache.org/solr/SolrUIMA
Optimizations
•   SOLR-1679: Don't build up string messages in SolrCore.execute unless they
    are necessary for the current log level.

•   SOLR-1874: Optimize PatternReplaceFilter for better performance.

•   SOLR-1968: speed up initial filter cache population for facet.method=enum
    and also big terms for multi-valued facet.method=fc. The resulting speedup
    for the rst facet request is anywhere from 30% to 32x, depending on how
    many terms are in the eld and how many documents match per term.

•   SOLR-2089: Speed up UnInvertedField faceting (facet.method=fc for
    multi-valued fields) when facet.limit is both high, and a high enough percentage of
    the number of unique terms in the field. Extreme cases yield speedups over
    3x.

•   SOLR-2046: add common functions to scripts-util.
Solr 3.2
•   Ability to specify overwrite and commitWithin as request
    parameters when using the JSON update format

•   TermQParserPlugin, useful when generating filter queries from
    terms returned from eld faceting or the terms component.

•   DebugComponent now supports using a NamedList to model
    Explanation objects in its responses instead of
    Explanation.toString

•   Improvements to the UIMA and Carrot2 integrations

•   Bugfixes and improvements from Apache Lucene 3.2
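
As an illustration of the TermQParserPlugin use case, the Python sketch below turns facet values into filter queries using Solr's local-params form {!term f=field}value. The helper names and the sample facet data are hypothetical.

```python
def term_filter(field, value):
    """Build a filter query that matches one raw term in one field."""
    # The term parser takes the value as-is, so query-syntax characters
    # such as '&' or '(' need no escaping here (URL encoding still applies
    # when the parameter is sent over HTTP).
    return "{!term f=%s}%s" % (field, value)

def filters_from_facets(field, facet_counts, min_count=1):
    """Create one filter query per facet value that has enough hits."""
    return [term_filter(field, value)
            for value, count in facet_counts
            if count >= min_count]

# Hypothetical facet counts as (value, count) pairs.
facets = [("Books & Media (used)", 12), ("C++", 3), ("empty", 0)]
for fq in filters_from_facets("category", facets):
    print(fq)
```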
Other 3.2 goodies

• SOLR-2061: Pull base tests out into a new
  Solr Test Framework module, and publish
  binary, javadoc, and source test-framework
  jars.
• Dependency update: Carrot2 3.5.0
Solr 3.3
•   Grouping / Field Collapsing

•   A new, automaton-based suggest/autocomplete implementation offering
    an order of magnitude smaller RAM consumption.

•   KStemFilterFactory, an optimized implementation of a less aggressive
    stemmer for English.

•   Solr defaults to a new, more efficient merge policy (TieredMergePolicy).
    See http://s.apache.org/merging for more information.

•   Important bugfixes, including extremely high RAM usage in spellchecking.

•   Bugfixes and improvements from Apache Lucene 3.3
Solr 3.3 details
•   SOLR-2378: A new, automaton-based, implementation of suggest (autocomplete)
    component, offering an order of magnitude smaller memory consumption
    compared to ternary trees and jaspell and very fast lookups at runtime.

•   SOLR-2400: Field- and DocumentAnalysisRequestHandler now provide a position
    history for each token, so you can follow the token through all analysis stages. The
    output contains a separate int[] attribute containing all positions from previous
    Tokenizers/TokenFilters (called "positionHistory").

•   SOLR-2524: (SOLR-236, SOLR-237, SOLR-1773, SOLR-1311) Grouping / Field
    collapsing using the Lucene grouping contrib. The search result can be grouped by
    field and query.

•   SOLR-1331: Added a srcCore parameter to CoreAdminHandler's mergeindexes
    action to merge one or more cores' indexes to a target core.

•   SOLR-2610: Add an option to delete index through CoreAdmin UNLOAD action
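
Conceptually, grouping by field (SOLR-2524) collapses a ranked result list so that each distinct field value keeps only its top documents. The Python sketch below shows that idea on an in-memory list. It is not how Solr or the Lucene grouping contrib implements it, and the function name and sample documents are made up for the example.

```python
from collections import OrderedDict

def group_results(docs, field, limit=1):
    """Group ranked docs by a field value, keeping up to limit docs per group.

    docs is assumed to be sorted by relevance already, so groups appear in
    the order of their best-ranked document.
    """
    groups = OrderedDict()
    for doc in docs:
        members = groups.setdefault(doc.get(field), [])
        if len(members) < limit:
            members.append(doc)
    return groups

ranked = [
    {"id": 1, "site": "a"},
    {"id": 2, "site": "b"},
    {"id": 3, "site": "a"},
]
for value, members in group_results(ranked, "site").items():
    print(value, [d["id"] for d in members])
```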
Solr 4.0


• aka "trunk" at the moment
• major changes! (for the better!) at both
  Lucene and Solr levels
Lucene 4.0
•   The postings APIs have been removed in favor of the
    new flexible indexing (flex) APIs.

•   With flexible indexing it is now possible for an
    application to create its own postings codec, to alter
    how elds, terms, docs and positions are encoded into
    the index.

•   String -> BytesRef

•   Per-segment everything
4.0 details
•   Directory.copy/Directory.copyTo now copies all files (not just
    index les), since what is and isn't and index le is now
    dependent on the codecs used.

•   String to BytesRef

•   FuzzyQuery and WildcardQuery now operate on Unicode
    codepoints, not unicode code units.

•   WildcardQuery and QueryParser now allow escaping with
    the '\' character.

•   Similarity can now be configured on a per-field basis
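
The difference between code points and code units matters for characters outside the Basic Multilingual Plane. A small Python sketch, using an arbitrary sample term, shows how the two counts diverge:

```python
def code_point_count(text):
    """Number of Unicode code points (a Python 3 str is indexed by code point)."""
    return len(text)

def utf16_code_unit_count(text):
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

term = "a\U0001D11Eb"  # 'a', MUSICAL SYMBOL G CLEF (U+1D11E), 'b'
print(code_point_count(term), utf16_code_unit_count(term))
```

The clef is one code point but two UTF-16 code units (a surrogate pair). A query that counts edits or wildcard positions per code unit would treat it as two characters; counting per code point treats it as one.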
Relevancy


• more flexible scoring
NRT

• per-segment
• IndexWriter#commit now doesn't block
  concurrent indexing while flushing all
  'currently' RAM resident documents to
  disk.
More Lucene 4.0 features
•   Added RegexpQuery support to QueryParser.

•   Adds AutomatonQuery, a MultiTermQuery that
    matches terms against a finite-state machine.
    Implement WildcardQuery and FuzzyQuery with
    finite-state methods. Adds RegexpQuery.

•   The QueryParser now accepts mixed inclusive and
    exclusive bounds for range queries. Example: "{3 TO
    5]"
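
In range syntax, a curly brace marks an exclusive bound and a square bracket an inclusive one, so "{3 TO 5]" excludes 3 and includes 5. The Python sketch below models that for numeric values only. It is a toy parser written for this example, not Lucene's QueryParser.

```python
import re

# Opening bracket, low value, "TO", high value, closing bracket.
_RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+?)\s*([\]}])$")

def parse_range(expr):
    """Return (low, high, low_inclusive, high_inclusive) for a numeric range."""
    match = _RANGE.match(expr.strip())
    if not match:
        raise ValueError("not a range expression: %r" % expr)
    open_bracket, low, high, close_bracket = match.groups()
    return float(low), float(high), open_bracket == "[", close_bracket == "]"

def in_range(value, expr):
    """Check whether value falls inside the range, honoring each bound type."""
    low, high, low_inclusive, high_inclusive = parse_range(expr)
    above = value >= low if low_inclusive else value > low
    below = value <= high if high_inclusive else value < high
    return above and below

for candidate in (3, 4, 5):
    print(candidate, in_range(candidate, "{3 TO 5]"))
```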
Solr 4.0
•   Pivot faceting

•   Direct Solr spell checker

•   Increased response writing flexibility (e.g. function query results)

•   Distributed date/numeric range faceting

•   "join" query parser

•   NRT: You may now specify a 'soft' commit when committing. This
    will use Lucene's NRT feature to avoid guaranteeing documents
    are on stable storage in exchange for faster reopen times. There
    is also a new 'soft' autocommit tracker that can be configured.
About Lucid...

•   Lucid Imagination provides commercial-grade
    support, training, high-level consulting and value-
    added software for Lucene and Solr.

•   We make Lucene ‘enterprise-ready’ by offering:

    •   Free, certified, distributions and downloads.

    •   Support, training, and consulting.

    •   LucidWorks Enterprise, a commercial search
        platform built on top of Solr.

•   http://www.lucidimagination.com
Lucid Offerings
LucidFind

http://www.lucidimagination.com/search/?q=charlottesville

A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

KĂźrzlich hochgeladen (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

What's New in Solr 3.x / 4.0

  • 1. What’s New in Solr 3.x/4.0 Charlottesville Lucene/Solr Meetup August 15, 2011 Erik Hatcher Lucid Imagination
  • 2. What is Solr? • Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
  • 3. What is Lucene? • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
  • 4. Solr History • November 2009: Solr 1.4 (Lucene 2.9.1) • June 2010: Solr 1.4.1 (Lucene 2.9.3) • 2011 • March - Solr 3.1 • May - Solr 3.2 • July - Solr 3.3
  • 5. Solr 3.1 • Improved geospatial support • New autosuggest component • Sorting by function queries • Distributed support for more components • Range faceting on all numeric fields • JSON document indexing and CSV response format • Example Velocity driven search UI at http://localhost:8983/solr/browse • Apache UIMA integration for metadata extraction • A new termvector-based highlighter • Improved spellchecking capabilities • Many other bugfixes, improvements and optimizations • Improved integration with Apache Lucene
  • 6. Major components • Apache Lucene 3.1.0 • Apache Tika 0.8 • Carrot2 3.4.2 • Velocity 1.6.1 and Velocity Tools 2.0-beta3 • Apache UIMA 2.3.1-SNAPSHOT
  • 7. Schema / Cong • SOLR-1131: FieldTypes can now output multiple Fields per Type and still be searched. This can be handy for hiding the details of a particular implementation such as in the spatial case. • SOLR-1379: Add RAMDirectoryFactory for non- persistent in memory index storage. • SOLR-2059: Add "types" attribute to WordDelimiterFilterFactory, which allows you to customize how WordDelimiterFilter tokenizes text with a conguration le.
  • 8. Indexing • SOLR-945: JSON update handler that accepts add, delete, commit commands in JSON format.
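The JSON update format mentioned on this slide can be sketched as a single JSON object whose keys are the commands. This is a minimal illustration, not the handler's definitive contract: the helper name `build_update_payload` and the sample document are hypothetical, and the nesting of `"doc"` and `"id"` is an assumption that should be checked against the Solr version in use.

```python
import json


def build_update_payload(doc, delete_id):
    """Return a JSON string holding add, delete, and commit commands.

    Assumed shape: {"add": {"doc": ...}, "delete": {"id": ...}, "commit": {}}.
    """
    commands = {
        "add": {"doc": doc},          # index (or replace) one document
        "delete": {"id": delete_id},  # delete a document by its unique key
        "commit": {},                 # make the changes visible to searchers
    }
    return json.dumps(commands)


# Hypothetical document and id, for illustration only.
payload = build_update_payload({"id": "doc1", "title": "Hello Solr"}, "doc0")
print(payload)
```

The resulting string would be sent as the body of an HTTP POST to the JSON update handler; the sketch stops short of any network call so that it runs without a Solr instance.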
  • 9. Geospatial • SOLR-1302: Added several new distance based functions, including Great Circle (haversine), Manhattan, Euclidean and String (using the StringDistance methods in the Lucene spellchecker). Also added geohash(), deg() and rad() convenience functions. See http://wiki.apache.org/solr/FunctionQuery • SOLR-1568: Added "native" filtering support for PointType, GeohashField. Added LatLonType with filtering support too. See http://wiki.apache.org/solr/SpatialSearch and the example. Refactored some items in Lucene spatial. Removed SpatialTileField as the underlying CartesianTier is broken beyond repair and is going to be moved.
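For readers unfamiliar with the Great Circle (haversine) distance named above, the formula itself can be sketched in a few lines. This is a standalone illustration of the math, not Solr's implementation; the function name `haversine_km` and the Earth radius constant are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; Solr may use a different value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Two points on the equator, a quarter of the way around the globe.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```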
  • 10. Query Parsing • SOLR-1553: New dismax parser implementation (accessible as "edismax") that supports full lucene syntax, improved reserved char escaping, fielded queries, improved proximity boosting, and improved stopword handling. Note: status is experimental for now. • SOLR-2015: Add a boolean attribute autoGeneratePhraseQueries to TextField. autoGeneratePhraseQueries="true" (the default) causes the query parser to generate phrase queries if multiple tokens are generated from a single non-quoted analysis string. For example WordDelimiterFilter splitting text:pdp-11 will cause the parser to generate text:"pdp 11" rather than (text:PDP OR text:11). Note that autoGeneratePhraseQueries="true" tends to not work well for non whitespace delimited languages. • SOLR-2128: Full parameter substitution for function queries. Example: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0 • SOLR-2133: Function query parser can now parse multiple comma separated value sources. It also now fails if there is extra unexpected text after parsing the functions, instead of silently ignoring it. This allows expressions like q=dist(2,vector(1,2),$pt)&pt=3,4
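The parameter substitution example from SOLR-2128 contains characters such as `$`, `(` and `,` that need URL encoding when sent over HTTP. The sketch below shows one way to build such a query string; the helper `function_query_params` is hypothetical, and only the parameter names and values quoted on the slide are taken from the source.

```python
from urllib.parse import parse_qs, urlencode


def function_query_params(expr, **substitutions):
    """URL-encode a function query plus the parameters it dereferences with $name."""
    params = {"q": expr}
    params.update(substitutions)
    return urlencode(params)


# Values taken from the slide: q=add($v1,$v2)&v1=mul(popularity,5)&v2=20.0
query_string = function_query_params(
    "add($v1,$v2)", v1="mul(popularity,5)", v2="20.0"
)
print(query_string)

# Decoding the string again recovers the original parameter values.
print(parse_qs(query_string))
```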
  • 11. Functions • SOLR-1574: Add many new functions from Java Math (e.g. sin, cos) • SOLR-1569: Allow functions to take in literal strings by modifying the FunctionQParser and adding LiteralValueSource • SOLR-1297: Add sort by Function capability
  • 12. Analysis • SOLR-1923: PhoneticFilterFactory now has support for the Caverphone algorithm. • SOLR-1571: Added Unicode collation support through Lucene's CollationKeyFilter • SOLR-1653: Add PatternReplaceCharFilter • SOLR-1677: Add support for choosing the Lucene Version for Lucene components within Solr. • SOLR-1984: Add HyphenationCompoundWordTokenFilterFactory. • SOLR-2188: Added "maxTokenLength" argument to the factories for ClassicTokenizer, StandardTokenizer, and UAX29URLEmailTokenizer. • ICU integration
  • 13. Analysis (cont.) • SOLR-1857: Synced Solr analysis with Lucene 3.1. Added KeywordMarkerFilterFactory and StemmerOverrideFilterFactory, which can be used to tune stemming algorithms. • Added factories for Bulgarian, Czech, Hindi, Turkish, and Wikipedia analysis. Improved the performance of SnowballPorterFilterFactory. • SOLR-1657: Converted remaining TokenStreams to the Attributes-based API. All Solr TokenFilters now support custom Attributes, and some have improved performance: especially WordDelimiterFilter and CommonGramsFilter. • SOLR-1740: ShingleFilterFactory supports the "minShingleSize" and "tokenSeparator" parameters for controlling the minimum shingle size produced by the filter, and the separator string that it uses, respectively. • SOLR-744: ShingleFilterFactory supports the "outputUnigramsIfNoShingles" parameter, to output unigrams if the number of input tokens is fewer than minShingleSize, and no shingles can be generated. • SOLR-1974: Add LimitTokenCountFilterFactory. • SOLR-1057: Add PathHierarchyTokenizerFactory.
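The three shingle parameters named on this slide can be illustrated with a simplified sketch. It is not the Lucene ShingleFilter: it ignores token positions and the separate option for always emitting unigrams, and the function name `shingles` and its keyword arguments are stand-ins chosen to mirror the parameter names.

```python
def shingles(tokens, min_size=2, max_size=2, separator=" ",
             output_unigrams_if_no_shingles=False):
    """Return word n-grams (shingles) of min_size..max_size tokens.

    Simplified sketch: emits shingles only, ordered by start position
    and then by size.
    """
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            end = start + size
            if end <= len(tokens):
                result.append(separator.join(tokens[start:end]))
    # Too few input tokens to build even one shingle: optionally fall
    # back to the original tokens, as outputUnigramsIfNoShingles describes.
    if not result and output_unigrams_if_no_shingles:
        return list(tokens)
    return result


print(shingles(["please", "divide", "this"]))
print(shingles(["solo"], output_unigrams_if_no_shingles=True))
```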
  • 14. Faceting • SOLR-1240: "Range Faceting" has been added. This is a generalization of the existing "Date Faceting" logic so that it now supports all stock numeric field types that support range queries, in addition to dates. facet.date is now deprecated in favor of this generalized mechanism. • SOLR-397: Date Faceting now supports a "facet.date.include" param for specifying when the upper & lower end points of computed date ranges should be included in the range. Legal values are: "all", "lower", "upper", "edge", and "outer". For backwards compatibility the default value is the set: [lower,upper,edge], so that all ranges between start and end are inclusive of their endpoints, but the "before" and "after" ranges are not. • SOLR-2325: Allow tagging and exclusion of main query for faceting.
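A range faceting request might be assembled as follows. This is a sketch under assumptions: the field name `price` and the helper `range_facet_params` are hypothetical, and the `facet.range.*` parameter names are given as I understand the Solr 3.1 feature rather than quoted from the slide.

```python
from urllib.parse import parse_qs, urlencode


def range_facet_params(field, start, end, gap, query="*:*"):
    """Build a query string asking for range facet counts over a numeric field."""
    return urlencode([
        ("q", query),
        ("facet", "true"),
        ("facet.range", field),        # field to bucket (assumed numeric)
        ("facet.range.start", start),  # lower bound of the first bucket
        ("facet.range.end", end),      # upper bound of the last bucket
        ("facet.range.gap", gap),      # width of each bucket
    ])


# Hypothetical field: ten buckets of width 10 over prices from 0 to 100.
print(range_facet_params("price", 0, 100, 10))
```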
  • 15. SolrJ • SOLR-1139: Add TermsComponent Query and Response Support in SolrJ • SOLR-1815: SolrJ now preserves the order of facet queries.
  • 16. Solr Components • SOLR-1316: Create autosuggest component • SOLR-2010: Added ability to verify that spell checking collations have actual results in the index. • SOLR-2157: Suggester should return alpha-sorted results when onlyMorePopular=false • SOLR-1625: Add regexp support for TermsComponent • SOLR-1556: TermVectorComponent now supports per-field overrides. Also, it now throws an error if passed-in fields do not exist, and issues warnings for fields whose term vector options (termVectors, offsets, positions) do not align with the schema declaration. • SOLR-860: Add debug output for MoreLikeThis.
  • 17. Highlighting • SOLR-1268: Incorporate FastVectorHighlighter • SOLR-2021: Add SolrEncoder plugin to Highlighter. • SOLR-2030: Make FastVectorHighlighter use SolrEncoder. • SOLR-2053: Add support for custom comparators in Solr spellchecker, per LUCENE-2479 • SOLR-2049: Add hl.multiValuedSeparatorChar for FastVectorHighlighter, per LUCENE-2603.
  • 18. Distributed • SOLR-785: Distributed Search support for SpellCheckComponent • SOLR-1177: Distributed Search support for TermsComponent
  • 19. Misc. • SOLR-1957: The VelocityResponseWriter contrib moved to core. Example search UI now available at http://localhost:8983/solr/browse • SOLR-1966: QueryElevationComponent can now return just the included results in the elevation file • SOLR-1925: Add CSVResponseWriter (use wt=csv) that returns the list of documents in CSV format. • SOLR-2263: Add ability for RawResponseWriter to stream binary files as well as text files. • SOLR-1750: SolrInfoMBeanHandler added for simpler programmatic access to info currently available from registry.jsp and stats.jsp • SOLR-2099: Add ability to throttle rsync based replication using rsync option --bwlimit.
  • 20. UIMA • UIMA - Unstructured Information Management Architecture - http://uima.apache.org/ • Enables UIMA components to augment documents • Entity extraction, automated categorization, language detection, etc • "contrib" plugin - SOLR-2129 • http://wiki.apache.org/solr/SolrUIMA
  • 21. Optimizations • SOLR-1679: Don't build up string messages in SolrCore.execute unless they are necessary for the current log level. • SOLR-1874: Optimize PatternReplaceFilter for better performance. • SOLR-1968: speed up initial filter cache population for facet.method=enum and also big terms for multi-valued facet.method=fc. The resulting speedup for the first facet request is anywhere from 30% to 32x, depending on how many terms are in the field and how many documents match per term. • SOLR-2089: Speed up UnInvertedField faceting (facet.method=fc for multi-valued fields) when facet.limit is both high, and a high enough percentage of the number of unique terms in the field. Extreme cases yield speedups over 3x. • SOLR-2046: add common functions to scripts-util.
  • 22. Solr 3.2 • Ability to specify overwrite and commitWithin as request parameters when using the JSON update format • TermQParserPlugin, useful when generating filter queries from terms returned from field faceting or the terms component. • DebugComponent now supports using a NamedList to model Explanation objects in its responses instead of Explanation.toString • Improvements to the UIMA and Carrot2 integrations • Bugfixes and improvements from Apache Lucene 3.2
  • 23. Other 3.2 goodies • SOLR-2061: Pull base tests out into a new Solr Test Framework module, and publish binary, javadoc, and source test-framework jars. • Dependency update: Carrot2 3.5.0
  • 24. Solr 3.3 • Grouping / Field Collapsing • A new, automaton-based suggest/autocomplete implementation offering an order of magnitude smaller RAM consumption. • KStemFilterFactory, an optimized implementation of a less aggressive stemmer for English. • Solr defaults to a new, more efficient merge policy (TieredMergePolicy). See http://s.apache.org/merging for more information. • Important bugfixes, including extremely high RAM usage in spellchecking. • Bugfixes and improvements from Apache Lucene 3.3
  • 25. Solr 3.3 details • SOLR-2378: A new, automaton-based implementation of the suggest (autocomplete) component, offering an order of magnitude smaller memory consumption compared to ternary trees and jaspell, and very fast lookups at runtime. • SOLR-2400: Field- and DocumentAnalysisRequestHandler now provide a position history for each token, so you can follow the token through all analysis stages. The output contains a separate int[] attribute containing all positions from previous Tokenizers/TokenFilters (called "positionHistory"). • SOLR-2524: (SOLR-236, SOLR-237, SOLR-1773, SOLR-1311) Grouping / Field collapsing using the Lucene grouping contrib. The search result can be grouped by field and query. • SOLR-1331: Added a srcCore parameter to CoreAdminHandler's mergeindexes action to merge one or more cores' indexes to a target core. • SOLR-2610: Add an option to delete index through CoreAdmin UNLOAD action
  • 26. Solr 4.0 • aka "trunk" at the moment • major changes! (for the better!) at both Lucene and Solr levels
  • 27. Lucene 4.0 • The postings APIs have been removed in favor of the new flexible indexing (flex) APIs. • With flexible indexing it is now possible for an application to create its own postings codec, to alter how fields, terms, docs and positions are encoded into the index. • String -> BytesRef • Per-segment everything
  • 28. 4.0 details • Directory.copy/Directory.copyTo now copies all files (not just index files), since what is and isn't an index file is now dependent on the codecs used. • String to BytesRef • FuzzyQuery and WildcardQuery now operate on Unicode code points, not Unicode code units. • WildcardQuery and QueryParser now allow escaping with the '\' character. • Similarity can now be configured on a per-field basis
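The difference between code points and code units matters for characters outside the Basic Multilingual Plane, which occupy two UTF-16 code units (the unit Java strings use) but are a single code point. The sketch below only illustrates that counting difference; it is not Lucene code, and the helper names are made up for the example.

```python
def code_point_count(text):
    """Number of Unicode code points; a Python str is a sequence of code points."""
    return len(text)


def utf16_code_unit_count(text):
    """Number of 16-bit units the text occupies when encoded as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

# One code point, but a surrogate pair (two code units) in UTF-16.
print(code_point_count(clef), utf16_code_unit_count(clef))
```

Under code-point semantics, a single-character wildcard or one fuzzy edit corresponds to the whole symbol rather than to half of a surrogate pair.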
  • 30. NRT • per-segment • IndexWriter#commit now doesn't block concurrent indexing while flushing all 'currently' RAM resident documents to disk.
  • 31. More Lucene 4.0 features • Added RegexpQuery support to QueryParser. • Adds AutomatonQuery, a MultiTermQuery that matches terms against a finite-state machine. Implement WildcardQuery and FuzzyQuery with finite-state methods. Adds RegexpQuery. • The QueryParser now accepts mixed inclusive and exclusive bounds for range queries. Example: "{3 TO 5]"
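The mixed-bounds range syntax from the last bullet reads as "greater than 3 and at most 5": a curly brace marks an exclusive bound and a square bracket an inclusive one. The following sketch interprets that notation for plain numbers. It is a toy parser for illustration, not the Lucene QueryParser, and the function names are hypothetical.

```python
import re

# Opening bracket, lower bound, the keyword TO, upper bound, closing bracket.
_RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+?)\s*([\]}])$")


def parse_range(text):
    """Return (low, low_inclusive, high, high_inclusive) for a numeric range."""
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a range expression: {text!r}")
    open_bracket, low, high, close_bracket = match.groups()
    return float(low), open_bracket == "[", float(high), close_bracket == "]"


def in_range(text, value):
    """Check whether value falls inside the range described by text."""
    low, low_inclusive, high, high_inclusive = parse_range(text)
    above = value >= low if low_inclusive else value > low
    below = value <= high if high_inclusive else value < high
    return above and below


for candidate in (3, 4, 5):
    print(candidate, in_range("{3 TO 5]", candidate))
```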
  • 32. Solr 4.0 • Pivot faceting • Direct Solr spell checker • Increased response writing flexibility (e.g. function query results) • Distributed date/numeric range faceting • "join" query parser • NRT: You may now specify a 'soft' commit when committing. This will use Lucene's NRT feature to avoid guaranteeing documents are on stable storage in exchange for faster reopen times. There is also a new 'soft' autocommit tracker that can be configured.
  • 33. About Lucid... • Lucid Imagination provides commercial-grade support, training, high-level consulting and value-added software for Lucene and Solr. • We make Lucene ‘enterprise-ready’ by offering: • Free, certified, distributions and downloads. • Support, training, and consulting. • LucidWorks Enterprise, a commercial search platform built on top of Solr. • http://www.lucidimagination.com