Highly Relevant Search Result
Ranking for Law Enforcement

       Ronald Mayer, Forensic Logic, Inc
    ramayer@forensiclogic.com, 2011-05-26




      Police car photo by davidsonscott15 (Scott Davidson) on Flickr under (CC BY 2.0) license
What I Will Cover
 Highly Relevant Search Result Ranking for Large Law
  Enforcement Information Sharing Systems
 Who I am – Ron Mayer, CTO at Forensic Logic.
 The challenge / problem
  • Ranking law enforcement documents has interesting
    challenges.
 3 interesting challenges:
  • Many factors affect relevance for a law-enforcement user
  • A mix of structured, unstructured, semi-structured data
  • Improving edismax sub-phrase boosting
 Conclusion
  • Solr's flexibility & community are both great.


My Background
 Ron Mayer
 CTO of Forensic Logic, Inc
  • We power crime analysis and cross-agency search tools for the
    LEAP (law enforcement analysis portal) project.
  • About 150 State, Local, and Federal law enforcement agencies use
    our SAAS software to analyze and share data
 My background
  • 8 years of delivering software technologies to law enforcement as
    SAAS solutions.
  • Use some F/OSS, quite a bit of proprietary.
  • Play well with F/OSS projects
      (contributed back code to PostgreSQL, PostGIS, a memcached client, and earlier
       contributions from school that found their way into various projects)




The Challenge
 Problem I set out to solve
  • We had a good but complex database-based crime analysis package
    for investigators with good computer skills.
  • Needed an easy “google-like” interface that any officer could use.
 Considerations
  • Most officers don't want to sit at a desk filling out search
    forms.
  • Want something like Google – type a guess, and get the most
    relevant documents on the first page.
 Key hurdles we had to overcome
  • What factors even define “the most relevant” document.
  • Extremely Disparate data (some almost totally structured; some
    totally unstructured; most a mix)
  • How do we implement ranking.



Project background
 Started 8 years ago with a desktop Crime Analysis
  Application; ported to web application




 Big structured search forms worked well for crime
  analysts and detectives who can invest time at a desk
 Some users wanted quicker/easier simple search
Project background
 Prototyped with Project Blacklight
  • Wonderful F/OSS community
  • Just added to their facet list in a config file.
  • Constructive feedback from customers in a couple of weeks.
Project background
 Eventually rewrote with many law-enforcement-
  centric features.
Search Relevance for Law
   Enforcement Users
 Searches often contain multiple clauses
  • 'red baseball cap black leather jacket tall male
    suspect short asian victim'
  • These search clauses are often noun clauses with a
    few adjectives preceding a noun; but are often
    independent from each other.
 Fuzzy searches are common
  • Victims give incomplete descriptions
  • Suspects lie
  • Close counts.
Search Relevance for Law
              Enforcement Users
 Geospatial factors
  • Officers are often interested in things near their own city or beat
      Solr does this one well for 1 location of interest in a document:
           – bf=... recip(dist(2,primary_latlon,vector(#{lat},#{lon})),1,1,1)^0.5
      I haven't yet found a great solution for documents with many locations of interest (say,
       a document regarding a gang importing drugs from Ciudad Juárez, Mexico to Denver,
       which should be highly relevant to every city touching the southern half of I-25).
  • Often law enforcement officers want to search for documents near a
    certain type of landmark
        “near any elementary school in the school district”
        “near a particular school”
        “in a predominantly Hispanic neighborhood”
        “near a freeway”
  • Sometimes more convenient to interact with a map and use Solr's
    geospatial features. Sometimes more convenient to tag the
    documents with the relevant phrases.
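A minimal sketch of what the single-location boost above computes, assuming Solr's documented definition of recip(x,m,a,b) as a/(m*x+b) and a 2-norm distance taken directly on degrees. The helper names (geo_bf, boost_at) are hypothetical and only illustrate the arithmetic; they are not part of our application.

```python
import math


def recip(x, m, a, b):
    """Mirror of Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


def degree_distance(lat1, lon1, lat2, lon2):
    """Euclidean (2-norm) distance on raw lat/lon degrees, as dist(2, ...) would give."""
    return math.hypot(lat1 - lat2, lon1 - lon2)


def boost_at(doc_lat, doc_lon, user_lat, user_lon):
    """Unweighted boost value for one document location (hypothetical helper)."""
    return recip(degree_distance(doc_lat, doc_lon, user_lat, user_lon), 1, 1, 1)


def geo_bf(lat, lon, weight=0.5):
    """Build the bf clause from the slide for a user's location of interest."""
    return f"recip(dist(2,primary_latlon,vector({lat},{lon})),1,1,1)^{weight}"


if __name__ == "__main__":
    print(geo_bf(37.799, -122.161))
    # A document at the user's own location gets the full boost of 1.0;
    # one a full degree away gets half of that.
    print(boost_at(37.799, -122.161, 37.799, -122.161))
    print(boost_at(38.799, -122.161, 37.799, -122.161))
```

Because the distance is in degrees rather than miles, the falloff is only approximate; the constants would need tuning against the other boosts.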
Search Relevance for Law
           Enforcement Users
 Advanced geospatial searches
  • Not having a lot of luck with Solr/Lucene here yet
  • Often intersecting
    polygons.
       Just off I-5
      Walking distance
       from a Jr High
       School
  • We do it in a
    more complex
    app w/ Postgis.
      Would love to be
       able to click a school
       or road on a map,
       and use that to filter
       or sort Solr results
Search Relevance for Law
              Enforcement
 Temporal factors
  • Absolute time: Recent documents are often more interesting than
    very old documents.
      Solr handles this well with
         – Dismax's bf=”recip(ms(NOW,primary_date),3.16e-11,1,1)^2 ...”
         – Edismax's boost=recip(ms(NOW,primary_date),3.16e-11,1,1)&boost=
         – (unless you have expressions that can hit 0, edismax's multiplicative boosts seem easier to
           balance against other boosting factors)

  • Relative time: Gang retaliations often happen near each other in
    time.
      Can replace “NOW” in the above with some other date of interest.
  • Time of day: Certain robbers and burglars like to work at certain
    times of the day (payday after work; dusk; at Raiders games).
      Can handle as a range facet, and/or by tagging documents with phrases for text
       search
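To see why 3.16e-11 is a convenient constant: it is roughly 1 divided by the number of milliseconds in a year, so the boost halves for a one-year-old document. The sketch below works through that arithmetic, assuming Solr's definition recip(x,m,a,b) = a/(m*x+b). The helper date_boost_param is hypothetical and simply swaps NOW for another reference date, as in the relative-time bullet.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.156e10 milliseconds


def recip(x, m, a, b):
    """Mirror of Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


def age_boost(age_ms, m=3.16e-11):
    """Boost for a document that is age_ms milliseconds older than the reference date."""
    return recip(age_ms, m, 1, 1)


def date_boost_param(reference="NOW", field="primary_date"):
    """Build the boost expression; reference may be NOW or an ISO date such as
    2007-01-01T10:00:00Z (hypothetical helper, not our production code)."""
    return f"recip(ms({reference},{field}),3.16e-11,1,1)"


if __name__ == "__main__":
    for years in (0, 1, 5, 10):
        print(years, "years old ->", round(age_boost(years * MS_PER_YEAR), 3))
    print(date_boost_param("2007-01-01T10:00:00Z"))
```

One caveat with a reference date other than NOW: documents dated after the reference give a negative ms() value, so the expression would need wrapping (for example with Solr's abs()) before use.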
Search Relevance for Law
                 Enforcement
 Some parts of a document are more important than other parts
  • A search for “John Doe” should rank documents where he's the Arrestee (or subject, etc)
    over those where he's an innocent bystander (or witness or victim, etc).
  • Handled nicely by Solr's Dismax and edismax
    “qf=important_text^2 less_important_text”
    feature
 Important parts of a document can depend a lot on the content of a document itself.
  • For a sexual assault, characteristics of a victim like the victim's age and gender can be
    very "important", while the make/model of the victim's car will be unimportant. For a vehicle
    theft, the age and gender of the victim will be less important while the make/model of the
    car will be more important.
  • Handled reasonably by having logic in the indexer to place some data into different text
    fields; and by having the app server tweak the boosts in the qf= expression as needed
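One way the app server might adjust qf per search is sketched below. Everything specific here is an assumption for illustration: the field names victim_text and vehicle_text, the offense labels, and the boost values are hypothetical, and the sketch assumes the application already knows which kind of offense the search is about.

```python
# Baseline weights, taken from the qf example on the slide.
DEFAULT_WEIGHTS = {"important_text": 2.0, "less_important_text": 1.0}

# Hypothetical per-offense adjustments: which indexed text fields matter more.
OFFENSE_WEIGHTS = {
    "sexual_assault": {"victim_text": 2.0, "vehicle_text": 0.5},
    "vehicle_theft": {"victim_text": 0.5, "vehicle_text": 2.0},
}


def build_qf(offense_type=None):
    """Return a dismax/edismax qf string with boosts tuned to the offense type."""
    weights = dict(DEFAULT_WEIGHTS)
    weights.update(OFFENSE_WEIGHTS.get(offense_type, {}))
    return " ".join(f"{field}^{boost:g}" for field, boost in weights.items())


if __name__ == "__main__":
    print(build_qf("vehicle_theft"))
    print(build_qf())
```

Keeping the weights in a table rather than in code makes it easier to retune them as feedback from officers comes in.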
Search Relevance for Law
            Enforcement
 Some documents are more important than others.
  • An active warrant on a person is more important
    than an inactive one.
  • An unsolved homicide is more important than a
    complaint about noise that was decided to be
    unfounded.
  • A document with complete descriptions is more
    important (well, or at least more actionable) than a
    very incomplete form that was abandoned
 Handled with the dismax bf=sqrt(importance)
  parameter and similar edismax boost= parameters
Search Relevance for Law
              Enforcement
 Exact matches with text from the source document are weighted
  more than speculative guesses from our algorithms.
  • We tag documents with additional terms that weren't necessarily in
    the source document.
      Some of this is done by Solr
         – Stemming
         – Synonyms
      Some approximations and guesses are done by our indexers
         – 6'4” -> 'tall'
         – “lat = 37.799, lon = -122.161” -> “Near Skyline High School”
         – 8:00pm → 'dusk'( at certain times of the year); 'night' (at others)

  • But these additional tags carry less weight in ranking than the
    source document.
 Handled well by solr's
  • “qf=source_document^10 stemmed_text^1 speculative_guesses^0.1”
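A rough sketch of how an indexer could keep its guesses apart from the source text so that the qf weights above can treat them differently. The height cutoffs are assumptions chosen only for illustration, and build_solr_doc is a hypothetical helper; only the field names source_document and speculative_guesses come from the qf example.

```python
TALL_INCHES = 74   # assumed cutoff for tagging 'tall' (6'2")
SHORT_INCHES = 64  # assumed cutoff for tagging 'short' (5'4")


def height_tags(total_inches):
    """Turn a measured height into the fuzzy words officers tend to search for."""
    if total_inches >= TALL_INCHES:
        return ["tall"]
    if total_inches <= SHORT_INCHES:
        return ["short"]
    return []


def build_solr_doc(source_text, height_inches=None):
    """Keep original text and generated guesses in separate fields."""
    guesses = []
    if height_inches is not None:
        guesses.extend(height_tags(height_inches))
    return {
        "source_document": source_text,            # exact text, highest weight
        "speculative_guesses": " ".join(guesses),  # generated tags, lowest weight
    }


if __name__ == "__main__":
    print(build_solr_doc("Suspect described as 6'4\" male", height_inches=76))
```

Because the guesses live in their own field, an "exact match" search can simply leave that field out of qf.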
Search Relevance for Law
            Enforcement
 Keyword density matters
  • The Lucene SweetSpotSimilarity feature seems to
    give nicer results than the old default.
  • We're experimenting with our own that may work
    better with our mixed-structured-unstructured
    content.
Disparate data
Disparate data from many sources
 [Diagram: data from Law Enforcement, Courts, and Jails at the City, County, and Federal levels]
Mixed structured/semi-structured/un-structured data
Aren't there standards to deal
             with that?
 XML, etc?
Aren't there standards to deal
               with that?
 Of course! And the best part is there are many to
  choose from :)
 Many federal efforts
   • GJXDM (“Global Justice XML Data Model”) 1.0, 2.0, 3.0.3 (2005)
   • NIEM (outgrowth of GJXDM + DHS(FBI) + ODNI)
       NIEM 1.0 (2006) NIEM2.0 (2007) 2.1 (2009)
   • LEXS – extends subsets of NIEM
   • EDXL (DHS, EIC) “Emergency Data Exchange Language”
       Not really designed for law enforcement, but with data relevant
        to police, and less US-centric in person names and addresses.
 And many States define their own XML standards. (which are often
  Extensions to NIEM Subsets like the Texas Path to NIEM)
Aren't there standards to deal
              with that?
 But many of our data sources aren't that ready to adopt federal standards.
 Small cities whose record management system is a folder of Word documents.
 Old mainframe computers where every developer has retired.
 Even when agencies use standardized XML, the most interesting content's not in the
  structured part.
“The first suspect is described as a tall, heavyset, light
skinned black male, possibly half Italian, with 2 inch knots or
dreads in his hair with a light brown mustache. He was in
possession of a small caliber handgun.”
Aren't there standards to deal
              with that?
 But many of our data
  sources aren't that
  ready to adopt federal
  standards.
 And some never will.
Mix of structured/semi-structured/un-structured data
 Typical data we get
<?xml version="1.0" encoding="UTF-8"?>
<SomeXMLContainer>
 [... hundreds more lines...]
 <Incident>
   <nc:ActivityDate>
     <nc:DateTime>2007-01-01T10:00:00</nc:DateTime>
   </nc:ActivityDate>
 </Incident>
  [... hundreds more lines...]
 <tx:SubjectPerson s:id="Subject_id">
   <nc:PersonBirthDate>
     <nc:Date>1970-01-01</nc:Date>
   </nc:PersonBirthDate>
   <nc:PersonEthnicityCode>N</nc:PersonEthnicityCode>
   <nc:PersonEyeColorCode>BLU</nc:PersonEyeColorCode>
   <nc:PersonHeightMeasure>
     <nc:MeasurePointValue>604</nc:MeasurePointValue>
   </nc:PersonHeightMeasure>
   <nc:PersonName>
     <nc:PersonGivenName>Jonathan</nc:PersonGivenName>
     <nc:PersonMiddleName>William</nc:PersonMiddleName>
     <nc:PersonSurName>Doe</nc:PersonSurName>
     <nc:PersonNameSuffixText>III</nc:PersonNameSuffixText>
   </nc:PersonName>
   <nc:PersonPhysicalFeature>
     <nc:PhysicalFeatureDescriptionText>Green Dragon Tattoo</nc:PhysicalFeatureDescriptionText>
     <nc:PhysicalFeatureLocationText>Arm</nc:PhysicalFeatureLocationText>
   </nc:PersonPhysicalFeature>
   <nc:PersonRaceCode>W</nc:PersonRaceCode>
   <nc:PersonSexCode>M</nc:PersonSexCode>
   <nc:PersonSkinToneCode>RUD</nc:PersonSkinToneCode>
   <nc:PersonHairColorCode>RED</nc:PersonHairColorCode>
   <nc:PersonWeightMeasure>
     <nc:MeasurePointValue>150</nc:MeasurePointValue>
   </nc:PersonWeightMeasure>
   [... dozens more lines of xml about the person ...]
 </tx:SubjectPerson>
 [... hundreds more lines of xml...]
 <tx:Location s:id="Subjects_Home_id">
   <nc:LocationAddress>
     <nc:AddressFullText>1 Main St</nc:AddressFullText>
     <nc:StructuredAddress>
       <nc:LocationCityName>Dallas</nc:LocationCityName>
       <nc:LocationStateName>Texas</nc:LocationStateName>
       <nc:LocationCountryName>USA</nc:LocationCountryName>
       <nc:LocationPostalCode>54321</nc:LocationPostalCode>
   <...
 Typical searches from our users
  • 'tall red haired blue eyed teen male with dragon tattoo'
  • '”Johnnie Doe” dallas'
  • 'Burglar broke rear bedroom window, stole jewelry'
De-structuring structured data
 Typical data we get
<?xml version="1.0" encoding="UTF-8"?>
<SomeXMLContainer>
 [... hundreds more lines...]
 <Incident>
   <nc:ActivityDate>
     <nc:DateTime>2007-01-01T10:00:00</nc:DateTime>
   </nc:ActivityDate>
 </Incident>
  [... hundreds more lines...]
 <tx:SubjectPerson s:id="Subject_id">
   <nc:PersonBirthDate>
     <nc:Date>1990-01-01</nc:Date>
   </nc:PersonBirthDate>
   <nc:PersonEthnicityCode>N</nc:PersonEthnicityCode>
   <nc:PersonEyeColorCode>BLU</nc:PersonEyeColorCode>
   <nc:PersonHeightMeasure>
     <nc:MeasurePointValue>604</nc:MeasurePointValue>
   </nc:PersonHeightMeasure>
   <nc:PersonName>
     <nc:PersonGivenName>Jonathan</nc:PersonGivenName>
     <nc:PersonMiddleName>William</nc:PersonMiddleName>
     <nc:PersonSurName>Doe</nc:PersonSurName>
     <nc:PersonNameSuffixText>III</nc:PersonNameSuffixText>
   </nc:PersonName>
   <nc:PersonPhysicalFeature>
     <nc:PhysicalFeatureDescriptionText>Green Dragon Tattoo</nc:PhysicalFeatureDescriptionText>
     <nc:PhysicalFeatureLocationText>Arm</nc:PhysicalFeatureLocationText>
   </nc:PersonPhysicalFeature>
   <nc:PersonRaceCode>W</nc:PersonRaceCode>
   <nc:PersonSexCode>M</nc:PersonSexCode>
   <nc:PersonSkinToneCode>RUD</nc:PersonSkinToneCode>
   <nc:PersonHairColorCode>RED</nc:PersonHairColorCode>
   <nc:PersonWeightMeasure>
     <nc:MeasurePointValue>150</nc:MeasurePointValue>
   </nc:PersonWeightMeasure>
   [... dozens more lines of xml about the person ...]
 </tx:SubjectPerson>
 [... hundreds more lines of xml...]
 <tx:Location s:id="Subjects_Home_id">
   <nc:LocationAddress>
     <nc:AddressFullText>1 Main St</nc:AddressFullText>
     <nc:StructuredAddress>
       <nc:LocationCityName>Dallas</nc:LocationCityName>
       <nc:LocationStateName>Texas</nc:LocationStateName>
       <nc:LocationCountryName>USA</nc:LocationCountryName>
       <nc:LocationPostalCode>54321</nc:LocationPostalCode>
     </nc:StructuredAddress>
   </nc:LocationAddress>
 ...
 Typical searches done by users
  • 'tall blue eyed teen male with dragon tattoo'
  • '”Johnnie Doe” “red hair” dallas'
 One nice trick for solr:
  • Convert XML to English.
      “Jonathan Doe, a tall (6'4”) red haired blue eyed teen (17 year
       old) white male of Dallas TX was arrested at 1 Main St on Jan 1.
       Possible nicknames, johnny, william, bill, billy ...”
De-structuring structured data
 Typical searches done by users
  • 'tall blue eyed teen male with dragon tattoo'
  • '”Johnnie Doe” “red hair” Dallas'
 Solution:
  • Convert XML to English.
      “Jonathan Doe, a tall (6'4”) red haired blue eyed teen (17 year old)
       white male of Dallas TX was arrested at 1 Main St at 0456 Jan 1,
       1999 (1999-01-01 04:56.) Possible nicknames, johnny, william, bill,
       billy ...”
  • A little more subtle than that
      Terms generated by our speculative algorithms (possible nicknames,
       'tall', etc) are put in a separate lower-weighted text field that the users
       can exclude when doing “exact match” searches.
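A much-simplified sketch of the XML-to-English idea, using only Python's standard library. The namespace URI, the tiny code tables, and the function names are placeholders invented for this example; the real NIEM vocabularies are far larger, and this is not our production converter. The height decoding assumes the sample's "604" means 6 feet 4 inches, which matches the 6'4" in the English text above.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "urn:example:niem-core"}  # placeholder URI, not the real NIEM namespace

SAMPLE = """
<Person xmlns:nc="urn:example:niem-core">
  <nc:PersonEyeColorCode>BLU</nc:PersonEyeColorCode>
  <nc:PersonHeightMeasure>
    <nc:MeasurePointValue>604</nc:MeasurePointValue>
  </nc:PersonHeightMeasure>
  <nc:PersonName>
    <nc:PersonGivenName>Jonathan</nc:PersonGivenName>
    <nc:PersonSurName>Doe</nc:PersonSurName>
  </nc:PersonName>
  <nc:PersonSexCode>M</nc:PersonSexCode>
  <nc:PersonHairColorCode>RED</nc:PersonHairColorCode>
</Person>
""".strip()

# Tiny illustrative code tables (assumptions; real code lists are much longer).
HAIR = {"RED": "red haired", "BLK": "black haired"}
EYES = {"BLU": "blue eyed", "BRO": "brown eyed"}
SEX = {"M": "male", "F": "female"}


def field(elem, path):
    """Return stripped text at path, or None when the element is missing or empty."""
    found = elem.find(path, NS)
    return found.text.strip() if found is not None and found.text else None


def person_to_english(xml_string):
    """Render a few structured person fields as a searchable English phrase."""
    person = ET.fromstring(xml_string)
    name = " ".join(
        part
        for part in (
            field(person, "nc:PersonName/nc:PersonGivenName"),
            field(person, "nc:PersonName/nc:PersonSurName"),
        )
        if part
    )
    words = []
    height = field(person, "nc:PersonHeightMeasure/nc:MeasurePointValue")
    if height:
        feet, inches = divmod(int(height), 100)  # "604" -> 6 feet, 4 inches
        words.append(f"{feet}'{inches}\"")
    for path, table in (
        ("nc:PersonHairColorCode", HAIR),
        ("nc:PersonEyeColorCode", EYES),
        ("nc:PersonSexCode", SEX),
    ):
        code = field(person, path)
        if code in table:
            words.append(table[code])
    return f"{name}, a {' '.join(words)}"


if __name__ == "__main__":
    print(person_to_english(SAMPLE))
```

Unknown codes are skipped rather than guessed, so the generated sentence never claims more than the source record does.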
De-structuring structured data
 We've developed a pretty nice NIEM(*) to Human-
  friendly English Text tool that enables users
  uncomfortable with databases to search their
  agency's structured data much as they would
  google something.
 Side benefit – easier to fit one text field on a
  mobile phone than search forms with many dozen
  fields.


  * NIEM is a large government XML standard often used for law enforcement information exchange. Much of our data is sent to us in this
  format or closely related ones; and for other data sources we map it to NIEM as an early part of our import pipeline.
De-structuring structured data
 Another example – Vehicle VIN numbers
  • Translate
     “1N19G9J100001”
  • To
       “The VIN number suggests the vehicle is a 1979 4-door
    Chevrolet (Chevy) Caprice”
    in one of our speculative-content fields.
  • (but only if the document didn't already have this
    information)
De-structuring structured data
 Another example – GPS coordinates
  • Translate
       “37.799,-122.161”
  • To
        “Near Skyline High School”
    in one of our speculative-content fields.
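A small sketch of the coordinate-to-landmark tagging, assuming a lookup table of landmarks and a great-circle (haversine) distance. The Skyline High School coordinates are the ones from the slide; the second landmark, the 0.5 km radius, and the function names are made up for illustration.

```python
import math

# (name, latitude, longitude); the second entry is a hypothetical landmark.
LANDMARKS = [
    ("Skyline High School", 37.799, -122.161),
    ("Example Elementary School", 37.760, -122.200),
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def landmark_tags(lat, lon, max_km=0.5):
    """Phrases to add to a speculative-content field for landmarks within max_km."""
    return [
        f"Near {name}"
        for name, lm_lat, lm_lon in LANDMARKS
        if haversine_km(lat, lon, lm_lat, lm_lon) <= max_km
    ]


if __name__ == "__main__":
    print(landmark_tags(37.799, -122.161))
```

A linear scan is fine for a sketch; with many landmarks a spatial index (we use PostGIS elsewhere) would be the natural home for this lookup.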
De-structuring structured data
 And (coming soon)
  also translate
     “37.799,-122.161”
 To “Room number
  XXX in Building YYY at
  Skyline High”.
Improving phrase searches




 Dismax's “pf” (Phrase Fields) and “ps” (Phrase
  Slop) are very useful.
  • pf = 'the "pf" param can be used to "boost" the
    score of documents in cases where all of the
    terms in the "q" param appear in close proximity'
  • ps = 'Amount of slop on phrase queries built for
    "pf" fields (affects boosting)'




Improving phrase searches
 Dismax's “pf” (Phrase Fields) and “ps” (Phrase Slop)
  are very useful.
  • A high-boost “pf” with 0 “ps” is great for ensuring
    that our very most relevant documents show up on
    the very top in search results.
  • A modest-boost “pf” with a largeish “ps” (paragraph
    sized) is great for ensuring that quite relevant
    documents appear in the first page of results.
 Examples:
  • If an exact phrase matches, it's probably the
    document he's looking for.
  • If a single paragraph contains all the words of a user's
    search, it's probably relevant too.

Improving phrase searches
 Edismax's pf2 and pf3 are even more powerful.
  • A modest “pf2” with a relatively small “ps”
    (about noun-clause sized) is excellent for
    searching for adjective/noun clauses.
 Examples:
  • Document text: “The suspect was a tall thin teen
    male wearing a red baseball cap and black
    leather jacket”
  • Quite relevant for searches for “black jacket”,
    “tall male”, “leather jacket”, etc.
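To make the pf2 behaviour concrete, the sketch below splits a query into word pairs (the pairs pf2 turns into phrase queries) and checks whether a pair matches a document within a given slop. The matcher is a deliberate simplification: it only handles terms that appear in the same order as the query, where the slop needed equals the number of words in between. Lucene's real sloppy phrase matching also allows reordering at a higher slop cost. Function names are hypothetical.

```python
def word_shingles(query, size=2):
    """Consecutive word groups of the given size, e.g. the bigrams used by pf2."""
    words = query.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def matches_in_order(doc_words, first, second, slop):
    """True if `second` follows `first` with at most `slop` words in between."""
    first_positions = [i for i, w in enumerate(doc_words) if w == first]
    second_positions = [i for i, w in enumerate(doc_words) if w == second]
    return any(0 < j - i <= slop + 1 for i in first_positions for j in second_positions)


if __name__ == "__main__":
    doc = ("the suspect was a tall thin teen male wearing a red "
           "baseball cap and black leather jacket").split()
    print(word_shingles("red baseball cap black leather jacket"))
    print(matches_in_order(doc, "black", "jacket", slop=1))  # one word in between
    print(matches_in_order(doc, "tall", "male", slop=1))     # two words in between
    print(matches_in_order(doc, "tall", "male", slop=2))
```

This is why a slop of a few words, roughly the length of a noun clause, lets “tall male” match “tall thin teen male” without also rewarding words that merely share a paragraph.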



SOLR-2058 – best of both
 So with some experimentation, for our docs:
  • We want a high pf with a very small (0) ps
  • We want a low pf with large ps
  • We want a moderate pf2 with moderate ps
 Solution
  • SOLR-2058
  • ...&pf2=text~10^10&pf=text^100&pf=text~100
  • your constants may change depending how much
    you weigh other boosting factors like document
    age or distance
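A minimal sketch of assembling these parameters into a request query string, assuming a Solr build that includes the SOLR-2058 per-field slop syntax and using the field~slop^boost ordering that appears in our production settings. The field name text, the boost values, and the helper name are illustrative only.

```python
from urllib.parse import urlencode, parse_qsl


def phrase_boost_params(field="text"):
    """Three phrase boosts: exact phrase, loose whole-query phrase, nearby word pairs."""
    return [
        ("defType", "edismax"),
        ("pf", f"{field}^100"),      # high boost, default (tight) slop
        ("pf", f"{field}~100"),      # default boost, paragraph-sized slop
        ("pf2", f"{field}~10^10"),   # moderate boost, noun-clause-sized slop
    ]


def build_query_string(user_query, field="text"):
    """URL-encode the query plus phrase boosts; repeated pf keys are preserved."""
    return urlencode([("q", user_query)] + phrase_boost_params(field))


if __name__ == "__main__":
    qs = build_query_string("black leather jacket")
    print(qs)
    # Decoding shows that both pf entries survive the round trip.
    print(parse_qsl(qs))
```

Passing a list of pairs rather than a dict matters here, since a dict could hold only one pf value.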


SOLR-2058 – best of both
This worked pretty well for us when we first implemented:
         "pf"      => "source_doc~1^500 text_stem~1^100 source_doc~50^50 text_stem~20^50",
         "pf3"     => "text_unstem~1^250",
         "pf2"     => "text_stem^50 text_stem~10^10 text_unstem~10^10",
         "ps"      => 1,



Scary Parsed Query:
  [... many dozen lines... ]
DisjunctionMaxQuery((text_stem:"black leather"~1^50.0)~0.01)
DisjunctionMaxQuery((text_stem:"leather jacket"~1^50.0)~0.01)) (
DisjunctionMaxQuery((text_stem:"red basebal"~10^10.0)~0.01)
DisjunctionMaxQuery((text_stem:"basebal cap"~10^10.0)~0.01)
  [... many dozens more lines...]

But it's fast enough in the end:
       org.apache.solr.handler.component.QueryComponent:
               time: 658.0




Alternatives that may work even
                better
 This whole project started trying to boost adjectives
  connected to nouns
  • With document text like “Tall white heavyset male
    suspect with eyes that looked blue or gray and red hair
    wearing a black and yellow jacket a hat that looked
    purple and a green dragon tattoo on his right arm using
    a knife with an orange handle”.
  • And a search clause like 'white male, orange knife, black
    jacket' boosting this document appropriately.
 Had an interesting conversation with one of this
  conference's sponsors about looking at the grammar to
  see which color goes with which noun.


Wrap Up
 Law Enforcement has some pretty interesting
  challenges for finding the most relevant
  document.

 Solr's a very nice tool for companies to get
  started with text search and tuning it for domain
  specific needs; thanks to nice projects already
  using it, and a very helpful community.

 Solr's flexibility makes it easy to configure to
  even quite demanding requirements.
Thanks to the Community
 Extremely helpful community!
 Thanks to the many people in the Lucene community who helped!
  • Jayendra Patil-2
      Who experienced a similar issue and pointed me to exactly where in the code they applied a similar patch.
  • Yonik Seeley
      Proposed a good syntax for the parameters, and politely critiqued my really ugly first implementation.
  • Chris Hostetter
      Voiced support for the syntax and gave encouraging comments
  • Erik Hatcher
      For Blacklight which introduced us to solr and powered our initial prototypes.
  • Swapnonil Mukherjee, Nick Hall
      Expressed interest in and tried the patches. “Solr-2058 allows for a dramatic increase in search relevance” -
       Nick

  • Andy Jenkins and team at Ejustice
      Another Lucene user we're working with who's giving me great advice on how to further improve ranking
  • Lucid Imagination
      Thanks much for your free advice during early sales calls.
      Thanks even more for your free support on mailing lists, IRC, etc.




Sources
 Resource
  • http://leap.nctcog.org
 Links
  •   https://issues.apache.org/jira/browse/SOLR-2058
  •   https://github.com/ramayer/lucene-solr/tree/solr_2058_edismax_pf2_phrase_slop

 White paper




Contact
 Ron Mayer
  • ramayer@forensiclogic.com





More Related Content

What's hot

Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
OpenSource Connections
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Lucidworks
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
Trey Grainger
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
Trey Grainger
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
Trey Grainger
 

What's hot (16)

Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Semantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/SolrSemantic & Multilingual Strategies in Lucene/Solr
Semantic & Multilingual Strategies in Lucene/Solr
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered Search
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
Web search engines
Web search enginesWeb search engines
Web search engines
 
IR
IRIR
IR
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
 
Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)
 
Searching over the past, present and future
Searching over the past, present and futureSearching over the past, present and future
Searching over the past, present and future
 

Viewers also liked

Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Lucidworks (Archived)
 
Artist Update8 11
Artist Update8 11Artist Update8 11
Artist Update8 11
LaRue
 
2010 10-building-global-listening-platform-with-solr
2010 10-building-global-listening-platform-with-solr2010 10-building-global-listening-platform-with-solr
2010 10-building-global-listening-platform-with-solr
Lucidworks (Archived)
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lr
Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Lucidworks (Archived)
 
A haiti
A haitiA haiti
A haiti
tanica
 
ブラウザー勉強会始めました
ブラウザー勉強会始めましたブラウザー勉強会始めました
ブラウザー勉強会始めました
彰 村地
 
Building specialized industry applications using Solr, and migration from FAS...
Building specialized industry applications using Solr, and migration from FAS...Building specialized industry applications using Solr, and migration from FAS...
Building specialized industry applications using Solr, and migration from FAS...
Lucidworks (Archived)
 

Highly Relevant Search Result Ranking for Law Enforcement

  • 1. Highly Relevant Search Result Ranking for Law Enforcement Ronald Mayer, Forensic Logic, Inc ramayer@forensiclogic.com, 2011-05-26 Police car photo by davidsonscott15 (Scott Davidson) on Flickr under (CC BY 2.0) license
  • 2. What I Will Cover  Highly Relevant Search Result Ranking for Large Law Enforcement Information Sharing Systems  Who I am – Ron Mayer, CTO at Forensic Logic.  The challenge / problem • Ranking law enforcement documents has interesting challenges.  3 interesting challenges: • Many factors affect relevance for a law-enforcement user • A mix of structured, unstructured, semi-structured data • Improving edismax sub-phrase boosting  Conclusion • Solr's flexibility & community are both great.
  • 3. My Background  Ron Mayer  CTO of Forensic Logic, Inc • We power crime analysis and cross-agency search tools for the LEAP (law enforcement analysis portal) project. • About 150 State, Local, and Federal law enforcement agencies use our SAAS software to analyze and share data  My background • 8 years of delivering software technologies to law enforcement as SAAS solutions. • Use some F/OSS, quite a bit of proprietary. • Play well with F/OSS projects  (contributed back code to PostgreSQL, PostGIS, a memcached client, and earlier contributions from school that found their way into various projects)
  • 4. The Challenge  Problem I set out to solve • We had a good but complex database-based crime analysis package for investigators with good computer skills. • Needed an easy “google-like” interface that any officer could use.  Considerations • Most officers don't want to sit at a desk filling out search forms. • Want something like Google – type a guess, and get the most relevant documents on the first page.  Key hurdles to overcome • What factors even define “the most relevant” document? • Extremely disparate data (some almost totally structured; some totally unstructured; most a mix) • How do we implement ranking?
  • 6. Project background  Started 8 years ago with a desktop Crime Analysis Application; ported to web application  Big structured search forms worked well for crime analysts and detectives who can invest time at a desk  Some users wanted quicker/easier simple search
  • 7. Project background  Prototyped with Project Blacklight • Wonderful F/OSS community • Just added to their facet list in a config file. • Constructive feedback from customers within a couple of weeks.
  • 8. Project background  Eventually rewrote with many law-enforcement-centric features.
  • 9. Search Relevance for Law Enforcement Users
  • 10. Search Relevance for Law Enforcement Users  Searches often contain multiple clauses • 'red baseball cap black leather jacket tall male suspect short asian victim' • These search clauses are often noun clauses with a few adjectives preceding a noun; but are often independent from each other.  Fuzzy searches are common • Victims give incomplete descriptions • Suspects lie • Close counts.
  • 11. Search Relevance for Law Enforcement Users  Geospatial factors • Officers are often interested in things near their own city or beat  Solr does this one well for 1 location of interest in a document: – bf=... recip(dist(2,primary_latlon,vector(#{lat},#{lon})),1,1,1)^0.5  I haven't yet found a great solution for documents with many locations of interest (say, a document regarding a gang importing drugs from Ciudad Juárez, Mexico to Denver, which should be highly relevant to every city touching the southern half of I-25). • Often law enforcement officers want to search for documents near a certain type of landmark  “near any elementary school in the school district”  “near a particular school”  “in a predominantly Hispanic neighborhood”  “near a freeway” • Sometimes it is more convenient to interact with a map and use Solr's geospatial features. Sometimes it is more convenient to tag the documents with the relevant phrases.
  • 12. Search Relevance for Law Enforcement Users  Advanced geospatial searches • Not having a lot of luck with Solr/Lucene here yet • Often intersecting polygons.  Just off I-5  Walking distance from a Jr High School • We do it in a more complex app with PostGIS.  Would love to be able to click a school or road on a map, and use that to filter or sort Solr results
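To make the single-location boost on slide 11 concrete, here is a minimal Python sketch of what that function query evaluates to. It relies on Solr's documented definition `recip(x,m,a,b) = a/(m*x+b)` and on `dist(2, ...)` being the Euclidean (2-norm) distance. Treating the trailing `^0.5` as a plain multiplicative weight is a simplification, and the officer and document coordinates are hypothetical.

```python
import math


def recip(x, m, a, b):
    """Solr's recip(x, m, a, b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def euclidean_dist(p, q):
    """2-norm distance between two (lat, lon) pairs in degrees, like dist(2, ...)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def geo_boost(doc_latlon, user_latlon, weight=0.5):
    """Approximate value of recip(dist(2, doc, user), 1, 1, 1)^0.5."""
    return weight * recip(euclidean_dist(doc_latlon, user_latlon), 1, 1, 1)


if __name__ == "__main__":
    officer = (37.80, -122.27)  # hypothetical officer location
    # Hypothetical documents: same spot, a nearby city, a distant city.
    for doc in [(37.80, -122.27), (37.90, -122.27), (39.74, -104.99)]:
        print(doc, round(geo_boost(doc, officer), 3))
```

Because the distance is measured in raw degrees, the boost drops off quickly: a document one degree away gets half the boost of one at the officer's own location.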
  • 13. Search Relevance for Law Enforcement  Temporal factors • Absolute time: Recent documents are often more interesting than very old documents.  Solr handles this well with – Dismax's bf="recip(ms(NOW,primary_date),3.16e-11,1,1)^2 ..." – Edismax's boost=recip(ms(NOW,primary_date),3.16e-11,1,1)&boost= – (unless you have expressions that can hit 0, edismax's multiplicative boost seems easier to balance against other boosting factors) • Relative time: Gang retaliations often happen near each other in time.  Can replace “NOW” in the above with some other date of interest. • Time of day: Certain robbers and burglars like to work at certain times of the day (payday after work; dusk; at Raiders games).  Can handle as a range facet, and/or by tagging documents with phrases for text search
  • 14. Search Relevance for Law Enforcement  Some parts of a document are more important than other parts • A search for “John Doe” should rank documents where he's the Arrestee (or subject, etc) over those where he's an innocent bystander (or witness or victim, etc). • Handled nicely by Solr's Dismax and edismax “qf=important_text^2 less_important_text” feature  Which parts of a document are important can depend a lot on the content of the document itself. • For a sexual assault, characteristics of a victim like the victim's age and gender can be very "important", while the make/model of her car will be unimportant. For a vehicle theft, the age and gender of the victim will matter less while the make/model of the car will matter more. • Handled reasonably by having logic in the indexer to place some data into different text fields; and by having the app server tweak the boosts in the qf= expression as needed
  • 15. Search Relevance for Law Enforcement  Some documents are more important than others. • An active warrant on a person is more important than an inactive one. • An unsolved homicide is more important than a noise complaint that was determined to be unfounded. • A document with complete descriptions is more important (well, or at least more actionable) than a very incomplete form that was abandoned  Handled with the dismax: bf=sqrt(importance) parameter and similar edismax boost= parameters
  • 16. Search Relevance for Law Enforcement  Exact matches with text from the source document are weighted more heavily than speculative guesses from our algorithms. • We tag documents with additional terms that weren't necessarily in the source document.  Some of this is done by Solr – Stemming – Synonyms  Some approximations and guesses are done by our indexers – 6'4” -> 'tall' – “lat = 37.799, lon = -122.161” -> “Near Skyline High School” – 8:00pm -> 'dusk' (at certain times of the year); 'night' (at others) • But these additional tags carry less weight in ranking than the source document.  Handled well by Solr's • “qf=source_document^10 stemmed_text^1 speculative_guesses^0.1”
  • 17. Search Relevance for Law Enforcement  Keyword density matters • The Lucene SweetSpotSimilarity feature seems to give nicer results than the old default. • We're experimenting with our own that may work better with our mixed-structured-unstructured content.
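The constant `3.16e-11` on slide 13 is roughly one over the number of milliseconds in a year, so the recency boost behaves like `1 / (age_in_years + 1)`. The Python sketch below reproduces that arithmetic outside Solr; the dates are hypothetical and the function name `recency_boost` is only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Constant from the slide; approximately 1 / (milliseconds in one year).
PER_MS = 3.16e-11


def recency_boost(doc_date, reference):
    """Mirror recip(ms(reference, doc_date), 3.16e-11, 1, 1) = 1 / (m*x + 1)."""
    age_ms = (reference - doc_date).total_seconds() * 1000.0
    return 1.0 / (PER_MS * age_ms + 1.0)


if __name__ == "__main__":
    now = datetime(2011, 5, 26, tzinfo=timezone.utc)  # hypothetical "NOW"
    for days in (0, 30, 365, 3650):
        doc_date = now - timedelta(days=days)
        print(days, "days old ->", round(recency_boost(doc_date, now), 3))
```

Passing a different `reference` (for example the date of a gang shooting) covers the relative-time case for documents dated before it.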
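One way the app server might adjust the `qf=` boosts per search is sketched below in Python. Only `defType`, `q`, and `qf` are real (e)dismax parameters. The field names, the boost values, and the `crime_type` hint are all hypothetical and do not come from the talk.

```python
# Hypothetical text fields filled by the indexer, with default boosts.
BASE_BOOSTS = {
    "subject_text": 2.0,
    "victim_text": 1.0,
    "vehicle_text": 1.0,
    "narrative_text": 1.0,
}

# Hypothetical per-crime-type adjustments applied by the app server.
OVERRIDES = {
    "sexual_assault": {"victim_text": 3.0, "vehicle_text": 0.3},
    "vehicle_theft": {"victim_text": 0.3, "vehicle_text": 3.0},
}


def build_qf(crime_type=None):
    """Return a qf string such as 'subject_text^2 victim_text^1 ...'."""
    boosts = dict(BASE_BOOSTS)
    boosts.update(OVERRIDES.get(crime_type, {}))
    return " ".join(f"{field}^{boost:g}" for field, boost in boosts.items())


def build_params(user_query, crime_type=None):
    """Assemble request parameters for an edismax query."""
    return {"defType": "edismax", "q": user_query, "qf": build_qf(crime_type)}


if __name__ == "__main__":
    print(build_params("red honda civic", "vehicle_theft"))
```

Keeping the defaults and overrides in plain dictionaries means the boosts can be tuned without touching the indexer.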
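The difference between dismax's additive `bf` and edismax's multiplicative `boost` can be seen with a toy calculation. This Python sketch ignores query normalization and every other scoring detail, so it is only an illustration of the arithmetic, not of Solr's actual scores. The `importance` values are hypothetical.

```python
import math


def dismax_style_score(text_score, importance):
    """Additive: bf=sqrt(importance) is added to the text relevance score."""
    return text_score + math.sqrt(importance)


def edismax_style_score(text_score, importance):
    """Multiplicative: boost=sqrt(importance) scales the text relevance score."""
    return text_score * math.sqrt(importance)


if __name__ == "__main__":
    for importance in (0, 1, 16):  # hypothetical importance values
        print(importance,
              dismax_style_score(2.0, importance),
              edismax_style_score(2.0, importance))
```

An importance of 0 wipes out the multiplicative score entirely, which is the caveat raised on slide 13 about expressions that can hit 0.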
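A rough Python sketch of how an indexer could generate speculative tags and keep them apart from the source text follows. The field names `source_document` and `speculative_guesses` come from the `qf` on slide 16. The height thresholds, the one-hour dusk window, and the sunrise and sunset inputs are assumptions made for illustration.

```python
def height_tags(total_inches, tall_at=74, short_at=64):
    """Guess a descriptive tag from a height; the thresholds are hypothetical."""
    if total_inches >= tall_at:
        return ["tall"]
    if total_inches <= short_at:
        return ["short"]
    return []


def light_tag(event_hour, sunrise_hour, sunset_hour):
    """Tag a time of day relative to that date's sunrise and sunset (in hours)."""
    if abs(event_hour - sunset_hour) <= 1.0:
        return "dusk"
    if abs(event_hour - sunrise_hour) <= 1.0:
        return "dawn"
    if sunrise_hour < event_hour < sunset_hour:
        return "day"
    return "night"


def build_index_fields(narrative, height_inches, event_hour, sunrise_hour, sunset_hour):
    """Keep source text and guesses in separate fields so qf can weight them."""
    guesses = height_tags(height_inches)
    guesses.append(light_tag(event_hour, sunrise_hour, sunset_hour))
    return {"source_document": narrative, "speculative_guesses": " ".join(guesses)}


if __name__ == "__main__":
    # 6'4" suspect seen at 8:00pm: 'dusk' in summer, 'night' in winter.
    print(build_index_fields("Suspect fled on foot.", 76, 20.0, 6.0, 20.5))
    print(build_index_fields("Suspect fled on foot.", 76, 20.0, 7.0, 17.0))
```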
  • 19. Disparate data from many sources [Diagram labels: City, County, Law Enforcement]
  • 20. Mixed structured/semi-structured/un-structured data [Diagram labels: City, County, Courts, Law Enforcement]
  • 21. Mixed structured/semi-structured/un-structured data [Diagram labels: City, County, Federal, Jails, Courts, Law Enforcement]
  • 22. Aren't there standards to deal with that?  XML, etc?
  • 23. Aren't there standards to deal with that?  Of course! And the best part is there are many to choose from :)  Many federal efforts • GJXDM (“Global Justice XML Data Model”) 1.0, 2.0, 3.0.3 (2005) • NIEM (outgrowth of GJXDM + DHS(FBI) + ODNI)  NIEM 1.0 (2006), NIEM 2.0 (2007), 2.1 (2009) • LEXS – extends subsets of NIEM • EDXL (DHS, EIC) “Emergency Data Exchange Language”  Not really designed for law enforcement, but with data relevant to police, and less US-centric in person names and addresses.  And many States define their own XML standards (which are often extensions to NIEM subsets, like the Texas Path to NIEM).
  • 24. Aren't there standards to deal with that?  But many of our data sources aren't that ready to adopt federal standards.  Small cities whose record management system is a folder of Word documents.  Old mainframe computers where every developer has retired.  Even when agencies use standardized XML, the most interesting content's not in the structured part. “The first suspect is described as a tall, heavyset, light skinned black male, possibly half Italian, with 2 inch knots or dreads in his hair with a light brown mustache. He was in possession of a small caliber handgun.”
  • 25. Aren't there standards to deal with that?  But many of our data sources aren't that ready to adopt federal standards.  And some never will.
  • 26. Mix of structured/semi-structured/un-structured data
     Typical data we get
      <?xml version="1.0" encoding="UTF-8"?>
      <SomeXMLContainer>
      [... hundreds more lines...]
      <Incident>
      <nc:ActivityDate> <nc:DateTime>2007-01-01T10:00:00</nc:DateTime> </nc:ActivityDate>
      </Incident>
      [... hundreds more lines...]
      <tx:SubjectPerson s:id="Subject_id">
      <nc:PersonBirthDate> <nc:Date>1970-01-01</nc:Date> </nc:PersonBirthDate>
      <nc:PersonEthnicityCode>N</nc:PersonEthnicityCode>
      <nc:PersonEyeColorCode>BLU</nc:PersonEyeColorCode>
      <nc:PersonHeightMeasure> <nc:MeasurePointValue>604</nc:MeasurePointValue> </nc:PersonHeightMeasure>
      <nc:PersonName>
      <nc:PersonGivenName>Jonathan</nc:PersonGivenName>
      <nc:PersonMiddleName>William</nc:PersonMiddleName>
      <nc:PersonSurName>Doe</nc:PersonSurName>
      <nc:PersonNameSuffixText>III</nc:PersonNameSuffixText>
      </nc:PersonName>
      <nc:PersonPhysicalFeature>
      <nc:PhysicalFeatureDescriptionText>Green Dragon Tattoo</nc:PhysicalFeatureDescriptionText>
      <nc:PhysicalFeatureLocationText>Arm</nc:PhysicalFeatureLocationText>
      </nc:PersonPhysicalFeature>
      <nc:PersonRaceCode>W</nc:PersonRaceCode>
      <nc:PersonSexCode>M</nc:PersonSexCode>
      <nc:PersonSkinToneCode>RUD</nc:PersonSkinToneCode>
      <nc:PersonHairColorCode>RED</nc:PersonHairColorCode>
      <nc:PersonWeightMeasure> <nc:MeasurePointValue>150</nc:MeasurePointValue> </nc:PersonWeightMeasure>
      [... dozens more lines of xml about the person ...]
      </tx:SubjectPerson>
      [... hundreds more lines of xml...]
      <tx:Location s:id="Subjects_Home_id">
      <nc:LocationAddress>
      <nc:AddressFullText>1 Main St</nc:AddressFullText>
      <nc:StructuredAddress>
      <nc:LocationCityName>Dallas</nc:LocationCityName>
      <nc:LocationStateName>Texas</nc:LocationStateName>
      <nc:LocationCountryName>USA</nc:LocationCountryName>
      <nc:LocationPostalCode>54321</nc:LocationPostalCode>
      <...
     Typical searches from our users
      • 'tall red haired blue eyed teen male with dragon tattoo'
      • '”Johnnie Doe” dallas'
      • 'Burglar broke rear bedroom window, stole jewelry'
  • 27. De-structuring structured data  Typical data we get <?xml version="1.0" encoding="UTF-8"?> <SomeXMLContainer> [... hundreds more lines...] <Incident> <nc:ActivityDate> <nc:DateTime>2007-01-01T10:00:00</nc:DateTime> </nc:ActivityDate> </Incident> [... hundreds more lines...] <tx:SubjectPerson s:id="Subject_id"> <nc:PersonBirthDate> <nc:Date>1990-01-01</nc:Date> </nc:PersonBirthDate> <nc:PersonEthnicityCode>N</nc:PersonEthnicityCode> <nc:PersonEyeColorCode>BLU</nc:PersonEyeColorCode> <nc:PersonHeightMeasure> <nc:MeasurePointValue>604</nc:MeasurePointValue> </nc:PersonHeightMeasure> <nc:PersonName> <nc:PersonGivenName>Jonathan</nc:PersonGivenName> <nc:PersonMiddleName>William</nc:PersonMiddleName> <nc:PersonSurName>Doe</nc:PersonSurName> <nc:PersonNameSuffixText>III</nc:PersonNameSuffixText> </nc:PersonName> <nc:PersonPhysicalFeature> <nc:PhysicalFeatureDescriptionText>Green Dragon Tattoo</nc:PhysicalFeatureDescriptionText> <nc:PhysicalFeatureLocationText>Arm</nc:PhysicalFeatureLocationText> </nc:PersonPhysicalFeature> <nc:PersonRaceCode>W</nc:PersonRaceCode> <nc:PersonSexCode>M</nc:PersonSexCode> <nc:PersonSkinToneCode>RUD</nc:PersonSkinToneCode> <nc:PersonHairColorCode>RED</nc:PersonHairColorCode> <nc:PersonWeightMeasure> <nc:MeasurePointValue>150</nc:MeasurePointValue> </nc:PersonWeightMeasure> [... dozens more lines of xml about the person ...] </tx:SubjectPerson> [... hundreds more lines of xml...] <tx:Location s:id="Subjects_Home_id"> <nc:LocationAddress> <nc:AddressFullText>1 Main St</nc:AddressFullText> <nc:StructuredAddress> <nc:LocationCityName>Dallas</nc:LocationCityName> <nc:LocationStateName>Texas</nc:LocationStateName> <nc:LocationCountryName>USA</nc:LocationCountryName> <nc:LocationPostalCode>54321</nc:LocationPostalCode> </nc:StructuredAddress> </nc:LocationAddress> ...  Typical searches done by users • 'tall blue eyed teen male with dragon tattoo' • '“Johnnie Doe” “red hair” dallas'  One nice trick for Solr: • Convert XML to English.  “Jonathan Doe, a tall (6'4”) red haired blue eyed teen (17 year old) white male of Dallas TX was arrested at 1 Main St on Jan 1. Possible nicknames, johnny, william, bill, billy ...”
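The XML-to-English conversion on this slide can be sketched roughly as follows. This is a minimal illustration, not the tool described in the talk: it assumes the record has already been parsed into a plain dictionary, that a height code such as 604 means feet followed by two digits of inches (as the slide's 6'4” suggests), and the small code tables are hypothetical stand-ins for the full NIEM code lists. Speculative words such as 'tall' or 'teen' are deliberately left out of this sketch.

```python
from datetime import date


def height_text(code):
    """Turn a feet-and-inches code such as '604' into 6'4" (assumed encoding)."""
    feet, inches = int(code[0]), int(code[1:])
    return f"{feet}'{inches}\""


def age_on(birth, when):
    """Whole years between a birth date and an event date."""
    had_birthday = (when.month, when.day) >= (birth.month, birth.day)
    return when.year - birth.year - (0 if had_birthday else 1)


# Hypothetical code tables; real code lists are far larger.
HAIR = {"RED": "red haired"}
EYES = {"BLU": "blue eyed"}
RACE = {"W": "white"}
SEX = {"M": "male"}


def describe(person, event_date):
    """Render the coded fields of one person as a searchable English phrase."""
    age = age_on(person["birth_date"], event_date)
    parts = [
        f"{person['given_name']} {person['surname']}, a",
        f"({height_text(person['height'])})",
        HAIR[person["hair"]],
        EYES[person["eyes"]],
        f"({age} year old)",
        RACE[person["race"]],
        SEX[person["sex"]],
        f"of {person['city']}",
    ]
    return " ".join(parts)


# Values taken from the XML excerpt on the slide.
person = {
    "given_name": "Jonathan", "surname": "Doe", "height": "604",
    "hair": "RED", "eyes": "BLU", "race": "W", "sex": "M",
    "birth_date": date(1990, 1, 1), "city": "Dallas",
}
sentence = describe(person, date(2007, 1, 1))
```

Writing the decoded values as ordinary words lets a free-text query like 'red haired blue eyed male' match a record that only ever stored the codes RED, BLU and M.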
  • 28. De-structuring structured data  Typical searches done by users • 'tall blue eyed teen male with dragon tattoo' • '“Johnnie Doe” “red hair” Dallas'  Solution: • Convert XML to English.  “Jonathan Doe, a tall (6'4”) red haired blue eyed teen (17 year old) white male of Dallas TX was arrested at 1 Main St at 0456 Jan 1, 1999 (1999-01-01 04:56). Possible nicknames, johnny, william, bill, billy ...” • A little more subtle than that  Terms generated by our speculative algorithms (possible nicknames, 'tall', etc.) are put in a separate lower-weighted text field that users can exclude when doing “exact match” searches.
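One way to picture the separate lower-weighted field is the sketch below. The field names and boost values are hypothetical (the slides do not name the actual fields); the only Solr feature relied on is the dismax `qf` parameter, which takes a space-separated list of `field^boost` entries.

```python
# Hypothetical field names; the talk does not say what the real ones are.
FACT_FIELD = "text"
SPECULATIVE_FIELD = "text_speculative"


def build_doc(facts, guesses):
    """Keep recorded facts and algorithmic guesses in separate fields."""
    return {FACT_FIELD: facts, SPECULATIVE_FIELD: " ".join(guesses)}


def query_fields(exact_match):
    """Return a dismax-style qf value; boosts here are illustrative only."""
    if exact_match:
        # "Exact match" searches ignore the speculative field entirely.
        return f"{FACT_FIELD}^1.0"
    return f"{FACT_FIELD}^1.0 {SPECULATIVE_FIELD}^0.3"


doc = build_doc(
    "Jonathan Doe, a (6'4\") red haired blue eyed (17 year old) white male",
    ["tall", "teen", "johnny", "william", "bill", "billy"],
)
```

Because guessed terms never enter the main field, a wrong guess can lower precision only for users who have not asked for an exact match.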
  • 29. De-structuring structured data  We've developed a pretty nice NIEM(*) to human-friendly English text tool that lets users who are uncomfortable with databases search their agency's structured data much as they would google something.  Side benefit – it is easier to fit one text field on a mobile phone than search forms with many dozens of fields. * NIEM is a large government XML standard often used for law enforcement information exchange. Much of our data is sent to us in this format or closely related ones; for other data sources we map it to NIEM as an early part of our import pipeline.
  • 30. De-structuring structured data  Another example – Vehicle VIN numbers • Translate “1N19G9J100001” • To “The VIN number suggests the vehicle is a 1979 4-door Chevrolet (Chevy) Caprice” in one of our speculative-content fields. • (but only if the document didn't already have this information)
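The "only if the document didn't already have this information" rule might look something like this. The lookup table is a hypothetical stand-in for a real VIN decoder and contains only the example from the slide; the check for existing information (year and model both present in the text) is likewise an assumption made for illustration.

```python
# Hypothetical stand-in for a real VIN decoder; the one entry is the
# example given on the slide.
VIN_HINTS = {
    "1N19G9J100001": ("1979", "4-door", "Chevrolet (Chevy)", "Caprice"),
}


def speculative_vin_text(vin, document_text):
    """Return hint text for the speculative field, or '' if not needed."""
    hint = VIN_HINTS.get(vin)
    if hint is None:
        return ""
    year, body, make, model = hint
    lowered = document_text.lower()
    # Assumed rule: skip the hint when the report already names year and model.
    if year in lowered and model.lower() in lowered:
        return ""
    return f"The VIN number suggests the vehicle is a {year} {body} {make} {model}"
```

For a report that mentions only the VIN this returns the sentence from the slide; for a report that already says "1979 Caprice" it returns an empty string, so nothing redundant is indexed.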
  • 31. De-structuring structured data  Another example – GPS coordinates • Translate “37.799,-122.161” • To “Near Skyline High School” in one of our speculative-content fields.
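A minimal sketch of the coordinate-to-landmark translation, assuming a small gazetteer and a nearest-neighbour search by great-circle (haversine) distance. The gazetteer entries, their coordinates and the 0.5 km cutoff are placeholders; the slides do not describe how the lookup is actually done.

```python
import math

# Hypothetical gazetteer: (name, latitude, longitude). Coordinates are
# placeholders chosen so the slide's example resolves as shown.
LANDMARKS = [
    ("Skyline High School", 37.799, -122.161),
    ("Example City Hall", 37.805, -122.273),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def near_text(lat, lon, max_km=0.5):
    """Return 'Near <landmark>' for the closest landmark within max_km."""
    name, dist = min(
        ((n, haversine_km(lat, lon, la, lo)) for n, la, lo in LANDMARKS),
        key=lambda pair: pair[1],
    )
    return f"Near {name}" if dist <= max_km else ""
```

The returned phrase would go into a speculative-content field, so a search for 'skyline high' can surface reports that only recorded raw coordinates.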
  • 32. De-structuring structured data  And (coming soon) also translate “37.799,-122.161”  To “Room number XXX in Building YYY at Skyline High”.
  • 34. Improving phrase searches  Dismax's “pf” (Phrase Fields) and “ps” (Phrase Slop) are very useful. • pf = 'the "pf" param can be used to "boost" the score of documents in cases where all of the terms in the "q" param appear in close proximity' • ps = 'Amount of slop on phrase queries built for "pf" fields (affects boosting)'
  • 35. Improving phrase searches  Dismax's “pf” (Phrase Fields) and “ps” (Phrase Slop) are very useful. • A high-boost “pf” with a “ps” of 0 is great for ensuring that our most relevant documents show up at the very top of the search results. • A modest-boost “pf” with a fairly large (paragraph-sized) “ps” is great for ensuring that quite relevant documents appear on the first page of results.  Examples: • If an exact phrase matches, it's probably the document the user is looking for. • If a single paragraph contains all the words of a user's search, it's probably relevant too.
  • 36. Improving phrase searches  Edismax's pf2 and pf3 are even more powerful. • A modest “pf2” with a relatively small “ps” (about noun-phrase sized) is excellent for matching adjective/noun phrases.  Examples: • Document text: “The suspect was a tall thin teen male wearing a red baseball cap and black leather jacket” • Quite relevant for searches for “black jacket”, “tall male”, “leather jacket”, etc.
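To make the pf2 idea concrete: edismax builds a phrase query from each adjacent pair of query words (and from each triple for pf3), then boosts documents where a pair occurs within the allowed slop. The sketch below imitates that on plain whitespace-split text. It is a simplification: there is no stemming or analysis, and the proximity check handles only in-order two-word matches rather than reproducing Lucene's full slop computation.

```python
def shingles(query, n):
    """Adjacent n-word groups of the query, as pf2 (n=2) and pf3 (n=3) use."""
    terms = query.split()
    return [" ".join(terms[i:i + n]) for i in range(len(terms) - n + 1)]


def within_slop(text, phrase, slop):
    """True if the two words of `phrase` occur in order in `text` with at
    most `slop` other words between them (simplified, in-order only)."""
    words = text.lower().split()
    first, second = phrase.lower().split()
    for i, word in enumerate(words):
        if word != first:
            continue
        # Allow up to `slop` intervening words after position i.
        for j in range(i + 1, min(i + 2 + slop, len(words))):
            if words[j] == second:
                return True
    return False


doc = ("The suspect was a tall thin teen male wearing a red baseball cap "
       "and black leather jacket")
pairs = shingles("red baseball cap", 2)
```

With this document, 'black jacket' needs a slop of at least 1 and 'tall male' a slop of at least 2, which is why a small, noun-phrase-sized slop suits pf2.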
  • 37. SOLR-2058 – best of both  So with some experimentation, for our docs: • We want a high pf with a very small (0) ps • We want a low pf with large ps • We want a moderate pf2 with moderate ps  Solution • SOLR-2058 • ...&pf2=text^10~10&pf=text^100&pf=text~100 • Your constants may differ depending on how heavily you weight other boosting factors like document age or distance
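As a rough illustration of sending several phrase boosts with different slops in one request, the sketch below assembles a query string with repeated `pf` parameters. It assumes a Solr build that includes the per-field slop syntax from SOLR-2058 and uses the `field~slop^boost` ordering that appears in this deck's own configuration examples; the field name `text` and all the numbers are illustrative only.

```python
from urllib.parse import urlencode


def edismax_params(user_query):
    """Request parameters as (name, value) pairs so that pf can repeat."""
    return [
        ("defType", "edismax"),
        ("q", user_query),
        ("pf", "text~0^100"),   # exact phrase: high boost, no slop
        ("pf", "text~100^5"),   # whole query within ~a paragraph: low boost
        ("pf2", "text~10^10"),  # word pairs within ~a noun phrase: moderate boost
    ]


query_string = urlencode(edismax_params("black leather jacket"))
```

A list of tuples is used instead of a dictionary because a dictionary cannot hold the same `pf` key twice.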
  • 38. SOLR-2058 – best of both This worked pretty well for us when we first implemented it: "pf" => "source_doc~1^500 text_stem~1^100 source_doc~50^50 text_stem~20^50", "pf3" => "text_unstem~1^250", "pf2" => "text_stem^50 text_stem~10^10 text_unstem~10^10", "ps" => 1, Scary Parsed Query: [... many dozen lines... ] DisjunctionMaxQuery((text_stem:"black leather"~1^50.0)~0.01) DisjunctionMaxQuery((text_stem:"leather jacket"~1^50.0)~0.01)) ( DisjunctionMaxQuery((text_stem:"red basebal"~10^10.0)~0.01) DisjunctionMaxQuery((text_stem:"basebal cap"~10^10.0)~0.01) [... many dozens more lines...] But it's fast enough in the end: org.apache.solr.handler.component.QueryComponent: time: 658.0
  • 39. Alternatives that may work even better  This whole project started as an attempt to boost adjectives connected to nouns • With document text like “Tall white heavyset male suspect with eyes that looked blue or gray and red hair wearing a black and yellow jacket a hat that looked purple and a green dragon tattoo on his right arm using a knife with an orange handle”. • And a search clause like 'white male, orange knife, black jacket' boosting this document appropriately.  Had an interesting conversation with one of this conference's sponsors about looking at the grammar to see which color goes with which noun.
  • 40. Wrap Up  Law enforcement poses some pretty interesting challenges for finding the most relevant document.  Solr is a very nice tool for companies getting started with text search and tuning it for domain-specific needs, thanks to the nice projects already using it and a very helpful community.  Solr's flexibility makes it easy to configure for even quite demanding requirements.
  • 41. Thanks to the Community  Extremely helpful community!  Thanks to the many people in the Lucene community who helped! • Jayendra Patil-2  Who experienced a similar issue and pointed me to exactly where in the code they applied a similar patch. • Yonik Seeley  Proposed a good syntax for the parameters, and politely critiqued my really ugly first implementation. • Chris Hostetter  Voiced support for the syntax and gave encouraging comments. • Erik Hatcher  For Blacklight, which introduced us to Solr and powered our initial prototypes. • Swapnonil Mukherjee, Nick Hall  Expressed interest in and tried the patches. “SOLR-2058 allows for a dramatic increase in search relevance” - Nick • Andy Jenkins and team at Ejustice  Another Lucene user we're working with who's giving me great advice on how to further improve ranking. • Lucid Imagination  Thanks much for your free advice during early sales calls.  Thanks even more for your free support on mailing lists, IRC, etc.
  • 42. Sources  Resource • http://leap.nctcog.org  Links • https://issues.apache.org/jira/browse/SOLR-2058 • https://github.com/ramayer/lucene-solr/tree/solr_2058_edismax_pf2_phrase_slop  White paper
  • 43. Contact  Ron Mayer • ramayer@forensiclogic.com