2. Agenda
Challenges
Why a Platform?
Information Extraction
Need, Impact
Research / Evaluations
Approach / Implementation
Information Retrieval
Need, Impact
Research / Evaluations
Approach / Implementation
3. Challenges
Job Alerts
Over 13 Million searches, 3 times a week
Complex Matching: Multiple Filters, Boosts, Sorts
Resdex
130K active users daily
470K searches daily
Over 220 million resumes and growing.
Job Search
High QPS (112), 760K searches a day
Near Real-time Indexing
Jobs Refreshed 92 times daily
Product Demands
> 99.99% uptime, Stability, Scalability → User Experience
Varied Functional Requirements (Complexity)
NIRM, FN Suggestors, etc.
Turnaround Time
Over 17 applications and growing
About a week to deploy / configure a new one
4. Why a Platform?
Technical Challenges
Code / Bug Duplication, Reusability
Agility
Product Requirements Drive Platform-Wide Features
SOA, Integration, Business Logic Separation
Comprehensive Documentation
Scalability
Development and QA Time/Cost Reduction
Product Challenges
Turnaround Time
Business Logic Implementation = Configuration
Miscellaneous
Maintenance Cost Reduction
Resource Optimization/Integration (...Cloud)
Standardized Reporting / Health Monitoring
5. Information Extraction
Data/Information Acquisition
Structure Raw Information
Training based Models for Class Inference
Functional Area Detection
Rule based Extraction
Nested Funnels/Filter Layers
Regular Expressions
Feedback Loop
Wisdom of Crowd/Collective Intelligence
SAP/SimCV: Capture User Response for Recommendations
Continuous Quality Improvement
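The rule-based extraction bullets above (nested filter layers feeding regular expressions) can be sketched as follows. This is a minimal illustration, not the platform's actual rules: the patterns, field names, and the `extract_fields` function are all hypothetical.

```python
import re

# Hypothetical rules; the real platform's patterns are not given in the slides.
EXPERIENCE_LINE = re.compile(r"experience", re.IGNORECASE)
YEARS = re.compile(r"(\d+(?:\.\d+)?)\s*(?:\+\s*)?(?:years?|yrs?)", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_fields(raw_text):
    """Turn raw resume text into a small structured record."""
    record = {"email": None, "experience_years": None}

    email = EMAIL.search(raw_text)
    if email:
        record["email"] = email.group(0)

    # Layer 1 (outer filter): keep only lines that mention experience.
    candidates = [line for line in raw_text.splitlines() if EXPERIENCE_LINE.search(line)]

    # Layer 2 (inner filter): within those lines, look for a year count.
    for line in candidates:
        match = YEARS.search(line)
        if match:
            record["experience_years"] = float(match.group(1))
            break
    return record
```

Layering the filters keeps the inner pattern from firing on unrelated text, for example a line such as "Lived 10 years in Delhi" never reaches the year-count rule because it does not mention experience.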
16. IR: Use Cases/Impact
Error Count the week Before: 91, week After: 1
Availability (Before: 97.71% - 99.44%, After: 99.99%)
Performance
Slow Queries (> 10 secs): < 0.2%
Average Search Time: 0.55 secs
QA Quote
“There is an overall decrease in the page download time for
Resdex Search Results page. In case the cache is cleared the
page download time has decreased by 34% to 35%, while the
page download time has drastically decreased, more than 73%,
when checked without clearing cache.”
NSE on Resdex FirstNaukri
PM Quote
“Hardly any bugs considering the complexity of project. Search results are also
coming @ speed of thought.”
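To put the availability figures on this slide in concrete terms, the percentages can be converted into implied downtime. This is a small worked calculation, assuming availability is measured uniformly over a seven-day week; the function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(availability_percent, days=7):
    """Minutes of downtime implied by an availability percentage over `days` days."""
    return (100.0 - availability_percent) / 100.0 * days * MINUTES_PER_DAY


for label, availability in [("before (low)", 97.71), ("before (high)", 99.44), ("after", 99.99)]:
    print(f"{label}: {downtime_minutes(availability):.1f} min/week")
```

Under that assumption, the "before" range corresponds to roughly 231 and 56 minutes of downtime per week, and 99.99% corresponds to about 1 minute per week.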
19. IR: Platform Features
High Availability, Stability, Performance
Caching
Adaptive Caching of Hit Attributes
Caching of Expression Evaluations
Pre-configurable Caching Query Filters
Distributed Search
Search over Sharded Indexes
Auto Failover
Auto Healing
Search/Sort/Group Millions of results
Complex expressions.
Miscellaneous
Status Reports, Performance Analytics
Suggestive Garbage Collection
Preload Indexes into RAM
Ease of Deployment
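The "Search over Sharded Indexes" and "Auto Failover" bullets above can be sketched together. This is a simplified illustration under stated assumptions, not the platform's implementation: each replica is modelled as a hypothetical callable returning `(score, doc_id)` pairs, and `ReplicaDown` is an invented exception standing in for a failed node.

```python
import heapq


class ReplicaDown(Exception):
    """Hypothetical signal that a replica could not answer."""


def search_shard(replicas, query, k):
    """Try each replica of one shard in order; return the first successful hit list."""
    for replica in replicas:
        try:
            return replica(query, k)
        except ReplicaDown:
            continue  # failover: move on to the next replica of this shard
    return []  # every replica failed; degrade instead of failing the whole search


def distributed_search(shards, query, k):
    """Query every shard and merge the per-shard hits into a global top-k by score."""
    all_hits = []
    for replicas in shards:
        all_hits.extend(search_shard(replicas, query, k))
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])
```

Each shard only needs to return its own top `k` hits, because no document outside a shard's local top `k` can appear in the global top `k`. The sketch queries shards sequentially for clarity; a real system would presumably fan out in parallel.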
20. IR: Platform Features
Text Transformations
Tokenization/Transformation/Tagging
Controlled, Combinable Stemming
Plural, Tenses, Noun-Forms, etc. [Relevance]
Inversion of Stem-roots
Highlighting/Did You Mean/Query Expansion
Phonetic Token Mapping/Augmentation
Custom Word Mapping/Synonyms (iMatch)
Linguistic Tagging
PoS, Entity Extraction
Match/Boost on Tags
Sentence Detection
Apply different analytics to different fields
Context Sensitive Spelling Correction
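As a rough illustration of "Controlled, Combinable Stemming" and "Inversion of Stem-roots" from the list above, stemming rules can be written as small functions that are switched on per field and composed in order. The rules below are deliberately crude and hypothetical; they are not the platform's stemmer.

```python
from collections import defaultdict


def strip_plural(token):
    """Very rough plural rule: companies -> company, tests -> test."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def strip_tense(token):
    """Very rough tense rule: testing -> test, tested -> test."""
    for suffix in ("ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def make_stemmer(*rules):
    """Combine the chosen rules, applied in order, into one stemmer."""
    def stem(token):
        token = token.lower()
        for rule in rules:
            token = rule(token)
        return token
    return stem


def invert_stems(tokens, stem):
    """Map each stem root back to the surface forms that produced it."""
    inverted = defaultdict(set)
    for token in tokens:
        inverted[stem(token)].add(token.lower())
    return dict(inverted)
```

A field that should only fold plurals would use `make_stemmer(strip_plural)`, while a free-text field might use `make_stemmer(strip_plural, strip_tense)`. The inverted map is the kind of structure that highlighting or query expansion could draw on to recover the original word forms.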
21. IR: Platform Features
Indexing
Dynamic Rule Based Sharding, Distributed Search
Multiple Data Source Type Support
(Near-)Real Time Indexing, Search
Generic Auxiliary Index Format
Fast Update/Retrieval
Realtime Per-User Filtering/Sorting
Matching/Filtering
Lucene Query Functionality
Phrase, Proximity, Fuzzy, Wildcard
FirstNaukri Suggestor Implementation
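One way to picture "Dynamic Rule Based Sharding" from the indexing list above: an ordered list of rules decides which shard a document goes to, with a hash fallback when no rule matches. The rule format, field names, and shard names here are assumptions made for illustration only.

```python
import zlib


def route(doc, rules, fallback_shards):
    """Return the shard for `doc`: first matching rule wins, else a stable hash bucket."""
    for predicate, shard in rules:
        if predicate(doc):
            return shard
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(str(doc["id"]).encode("utf-8")) % len(fallback_shards)
    return fallback_shards[bucket]


# Hypothetical rules: recent jobs go to a small "fresh" shard, non-Indian jobs to another.
rules = [
    (lambda d: d.get("age_days", 0) <= 7, "fresh"),
    (lambda d: d.get("country") != "IN", "international"),
]
```

Because the rules are plain data passed in at call time, they can be swapped or reordered without touching the routing code, which is one plausible reading of "dynamic" here.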
22. IR: Platform Features
Result Grouping/Clustering
Expressions
Embedded JavaScript Support
Aggregate Functions (superset of SQL)
Sort/Group/Filter during indexing, search
Sorting
Dynamic/Stateful Sorting, e.g. for Ad Rotation
Quota-Based Resorting
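The slides do not define "Quota-Based Resorting", so the following is only one plausible interpretation: after the normal sort, each group (for example, an employer) may hold at most `quota` slots before its remaining results are pushed behind everyone else's. Function and parameter names are hypothetical.

```python
from collections import Counter


def quota_resort(results, group_of, quota):
    """Keep relevance order, but defer items once their group has used its quota."""
    seen = Counter()
    kept, deferred = [], []
    for item in results:
        group = group_of(item)
        if seen[group] < quota:
            seen[group] += 1
            kept.append(item)
        else:
            deferred.append(item)  # over quota: still returned, but after the rest
    return kept + deferred
```

For example, with results `[("j1", "acme"), ("j2", "acme"), ("j3", "acme"), ("j4", "globex")]` and a quota of 2 per employer, `j3` moves behind `j4` while the relative order of everything else is preserved.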
23. IR: Platform Features
Scoring
Fully Controlled, Customizable Relevance Scores
More controllable/testable than Solr/Default Lucene Scoring
Named Query Parts usable in Expressions
Custom Scorer Variables
Vector Space, Query Boost, LCS, Numwords
Configurability, API
SQL-like client wrapper
Engine-App interactions look like SQL
XML configurability
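To illustrate how custom scorer variables such as LCS and Numwords might be combined into a controllable relevance score, here is a small sketch. It assumes "LCS" means the longest common subsequence of words between query and document and "Numwords" means the number of distinct query words matched; the slides do not confirm either definition, and the weighting scheme is hypothetical.

```python
def lcs_length(query_words, doc_words):
    """Length of the longest common subsequence of two word lists (dynamic programming)."""
    prev = [0] * (len(doc_words) + 1)
    for q in query_words:
        curr = [0]
        for j, d in enumerate(doc_words, start=1):
            if q == d:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(query, document, weights):
    """Weighted sum of named scorer variables; unknown variables get weight 0."""
    q = query.lower().split()
    d = document.lower().split()
    variables = {
        "lcs": lcs_length(q, d),          # assumed: word-level longest common subsequence
        "numwords": len(set(q) & set(d)),  # assumed: distinct query words found in the document
    }
    return sum(weights.get(name, 0.0) * value for name, value in variables.items())
```

Keeping each variable as a named, separately computed value is what makes a score like this testable: a weight can be changed in configuration and the effect on ranking checked variable by variable.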
24. Road Ahead
If you don't know where you are going,
any road will get you there.
- The Cheshire Cat,
Alice in Wonderland.