Understand the Importance of Search Based Applications in today’s enterprise and how to integrate Business Intelligence and Search for business benefit.
Role of Microsoft FAST Search in an enterprise for building Search based Business
Intelligence Applications.
Demonstration of a FAST search based BI application.
CREATE SEARCH DRIVEN BUSINESS INTELLIGENCE APPLICATION USING FAST SEARCH FOR SHAREPOINT
1. CREATE SEARCH DRIVEN BUSINESS
INTELLIGENCE APPLICATION USING
FAST SEARCH FOR SHAREPOINT
Pankaj Bose
Niraj Tenany
2. AGENDA
• Session Overview
• Presenters Bio
• Introduction to Netwoven
• Industry Facts (Business Intelligence and Search)
• Business Intelligence Challenges
• Benefits of integrating Business Intelligence and Search
• Search Market
• FAST Search features and functions
• Steps To Build Search Centric Application
• Demo
• Wrap up and Next Steps
3. SESSION OVERVIEW
• Understand the Importance of Search Based
Applications in today’s enterprise and how to
integrate Business Intelligence and Search
for business benefit
• Role of Microsoft FAST Search in an
enterprise for building Search based Business
Intelligence Applications
• Demonstration of a FAST search based BI
application
4. NIRAJ TENANY – PRESIDENT AND
CEO, NETWOVEN, INC.
• Based in USA
• Formerly Microsoft Consulting Services Head of
Enterprise Applications Practice
• Frequent speaker in Enterprise Content
Management and Search events
• Works with Fortune 1000 companies to define
and implement ECM, BI, and Search strategies
5. PANKAJ BOSE – ECM AND SEARCH
PRACTICE HEAD, NETWOVEN, INC.
• Based in India
• Architect and implementer of large scale
Enterprise Content Management and Search
based applications for large and medium sized
companies
• Formerly, Architect at Lockheed Martin Corp in
USA as Technical Lead for ECM and Search
implementations
• Extensive experience with different ECM and
Search platforms
6. NETWOVEN BACKGROUND
Founded in 2001 by former Microsoft executives
Top talent from industry
Firm leadership comprised of Microsoft, Accenture, Oracle and Intel talent
Former senior executives of Wipro, Infosys and McKinsey on our board
US headquartered company with development center in India
Save the Children
7. NETWOVEN SERVICES
Industry Verticals: Life Sciences, Financial Services, Energy, Manufacturing, Not For Profit, Software
Netwoven Technology Services: Enterprise Content Management, Business Intelligence, Business Process Management
Netwoven Solution Practices
8. NETWOVEN SHAREPOINT SERVICES
Solution Area: Description
Out-Tasking Your SharePoint 2010: SharePoint managed services with L1, L2 and L3 support
Upgrading to SharePoint 2010: Upgrade intranet, extranet or internet sites
Social Networking with SharePoint 2010: Build communities with SharePoint
Document Management with SharePoint 2010: Develop or migrate document management systems to SharePoint
Business Intelligence with SharePoint 2010: Develop reports, dashboards and map based drill downs with SharePoint
Portal and Collaboration with SharePoint 2010: Develop intranet and collaboration sites using SharePoint
Web Content Management with SharePoint 2010: Develop intranet and extranet sites using SharePoint
Enterprise Search with SharePoint 2010: Develop Search based Applications using SharePoint 2010
9. BUSINESS INTELLIGENCE FACTS
• Every 2 days we generate more data than we did
from the dawn of time through 2003
• Worldwide volume of data is growing at 59% per
year
• Between 75% and 85% of data is unstructured
• In 5 years the majority of analytic data will come
from unstructured sources
- Gartner Blog
10. SEARCH FACTS
• Time spent searching for information
averages 8.8 hours per week for a cost
of $14,209 per worker per year
• Analyzing information soaks up 8.1
hours per week, costing an
organization $13,078 annually
- IDC
11. BUSINESS INTELLIGENCE CHALLENGES
• With data growing exponentially, businesses need
better tools to get information faster
• Complexity of integrating a large number of
disparate data sources
• Difficulty in integrating structured and
unstructured data
• End users spend a great deal of time trying to find
information, reinventing the wheel, and not
having the right information to make decisions
12. BENEFITS OF INTEGRATING BUSINESS
INTELLIGENCE AND SEARCH
• Reduces the time lost searching for information
• Simplifies integration of disparate data sources
• Improves integration of structured and
unstructured data, thereby providing better
insights
• Reduces the time lost reinventing the wheel
• Improves decision making by having the right
information available in a timely manner
13. BENEFITS OF INTEGRATING BUSINESS
INTELLIGENCE AND SEARCH
• Integration of search and other types
of applications creates a new category
of applications called Search Based
Applications
• Integration of BI and search is one
form of search based application
14. BENEFITS OF INTEGRATING BUSINESS
INTELLIGENCE USING SEARCH
• Easy to use interface that end users understand
• Enables the integration and search of any data source
• Search Across Multiple Sources
• Easily integrates structured and unstructured data
sources
• Indexes the sources in Real Time
• Provides assisted navigation to filter the search
results, thereby reducing the time it takes to find
information
• Ability to display results in highly visual and interactive
form
17. WHAT IS A SEARCH BASED
APPLICATION?
• Search-based applications (SBA) are software
applications in which a search engine platform is
used as the core infrastructure for information
access and reporting. SBAs use semantic
technologies to aggregate, normalize and classify
unstructured, semi-structured and/or structured
content across multiple repositories, and employ
natural language technologies for accessing the
aggregated information.
- Wikipedia
19. COMPONENTS OF FAST SEARCH
• Advanced content processing
  • Extraction of entities, properties, key phrases
  • Content classification
  • Sentiment analysis
• Connectors
  • Out of the box (from SharePoint interface)
  • Out of the box JDBC connectors
  • Content API to create custom connectors
• Query and Federation Object Model
  • FOM to search repositories by native search process
  • FOM to create core results XML and populate refiners
  • Query object model to execute complex queries using FAST Query Language
20. STEPS TO BUILD A SEARCH CENTRIC
APPLICATION - I
• Identify your content source (possibly a mix)
  • Structured (database fields with traditional field types)
  • Non-structured (database text fields, documents, web pages)
• Configure connectors to crawl content sources
  • Use filters to crawl only the specific type(s) of content you need
• Review generated crawled properties
  • Use SharePoint Central Admin UI or FAST PowerShell cmdlets
  • Use SPY processor stage to review contents of crawled properties
  • Add additional crawled properties if needed
21. STEPS TO BUILD A SEARCH CENTRIC
APPLICATION - II
• Review and update content processing pipeline
  • Extract entities
    • Persons / Locations / Companies / Key phrases / Any other
    custom entities
    • Use entity extraction framework of FAST For SharePoint,
    Service Pack 1
    • Use Out of The Box or custom dictionaries
  • Configure custom property extraction stage
    • Create / Update etc\config_data\DocumentProcessor\CustomPropertyExtractors.xml
• Create new crawled properties if needed
• Create managed properties and make them searchable and refinable
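The dictionary-driven property extraction described above can be pictured with a small Python sketch. This is only an illustration of the idea (match dictionary phrases in text and emit normalized values), not the FAST extraction stage itself; the dictionary contents and the function name `extract_properties` are hypothetical.

```python
import re

# Hypothetical dictionary: phrase found in the text -> normalized property value.
SENTIMENT_DICTIONARY = {
    "very clean": "Cleanliness: positive",
    "dirty": "Cleanliness: negative",
    "long wait": "Waiting time: negative",
    "friendly staff": "Staff: positive",
}


def extract_properties(text, dictionary):
    """Return normalized values for every dictionary phrase present in text.

    Phrases are matched case-insensitively on word boundaries, and each
    normalized value is reported once, in dictionary order.
    """
    found = []
    for phrase, value in dictionary.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE) and value not in found:
            found.append(value)
    return found


if __name__ == "__main__":
    comment = "Friendly staff, but the toilets were dirty and there was a long wait."
    for value in extract_properties(comment, SENTIMENT_DICTIONARY):
        print(value)
```

In the real pipeline the extracted values would land in a crawled property, which is then mapped to a managed property so that it can act as a refiner.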
22. STEPS TO BUILD A SEARCH CENTRIC
APPLICATION - III
• Review and update content processing pipeline
  • Extend pipeline with custom processing stages
    • Why? For example: classification, geo search, sentiments
    • Mechanism
      • Create an executable that takes some inputs and produces some outputs
      • The executable can be any command (exe, Java class, script, etc.)
      • Update etc\pipelineextensibility.xml to add a Run section that uses the command
      • Provide a set of crawled properties that act as input
      • Provide a set of crawled properties that get populated with the output
      • Reset the document processor service
        o psctrl reset
          » Feed a document
          » Map crawled and managed properties
          » Do a full crawl
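As a rough sketch of the mechanism above, a custom stage can be a script that receives an input file path and an output file path on the command line. The example assumes the input and output are XML documents with a `Document` root and `CrawledProperty` children carrying `propertySet`, `varType` and `propertyName` attributes, which is how the extensibility format is recalled here; verify it against the product documentation. The GUID, the property names, the word list and the `classify` rule are all hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical values: use the property set and names declared in your own configuration.
PROPERTY_SET = "11111111-2222-3333-4444-555555555555"
INPUT_PROPERTY = "comments"
OUTPUT_PROPERTY = "sentiment"
VAR_TYPE_STRING = "31"  # assumed variant type code for a string value

NEGATIVE_WORDS = {"dirty", "rude", "horrible", "cold"}


def classify(text):
    """Toy rule: any negative word makes the whole comment 'negative'."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "negative" if words & NEGATIVE_WORDS else "neutral-or-positive"


def process(input_path, output_path):
    """Read the input crawled property and write the derived output property."""
    root = ET.parse(input_path).getroot()
    comment = ""
    for prop in root.iter("CrawledProperty"):
        if prop.get("propertyName") == INPUT_PROPERTY:
            comment = prop.text or ""

    out_root = ET.Element("Document")
    out_prop = ET.SubElement(out_root, "CrawledProperty", {
        "propertySet": PROPERTY_SET,
        "varType": VAR_TYPE_STRING,
        "propertyName": OUTPUT_PROPERTY,
    })
    out_prop.text = classify(comment)
    ET.ElementTree(out_root).write(output_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Expected invocation (assumed): script.py <input file> <output file>
    if len(sys.argv) == 3:
        process(sys.argv[1], sys.argv[2])
```

Keeping the stage as a plain file-in, file-out command makes it easy to test outside the indexing pipeline before registering it.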
23. STEPS TO BUILD A SEARCH CENTRIC
APPLICATION - IV
• Develop Search Interface
• Refinement panel makes great dimensions
  • Refiners sorted by frequency: indicates importance of a refiner
  • Exact counts / percentage: helps in deep analysis of content
  • Applying a refiner filters the result set: leads to further granular analysis while exposing new dimensions
• Create visual refiners
  • Extend the Refinement Panel web part
  • Override the GetXPathNavigator method
  • Get the refinement XML from base.GetXPathNavigator
  • Use the XML as data source for Chart controls
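To show how refinement XML can feed a chart, here is a minimal Python sketch that turns one refiner into sorted data points with percentages. The XML shape (`FilterPanel`, `FilterCategory`, `Filter`, `Value`, `Count`) is a simplified assumption modelled on the refinement panel output, and the sample values and the name `chart_series` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed refinement XML with invented sample values.
SAMPLE_REFINEMENT_XML = """
<FilterPanel>
  <FilterCategory DisplayName="Overall Experience" ManagedProperty="overallexperience">
    <Filters>
      <Filter><Value>Good</Value><Count>120</Count></Filter>
      <Filter><Value>Excellent</Value><Count>60</Count></Filter>
      <Filter><Value>Horrible</Value><Count>20</Count></Filter>
    </Filters>
  </FilterCategory>
</FilterPanel>
"""


def chart_series(refinement_xml, display_name):
    """Return (value, count, percentage) tuples for one refiner, most frequent first."""
    root = ET.fromstring(refinement_xml.strip())
    for category in root.iter("FilterCategory"):
        if category.get("DisplayName") != display_name:
            continue
        points = [
            (f.findtext("Value"), int(f.findtext("Count")))
            for f in category.iter("Filter")
        ]
        total = sum(count for _, count in points)
        if total == 0:
            return []
        points.sort(key=lambda p: p[1], reverse=True)
        return [(value, count, round(100.0 * count / total, 1)) for value, count in points]
    return []


if __name__ == "__main__":
    for value, count, pct in chart_series(SAMPLE_REFINEMENT_XML, "Overall Experience"):
        print(value, count, pct)
```

The resulting tuples map directly onto the labels and values of a bar or pie chart series.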
24. STEPS TO BUILD A SEARCH CENTRIC
APPLICATION - V
• Customize Search Result Web parts
  • Extend SearchCoreResults web part
    • Add additional sources
    • Override the CreateDataSource and ConfigureDataSourceProperties methods to create / configure the data source
    • Override GetXPathNavigator for mixing of results from data sources
  • Change XSLT to display specific metadata
  • Roll-up numbers by result collapsing
  • Display previews
• Aggregate Search Results from Federation
  • Create a new LocationRunTime class inheriting from ILocationRuntime and
  IRefinable
  • Execute queries in native format
  • Create Core Results XML
  • Fill up the refiner
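The roll-up by result collapsing mentioned above can be sketched in a few lines of Python: hits that share a collapse key are folded into one entry that keeps the best-ranked hit plus a count. This is a conceptual illustration over plain dictionaries, not the web part code; the field names and the function `collapse_results` are hypothetical.

```python
def collapse_results(results, key):
    """Collapse a ranked result list on `key`.

    The first hit seen for each key value (the best-ranked one, assuming the
    input is already sorted by rank) is kept, together with the number of
    hits that were folded into it. Group order follows first appearance.
    """
    groups = {}
    for hit in results:
        group_key = hit[key]
        if group_key not in groups:
            groups[group_key] = {"top_hit": hit, "count": 0}
        groups[group_key]["count"] += 1
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical hits, already ordered by relevance.
    hits = [
        {"title": "Survey 101", "location": "Austin"},
        {"title": "Survey 207", "location": "Dallas"},
        {"title": "Survey 318", "location": "Austin"},
    ]
    for group in collapse_results(hits, "location"):
        print(group["top_hit"]["location"], group["count"])
```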
25. USE CASE
Overview of the scenario
A US based hospital chain conducts patient surveys for all of its locations to
• Improve patient loyalty
• Increase referrals
• Evaluate healthcare provider performance
• Identify areas of improvement
It surveys all in-house patients at the time of their discharge. The survey
responses are stored in a database. The hospital typically uses SQL Server SSAS and SSRS to
produce BI dashboards and reports. While this works to a great extent, there are some
shortfalls:
• The reports only consider the specific answers to objective questions like “How did you like the meal?”,
with the options being Excellent, Good, Not so good, Horrible. However, survey respondents can express their
true sentiments in one or more sentences. As traditional BI cannot make use of non-structured content,
these are left out.
• BI reports precisely tell us WHAT. However, they often stop short of telling us WHY.
• The BI reports have no provision for answering flexible user questions, such as a question about the
cleanliness of hospital toilets.
• Important attributes / entities hidden within the comments text are ignored, although they could be crucial
business dimensions.
Hospital management decided to deploy search to extract the information discussed above while retaining BI
capabilities.
27. HOW WE DID IT
• Survey data is available in a database
  • Comments is a text field that is used for key phrase extraction
  • Other fields used are of regular data types: string, integer, etc.
• An external application was used for key phrase extraction and normalization
(FAST ESP does have a key phrase extraction processor, but FS4SP does not have that yet)
• A dictionary was created from the key phrases. The dictionary is used in a custom
property extraction processor
• The processor fills in crawled properties of sentiments during indexing
• Database indexing is done using the JDBC connector (BDC also works)
• Generated crawled properties are mapped to managed properties that need to
be searched or used in refiners, such as Overall Experience, Speciality, No. of
days in hospital, etc.
28. HOW WE DID IT - II
• Using Federation Object Model
• Visual refiners are created using the existing RefinementManager object in the
search page. This can also be done by extending the RefinementPanel web part.
• RefinementManager provides the refiner XML
• The MS Chart control uses the refiner XML
• Selected refiners are used to construct the breadcrumb
• KeywordQuery objects are also used to collect data points for multiple
timeframes
• SearchCoreResults web part XSLT has been updated to display patient comments
• Sentiments are extracted key phrases represented as refiners
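Collecting data points for multiple timeframes amounts to issuing one query per time window. The Python sketch below generates quarterly windows and a query string for each. The FQL shown (`and`, `string(..., mode="and")`, `range(...)` with `from`/`to`, and `datetime(...)`) follows the operator syntax as recalled here and should be checked against the FQL reference; the managed property name `dischargedate` and both helper functions are hypothetical.

```python
from datetime import date


def quarter_windows(year):
    """Return four (start, end) date pairs; each end is the next quarter's start."""
    starts = [date(year, month, 1) for month in (1, 4, 7, 10)] + [date(year + 1, 1, 1)]
    return list(zip(starts[:-1], starts[1:]))


def fql_for_window(terms, date_property, start, end):
    """Build an FQL-style query: all terms, restricted to start <= date < end.

    Assumes `terms` contains no double quotes; escaping is left out of this sketch.
    """
    return (
        'and(string("{terms}", mode="and"), '
        '{prop}:range(datetime("{start}T00:00:00"), datetime("{end}T00:00:00"), '
        'from="GE", to="LT"))'
    ).format(
        terms=terms,
        prop=date_property,
        start=start.isoformat(),
        end=end.isoformat(),
    )


if __name__ == "__main__":
    for start, end in quarter_windows(2011):
        print(fql_for_window("cleanliness toilets", "dischargedate", start, end))
```

Running each generated query and recording its hit count gives one data point per quarter, which can then be plotted as a trend line.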
29. COMMON SEARCH APPLICATION
CATEGORIES
• Extended search platforms
• Search engines
• Question-answering applications
• Categorization/metadata tagging tools
• Categorizers and clustering engines
• Visualization tools for information navigation and
analysis
• Filtering and alerting tools and text analytics
• Translation and globalization software
30. CONTACTS
• Niraj Tenany
• President and CEO, Netwoven, Inc.
• ntenany@netwoven.com
• Pankaj Bose
• ECM Practice Head, Netwoven, Inc.
• pbose@netwoven.com
• Rashi Bajaj
• Business Development Manager
• rbajaj@netwoven.com