Improve Performance in Fast Search for SharePoint - Comperio
1. OSLO STOCKHOLM LONDON BOSTON
Improve Performance in FAST Search for SharePoint 2010 (FS4SP)
Enda Flynn – Marketing Manager
Job Maelane – Senior Consultant
2. Financials
Financially stable and secure
Profitable growth when expanding
AAA Rating (D&B)
Employees
50+ specialists and growing
Ability to attract highly qualified employees
Customers
90+
3 continents
Search Projects
200+ FAST projects completed
Comperio | Search Matters.
7. Hardware – Best Practices
CPU: 2 x 2GHz+ (Quad/six core)
Memory: 24-48 GB
Disk:
2 x 300 GB, SAS, 10K RPM (RAID 1)
CPU: 2 x 2GHz+ (Quad/six core)
Memory: 24-48 GB
Disk alternatives:
1.0 TB: 8 x 300 GB, SAS, 10K RPM (RAID10)
1.8 TB: 8 x 300 GB, SAS, 10K RPM (RAID 5)
3.6 TB: 16 x 300 GB, SAS, 10K RPM (RAID 5+0)
New: 7.2 TB: 16 x 600 GB, SAS, 10K RPM (RAID 5+0)
SAN: Configured for “database performance”
Storage Server
Admin / Processing Server
8. How is search performance measured?
• Index latency
– How long, on average, a document takes to index
• Documents per second
– How many documents are processed and indexed per second
• Query latency
– How quickly results are generated from a query
• Queries per second (QPS)
– How many queries are processed per second
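The four measures on this slide can be derived from simple timing samples. The sketch below is only an illustration: the function names, the input lists of per-item timings, and the observation window are assumptions, not part of FS4SP or its reports.

```python
from statistics import mean


def indexing_metrics(index_times_s, window_s):
    """Summarise indexing performance.

    index_times_s: hypothetical per-document times (seconds) from
                   submission until the document is searchable.
    window_s:      length of the observation window in seconds.
    """
    return {
        "index_latency_s": mean(index_times_s),            # average time to index
        "docs_per_second": len(index_times_s) / window_s,  # throughput
    }


def query_metrics(query_times_s, window_s):
    """Summarise query performance from hypothetical per-query response times."""
    return {
        "query_latency_s": mean(query_times_s),   # average time to return results
        "qps": len(query_times_s) / window_s,     # queries per second
    }
```

Latency and throughput are tracked separately because a system can show high throughput while individual items still take a long time to complete.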
9. FS4SP – Search Administration Reports
• Search Administration Reports
– Crawl Rate Per Content Source
– Crawl Rate Per Type
– Crawl Processing Per Activity
– Crawl Processing Per Component
– Crawl Queue
– Query Latency
– and more
• Web Analytics Reports
– Total Number of Search Queries
– Top queries
– Failed queries
– Best Bet usage
– Keywords usage
– and more
• Analyze Ribbon
– More date range options
– Filtering search scope
– Search query text
– Export to Spreadsheet etc.
• Web Analytics Web Part
– Display popular items on a site (such as popular content, popular search queries, or search results)
10. What influences performance?
Lots of things!
• Number of documents
• Content types
• Deep or shallow refiners
• Entity extraction
• Complex queries (many terms)
• Substring search
• Lemmatisation
• Spell check
• Maximum and average document size
• And many more…
11. Resource Consumption
Resources: CPU, RAM, Disk Trans, Disk Space, Network B/W
Components:
• Content Distributor
• Document Processor
• Index Dispatcher
• Indexer
• Search Engine
• QR Server
• Admin Services
• Web Link Analysis
12. FAST Search for SharePoint Scale out
Back-end with extreme and flexible scale out options
Scale-out multiple “dimensions”
• Query Volume
• Content Volume
• Indexing freshness
• Redundancy options
Performance targets*
• 15M Docs/node
• 25 QPS/node
*Depends on content and hardware specifics
Diagram: Content Volume and Query Volume axes; Crawling and Content Processing, Search and Indexing, Query and Result Processing
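The performance targets above can be turned into a rough first-pass node count. This is a sketch under stated assumptions: it treats content volume as scaling by index columns and query volume as scaling by search rows, takes the slide's 15M docs/node and 25 QPS/node at face value, and uses a hypothetical function name. As the slide notes, real capacity depends on content and hardware specifics.

```python
import math

# Targets taken from the slide; they depend on content and hardware specifics.
DOCS_PER_NODE = 15_000_000
QPS_PER_NODE = 25


def estimate_nodes(total_docs, peak_qps):
    """Rough sizing sketch (assumption: one node per column/row cell)."""
    # Content volume: split the index across enough columns.
    columns = max(1, math.ceil(total_docs / DOCS_PER_NODE))
    # Query volume: replicate the columns in enough rows.
    rows = max(1, math.ceil(peak_qps / QPS_PER_NODE))
    return {"index_columns": columns, "search_rows": rows, "nodes": columns * rows}


if __name__ == "__main__":
    print(estimate_nodes(40_000_000, 40))
```

For 40 million documents and a peak of 40 QPS this gives 3 columns and 2 rows, or 6 nodes. Indexing freshness and redundancy requirements are not modelled here and may call for additional rows.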
13. FS4SP – Medium Deployment
FAST Search for SharePoint 2010 Farm
• FAST-ADM-1: Admin, Content Distributor 1, Web Analyzer, 12 Docprocs+
• FAST-ADM-2: Content Distributor 2, Web Analyzer, 12 Docprocs+, (Enterprise Crawler)
• FAST-FSTIDX-11: Index (Search), 12 Docprocs+
• FAST-FSTIDX-12: Index (Search), 12 Docprocs+
• FAST-FSTIDX-13: Index (Search), 12 Docprocs+
• FAST-FSTIDX-21: (Index) Search, QR Server
• FAST-FSTIDX-22: (Index) Search, QR Server
• FAST-FSTIDX-23: (Index) Search, QR Server
SP2010 Farm
• WFE, Query SSA (x2)
• SP Crawl, People Crawl (x2)
SQL 2008 Cluster
• Crawl DB
• Search Admin DB
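The medium deployment above can be written down as data and summarised. The sketch below assumes that host names follow the pattern FAST-FSTIDX-RC, with R as the search row and C as the index column; that reading is an inference from the host names and roles on the slide, not something the slide states. The dictionary and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hosts and roles as listed on the medium deployment slide.
MEDIUM_DEPLOYMENT = {
    "FAST-ADM-1": ["Admin", "Content Distributor 1", "Web Analyzer", "12 Docprocs+"],
    "FAST-ADM-2": ["Content Distributor 2", "Web Analyzer", "12 Docprocs+",
                   "(Enterprise Crawler)"],
    "FAST-FSTIDX-11": ["Index (Search)", "12 Docprocs+"],
    "FAST-FSTIDX-12": ["Index (Search)", "12 Docprocs+"],
    "FAST-FSTIDX-13": ["Index (Search)", "12 Docprocs+"],
    "FAST-FSTIDX-21": ["(Index) Search", "QR Server"],
    "FAST-FSTIDX-22": ["(Index) Search", "QR Server"],
    "FAST-FSTIDX-23": ["(Index) Search", "QR Server"],
}


def summarize(deployment):
    """Count rows, columns and the minimum number of document processors."""
    rows = defaultdict(set)
    docprocs = 0
    for host, roles in deployment.items():
        # Assumed naming convention: first digit = row, second digit = column.
        match = re.fullmatch(r"FAST-FSTIDX-(\d)(\d)", host)
        if match:
            rows[int(match.group(1))].add(int(match.group(2)))
        for role in roles:
            dp = re.fullmatch(r"(\d+) Docprocs\+", role)
            if dp:
                docprocs += int(dp.group(1))  # "12 Docprocs+" means at least 12
    columns = {c for cols in rows.values() for c in cols}
    return {"search_rows": len(rows), "index_columns": len(columns),
            "min_docprocs": docprocs}


if __name__ == "__main__":
    print(summarize(MEDIUM_DEPLOYMENT))
```

Under that naming assumption the medium deployment comes out as 3 index columns by 2 search rows, with at least 60 document processors spread over five hosts.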
14. FS4SP – Large Deployment
FAST Search for SharePoint 2010 Farm
• FAST-ADM-1: Admin, ConfigServer, Spelltuner, SamAdmin, Content Distributor 1, Web Analyzer, 12 Docprocs+
• FAST-ADM-2: Content Distributor 2, Web Analyzer, 12 Docprocs+
• FAST-ADM-3: Web Analyzer, 12 Docprocs+
• FAST-FSTIDX-11: Index (Search), 12 Docprocs+
• FAST-FSTIDX-12: Index (Search), 12 Docprocs+
• FAST-FSTIDX-13: Index (Search), 12 Docprocs+
• FAST-FSTIDX-14: Index (Search), 12 Docprocs+
• FAST-FSTIDX-15: Index (Search), 12 Docprocs+
• FAST-FSTIDX-16: Index (Search), 12 Docprocs+
• FAST-FSTIDX-21: (Index) Search, QR Server
• FAST-FSTIDX-22: (Index) Search, QR Server
• FAST-FSTIDX-23: (Index) Search, QR Server
• FAST-FSTIDX-24: (Index) Search, QR Server
• FAST-FSTIDX-25: (Index) Search, QR Server
• FAST-FSTIDX-26: (Index) Search, QR Server
SP2010 Farm
• WFE, Query SSA (x2)
• SP Crawl, People Crawl (x2)
• SP Crawl
SQL 2008 Cluster
• Crawl DB
• Search Admin DB
16. FS4SP – Disk Calculation Matrix
Disclaimer: This table is based on early testing and results from an internal dogfood project. The numbers might not be representative for the customer environment and data. Please use caution when using these numbers for sizing.

Max item count (in millions) | Adm       | Web Analyzer | Crawl DB Server | Indexer
1                            | 1 x 72 GB | 1 x 5 GB     | 1 x 10 GB       | 1 x 120 GB
10                           | 1 x 72 GB | 1 x 50 GB    | 1 x 40 GB       | 1 x 1.2 TB
40                           | 1 x 72 GB | 1 x 60 GB    | 1 x 150 GB      | 3 x 2.0 TB
100                          | 1 x 72 GB | 2 x 75 GB    | 1 x 350 GB      | 6 x 2.0 TB
150                          | 1 x 72 GB | 4 x 75 GB    | 1 x 500 GB      | 10 x 2.0 TB
200                          | 1 x 72 GB | 5 x 75 GB    | 2 x 350 GB      | 14 x 2.0 TB
500                          | 1 x 72 GB | 9 x 75 GB    | 2 x 500 GB      | 34 x 2.0 TB
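As a worked example of reading the matrix, the sketch below looks up the Indexer column for a given item count by picking the smallest tier that covers it. It reads "N x size" as N units of that size and converts 120 GB to 0.12 TB; both are interpretations, and the function and constant names are hypothetical. The disclaimer above applies equally to anything derived from these numbers.

```python
import bisect

# (max items in millions, indexer unit count, TB per unit), from the Indexer column.
INDEXER_TIERS = [
    (1, 1, 0.12),
    (10, 1, 1.2),
    (40, 3, 2.0),
    (100, 6, 2.0),
    (150, 10, 2.0),
    (200, 14, 2.0),
    (500, 34, 2.0),
]


def indexer_disk_tb(items_millions):
    """Total indexer disk (TB) for the smallest tier covering the item count."""
    limits = [tier[0] for tier in INDEXER_TIERS]
    i = bisect.bisect_left(limits, items_millions)
    if i == len(INDEXER_TIERS):
        raise ValueError("item count exceeds the largest tier in the matrix")
    _, count, size_tb = INDEXER_TIERS[i]
    return count * size_tb


if __name__ == "__main__":
    print(indexer_disk_tb(25))   # falls in the 40M tier: 3 x 2.0 TB
```

The lookup rounds up to the next tier rather than interpolating, since the matrix gives no basis for values between rows.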
17. Feeding and Indexing Performance
• The feeding and indexing processing chain in FS4SP has the components below:
– Crawler
– Content Distributors
– Item Processing
– Indexing Dispatcher
– Primary Indexer
– Backup Indexer