So you think you can crawl?
Stretching the Boundaries of SharePoint 2013!
Petter Skodvin-Hvammen
AD-Gruppen, Norway
Who am I?
Petter Skodvin-Hvammen
Oseberg ship - Discovered 1904 in Tønsberg, Norway. Buried by Vikings in 834 AD
• Solutions Architect
• SharePoint Consultant
• Search Enthusiast
• Community Lead
@pettersh - psh@adgruppen.no
www.adgruppen.no
Enterprise Search
Challenges and Solutions
• Index thousands of sources
• Automate index management
• Infrastructure sizing
Not included: code/scripts, user experience, relevancy, governance
www.sharepointeurope.com
The Mission…
Enterprise Search using SharePoint Server 2013
• 30,000 users
• 85 locations in 30 countries
• 15,000 daily searches
• 100,000,000 documents (?)
• 60 core systems, 2,000 applications
What do we index?
• 100,000,000 documents
• 3,000 file shares
• 500 servers
Where is the data?
• Datacenters
• Time zones
• Bandwidth
How can we get it?
• Limit bandwidth usage for specific server locations
• Limit crawler impact during local business hours
• Grant the crawler read access per file share
• Avoid token bloat issues for accounts that are members of more than 1,015* groups
* http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx
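The linked post covers Kerberos token bloat. A commonly cited Microsoft estimate for Kerberos token size is 1200 + 40·d + 8·s bytes. Here d counts domain local groups, universal groups from other domains and SID history entries, and s counts global groups and universal groups in the account's own domain. The Python below is a minimal sketch that applies that estimate and the 1,015-group cap to candidate crawl accounts. It rests on several assumptions: the account names and group counts are hypothetical, and the 12,000-byte MaxTokenSize is only a default parameter, so check the value actually configured in your domain.

```python
from dataclasses import dataclass

MAX_GROUPS = 1015  # group-membership cap cited on the slide


@dataclass
class CrawlAccount:
    name: str
    d_groups: int  # d: domain local groups, universal groups from other domains, SID history
    s_groups: int  # s: global groups and universal groups in the account's own domain


def estimated_token_size(acct: CrawlAccount) -> int:
    """Commonly cited Kerberos token size estimate in bytes: 1200 + 40d + 8s."""
    return 1200 + 40 * acct.d_groups + 8 * acct.s_groups


def is_safe(acct: CrawlAccount, max_token_size: int = 12_000) -> bool:
    """True if the account stays under the group cap and the assumed MaxTokenSize.

    Group counts must already include nested memberships; this sketch does not expand them.
    """
    total = acct.d_groups + acct.s_groups
    return total <= MAX_GROUPS and estimated_token_size(acct) <= max_token_size


if __name__ == "__main__":
    # Hypothetical accounts and group counts, for illustration only.
    for acct in (CrawlAccount("spcrwl01", 150, 300), CrawlAccount("spcrwl02", 280, 100)):
        print(acct.name, estimated_token_size(acct), is_safe(acct))
```

Because domain local groups weigh five times as much as global groups in this estimate, the same total count can produce very different token sizes. That is one reason to watch the membership mix of each crawl account rather than only its total.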
How do we operate it?
• File shares are created, changed, and deleted every day through a custom self-service solution
• File shares are moved between servers every day by automation rules
• Indexing and crawling of each file share must be managed with minimal manual effort
What can SharePoint do?
• Max 50 content sources per service application
– Max 500 with October 2013 CU installed
• Max 100 start addresses per content source
– Max 500 with October 2013 CU installed
• Max 20 concurrent crawls per service application
– Limitation has been removed
http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
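The limits above can be checked before a crawl layout is deployed. The following Python is a minimal sketch that checks a plan against them. The plan is a mapping from content source name to its list of start addresses; both the plan format and check_plan are hypothetical. The defaults are the October 2013 CU values from the slide.

```python
from __future__ import annotations


def check_plan(plan: dict[str, list[str]], max_sources: int = 500,
               max_start_addresses: int = 500) -> list[str]:
    """Return limit violations for a crawl plan of {content source: start addresses}.

    Defaults are the October 2013 CU limits from the slide; pass 50 and 100
    to check against the earlier limits.
    """
    problems = []
    if len(plan) > max_sources:
        problems.append(f"{len(plan)} content sources > {max_sources}")
    for source, addresses in plan.items():
        if len(addresses) > max_start_addresses:
            problems.append(f"{source}: {len(addresses)} start addresses > {max_start_addresses}")
    return problems


if __name__ == "__main__":
    shares = [f"file://server/share{i:04d}" for i in range(3000)]  # hypothetical shares
    print(check_plan({"all": shares}))
    print(check_plan({f"cs{n}": shares[n * 500:(n + 1) * 500] for n in range(6)}))
```

With the post-CU limits, six content sources of 500 start addresses each pass the check for 3,000 file shares. Under the earlier limits, the same shares would need 30 content sources of 100 start addresses each.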
It’s complicated
• More data than we have space for
• It’s located all over the place
• Everything changes all of the time
• There are limitations in SharePoint
• Someone’s gotta maintain this
• It has to be secure and relevant
What did we do?
• Created logical groups of file shares
• Used symbolic linking
Diagram (fewer content sources): symbolic links \\file00\share\sym01, \\file00\share\sym02 and \\file00\share\sym03 point to the file shares \\file01\share01, \\file02\share03 and \\file03\share03, so the single start address \\file00\share covers all of them.
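The core of the symlink idea is a single crawlable root that holds one directory link per share. Adding or removing a share then changes only the links, not the content source. The Python below is a minimal sketch that computes such a layout. The helper plan_symlinks is hypothetical, and the sym01, sym02, … naming simply follows the diagram. Nothing is created on disk.

```python
from __future__ import annotations


def plan_symlinks(root: str, shares: list[str]) -> dict[str, str]:
    """Map directory-symlink paths under one start-address root to target shares.

    Links are named sym01, sym02, ... in the order the shares are given.
    Only the layout is computed; nothing is created on disk.
    """
    base = root.rstrip("\\")
    return {base + "\\" + f"sym{i:02d}": share
            for i, share in enumerate(shares, start=1)}


if __name__ == "__main__":
    layout = plan_symlinks(r"\\file00\share",
                           [r"\\file01\share01", r"\\file02\share03", r"\\file03\share03"])
    for link, target in layout.items():
        print(link, "->", target)
```

On Windows, each entry could then be created with `os.symlink(target, link, target_is_directory=True)`, which is the equivalent of `mklink /D`. Links whose targets are remote UNC paths may also require remote-to-remote symlink evaluation (`fsutil behavior set SymlinkEvaluation R2R:1`) to be enabled on the crawl servers.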
What did we do?
• Grouped file shares based on region
• One content source per region
• Incremental crawls every night
Diagram: crawling based on time zones
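Scheduling a nightly incremental crawl per region means converting a local start time into the farm's time. The Python below is a minimal sketch under several assumptions. The fixed UTC offsets per region and the 22:00 local start are hypothetical, and daylight saving time is ignored. A real schedule would use each region's actual time zone, for example via `zoneinfo`.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical fixed UTC offsets per region; daylight saving time is ignored.
REGION_UTC_OFFSET_HOURS = {"europe": 1, "asia": 8}


def nightly_start_utc(region: str, day: date, local_start: time = time(22, 0)) -> datetime:
    """UTC instant at which a region's nightly incremental crawl should start."""
    tz = timezone(timedelta(hours=REGION_UTC_OFFSET_HOURS[region]))
    return datetime.combine(day, local_start, tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    for region in REGION_UTC_OFFSET_HOURS:
        print(region, nightly_start_utc(region, date(2014, 5, 5)).isoformat())
```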
What did we do?
• Created a DNS alias per crawler impact rule in etc/hosts on the crawl servers
Diagram: reduced crawler impact
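Crawler impact rules are defined per site (host) name. Giving one server several host names therefore lets each name carry its own rule. The Python below is a minimal sketch that renders hosts-file entries for the aliases used in the start addresses ("default" and "wait-60"). The hosts_entries helper is hypothetical, and the IP address is a documentation-range placeholder.

```python
from __future__ import annotations

import ipaddress
import re

# One DNS label: letters, digits and inner hyphens (e.g. "wait-60").
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?", re.IGNORECASE)


def hosts_entries(ip: str, aliases: list[str]) -> str:
    """Render hosts-file lines that point every alias at the same server."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    for alias in aliases:
        if not _LABEL.fullmatch(alias):
            raise ValueError(f"invalid alias: {alias!r}")
    return "".join(f"{ip}\t{alias}\n" for alias in aliases)


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder for the server that hosts the symlink shares.
    print(hosts_entries("192.0.2.10", ["default", "wait-60"]), end="")
```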
What did we do?
• Granted file share access to the crawl account that is a member of the fewest groups
• Monitored group memberships
• Grouped file shares by crawl account
• Crawl rules matched the folder structure
Diagram: managed pool of crawl accounts
Crawl rule | Type | Crawl account
file://.*/spcrwl01/.* | Include | SP\spcrwl01
file://.*/spcrwl02/.* | Include | SP\spcrwl02
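Assigning a new file share from the managed pool can be sketched as follows: pick the crawl account with the fewest group memberships, then place the share under that account's folder so that the matching crawl rule applies. The Python below is a minimal sketch. The helper and the membership counts are hypothetical; the counts stand in for the output of the group monitoring. The sketch also assumes that granting share access adds the account to one more group.

```python
from __future__ import annotations


def pick_crawl_account(group_counts: dict[str, int], max_groups: int = 1015) -> str:
    """Return the pool account that is a member of the fewest groups.

    Assumes granting share access adds the account to one more group, so an
    account already at max_groups is skipped. Ties go to the first account listed.
    """
    eligible = {acct: n for acct, n in group_counts.items() if n < max_groups}
    if not eligible:
        raise RuntimeError("every crawl account is at the group limit; add one to the pool")
    return min(eligible, key=lambda acct: eligible[acct])


if __name__ == "__main__":
    # Hypothetical membership counts from group monitoring.
    pool = {r"SP\spcrwl01": 412, r"SP\spcrwl02": 287}
    print(pick_crawl_account(pool))
```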
The bigger picture
• Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
• Start addresses: file://<crawler impact>/<content source>/<crawler impact>

Source | Start address | Folder | Crawl rule | Impact rule
Europe | file://default/europe/default | europe/default/spcrwl01 | file://.*/spcrwl01/.* | Default
Europe | file://default/europe/default | europe/default/spcrwl02 | file://.*/spcrwl02/.* | Default
Europe | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
Europe | file://wait-60/europe/wait-60 | europe/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
Asia | file://default/asia/default | asia/default/spcrwl01 | file://.*/spcrwl01/.* | Default
Asia | file://default/asia/default | asia/default/spcrwl02 | file://.*/spcrwl02/.* | Default
Asia | file://wait-60/asia/wait-60 | asia/wait-60/spcrwl01 | file://.*/spcrwl01/.* | Wait-60
Asia | file://wait-60/asia/wait-60 | asia/wait-60/spcrwl02 | file://.*/spcrwl02/.* | Wait-60
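The naming convention in the table above is regular enough to generate rather than maintain by hand. The Python below is a minimal sketch using a hypothetical crawl_layout helper. It produces the start address, folder and crawl rule for every combination of region, impact alias and crawl account. It also checks that a crawl rule matches a document under its own folder. The rule is treated as a Python regular expression, and SharePoint's own matching may differ in detail. The sample path reuses the symlink name from the custom list example.

```python
from __future__ import annotations

import re
from collections.abc import Iterator
from itertools import product
from typing import NamedTuple


class LayoutRow(NamedTuple):
    start_address: str
    folder: str
    crawl_rule: str


def crawl_layout(regions: list[str], impacts: list[str],
                 accounts: list[str]) -> Iterator[LayoutRow]:
    """Yield one row per region, crawler impact alias and crawl account."""
    for region, impact, account in product(regions, impacts, accounts):
        yield LayoutRow(
            start_address=f"file://{impact}/{region}/{impact}",  # host alias selects the impact rule
            folder=f"{region}/{impact}/{account}",
            crawl_rule=f"file://.*/{account}/.*",  # include rule tied to this account
        )


def rule_matches(rule: str, url: str) -> bool:
    """Treat a crawl rule as a Python regular expression and test a whole URL."""
    return re.fullmatch(rule, url) is not None


if __name__ == "__main__":
    rows = crawl_layout(["europe", "asia"], ["default", "wait-60"], ["spcrwl01", "spcrwl02"])
    for row in rows:
        account = row.folder.rsplit("/", 1)[1]
        # Example document below a symlink in this account's folder.
        sample = f"{row.start_address}/{account}/e5dc12a41d/report.docx"
        print(row.start_address, row.folder, row.crawl_rule, rule_matches(row.crawl_rule, sample))
```

The generator yields rows in the same order as the table, so its output can be compared line by line with the configured farm.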
How did we manage this?
• Self-service portal for enabling indexing of file shares
• Custom web service integration in the self-service portal
• Custom solution for granting access to crawl accounts
• Custom timer job that gets the list of file shares to crawl from the self-service portal
• Custom timer job for creating and removing symbolic links
• Custom lists for mapping servers to content sources, schedules and impact rules; shares to crawl accounts and metadata; and UNC paths to symlinks
• Content enrichment service that replaces symlinks in paths with the actual file paths
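The timer job that creates and removes symbolic links is essentially a reconciliation between the desired links and the links that exist. The desired links come from the UNC-to-symlink list, and the existing ones are found in the link folders. The Python below is a minimal sketch of that diff only. The reconcile helper and both mapping shapes are hypothetical, and the actual file-system calls are left out.

```python
from __future__ import annotations


def reconcile(desired: dict[str, str],
              existing: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Diff desired {link path: target share} against the links found on disk.

    Returns (links to create, links to remove). A link whose target changed,
    e.g. because its share moved to another server, appears in both results:
    remove it first, then recreate it with the new target.
    """
    to_remove = sorted(link for link, target in existing.items()
                       if desired.get(link) != target)
    to_create = {link: target for link, target in desired.items()
                 if existing.get(link) != target}
    return to_create, to_remove


if __name__ == "__main__":
    # Hypothetical state: one share moved from file01 to file04, one share was retired.
    desired = {r"\\file00\share\sym01": r"\\file04\share01"}
    existing = {r"\\file00\share\sym01": r"\\file01\share01",
                r"\\file00\share\sym02": r"\\file02\share03"}
    print(reconcile(desired, existing))
```

Treating a changed target as "remove, then create" keeps the link name stable. Because the start address and folder stay the same, a share that moves between servers does not require any change to the crawl configuration.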
Example: Self Service Portal
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: Assigned automatically
Crawl Account: Assigned automatically
[Cancel] [Save]

Example: Custom Lists
Title: European SharePoint Conference
Owner: Petter Skodvin-Hvammen
Business Area: Consulting
Classification: Internal
Type: Project
UNC Path: \\file01\share01
Crawl Account: SP\spcrwl01
Symlink: \\default\europe\default\spcrwl01\e5dc12a41d
Location: europe (server file01 is located in Oslo DC)
Bandwidth: 5 Mbps
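Mapping symlink paths back to the real shares is the job of the content enrichment service. The core string rewrite can be sketched in Python as a longest-prefix substitution driven by the UNC-to-symlink list. The resolve_symlink_path helper is hypothetical and does not implement the Content Enrichment web service contract. The mapping uses the values from the custom list example above. The helper works on UNC-style paths, so a crawled file:// URL would need converting first.

```python
from __future__ import annotations


def resolve_symlink_path(path: str, links: dict[str, str]) -> str:
    """Replace the longest matching symlink prefix in a UNC path with its real share path.

    Matching is case-insensitive, as Windows paths are; slicing by prefix length
    assumes lower() keeps string length, which holds for ASCII paths.
    """
    lowered = path.lower()
    # Try longer link paths first so nested links win over their parents.
    for link in sorted(links, key=len, reverse=True):
        prefix = link.lower().rstrip("\\")
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            return links[link] + path[len(prefix):]
    return path  # not under any symlink: leave unchanged


if __name__ == "__main__":
    links = {r"\\default\europe\default\spcrwl01\e5dc12a41d": r"\\file01\share01"}
    print(resolve_symlink_path(r"\\default\europe\default\spcrwl01\e5dc12a41d\Reports\q1.docx", links))
```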
Diagram: search topology sized for 40 million documents and 10 queries per second. It includes Query/WFE, Central Admin, Admin, Crawling and Analytics components; Index-0 to Index-3, two replicas each; multiple Doc Proc and Enrichment components; and caching. Two SQL Servers host the Admin DB, Analytics DB, Crawl DB, Link DB and other SharePoint databases.
Capacity testing
Purpose
• Crawling of symbolic links
• Scaling of virtual machines
• Sizing of disk space
• Verification of Microsoft's recommendations
Approach
• 4-server farm with 2 index partitions
• Servers with 8 vCPU, 16 GB RAM and an 850 GB data volume
• Crawl 10 file shares (3.7M files)
• Replay the top 300 queries using Apache JMeter
Capacity testing – findings
• Crawl rate declined by 1% per million items indexed
• Query latency increased exponentially above 12 million items indexed per partition
• Database latency was insignificant during crawling
• File shares were successfully crawled via symbolic directory links
• Disk space usage was significantly lower than expected
– Reduced the data volume from 850 GB to 450 GB
– With 40+ servers, this meant huge cost savings
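The 12-million-item threshold translates directly into a partition count. The Python below is a worked example that assumes documents are spread evenly across index partitions and treats 12 million as a hard ceiling. The partitions_needed helper is hypothetical.

```python
MAX_ITEMS_PER_PARTITION = 12_000_000  # latency knee observed in the capacity test


def partitions_needed(documents: int, per_partition: int = MAX_ITEMS_PER_PARTITION) -> int:
    """Smallest number of evenly filled index partitions that stays at or below the limit."""
    return -(-documents // per_partition)  # integer ceiling division


if __name__ == "__main__":
    for docs in (40_000_000, 100_000_000):
        print(docs, partitions_needed(docs))
```

For the 40 million documents in the production topology, this gives four partitions, consistent with Index-0 to Index-3. The full 100,000,000-document estimate would need nine.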
Infrastructure – VM sizing
Dedicated ESX cluster
• 14 VMs for SharePoint 2013
– 4 physical machines
– 4 x 32 = 128 CPUs
– 4 x 256 = 1,024 GB memory
• HA maximum utilization = ¾
– 3 x 32 = 96 CPUs
– 3 x 256 = 768 GB memory
• CPU and memory can be overcommitted
• CPU overcommit ratio: 1.34 (1.78 if one physical host fails)
• VMs must wait for physical CPUs
– Wait time for an 8-vCPU VM = 2 x the wait time for a 4-vCPU VM
• Mitigation:
a) Reduce allocated virtual CPUs, or
b) Increase physical CPUs
• Memory factor: 0.44 (0.59 if one physical host fails)
• Reserved and locked memory prevents HA failover
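The figures in parentheses follow from losing one of four hosts. The same allocation spread over three hosts raises each ratio by a factor of 4/3, while usable capacity drops to three hosts' worth. Per host, that is 32 CPUs and 256 GB (1,024 GB across four hosts). The Python below is a short check of that arithmetic; the helper names are hypothetical.

```python
def surviving_capacity(per_host: float, hosts: int, failed: int = 1) -> float:
    """Capacity left for HA planning after `failed` hosts are lost."""
    return per_host * (hosts - failed)


def ratio_after_failure(ratio: float, hosts: int, failed: int = 1) -> float:
    """Rescale an overcommit ratio measured on all hosts to the surviving hosts."""
    return ratio * hosts / (hosts - failed)


if __name__ == "__main__":
    print(surviving_capacity(32, 4), surviving_capacity(256, 4))  # CPUs, GB
    print(round(ratio_after_failure(1.34, 4), 2), round(ratio_after_failure(0.44, 4), 2))
```

This gives 96 CPUs, 768 GB, roughly 1.79 and 0.59. The small gap from the slide's 1.78 is consistent with 1.34 itself being a rounded figure.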
Infrastructure – VM tuning
DC | Role | vCPU | Peak | Average | Calculated | Recommended | Change
A | Web, Query, Admin | 8 | 187.55 | 37.03 | 2 | 4 | -4
B | Web, Query, Admin | 8 | 621.88 | 92.69 | 8 | 8 | 0
A | Crawl, Analytics, Content, CEWS, Central Admin | 8 | 724.35 | 210.59 | 8 | 8 | 0
B | Crawl, Analytics, Content, CEWS, Symbolic Links | 8 | 724.56 | 198.44 | 8 | 8 | 0
A | Index 0, Content, CEWS | 8 | 486.18 | 62.55 | 6 | 6 | -2
B | Index 0, Content, CEWS | 8 | 520.63 | 63.98 | 6 | 6 | -2
A | Index 1, Content, CEWS | 8 | 547.08 | 69.3 | 6 | 6 | -2
B | Index 1, Content, CEWS | 8 | 546.44 | 91.74 | 6 | 6 | -2
A | Index 2, Content, CEWS | 8 | 491.38 | 65.6 | 6 | 6 | -2
B | Index 2, Content, CEWS | 8 | 532.01 | 77.83 | 6 | 6 | -2
A | Index 3, Content, CEWS | 8 | 540.45 | 78.72 | 6 | 6 | -2
B | Index 3, Content, CEWS | 8 | 621.88 | 92.69 | 8 | 8 | 0
A | Distributed Cache | 4 | 91.71 | 5.99 | 2 | 2 | -2
B | Distributed Cache* (added later) | - | - | - | - | - | -
Total | | 100 | | | 78 | 80 | -20
Peak and average CPU usage is calculated over 30 days
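The deck does not say how the Calculated column was derived or what unit Peak is in. One reading reproduces every Calculated value: treat Peak as a percentage of a single vCPU (so 724.35 is about 7.2 busy vCPUs), then round up to the next even vCPU count. The Python below checks that hypothesis against the table. It is a hypothesis test, not the author's method.

```python
import math


def calculated_vcpu(peak_percent: float) -> int:
    """Hypothetical sizing rule: peak (percent of one vCPU) rounded up to an even count."""
    needed = math.ceil(peak_percent / 100)
    return needed + (needed % 2)


# (peak, calculated) pairs copied from the table above, row by row.
ROWS = [
    (187.55, 2), (621.88, 8), (724.35, 8), (724.56, 8),
    (486.18, 6), (520.63, 6), (547.08, 6), (546.44, 6),
    (491.38, 6), (532.01, 6), (540.45, 6), (621.88, 8), (91.71, 2),
]

if __name__ == "__main__":
    for peak, expected in ROWS:
        print(peak, calculated_vcpu(peak), expected)
    print("total:", sum(calculated_vcpu(p) for p, _ in ROWS))
```

The rule also reproduces the column total of 78. The Recommended column departs from it only in the first row, where 2 was raised to 4.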
Summary
1. Indexing thousands of content sources
2. Automation for rapidly changing index requirements
3. Sizing the infrastructure for performance and HA
Questions?
petter.skodvin-hvammen@adgruppen.no
http://linkedin.com/in/petterskodvin
@pettersh

ESPC14 380 So you think you can crawl? Stretching the Boundaries of SharePoint 2013!
