2. MY FAVORITE BEYOND RELATIONAL APPLICATION
Structured and
unstructured Search
Related/“Semantic”
Search
3. BEYOND RELATIONAL DATA
Building and Maintaining Applications with
relational and non-relational data is hard
Pain Points
Complex integration
Duplicated functionality
Compensation for unavailable services
Goals
Reduce the cost of managing all data
Simplify the development of applications over all data
Provide management and programming services for all data
4. RICH UNSTRUCTURED DATA IN SQL SERVER 2012
• 80% of all data is not stored in databases!
Most of it is “unstructured”
• Make SQL Server the preferred choice for managing Unstructured Data
and allow building Rich Application Experience on top
• Address important customer requests for Capabilities and rich services
for Rich Unstructured Data (RUDS)
o Scale up storage and search to 100 million to 500 million documents
o Easy use/access to Unstructured data from all applications
o Rich insight into unstructured data to make better decisions
8. FILETABLE OVERVIEW
• FileTable: A Table of Files/Directories
• User created Table with a fixed schema
• Contains FILESTREAM and File Attributes
• Each row represents a File or a Directory
• System defined constraints maintain the tree integrity
• File/Directory hierarchy view through a Windows Share
• Supports Win32 APIs for File/Directory Management
• DB Storage is Transparent to Win32 applications
• SMB level of application compatibility
• Virtual network name (VNN) path support for transparent Win32 application failover
[Figure: FileTable Folder Hierarchy. Levels, top to bottom: FILESTREAM Share (MSSQLSERVER on my_machine); Database Directories (Private Docs for Database1, Office Docs for Database2); FileTable Directories (Media, Documents, LogFiles, each a FileTable); User-Defined Directory Structure. Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents]
9. CREATING A FILETABLE
Pre-requisites
Enable FILESTREAM
Create FILESTREAM Share and Filegroup
Enable non-transactional access at the DB level
ALTER DATABASE Contoso SET FILESTREAM( non_transacted_access=FULL,
Directory_name = N'Contoso')
Create FileTable
CREATE TABLE Contoso..Documents AS FILETABLE
WITH (filetable_directory = N'Document Library')
Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library
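The prerequisites listed on this slide could be scripted roughly as follows. This is a minimal sketch rather than the presenter's script: the filegroup name ContosoFS, the container name ContosoFS1, and the path C:\Data\ContosoFS1 are assumptions, and FILESTREAM is assumed to be already enabled for the instance in SQL Server Configuration Manager.

```sql
-- Allow both T-SQL and Win32 streaming access to FILESTREAM data (level 2).
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Add a FILESTREAM filegroup and one container.
-- ContosoFS, ContosoFS1 and the path are hypothetical names.
ALTER DATABASE Contoso ADD FILEGROUP ContosoFS CONTAINS FILESTREAM;
ALTER DATABASE Contoso ADD FILE
    (NAME = N'ContosoFS1', FILENAME = N'C:\Data\ContosoFS1')
    TO FILEGROUP ContosoFS;
GO

-- After running the ALTER DATABASE ... SET FILESTREAM statement from the slide,
-- check the database-level settings.
SELECT DB_NAME(database_id) AS database_name,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options
WHERE database_id = DB_ID(N'Contoso');
```

The CREATE TABLE ... AS FILETABLE statement needs a FILESTREAM filegroup in the database, which is why the filegroup step comes before it.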
10. MODIFYING A FILETABLE
FileTable has a fixed schema
Columns, system defined constraints cannot be altered/dropped
Allows user defined indexes/constraints/triggers
Disabling/Enabling FileTable Namespace
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
Disables all system-defined constraints and Win32 access to
FileTable
Useful for bulk-loading/re-organization of data
FileTable can be dropped similar to any other table
Catalog views can be used for obtaining metadata
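A bulk-load window and the metadata lookup mentioned above might look like the sketch below. It assumes the Documents FileTable lives in the dbo schema of the Contoso database; the load step itself is left as a placeholder.

```sql
USE Contoso;
GO

-- Suspend the system-defined constraints and Win32 access before a bulk load.
ALTER TABLE dbo.Documents DISABLE FILETABLE_NAMESPACE;

-- ... bulk load or reorganize rows here ...

-- Re-enable the namespace; this only succeeds if the rows form a valid tree.
ALTER TABLE dbo.Documents ENABLE FILETABLE_NAMESPACE;

-- Catalog view: one row per FileTable in the current database.
SELECT OBJECT_NAME(object_id) AS filetable_name,
       is_enabled,
       directory_name
FROM sys.filetables;

-- Catalog view: the system-defined constraints and indexes behind each FileTable.
SELECT OBJECT_NAME(parent_object_id) AS filetable_name,
       OBJECT_NAME(object_id)        AS system_object
FROM sys.filetable_system_defined_objects;
```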
11. DATA ACCESS – FILE SYSTEM ACCESS
FileTable hierarchy is visible through Filestream share
\\<machine>\<FILESTREAM_share>\<Database_directory>\<FileTable_directory>\...
Provides transparent Win32 API & File/Directory Management capabilities
e.g. Microsoft Word can create/open/save files; xcopy can copy directory trees into the
database.
Win32 API operations are non-transactional
Operations cannot be part of any user transactions
Win32 operations are intercepted by SQL Server at the File system level
e.g. File/Directory creation/deletion => insert/delete into FileTable
Full locking/concurrency semantics with other accesses
Allows in-place update of file stream data/File attributes
Transactional FILESTREAM APIs can also be used.
12. DATA ACCESS – T-SQL ACCESS
Normal Insert/Update/Delete allowed for the FileTable manipulation
FileTable Namespace integrity constraints enforced
Set based operations on the File-attributes – value add
Built-in functions
GetFileNamespacePath() – UNC path for a file/directory
FileTableRootPath() – UNC path to the FileTable root
GetPathLocator() – path_locator value for a file/directory
DDL/DML Triggers are supported
DML triggers on a FileTable cannot update any FileTables
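The T-SQL access path described on this slide can be sketched as follows. The example assumes the Documents FileTable from the earlier slide sits in the dbo schema; the directory and file names are made up for illustration.

```sql
USE Contoso;
GO

-- Create a directory and a file directly with INSERT.
-- The remaining FileTable columns take their defaults, so both rows land in the root.
INSERT INTO dbo.Documents (name, is_directory)
VALUES (N'Reports', 1);

INSERT INTO dbo.Documents (name, file_stream)
VALUES (N'readme.txt', CAST('Hello FileTable' AS varbinary(max)));

-- UNC path to the FileTable root.
SELECT FileTableRootPath(N'dbo.Documents') AS root_path;

-- Set-based query over file attributes: the ten largest files with their full UNC paths.
SELECT TOP (10)
       name,
       cached_file_size,
       last_write_time,
       file_stream.GetFileNamespacePath(1) AS unc_path
FROM dbo.Documents
WHERE is_directory = 0
ORDER BY cached_file_size DESC;

-- Look up a row by its UNC path via GetPathLocator().
SELECT name, stream_id
FROM dbo.Documents
WHERE path_locator = GetPathLocator(FileTableRootPath(N'dbo.Documents') + N'\readme.txt');
```

Rows inserted this way show up as a folder and a file under the FileTable directory on the FILESTREAM share.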
13. MANAGING FILETABLE
DB Backup/Restore operations include FileTable data
A point-in-time restore may contain more recent FILESTREAM data due to
non-transactional updates during backup
FileTables are secured similar to any other user tables
Same security is enforced for Win32 access also
Data Loading
Windows tools like xcopy/robocopy OR drag-drop operations through
Windows Explorer can be used
BCP operations are supported for direct T-SQL data inserts
SSMS supports FileTable creation/exploration
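Because a FileTable is secured and backed up like any other user table, the usual statements apply. A brief sketch, in which the Windows group CONTOSO\DocAuthors and the backup path are hypothetical and the group is assumed to already exist as a database user:

```sql
USE Contoso;
GO

-- Table-level permissions; per the slide, the same security also governs Win32 access.
GRANT SELECT, INSERT ON dbo.Documents TO [CONTOSO\DocAuthors];

-- A regular database backup includes the FileTable data.
BACKUP DATABASE Contoso TO DISK = N'D:\Backups\Contoso.bak';
```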
14. MANAGING FILETABLE – HIGH AVAILABILITY
SQL Server 2012 AlwaysOn is fully supported
Transparent data failover
FileTables can be configured with multiple secondary nodes
Both sync and async data replication is supported
File data and metadata are available on the secondary in case of failover
Transparent application failover
Virtual network name (VNN) path support for transparent Win32 application failover
Applications use the \\VNN\Share\db\... path
Applications are automatically redirected to the secondary in case of failover
Restrictions
FileTables cannot participate in “Read-only” replicas.
15. FILETABLE RESTRICTIONS
FileTables cannot be partitioned
Merge/Transactional replications are not supported
RCSI/Snapshot isolation mode
Applications cannot modify FILESTREAM data in FileTables
Win32 Application compatibility
Memory mapped files, Directory notifications, links are not supported
16. UNSTRUCTURED DATA SCALE-UP
MULTIPLE CONTAINERS FOR FILESTREAM DATA
SQL 2008 R2
Only one storage container/FILESTREAM filegroup
Limits storage capacity scaling and I/O scaling
SQL Server 2012
Support for multiple storage containers/filegroup.
DDL Changes to Create/Alter Database statements
Ability to set max_size for the containers
DBCC SHRINKFILE EMPTYFILE support
Scaling Flexibility
Storage scaling by adding additional storage drives
I/O scaling with multiple spindles
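The multi-container DDL described above might be used as in the following sketch. The filegroup ContosoFS, the container names, the drive path, and the size cap are all assumptions chosen for illustration.

```sql
-- Add a second FILESTREAM container on another drive, capped at 100 GB.
ALTER DATABASE Contoso ADD FILE
    (NAME = N'ContosoFS2', FILENAME = N'D:\Data\ContosoFS2', MAXSIZE = 100 GB)
    TO FILEGROUP ContosoFS;
GO

USE Contoso;
GO

-- Move the contents of the first container to the other containers in the filegroup.
DBCC SHRINKFILE (N'ContosoFS1', EMPTYFILE);

-- The emptied container can be removed once FILESTREAM garbage collection
-- has cleaned it up; until then this statement may fail.
ALTER DATABASE Contoso REMOVE FILE ContosoFS1;
```

Placing each container on a separate drive is what gives the storage and I/O scaling the slide refers to.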
17. UNSTRUCTURED DATA : MULTIPLE CONTAINERS
Use of multiple spindles for achieving better I/O Scalability
18. RUDS SCALE-UP: FILESTREAM PERF/SCALE
Improved performance of T-SQL and File I/O access
Various enhancements to improve read/write throughput
5-fold increase in read throughput
Linear scaling with large number of concurrent threads
19. SUMMARY: FILETABLE
Application Compatibility for Windows Applications
Windows applications run on top of files stored in FileTables with
no modifications
Relational Value Proposition
Provide Integrated Administration and Services
Backup, Log Shipping, HA-DR, Full text and Semantic search, …
T-SQL orthogonality
File/Folder attributes surfaced through relational columns
Power of set based operations, Policy Management, Reporting etc
FileNamespace Hierarchy management
20. FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012
Improved Performance and Scale:
Scale-up to 350M documents
iFTS query perf 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times < 3 sec for corpus
At par or better than main database search competitors
New Functionality:
Property Search
Customizable NEAR
New word breakers: existing word breakers updated, Czech and Greek added
Innovation in Search:
Semantic Similarity Search
21. FULLTEXT SEARCH PERFORMANCE & SCALE IMPROVEMENTS
Architectural Improvements
Improved internal implementation
Queries no longer block Index updates
Improved Query Plans:
Better Plans for common queries
Fulltext predicate folding
Parallel Plan execution
Indexing and querying tested at a scale of up to 350 million documents with
response times under ~2 seconds
Throughput ~3x better without concurrent DML and ~9x better with concurrent DML
Scale easily with increasing number of connections
22. SCALE-UP: FULL-TEXT SEARCH
[Chart: SQL Server 2005/2008 vs. SQL Server 2012]
Queries over a 350M-document database with random DML running in the background.
SQL Server 2012 beats SQL Server 2005 at more than 2x the scale and with 60x better throughput on average
23. SCALE-UP: FULL-TEXT SEARCH
[Chart: SQL Server 2005/2008 vs. SQL Server 2012]
Average query execution time (ms) at various numbers of connections (50 to 2000 users) for a customer
playback benchmark
24. FULLTEXT PROPERTY SCOPED SEARCH
New Search Filter for Document Properties
CONTAINS (PROPERTY ( { column_name }, 'property_name' ), 'contains_search_condition' )
• Set up once per database instance to load the Office filters
exec sp_fulltext_service 'load_os_resources',1
go
exec sp_fulltext_service 'restart_all_fdhosts'
go
• Create a property list
CREATE SEARCH PROPERTY LIST p1;
• Add properties to be extracted
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9',
PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
• Create/Alter Fulltext index to specify property list to be extracted
ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];
• Query for properties
SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');
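For a table that has no full-text index yet, the property list can be attached at creation time instead of through ALTER. The sketch below assumes a catalog named ftcat, a unique key index pk_fttable, and a type column file_ext holding the document extension; none of these names come from the slides.

```sql
-- Hypothetical catalog, key index and type column; p1 is the property list created above.
CREATE FULLTEXT CATALOG ftcat;

CREATE FULLTEXT INDEX ON fttable (ftcol TYPE COLUMN file_ext LANGUAGE 1033)
    KEY INDEX pk_fttable ON ftcat
    WITH SEARCH PROPERTY LIST = p1;

-- Inspect which properties are registered in which list.
SELECT l.name AS property_list,
       p.property_name,
       p.property_set_guid,
       p.property_int_id
FROM sys.registered_search_property_lists AS l
JOIN sys.registered_search_properties AS p
    ON p.property_list_id = l.property_list_id;
```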
25. FULL-TEXT CUSTOMIZABLE NEAR
OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near Space')
NEW NEAR USAGES
• SPECIFY DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 5,false)')
• REDUCE DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 2,false)')
• ORDER OF WORDS IS SPECIFIED AS IMPORTANT
select * from fttable
where contains(*, 'near((test, Space), 5,true)')
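The new NEAR syntax also works with CONTAINSTABLE when ranked results are wanted. A small sketch, assuming fttable has a full-text key column named id (a hypothetical name); MAX places no limit on the distance between the terms, and TRUE requires them in the stated order.

```sql
-- Rank rows where 'test' occurs before 'Space' at any distance.
SELECT t.*, k.[RANK]
FROM fttable AS t
JOIN CONTAINSTABLE(fttable, *, 'NEAR((test, Space), MAX, TRUE)') AS k
    ON t.id = k.[KEY]
ORDER BY k.[RANK] DESC;
```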
26. STATISTICAL SEMANTIC SEARCH
Semantic Insight into textual content
Uses language models to find most important keywords in document
No need to build brittle ontologies!
Statistically Prominent Keywords
Autogenerated tag clouds
Potentially Related Content based on extracted Keywords, such as
Similar Products (based on description)
Similar Jobs or Applicants
Similar Support Incidents (based on call logs)
Potential Solutions (based on similar incidents)
First class usage experience
Efficient linear algorithms
Integrated with FTS and SQL
New Rowset functions for all results using SQL query
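The rowset functions mentioned above can be queried like ordinary tables. The sketch below reuses the fttable/ftcol names from the earlier slides and assumes an integer full-text key column named id and a document key of 1, both hypothetical.

```sql
-- Enable semantic extraction on a column that is already full-text indexed.
ALTER FULLTEXT INDEX ON fttable ALTER COLUMN ftcol ADD STATISTICAL_SEMANTICS;
GO

DECLARE @doc int = 1;  -- hypothetical key of the source document

-- Statistically prominent key phrases of one document (e.g. for a tag cloud).
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(fttable, ftcol, @doc)
ORDER BY score DESC;

-- Documents most similar to the source document.
SELECT TOP (10) t.*, s.score
FROM SEMANTICSIMILARITYTABLE(fttable, ftcol, @doc) AS s
JOIN fttable AS t
    ON t.id = s.matched_document_key
ORDER BY s.score DESC;
```

A third function, SEMANTICSIMILARITYDETAILSTABLE, returns the key phrases that two given documents have in common.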
29. SEMANTIC EXTRACTION: END-2-END EXPERIENCE
• Downloadable Language Statistical Database with registration stored
procedure
• Setup along with Full-Text
• Metadata / Catalog views
• System level DMVs for progress state and usage
• Manageability through SSMS and SMO
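Registering the downloaded language statistics database and checking its state could look like this sketch. The database name semanticsdb and the file paths are assumptions about where the downloaded files were placed.

```sql
-- Attach the downloaded language statistics database (paths are hypothetical).
CREATE DATABASE semanticsdb
    ON (FILENAME = N'C:\Data\semanticsdb.mdf'),
       (FILENAME = N'C:\Data\semanticsdb_log.ldf')
    FOR ATTACH;
GO

-- Register it with the instance using the registration stored procedure.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';
GO

-- Catalog views: the registered database and the languages it supports.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
SELECT * FROM sys.fulltext_semantic_languages;

-- DMV: progress of semantic similarity population.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```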
30. KEY TAKEAWAYS
SQL Server's unstructured data support is a key strategy to
enable you to build complex data applications that go
beyond relational data!
Content and Collaboration, eDiscovery, Healthcare, Document
management etc.
31. RELATED CONTENT
SQL Server 2012 Whitepapers and information:
http://www.sqlserverlaunch.com
Channel 9 DataBound Episode 2: http://channel9.msdn.com
MySemanticsSearch Demo: http://mysemanticsearch.codeplex.com
More demo data sets and demo scripts:
http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Microsoft Virtual Academy Recording: Coming Soon!
Editor's notes
Let's take a look at a beyond-relational (BR) application. What services does it provide? What about having these services supported in the database instead of each application building its own?
Examples: Managing an application that keeps images in the file system and additional information in the database. Building a spatial database application before SQL Server 2008. Example services: backup/restore, search over relational and non-relational data.
SQL Server 2008 provides FILESTREAM as a way to add large BLOBs/unstructured data streams to SQL Server while still being able to open a Win32 handle (using the SQL API) and get high streaming performance for the data.
Win32 namespace support in SQL Server 2012 has the following goals:
• Reduce the barrier to entry for customers who have data on file servers and Win32 applications that currently work on it. With the Win32 namespace enabled, SQL Server generates a Windows share that can be exposed to existing Win32 applications like any file server share. This allows Win32 applications and mid-tier servers (like IIS) to work with this data without having to understand database/transaction semantics.
• Single integrated set of admin tools: SQL backup/restore, replication, HA solutions, etc.
• Scale up: add multiple disks on a machine for storing FILESTREAM data.
• Use SQL services like full-text search for both FILESTREAM and relational metadata, and the Property Promotion Infrastructure for extracting interesting properties from SQL BLOBs/FILESTREAM to surface as relational columns for query.
Optimized hot paths, removed unnecessary serialization, expensive file system operations, etc.