This document discusses the secrets of enterprise data mining. It begins by defining data mining as the automated or semi-automated process of discovering patterns in data. It then discusses how data mining is applied in industries such as telecommunications and oil and gas, and at companies such as Volkswagen Group. Finally, it covers the two solutions Microsoft offers for enterprise data mining: SQL Server Analysis Services and Microsoft Azure Machine Learning.
6. Definition
Data mining is the automated or semi-automated process of discovering patterns in data
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
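A minimal sketch of automated pattern discovery in the sense defined above: counting which item pairs co-occur often enough to be called a pattern. The basket data, the function name `frequent_pairs`, and the support threshold are all made up for illustration; this is not an algorithm taken from the presentation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical transactions, purely for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

The "pattern" here is simply a pair that clears the threshold. Machine learning, in the slide's terms, is the work of developing and tuning better versions of this kind of search.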
15. Excel Data Mining Add-In
For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2:
http://www.microsoft.com/en-us/download/details.aspx?id=7294
For Office 2010: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
For Office 2013: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
16. Secret: Data Science provides an Epistemology
Data mining is part of a complete data science cycle
21. Gartner 2013
Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb (February 5, 2013)
22. Gartner 2013
Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb (January 31, 2013)
23. KDNuggets 2014: What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project?
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
24. KDNuggets 2014: What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project?
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
31. Data platform: SQL Server 2014
Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*
Analytical Services: Analysis Services*, Data Mining, PowerPivot*
Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
32. Secret: Microsoft offers two choices
SQL Server Analysis Services = SQL Server Data Mining
Microsoft Azure Machine Learning
33. Advanced analytic tools for data scientists
• Advanced descriptive analytics (e.g. clustering algorithm in SQL Server Analysis Services)
• Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services)
• Further advanced analytics (Semantic Search and Geospatial Data and functions in SQL Server 2012)
• Big Data analytics (Hadoop integration)
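To make "descriptive analytics via clustering" concrete, here is a minimal one-dimensional k-means sketch. It is a generic textbook technique, not the SQL Server Analysis Services implementation. The function name `kmeans_1d`, the starting centers, and the spend values are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend values for six customers.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers)
print(clusters)
```

The output groups customers into a low-spend and a high-spend segment without any labels being supplied, which is what distinguishes descriptive clustering from the predictive algorithms listed above.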
35. SSAS Data Mining Capacities
SQL Server 2014 Analysis Services object: Maximum sizes/numbers
Maximum data mining models per structure: 2^31-1 = 2,147,483,647
Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure: 64K
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
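A quick arithmetic check of the capacities above. The value 2^31-1 is the largest signed 32-bit integer, which the sketch confirms with the standard `struct` module. Reading "64K" as 64 × 1024 is an assumption; the slide does not spell it out.

```python
import struct

# 2^31 - 1 is the largest value a signed 32-bit integer can hold.
max_objects = 2**31 - 1

# Assumption: "64K" on the slide means 64 * 1024.
max_attributes = 64 * 1024

# Packing as a signed 32-bit int succeeds at the limit and fails one past it.
struct.pack("<i", max_objects)
try:
    struct.pack("<i", max_objects + 1)
    overflowed = False
except struct.error:
    overflowed = True

print(f"{max_objects:,}")
print(f"{max_attributes:,}")
print(overflowed)
```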
39. Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
40. (iFilter Required)
Documents
iFilters
Full-Text Keyword Index “FTI”
Semantic Database
Semantic Key Phrase Index (Tag Index “TI”)
Semantic Document Similarity Index “DSI”
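The three indexes can be illustrated with a toy sketch: word counts stand in for the keyword index, the most frequent terms stand in for key phrases (tags), and cosine similarity between count vectors stands in for document similarity. This is a generic illustration under those assumptions, not how SQL Server builds FTI, TI, or DSI. The sample documents and function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count word occurrences (stand-in for a keyword index)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def top_terms(counts, n=3):
    """Most frequent terms, a crude proxy for key phrases (tags)."""
    return [term for term, _ in counts.most_common(n)]


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical documents, purely for illustration.
docs = {
    "a": "data mining finds patterns in data",
    "b": "text mining finds patterns in text",
    "c": "the weather is sunny today",
}

index = {name: term_counts(text) for name, text in docs.items()}
sim_ab = cosine_similarity(index["a"], index["b"])
sim_ac = cosine_similarity(index["a"], index["c"])
print(top_terms(index["a"], 1), round(sim_ab, 3), sim_ac)
```

Documents "a" and "b" share most of their vocabulary and score as similar, while "c" shares none and scores zero.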
41. Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
43. Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale-up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for corpus
Similar to or better than the main database search competitors
(2012, Michael Rys, Microsoft)
44. Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
Time in Seconds vs. Number of Documents
(2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft)
45. Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online): explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
47. Major Websites
SQL Server Data Mining
http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx
http://www.sqlserverdatamining.com/
Microsoft Azure Machine Learning (currently in preview)
http://azure.microsoft.com/en-us/services/machine-learning/
48. Software
DreamSpark (students); BizSpark (businesses)
SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx
Microsoft Office
http://office.microsoft.com/en-us/
Primer on Power BI (MarkTab)
http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business-intelligence.aspx
51. Conclusion
Excel data mining
Data Science provides an epistemology
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers two enterprise solutions
Semantic search scales linearly
52. Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.