Why you should be mining your data and how to actually do it. Every company needs a rock star. We want it to be you. This session will give real-world examples of data mining successes and walk you through how to get started down the path of data enlightenment, so that you too can say "I Am A Data Miner℠".
Mine craft:
1. Mine Craft
Why you should be mining your data and how to actually do it.
2. Mark Tabladillo – MVP
• Mark provides enterprise data science analytics advice and solutions.
He uses Microsoft Azure Machine Learning, Microsoft SQL Server
Data Mining, SAS, SPSS, R, and Hadoop (among other tools). He works
with Microsoft Business Intelligence (SSAS, SSIS, SSRS, SharePoint,
Power BI, .NET). He is a consultant for SolidQ.
• Mark has been a national leader in analytics and data science (data
mining and machine learning) through conference speaking and
instructional leadership since 1998: Microsoft TechEd, PASS Business
Analytics Conference, Predictive Analytics World, SAS Global Forum,
PASS Summit. He connects with people on LinkedIn and on Twitter
(@marktabnet).
3. David McFarland - MSNCSVUP
• Sr. Mgr. Business Intelligence – Rentpath
• 2007 to present
• CTO – AdventureWorks Cycles
• 2005 to present
• CTO – Northwind Traders
• 2000 to 2005
• CTO – Lucerne Publishing
• 1990 to 2000
5. How could data mining apply?
Let’s look at three companies
10. What Why How
• Relational Data Warehouse
o Why: Flexible query
o How: Data from disparate sources; tables, schema, keys, relationships, index
• Hadoop & HDInsight
o Why: Flexible storage and schema, massive parallel processing
o How: Multiple nodes and distributed computing, commodity hardware, Java; Map Reduce and YARN
• Tabular
o Why: Fast query and calculations, easy to understand
o How: In-memory, columnstore indexes
• Multidimensional OLAP
o Why: Fast query; ad-hoc analysis
o How: Pre-aggregations, calculations
• Data Mining & Machine Learning
o Why: Discovery of knowledge, find outliers, find similarities, make predictions
o How: Estimations, creation of models
16. In the beginning, there was…
Margaret*
*Her real name, as I don’t think that she is THAT innocent.
22. Major Websites
SQL Server Data Mining
http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx
http://www.sqlserverdatamining.com/
Microsoft Azure Machine Learning (currently in preview)
http://azure.microsoft.com/en-us/services/machine-learning/
23. Software
Dreamspark (students); BizSpark (businesses)
SQL Server 2014 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx
Microsoft Office
http://office.microsoft.com/en-us/
Primer on Power BI -- MarkTab
http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business-intelligence.aspx
24. Preparing for Microsoft SQL Server Data Mining
Last updated: October 28, 2014
SQL Server
• You will need SQL Server 2008 or higher; please include “Database Engine”, “Integration Services”, and “Analysis
Services”. For SQL Server 2012 or 2014, you need the “Multidimensional and Data Mining Mode” for Analysis Services.
(You may optionally install semantic search in SQL Server 2012 or 2014; print out the following directions before
installing: http://msdn.microsoft.com/en-us/library/gg509085 )
o SQL Server 2008 or 2008 R2 – Enterprise Edition (or Developer Edition). The requirements for SQL Server
2008 are on http://msdn.microsoft.com/en-us/library/ms143506(v=SQL.100).aspx . All client tools should be
installed, including SQL Server Management Studio (SSMS) and Business Intelligence Development Studio (BIDS).
Directions for installation are at http://msdn.microsoft.com/en-us/library/ms143219(v=SQL.100).aspx
o SQL Server 2012 or 2014 – Business Intelligence Edition or Enterprise Edition (or Developer Edition). The
requirements for SQL Server 2014 are on http://msdn.microsoft.com/en-us/library/ms143506 and include .NET 3.5
SP1 (.NET 4.0 is also required, but it is installed during installation). All client tools should be installed, including SQL
Server Management Studio (SSMS) and SQL Server Data Tools (SSDT). Directions for installation are at
http://technet.microsoft.com/en-us/library/ms143219(v=sql.120).aspx
• Click http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx for a 180-day trial version of SQL Server
2014
• Make sure you run Windows Update to have all the latest service packs and security updates applied
25. Microsoft Office
• (Data mining does not integrate with the browser-based Office 365, which is otherwise a nice product.)
• You will need Office 2007 or higher (with Excel) along with the free data mining add-in:
o For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2:
http://www.microsoft.com/en-us/download/details.aspx?id=7294
o For Office 2010: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
o For Office 2013: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
• Install the add-in, and choose all the parts (sometimes not all the parts are checked).
• After installation, run the “Server Configuration Utility” (from the Windows menu) to make sure you can connect
from Excel to Analysis Services. Please also open the “Sample Excel Data” (Excel Workbook) to see if you can see the
Data Mining tab, and also connect to Analysis Services. If you need help, there is a separate “Help and Documentation”
link (which comes up either from Excel or from the Windows menu).
o You will need to have your own instance of an Analysis Services database, where you have administrative
privileges (allowing both read and write access); if you have any questions on this point, please talk with a
professional in your Information Technology group.
• Click http://technet.microsoft.com/en-us/evalcenter/jj192782.aspx for a 60-day trial version of Office Professional
Plus 2013
• Make sure you run Windows Update to have all the latest service packs and security updates applied
Editor's notes
MSNCSVUP: Microsoft Non-Certified Semi-Valuable UnProfessional
What is it?
Turning raw data into actually useful information, often to answer questions for which there is no “silver bullet” answer.
Classification algorithms for things like retail fraud prevention
I’m thinking that this might be better served via Mark than myself.
Why do it?
Confirm or bust long-held myths that your company has assumed were true.
Month-to-Month
Competitors are nearby, like-priced.
Etc.
Useful for those questions for which there is no silver bullet answer.
Why do customers leave us?
Who is most likely to purchase product x?
Who are our customers’ true competitors?
What drives leads?
You can expect to run into many roadblocks. We did as well, but that just slowed us down a bit.
Let me tell you a little story about what can be accomplished in 8 short years, in spite of the roadblocks.
CEO: David who?
1 to 50,000 users
4 reports to thousands (not completely proud of this one)
Cubes
Data mining
CEO: Cookie stash
Data mining is easy. Getting the data sanitized is where the work lies.
“Normalizing” the data might be necessary (use entry rent example)
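One way the normalizing note above could play out, as a rough sketch only: it assumes "entry rent" means a property's lowest advertised rent and that normalizing means rescaling each rent relative to its own local market (a z-score per market). The field names, the market labels "A" and "B", and the rent figures are all made up for illustration; they are not from the session.

```python
from statistics import mean, pstdev


def normalize_by_market(listings):
    """Return copies of the listings with a per-market z-score of entry rent.

    Each listing is a dict with hypothetical keys "market" and "entry_rent".
    """
    # Group the raw rents by market so each market gets its own scale.
    rents_by_market = {}
    for item in listings:
        rents_by_market.setdefault(item["market"], []).append(item["entry_rent"])

    # Mean and population standard deviation for every market.
    stats = {m: (mean(v), pstdev(v)) for m, v in rents_by_market.items()}

    normalized = []
    for item in listings:
        mu, sigma = stats[item["market"]]
        # A market with identical rents has sigma == 0; call those "average".
        z = 0.0 if sigma == 0 else (item["entry_rent"] - mu) / sigma
        normalized.append({**item, "rent_z": z})
    return normalized


# Made-up sample: market "B" is far pricier than market "A" in raw dollars.
listings = [
    {"id": 1, "market": "A", "entry_rent": 800},
    {"id": 2, "market": "A", "entry_rent": 1000},
    {"id": 3, "market": "A", "entry_rent": 1200},
    {"id": 4, "market": "B", "entry_rent": 2500},
    {"id": 5, "market": "B", "entry_rent": 3000},
    {"id": 6, "market": "B", "entry_rent": 3500},
]

normalized = normalize_by_market(listings)
for row in normalized:
    print(row["id"], row["market"], row["entry_rent"], round(row["rent_z"], 2))
```

In this made-up sample, the priciest listing in each market ends up with the same score even though the raw rents differ by thousands of dollars, which is the point of normalizing before handing the column to a mining model.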
I’m not liking this specifically, for reasons I can’t articulate, but I do want something similar as a wrap-up.