DMDW Lesson 04 - Data Mining Theory
1. STUDY AND GET STARTED. Author I: Dipl.-Inf. (FH) Johannes Hoppe Author II: M.Sc. Johannes Hofmeister Author III: Prof. Dr. Dieter Homeister Date: 25.03.2011
2. Data Mining Theory
5. Data Warehouse: Practical task. Create groups of 3 people. Grab "dmdw_rooms_fh_heidelberg-2006-04-03.xls". Explore the data in the "room reservations" spreadsheet. Discuss and create a simple database table / document that matches the data. Find a way to migrate the data from the Excel spreadsheet to the database. For today I recommend SQL Server Business Intelligence Development Studio + SQL Server. Today's system:
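One possible route for the migration task, sketched in Python under explicit assumptions: the "room reservations" sheet has first been exported to CSV (reading .xls directly would need a third-party library), SQLite from the standard library stands in for SQL Server, and the column names (`room`, `date`, `start_time`, `end_time`, `reserved_by`) and the `migrate` helper are hypothetical, because the slide does not describe the sheet's layout.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of the "room reservations" sheet; the real
# column layout of the spreadsheet is not given on the slide.
CSV_EXPORT = """room,date,start_time,end_time,reserved_by
A101,2006-04-03,08:00,09:30,Lecture DMDW
A101,2006-04-03,10:00,11:30,Lecture DB
B202,2006-04-04,13:00,14:30,Lab group 1
"""

def migrate(csv_text: str, conn: sqlite3.Connection) -> int:
    """Create a reservations table and copy every CSV row into it."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reservations (
               id INTEGER PRIMARY KEY,
               room TEXT NOT NULL,
               date TEXT NOT NULL,
               start_time TEXT NOT NULL,
               end_time TEXT NOT NULL,
               reserved_by TEXT
           )"""
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["room"], r["date"], r["start_time"], r["end_time"], r["reserved_by"])
        for r in reader
    ]
    # Parameterized inserts avoid quoting problems with spreadsheet values.
    conn.executemany(
        "INSERT INTO reservations (room, date, start_time, end_time, reserved_by) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
inserted = migrate(CSV_EXPORT, conn)
print(inserted)  # 3
```

The same two steps (define a table that matches the sheet, then bulk-insert the rows) apply whatever database and migration tool a team picks.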
6. Data Warehouse: Next practical task. One team will have to present a different solution to migrate the data. It should be a hands-on lab for the other students. I will upload the materials to my blog. Preferred time box: 45 - 90 minutes. First team will be: _________________________________ Next system: ?
7. Data Warehouse: ETL teams. 1. Team: Access - 2x Sebastian, Matthias. 2. Team: Access → Access2MySQL → MySQL - Mercedes, Fabian, Marcus, Albert. 3. Team: Silverlight → MS SQL - Sebastian, Patrick. 4. Team: PHP → MySQL - Lars, Maurice, Jeff. Next system: ?
9. Data Mining: Introduction (1/3). Data mining is done by running software that examines a database and looks for patterns in the data. A data warehouse by itself will respond to queries from users, but it will not tell users about patterns in the data that they may not have thought about. To find such patterns, data mining is used to extract key information from a data warehouse.
10. Data Mining: Introduction (2/3). Data mining allows companies to collect information that makes them more productive and helps them beat their competitors. Data mining helps to identify: why customers buy certain products; ideas for very direct marketing; ideas for shelf placement; training of employees vs. employee retention; employee benefits vs. employee retention.
11. Data Mining: Introduction (3/3). Data mining attempts to find patterns in data that we did not know about. Data mining is often just a new buzzword for statistics, but it differs from (school) statistics in that large volumes of data are used. Trivial information or well-known facts are not an aim of data mining!
18. Data Mining: Implementing DM on Top of a DW (1/2). Data mining tools / mining algorithms require data! There are two approaches: copy data from the data warehouse and mine it, or mine the data directly in the data warehouse. Popular tools use a variety of data mining algorithms: association rules, genetic algorithms, decision trees, neural networks.
19. Data Mining: Implementing DM on Top of a DW (2/2). a) Copy data from the data warehouse to the data mining tools. Advantage: data mining tools may organize the data so they can run faster. Disadvantage: moving large amounts of data can be very "expensive". b) Data mining tools access the data directly in the data warehouse. Advantage: no copy of the data is needed for data mining. Disadvantage: the data may not be organized in a way that is efficient for the tool.
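The two approaches can be contrasted in a small sketch. Assumptions: an in-memory SQLite database stands in for the data warehouse, the `sales` table and its rows are invented, and plain frequency counting stands in for a real mining algorithm. Approach a) moves every row into the tool; approach b) lets the database do the work and moves only the result.

```python
import sqlite3
from collections import Counter

# Hypothetical stand-in for a data warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (transaction_id INTEGER, product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "washer"), (1, "service agreement"), (2, "dryer"), (3, "washer")],
)

# a) Copy: pull all rows out of the warehouse and process them in the tool.
copied_rows = conn.execute("SELECT product FROM sales").fetchall()
counts_copy = Counter(product for (product,) in copied_rows)

# b) Direct: push the computation into the warehouse, fetch only the result.
counts_direct = dict(
    conn.execute("SELECT product, COUNT(*) FROM sales GROUP BY product").fetchall()
)

print(dict(counts_copy) == counts_direct)  # True
```

Both give the same answer; the trade-off on the slide is about cost. With millions of rows, a) transfers every row, while b) transfers one row per product but only works as well as the database can evaluate the query.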
20. Data Mining: The Data Mining Process. Step 1: Data preparation: cleanup ("scrubbing"), selection, and checks by data specialists (→ data warehouse). Step 2: Analysis phase: process the data with a data mining algorithm. Step 3: Evaluation of the output: check whether something new was discovered.
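The three steps can be sketched as a toy pipeline. This is an illustration only, not a real mining algorithm: `prepare`, `analyze` and `evaluate` are hypothetical names, the "analysis" merely counts product pairs bought together, and the "evaluation" removes patterns found in an assumed set of already-known facts, since trivial or well-known results are not what data mining is after.

```python
from collections import Counter
from itertools import combinations

def prepare(raw_baskets):
    """Step 1: normalize product names and drop empty baskets."""
    cleaned = []
    for basket in raw_baskets:
        items = {item.strip().lower() for item in basket if item.strip()}
        if items:
            cleaned.append(items)
    return cleaned

def analyze(baskets, min_count=2):
    """Step 2 (toy stand-in for a mining algorithm): find product pairs
    bought together in at least `min_count` baskets."""
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket), 2))
    return {pair for pair, count in pair_counts.items() if count >= min_count}

def evaluate(patterns, known_facts):
    """Step 3: keep only the patterns that are not already known."""
    return patterns - known_facts

raw = [
    ["Washer", "Service agreement"],
    ["washer ", "service agreement", "Dryer"],
    [],
    ["dryer", "detergent"],
    ["Dryer", "Detergent", " "],
]
known = {("detergent", "dryer")}  # assumed to be common knowledge already
new_patterns = evaluate(analyze(prepare(raw)), known)
print(new_patterns)  # {('service agreement', 'washer')}
```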
21. Data Mining: The Data Mining Process, Step 1 - Data preparation. It is useful to fetch data from a data warehouse: this eliminates the need to collect data from different sources, filter it, and handle inconsistencies. In theory a data warehouse is not strictly necessary, but in practice it is. The data preparation process includes data selection and manipulation. Validation and cleaning are necessary to eliminate out-of-range values and to handle missing values in the raw data. This may include plausibility checks.
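A minimal sketch of validation and cleaning, assuming records are Python dicts with an `age` field where `None` (or an absent key) marks a missing value. The valid range of 0 to 120 and the choice to fill gaps with the median are illustrative assumptions; dropping the record, using a default, or flagging it are equally legitimate strategies depending on the data. `clean_ages` is a hypothetical helper.

```python
from statistics import median

def clean_ages(records, low=0, high=120):
    """Validate and clean the 'age' field of each record.

    - Out-of-range ages (a simple plausibility check) drop the whole record.
    - Missing ages are filled with the median of the remaining valid ages
      (an assumed imputation strategy).
    """
    valid = []
    for record in records:
        age = record.get("age")
        if age is not None and not (low <= age <= high):
            continue  # implausible value: discard the record
        valid.append(record)

    known_ages = [r["age"] for r in valid if r.get("age") is not None]
    fill_value = median(known_ages) if known_ages else None

    cleaned = []
    for record in valid:
        age = record.get("age")
        cleaned.append({**record, "age": fill_value if age is None else age})
    return cleaned

raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing
    {"id": 3, "age": -5},    # out of range
    {"id": 4, "age": 52},
    {"id": 5, "age": 430},   # out of range (probably a typing error)
]
print(clean_ages(raw))
# [{'id': 1, 'age': 34}, {'id': 2, 'age': 43.0}, {'id': 4, 'age': 52}]
```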
22. Data Mining: The Data Mining Process, Step 1 - Data preparation. Even if the data warehouse data are already cleaned and filtered, experience shows that this is not good enough for data mining. Data preparation also includes formatting, scaling and transformation of the raw data, depending on the needs of the data mining algorithm. Examples: scaling of numeric data, currency conversion, or metric/inch conversion.
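Two of the transformations named above, sketched in Python: min-max scaling of a numeric column to the range [0, 1] and an inch-to-centimeter conversion (1 inch = 2.54 cm exactly). Currency conversion is left out because it needs a dated exchange rate from an external source. The function names are hypothetical.

```python
INCH_TO_CM = 2.54  # exact by definition of the international inch

def inches_to_cm(values):
    """Convert a list of lengths from inches to centimeters."""
    return [v * INCH_TO_CM for v in values]

def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0.

    A constant column carries no information for most algorithms, so it is
    mapped to all zeros instead of dividing by zero.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
print(inches_to_cm([1, 12]))
```

Scaling matters for algorithms that compare distances or weight inputs, such as clustering or neural networks, where a column measured in large units would otherwise dominate.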
23. Data Mining: The Data Mining Process, Step 1 - Data preparation. Many joined tables may be involved; rows or columns may need to be selected; two fields may be combined into a ratio; or derived values may be needed. This process needs the guidance of someone with good knowledge of the data and the problem domain. Data preparation commonly consumes 50% to 80% of the data mining budget.
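A sketch of joining tables, selecting columns and deriving values, using SQLite from Python's standard library. The `customers` and `orders` tables, their columns, and the two derived fields (average order value and return ratio) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     amount REAL, returned INTEGER);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES
  (10, 1, 120.0, 0), (11, 1, 80.0, 1), (12, 2, 50.0, 0);
""")

# Join the tables, keep only the needed columns, and derive two new
# fields per customer: the average order value and the return ratio.
rows = conn.execute("""
SELECT c.name,
       AVG(o.amount)                    AS avg_order_value,
       1.0 * SUM(o.returned) / COUNT(*) AS return_ratio
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 100.0, 0.5), ('Bob', 50.0, 0.0)]
```

The `1.0 *` factor forces floating-point division; without it SQLite would divide the two integers and truncate the ratio.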
24. Data Mining: The Data Mining Process, Step 2 - Analysis phase. Process the data with a data mining algorithm (information discovery). Algorithms built into Analysis Services: Association, Clustering, Decision Trees, Linear Regression, Logistic Regression, Naive Bayes, Neural Network.
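As a small illustration of one algorithm from this list, here is ordinary least-squares linear regression with a single predictor, written from the textbook formulas slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. It is not how Analysis Services implements its Linear Regression algorithm; `fit_line` is a hypothetical name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means x carries no information.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Joint variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0), i.e. y = 2x + 1
```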
25. Data Mining: The Data Mining Process, Step 3 - Evaluation of the output. The interpretation and presentation of the results. The purpose is either decision support or application development. Presentation: a graphical representation is often useful to present the results to executives. Example in text form: "If a customer buys washers or dryers, 61% buy a service agreement. This pattern is present in 1.0% of the transactions."
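Read in the usual association-rule terms, the example above reports two measures: the 61% is the rule's confidence (of the transactions containing a washer or a dryer, the share that also contain a service agreement), and the 1.0% is its support (the share of all transactions that contain both). A sketch computing both, with invented transactions and a hypothetical `rule_stats` helper:

```python
def rule_stats(transactions, antecedent_any, consequent):
    """Support and confidence for 'buys any of antecedent_any -> buys consequent'.

    support    = share of all transactions matching antecedent AND consequent
    confidence = share of antecedent-matching transactions that also
                 contain the consequent
    """
    matches_lhs = [t for t in transactions if t & antecedent_any]
    matches_both = [t for t in matches_lhs if consequent in t]
    support = len(matches_both) / len(transactions)
    confidence = len(matches_both) / len(matches_lhs) if matches_lhs else 0.0
    return support, confidence

# Invented transactions (each one is the set of products in a purchase).
transactions = [
    {"washer", "service agreement"},
    {"dryer"},
    {"dryer", "service agreement"},
    {"washer", "dryer", "service agreement"},
    {"detergent"},
    {"detergent", "softener"},
    {"washer"},
    {"softener"},
]
print(rule_stats(transactions, {"washer", "dryer"}, "service agreement"))
# (0.375, 0.6)
```

On this toy data the rule holds with 60% confidence and 37.5% support; on real sales data a low support like the slide's 1.0% is common, which is why both numbers are reported.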
26. Data Mining: The Data Mining Process, Step 3 - Evaluation of the output. Interpretation of the output data might be necessary. Data mining may replace a DSS/EIS (decision support system / executive information system), which is mainly a query application with a graphical display. In addition to traditional business software with a clearly visible idea and algorithm, data mining can also make it possible to build automated decision support.
27. Data Mining: The Data Mining Process, Step 3 - Evaluation of the output. Automated decision support, example: every loan application at a bank is passed to a previously trained neural network, which produces a score ranging from "loan rejected" to "loan approved". The network is trained on the results of data mining over many past loan contracts. Such algorithms may work even if the underlying processes are not well understood. Warning: neural networks are a black box!
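A sketch of the scoring idea, reduced to the smallest possible network: a single sigmoid neuron (equivalent to logistic regression) trained by stochastic gradient descent. The two features (a scaled income and a debt ratio), the training data, the learning rate and the epoch count are all invented, and the function names are hypothetical. Real loan-scoring networks have hidden layers and many more inputs, which is what makes them the black box the slide warns about; a single neuron's two weights can still be inspected directly.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, x):
    """Score between 0.0 (loan rejected) and 1.0 (loan approved)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_neuron(samples, labels, epochs=2000, lr=0.5):
    """Fit one sigmoid neuron with stochastic gradient descent on log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = score(weights, bias, x) - y  # d(log-loss)/d(pre-activation)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Invented training data: [scaled income, debt ratio], label 1 = approved.
samples = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_neuron(samples, labels)

print(score(weights, bias, [0.85, 0.15]))  # above 0.5: approve
print(score(weights, bias, [0.15, 0.85]))  # below 0.5: reject
```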