BUSINESS INTELLIGENCE
AND DATA WAREHOUSING.
Presented by:
Vaishnavi Chigarapalle.
Agenda:
• ID3 Algorithm.
• WEKA.
• Web Mining Applications for Business.
• References.
ID3 Algorithm.
Overview:
• What is ID3 ?
• Decision Trees.
• Simple example of Decision Trees.
• ID3 Algorithm.
• Problem.
• Solution to the discussed problem.
• Conclusion.
What is ID3 ?
• ID3 stands for Iterative Dichotomiser 3.
• It is an algorithm for building decision trees from a dataset.
• Invented by J. Ross Quinlan in 1979.
• Uses information theory, introduced by Claude Shannon in 1948, to choose
the attribute to split on.
• The algorithm builds the tree greedily from the top down, with no
backtracking, and tries to keep the tree as small as possible.
• ID3 is the precursor to the C4.5 algorithm.
• It is typically used in machine learning and natural language
processing.
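The information-theoretic part can be sketched in a few lines. This is a minimal illustration, assuming the usual ID3 measures (Shannon entropy and information gain); the toy data and the names `rows`, `weather` and `buys` are hypothetical and not taken from the slides.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data: does a customer buy, given the weather?
rows = [
    {"weather": "sunny", "buys": "yes"},
    {"weather": "sunny", "buys": "yes"},
    {"weather": "rainy", "buys": "no"},
    {"weather": "rainy", "buys": "no"},
]
print(entropy([r["buys"] for r in rows]))          # 1.0 (classes evenly split)
print(information_gain(rows, "weather", "buys"))   # 1.0 (weather separates them fully)
```

An attribute with a higher gain leaves purer subsets behind, which is why ID3 prefers it when choosing where to split.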
Decision trees
• The tree consists of decision nodes and leaf nodes.
• A decision node has two or more branches, each representing one value of
the attribute being tested.
• A leaf node holds a homogeneous result (a single class), which does not
require additional classification testing.
• Decision trees are produced by algorithms that identify various ways of
splitting a data set into branch-like segments.
• These segments form an inverted decision tree that originates with a root
node at the top of the tree.
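One possible way to picture this structure in code is a nested dictionary. The sketch below is only an illustration: the attribute names (`outlook`, `humidity`) and the dictionary layout are assumptions, not something the slides prescribe.

```python
# Hypothetical tree: decision nodes are dicts, leaf nodes are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": "no",
    },
}


def classify(node, example):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(node, dict):      # decision node: test one attribute
        node = node["branches"][example[node["attribute"]]]
    return node                        # leaf node: the predicted class


print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Classification starts at the root and ends as soon as a leaf is reached, so an example only needs values for the attributes on its own path.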
Simple Example of a Decision Tree.
ID3 Algorithm
• Create a root node for the tree.
• If all the examples are positive, return the single-node tree Root, with
label "+".
• If all the examples are negative, return the single-node tree Root, with
label "-".
• If the set of predicting attributes is empty, return the single-node
tree Root, labelled with the most common value of the target attribute in
the examples.
• Else
  - A = the attribute that best classifies the examples.
  - Set the decision attribute for Root to A.
  - For each possible value vi of A:
    - Add a new tree branch below Root, corresponding to the test A = vi.
    - Let Examples(vi) be the subset of examples that have the value vi
      for A.
    - If Examples(vi) is empty, add below this new branch a leaf node
      labelled with the most common target value in the examples.
    - Else add below this new branch the subtree
      ID3(Examples(vi), Target_Attribute, Attributes - {A}).
• End
• Return Root.
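The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: it assumes "best classifies" means highest information gain, generalizes the "+" and "-" cases to any single remaining class, and uses a hypothetical `domains` argument to list every possible value of each attribute. The toy data is made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(examples, attribute, target):
    """Information gain of splitting `examples` on `attribute`."""
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([e[target] for e in examples]) - remainder


def id3(examples, target, attributes, domains):
    """Return a class label (leaf) or an (attribute, {value: subtree}) pair."""
    labels = [e[target] for e in examples]
    most_common = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return most_common                   # pure node, or nothing left to test
    best = max(attributes, key=lambda a: gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:              # every possible value, seen or not
        subset = [e for e in examples if e[best] == value]
        branches[value] = (
            id3(subset, target, remaining, domains) if subset else most_common
        )
    return (best, branches)


# Hypothetical toy data; "foggy" never occurs, so its branch becomes a leaf
# labelled with the most common class.
examples = [
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
domains = {
    "outlook": ["sunny", "overcast", "rainy", "foggy"],
    "windy": ["no", "yes"],
}
tree = id3(examples, "play", ["outlook", "windy"], domains)
print(tree)
```

On this data the root tests `outlook`, and only the `sunny` branch needs a second test on `windy`; the other branches are leaves.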
Conclusion
• ID3 attempts to build the shortest decision tree from a set of training
data, but the shortest tree is not always the best classifier.
• It requires the training data to have completely consistent patterns with no
uncertainty.
WEKA
(University of Waikato)
Overview
• What is WEKA ?
• WEKA GUI Chooser.
• Data Mining with WEKA.
• Problem.
• Solution for the discussed problem.
• Conclusion
What is WEKA ?
• WEKA is an acronym for Waikato Environment for Knowledge Analysis.
• It is a popular suite of machine learning software written in Java.
• It is developed at the University of Waikato, New Zealand.
• WEKA is portable, since it is fully implemented in the Java programming
language and thus runs on almost any modern computing platform.
• WEKA is free software available under the GNU General Public License.
• WEKA's applications:
  - Explorer.
  - Knowledge Flow.
  - Experimenter.
  - Simple CLI.
WEKA GUI Chooser.
Data Mining With WEKA
• Input: raw data.
• Data mining by WEKA: pre-processing, classification, regression,
clustering, association rules, visualization.
• Output: result.
Explorer
• Explorer is WEKA's main user interface.
• The Explorer interface features several panels providing access to the main
components of the workbench:
  - Preprocess.
  - Classify.
  - Associate.
  - Cluster.
  - Select Attributes.
  - Visualize.
• Preprocess Panel: Used to transform the data and to delete instances and
attributes according to specific criteria.
• Classify Panel: Lets users apply classification and regression
algorithms to the resulting dataset and estimate the accuracy of the resulting
predictive model.
• Associate Panel: Provides access to association rule learners that
attempt to identify all important interrelationships between attributes in the
data.
• Cluster Panel: Gives access to the clustering techniques in WEKA.
• Select Attributes Panel: Provides algorithms for identifying the most
predictive attributes in a dataset.
• Visualize Panel: This panel shows a scatter plot matrix, where individual
scatter plots can be selected and enlarged, and analyzed further using
various selection operators.
Experimenter
• The Experimenter allows the systematic comparison of the predictive
performance of WEKA's machine learning algorithms on a collection of datasets.
• It also lets users set up large-scale experiments, start them
running, leave them, and then analyze the performance statistics that have
been collected.
• It automates the experimental process.
• The statistics can be stored in ARFF format.
• It allows users to distribute the computing load across multiple machines
using Java RMI.
The Experimenter
Knowledge Flow
• The Knowledge Flow provides an alternative to the Explorer as a graphical
front end to WEKA's core algorithms.
• The Knowledge Flow presents a data-flow inspired interface to WEKA.
• The user can select WEKA components from a tool bar, place them on a
layout canvas and connect them together to form a knowledge flow for
processing and analyzing data.
• Unlike the Explorer, the Knowledge Flow can handle data either
incrementally or in batches.
Knowledge-Flow
Simple CLI
• Simple CLI provides a command line mode to access WEKA.
Conclusion
• In sum, the overall goal of WEKA is to build a state-of-the-art facility for
developing machine learning (ML) techniques and to allow people to apply
them to real-world data mining problems.
• Detailed documentation about the functions provided by WEKA can
be found on the WEKA website.
WEB MINING
Overview
• What is Web mining ?
• Challenges related to web mining.
• Web mining applications.
• Problems with Web search.
• Improved search: adding structure to the web.
• Conclusion.
What is Web Mining ?
• Web mining is the use of data mining techniques to automatically discover
and extract information from web documents and services.
• It covers discovering useful information from the World Wide Web and from
its usage patterns.
• Web mining can be divided into three types:
  - Web usage mining.
  - Web content mining.
  - Web structure mining.
Challenges related to Web Mining
• The web is more than a huge collection of documents; it also carries:
  - Hyperlink information.
  - Access and usage information.
• The web is very dynamic: new pages are constantly being generated.
• The main challenge is to develop new web mining algorithms
and adapt traditional data mining algorithms to exploit hyperlinks and
access patterns.
Web Mining Applications
• E-Commerce (Infrastructure)
  - Generate user profiles.
  - Internet advertising.
  - Fraud.
  - Similar image retrieval.
• Information retrieval (search) on the web
  - Automatic generation of topic hierarchies.
  - Web knowledge bases.
  - Extraction of schema for XML documents.
• Network Management
  - Performance management.
  - Fault management.
User Profiling.
• Important for improving customization:
  - Provides users with pages and advertisements of interest.
  - Example profiles: on-line trader, on-line shopper.
• Generate user profiles based on their access patterns:
  - Cluster users based on frequently accessed URLs.
  - Use a classifier to generate a profile for each cluster.
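The clustering step might look like the following sketch. The slides do not name a clustering method, so this swaps in a simple greedy pass over Jaccard similarity of URL sets; the `visits` data, the threshold and the function names are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two URL sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_users(visits, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed user is similar enough."""
    clusters = []                      # list of (seed_urls, [user, ...])
    for user, urls in visits.items():
        for seed, members in clusters:
            if jaccard(seed, urls) >= threshold:
                members.append(user)
                break
        else:                          # no cluster matched: start a new one
            clusters.append((urls, [user]))
    return [members for _, members in clusters]


# Hypothetical access logs, reduced to the set of URLs each user visits often.
visits = {
    "u1": {"/quotes", "/portfolio", "/trade"},
    "u2": {"/quotes", "/trade"},
    "u3": {"/cart", "/checkout", "/deals"},
    "u4": {"/cart", "/deals"},
}
print(cluster_users(visits))  # [['u1', 'u2'], ['u3', 'u4']]
```

Here the two resulting groups would correspond to the "on-line trader" and "on-line shopper" example profiles; a real system would likely use a more robust clustering algorithm.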
Internet Advertising.
• Scheme 1:
  - Manually associate a set of ads with each user profile.
  - For each user, display an ad from the set based on the profile.
• Scheme 2:
  - Automate the association between ads and users.
  - Use ad click information to cluster users.
  - For each cluster, find the ads that occur most frequently in the cluster;
    these become the ads for the set of users in the cluster.
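The last step of Scheme 2 can be sketched as a frequency count per cluster. The sketch assumes the clusters have already been formed; the cluster names, click logs and the `top_n` parameter are hypothetical.

```python
from collections import Counter


def ads_per_cluster(clusters, clicks, top_n=2):
    """For each cluster of users, return the ads its members clicked most often."""
    result = {}
    for name, users in clusters.items():
        counts = Counter(ad for user in users for ad in clicks.get(user, []))
        result[name] = [ad for ad, _ in counts.most_common(top_n)]
    return result


# Hypothetical clusters and click logs.
clusters = {"traders": ["u1", "u2"], "shoppers": ["u3", "u4"]}
clicks = {
    "u1": ["broker", "broker", "news"],
    "u2": ["broker", "news"],
    "u3": ["shoes", "coupons", "shoes"],
    "u4": ["shoes"],
}
print(ads_per_cluster(clusters, clicks))
# {'traders': ['broker', 'news'], 'shoppers': ['shoes', 'coupons']}
```

Every user in a cluster is then shown ads from that cluster's list, including ads the user has not clicked yet.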
Fraud
• With the growing popularity of e-commerce, systems to detect and prevent
fraud on the web have become important.
• Maintain a signature for each user based on buying patterns on the web.
• If the buying pattern changes significantly, signal fraud.
• HNC Software uses domain knowledge and neural networks for credit card
fraud detection.
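A very simple version of the signature idea is sketched below. It assumes the signature is just the mean and standard deviation of a user's past purchase amounts and that "changes significantly" means a large z-score; both are assumptions made for illustration, and this is not how HNC's neural-network approach works.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, z_threshold=3.0):
    """Flag a purchase whose amount is far outside the user's usual range."""
    if len(history) < 2:
        return False                   # not enough data to form a signature
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu            # every past purchase had the same amount
    return abs(amount - mu) / sigma > z_threshold


# Hypothetical purchase history for one user.
history = [20, 25, 22, 30, 24]
print(is_suspicious(history, 26))    # False
print(is_suspicious(history, 500))   # True
```

A production signature would normally cover more than amounts, for example merchant categories, locations and time of day.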
Image Retrieval System
• Given:
  - A set of images.
• Find:
  - All images similar to a given image.
  - All pairs of similar images.
• A few applications of image retrieval systems are:
  - Medical diagnosis.
  - Weather prediction.
  - Web search engine for images.
  - E-commerce.
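Both queries can be sketched once each image is reduced to a feature vector. The slides do not say which features or similarity measure are used, so the sketch assumes short colour-histogram-like vectors compared by cosine similarity; the image names, vectors and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_to(query, features, threshold=0.9):
    """All images whose features are close to those of the `query` image."""
    return [name for name, vec in features.items()
            if name != query and cosine(features[query], vec) >= threshold]


def similar_pairs(features, threshold=0.9):
    """All pairs of images that are close to each other."""
    return [(a, b) for a, b in combinations(features, 2)
            if cosine(features[a], features[b]) >= threshold]


# Hypothetical three-bin colour histograms.
features = {
    "sunset1": [0.8, 0.2, 0.0],
    "sunset2": [0.7, 0.3, 0.0],
    "forest": [0.1, 0.8, 0.1],
}
print(similar_to("sunset1", features))   # ['sunset2']
print(similar_pairs(features))           # [('sunset1', 'sunset2')]
```

Comparing all pairs is quadratic in the number of images, so large collections would need an index or approximate search instead of this brute-force loop.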
Problems with Web Search
• Today's search engines are plagued by many problems, a few of which are
listed below:
  - The “abundance” problem.
  - “Limited coverage” of the web
    (the largest crawlers cover less than 18% of all web pages).
  - “Limited query” interface based on keyword-oriented search.
  - “Limited customization” to individual users.
  - The web is “highly dynamic”.
Improved searching: adding structure to the web
Conclusion
• Web mining systems need to be implemented to:
  - Understand visitors' profiles.
  - Identify the company's strengths and weaknesses.
  - Measure the effectiveness of online marketing efforts.
• Web mining supports ongoing improvement for e-businesses.
References
• http://www.slideshare.net/dataminingtools/WEKA-the-experimenter
• http://www.cs.waikato.ac.nz/ml/WEKA/arff.html
• http://en.wikipedia.org/wiki/WEKA_(machine_learning)
• http://www.cs.umd.edu/Grad/scholarlypapers/papers/Bahety.pdf
• http://software.ucv.ro/~eganea/AIR/KnowledgeFlowTutorial-3-5-8.pdf
Business intelligence and data warehousing

Weitere ähnliche Inhalte

Andere mochten auch

Management - Chapter 7 : Individual & Group Decision Making
Management - Chapter 7 : Individual & Group Decision MakingManagement - Chapter 7 : Individual & Group Decision Making
Management - Chapter 7 : Individual & Group Decision MakingUTAR
 
Decision Trees Notes
Decision Trees NotesDecision Trees Notes
Decision Trees Notesmattbentley34
 
Implementing analytics? You need decision modeling and business rules
Implementing analytics? You need decision modeling and business rulesImplementing analytics? You need decision modeling and business rules
Implementing analytics? You need decision modeling and business rulesDecision Management Solutions
 
L7 decision tree & table
L7 decision tree & tableL7 decision tree & table
L7 decision tree & tableNeha Gupta
 
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...Sunil Nair
 
Decision trees for machine learning
Decision trees for machine learningDecision trees for machine learning
Decision trees for machine learningAmr BARAKAT
 
decision making criterion
decision making criteriondecision making criterion
decision making criterionGaurav Sonkar
 
Decision tree lecture 3
Decision tree lecture 3Decision tree lecture 3
Decision tree lecture 3Laila Fatehy
 
Decision Tree Analysis
Decision Tree AnalysisDecision Tree Analysis
Decision Tree AnalysisAnand Arora
 
Food and fashion
Food and fashion Food and fashion
Food and fashion Any Ataide
 

Andere mochten auch (19)

IBM: Managerial Decision Making
IBM: Managerial Decision MakingIBM: Managerial Decision Making
IBM: Managerial Decision Making
 
Management - Chapter 7 : Individual & Group Decision Making
Management - Chapter 7 : Individual & Group Decision MakingManagement - Chapter 7 : Individual & Group Decision Making
Management - Chapter 7 : Individual & Group Decision Making
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision Trees Notes
Decision Trees NotesDecision Trees Notes
Decision Trees Notes
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Implementing analytics? You need decision modeling and business rules
Implementing analytics? You need decision modeling and business rulesImplementing analytics? You need decision modeling and business rules
Implementing analytics? You need decision modeling and business rules
 
L7 decision tree & table
L7 decision tree & tableL7 decision tree & table
L7 decision tree & table
 
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
Data Mining - Classification Of Breast Cancer Dataset using Decision Tree Ind...
 
Decision trees for machine learning
Decision trees for machine learningDecision trees for machine learning
Decision trees for machine learning
 
decision making criterion
decision making criteriondecision making criterion
decision making criterion
 
Decision tree lecture 3
Decision tree lecture 3Decision tree lecture 3
Decision tree lecture 3
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision tree example problem
Decision tree example problemDecision tree example problem
Decision tree example problem
 
Decision trees
Decision treesDecision trees
Decision trees
 
Decision Tree Analysis
Decision Tree AnalysisDecision Tree Analysis
Decision Tree Analysis
 
Decision theory
Decision theoryDecision theory
Decision theory
 
Contemporary issues 04 09-2014
Contemporary issues 04 09-2014Contemporary issues 04 09-2014
Contemporary issues 04 09-2014
 
Projects
ProjectsProjects
Projects
 
Food and fashion
Food and fashion Food and fashion
Food and fashion
 

Ähnlich wie Business intelligence and data warehousing

Data Structure and Algorithms
Data Structure and AlgorithmsData Structure and Algorithms
Data Structure and Algorithmsiqbalphy1
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusoneDotNetCampus
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATADotNetCampus
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroSi Krishan
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellSri Ambati
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013Neo4j
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 antimo musone
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskQuantUniversity
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Software design with Domain-driven design
Software design with Domain-driven design Software design with Domain-driven design
Software design with Domain-driven design Allan Mangune
 
Relationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningRelationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningNeo4j
 

Ähnlich wie Business intelligence and data warehousing (20)

Data Structure and Algorithms
Data Structure and AlgorithmsData Structure and Algorithms
Data Structure and Algorithms
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Net campus2015 antimomusone
Net campus2015 antimomusoneNet campus2015 antimomusone
Net campus2015 antimomusone
 
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATAPREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
PREDICT THE FUTURE , MACHINE LEARNING & BIG DATA
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
 
Lecture2 (1).ppt
Lecture2 (1).pptLecture2 (1).ppt
Lecture2 (1).ppt
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
Introduction
IntroductionIntroduction
Introduction
 
Machine Learning Applications in Credit Risk
Machine Learning Applications in Credit RiskMachine Learning Applications in Credit Risk
Machine Learning Applications in Credit Risk
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Software design with Domain-driven design
Software design with Domain-driven design Software design with Domain-driven design
Software design with Domain-driven design
 
Relationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine LearningRelationships Matter: Using Connected Data for Better Machine Learning
Relationships Matter: Using Connected Data for Better Machine Learning
 

Mehr von Vaishnavi

Traffic lights detecting a car that has pulled
Traffic lights detecting a car that has pulledTraffic lights detecting a car that has pulled
Traffic lights detecting a car that has pulledVaishnavi
 
Magnetic stripe on the back of credit card
Magnetic stripe on the back of credit cardMagnetic stripe on the back of credit card
Magnetic stripe on the back of credit cardVaishnavi
 
Web services using sales force.com
Web services using sales force.comWeb services using sales force.com
Web services using sales force.comVaishnavi
 
5 g wireless systems
5 g wireless systems5 g wireless systems
5 g wireless systemsVaishnavi
 
Brain storming ideas for tackling the
Brain storming ideas for tackling theBrain storming ideas for tackling the
Brain storming ideas for tackling theVaishnavi
 
Is cloud computing really ready for prime time
Is cloud computing really ready for prime timeIs cloud computing really ready for prime time
Is cloud computing really ready for prime timeVaishnavi
 
Synchronization of multihop sensor networks in the app layer
Synchronization of multihop sensor networks in the app layerSynchronization of multihop sensor networks in the app layer
Synchronization of multihop sensor networks in the app layerVaishnavi
 
Tackling the sleep problem
Tackling the sleep problemTackling the sleep problem
Tackling the sleep problemVaishnavi
 

Mehr von Vaishnavi (9)

Traffic lights detecting a car that has pulled
Traffic lights detecting a car that has pulledTraffic lights detecting a car that has pulled
Traffic lights detecting a car that has pulled
 
Magnetic stripe on the back of credit card
Magnetic stripe on the back of credit cardMagnetic stripe on the back of credit card
Magnetic stripe on the back of credit card
 
Web services using sales force.com
Web services using sales force.comWeb services using sales force.com
Web services using sales force.com
 
5 g wireless systems
5 g wireless systems5 g wireless systems
5 g wireless systems
 
Barcode
BarcodeBarcode
Barcode
 
Brain storming ideas for tackling the
Brain storming ideas for tackling theBrain storming ideas for tackling the
Brain storming ideas for tackling the
 
Is cloud computing really ready for prime time
Is cloud computing really ready for prime timeIs cloud computing really ready for prime time
Is cloud computing really ready for prime time
 
Synchronization of multihop sensor networks in the app layer
Synchronization of multihop sensor networks in the app layerSynchronization of multihop sensor networks in the app layer
Synchronization of multihop sensor networks in the app layer
 
Tackling the sleep problem
Tackling the sleep problemTackling the sleep problem
Tackling the sleep problem
 

Kürzlich hochgeladen

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Business intelligence and data warehousing

  • 1. BUSINESS INTELLIGENCE AND DATA WAREHOUSING. Presented by: Vaishnavi Chigarapalle.
  • 2. Agenda: • ID3 Algorithm. • WEKA. • Web Mining Applications for Business. • References.
  • 4. Overview: • What is ID3 ? • Decision Trees. • Simple example of Decision Trees. • ID3 Algorithm. • Problem. • Solution to the discussed problem. • Conclusion.
  • 5. What is ID3? • ID3 stands for Iterative Dichotomiser 3. • It is a mathematical algorithm for building decision trees from a dataset. • Invented by J. Ross Quinlan in 1979. • Uses information theory, introduced by Shannon in 1948. • The algorithm attempts to create the smallest possible decision tree from the top down, with no backtracking. • ID3 is the precursor to the C4.5 algorithm. • It is typically used in the machine learning and natural language processing domains.
  • 6. Decision trees • The tree consists of decision nodes and leaf nodes. • A decision node has two or more branches, each representing a value of the attribute being tested. • A leaf node produces a homogeneous result, which does not require additional classification testing. • Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. • These segments form an inverted decision tree that originates with a root node at the top of the tree.
  • 7. Simple Example of a Decision Tree.
  • 8. ID3 Algorithm • The first step creates a root node for the tree. • If all the examples are positive, return the single-node tree root with label '+'. • If all the examples are negative, return the single-node tree root with label '-'. • If the set of predicting attributes is empty, return the single-node tree root, labeled with the most common value of the target attribute. • Else: A = the attribute that best classifies the examples. Set the decision tree attribute for root to A. For each possible value, vi, of A: add a new tree branch below root, corresponding to the test A = vi.
  • 9. ID3 Algorithm • Let Examples(vi) be the subset of examples that have the value vi for A. • If Examples(vi) is empty, then below this new branch add a leaf node labeled with the most common target value in the examples. • Else below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes - {A}). • End. • Return root.
  • 10. Conclusion • ID3 attempts to make the shortest decision tree out of a set of learning data, but the shortest tree is not always the best classifier. • It requires the learning data to have completely consistent patterns with no uncertainty.
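The recursion on slides 8 and 9 can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the slides only say "the attribute that best classifies examples", so the sketch assumes information gain (the usual ID3 criterion), and the toy rows, the attribute names `outlook` and `windy`, and the `domains` argument are all hypothetical.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy (in bits) of the target column over the given rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, target, attributes, domains):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # all examples share one label
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                 # no predicting attributes left
        return majority
    # Assumption: "best classifies" means highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:
        subset = [row for row in rows if row[best] == value]
        # An empty subset becomes a leaf with the most common label.
        branches[value] = (
            id3(subset, target, remaining, domains) if subset else majority
        )
    return {best: branches}


def classify(tree, example):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree


# Hypothetical toy data, labeled '+' or '-' as on the slides.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "+"},
    {"outlook": "sunny", "windy": "yes", "play": "+"},
    {"outlook": "rainy", "windy": "no", "play": "+"},
    {"outlook": "rainy", "windy": "yes", "play": "-"},
    {"outlook": "sunny", "windy": "yes", "play": "+"},
]
domains = {"outlook": ["sunny", "overcast", "rainy"], "windy": ["no", "yes"]}
tree = id3(rows, "play", ["outlook", "windy"], domains)
```

In this toy set no row has `outlook` equal to `overcast`, so that branch exercises the "Examples(vi) is empty" case and receives the majority label.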
  • 12. Overview • What is WEKA ? • WEKA GUI Chooser. • Data Mining with WEKA. • Problem. • Solution for the discussed problem. • Conclusion
  • 13. What is WEKA? • WEKA is an acronym for Waikato Environment for Knowledge Analysis. • It is a popular suite of machine learning software written in Java. • It is developed at the University of Waikato, New Zealand. • WEKA is portable, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform. • WEKA is free software available under the GNU General Public License. • WEKA's applications: Explorer. Knowledge Flow. Experimenter. Simple CLI.
  • 15. Data Mining With WEKA • Input: raw data. • Data mining by WEKA: pre-processing, classification, regression, clustering, association rules, visualization. • Output: result.
  • 16. Explorer • Explorer is WEKA's main user interface. • The Explorer interface features several panels providing access to the main components of the workbench: Preprocess. Classify. Associate. Cluster. Select Attributes. Visualize. • Preprocess Panel: Transforms the data and makes it possible to delete instances and attributes according to specific criteria. • Classify Panel: Enables users to apply classification and regression algorithms to the resulting dataset and to estimate the accuracy of the resulting predictive model.
  • 17. • Associate Panel: Provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data. • Cluster Panel: Gives access to the clustering techniques in WEKA. • Select Attributes Panel: Provides algorithms for identifying the most predictive attributes in a dataset. • Visualize Panel: Shows a scatter plot matrix, where individual scatter plots can be selected, enlarged, and analyzed further using various selection operators.
  • 19. Experimenter • The Experimenter allows the systematic comparison of the predictive performance of WEKA's machine learning algorithms on a collection of datasets. • It also allows us to set up large-scale experiments, start them running, leave them, and then analyze the performance statistics that have been collected. • It automates the experimental process. • The statistics can be stored in ARFF format. • It allows users to distribute the computing load across multiple machines using Java RMI.
  • 21. Knowledge Flow • The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA's core algorithms. • The Knowledge Flow presents a data-flow inspired interface to WEKA. • The user can select WEKA components from a tool bar, place them on a layout canvas, and connect them together to form a knowledge flow for processing and analyzing data. • Unlike the Explorer, the Knowledge Flow can handle data either incrementally or in batches.
  • 23. Simple CLI • Simple CLI provides a command line mode to access WEKA.
  • 24. Conclusion • In sum, the overall goal of WEKA is to build a state-of-the-art facility for developing machine learning (ML) techniques and to allow people to apply them to real-world data mining problems. • Detailed documentation about the functions provided by WEKA can be found on the WEKA website.
  • 26. Overview • What is Web mining ? • Challenges related to web mining. • Web mining applications. • Problems with Web search. • Improvised search – adding structure to the web. • Conclusion.
  • 27. What is Web Mining? • Web mining is the use of data mining techniques to automatically discover and extract information from web documents and services. • It aims to discover useful information from the World Wide Web and its usage patterns. • Web mining can be divided into three different types: Web usage mining. Web content mining. Web structure mining.
  • 28. Challenges related to Web Mining • The web is more than a huge collection of documents; it also carries: Hyperlink information. Access and usage information. • The web is very dynamic: new pages are constantly being generated. • Challenge: The main challenge is to develop new web mining algorithms and adapt traditional data mining algorithms to exploit hyperlinks and access patterns.
  • 29. Web Mining Applications • E-Commerce (Infrastructure)  Generate User profiles.  Internet Advertising.  Fraud.  Similar Image Retrieval. • Information retrieval (search) on web  Automatic generation of topic hierarchies.  Web Knowledge bases.  Extraction of schema for XML documents. • Network Management  Performance Management.  Fault Management.
  • 30. User Profiling. • Important for improving customization:  Provides users with pages, advertisements of interest.  Example profiles: on-line trader, on-line shopper. • Generate user profiles based on their access patterns  Cluster users based on frequently accessed URLs  Use classifier to generate a profile for each cluster.
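The two profiling steps on slide 30 (cluster users by frequently accessed URLs, then derive a profile per cluster) might look like the sketch below. It is an illustration under stated assumptions, not the method the slides had in mind: the greedy "leader" clustering with Jaccard similarity, the 0.5 threshold, and the sample users and URLs are hypothetical, and the profile step substitutes a simple most-visited-URLs summary for the classifier the slide mentions.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two URL sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_users(visits, threshold=0.5):
    """Greedy leader clustering: a user joins the first cluster whose
    leader's URL set is at least `threshold` similar, else starts a new one."""
    clusters = []  # list of (leader_urls, member_ids)
    for user, urls in visits.items():
        for leader_urls, members in clusters:
            if jaccard(urls, leader_urls) >= threshold:
                members.append(user)
                break
        else:
            clusters.append((urls, [user]))
    return [members for _, members in clusters]


def cluster_profile(visits, members, top_n=2):
    """Summarize a cluster by its most frequently accessed URLs.
    (Stand-in for the classifier step; an assumption of this sketch.)"""
    counts = Counter(url for user in members for url in visits[user])
    return [url for url, _ in counts.most_common(top_n)]


# Hypothetical access logs: user id -> set of frequently accessed URLs.
visits = {
    "alice": {"/quotes", "/portfolio", "/news"},
    "bob": {"/quotes", "/portfolio"},
    "carol": {"/cart", "/deals"},
    "dave": {"/cart", "/deals", "/news"},
}
clusters = cluster_users(visits)
profiles = [cluster_profile(visits, members) for members in clusters]
```

With these sample logs the first cluster resembles the slide's "on-line trader" profile and the second its "on-line shopper" profile.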
  • 31. Internet Advertising. • Scheme 1:  Manually associate a set of ads with each user profile.  For each user, display an ad from the set based on profile. • Scheme 2:  Automate association between ads and users.  Use ad click information to cluster users.  For each cluster, find ads that occur most frequently in the cluster and these become the ads for the set of users in the cluster.
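The last step of Scheme 2 (for each cluster, pick the ads clicked most often by its members) can be sketched as follows. The click log, the ad names, and the `cluster_of` mapping are hypothetical; the sketch assumes users have already been assigned to clusters by some earlier step.

```python
from collections import Counter, defaultdict


def ads_per_cluster(clicks, cluster_of, top_n=2):
    """Count ad clicks per user cluster and keep the most frequent ads."""
    counts = defaultdict(Counter)
    for user, ad in clicks:
        counts[cluster_of[user]][ad] += 1
    return {
        cluster: [ad for ad, _ in counter.most_common(top_n)]
        for cluster, counter in counts.items()
    }


# Hypothetical click log: one (user, ad) pair per click.
clicks = [
    ("alice", "broker"), ("bob", "broker"),
    ("alice", "etf"), ("bob", "etf"), ("alice", "etf"),
    ("carol", "shoes"), ("dave", "shoes"), ("dave", "coupons"),
]
# Assumed to come from an earlier clustering step.
cluster_of = {
    "alice": "traders", "bob": "traders",
    "carol": "shoppers", "dave": "shoppers",
}
ad_sets = ads_per_cluster(clicks, cluster_of)
```

Each cluster's list then serves as the ad set shown to every user in that cluster.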
  • 32. Fraud • With the growing popularity of E-commerce, systems to detect and prevent fraud on the web become important. • Maintain a signature for each user based on buying patterns on the web. • If buying pattern changes significantly, then signal fraud. • HNC software uses domain knowledge and neural networks for credit card fraud detection.
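A per-user signature check of the kind slide 32 describes could be sketched like this. It is a deliberately simple stand-in, not HNC's approach: the signature is assumed to be just the mean and standard deviation of past purchase amounts, and the three-standard-deviation cutoff and sample amounts are hypothetical.

```python
import statistics


def build_signature(amounts):
    """Summarize a user's buying pattern (needs at least two purchases)."""
    return {
        "mean": statistics.mean(amounts),
        "stdev": statistics.stdev(amounts),
    }


def is_suspicious(signature, amount, max_deviations=3.0):
    """Flag a purchase that deviates strongly from the user's signature."""
    if signature["stdev"] == 0:
        return amount != signature["mean"]
    deviation = abs(amount - signature["mean"]) / signature["stdev"]
    return deviation > max_deviations


# Hypothetical purchase history for one user.
history = [20.0, 25.0, 22.0, 30.0, 28.0]
signature = build_signature(history)
typical_flag = is_suspicious(signature, 27.0)
outlier_flag = is_suspicious(signature, 400.0)
```

A production system would track far richer features (merchant, time of day, location), which is where the domain knowledge and neural networks mentioned on the slide come in.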
  • 33. Image Retrieval System • Given: A set of images. • Find: All images similar to a given image. All pairs of similar images. • A few applications of image retrieval systems are: Medical diagnosis. Weather prediction. Web search engines for images. E-commerce.
  • 34. Problems with Web Search • Today's search engines are plagued by many problems, a few of which are listed below: The "abundance" problem. "Limited coverage" of the web (the largest crawlers cover less than 18% of all web pages). "Limited query" interface based on keyword-oriented search. "Limited customization" to individual users. The web is "highly dynamic".
  • 35. Improvised searching – Adding structure to the web
  • 36. Conclusion • Web mining systems need to be implemented to: Understand visitors' profiles. Identify a company's strengths and weaknesses. Measure the effectiveness of online marketing efforts. • Web mining supports ongoing, continuous improvement for e-businesses.
  • 37. References • http://www.slideshare.net/dataminingtools/WEKA-the-experimenter • http://www.cs.waikato.ac.nz/ml/WEKA/arff.html • http://en.wikipedia.org/wiki/WEKA_(machine_learning) • http://www.cs.umd.edu/Grad/scholarlypapers/papers/Bahety.pdf • http://software.ucv.ro/~eganea/AIR/KnowledgeFlowTutorial-3-5-8.pdf