ABSTRACT
We live in the age of information, and data is among the most valuable resources of an enterprise. In today’s competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Many companies are realizing a business advantage by leveraging one of their key assets: business data. Day-to-day operational applications generate a tremendous amount of data. In addition, valuable data are available from external sources such as market research organizations, independent surveys, and quality testing labs. Studies indicate that the amount of data in a given organization doubles every five years.
Data warehousing has emerged as an increasingly popular and powerful way of applying information technology to turn these huge islands of data into meaningful information for better business. Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
This paper describes the practicalities and constraints of data mining and data warehousing, and how they advance on earlier technologies.
INTRODUCTION

Data Warehousing
• A data warehouse can be defined as any centralized data repository that can be queried for business benefit.
• Warehousing makes it possible to:
   o Extract archived operational data
   o Overcome inconsistencies between different legacy data formats
   o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
   o Incorporate additional or expert information
Data Mining
Data mining is not simply a business intelligence tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance at an aggregate level. Data warehousing and business intelligence give users a way to anticipate future trends by analyzing past patterns in organizational data. Data mining is more intuitive and allows for insight beyond what data warehousing provides. An implementation of data mining in an organization serves as a guide to uncovering inherent trends and tendencies in historical information, and it also allows for statistical predictions, groupings, and classification of data.
Typical data warehousing implementations let users ask and answer questions such as “How many sales were made, by territory, by salesperson, between May and June 1999?” Data mining lets business decision makers ask and answer questions such as “Who is my core customer for a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region, and who would purchase them, given the sales of similar products in that region?”
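To make the contrast concrete, the warehouse-style question above can be sketched as a single aggregate SQL query. The sketch uses Python's built-in sqlite3 module with an in-memory database; the `sales` table, its columns, and its rows are hypothetical, invented only for illustration.

```python
import sqlite3

# Hypothetical fact table: name, columns, and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (territory TEXT, salesperson TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Alice", "1999-05-03", 120.0),
        ("North", "Alice", "1999-06-15", 80.0),
        ("North", "Bob", "1999-05-20", 200.0),
        ("South", "Carol", "1999-06-01", 150.0),
        ("South", "Carol", "1999-07-02", 300.0),  # July: outside the date range
    ],
)

# "How many sales were made, by territory, by salesperson, between May and June 1999?"
# ISO-formatted date strings compare correctly as text, so BETWEEN works here.
rows = conn.execute(
    """
    SELECT territory, salesperson, COUNT(*) AS num_sales
    FROM sales
    WHERE sale_date BETWEEN '1999-05-01' AND '1999-06-30'
    GROUP BY territory, salesperson
    ORDER BY territory, salesperson
    """
).fetchall()

for territory, salesperson, num_sales in rows:
    print(territory, salesperson, num_sales)
```

A query like this only summarizes what already happened; the data mining questions above ask the system to find who or what is likely, which no single GROUP BY expresses.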
WHAT IS DATA MINING?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information, information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
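As a minimal sketch of what “finding correlations among fields” can mean, the code below scores every pair of numeric fields with the Pearson correlation coefficient and ranks the pairs by strength. The field names and values are hypothetical, and real data mining tools use many measures besides linear correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (neither may be constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical columns from a customer table (invented values).
fields = {
    "age": [25, 32, 47, 51, 62],
    "income": [30, 42, 60, 65, 80],
    "store_visits": [12, 9, 10, 7, 11],
}

# Score every pair of fields, then rank by the absolute strength of the correlation.
scores = {(a, b): pearson(fields[a], fields[b]) for a, b in combinations(fields, 2)}
ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

for (a, b), r in ranked:
    print(f"{a} ~ {b}: r = {r:+.2f}")
```

With dozens of fields the number of pairs grows quadratically, which is one reason the text stresses processing power.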
 Although data mining is a relatively new term, the technology is not. Companies have used powerful
 computers to sift through volumes of supermarket scanner data and analyze market research reports for
 years. However, continuous innovations in computer processing power, disk storage, and statistical
 software are dramatically increasing the accuracy of analysis while driving down the cost.
 WHAT IS DATA WAREHOUSING?
 Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralizing data maximizes user access and analysis. Dramatic technological advances are making this vision a reality for many companies, and equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
 According to Bill Inmon, author of Building the Data Warehouse and widely considered the originator of the data warehousing concept, there are generally four characteristics that describe a data warehouse:
1. Subject-oriented: data are organized by subject instead of by application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim instead of by product (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
2. Integrated: when data reside in many separate applications in the operational environment, their encoding is often inconsistent. For instance, one application might code gender as “m” and “f” and another as 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention; for example, all gender data are transformed to “m” and “f”.
3. Time-variant: the data warehouse stores data that are five to ten years old, or older, to be used for comparisons, trends, and forecasting.
4. Nonvolatile: once data are loaded into the warehouse, they are not updated; new data are added, but existing data are only read.
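The integration step for the gender example can be sketched as a per-source lookup that maps each application's codes onto the warehouse convention. The source names `billing_app` and `crm_app`, and the assumption that 0 means male and 1 means female, are hypothetical.

```python
# Hypothetical per-source mappings from each application's gender code
# to the warehouse convention ("m" / "f"). Source names are invented.
GENDER_MAPPINGS = {
    "billing_app": {"m": "m", "f": "f"},
    "crm_app": {0: "m", 1: "f"},  # assumption: 0 = male, 1 = female
}


def to_warehouse_gender(source, code):
    """Translate a source-specific gender code into the warehouse coding."""
    try:
        return GENDER_MAPPINGS[source][code]
    except KeyError:
        # Unknown source or code: fail loudly instead of loading inconsistent data.
        raise ValueError(f"unmapped gender code {code!r} from {source!r}") from None


incoming = [("billing_app", "f"), ("crm_app", 0), ("crm_app", 1)]
print([to_warehouse_gender(src, code) for src, code in incoming])
```

Keeping the mappings as data rather than scattered if-statements makes them easy to review and to extend when a new source system is added.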
 An Overview of Data Mining Techniques:
 This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme:
 1) Classical techniques such as statistics, neighborhoods, and clustering, and
 2) Next-generation techniques such as trees, networks, and rules.
 Each section will describe a number of data mining algorithms at a high level, focusing on the “big
 picture” so that the reader will be able to understand how each algorithm fits into the landscape of data
 mining techniques.
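The “neighborhoods” family of classical techniques includes nearest-neighbor prediction: a new record is classified by looking at the most similar records whose outcome is already known. Below is a minimal k-nearest-neighbor sketch under stated assumptions: Euclidean distance on unscaled features, a majority vote among the k closest records, and a hypothetical set of customer records.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors.

    `training` is a list of (feature_vector, label) pairs. Features are used
    unscaled here; in practice they would usually be normalized first.
    """
    neighbors = sorted(training, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (age, yearly purchases in hundreds) -> responded to a mailing?
customers = [
    ((25, 2.0), "no"),
    ((30, 2.5), "no"),
    ((45, 8.0), "yes"),
    ((50, 9.5), "yes"),
    ((52, 7.0), "yes"),
]

print(knn_predict(customers, (48, 8.5)))  # the three nearest neighbors are all "yes"
```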
 HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
 Extracting meaningful information from numerous databases and cross-referencing it to find patterns,
 trends and correlations that might otherwise be overlooked is called “data mining.” Assembling the
 information in one place is called “data warehousing.”
Figure: Data mining and data warehousing

1. All the information is stored in information repositories.
2. The data warehouse receives the cleaned and integrated data.
3. Data in the warehouse are selected and transformed, and the relevant data are passed to data mining.
4. The results of data mining are evaluated and presented.
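The four steps above can be sketched as a chain of small functions. This is a toy illustration, not a description of any particular product: the two source lists, the cleaning rules, and the deliberately simple mining step (counting items bought at least twice) are all hypothetical.

```python
from collections import Counter

# Step 1: hypothetical raw records held in two information repositories (invented data).
store_a = [{"customer": " Ann ", "item": "Milk"}, {"customer": "Bob", "item": "bread"}]
store_b = [{"customer": "ann", "item": "BREAD"}, {"customer": "Cy", "item": None}]


def clean_and_integrate(*sources):
    """Step 2: normalize text fields and drop records with a missing item."""
    warehouse = []
    for source in sources:
        for rec in source:
            if rec["item"] is None:
                continue
            warehouse.append({
                "customer": rec["customer"].strip().lower(),
                "item": rec["item"].strip().lower(),
            })
    return warehouse


def select_and_transform(warehouse):
    """Step 3: keep only the field the mining task needs (here, the item)."""
    return [rec["item"] for rec in warehouse]


def mine(items, min_count=2):
    """Step 4a: a deliberately simple mining step, frequent items by count."""
    return {item: n for item, n in Counter(items).items() if n >= min_count}


def evaluate_and_present(patterns):
    """Step 4b: rank and report the discovered patterns."""
    for item, n in sorted(patterns.items(), key=lambda kv: -kv[1]):
        print(f"{item}: bought {n} times")


patterns = mine(select_and_transform(clean_and_integrate(store_a, store_b)))
evaluate_and_present(patterns)
```

Note that “Milk” and “BREAD” only count correctly because step 2 normalized them; without cleaning, the mining step would see three different items.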
 APPLICATIONS
 Data Warehousing
 • Insulate data – i.e. the current operational information
    o Preserves the security and integrity of mission-critical OLTP applications
    o Gives access to the broadest possible base of data
 • Retrieve data – from a variety of heterogeneous operational databases
    o Data are transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
    o Metadata – information describing the model and definition of the source data elements
 • Data cleansing – removal of certain aspects of operational data, such as low-level transaction information, that slow down query times
 • Transfer – processed data are transferred to the data warehouse, a large database on a high-performance machine
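The retrieve, transform, cleanse, and transfer steps listed above are commonly called extract-transform-load (ETL). A minimal sketch, assuming a hypothetical mapping definition that renames and retypes source columns (and doubles as simple metadata about the source elements), and a cleansing step that summarizes transaction-level rows into daily totals per store:

```python
from collections import defaultdict

# Hypothetical mapping definition (the "model"): source column -> (target column, type).
# Kept as plain data so it can also serve as metadata describing the source elements.
MAPPING = {
    "TXN_DT": ("sale_date", str),
    "STORE_NO": ("store_id", int),
    "AMT": ("amount", float),
}


def transform(source_row):
    """Rename and retype one operational row according to MAPPING."""
    return {target: cast(source_row[src]) for src, (target, cast) in MAPPING.items()}


def cleanse(rows):
    """Drop transaction-level detail by summarizing amounts per store and day."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["store_id"], row["sale_date"])] += row["amount"]
    return [
        {"store_id": store, "sale_date": day, "total": total}
        for (store, day), total in sorted(totals.items())
    ]


# Hypothetical rows retrieved from an operational (OLTP) system, all stored as text.
operational = [
    {"TXN_DT": "1999-05-03", "STORE_NO": "7", "AMT": "20.00"},
    {"TXN_DT": "1999-05-03", "STORE_NO": "7", "AMT": "5.00"},
    {"TXN_DT": "1999-05-04", "STORE_NO": "7", "AMT": "10.00"},
]

# In this sketch, "transfer" is simply producing the summarized warehouse table.
warehouse_table = cleanse(transform(r) for r in operational)
print(warehouse_table)
```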
 Data Mining
 • Medicine – drug side effects, hospital cost analysis, genetic sequence analysis, prediction, etc.
 • Finance – stock market prediction, credit assessment, fraud detection, etc.
 • Marketing/sales – product analysis, buying patterns, sales prediction, targeted mailing, identifying “unusual behavior”, etc.
 • Knowledge acquisition
 • Scientific discovery – superconductivity research, etc.
 • Engineering – automotive diagnostic expert systems, fault detection, etc.
 ADVANTAGES:
1. Enhances end-user access to a wide variety of data.
2. Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area or country over the last two years.
 A data warehouse can be a significant enabler of commercial business applications, most notably customer relationship management (CRM).
 DISADVANTAGES:
 Data mining systems rely on databases to supply the raw data for input, and this raises problems because databases tend to be dynamic, incomplete, noisy, and large. Other problems arise from the adequacy and relevance of the information stored.
Limited Information
A database is often designed for purposes other than data mining, and sometimes the properties or attributes that would simplify the learning task are neither present nor obtainable from the real world. Incomplete data cause problems: if some attributes essential to knowledge about the application domain are missing from the data, it may be impossible to discover significant knowledge about that domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the patients’ red blood cell counts.
Missing data can be treated by discovery systems in a number of ways, such as:
 • Simply disregarding missing values
 • Omitting the corresponding records
 • Inferring missing values from known values
 • Treating missing data as a special value to be included in the attribute domain
 • Averaging over the missing values using Bayesian techniques
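Three of the strategies above can be sketched in a few lines: omitting records, inferring values from known values (here with the simplest possible inference, the mean of the known values, which also disregards the missing ones while computing it), and treating “missing” as a special value. The record layout and numbers are hypothetical, and the Bayesian approach is not shown.

```python
from statistics import mean

# Hypothetical patient records; None marks a missing red blood cell count.
records = [
    {"id": 1, "rbc": 4.5},
    {"id": 2, "rbc": None},
    {"id": 3, "rbc": 5.0},
    {"id": 4, "rbc": 5.5},
]


def omit_records(rows, field):
    """Omit the corresponding records: keep only rows where `field` is present."""
    return [r for r in rows if r[field] is not None]


def impute_mean(rows, field):
    """Infer missing values from known values (here, the mean of the known ones)."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def flag_missing(rows, field, marker="MISSING"):
    """Treat missing data as a special value added to the attribute's domain."""
    return [{**r, field: marker if r[field] is None else r[field]} for r in rows]


print(omit_records(records, "rbc"))
print(impute_mean(records, "rbc"))
print(flag_missing(records, "rbc"))
```

Each function returns new records and leaves the originals untouched, so the strategies can be compared side by side. Mean imputation is convenient but can hide real variation, which is why more careful inference methods exist.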
FUTURE VIEWS
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties, and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation), and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics has successfully spread into applications that support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory, and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, to segment and predict customer behavior, and to forecast product demand and related market dynamics. Be realistic about the complex mixture of business acumen, statistical processing, and information technology support that is required, as well as the fragility of the resulting predictive model; but make no assumptions about the limits of predictive analytics. Breakthroughs often occur when the tools and methods are applied to new commercial opportunities.
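Market basket optimization is the classic use of association rules (the “rules” among the next-generation techniques named in the overview of data mining techniques). A rule X -> Y is usually scored by its support, the fraction of baskets containing both X and Y, and its confidence, support(X and Y) divided by support(X). The sketch below enumerates rules over item pairs by brute force, which is simpler than (and not the same as) algorithms such as Apriori that scale to many items; the baskets and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical shopping baskets (invented data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(min_support=0.5, min_confidence=0.6):
    """Yield (lhs, rhs, support, confidence) for pair rules meeting both thresholds."""
    items = sorted(set().union(*baskets))
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                yield lhs, rhs, pair_support, confidence


for lhs, rhs, s, c in pair_rules():
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```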
Figure: Data mining and data warehousing, future views

CONCLUSION:
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute-force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems, and data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
 • Data mining has a lot of potential.
 • It can be applied in a diverse range of fields.
 • The estimated market for data mining is $500 million.

Weitere ähnliche Inhalte

Was ist angesagt?

Data mining slides
Data mining slidesData mining slides
Data mining slidessmj
 
Significance of Data Mining
Significance of Data MiningSignificance of Data Mining
Significance of Data Mining8trackweb
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 
Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Seerat Malik
 
Key Principles Of Data Mining
Key Principles Of Data MiningKey Principles Of Data Mining
Key Principles Of Data Miningtobiemuir
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining Sushil Kulkarni
 
Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining techniquePawneshwar Datt Rai
 
Business intelligence and data warehousing
Business intelligence and data warehousingBusiness intelligence and data warehousing
Business intelligence and data warehousingOZ Assignment help
 
Data mining by_ashok
Data mining by_ashokData mining by_ashok
Data mining by_ashokAshok Kumar
 
Data mining
Data mining Data mining
Data mining AthiraR23
 
Information Technology Data Mining
Information Technology Data MiningInformation Technology Data Mining
Information Technology Data Miningsamiksha sharma
 
A Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining PresentationA Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining Presentationmillerca2
 
Data mining tutorial
Data mining tutorialData mining tutorial
Data mining tutorialgrinu
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Harish Chand
 

Was ist angesagt? (20)

Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Data mining
Data miningData mining
Data mining
 
Significance of Data Mining
Significance of Data MiningSignificance of Data Mining
Significance of Data Mining
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Data Mining: What is Data Mining?
Data Mining: What is Data Mining?
 
Data mining
Data miningData mining
Data mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Key Principles Of Data Mining
Key Principles Of Data MiningKey Principles Of Data Mining
Key Principles Of Data Mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 
Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining technique
 
Business intelligence and data warehousing
Business intelligence and data warehousingBusiness intelligence and data warehousing
Business intelligence and data warehousing
 
Data mining by_ashok
Data mining by_ashokData mining by_ashok
Data mining by_ashok
 
Data mining
Data mining Data mining
Data mining
 
Information Technology Data Mining
Information Technology Data MiningInformation Technology Data Mining
Information Technology Data Mining
 
A Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining PresentationA Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining Presentation
 
Data mining tutorial
Data mining tutorialData mining tutorial
Data mining tutorial
 
Data mining
Data mining Data mining
Data mining
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Datamining
DataminingDatamining
Datamining
 

Andere mochten auch

Andere mochten auch (11)

Graph (edit)
Graph (edit)Graph (edit)
Graph (edit)
 
UB0203
UB0203UB0203
UB0203
 
Think aloud
Think aloudThink aloud
Think aloud
 
Think aloud
Think aloudThink aloud
Think aloud
 
UB0203
UB0203 UB0203
UB0203
 
UB0203
UB0203UB0203
UB0203
 
Findings
FindingsFindings
Findings
 
Chapter02
Chapter02Chapter02
Chapter02
 
Findings
FindingsFindings
Findings
 
Chapter01
Chapter01Chapter01
Chapter01
 
Melayu islam beraja pb1501
Melayu islam beraja pb1501Melayu islam beraja pb1501
Melayu islam beraja pb1501
 

Ähnlich wie Abstract

Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Data warehouse
Data warehouseData warehouse
Data warehouseMR Z
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeCognizant
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
notes_dmdw_chap1.docx
notes_dmdw_chap1.docxnotes_dmdw_chap1.docx
notes_dmdw_chap1.docxAbshar Fatima
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeThomas Kelly, PMP
 
Data Warehouse: A Primer
Data Warehouse: A PrimerData Warehouse: A Primer
Data Warehouse: A PrimerIJRTEMJOURNAL
 
Data Mining – A Perspective Approach
Data Mining – A Perspective ApproachData Mining – A Perspective Approach
Data Mining – A Perspective ApproachIRJET Journal
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingwork
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Toolsijsrd.com
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business IntelligenceSukirti Garg
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyKate Campbell
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data AnalyticsUtkarsh Sharma
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxDrNilimaThakur
 

Ähnlich wie Abstract (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
notes_dmdw_chap1.docx
notes_dmdw_chap1.docxnotes_dmdw_chap1.docx
notes_dmdw_chap1.docx
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
Data Warehouse: A Primer
Data Warehouse: A PrimerData Warehouse: A Primer
Data Warehouse: A Primer
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
 
Data Mining – A Perspective Approach
Data Mining – A Perspective ApproachData Mining – A Perspective Approach
Data Mining – A Perspective Approach
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Tools
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing Technology
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
AIS 3 - EDITED.pdf
AIS 3 - EDITED.pdfAIS 3 - EDITED.pdf
AIS 3 - EDITED.pdf
 
BVRM 402 IMS UNIT V
BVRM 402 IMS UNIT VBVRM 402 IMS UNIT V
BVRM 402 IMS UNIT V
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptx
 
Data Science
Data ScienceData Science
Data Science
 

Kürzlich hochgeladen

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Kürzlich hochgeladen (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Abstract

ABSTRACT
We live in the age of information. Data is the most valuable resource of an enterprise. In today’s competitive global business environment, understanding and managing enterprise-wide information is crucial for making timely decisions and responding to changing business conditions. Many companies are realizing a business advantage by leveraging one of their key assets: business data. Day-to-day operational applications generate a tremendous amount of data. In addition, valuable data is available from external sources such as market research organizations, independent surveys and quality testing labs. Studies indicate that the amount of data in a given organization doubles every 5 years.
Data warehousing has emerged as an increasingly popular and powerful way of applying information technology to turn these huge islands of data into meaningful information for better business. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
This paper describes the practicalities and constraints of data mining and data warehousing, and their advances over earlier technologies.
INTRODUCTION
Data Warehousing
    A data warehouse can be defined as any centralized data repository which can be queried for business benefit.
    Warehousing makes it possible to:
   o Extract archived operational data
   o Overcome inconsistencies between different legacy data formats
   o Integrate data throughout an enterprise, regardless of location, format, or communication requirements
   o Incorporate additional or expert information
Data Mining
Data mining is not a business “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance on an aggregate level. Data warehousing and business intelligence provide a method for users to anticipate future trends by analyzing past patterns in organizational data. Data mining is more intuitive, allowing for insight beyond what data warehousing alone provides. An implementation of data mining in an organization serves as a guide to uncover inherent trends and tendencies in historical information, and also allows statistical predictions, grouping and classification of data.
Typical data warehousing implementations allow users to ask and answer questions such as “How many sales were made, by territory and by salesperson, between May and June 1999?” Data mining allows business decision makers to ask and answer questions such as “Who is the core customer that purchases a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region, and who would purchase them, given the sales of similar products in that region?”
WHAT IS DATA MINING?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data.
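The warehouse-style sales question quoted above is a fixed aggregate over known dimensions (territory, salesperson, date range). A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical `sales` table with `territory`, `salesperson`, `sale_date` and `amount` columns filled with made-up rows:

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory fact table holding a few illustrative (made-up) sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (territory TEXT, salesperson TEXT, sale_date TEXT, amount REAL)"
    )
    rows = [
        ("North", "Alice", "1999-05-03", 120.0),
        ("North", "Alice", "1999-06-11", 80.0),
        ("North", "Bob", "1999-05-20", 50.0),
        ("South", "Carol", "1999-06-02", 200.0),
        ("South", "Carol", "1999-07-15", 999.0),  # July: outside the May-June window
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn


def sales_by_territory_and_person(conn: sqlite3.Connection, start: str, end: str):
    """Count sales per territory and salesperson between two ISO dates (inclusive)."""
    query = """
        SELECT territory, salesperson, COUNT(*) AS n_sales
        FROM sales
        WHERE sale_date BETWEEN ? AND ?
        GROUP BY territory, salesperson
        ORDER BY territory, salesperson
    """
    return conn.execute(query, (start, end)).fetchall()


if __name__ == "__main__":
    conn = build_demo_warehouse()
    for row in sales_by_territory_and_person(conn, "1999-05-01", "1999-06-30"):
        print(row)
```

The grouping columns are known before the query is written. A data mining question such as “Who is the core customer?” has no predetermined GROUP BY; the grouping itself has to be discovered from the data.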
Data mining software allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.
WHAT IS DATA WAREHOUSING?
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies, and equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.
According to Bill Inmon, author of Building the Data Warehouse and widely considered the originator of the data warehousing concept, there are generally four characteristics that describe a data warehouse:
1. Subject-oriented: data are organized according to subject instead of application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim, instead of by product (auto, life, etc.). The data organized by subject contain only the information necessary for decision support processing.
2. Integrated: when data reside in many separate applications in the operational environment, the encoding of data is often inconsistent. For instance, one application might code gender as “m” and “f”, another as 0 and 1. When data are moved from the operational environment into the data warehouse, they assume a consistent coding convention, e.g. all gender data are transformed to “m” and “f”.
3. Time-variant: the data warehouse contains a place for storing data that are five to ten years old, or older, to be used for comparisons, trends, and forecasting.
4. Non-volatile: once loaded, these data are not updated.
An Overview of Data Mining Techniques
This overview provides a description of some of the most common data mining algorithms in use today. We have broken the discussion into two sections, each with a specific theme: 1) Classical Techniques such as statistics, neighborhoods and clustering, and 2) Next Generation Techniques such as trees, networks and rules. Each section describes a number of data mining algorithms at a high level, focusing on the “big picture” so that the reader can understand how each algorithm fits into the landscape of data mining techniques.
HOW DO DATA MINING AND DATA WAREHOUSING WORK TOGETHER?
Extracting meaningful information from numerous databases and cross-referencing it to find patterns, trends and correlations that might otherwise be overlooked is called “data mining.” Assembling the information in one place is called “data warehousing.”
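The “Integrated” characteristic of a warehouse amounts to a recoding step applied while data are loaded. A minimal sketch, assuming two hypothetical source applications (`app_a`, `app_b`); the assumption that `app_b` uses 0 for male and 1 for female is purely illustrative:

```python
# Hypothetical per-source code tables. The 0/1 meaning for app_b is an assumption.
GENDER_MAPS = {
    "app_a": {"m": "m", "f": "f", "M": "m", "F": "f"},
    "app_b": {0: "m", 1: "f"},
}


def integrate_gender(source: str, raw) -> str:
    """Translate a source-specific gender code into the warehouse convention ("m"/"f")."""
    try:
        return GENDER_MAPS[source][raw]
    except KeyError:
        # Unknown source or unknown code: fail loudly instead of loading bad data.
        raise ValueError(f"unknown gender code {raw!r} from source {source!r}") from None


records = [("app_a", "F"), ("app_b", 0), ("app_b", 1), ("app_a", "m")]
print([integrate_gender(src, code) for src, code in records])  # ['f', 'm', 'f', 'm']
```

Rejecting unknown codes rather than guessing keeps inconsistent source data from silently entering the warehouse.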
Datamining and Data warehousing
1. All the information is stored in information repositories.
2. The data warehouse takes in the cleaned and integrated data.
3. The data in the warehouse are selected and transformed, and the relevant data are passed on to data mining.
4. The output of data mining is evaluated and presented.
APPLICATIONS
Data Warehousing
    Insulate data – i.e. the current operational information
   o Preserves the security and integrity of mission-critical OLTP applications
   o Gives access to the broadest possible base of data
    Retrieve data – from a variety of heterogeneous operational databases
   o Data is transformed and delivered to the data warehouse/store based on a selected model (or mapping definition)
   o Metadata – information describing the model and definition of the source data elements
    Data cleansing – removal of certain aspects of operational data, such as low-level transaction information, which slow down query times
    Transfer – processed data is transferred to the data warehouse, a large database on a high-performance machine
Data Mining
    Medicine – drug side effects, hospital cost analysis, genetic sequence analysis, prediction, etc.
    Finance – stock market prediction, credit assessment, fraud detection, etc.
    Marketing/sales – product analysis, buying patterns, sales prediction, target mailing, identifying ‘unusual behavior’, etc.
    Knowledge acquisition
    Scientific discovery – superconductivity research, etc.
    Engineering – automotive diagnostic expert systems, fault detection, etc.
ADVANTAGES:
1. Enhances end-user access to a wide variety of data.
2. Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area or country over the last two years.
A data warehouse can be a significant enabler of commercial business applications, most notably Customer Relationship Management (CRM).
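The four numbered steps at the start of this section (repositories, cleaning and integration into the warehouse, selection and transformation, mining, then evaluation and presentation) can be sketched end to end. This is a toy illustration, not the paper's method: the two store lists are hypothetical, and the mining step is pair-support counting, a simplified first pass of market-basket analysis rather than a full Apriori implementation.

```python
from collections import Counter
from itertools import combinations

# Step 1: hypothetical information repositories (two operational sources).
store_a = [{"region": "east", "items": ["Bread", "Milk"]},
           {"region": "east", "items": ["bread", "milk", "eggs"]}]
store_b = [{"region": "east", "items": ["MILK", "eggs"]},
           {"region": "west", "items": ["bread", "butter"]},
           {"region": "east", "items": None}]  # incomplete record


def clean_and_integrate(*sources):
    """Step 2: drop incomplete records and normalise item names into one warehouse list."""
    warehouse = []
    for source in sources:
        for rec in source:
            if not rec.get("items"):
                continue  # cleaning: skip records with no basket
            items = sorted({item.strip().lower() for item in rec["items"]})
            warehouse.append({"region": rec["region"], "items": items})
    return warehouse


def select_and_transform(warehouse, region):
    """Step 3: select only the baskets relevant to the question (one region)."""
    return [rec["items"] for rec in warehouse if rec["region"] == region]


def mine_pairs(baskets, min_support):
    """Step 3 (mining): keep item pairs whose support (share of baskets) meets the threshold."""
    counts = Counter(pair for basket in baskets for pair in combinations(basket, 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def present(patterns):
    """Step 4: rank patterns by support (ties broken alphabetically) and format them."""
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in ranked]


if __name__ == "__main__":
    baskets = select_and_transform(clean_and_integrate(store_a, store_b), "east")
    print(present(mine_pairs(baskets, min_support=0.5)))
```

Keeping each step as a separate function mirrors the separation in the numbered list: the warehouse is built once, while selection and mining can be rerun for different questions.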
DISADVANTAGES:
Data mining systems rely on databases to supply the raw data for input, and this raises problems because databases tend to be dynamic, incomplete, noisy, and large. Other problems arise from the adequacy and relevance of the information stored.
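Because databases are often incomplete, a discovery system has to decide what to do with missing values. A minimal sketch of four simple treatments (ignoring missing values, dropping incomplete records, filling with the mean of the known values as one possible inference rule, and recording “missing” as a special value), assuming hypothetical patient records in which `None` marks an absent red blood cell count. Bayesian averaging over missing values is not shown.

```python
from statistics import mean

MISSING = "missing"  # sentinel used by the "special value" treatment

# Hypothetical patient records; None marks a missing red blood cell count.
records = [
    {"id": 1, "rbc": 4.5},
    {"id": 2, "rbc": None},
    {"id": 3, "rbc": 5.1},
    {"id": 4, "rbc": 4.8},
]


def disregard(recs, field):
    """Use only the known values of `field`; records themselves are left untouched."""
    return [r[field] for r in recs if r[field] is not None]


def omit_records(recs, field):
    """Drop every record whose `field` is missing."""
    return [r for r in recs if r[field] is not None]


def infer_mean(recs, field):
    """Fill missing values with the mean of the known values (one simple inference rule)."""
    fill = mean(disregard(recs, field))
    return [{**r, field: fill if r[field] is None else r[field]} for r in recs]


def special_value(recs, field):
    """Treat 'missing' as an extra value in the attribute's domain."""
    return [{**r, field: MISSING if r[field] is None else r[field]} for r in recs]
```

Each function returns new lists and dictionaries, so the original records stay unchanged and the treatments can be compared side by side.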
Limited Information
A database is often designed for purposes different from data mining, and sometimes the properties or attributes that would simplify the learning task are not present and cannot be requested from the real world. Inconclusive data causes problems because if some attributes essential to knowledge about the application domain are not present in the data, it may be impossible to discover significant knowledge about a given domain. For example, one cannot diagnose malaria from a patient database if that database does not contain the patients’ red blood cell counts.
Missing data can be treated by discovery systems in a number of ways, such as:
   o Simply disregard missing values
   o Omit the corresponding records
   o Infer missing values from known values
   o Treat missing data as a special value to be included additionally in the attribute domain
   o Average over the missing values using Bayesian techniques
FUTURE VIEWS
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics has successfully proliferated into applications that support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics.
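Churn management, one of the applications listed above, can be approached with the nearest-neighbour (“neighborhoods”) technique named among the classical data mining techniques. A minimal sketch, assuming a hypothetical customer history described by two made-up features, monthly spend and number of support calls:

```python
from collections import Counter
from math import dist

# Hypothetical history: (monthly_spend, support_calls) -> did the customer churn?
HISTORY = [
    ((20.0, 5), "churn"),
    ((25.0, 4), "churn"),
    ((22.0, 6), "churn"),
    ((80.0, 0), "stay"),
    ((75.0, 1), "stay"),
    ((90.0, 1), "stay"),
]


def predict(customer, history=HISTORY, k=3):
    """Label a customer by majority vote among the k most similar past customers.

    k is odd, so a vote between the two classes cannot tie.
    """
    neighbours = sorted(history, key=lambda item: dist(item[0], customer))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict((24.0, 5)))  # churn
    print(predict((85.0, 0)))  # stay
```

The features are not rescaled here, so spend dominates the Euclidean distance; a realistic model would normalise features first and validate the choice of k on held-out data.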
Predictive analytics should be used to get to know the customer, to segment and predict customer behavior, and to forecast product demand and related market dynamics. Be realistic about the required complex mixture of business acumen, statistical processing and information technology support, as well as the fragility of the resulting predictive model, but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.
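Forecasting product demand can be as simple as extrapolating a trend. A minimal sketch, assuming hypothetical monthly unit sales; it fits an ordinary least-squares line through (month index, demand) and extends it forward. This is only an illustration of the idea, not a method the paper prescribes:

```python
def fit_trend(demand):
    """Ordinary least-squares line through (t, demand[t]) for t = 0..n-1.

    Returns (intercept, slope).
    """
    n = len(demand)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    d_mean = sum(demand) / n
    cov = sum((t - t_mean) * (d - d_mean) for t, d in enumerate(demand))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return d_mean - slope * t_mean, slope


def forecast(demand, periods_ahead):
    """Extend the fitted trend line `periods_ahead` periods past the last observation."""
    intercept, slope = fit_trend(demand)
    n = len(demand)
    return [intercept + slope * (n + h) for h in range(periods_ahead)]


if __name__ == "__main__":
    monthly_units = [100, 110, 120, 130]  # hypothetical demand history
    print(forecast(monthly_units, 2))  # [140.0, 150.0]
```

A straight-line trend illustrates the fragility mentioned above: it ignores seasonality and will extrapolate confidently from too little history.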
CONCLUSION:
Comprehensive data warehouses that integrate operational data with customer, supplier, and market information have resulted in an explosion of information. Competition requires timely and sophisticated analysis on an integrated view of the data. However, there is a growing gap between more powerful storage and retrieval systems and the users’ ability to effectively analyze and act on the information they contain. Both relational and OLAP technologies have tremendous capabilities for navigating massive data warehouses, but brute-force navigation of data is not enough. A new technological leap is needed to structure and prioritize information for specific end-user problems. Data mining tools can make this leap. Quantifiable business benefits have been proven through the integration of data mining with current information systems, and new products are on the horizon that will bring this integration to an even wider audience of users.
   o Data mining has a lot of potential
   o Diversity in the field of application
   o Estimated market for data mining is $500 million