LITERATURE SURVEY

A literature survey is the most important step in the software development
process. Before developing the tool, it is necessary to determine the time
factor, the economics, and the company's strength. Once these are settled,
the next step is to determine which operating system and language can be used
to develop the tool. Once the programmers start building the tool, they need
a lot of external support. This support can be obtained from senior
programmers, from books, or from websites. These considerations are taken
into account before building the proposed system.

The survey begins with an outline of data mining:

Data Mining

Generally, data mining (sometimes called data or knowledge discovery) is
the process of analyzing data from different perspectives and summarizing it
into useful information - information that can be used to increase revenue,
cut costs, or both. Data mining software is one of a number of analytical
tools for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships
identified. Technically, data mining is the process of finding correlations or
patterns among dozens of fields in large relational databases.


The Scope of Data Mining

Data mining derives its name from the similarities between searching for
valuable business information in a large database — for example, finding
linked products in gigabytes of store scanner data — and mining a mountain
for a vein of valuable ore. Both processes require either sifting through an
immense amount of material, or intelligently probing it to find exactly where
the value resides. Given databases of sufficient size and quality, data mining
technology can generate new business opportunities by providing these
capabilities:

   •   Automated prediction of trends and behaviors. Data mining
       automates the process of finding predictive information in large
       databases. Questions that traditionally required extensive hands-on
       analysis can now be answered directly and quickly from the data. A
       typical example of a predictive problem is targeted marketing. Data
       mining uses data on past promotional mailings to identify the targets
       most likely to maximize return on investment in future mailings.
       Other predictive problems include forecasting bankruptcy and other
       forms of default, and identifying segments of a population likely to
       respond similarly to given events.

   •   Automated discovery of previously unknown patterns. Data
       mining tools sweep through databases and identify previously hidden
       patterns in one step. An example of pattern discovery is the analysis
       of retail sales data to identify seemingly unrelated products that are
       often purchased together. Other pattern discovery problems include
       detecting fraudulent credit card transactions and identifying
       anomalous data that could represent data entry keying errors.
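
The targeted-marketing example in the first bullet above can be sketched in a few lines. This is only an illustration under stated assumptions: the segment names, the mailing records, and the function `response_rates` are hypothetical, and a real system would use a trained predictive model rather than raw response rates.

```python
from collections import defaultdict

def response_rates(past_mailings):
    """Return the fraction of recipients who responded, per segment."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in past_mailings:
        sent[segment] += 1
        responded[segment] += int(did_respond)
    return {segment: responded[segment] / sent[segment] for segment in sent}

# Hypothetical history of past promotional mailings: (segment, responded?)
past_mailings = [
    ("students", True),
    ("students", False),
    ("retirees", False),
    ("retirees", False),
    ("families", True),
    ("families", True),
]

rates = response_rates(past_mailings)
# Segments ordered from most to least promising for the next mailing
best_first = sorted(rates, key=rates.get, reverse=True)
print(best_first)
```

Ranking segments by past response rate is the simplest possible stand-in for "identifying the targets most likely to maximize return on investment"; it ignores mailing cost and order value.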

The most commonly used techniques in data mining are:

   •   Artificial neural networks: Non-linear predictive models that learn
       through training and resemble biological neural networks in structure.

   •   Decision trees: Tree-shaped structures that represent sets of decisions.
       These decisions generate rules for the classification of a dataset.
       Specific decision tree methods include Classification and Regression
       Trees (CART) and Chi Square Automatic Interaction Detection
       (CHAID) .

   •   Genetic algorithms: Optimization techniques that use processes such as
       genetic combination, mutation, and natural selection in a design based
       on the concepts of evolution.

   •   Nearest neighbor method: A technique that classifies each record in
       a dataset based on a combination of the classes of the k record(s) most
       similar to it in a historical dataset (where k ≥ 1). Sometimes called the
       k-nearest neighbor technique.

   •   Rule induction: The extraction of useful if-then rules from data based
       on statistical significance.
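
As a concrete illustration of one technique from the list above, the nearest neighbor method can be sketched as follows. This is a minimal sketch, assuming numeric features, Euclidean distance, and a simple majority vote as the way of combining the classes of the k nearest records; the records and the function name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, history, k=3):
    """Classify `query` by majority vote among its k most similar records.

    `history` is a list of (features, label) pairs.
    """
    # Sort historical records by Euclidean distance to the query, keep k
    nearest = sorted(history, key=lambda rec: math.dist(query, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical dataset: two numeric features and a class label
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

print(knn_classify((1.1, 1.0), history, k=3))
```

Other distance measures and weighted votes are equally valid choices; the sketch only shows the basic shape of the method.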

Architecture for Data Mining

To best apply these advanced techniques, they must be fully integrated with
a data warehouse as well as flexible interactive business analysis tools.
Many data mining tools currently operate outside of the warehouse,
requiring extra steps for extracting, importing, and analyzing the data.
Furthermore, when new insights require operational implementation,
integration with the warehouse simplifies the application of results from data
mining. The resulting analytic data warehouse can be applied to improve
business processes throughout the organization, in areas such as promotional
campaign management, fraud detection, new product rollout, and so on.
Figure 1 illustrates an architecture for advanced analysis in a large data
warehouse.




Figure 1 - Integrated Data Mining Architecture



The ideal starting point is a data warehouse containing a combination of
internal data tracking all customer contact coupled with external market data
about competitor activity. Background information on potential customers
also provides an excellent basis for prospecting. This warehouse can be
implemented in a variety of relational database systems: Sybase, Oracle,
Redbrick, and so on, and should be optimized for flexible and fast data
access.

Data Mining Products:

Data mining products are taking the industry by storm. The major database
vendors have already taken steps to ensure that their platforms incorporate
data mining techniques. Oracle's Data Mining Suite (Darwin) implements
classification and regression trees, neural networks, k-nearest neighbors,
regression analysis and clustering algorithms. Microsoft's SQL Server also
offers data mining functionality through the use of classification trees and
clustering algorithms. If you're already working in a statistics environment,
you're probably familiar with the data mining algorithm implementations
offered by the advanced statistical packages SPSS, SAS, and S-Plus.

Data Mining Uses

Classification:

This means getting to know your data. If you can categorize, classify, and/or
codify your data, you can place it into chunks that are manageable by a
human. Rather than dealing with 3.5 million merchants at a credit card
company, if we could classify them into 100 or 150 different classifications
that were virtually dead on for each merchant, a few employees could
manage the relationships rather than needing a sales and service force to deal
with each customer individually. Likewise, at a university, if an alumni
group treats its donors according to their classifications, part-time students
might be the representatives who work with minor donors and full-time
professionals might receive incoming calls from the donors whose names
appear on buildings on campus.

Estimation:

This process is useful in just about every facet of business. From finance to
marketing to sales, the better you can estimate your expenses, product mix
optimization, or potential customer value, the better off you will be. This
and the next use are fairly self-evident if you have ever spent a day at a
business.

Prediction:

Forecasting, like estimation, is ubiquitous in business. Accurate prediction
can reduce inventory levels (and therefore costs), optimize sales, and so on.
If you can predict the future, you will rule the world.

Affinity Grouping/Market Basket Analysis

This is a use that marketing loves. Product placement within a store can be
set up based on sales maximization when you know what people buy
together. There are several schools of thought on how to do it. For example,
you know people buy paint and paint brushes together. One, do you put the
paint on sale and then raise the price of brushes? Two, do you put the paint
in aisle 1 and the brushes in aisle 7, hoping that people walking from one to
the other will see something else they need? Three, do you set cheap items at
the end of the aisle for everyone to see, hoping they will buy them on impulse
and then need something else to go with that impulse buy (chips and dip,
charcoal briquettes and lighter fluid, etc.)? As you can see, knowing what
people buy together has serious benefits for the retail world.
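
Finding out what people buy together can be sketched by counting how often pairs of items share a basket. This is a minimal sketch, assuming each basket is a list of item names; the baskets and the function `pair_stats` are hypothetical, and "support" and "confidence" are the usual association-rule measures rather than terms taken from this survey.

```python
from collections import Counter
from itertools import combinations

def pair_stats(baskets):
    """Compute support and confidence for every pair of co-purchased items."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    total = len(baskets)
    stats = {}
    for (a, b), count in pair_counts.items():
        stats[(a, b)] = {
            "support": count / total,                  # share of baskets with both
            "confidence_a_to_b": count / item_counts[a],  # P(b in basket | a)
            "confidence_b_to_a": count / item_counts[b],  # P(a in basket | b)
        }
    return stats

# Hypothetical store scanner data: one list of items per basket
baskets = [
    ["paint", "brush", "tape"],
    ["paint", "brush"],
    ["paint", "roller"],
    ["chips", "dip"],
]

stats = pair_stats(baskets)
print(stats[("brush", "paint")])
```

Pairs with high support and high confidence are the candidates for the placement and pricing decisions discussed above.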

Clustering/Target Marketing:

Target marketing saves millions of dollars in wasted coupons, promotions,
etc. If you send your promotion only to the people most likely to accept the
offer, use the coupon, or buy your product, you will be much better served.
If you sell
acne medication, sending coupons to people over sixty is usually a waste of
your marketing dollars. If, however, you can cluster your customers and
know which households have a 75% chance of having a teenager, you are
pushing your marketing on a group most likely to buy your product.
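
Clustering customers can be sketched with a small k-means routine. This is a minimal sketch under stated assumptions: k-means is one common clustering method chosen here for illustration, not one named in this survey; the two features (household size, monthly snack purchases) and the data are hypothetical; and the initial centroids are simply the first k points, which is a naive choice.

```python
import math

def kmeans(points, k, iterations=10):
    """Group `points` into k clusters; returns (centroids, assignment)."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical households: (household size, monthly snack purchases)
households = [(4, 12), (1, 2), (5, 15), (2, 3), (4, 14), (1, 1)]

centroids, assignment = kmeans(households, k=2)
print(assignment)
```

Each resulting cluster can then be examined to decide whether it is worth targeting, for example by checking what share of its households is known to include a teenager.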

Conclusion

Comprehensive data warehouses that integrate operational data with
customer, supplier, and market information have resulted in an explosion of
information. Competition requires timely and sophisticated analysis on an
integrated view of the data. However, there is a growing gap between more
powerful storage and retrieval systems and the users’ ability to effectively
analyze and act on the information they contain. Both relational and OLAP
technologies have tremendous capabilities for navigating massive data
warehouses, but brute force navigation of data is not enough. A new
technological leap is needed to structure and prioritize information for
specific end-user problems. Data mining tools can make this leap.
Quantifiable business benefits have been proven through the integration of
data mining with current information systems, and new products are on the
horizon that will bring this integration to an even wider audience of users.
