UNIT IV
Machine Learning
Machine learning is a method of data analysis that automates analytical model building. It is a
branch of artificial intelligence based on the idea that systems can learn from data, identify
patterns and make decisions with minimal human intervention. Once you have trained the
model, you can use it to reason over data that it hasn't seen before, and make predictions
about those data. For example, let's say you want to build an application that can recognize a
user's emotions based on their facial expressions. You can train a model by providing it with
images of faces that are each tagged with a certain emotion, and then you can use that model
in an application that can recognize any user's emotion.
While artificial intelligence (AI) is the broad science of mimicking human abilities, machine
learning is a specific subset of AI that trains a machine how to learn.
Because of new computing technologies, machine learning today is not like machine learning
of the past. It was born from pattern recognition and the theory that computers can learn
without being programmed to perform specific tasks; researchers interested in artificial
intelligence wanted to see if computers could learn from data. The iterative aspect of machine
learning is important because as models are exposed to new data, they are able to
independently adapt. They learn from previous computations to produce reliable, repeatable
decisions and results. It's a science that's not new – but one that has gained fresh momentum.
While many machine learning algorithms have been around for a long time, the ability to
automatically apply complex mathematical calculations to big data – over and over, faster
and faster – is a recent development. Here are a few widely publicized examples of machine
learning applications you may be familiar with:
• The heavily hyped, self-driving Google car? The essence of machine learning.
• Online recommendation offers such as those from Amazon and Netflix? Machine learning
applications for everyday life.
• Knowing what customers are saying about you on Twitter? Machine learning combined with
linguistic rule creation.
• Fraud detection? One of the more obvious, important uses in our world today.
Eg: https://www.sas.com/en_in/insights/analytics/machine-learning.html
Application of Machine Learning
Machine learning is a modern innovation that has enhanced not only many industrial and
professional processes but also everyday living. It is a subset of artificial intelligence
that focuses on using statistical techniques to build intelligent computer systems that
learn from the data available to them. Machine learning is currently used in many fields
and industries.
The intelligent systems built on machine learning algorithms have the capability to learn from
past experience or historical data. Machine learning applications provide results on the basis
of past experience.
Image Recognition
Image recognition is one of the most common uses of machine learning. There are many
situations where an object must be classified from a digital image. For example, in a
black-and-white image, the intensity of each pixel serves as one of the measurements. In
a color image, each pixel provides three measurements: the intensities of red, green and
blue (RGB).
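As a rough illustration of pixel intensities as measurements, the sketch below flattens a tiny grayscale image and a tiny RGB image into lists of numbers. The image values and the helper name to_features are made up for this example; real images would be loaded from files.

```python
# Hypothetical 2x2 images, invented for this example.
gray_image = [
    [0, 128],
    [255, 64],
]  # one intensity (0-255) per pixel

rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]  # three intensities (red, green, blue) per pixel


def to_features(image):
    """Flatten an image into a list of measurements scaled to the range 0..1."""
    features = []
    for row in image:
        for pixel in row:
            # A grayscale pixel is a single number; an RGB pixel is a 3-tuple.
            channels = pixel if isinstance(pixel, tuple) else (pixel,)
            features.extend(value / 255 for value in channels)
    return features


gray_features = to_features(gray_image)  # 4 pixels x 1 measurement each
rgb_features = to_features(rgb_image)    # 4 pixels x 3 measurements each
print(len(gray_features), len(rgb_features))
```

A learning algorithm would receive such a list of measurements as its input for each image.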
Machine learning can be used for face detection in an image as well. There is a separate
category for each person in a database of several people. Machine learning is also used for
character recognition to discern handwritten as well as printed letters. We can segment a
piece of writing into smaller images, each containing a single character.
Speech Recognition
Speech recognition is the translation of spoken words into text. It is also known as
computer speech recognition or automatic speech recognition. Here, a software application
can recognize the words spoken in an audio clip or file, and then subsequently convert the
audio into a text file. The measurement in this application can be a set of numbers that
represent the speech signal. We can also segment the speech signal by intensities in different
time-frequency bands.
Speech recognition is used in applications such as voice user interfaces and voice search.
Voice user interfaces include voice dialing, call routing, and appliance control. It can
also be used for simple data entry and the preparation of structured documents.
Medical diagnosis
Machine learning can be used in techniques and tools that help in the diagnosis of
diseases. It is used to analyze clinical parameters and their combinations for prognosis
(for example, predicting disease progression), to extract medical knowledge for outcome
research, and to support therapy planning and patient monitoring. These are among the
successful implementations of machine learning methods. It can also help integrate
computer-based systems into the healthcare sector.
Statistical Arbitrage
In finance, statistical arbitrage refers to automated trading strategies that are short-term
and involve a large number of securities. In these strategies, the user focuses on
implementing a trading algorithm for a set of securities on the basis of quantities such as
historical correlations and general economic variables. Machine learning methods can be
applied to obtain an index arbitrage strategy, for example by applying linear regression and
support vector machines to the prices of a stream of stocks.
Learning associations
Learning associations is the process of developing insights into the associations between
products. A good example is how seemingly unrelated products can be associated with one
another. One application of machine learning is studying the associations between the
products that people buy. If a person buys a product, they are shown related products
because there is a relation between the two. When new products are launched in the market,
they are associated with existing ones to increase their sales.
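One simple way to quantify such associations is to count how often two products appear in the same purchase. The sketch below computes two common measures, support and confidence, for pairs of items. The baskets are invented, and the function names are chosen only for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for this example.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Of the baskets that contain a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(support("bread", "butter"), confidence("bread", "butter"))
```

A pair with high support and high confidence is a candidate for showing one product to buyers of the other.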
Classification
Classification is the process of placing each individual under study into one of several
classes. Classification analyzes the measurements of an object to identify the category to
which that object belongs. To establish an efficient relation, analysts use data. For example,
before a bank decides to distribute loans, it assesses customers on their ability to repay
them. It does so by considering factors such as the customer's earnings, savings, and
financial history. This information is taken from past data on loans.
Prediction
Machine learning can also be used in prediction systems. Considering the loan example,
to compute the probability of a default, the system first needs to classify the available
data into groups. The groups are defined by a set of rules prescribed by the analysts. Once
the classification is done, we can calculate the probability of default for each group.
Such computations can be applied across all sectors for varied purposes. Making predictions
is one of the best machine learning applications.
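The two steps in the loan example (group the past data by a rule, then estimate a probability per group) can be sketched as follows. The loan records, the income thresholds, and the function name income_group are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical past loans: (annual income, whether the loan was not repaid).
past_loans = [
    (18_000, True), (22_000, True), (25_000, False), (28_000, True),
    (45_000, False), (52_000, False), (58_000, True), (61_000, False),
    (90_000, False), (120_000, False),
]


def income_group(income):
    """Analyst-defined rule assigning a customer to a group (assumed thresholds)."""
    if income < 30_000:
        return "low"
    if income < 80_000:
        return "medium"
    return "high"


totals = defaultdict(int)
defaults = defaultdict(int)
for income, defaulted in past_loans:
    group = income_group(income)
    totals[group] += 1
    defaults[group] += defaulted  # True counts as 1, False as 0

# Estimated probability per group = unpaid loans / all loans in that group.
default_rate = {group: defaults[group] / totals[group] for group in totals}
print(default_rate)
```

A new applicant would be placed in a group by the same rule and assigned that group's estimated probability.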
Extraction
Extraction of information is one of the best applications of machine learning. It is the process
of extracting structured information from unstructured data such as web pages, articles,
blogs, business reports, and emails. A relational database typically stores the output
produced by information extraction. The extraction process takes a set of documents as
input and outputs structured data.
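The sketch below shows the input and output of extraction on a very small scale. It uses a hand-written regular expression instead of a learned model, which is a simplification, and the text and field names are invented for the example.

```python
import re

# Hypothetical unstructured text, e.g. the body of an email.
text = (
    "Order 1042 was placed by alice@example.com on 2023-05-14. "
    "Order 1043 was placed by bob@example.com on 2023-05-15."
)

# One named group per field wanted in the structured output.
pattern = re.compile(
    r"Order (?P<order_id>\d+) was placed by (?P<email>\S+@\S+) "
    r"on (?P<date>\d{4}-\d{2}-\d{2})"
)

# Each match becomes one record (row) that could be loaded into a table.
records = [match.groupdict() for match in pattern.finditer(text)]
for record in records:
    print(record)
```

Each dictionary corresponds to one row of a relational table with the columns order_id, email and date.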
Regression
Machine learning can also be applied to regression. In regression, we can use the
principles of machine learning to optimize the parameters of a model, that is, to decrease
the approximation error and calculate the closest possible outcome. Machine learning can
likewise be used for function optimization, where we alter the inputs in order to get the
closest possible outcome.
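As a minimal sketch of optimizing parameters to decrease the approximation error, the code below fits a straight line to a few points with the closed-form least-squares solution. The data points are invented, and a straight-line model is an assumption chosen for simplicity.

```python
# Hypothetical data points, roughly following y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(slope, intercept):
    """Approximation error of the line on the data above."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Any other choice of slope or intercept gives a larger value of sum_squared_error on these points.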
Financial Services
Machine learning has a lot of potential in the financial and banking sector, where it is a
driving force behind the popularity of financial services. It can help banks and financial
institutions make smarter decisions, for example by spotting an account closure before it
occurs. Smart machines can be trained to track the spending patterns of customers.
Machine learning can also perform market analysis: the algorithms can identify trends easily
and can react in real time.
Government
Government agencies such as public safety and utilities have a particular need for machine
learning since they have multiple sources of data that can be mined for insights. Analyzing
sensor data, for example, identifies ways to increase efficiency and save money. Machine
learning can also help detect fraud and minimize identity theft.
Health care
Machine learning is a fast-growing trend in the health care industry, thanks to the advent of
wearable devices and sensors that can use data to assess a patient's health in real time. The
technology can also help medical experts analyze data to identify trends or red flags that may
lead to improved diagnoses and treatment.
Retail
Websites recommending items you might like based on previous purchases are using machine
learning to analyze your buying history. Retailers rely on machine learning to capture data,
analyze it and use it to personalize the shopping experience, implement marketing
campaigns, optimize prices, plan merchandise supply, and gain customer insights.
Oil and gas
Finding new energy sources. Analyzing minerals in the ground. Predicting refinery sensor
failure. Streamlining oil distribution to make it more efficient and cost-effective. The number
of machine learning use cases for this industry is vast – and still expanding.
Transportation
Analyzing data to identify patterns and trends is key to the transportation industry, which
relies on making routes more efficient and predicting potential problems to increase
profitability. The data analysis and modeling aspects of machine learning are important tools
to delivery companies, public transportation and other transportation organizations.
Methods or Types of Machine Learning
Supervised learning algorithms are trained using labeled examples, such as an input where
the desired output is known. For example, a piece of equipment could have data points
labeled either "F" (failed) or "R" (runs). The learning algorithm receives a set of inputs along
with the corresponding correct outputs, and the algorithm learns by comparing its actual
output with correct outputs to find errors. It then modifies the model accordingly. Through
methods like classification, regression, prediction and gradient boosting, supervised learning
uses patterns to predict the values of the label on additional unlabeled data. Supervised
learning is commonly used in applications where historical data predicts likely future events.
For example, it can anticipate when credit card transactions are likely to be fraudulent or
which insurance customer is likely to file a claim.
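The loop of comparing the model's output with the correct output and then modifying the model can be sketched with a perceptron, one of the simplest supervised learners. The equipment readings below and the choice of a perceptron are assumptions for illustration; the text does not prescribe a particular algorithm.

```python
# Hypothetical labeled examples: (temperature, vibration) -> "F" (failed) or "R" (runs).
training_data = [
    ((1.0, 1.0), "R"), ((2.0, 1.5), "R"), ((1.5, 0.5), "R"),
    ((6.0, 5.0), "F"), ((7.0, 6.0), "F"), ((5.5, 6.5), "F"),
]

weights = [0.0, 0.0]
bias = 0.0


def predict(features):
    """Label an input "F" when the weighted sum is positive, otherwise "R"."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "F" if score > 0 else "R"


# Perceptron rule: compare each prediction with the known label and
# adjust the model only when the prediction is wrong.
for epoch in range(1000):
    mistakes = 0
    for features, label in training_data:
        if predict(features) != label:
            direction = 1.0 if label == "F" else -1.0
            for i, x in enumerate(features):
                weights[i] += direction * x
            bias += direction
            mistakes += 1
    if mistakes == 0:  # every training example is now labeled correctly
        break

# Predict the label of readings that were not part of the training data.
print(predict((1.5, 1.0)), predict((6.0, 6.0)))
```

The two groups in this data can be separated by a straight line, which is the case in which the perceptron rule is known to stop making mistakes.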
Unsupervised learning is used against data that has no historical labels. The system is not
told the "right answer." The algorithm must figure out what is being shown. The goal is to
explore the data and find some structure within. Unsupervised learning works well on
transactional data. For example, it can identify segments of customers with similar attributes
who can then be treated similarly in marketing campaigns. Or it can find the main attributes
that separate customer segments from each other. Popular techniques include self-organizing
maps, nearest-neighbor mapping, k-means clustering and singular value decomposition.
These algorithms are also used to segment text topics, recommend items and identify data
outliers.
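As a minimal sketch of k-means clustering, one of the techniques named above, the code below groups customers into two segments without using any labels. The customer data and the hand-picked starting centroids are assumptions made for a repeatable example.

```python
import math

# Hypothetical customers: (visits per month, average basket value).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]


def k_means(points, centroids, max_iterations=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    labels = []
    for _ in range(max_iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids


# Starting centroids are chosen by hand here so the run is repeatable.
labels, centroids = k_means(customers, [customers[0], customers[-1]])
print(labels)
```

Customers with the same label form one segment and could be treated similarly in a marketing campaign.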
Semisupervised learning is used for the same applications as supervised learning. But it uses
both labeled and unlabeled data for training – typically a small amount of labeled data with a
large amount of unlabeled data (because unlabeled data is less expensive and takes less effort
to acquire). This type of learning can be used with methods such as classification, regression
and prediction. Semisupervised learning is useful when the cost associated with labeling is
too high to allow for a fully labeled training process. Early examples of this include
identifying a person's face on a web cam.
Reinforcement learning is often used for robotics, gaming and navigation. With
reinforcement learning, the algorithm discovers through trial and error which actions yield
the greatest rewards. This type of learning has three primary components: the agent (the
learner or decision maker), the environment (everything the agent interacts with) and actions
(what the agent can do). The objective is for the agent to choose actions that maximize the
expected reward over a given amount of time. The agent will reach the goal much faster by
following a good policy. So the goal in reinforcement learning is to learn the best policy.
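The agent, environment, actions and reward can be sketched with tabular Q-learning on a toy problem. The five-position corridor, the reward of 1 at the goal, and the values of ALPHA, GAMMA and EPSILON are all assumptions chosen for this example; Q-learning is one common reinforcement learning algorithm, not the only one.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_STATES = 5           # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)     # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3  # learning rate, discount, exploration rate

# Q[state][action] estimates the long-run reward of taking that action in that state.
Q = [{action: 0.0 for action in ACTIONS} for _ in range(N_STATES)]


def greedy_action(state):
    """Best-known action for a state, breaking ties at random."""
    best = max(Q[state].values())
    return random.choice([a for a in ACTIONS if Q[state][a] == best])


def step(state, action):
    """Environment: move along the corridor; reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Trial and error: mostly exploit what is known, sometimes explore.
        explore = random.random() < EPSILON
        action = random.choice(ACTIONS) if explore else greedy_action(state)
        next_state, reward = step(state, action)
        # Move the estimate toward reward + discounted value of the next state.
        target = reward + GAMMA * max(Q[next_state].values())
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

# The learned policy: the best-known action in every non-goal state.
policy = [greedy_action(state) for state in range(N_STATES - 1)]
print(policy)
```

Here the policy is a list with one action per position; a good policy sends the agent toward the goal from every position.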
How businesses are using machine learning
Machine learning is the core of some companies' business models, like in the case
of Netflix's suggestions algorithm or Google's search engine. Other companies are engaging
deeply with machine learning, though it's not their main business proposition.
Others are still trying to determine how to use machine learning in a beneficial way. "In my
opinion, one of the hardest problems in machine learning is figuring out what problems I can
solve with machine learning," Shulman said. "There's still a gap in the understanding."
In a 2018 paper, researchers from the MIT Initiative on the Digital Economy outlined a 21-
question rubric to determine whether a task is suitable for machine learning. The researchers
found that no occupation will be untouched by machine learning, but no occupation is likely
to be completely taken over by it. The way to unleash machine learning success, the
researchers found, was to reorganize jobs into discrete tasks, some of which can be done by
machine learning, and others that require a human.
Companies are already using machine learning in several ways, including:
Recommendation algorithms. The recommendation engines behind Netflix and YouTube
suggestions, what information appears on your Facebook feed, and product
recommendations are fueled by machine learning. "[The algorithms] are trying to learn our
preferences," Madry said. "They want to learn, like on Twitter, what tweets we want them to
show us, on Facebook, what ads to display, what posts or liked content to share with us."
Image analysis and object detection. Machine learning can analyze images for different
information, like learning to identify people and tell them apart — though facial recognition
algorithms are controversial. Business uses for this vary. Shulman noted that hedge funds
famously use machine learning to analyze the number of cars in parking lots, which helps
them learn how companies are performing and make good bets.
Fraud detection. Machines can analyze patterns, like how someone normally spends or
where they normally shop, to identify potentially fraudulent credit card transactions, log-in
attempts, or spam emails.
Automatic helplines or chatbots. Many companies are deploying online chatbots, in which
customers or clients don't speak to humans, but instead interact with a machine. These
algorithms use machine learning and natural language processing, with the bots learning from
records of past conversations to come up with appropriate responses.
Self-driving cars. Much of the technology behind self-driving cars is based on machine
learning, deep learning in particular.
Medical imaging and diagnostics. Machine learning programs can be trained to examine
medical images or other information and look for certain markers of illness, like a tool that
can predict cancer risk based on a mammogram.
How has machine learning evolved?
1642 - Blaise Pascal invents a mechanical machine that can add, subtract, multiply and
divide.
1679 - Gottfried Wilhelm Leibniz devises the system of binary code.
1834 - Charles Babbage conceives the idea for a general all-purpose device that could be
programmed with punched cards.
1842 - Ada Lovelace describes a sequence of operations for solving mathematical problems
using Charles Babbage's theoretical punch-card machine and becomes the first programmer.
1847 - George Boole creates Boolean logic, a form of algebra in which all values can be
reduced to the binary values of true or false.
1936 - English logician and cryptanalyst Alan Turing proposes a universal machine that
could decipher and execute a set of instructions. His published proof is considered the basis
of computer science.
1952 - Arthur Samuel creates a program to help an IBM computer get better at checkers the
more it plays.
1959 - MADALINE becomes the first artificial neural network applied to a real-world
problem: removing echoes from phone lines.
1985 - Terry Sejnowski's and Charles Rosenberg's artificial neural network taught itself how
to correctly pronounce 20,000 words in one week.
1997 - IBM's Deep Blue beat chess grandmaster Garry Kasparov.
1999 - A CAD prototype intelligent workstation reviewed 22,000 mammograms and detected
cancer 52% more accurately than radiologists did.
2006 - Computer scientist Geoffrey Hinton invents the term deep learning to describe neural
net research.
2012 - An unsupervised neural network created by Google learned to recognize cats in
YouTube videos with 74.8% accuracy.
2014 - A chatbot passes the Turing Test by convincing 33% of human judges that it was a
Ukrainian teen named Eugene Goostman.
2016 - Google's AlphaGo defeats the human champion in Go, the most difficult board game
in the world.
2016 - LipNet, DeepMind's artificial intelligence system, identifies lip-read words in video
with an accuracy of 93.4%.
2019 - Amazon controls 70% of the market share for virtual assistants in the U.S.
What is the future of machine learning?
While machine learning algorithms have been around for decades, they've attained new
popularity as artificial intelligence has grown in prominence. Deep learning models, in
particular, power today's most advanced AI applications.
Machine learning platforms are among enterprise technology's most competitive realms, with
most major vendors, including Amazon, Google, Microsoft, IBM and others, racing to sign
customers up for platform services that cover the spectrum of machine learning activities,
including data collection, data preparation, data classification, model building, training and
application deployment.
As machine learning continues to increase its importance to business operations and AI
becomes more practical in enterprise settings, the machine learning platform wars will only
intensify.
Continued research into deep learning and AI is increasingly focused on developing more
general applications. Today's AI models require extensive training in order to produce an
algorithm that is highly optimized to perform one task. But some researchers are exploring
ways to make models more flexible and are seeking techniques that allow a machine to apply
context learned from one task to future, different tasks.
Business Intelligence (BI)
The term was first used in 1865, was used again in 1958 by an IBM researcher, Hans Peter
Luhn, and was later adapted by Howard Dresner at Gartner in 1989 to describe making better
business decisions through searching, gathering, and analyzing the accumulated data saved
by an organization. Using the term "Business Intelligence" as a description of
decision-making based on data technologies was both novel and far-sighted. Large companies
first used BI in the form of analyzing customer data systematically, as a necessary step in
making business decisions.
Business Intelligence can be described as a pipeline that spans the entire realm of
managing complex data in organisations to generate intelligent outcomes that aid business
decision making. It includes business objectives, methodologies, tools, techniques, models,
architecture, processing and communicating desired outcomes.
Note: Historical data: In a broad context, it is data collected about past events and
circumstances pertaining to a particular subject. It includes most data generated either
manually or automatically within an enterprise. Sources may include press releases,
financial reports, project documentation, email and other communications.
Business intelligence (BI) helps organizations analyze historical and current data, so they can
quickly uncover actionable insights for making strategic decisions. Business intelligence
tools make this possible by processing large data sets across multiple sources and presenting
findings in visual formats that are easy to understand and share.
Benefits of using business intelligence
Because business intelligence tools speed up information analysis and performance
evaluation, they're valuable in helping companies reduce inefficiencies, flag potential
problems, find new revenue streams, and identify areas of future growth.
Some of the specific benefits that businesses experience when using BI include:
• Increased efficiency of operational processes.
• Insight into customer behavior and shopping patterns.
• Accurate tracking of sales, marketing, and financial performance.
• Clear benchmarks based on historical and current data.
• Instant alerts about data anomalies and customer issues.
• Analyses that can be shared in real-time across departments.
In the past, business intelligence tools were primarily used by data analysts and IT users.
Now, self-service BI platforms make business intelligence available to everyone from
executives to operations teams.
Stages of Business Intelligence
Business Intelligence is generally divided into four different stages which together form the
process of BI that businesses working with data should be aware of.
1. Information gathering
During the information gathering stage, data is either prepared from existing sources (existing
contact data, ERP data, financial database) or collected externally through the use of in-
person or online surveys, polls, questionnaires or forms.
Feedback data can be gathered from customers, staff or advisors, with consideration given to
anonymity and privacy in order to obtain the most honest and reflective data possible.
2. Analysis
This is one of the key areas of turning raw data into information. BI makes it easier for the
user to explore the data and turn it into useful information. There are three common types of
analysis:
Spreadsheet Analysis - probably the oldest form of analysis where data from a spreadsheet
application is translated into tables, pivot tables and graphs in order to identify specific trends
and inconsistencies.
Software that allows users to develop their own specific data queries - where data has been
collected it may be automatically analysed by software or on importation - for example
results from a SurveyMonkey public survey.
Visualisation Tools – graphs and charts that take raw data and create visualisations that users
can read and understand - legacy programs like Crystal Reports and new technologies
like Power BI are good examples of visualisation tools.
3. Reporting
Once data has been analysed it needs to be reported on.
Reporting is the act of taking the analysed data and presenting it in a way that makes a human
connection, or some sort of focus where advantages are to be gained through actions.
Depending on the tools involved, reporting can happen as an extension of the analysis phase,
but for BI to be effective it must be reported on after being filtered or defined during the
analysis stage before being presented as a report.
Reports may be presented as tables of data on screen or paper, but can also be shown as pivot
tables, graphs, or as an executive summary in a corporate report.
4. Monitoring and Prediction
Business Intelligence is a circular process, and therefore the fourth stage of monitoring and
prediction flows back into the first stage, information gathering.
Monitoring allows the user to monitor data and information in real-time. Monitoring provides
snapshots between reporting periods or when making decisions. The three main types of
monitoring are:
Dashboard – A central location where all useful and actionable metrics and data are
contained. They are usually represented graphically to make it easier for users to read.
Key Performance Indicators (KPIs) – KPIs measure the performance of selected key drivers
from the organisation.
Business Performance Management – Also known as a Balanced Scorecard, this is a system
designed to ensure that performance goals for your organisation or projects are being met and
results are being delivered.
Prediction helps management predict what will happen based on the data currently available
and other trends. Prediction can be an incredibly complex form of BI, and uses a combination
of insights gathered during the analysis and monitor/predict stages in order to make decisions
on future outcomes, or on what data to focus on for the next Information Gathering stage.
TYPES OF BUSINESS INTELLIGENCE
Business intelligence combines a broad set of data analysis applications. Depending on your
needs, available data, tech stack, and the type of task at hand, these are the most common
deliverables of a Business Intelligence implementation:
• Ad hoc analytics helps you answer a single business question. Focusing on a specific
issue, this tool can either generate a report that does not already exist or dig deeper
into a static report to get additional details about a particular business process or part
of operations.
• Online analytical processing (OLAP) allows users to extract and query certain data
in order to analyze it from different points of view. It is typically used to analyze
trends, financial reporting, sales forecasting, or other planning purposes.
• Real-time BI. Real-time business intelligence enables users to get up-to-the-minute
data by accessing operational systems or feeding business information into a real-time
data warehouse and/or BI system.
• Operational BI. Operational intelligence is an approach to data analysis that enables
business operations decisions and actions to be based on real-time data as it's
generated or collected by companies. Typically, the data analysis process is
automated, and the resulting information is integrated into operational systems for
immediate use by business managers and workers.
• Collaborative BI emerged through combining business intelligence software with
collaboration tools to support improved data-driven decision making.
• BI dashboards and data visualization display key business metrics at a glance.
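Analyzing the same data from different points of view, as described for OLAP above, can be sketched with two small helpers: a roll-up that totals a measure along chosen dimensions and a slice that fixes one dimension. The sales facts and the names roll_up and slice_by are assumptions for this example; real OLAP tools work on much larger, pre-aggregated data cubes.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, revenue).
sales = [
    ("North", "Laptop", "Q1", 1200), ("North", "Laptop", "Q2", 1500),
    ("North", "Phone", "Q1", 800), ("South", "Laptop", "Q1", 900),
    ("South", "Phone", "Q1", 700), ("South", "Phone", "Q2", 1100),
]
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}


def roll_up(facts, *dimensions):
    """Total revenue grouped by the chosen dimensions (an OLAP-style roll-up)."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dimensions)
        totals[key] += fact[3]
    return dict(totals)


def slice_by(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value (a slice)."""
    return [fact for fact in facts if fact[DIMENSIONS[dimension]] == value]


# The same data viewed from different points of view.
print(roll_up(sales, "region"))
print(roll_up(sales, "product", "quarter"))
print(roll_up(slice_by(sales, "quarter", "Q1"), "region"))
```

Each call answers a different question (revenue per region, per product and quarter, or per region in the first quarter only) from one set of facts.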
Four Most Common Components of a Business Intelligence System
Business intelligence systems are used for intelligent exploration, integration, aggregation,
and a multidimensional analysis of data originating from various information resources and
the data is treated as a highly valuable corporate resource (Kronos & Yeoh, 2010).
A business intelligence system requires at least four specific components to produce
business intelligence. They are (a) data warehouses, (b) ETL tools, (c) OLAP techniques
and (d) data mining (Olszak & Ziemba, 2006).
1. Data warehouses.
The data warehouse is considered the core component of a business intelligence system. This
collection of data is used to support the management decision-making process. In addition to
providing a snapshot of historical data, a data warehouse also provides room for the
thematic storing of aggregated information, that is, data that has been processed by an ETL
tool and then loaded into the appropriate data warehouse. A well-implemented data warehouse
is easy to use, allows for quick information recovery, stores more information, improves
productivity, allows for better decisions, and increases an organization's competitive
advantage. Hevner and March (2005) conclude that the key role of a data warehouse is to
provide an understanding of business problems, opportunities, and performance based on
compelling business intelligence facilitating decision making.
Data Warehousing
Data Warehouse: This term was coined in the 1980s. As the amount of data being collected
continued to grow significantly, there arose a requirement to store the data in a way that
helps transform data coming from operational systems into decision-making support systems.
Data Warehouses are normally part of an organization's mainframe server. A Data
Warehouse is normally optimized for a quick response time to queries. In a data warehouse,
data is often stored with a timestamp, that is, a record of the date and time at which it
was saved. If all sales transactions were stored using timestamps, an organization could use
a Data Warehouse to compare the sales trends of each month.
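The month-by-month comparison that timestamps make possible can be sketched as follows. The transactions are invented, and a plain Python list stands in for the warehouse table.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sales transactions: (timestamp, amount).
transactions = [
    ("2023-01-05 09:30:00", 120.0),
    ("2023-01-20 14:10:00", 80.0),
    ("2023-02-02 11:45:00", 200.0),
    ("2023-02-17 16:05:00", 50.0),
    ("2023-03-09 10:00:00", 310.0),
]

monthly_sales = defaultdict(float)
for stamp, amount in transactions:
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    monthly_sales[when.strftime("%Y-%m")] += amount  # group by year and month

# The month-over-month change shows the sales trend.
months = sorted(monthly_sales)
for previous, current in zip(months, months[1:]):
    change = monthly_sales[current] - monthly_sales[previous]
    print(f"{previous} -> {current}: {change:+.2f}")
```

Without the timestamp on each transaction, the amounts could be totalled but not compared across months.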
Data warehousing is defined as a technique for collecting and managing data from varied
sources to provide meaningful business insights. It is a blend of technologies and components
that aid the strategic use of data.
It is the electronic storage of a large amount of information by a business, designed for
query and analysis instead of transaction processing. It is a process of transforming data into
information and making it available to users in a timely manner to make a difference. By
merging all of this information in one place, an organization can analyze its customers more
holistically. This helps to ensure that it has considered all the information available. Data
warehousing makes data mining possible. Data mining is looking for patterns in the data that
may lead to higher sales and profits.
Types of Data Warehouse: Three main types of Data Warehouses are:
1. Enterprise Data Warehouse: An Enterprise Data Warehouse is a centralized warehouse. It
provides decision support services across the enterprise. It offers a unified approach for
organizing and representing data. It also provides the ability to classify data according to
subject and to give access according to those divisions.
2. Operational Data Store: An Operational Data Store, also called an ODS, is a data store
required when neither the data warehouse nor the OLTP systems support an organization's
reporting needs. In an ODS, the data is refreshed in real time. Hence, it is
widely preferred for routine activities like storing records of employees.
3. Data Mart: A data mart is a subset of the data warehouse. It allows access rights for
specific functional teams or user groups and speeds up querying, data transfer and analysis
at the individual department level. It is specially designed for a particular line of
business, such as sales, finance or marketing. In an independent data mart, data can be
collected directly from sources.
Data Lakes are similar to Data Warehouses (DW) in that both are used to store data.
However, data lakes store data at large scale in raw form, as captured from the source, unlike
a DW, where it is stored methodically to facilitate analytical processes. In many organisations,
business leaders are using a hybrid solution for their analytical needs. The raw data, whether
unstructured data, text, audio, video, web data or sensor data, is all stored together in a data
lake. The data lake can compartmentalise the data depending on the source from which it is
received or the requirement of the business team. Simple analysis of this data can provide
insights that may be of interest to the business teams.
General stages of Data Warehousing
Initially, organizations made relatively simple use of data warehousing. Over time, however,
more sophisticated use of data warehousing began. The following are the general stages of use
of the data warehouse:
1. Offline Operational Database:
In this stage, data is just copied from an operational system to another server. In this way,
loading, processing, and reporting of the copied data do not impact the operational system's
performance.
2. Offline Data Warehouse:
Data in the Data warehouse is regularly updated from the Operational Database. The data in
Data warehouse is mapped and transformed to meet the Data warehouse objectives.
3. Real time Data Warehouse:
In this stage, the data warehouse is updated whenever a transaction takes place in the
operational database. Examples include airline or railway booking systems.
4. Integrated Data Warehouse:
In this stage, the data warehouse is updated continuously as the operational system
performs transactions. The data warehouse then generates transactions which are passed
back to the operational system.
Components of Data warehouse: Four components of Data Warehouses are:
1. Load manager: The load manager is also called the front component. It performs all
the operations associated with the extraction and loading of data into the warehouse.
These operations include transformations to prepare the data for entry into the data
warehouse.
2. Warehouse Manager: The warehouse manager performs operations associated with the
management of the data in the warehouse. These include analysis of data to ensure
consistency, creation of indexes and views, generation of denormalizations and
aggregations, transformation and merging of source data, and archiving and
backing up data.
3. Query Manager: The query manager is also known as the backend component. It performs
all the operations related to the management of user queries. Its operations include
directing queries to the appropriate tables and scheduling the execution of
queries.
4. End-user access tools: These are categorized into five groups: 1. Data
reporting tools 2. Query tools 3. Application development tools 4. EIS tools 5. OLAP
tools and data mining tools.
Who needs Data warehouse?
A data warehouse is needed by all types of users, such as:
• Decision makers who rely on massive amounts of data.
• Users who use customized, complex processes to obtain information from multiple
data sources.
• People who want simple technology to access the data.
• People who want a systematic approach for making
decisions.
• Users who want fast performance on a huge amount of data, which is a necessity for
reports, grids or charts.
• Users who want to discover 'hidden patterns' in data flows
and groupings, for which a data warehouse is a first step.
Here are the most common sectors where data warehouses are used:
Airline:
In the airline sector, it is used for operational purposes such as crew assignment, analysis of
route profitability, frequent flyer program promotions, etc.
Banking:
It is widely used in the banking sector to manage available resources effectively.
Some banks also use it for market research and for performance analysis of products and
operations.
Healthcare:
The healthcare sector also uses data warehouses to strategize and predict outcomes, generate
patients' treatment reports, share data with tie-in insurance companies, medical aid services,
etc.
Public sector:
In the public sector, the data warehouse is used for intelligence gathering. It helps government
agencies to maintain and analyze tax records and health policy records for every individual.
Investment and Insurance sector:
In this sector, warehouses are primarily used to analyze data patterns and customer trends,
and to track market movements.
Retail chain:
In retail chains, the data warehouse is widely used for distribution and marketing. It also helps
to track items, customer buying patterns and promotions, and is used for determining pricing
policy.
Telecommunication:
A data warehouse is used in this sector for product promotions, sales decisions and to make
distribution decisions.
Hospitality Industry:
This industry uses warehouse services to design and evaluate its advertising and
promotion campaigns, which target clients based on their feedback and travel
patterns.
Steps to Implement Data Warehouse
The best way to address the business risk associated with a data warehouse implementation
is to employ a three-prong strategy, as below:
1. Enterprise strategy: Here we identify technical aspects, including the current architecture
and tools. We also identify facts, dimensions, and attributes. Data mapping and
transformation are also addressed.
2. Phased delivery: Data warehouse implementation should be phased based on subject
areas. Related business entities like booking and billing should first be implemented
and then integrated with each other.
3. Iterative Prototyping: Rather than a big bang approach to implementation, the data
warehouse should be developed and tested iteratively.
Best practices to implement a Data Warehouse
1. Decide a plan to test the consistency, accuracy, and integrity of the data.
2. The data warehouse must be well integrated, well defined and time stamped.
3. While designing a data warehouse, make sure you use the right tool, stick to the life cycle,
take care of data conflicts and are ready to learn from your mistakes.
4. Never replace operational systems and reports.
5. Don't spend too much time on extracting, cleaning and loading data.
6. Be sure to involve all stakeholders, including business personnel, in the data warehouse
implementation process. Establish that data warehousing is a joint/team project.
You don't want to create a data warehouse that is not useful to the end users.
7. Prepare a training plan for the end users.
Why We Need Data Warehouse? Advantages & Disadvantages
Advantages of Data Warehouse:
1. A data warehouse allows business users to quickly access critical data from several
sources all in one place.
2. A data warehouse provides consistent information on various cross-functional activities.
It also supports ad-hoc reporting and querying.
3. A data warehouse helps to integrate many sources of data to reduce stress on the
production system.
4. A data warehouse helps to reduce total turnaround time for analysis and reporting.
5. Restructuring and integration make the data easier for the user to use for reporting and
analysis.
6. A data warehouse allows users to access critical data from a number of sources in a
single place. Therefore, it saves the user's time in retrieving data from multiple sources.
7. A data warehouse stores a large amount of historical data. This helps users to analyze
different time periods and trends to make future predictions.
Disadvantages of Data Warehouse:
1. Not an ideal option for unstructured data.
2. Creation and implementation of a data warehouse is a time-consuming affair.
3. A data warehouse can become outdated relatively quickly.
4. It is difficult to make changes in data types and ranges, data source schema, indexes, and
queries.
5. A data warehouse may seem simple, but in practice it is often too complex for average
users.
6. Despite best efforts at project management, the scope of a data warehousing project
tends to increase.
7. Sometimes warehouse users will develop different business rules.
8. Organisations need to spend a lot of their resources on training and implementation.
The Future of Data Warehousing
1. Changes in regulatory constraints may limit the ability to combine disparate sources of
data. These disparate sources may include unstructured data, which is
difficult to store.
2. As the size of databases grows, the estimates of what constitutes a very large
database continue to grow. It is complex to build and run data warehouse systems
which are always increasing in size. The hardware and software resources
available today do not allow a large amount of data to be kept online.
3. Multimedia data cannot be manipulated as easily as text data, whereas textual
information can be retrieved by the relational software available today. This could be
a research subject.
Data Warehouse Tools
There are many data warehousing tools available in the market. Here are some of the most
prominent ones:
1. MarkLogic:
MarkLogic is a data warehousing solution that makes data integration easier and faster
using an array of enterprise features. This tool helps to perform very complex search
operations. It can query different types of data like documents, relationships, and metadata.
2. Oracle:
Oracle is a widely used commercial database. It offers a wide range of data warehouse
solutions, both on-premises and in the cloud. It helps to optimize customer experiences by
increasing operational efficiency.
3. Amazon Redshift:
Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool for analyzing
all types of data using standard SQL and existing BI tools. It also allows running complex
queries against petabytes of structured data, using query optimization techniques.
Conclusion:
• The data warehouse works as a central repository into which information arrives from
one or more data sources.
• The three main types of data warehouses are Enterprise Data Warehouse, Operational
Data Store, and Data Mart.
• The general stages of a data warehouse are Offline Operational Database, Offline Data
Warehouse, Real time Data Warehouse and Integrated Data Warehouse.
• The four main components of a data warehouse are Load manager, Warehouse Manager,
Query Manager and End-user access tools.
• The data warehouse is used in diverse industries like Airline, Banking, Healthcare,
Insurance, Retail etc.
• Implementing a data warehouse is a three-prong strategy, viz. Enterprise strategy, Phased
delivery and Iterative Prototyping.
• A data warehouse allows business users to quickly access critical data from several
sources all in one place.
2. Extract-Transform-Load (ETL)
ETL tools and processes are responsible for extracting data from one or many source
systems, transforming data from many different formats into a common format, and then
loading that data into a data warehouse. ETL tools are tasked with extracting information
deemed central to the business. They manipulate that data and present it as information that is
then used for managerial decision making. Early in the history of business intelligence
systems, ETL design and implementation was considered a supporting task for the data
warehouse and thus was viewed not as a piece of the business intelligence puzzle but as a
subset of the data warehousing problem.
ETL solutions are divided into three distinct stages that find and convert data from various
sources and insert the resulting product into a data warehouse. The three stages of ETL are:
1. The extraction stage: This stage involves obtaining access to data originating from
different, often heterogeneous sources. These sources are often distributed across multiple
platforms and can be part of a customer's information system.
2. The transformation stage: This stage transforms the extracted data and is considered the
most complex stage of the ETL process. The transformation stage converts the data into the
same schema as the data warehouse into which it is to be loaded. The transformation phase is
usually performed by means of traditional programming languages, scripting languages or the
SQL language.
3. The load stage: The load stage pushes the transformed data and loads the data warehouses
with data that are aggregated and filtered (Olszak & Ziemba, 2007). The requirement of a
business intelligence system to be able to extract data in different formats from dispersed
sources, transform them into like formats, and then load them into the appropriate data
warehouse has traditionally made the ETL process the most expensive aspect of a business
intelligence system.
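The three stages above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not a production ETL tool: the two sources (CSV_SOURCE, DICT_SOURCE), their field names and the sales table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different formats (assumed for illustration).
CSV_SOURCE = "customer,amount,date\nAsha,100.50,2023-01-05\nRavi,20,2023-01-06\n"
DICT_SOURCE = [{"name": "Meera", "total_paise": 4550, "day": "07/01/2023"}]


def extract():
    """Extraction: read raw records from each source as-is."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, DICT_SOURCE


def transform(csv_rows, dict_rows):
    """Transformation: convert both formats to (customer, amount, iso_date)."""
    unified = [(r["customer"], float(r["amount"]), r["date"]) for r in csv_rows]
    for r in dict_rows:
        day, month, year = r["day"].split("/")
        unified.append((r["name"], r["total_paise"] / 100, f"{year}-{month}-{day}"))
    return unified


def load(rows, conn):
    """Load: insert the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
loaded = conn.execute(
    "SELECT customer, amount, sale_date FROM sales ORDER BY sale_date"
).fetchall()
```

The transformation step carries most of the logic (unit conversion, date normalization), which is consistent with the remark that it is the most complex stage.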
Generally there are four categories that ETL tools fall under:
1. ETL: tools that address the extraction and loading aspects of the ETL process.
2. ETL: tools that provide a preference for the data type and format to be extracted and
loaded.
3. ETL: tools that offer a balance across all tool functions; the lack of emphasis may cause
this aspect to result in poorer handling of a large volume of data formats.
4. ETL: tools that emphasize the integration of data into data warehouses.
3. OLAP Techniques
The origins of On-Line Analytical Processing (OLAP) are rooted in the difficulties encountered
when performing data analysis on databases that are constantly being updated during
transactions via other information systems. OLAP attempts to analyze complex data in real
time on a database that is constantly updated with transactional data. OLAP optimizes
the searching of huge data files by means of automatic generation of SQL queries. OLAP
allows users to access, analyze and model business problems and to share information
that is stored in data warehouses.
OLAP tools use data mining techniques and statistical methods to generate readable reports
quickly; these reports are used for forecasting that can further assist strategic decision making.
The reports are generated based on a manager's pre-defined criteria (dimensions). OLAP is
an improvement on earlier single-dimensional analysis tools that allowed managers to analyze
data from only one perspective at a time. By providing managers with a multi-dimensional
tool, OLAP enables managers to analyze data from multiple perspectives and explore it in
order to discover hidden information (Matei, 2010).
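Multi-dimensional analysis can be illustrated with a small roll-up in plain Python. This is a sketch only: the fact records, the three dimensions (region, product, quarter) and the roll_up helper are hypothetical, and real OLAP engines use far more elaborate storage and query generation.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales).
FACTS = [
    ("North", "Tea", "Q1", 100),
    ("North", "Coffee", "Q1", 150),
    ("South", "Tea", "Q1", 80),
    ("North", "Tea", "Q2", 120),
    ("South", "Coffee", "Q2", 200),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, *dims):
    """Sum sales grouped by the chosen dimensions (a simple roll-up)."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last field is the sales measure
    return dict(totals)


# The same facts viewed from two different perspectives.
by_region = roll_up(FACTS, "region")
by_product_quarter = roll_up(FACTS, "product", "quarter")
```

Calling roll_up with different dimension names gives the manager a different perspective on the same data without changing the underlying facts.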
4. Data mining.
Data mining techniques are designed to identify relationships and rules within a data
warehouse, then create a report of these relationships and rules. The data mining process
involves discovering various patterns, generalizations, regularities and rules in data resources.
Knowledge from data mining may be used to predict the outcome of a decision and can also
describe reality. The predictions generated by data mining use known variables to predict the
outcome of a situation, while reality is described by graphing, tabulating, and creating formulas
based on the existing data.
Strategies for Data Mining
There are several basic strategies for data mining. The most common are:
These strategies can be aligned with the needs of an organization and help decision making
by discovering various patterns, generalizations, regularities and rules in data resources.
Examples of these strategies in business include using market basket analysis to model retail
sales or classification to classify unstructured data, such as email, as spam or a legitimate
piece of correspondence, such as business or personal information.
Data Mining
Case:
Nisha is not much of an internet buyer, so she prefers to do her shopping by going to a physical
store. She received a call from her childhood friend, who said that she would be visiting over
the weekend. Nisha was scheduled to be on an overseas assignment all of that week and would
be back in India just in time to receive her friend at home. Her friend is from a Maharashtrian
family, and Nisha wanted to gift her a statue of the god Ganesha. With very little time on hand,
she reluctantly decided to check online. To her surprise, she was able to find numerous
statues in varied sizes and forms and made of different materials, many more options than she
could have found in any physical store. It made her job simple: click, compare and
decide based on the several parameters that would define her final purchase. The website also
provided guidance in the form of other customers profiled in Nisha's segment who had
purchased a statue she had viewed and had also purchased or viewed a similar
item. There was a listing of items that were frequently bought together to assist her in the
purchasing decision. What else did she need? From the cosy sofa of her drawing room, she
got a choice of multiple products and recommendations for purchase. Product description,
price, delivery options, discounts and flexible payment methods were all bundled together. She
was tired of looking around the websites that evening and hence decided to defer her purchase
to the next morning. The next day she was flooded with digital advertisements for Ganesha
idols across websites.
In the case described above, the website has used data mining techniques to provide customers
like Nisha with the best surfing and shopping experience online. Understanding her
requirements as soon as possible and then giving her and similar
customers an easy way to decide on, select and buy the product of their choice from
innumerable options is a key feature of data mining techniques. A large number of data
mining techniques are being used by organisations to manage their customers by getting a
better understanding of their needs and purchase behaviour, and also to improve organisational
processes for efficient operations, manage employees and other stakeholders, and support
financial decisions and strategic needs.
Data mining is used to describe the method of discovering or mining knowledge from large
reserves of data. It is a term used to describe the process through which previously unknown
patterns in data are discovered.
According to Fayyad et al. (1996), data mining is defined as "the nontrivial process of
identifying valid, novel, potentially useful and ultimately understandable patterns in data
stored in structured databases".
Data Mining Techniques/Methods
1. Classification:
This analysis is used to retrieve important and relevant information about data and metadata.
This data mining method helps to classify data into different classes.
2. Time series analysis:
A time series is a series of data points indexed in time order. Most commonly, it is a sequence
taken at successive, equally spaced points in time. Thus it is a sequence of discrete-time data.
3. Market Basket Analysis (MBA):
It is a modelling technique based upon the theory that if you buy a certain group of items you
are more likely to buy another group of items. The set of items a customer buys is
referred to as an item set, and market basket analysis seeks to find relationships between
purchases. It helps to understand customer purchases and their patterns.
4. Clustering:
Clustering analysis is a data mining technique to identify data that are like each other. This
process helps to understand the differences and similarities between the data.
5. Regression:
Regression analysis is the data mining method of identifying and analyzing the strength of the
relationship between one dependent variable and a series of other changing variables
(independent variables).
The term regression was coined by Francis Galton in the 19th century to describe a biological
phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to
regress down towards a normal average.
Types: Linear regression and Logistic regression.
6. Association Rules:
This data mining technique helps to find the association between two or more Items. It
discovers a hidden pattern in the data set.
7. Outlier detection:
This type of data mining technique refers to the observation of data items in the dataset which
do not match an expected pattern or expected behavior. This technique can be used in a variety
of domains, such as intrusion detection, fraud detection or fault detection. Outlier detection is
also called Outlier Analysis or Outlier mining.
8. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in
transaction data over a certain period.
9. Estimation:
This is the process of finding an approximation which has a value that is usable for some
purpose even if input data may be incomplete, uncertain or unusable.
10. Prediction:
It is a technique used to predict future outcomes. For example, predictive analysis can be used
to detect incidents that lead to crime and to identify the criminals behind them.
Prediction uses a combination of the other data mining techniques, like trends, sequential
patterns, clustering, classification, etc. It analyzes past events or instances in the right sequence
to predict a future event.
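As an illustration of the market basket analysis and association rule techniques listed above, the sketch below computes two standard measures, support and confidence, over a handful of baskets. The basket contents and the function names are hypothetical and chosen only for the example; real tools typically use dedicated algorithms to search for frequent item sets rather than checking one rule at a time.

```python
# Hypothetical transactions; each set is one customer's basket.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated chance of the consequent given the antecedent, from counts."""
    both = support(set(antecedent) | set(consequent), baskets)
    return both / support(antecedent, baskets)


# How often bread and butter appear together, and how often
# a basket with bread also contains butter.
bread_butter_support = support({"bread", "butter"}, BASKETS)
butter_given_bread = confidence({"bread"}, {"butter"}, BASKETS)
```

A rule such as "bread implies butter" is usually considered interesting only when both its support and its confidence exceed thresholds chosen by the analyst.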
Benefits of Data Mining:
1. Data mining techniques help companies to get knowledge-based information.
2. Data mining helps organizations to make profitable adjustments in operation and
production.
3. Data mining is a cost-effective and efficient solution compared to other statistical
data applications.
4. Data mining helps with the decision-making process.
5. It facilitates automated prediction of trends and behaviors as well as automated
discovery of hidden patterns.
6. It can be implemented in new systems as well as existing platforms.
7. It is a speedy process which makes it easy for users to analyze huge amounts of
data in less time.
Disadvantages of Data Mining
1. There is a chance that companies may sell useful information about their customers to
other companies for money. For example, American Express has sold credit card
purchases of their customers to other companies.
2. Much data mining analytics software is difficult to operate and requires advanced
training to work with.
3. Different data mining tools work in different manners due to the different algorithms
employed in their design. Therefore, the selection of the correct data mining tool is a very
difficult task.
4. Data mining techniques are not perfectly accurate, so they can cause serious consequences
in certain conditions.
Summary:
1. Data mining is all about explaining the past and predicting the future through analysis.
2. Data mining helps to extract information from huge sets of data. It is the
procedure of mining knowledge from data.
3. The data mining process includes Business Understanding, Data Understanding, Data
Preparation, Modelling, Evaluation and Deployment.
4. Important data mining techniques are Classification, Clustering, Regression,
Association rules, Outlier detection, Sequential Patterns, and Prediction.
5. R-language and Oracle Data mining are prominent data mining tools.
6. Data mining techniques help companies to get knowledge-based information.
7. The main drawback of data mining is that much analytics software is difficult to
operate and requires advanced training to work with.
8. Data mining is used in diverse industries such as Communications, Insurance,
Education, Manufacturing, Banking, Retail, Service providers, eCommerce,
Supermarkets and Bioinformatics.
Data Mining Applications
Application: Usage
Communications: Data mining techniques are used in the communication sector to predict
customer behavior in order to offer highly targeted and relevant campaigns.
Insurance: Data mining helps insurance companies to price their products profitably and
promote new offers to their new or existing customers.
Education: Data mining helps educators to access student data, predict achievement levels
and find students or groups of students who need extra attention, for example
students who are weak in maths.
Manufacturing: With the help of data mining, manufacturers can predict wear and tear of
production assets. They can anticipate maintenance, which helps them minimize
downtime.
Banking: Data mining helps the finance sector to get a view of market risks and manage
regulatory compliance. It helps banks to identify probable defaulters when deciding
whether to issue credit cards, loans, etc.
Retail: Data mining techniques help retail malls and grocery stores identify and arrange the
most sellable items in the most prominent positions. They help store owners to come up
with offers which encourage customers to increase their spending.
Service Providers: Service providers like the mobile phone and utility industries use data
mining to predict the reasons why a customer leaves their company. They analyze billing
details, customer service interactions and complaints made to the company to assign each
customer a probability score and offer incentives.
E-Commerce: E-commerce websites use data mining to offer cross-sells and up-sells through
their websites. One of the most famous names is Amazon, which uses data mining
techniques to get more customers into its eCommerce store.
Super Markets: Data mining allows supermarkets to develop rules to predict whether their
shoppers are likely to be expecting. By evaluating buying patterns, they can find women
customers who are most likely pregnant and start targeting products like baby
powder, baby soap, diapers and so on.
Crime Investigation: Data mining helps crime investigation agencies to deploy the police
workforce (where is a crime most likely to happen, and when?), decide who to search at a
border crossing, etc.
Bioinformatics: Data mining helps to mine biological data from massive datasets gathered in
biology and medicine.
The Specific Role of Each Component in a Business Intelligence System
Business intelligence systems serve as a means to exploit information in order to help managers
solve their structured and unstructured problems. Each component of a business intelligence
system can be used to exploit information in one or more of these selected managerial
decision-making actions:
(a) acquiring information
(b) searching/gathering information
(c) analyzing information and
(d) delivery of information.
By analyzing historical data, business intelligence systems strive to eliminate communication
barriers that exist at the different organizational levels within a company. These barriers are
considered noise during the decision-making process. By allowing decisions to be made
using consistent information, this method of analysis enables managers to evaluate former
activities and direct future actions.
BI System Components Aligned with Managerial Decision-Making Actions
Business Intelligence System Component    Managerial Information Actions
ETL Tools                                 Acquiring/Searching information
Data Warehouses                           Acquiring/Searching information
OLAP Techniques                           Analyzing and Delivery
Data Mining                               Analyzing and Delivery
Acquiring/gathering information: Acquiring information has become increasingly
difficult as modern organizations adopt more distributed information systems in which to
store their business-critical data. This action is used to find the business issue. It
uses ETL tools, directing the processes to find what information is needed and into which
data warehouse to deposit that information.
Searching information: After the data are extracted from operational databases, the newly
loaded high quality data are mined using data mining techniques and processes. This action is
performed at different levels of data quality. Lower quality data are searched by using
ETL tools. The more refined or mature an ETL tool, the higher the quality of the data
being loaded into a data warehouse.
Analyzing information: Managers need to create data models to understand and address
business issues. Through data pre-processing and applying OLAP and data mining techniques,
managers can analyze information from multiple dimensions at varying degrees of granularity,
each suited to a different level of analysis. For example, information derived through
analysis directly affects decisions related to promotional campaigns, forecasting sales and
financial results and, in some cases, can be used in fraud detection.
OLAP summarizes data and makes forecasts based on historical data. Data mining discovers
hidden patterns in data. Data mining operates at a detail level instead of a summary level. In
other words, data mining predicts, while OLAP forecasts.
Data mining and OLAP can be used to analyze:
(a) financial data: analyzing and reporting on costs, revenue and profitability
(b) marketing data: analyzing sales receipts, sales profitability, sales target, actions taken
by competitors
(c) customer data: analyzing time of contact, customer profitability, customer behavior,
customer satisfaction, and customer loyalty
(d) production data: analyzing production bottlenecks, delayed orders, in-process
materials, tool up-time
(e) logistical data: analyzing relationships in a supply chain and delivery partnerships
(f) wage related data: analyzing wage types, payroll surcharges, payroll collections,
employee contributions, and average wages
(g) personal data: analyzing employee turnover, employee type, presentation of
information related to individual data
Delivery of information: Data mining is also used in the delivery of information within an
organization. In business intelligence systems, data mining can not only interpret and
evaluate results generated from the analysis performed on data stored in a data warehouse,
but can also display reports enabling decision makers to discover various patterns,
generalizations, and regularities. In the same way, OLAP generates ad hoc reports
using simpler data mining techniques, summarizing data without the pattern matching that
is unique to the data mining process.
How Business Intelligence Systems can be used to Better Facilitate Business Decision
Making at Each Level of Management
By using business intelligence systems, organizations collect, treat and diffuse
information with the objective of reducing uncertainty in decision making. These
decisions are often made under pressure, almost always at critical times in which businesses
need real-time data.
A business intelligence system allows managers to make decisions using real-time data by
monitoring competition, carrying out constant analysis of numerous data and considering
different variants of organization performance. Data is extracted from operational databases,
customer databases, and from data collected pertaining to the competition. The business
intelligence system extracts this data from these various data sources, transforms it into
specified formats, and then loads the newly formatted data into specially designated data
warehouses that are available to all three levels of decision making within the organization:
operational, tactical, and strategic.
Each level of the organization will use different OLAP techniques and data mining processes
to analyze data and report the information that is most relevant to it. The information
generated from the business intelligence system will be used in all decision-making
processes. At the strategic level, decisions set objectives and push the decision direction to
the tactical level of the organization. At the tactical level, information is mined from the
business intelligence system to develop tactics to realize the strategic objectives and, in turn,
to push a decision down to the operational level of the organization.
Both the tactical and operational levels of management are reactive to the strategic decisions
of the organization. Even with a shared objective, different levels of the organization will
utilize information for different purposes. At strategic and tactical levels, information
provides input to senior managers; at the operational levels, information provides input to
lower level managers.
Operational level decisions. At the operational level, decisions affect or are related to the
ongoing operations of an organization. These decisions are generally based on up-to-date
financial data, sales and co-operation with suppliers and customers. Data are the life blood of
daily operations in an organization, and business intelligence takes that data and presents it to
decision makers in the form of information. Business intelligence systems provide
information used at the operational level of an organization to address the following specific
actions:
1. identify problems and 'bottlenecks'
2. provide analysis of "the best" and "the worst"
3. provide analysis of products
4. provide analysis of employees
5. provide analysis of regions (using measurable metrics such as sales, costs or quantifiable
results)
6. perform ad-hoc analysis and answer questions related to departments' ongoing operations,
up-to-date financial standing and sales.
Operational level decisions are noted as being the decisions that allow an organization to run
its day-to-day activities (Esat et al., 2007). The information provided by the business
intelligence system is at a summary level, and the data fed into the business intelligence
system from the operational level of an organization is analyzed and combined with other
external information to create direction and allow strategic planning to occur.
Tactical level decisions. Decisions made at the tactical level are related to planning and rely
on real-time data and forecasting to direct the future actions of marketing, sales, finance and
capital management. Tactical decisions are often used to support strategic decisions. The
literature details these related tactical decision-making activities as being supported by
business intelligence systems:
1. analyses of deviations from the realization of plans for particular organizational units,
individuals or indicators
2. decisions related to the direction of marketing, sales, finance and capital management
3. forecasting of demand for a given product or service
The information derived through these activities allows for optimizing future actions and for
modifying organizational aspects of the company's performance.
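Two of the tactical activities above, deviation analysis and demand forecasting, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit names and figures are hypothetical, and a simple moving average stands in for whatever forecasting method an actual business intelligence system would use.

```python
def plan_deviations(planned, actual):
    """Return {unit: (absolute deviation, relative deviation)} for each planned unit."""
    result = {}
    for unit, target in planned.items():
        diff = actual[unit] - target
        result[unit] = (diff, diff / target)
    return result

def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical plan versus realized figures for two organizational units
planned = {"marketing": 100.0, "sales": 200.0}
actual = {"marketing": 90.0, "sales": 220.0}
deviations = plan_deviations(planned, actual)

# Hypothetical demand history for one product, oldest period first
demand_history = [120, 130, 125, 140, 135]
next_demand = moving_average_forecast(demand_history)
```

A negative deviation marks a unit that fell short of its plan, and the forecast gives a rough baseline against which future plans can be set.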
Strategic level decisions. Strategic level decisions set objectives as well as ensure that those
objectives are realized. Business intelligence systems provide information in support of
strategic decisions related to the development of future results based on historical results,
profitability of offers (made or received) and the effectiveness of distribution channels.
It has been asserted that strategic decisions use business information systems to create
forecasts based on historical data, combining it with current performance to estimate how
conditions will play out in the future. Based on the literature, information provided by
business intelligence systems informs these kinds of decisions made at the strategic level:
1. whether to enter new markets
2. the possibility of changing a company's orientation from product-centric to
customer-centric
3. the launch of a new product (Watson & Wixom, 2007, p.97)
4. what objectives to set and to follow through on the realization of such established
objectives (Olszak & Ziemba, 2007)

Unit IV.pdf

  • 1. UNIT IV Machine Learning Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. Once you have trained the model, you can use it to reason over data that it hasn't seen before, and make predictions about those data. For example, let's say you want to build an application that can recognize a user's emotions based on their facial expressions. You can train a model by providing it with images of faces that are each tagged with a certain emotion, and then you can use that model in an application that can recognize any user's emotion. While artificial intelligence (AI) is the broad science of mimicking human abilities, machine learning is a specific subset of AI that trains a machine how to learn. Because of new computing technologies, machine learning today is not like machine learning of the past. It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see if computers could learn from data. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It‘s a science that‘s not new – but one that has gained fresh momentum. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with:  The heavily hyped, self-driving Google car? The essence of machine learning. 
 Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life.  Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation.  Fraud detection? One of the more obvious, important uses in our world today. Eg: https://www.sas.com/en_in/insights/analytics/machine-learning.html Application of Machine Learning
  • 2. Machine learning is one modern innovation that has helped man enhance not only many industrial and professional processes but also advances everyday living. It is a subset of artificial intelligence, which focuses on using statistical techniques to build intelligent computer systems in order to learn from databases available to it. Currently, machine learning has been used in multiple fields and industries. The intelligent systems built on machine learning algorithms have the capability to learn from past experience or historical data. Machine learning applications provide results on the basis of past experience. Image Recognition Image recognition is one of the most common uses of machine learning. There are many situations where you can classify the object as a digital image. For example, in the case of a black and white image, the intensity of each pixel is served as one of the measurements. In colored images, each pixel provides 3 measurements of intensities in three different colors – red, green and blue (RGB). Machine learning can be used for face detection in an image as well. There is a separate category for each person in a database of several people. Machine learning is also used for character recognition to discern handwritten as well as printed letters. We can segment a piece of writing into smaller images, each containing a single character. Speech Recognition Speech recognition is the translation of spoken words into the text. It is also known as computer speech recognition or automatic speech recognition. Here, a software application can recognize the words spoken in an audio clip or file, and then subsequently convert the audio into a text file. The measurement in this application can be a set of numbers that represent the speech signal. We can also segment the speech signal by intensities in different time-frequency bands.
  • 3. Speech recognition is used in the applications like voice user interface, voice searches and more. Voice user interfaces include voice dialing, call routing, and appliance control. It can also be used a simple data entry and the preparation of structured documents. Medical diagnosis Machine learning can be used in the techniques and tools that can help in the diagnosis of diseases. It is used for the analysis of the clinical parameters and their combination for the prognosis example prediction of disease progression for the extraction of medical knowledge for the outcome research, for therapy planning and patient monitoring. These are the successful implementations of the machine learning methods. It can help in the integration of computer-based systems in the healthcare sector. Statistical Arbitrage In finance, arbitrage refers to the automated trading strategies that are of a short-term and involve a large number of securities. In these strategies, the user focuses on implementing the trading algorithm for a set of securities on the basis of quantities like historical correlations and the general economic variables. Machine learning methods are applied to obtain an index arbitrage strategy. We apply linear regression and the Support Vector Machine to the prices of a stream of stocks. Learning associations Learning associations is the process of developing insights into the various associations between the products. A good example is how the unrelated products can be associated with one another. One of the applications of machine learning is studying the associations between the products that people buy. If a person buys a product, he will be shown similar products because there is a relation between the two products. When any new products are launched in the market, they are associated with the old ones to increase their sales.
  • 4. Classification A classification is a process of placing each individual under study in many classes. Classification helps to analyze the measurements of an object to identify the category to which that object belongs. To establish an efficient relation, analysts use data. For example, before a bank decides to distribute loans, it assesses the customers on their ability to pay loans. By considering the factors like customer‘s earnings, savings, and financial history, we can do it. This information is taken from the past data on the loan. Prediction Machine learning can also be used in the prediction systems. Considering the loan example, to compute the probability of a fault, the system will need to classify the available data in groups. It is defined by a set of rules prescribed by the analysts. Once the classification is done, we can calculate the probability of the fault. These computations can compute across all the sectors for varied purposes. Making predictions is one of the best machine learning applications. Extraction Extraction of information is one of the best applications of machine learning. It is the process of extracting structured information from the unstructured data. For example, the web pages, articles, blogs, business reports, and emails. The relational database maintains the output produced by the information extraction. The process of extraction takes a set of documents as input and outputs the structured data. Regression We can also implement machine learning in the regression as well. In regression, we can use the principle of machine learning to optimize the parameters. It can also be used to decrease the approximation error and calculate the closest possible outcome. We can also use the machine learning for the function optimization. We can also choose to alter the inputs in order to get the closest possible outcome.
  • 5. Financial Services Machine learning has a lot of potential in the financial and banking sector. It is the driving force behind the popularity of the financial services. Machine learning can help the banks, financial institutions to make smarter decisions. Machine learning can help the financial services to spot an account closure before it occurs. It can also track the spending pattern of the customers. Machine learning can also perform the market analysis. Smart machines can be trained to track the spending patterns. The algorithms can identify the tends easily and can react in real time. Government Government agencies such as public safety and utilities have a particular need for machine learning since they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, identifies ways to increase efficiency and save money. Machine learning can also help detect fraud and minimize identity theft. Health care Machine learning is a fast-growing trend in the health care industry, thanks to the advent of wearable devices and sensors that can use data to assess a patient's health in real time. The technology can also help medical experts analyze data to identify trends or red flags that may lead to improved diagnoses and treatment. Retail Websites recommending items you might like based on previous purchases are using machine learning to analyze your buying history. Retailers rely on machine learning to capture data, analyze it and use it to personalize a shopping experience, implement a marketing campaign, price optimization, merchandise supply planning, and for customer insights. Oil and gas Finding new energy sources. Analyzing minerals in the ground. Predicting refinery sensor failure. Streamlining oil distribution to make it more efficient and cost-effective. The number of machine learning use cases for this industry is vast – and still expanding. Transportation
  • 6. Analyzing data to identify patterns and trends is key to the transportation industry, which relies on making routes more efficient and predicting potential problems to increase profitability. The data analysis and modeling aspects of machine learning are important tools to delivery companies, public transportation and other transportation organizations. Methods or Types of Machine Learning Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known. For example, a piece of equipment could have data points labeled either ―F‖ (failed) or ―R‖ (runs). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim. Unsupervised learning is used against data that has no historical labels. The system is not told the "right answer." The algorithm must figure out what is being shown. The goal is to explore the data and find some structure within. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. 
These algorithms are also used to segment text topics, recommend items and identify data outliers. Semisupervised learning is used for the same applications as supervised learning. But it uses both labeled and unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can be used with methods such as classification, regression and prediction. Semisupervised learning is useful when the cost associated with labeling is too high to allow for a fully labeled training process. Early examples of this include identifying a person's face on a web cam.
  • 7. Reinforcement learning is often used for robotics, gaming and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy. How businesses are using machine learning Machine learning is the core of some companies‘ business models, like in the case of Netflix‘s suggestions algorithm or Google‘s search engine. Other companies are engaging deeply with machine learning, though it‘s not their main business proposition. Others are still trying to determine how to use machine learning in a beneficial way. ―In my opinion, one of the hardest problems in machine learning is figuring out what problems I can solve with machine learning,‖ Shulman said. ―There‘s still a gap in the understanding.‖ In a 2018 paper, researchers from the MIT Initiative on the Digital Economy outlined a 21- question rubric to determine whether a task is suitable for machine learning. The researchers found that no occupation will be untouched by machine learning, but no occupation is likely to be completely taken over by it. The way to unleash machine learning success, the researchers found, was to reorganize jobs into discrete tasks, some which can be done by machine learning, and others that require a human. Companies are already using machine learning in several ways, including: Recommendation algorithms. The recommendation engines behind Netflix and YouTube suggestions, what information appears on your Facebook feed, and product recommendations are fueled by machine learning. 
"[The algorithms] are trying to learn our preferences," Madry said. "They want to learn, like on Twitter, what tweets we want them to show us, on Facebook, what ads to display, what posts or liked content to share with us."
Image analysis and object detection. Machine learning can analyze images for different information, like learning to identify people and tell them apart, though facial recognition algorithms are controversial. Business uses for this vary. Shulman noted that hedge funds famously use machine learning to analyze the number of cars in parking lots, which helps them learn how companies are performing and make good bets.
Fraud detection. Machines can analyze patterns, like how someone normally spends or where they normally shop, to identify potentially fraudulent credit card transactions, log-in attempts, or spam emails.
Automatic helplines or chatbots. Many companies are deploying online chatbots, in which customers or clients don't speak to humans, but instead interact with a machine. These algorithms use machine learning and natural language processing, with the bots learning from records of past conversations to come up with appropriate responses.
  • 8. Self-driving cars. Much of the technology behind self-driving cars is based on machine learning, deep learning in particular.
Medical imaging and diagnostics. Machine learning programs can be trained to examine medical images or other information and look for certain markers of illness, like a tool that can predict cancer risk based on a mammogram.
How has machine learning evolved?
1642 - Blaise Pascal invents a mechanical machine that can add, subtract, multiply and divide.
1679 - Gottfried Wilhelm Leibniz devises the system of binary code.
1834 - Charles Babbage conceives the idea for a general all-purpose device that could be programmed with punched cards.
1842 - Ada Lovelace describes a sequence of operations for solving mathematical problems using Charles Babbage's theoretical punch-card machine and becomes the first programmer.
1847 - George Boole creates Boolean logic, a form of algebra in which all values can be reduced to the binary values of true or false.
1936 - English logician and cryptanalyst Alan Turing proposes a universal machine that could decipher and execute a set of instructions. His published proof is considered the basis of computer science.
  • 9. 1952 - Arthur Samuel creates a program to help an IBM computer get better at checkers the more it plays.
1959 - MADALINE becomes the first artificial neural network applied to a real-world problem: removing echoes from phone lines.
1985 - Terry Sejnowski's and Charles Rosenberg's artificial neural network taught itself how to correctly pronounce 20,000 words in one week.
1997 - IBM's Deep Blue beat chess grandmaster Garry Kasparov.
1999 - A CAD prototype intelligent workstation reviewed 22,000 mammograms and detected cancer 52% more accurately than radiologists did.
2006 - Computer scientist Geoffrey Hinton invents the term deep learning to describe neural net research.
2012 - An unsupervised neural network created by Google learned to recognize cats in YouTube videos with 74.8% accuracy.
2014 - A chatbot passes the Turing Test by convincing 33% of human judges that it was a Ukrainian teen named Eugene Goostman.
2016 - Google's AlphaGo defeats the human champion in Go, the most difficult board game in the world.
2016 - LipNet, DeepMind's artificial intelligence system, identifies lip-read words in video with an accuracy of 93.4%.
2019 - Amazon controls 70% of the market share for virtual assistants in the U.S.
  • 11. What is the future of machine learning? While machine learning algorithms have been around for decades, they've attained new popularity as artificial intelligence has grown in prominence. Deep learning models, in particular, power today's most advanced AI applications. Machine learning platforms are among enterprise technology's most competitive realms, with most major vendors, including Amazon, Google, Microsoft, IBM and others, racing to sign customers up for platform services that cover the spectrum of machine learning activities, including data collection, data preparation, data classification, model building, training and application deployment. As machine learning continues to increase its importance to business operations and AI becomes more practical in enterprise settings, the machine learning platform wars will only intensify. Continued research into deep learning and AI is increasingly focused on developing more general applications. Today's AI models require extensive training in order to produce an algorithm that is highly optimized to perform one task. But some researchers are exploring ways to make models more flexible and are seeking techniques that allow a machine to apply context learned from one task to future, different tasks.
  • 12. Business Intelligence (BI)
The term was first used in 1865, was taken up in 1958 by an IBM researcher, Hans Peter Luhn, and was later adapted by Howard Dresner at Gartner in 1989 to describe making better business decisions through searching, gathering, and analyzing the accumulated data saved by an organization. Using the term "Business Intelligence" as a description of decision-making based on data technologies was both novel and far-sighted. Large companies first used BI in the form of analyzing customer data systematically, as a necessary step in making business decisions.
Business Intelligence can be described as a pipeline that spans the entire realm of managing complex data in organisations to generate intelligent outcomes that aid business decision making. It includes business objectives, methodologies, tools, techniques, models, architecture, processing and communicating desired outcomes.
Note: Historical data: In a broad context, this is data collected about past events and circumstances pertaining to a particular subject. It includes most data generated either manually or automatically within an enterprise. Sources may include press releases, financial reports, project documentation, email and other communications.
Business intelligence (BI) helps organizations analyze historical and current data, so they can quickly uncover actionable insights for making strategic decisions. Business intelligence
  • 13. tools make this possible by processing large data sets across multiple sources and presenting findings in visual formats that are easy to understand and share.
Benefits of using business intelligence
Because business intelligence tools speed up information analysis and performance evaluation, they're valuable in helping companies reduce inefficiencies, flag potential problems, find new revenue streams, and identify areas of future growth. Some of the specific benefits that businesses experience when using BI include:
- Increased efficiency of operational processes.
- Insight into customer behavior and shopping patterns.
- Accurate tracking of sales, marketing, and financial performance.
- Clear benchmarks based on historical and current data.
- Instant alerts about data anomalies and customer issues.
- Analyses that can be shared in real-time across departments.
In the past, business intelligence tools were primarily used by data analysts and IT users. Now, self-service BI platforms make business intelligence available to everyone from executives to operations teams.
Stages of Business Intelligence
Business Intelligence is generally divided into four stages, which together form the BI process that businesses working with data should be aware of.
1. Information gathering
During the information gathering stage, data is either prepared from existing sources (existing contact data, ERP data, financial database) or collected externally through the use of in-person or online surveys, polls, questionnaires or forms. Feedback data can be gathered from customers, staff or advisors, with consideration given to anonymity and privacy in order to provide the most honest and reflective data possible.
  • 14. 2. Analysis
This is one of the key areas of turning raw data into information. BI makes it easier for the user to explore the data and turn it into useful information. There are three common types of analysis:
Spreadsheet Analysis - probably the oldest form of analysis, where data from a spreadsheet application is translated into tables, pivot tables and graphs in order to identify specific trends and inconsistencies.
Software that allows users to develop their own specific data queries - where data has been collected, it may be automatically analysed by software or on importation, for example results from a SurveyMonkey public survey.
Visualisation Tools – graphs and charts that take raw data and create visualisations that users can read and understand. Legacy programs like Crystal Reports and new technologies like Power BI are good examples of visualisation tools.
3. Reporting
Once data has been analysed it needs to be reported on. Reporting is the act of taking the analysed data and presenting it in a way that makes a human connection, or that focuses attention on where advantages can be gained through action. Depending on the tools involved, reporting can happen as an extension of the analysis phase, but for BI to be effective the data must be filtered or defined during the analysis stage before being presented as a report. Reports may be presented as tables of data on screen or paper, but can also be shown as pivot tables, graphs, or as an executive summary in a corporate report.
4. Monitoring and Prediction
Business Intelligence is a circular process, and therefore the fourth stage of monitoring and prediction can flow back to the first stage, information gathering.
Monitoring allows the user to monitor data and information in real-time. Monitoring provides snapshots between reporting periods or when making decisions.
The three main types of monitoring are: Dashboard – A central location where all useful and actionable metrics and data are contained. They are usually represented graphically to make it easier for users to read. Key Performance Indicators (KPIs) – KPIs measure the performance of selected key drivers from the organisation.
  • 15. Business Performance Management – Also known as a Balanced Scorecard, this is a system designed to ensure that performance goals for your organisation or projects are being met and results are being delivered.
Prediction helps management predict what will happen based on the data currently available and other trends. Prediction can be an incredibly complex form of BI, and uses a combination of insights gathered during the analysis and monitor/predict stages in order to make decisions on future outcomes, or on what data to focus on for the next Information Gathering stage.
TYPES OF BUSINESS INTELLIGENCE
Business intelligence combines a broad set of data analysis applications. Depending on your needs, available data, tech stack, and the type of task at hand, here are the most common deliverables of a Business Intelligence implementation:
- Ad hoc analytics helps you answer a single business question. Focusing on a specific issue, this tool can either generate a report that does not already exist or dig deeper into a static report to get additional details about a particular business process or part of operations.
- Online analytical processing (OLAP) allows users to extract and query certain data in order to analyze it from different points of view. It is typically used to analyze trends and for financial reporting, sales forecasting, or other planning purposes.
- Real-time BI. Real-time business intelligence enables users to get up-to-the-minute data by accessing operational systems or feeding business information into a real-time data warehouse and/or BI system.
- Operational BI. Operational intelligence is an approach to data analysis that enables business operations decisions and actions to be based on real-time data as it's generated or collected by companies. Typically, the data analysis process is automated, and the resulting information is integrated into operational systems for immediate use by business managers and workers.
- Collaborative BI emerged through combining business intelligence software with collaboration tools to support improved data-driven decision making.
- BI dashboards and data visualization display key business metrics at a glance.
Four Most Common Components of a Business Intelligence System
Business intelligence systems are used for intelligent exploration, integration, aggregation, and multidimensional analysis of data originating from various information resources, and this data is treated as a highly valuable corporate resource (Kronos & Yeoh, 2010).
  • 16. A business intelligence system requires at least four specific components to produce business intelligence. They are (a) data warehouses, (b) ETL tools, (c) OLAP techniques and (d) data mining (Olszak & Ziemba, 2006).
1. Data warehouses.
The data warehouse is considered the core component of a business intelligence system. This collection of data is used to support the management decision-making process. In addition to providing a snapshot of historical data, a data warehouse also provides room for the thematic storing of aggregated information, that is, data that has been processed by an ETL tool and then loaded into the appropriate data warehouse. A well-implemented data warehouse is easy to use, allows for quick information recovery, stores more information, improves productivity, allows for better decisions, and increases an organization's competitive advantage. Hevner and March (2005) conclude that the key role of a data warehouse is to provide an understanding of business problems, opportunities, and performance based on compelling business intelligence facilitating decision making.
Data Warehousing
Data Warehouse: This term was coined in the 1980s. As the amount of data being collected continued to grow significantly, a requirement arose to store the data in a way that helps transform data coming from operational systems into decision-making support systems. Data Warehouses are normally part of an organization's mainframe server.
A Data Warehouse is normally optimized for a quick response time to queries. In a data warehouse, data is often stored with a timestamp, that is, saved together with its time and date. If all sales transactions were stored using timestamps, an organization could use a Data Warehouse to compare the sales trends of each month.
Data warehousing is defined as a technique for collecting and managing data from varied sources to provide meaningful business insights.
It is a blend of technologies and components which aids the strategic use of data. It is electronic storage of a large amount of information by a business, designed for query and analysis instead of transaction processing. It is a process of transforming data into information and making it available to users in a timely manner to make a difference. By merging all of this information in one place, an organization can analyze its customers more holistically. This helps to ensure that it has considered all the information available. Data warehousing makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales and profits.
Types of Data Warehouse:
Three main types of Data Warehouses are:
1. Enterprise Data Warehouse: Enterprise Data Warehouse is a centralized warehouse. It provides decision support service across the enterprise. It offers a unified approach for organizing and representing data. It also provides the ability to classify data according to subject and to give access according to those divisions.
2. Operational Data Store: An Operational Data Store, also called an ODS, is a data store required when neither the Data warehouse nor OLTP systems support
  • 17. the organization's reporting needs. In an ODS, the data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of employees.
3. Data Mart: A data mart is a subset of the data warehouse. It allows access rights for specific functional teams or user groups and speeds up the process of query, data transfer and analysis at the individual department level. It is specially designed for a particular line of business, such as sales, finance or marketing. In an independent data mart, data can be collected directly from sources.
Data Lakes are similar to a Data Warehouse (DW) in that both are used to store data. However, data lakes store data at large scale in raw form, as captured from the source, unlike a DW, where it is stored methodically to facilitate analytical processes. In many organisations, business leaders are using a hybrid solution for their analytical needs. The raw data, whether unstructured data, text, audio, video, web data or sensor data, is all stored together in a data lake. A data lake can compartmentalise the data depending on the source from which it is received or the requirement of the business team. Simple analysis of this data can provide insights that may be of interest to the business teams.
General stages of Data Warehousing
Organizations started with relatively simple use of data warehousing. Over time, more sophisticated use of data warehousing began. The following are general stages of use of the data warehouse:
1. Offline Operational Database: In this stage, data is just copied from an operational system to another server. In this way, loading, processing, and reporting of the copied data do not impact the operational system's performance.
2. Offline Data Warehouse: Data in the Data warehouse is regularly updated from the Operational Database. The data in the Data warehouse is mapped and transformed to meet the Data warehouse objectives.
3.
Real time Data Warehouse: In this stage, Data warehouses are updated whenever any transaction takes place in the operational database. Examples are airline or railway booking systems.
4. Integrated Data Warehouse: In this stage, Data Warehouses are updated continuously when the operational system performs a transaction. The Data warehouse then generates transactions which are passed back to the operational system.
  • 18. Components of Data warehouse:
Four components of Data Warehouses are:
1. Load manager: The load manager is also called the front component. It performs all the operations associated with the extraction and loading of data into the warehouse. These operations include transformations to prepare the data for entering into the Data warehouse.
2. Warehouse Manager: The warehouse manager performs operations associated with the management of the data in the warehouse. It performs operations like analysis of data to ensure consistency, creation of indexes and views, generation of denormalizations and aggregations, transformation and merging of source data, and archiving and backing up data.
3. Query Manager: The query manager is also known as the backend component. It performs all the operations related to the management of user queries. The operations of this Data warehouse component include directing queries to the appropriate tables and scheduling the execution of queries.
4. End-user access tools: These are categorized into five different groups: 1. Data Reporting 2. Query Tools 3. Application development tools 4. EIS tools 5. OLAP tools and data mining tools.
Who needs Data warehouse?
A Data warehouse is needed by all types of users, such as:
- Decision makers who rely on massive amounts of data.
- Users who use customized, complex processes to obtain information from multiple data sources.
- People who want simple technology to access the data.
- People who want a systematic approach for making decisions.
- Users who want fast performance on a huge amount of data, which is a necessity for reports, grids or charts.
- Anyone who wants to discover 'hidden patterns' of data-flows and groupings, for which a Data warehouse is a first step.
Here are the most common sectors where a Data warehouse is used:
Airline: In the Airline system, it is used for operational purposes like crew assignment, analysis of route profitability, frequent flyer program promotions, etc.
Banking: It is widely used in the banking sector to manage the resources available on desk effectively. A few banks also use it for market research and for performance analysis of products and operations.
Healthcare:
  • 19. The healthcare sector also uses a Data warehouse to strategize and predict outcomes, generate patients' treatment reports, and share data with tie-in insurance companies, medical aid services, etc.
Public sector: In the public sector, the data warehouse is used for intelligence gathering. It helps government agencies to maintain and analyze tax records and health policy records for every individual.
Investment and Insurance sector: In this sector, the warehouses are primarily used to analyze data patterns and customer trends, and to track market movements.
Retail chain: In retail chains, a Data warehouse is widely used for distribution and marketing. It also helps to track items, customer buying patterns and promotions, and is also used for determining pricing policy.
Telecommunication: A data warehouse is used in this sector for product promotions, sales decisions and distribution decisions.
Hospitality Industry: This industry utilizes warehouse services to design as well as evaluate advertising and promotion campaigns that target clients based on their feedback and travel patterns.
Steps to Implement Data Warehouse
The best way to address the business risk associated with a Data warehouse implementation is to employ a three-prong strategy as below:
1. Enterprise strategy: Here we identify technical aspects, including the current architecture and tools. We also identify facts, dimensions, and attributes. Data mapping and transformation is also passed.
2. Phased delivery: Data warehouse implementation should be phased based on subject areas. Related business entities like booking and billing should be implemented first and then integrated with each other.
3. Iterative Prototyping: Rather than a big bang approach to implementation, the Data warehouse should be developed and tested iteratively.
Best practices to implement a Data Warehouse
1. Decide a plan to test the consistency, accuracy, and integrity of the data.
2.
The data warehouse must be well integrated, well defined and time stamped.
  • 20. 3. While designing a Data warehouse, make sure you use the right tool, stick to the life cycle, take care of data conflicts and are ready to learn from your mistakes.
4. Never replace operational systems and reports.
5. Don't spend too much time on extracting, cleaning and loading data.
6. Ensure that all stakeholders, including business personnel, are involved in the Data warehouse implementation process. Establish that Data warehousing is a joint team project. You don't want to create a Data warehouse that is not useful to the end users.
7. Prepare a training plan for the end users.
Why We Need Data Warehouse? Advantages & Disadvantages
Advantages of Data Warehouse:
1. A Data warehouse allows business users to quickly access critical data from multiple sources all in one place.
2. A Data warehouse provides consistent information on various cross-functional activities. It also supports ad-hoc reporting and queries.
3. A Data Warehouse helps to integrate many sources of data to reduce stress on the production system.
4. A Data warehouse helps to reduce total turnaround time for analysis and reporting.
5. Restructuring and integration make the data easier to use for reporting and analysis.
6. A Data warehouse allows users to access critical data from a number of sources in a single place. Therefore, it saves users the time of retrieving data from multiple sources.
7. A Data warehouse stores a large amount of historical data. This helps users to analyze different time periods and trends to make future predictions.
Disadvantages of Data Warehouse:
1. Not an ideal option for unstructured data.
2. Creation and implementation of a Data Warehouse is a time-consuming affair.
3. A Data Warehouse can become outdated relatively quickly.
4. It is difficult to make changes in data types and ranges, data source schema, indexes, and queries.
5. The data warehouse may seem easy, but it is actually too complex for average users.
6.
Despite best efforts at project management, the scope of a data warehousing project will tend to increase.
7. Sometimes warehouse users will develop different business rules.
8. Organisations need to spend a lot of their resources on training and implementation.
The Future of Data Warehousing
1. Changes in regulatory constraints may limit the ability to combine disparate sources of data. These disparate sources may include unstructured data, which is difficult to store.
2. As the size of databases grows, the estimates of what constitutes a very large database continue to grow. It is complex to build and run data warehouse systems
  • 21. which are always increasing in size. The hardware and software resources available today do not allow a large amount of data to be kept online.
3. Multimedia data cannot be manipulated as easily as text data, whereas textual information can be retrieved by the relational software available today. This could be a research subject.
Data Warehouse Tools
There are many Data Warehousing tools available in the market. Here are some of the most prominent:
1. MarkLogic: MarkLogic is a data warehousing solution that makes data integration easier and faster using an array of enterprise features. This tool helps to perform very complex search operations. It can query different types of data like documents, relationships, and metadata.
2. Oracle: Oracle is the industry-leading database. It offers a wide choice of data warehouse solutions, both on-premises and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.
3. Amazon RedShift: Amazon Redshift is a Data warehouse tool. It is a simple and cost-effective tool to analyze all types of data using standard SQL and existing BI tools. It also allows running complex queries against petabytes of structured data, using the technique of query optimization.
Conclusion:
- The data warehouse works as a central repository where information comes from one or more data sources.
- Three main types of Data warehouses are Enterprise Data Warehouse, Operational Data Store, and Data Mart.
- The general stages of a data warehouse are Offline Operational Database, Offline Data Warehouse, Real time Data Warehouse and Integrated Data Warehouse.
- Four main components of a Data warehouse are Load manager, Warehouse Manager, Query Manager and End-user access tools.
- Data warehouses are used in diverse industries like Airline, Banking, Healthcare, Insurance, Retail etc.
- Implementing a Data warehouse is a three-prong strategy, viz. Enterprise strategy, Phased delivery and Iterative Prototyping.
- A Data warehouse allows business users to quickly access critical data from multiple sources all in one place.
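The earlier point about storing sales transactions with timestamps so that monthly trends can be compared can be sketched as a query. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for a warehouse, and the table name `sales`, its columns and the sample rows are hypothetical.

```python
import sqlite3


def monthly_sales(rows):
    """rows: iterable of (iso_timestamp, amount).
    Returns [(YYYY-MM, total_amount)] ordered by month."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    conn.execute("CREATE TABLE sales (sold_at TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Group timestamped transactions by calendar month.
    result = conn.execute(
        "SELECT strftime('%Y-%m', sold_at) AS month, SUM(amount) "
        "FROM sales GROUP BY month ORDER BY month"
    ).fetchall()
    conn.close()
    return result


# Hypothetical transactions.
trend = monthly_sales([
    ("2024-01-15 10:30:00", 120.0),
    ("2024-01-20 09:00:00", 80.0),
    ("2024-02-03 14:45:00", 150.0),
])
for month, total in trend:
    print(month, total)
```

Because each row carries its own time and date, the same table can be regrouped by day, quarter or year without changing how the data is stored.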
  • 22. 2. Extract-Transform-Load (ETL)
ETL tools and processes are responsible for extracting data from one or many source systems, transforming data from many different formats into a common format, and then loading that data into a data warehouse. ETL tools are tasked with extracting information deemed central to the business. They turn that data into information that is then used for managerial decision making.
Early in the history of business intelligence systems, ETL design and implementation was considered a supporting task for the data warehouse, and thus was viewed not as a piece of the business intelligence puzzle but as a subset of the data warehousing problem.
ETL solutions are divided into three distinct stages that find and convert data from various sources and insert the resulting product into a data warehouse. The three stages of ETL are:
1. The extraction stage: This stage involves obtaining access to data originating from different, often heterogeneous sources. These sources are often distributed across multiple platforms and can be part of a customer's information system.
2. The transformation stage: This stage transforms the extracted data and is considered the most complex stage of the ETL process. The transformation stage converts the data into the schema of the data warehouse into which it is to be loaded. The transformation phase is usually performed by means of traditional programming languages, scripting languages or SQL.
3. The load stage: The load stage loads the data warehouse with the transformed data, which has been aggregated and filtered (Olszak & Ziemba, 2007).
The requirement that a business intelligence system be able to extract data in different formats from dispersed sources, transform them into like formats, and then load them into the appropriate data warehouse has traditionally made the ETL process the most expensive aspect of a business intelligence system.
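The three ETL stages described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the two in-memory sources, their field names, the `sales_fact` table and the use of SQLite as the "warehouse" are all assumptions made for the example.

```python
import csv
import io
import json
import sqlite3

# Hypothetical heterogeneous sources: a CSV export and a JSON feed
# that describe the same kind of record with different field names and units.
CSV_SOURCE = "customer,amount_usd\nAsha,120.50\nRavi,80\n"
JSON_SOURCE = '[{"name": "Meena", "total_cents": 4550}]'


def extract():
    """Extraction stage: read raw records from each source as-is."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transformation stage: convert both formats to one schema,
    (customer, amount in dollars)."""
    unified = [(r["customer"].strip(), float(r["amount_usd"])) for r in csv_rows]
    unified += [(r["name"].strip(), r["total_cents"] / 100) for r in json_rows]
    return unified


def load(conn, rows):
    """Load stage: insert the unified rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load(conn, transform(*extract()))
    return conn


warehouse = run_etl()
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales_fact ORDER BY customer"
).fetchall()
```

Most of the logic sits in `transform`, which matches the text's remark that transformation is the most complex stage: every new source format needs its own mapping into the warehouse schema.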
Generally there are four categories that ETL tools fall under: 1. ETL: tools that address the extraction and loading aspects of the ETL process.
  • 23. 2. ETL: tools that provide a preference for the data type and format to be extracted and loaded.
3. ETL: tools that offer a balance across all tool functions; the lack of emphasis may result in poorer handling of a large volume of data formats.
4. ETL: tools that emphasize the integration of data into data warehouses.
3. OLAP Techniques
The origins of On-Line Analytical Processing are rooted in the difficulties encountered when performing data analysis on databases that are constantly being updated during transactions via other information systems. OLAP attempts to analyze complex data in real time on a database that is constantly updated with transactional data. OLAP optimizes the searching of huge data files by means of automatic generation of SQL queries.
OLAP allows users to access, analyze and model business problems and to share information that is stored in data warehouses. OLAP tools use data mining techniques and statistical methods to generate readable reports quickly; these reports are used for forecasting, which can further assist in strategic decision making. The reports are generated based on a manager's pre-defined criteria (dimensions).
OLAP is an improvement on earlier single-dimensional analysis tools that allowed managers to analyze data from only one perspective at a time. By providing managers with a multi-dimensional tool, OLAP enables them to analyze data from multiple perspectives and explore it in order to discover hidden information (Matei, 2010).
4. Data mining.
Data mining techniques are designed to identify relationships and rules within a data warehouse, then create a report of these relationships and rules. The data mining process involves discovering various patterns, generalizations, regularities and rules in data resources. Knowledge from data mining may be used to predict an outcome of a decision and can also describe reality.
The predictions generated by data mining use known variables to predict the outcome of a situation, while reality is measured by graphing, tabulating, and creating formulas based on the existing data.
Strategies for Data Mining
There are several basic strategies for data mining. These strategies can be aligned with the needs of an organization and help decision making by discovering various patterns, generalizations, regularities and rules in data resources. Examples of these strategies in business include using market basket analysis to model retail sales, or using classification to classify unstructured data, such as email, as spam or as a legitimate piece of correspondence, such as business or personal information.
Data Mining Case:
Nisha is not much of an internet buyer, so she prefers to do her shopping by going to a physical store. She received a call from her childhood friend, who said she would be visiting over the weekend. Nisha was scheduled to be on an overseas assignment all of that week and would be back in India just in time to receive her friend at home. Her friend is from a Maharashtrian
  • 24. family. Nisha wanted to give her a statue of the god Ganesha. With very little time on hand, she reluctantly decided to look for one online. To her surprise, she found numerous statues in varied sizes and forms and made of different materials, many more options than she could have found in any physical store. It was simple to click, compare and decide based on the several parameters that would define her final purchase. The website also provided guidance based on other customers profiled in Nisha's segment who had purchased a statue she had viewed and had also purchased or viewed a similar item. A listing of items that were frequently bought together assisted her in the purchasing decision. What else did she need? From the cosy sofa of her drawing room, she had a choice of multiple products and recommendations for purchase, with product description, price, delivery options, discounts and flexible payment methods all bundled together. She was tired of browsing websites that evening and decided to defer her purchase to the next morning. The next day she was flooded with digital advertisements for Ganesha idols across websites.

In the case described above, the website used data mining techniques to provide customers like Nisha with the best possible browsing and shopping experience online. A key feature of data mining techniques is understanding a customer's requirements as early as possible and then making it easy for her and similar customers to decide on, select and buy the product of their choice from innumerable options. Organisations use a large number of data mining techniques to manage their customers by better understanding their needs and purchase behaviour, as well as to improve organisational processes for efficient operations, manage employees and other stakeholders, and support financial decisions and strategic needs.
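The "frequently bought together" listing in the case above can be illustrated with a simple co-occurrence count. This is only a minimal sketch and not the method of any particular website: the orders, the item names and the helper `frequently_bought_with` are hypothetical, and real recommenders work on far larger data with more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past orders; each order is the set of items bought together.
ORDERS = [
    {"ganesha statue", "incense sticks", "brass lamp"},
    {"ganesha statue", "brass lamp"},
    {"ganesha statue", "gift wrap"},
    {"brass lamp", "incense sticks"},
    {"ganesha statue", "brass lamp", "gift wrap"},
]


def pair_counts(orders):
    """Count how many orders contain each unordered pair of items."""
    counts = Counter()
    for order in orders:
        # sorted() gives every pair a single canonical (a, b) form.
        for pair in combinations(sorted(order), 2):
            counts[pair] += 1
    return counts


def frequently_bought_with(item, orders, top_n=2):
    """Return up to top_n items most often bought in the same order as item."""
    related = Counter()
    for (a, b), n in pair_counts(orders).items():
        if a == item:
            related[b] += n
        elif b == item:
            related[a] += n
    # Sort by count (descending), then by name so that ties are deterministic.
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    print(frequently_bought_with("ganesha statue", ORDERS))
```

With these made-up orders, the items ranked highest for the statue are the ones that appear in the most shared orders, which is the kind of listing Nisha saw.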
Data mining is the method of discovering, or mining, knowledge from large reserves of data. The term describes the process through which previously unknown patterns in data are discovered. According to Fayyad et al. (1996), data mining is defined as "the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data stored in structured databases".

Data Mining Techniques/Methods
1. Classification: This analysis is used to retrieve important and relevant information about data and metadata. This data mining method helps to classify data into different classes.
2. Time series analysis: A time series is a series of data points indexed in time order. Most commonly it is a sequence taken at successive, equally spaced points in time. Thus it is a sequence of discrete-time data.
3. Market Basket Analysis (MBA): This is a modelling technique based upon the theory that if you buy a certain group of items, you are more likely to buy another group of items. The set of items a customer buys is
  • 25. referred to as an item set, and market basket analysis seeks to find relationships between purchases. It helps to understand customer purchases and their patterns.
4. Clustering: Clustering analysis is a data mining technique for identifying data that are similar to each other. This process helps to understand the differences and similarities between the data.
5. Regression: Regression analysis is the data mining method of identifying and analyzing the strength of the relationship between one dependent variable and a series of other changing variables (independent variables). The term regression was coined by Francis Galton in the 19th century to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress down towards a normal average. Types: linear regression and logistic regression.
6. Association Rules: This data mining technique helps to find the association between two or more items. It discovers hidden patterns in the data set.
7. Outlier detection: This type of data mining technique refers to the observation of data items in the dataset that do not match an expected pattern or expected behavior. It can be used in a variety of domains, such as intrusion detection, fraud detection or fault detection. Outlier detection is also called outlier analysis or outlier mining.
8. Sequential Patterns: This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.
9. Estimation: This is the process of finding an approximation that is usable for some purpose even if the input data are incomplete, uncertain or unusable.
10. Prediction: This technique is used to predict future outcomes. E.g., predictive analysis can be used to detect incidents that led to a crime and to identify the criminals behind them. Prediction uses a combination of other data mining techniques such as trends, sequential patterns, clustering and classification.
It analyzes past events or instances in the right sequence to predict a future event.

Benefits of Data Mining:
  • 26. 1. Data mining techniques help companies to get knowledge-based information.
2. Data mining helps organizations to make profitable adjustments in operation and production.
3. Data mining is a cost-effective and efficient solution compared to other statistical data applications.
4. Data mining helps with the decision-making process.
5. It facilitates automated prediction of trends and behaviors as well as automated discovery of hidden patterns.
6. It can be implemented in new systems as well as existing platforms.
7. It is a speedy process, which makes it easy for users to analyze huge amounts of data in less time.

Disadvantages of Data Mining
1. Companies may sell useful information about their customers to other companies for money. For example, American Express has sold credit card purchases of its customers to other companies.
2. Much data mining analytics software is difficult to operate and requires advanced training to work with.
3. Different data mining tools work in different ways because of the different algorithms employed in their design. Therefore, selecting the correct data mining tool is a very difficult task.
4. Data mining techniques are not perfectly accurate, so relying on them can have serious consequences in certain conditions.

Summary:
1. Data mining is all about explaining the past and predicting the future through analysis.
2. Data mining helps to extract information from huge sets of data. It is the procedure of mining knowledge from data.
3. The data mining process includes Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment.
4. Important data mining techniques are classification, clustering, regression, association rules, outlier detection, sequential patterns and prediction.
5. R-language and Oracle Data Mining are prominent data mining tools.
6. Data mining techniques help companies to get knowledge-based information.
7.
The main drawback of data mining is that much analytics software is difficult to operate and requires advanced training to work with.
8. Data mining is used in diverse industries such as Communications, Insurance, Education, Manufacturing, Banking, Retail, Service providers, eCommerce, Supermarkets and Bioinformatics.

Data Mining Applications
Application: Usage
  • 27. Communications: Data mining techniques are used in the communications sector to predict customer behavior in order to offer highly targeted and relevant campaigns.
Insurance: Data mining helps insurance companies to price their products profitably and promote new offers to their new or existing customers.
Education: Data mining helps educators to access student data, predict achievement levels and find students or groups of students who need extra attention, for example students who are weak in maths.
Manufacturing: With the help of data mining, manufacturers can predict wear and tear of production assets. They can anticipate maintenance needs, which helps them minimize downtime.
Banking: Data mining helps the finance sector to get a view of market risks and manage regulatory compliance. It helps banks to identify probable defaulters when deciding whether to issue credit cards, loans, etc.
Retail: Data mining techniques help retail malls and grocery stores identify the most sellable items and arrange them in the most prominent positions. They help store owners to come up with offers that encourage customers to increase their spending.
Service Providers: Service providers such as mobile phone and utility companies use data mining to predict the reasons why a customer leaves their company. They analyze billing details, customer service interactions and complaints made to the company in order to assign each customer a probability score and offer incentives.
E-Commerce: E-commerce websites use data mining to offer cross-sells and up-sells through their websites. One of the most famous names is Amazon, which uses data mining techniques to get more customers into its eCommerce store.
Supermarkets: Data mining allows supermarkets to develop rules to predict whether their shoppers are likely to be expecting a baby. By evaluating buying patterns, they can find women customers who are most likely pregnant and start targeting products like baby powder, baby soap, diapers and so on.
Crime Investigation: Data mining helps crime investigation agencies to decide how to deploy the police workforce (where is a crime most likely to happen, and when?), who to search at a border crossing, etc.
  • 28. Bioinformatics: Data mining helps to mine biological data from massive datasets gathered in biology and medicine.

The Specific Role of Each Component in a Business Intelligence System
Business intelligence systems are a means to exploit information in order to help managers solve their structured and unstructured problems. Each component of a business intelligence system can be used to exploit information in one or more of these selected managerial decision-making actions: (a) acquiring information, (b) searching/gathering information, (c) analyzing information and (d) delivery of information. By analyzing historical data, business intelligence systems strive to eliminate communication barriers that exist at the different organizational levels within a company. These barriers are considered noise during the decision-making process. By allowing decisions to be made using consistent information, this method of analysis enables managers to evaluate former activities and direct future actions.

BI System Components Aligned with Managerial Decision-Making Actions
Business Intelligence System Component: Managerial Information Actions
ETL Tools: Acquiring/Searching information
Data Warehouses: Acquiring/Searching information
OLAP Techniques: Analyzing and Delivery
Data Mining: Analyzing and Delivery

Acquiring/gathering information: Acquiring information has become increasingly difficult as modern organizations adopt more distributed information systems in which to store their business-critical data. This action is used to find the business issue. It utilizes ETL tools, directing the processes to find what information is needed and into which data warehouse to deposit that information.
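The acquiring action described above (extract data from a source, transform it, and deposit it in a warehouse) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real ETL tool: the CSV text, the `sales` table and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system (normally a file or database).
RAW_CSV = """order_id,region,amount
1, north ,120.50
2,South,80
3,north,not available
4,SOUTH,19.50
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise region names, convert amounts, drop bad rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three steps in order and return the loaded warehouse connection."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    for row in warehouse.execute(query):
        print(row)
```

The transform step is where data quality is raised: inconsistent region spellings are unified and the row with a non-numeric amount is left out before anything reaches the warehouse table.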
Searching information: After the data are extracted from operational databases, the newly loaded high-quality data are mined using data mining techniques and processes. This action is performed at different levels of data quality. Lower-quality data are searched by utilizing ETL tools. The more refined or mature an ETL tool, the higher the quality of the data being loaded into a data warehouse.

Analyzing information: Managers need to create data models to understand and address business issues. Through data pre-processing and applying OLAP and data mining techniques,
  • 29. managers can analyze information from multiple dimensions at varying degrees of granularity, with each technique tasked with a different level of analysis. For example, information derived through analysis directly affects decisions related to promotional campaigns and to forecasting sales and financial results and, in some cases, can be used in fraud detection. OLAP summarizes data and makes forecasts based on historical data. Data mining discovers hidden patterns in data and operates at a detail level instead of a summary level. In other words, data mining predicts, while OLAP forecasts. Data mining and OLAP can be used to analyze:
(a) financial data: analyzing and reporting on costs, revenue and profitability
(b) marketing data: analyzing sales receipts, sales profitability, sales targets, actions taken by competitors
(c) customer data: analyzing time of contact, customer profitability, customer behavior, customer satisfaction, and customer loyalty
(d) production data: analyzing production bottlenecks, delayed orders, in-process materials, tool up-time
(e) logistical data: analyzing relationships in a supply chain and delivery partnerships
(f) wage-related data: analyzing wage types, payroll surcharges, payroll collections, employee contributions, and average wages
(g) personnel data: analyzing employee turnover, employee type, presentation of information related to individual data

Delivery of information: Data mining is also used in the delivery of information within an organization. In business intelligence systems, data mining can not only interpret and evaluate results generated from the analysis performed on data stored in a data warehouse, but it can also display reports enabling decision makers to discover various patterns, generalizations, and regularities. In the same way, OLAP supports ad hoc report generation using simpler techniques that summarize data without the pattern matching that is unique to the data mining process.
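The idea of summarizing the same data along different dimensions and at different levels of granularity can be sketched as a simple roll-up. This is a minimal sketch and not how any particular OLAP product works: the `SALES` records, the dimension names and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: one record per sale, with three dimensions
# (region, product, quarter) and one measure (revenue).
SALES = [
    {"region": "north", "product": "lamp", "quarter": "Q1", "revenue": 100},
    {"region": "north", "product": "lamp", "quarter": "Q2", "revenue": 150},
    {"region": "north", "product": "statue", "quarter": "Q1", "revenue": 200},
    {"region": "south", "product": "lamp", "quarter": "Q1", "revenue": 50},
    {"region": "south", "product": "statue", "quarter": "Q2", "revenue": 300},
]


def roll_up(facts, dimensions, measure="revenue"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


if __name__ == "__main__":
    # Coarse view: one perspective (region only).
    print(roll_up(SALES, ["region"]))
    # Finer granularity: two dimensions at once.
    print(roll_up(SALES, ["region", "product"]))
    # A different perspective on the same data.
    print(roll_up(SALES, ["quarter"]))
```

Choosing fewer dimensions gives the summary-level view associated with OLAP reporting, while the individual records in `SALES` correspond to the detail level at which data mining operates.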
How Business Intelligence Systems can be used to Better Facilitate Business Decision Making at Each Level of Management
By utilizing business intelligence systems, organizations collect, treat and diffuse information with the objective of reducing uncertainty in decision making. These decisions are often made under pressure, almost always at critical times in which businesses need real-time data. A business intelligence system allows managers to make decisions using real-time data by monitoring the competition, carrying out constant analysis of numerous data and considering different variants of organizational performance. Data are extracted from operational databases, customer databases, and data collected about the competition. The business intelligence system extracts the data from these various sources, transforms them into specified formats, and then loads the newly formatted data into specially designated data warehouses that are available to all three levels of decision making within the organization: operational, strategic, and tactical. Each level of the organization will utilize different OLAP techniques and data mining processes to analyze data and report the information that is most relevant to it. The information generated from the business intelligence system will be used in all decision-making processes. At the strategic level, decisions set objectives and push the decision direction to
  • 30. the tactical level of the organization. At the tactical level, information is mined from the business intelligence system to develop tactics to realize the strategic objectives, which in turn push decisions down to the operational level of the organization. Both the tactical and operational levels of management are reactive to the strategic decisions of the organization. Even with a shared objective, different levels of the organization will utilize information for different purposes. At the strategic and tactical levels, information provides input to senior managers; at the operational level, information provides input to lower-level managers.

Operational level decisions. At the operational level, decisions affect or are related to the ongoing operations of an organization. These decisions are generally based on up-to-date financial data, sales and co-operation with suppliers and customers. Data are the lifeblood of daily operations in an organization, and business intelligence takes that data and presents it to decision makers in the form of information. Business intelligence systems provide information used at the operational level of an organization to address the following specific actions:
1. identify problems and "bottlenecks"
2. provide analysis of "the best" and "the worst"
3. provide analysis of products
4. provide analysis of employees
5. provide analysis of regions (using measurable metrics such as sales, costs or quantifiable results)
6. perform ad hoc analysis and answer questions related to a department's ongoing operations, up-to-date financial standing and sales
Operational level decisions are noted as being the decisions that allow an organization to run its day-to-day activities (Esat et al., 2007).
The information provided by the business intelligence system is at a summary level, and the data fed into the business intelligence system from the operational level of an organization is analyzed and combined with other external information to create direction and allow strategic planning to occur.

Tactical level decisions. Decisions made at the tactical level are related to planning and rely on real-time data and forecasting to direct the future actions of marketing, sales, finance and capital management. Tactical decisions are often used to support strategic decisions. The literature details these related tactical decision-making activities as being supported by business intelligence systems:
1. analyses of deviations from the realization of plans for particular organizational units, individuals or indicators
2. decisions related to the direction of marketing, sales, finance and capital management
3. forecasting of demand for a given product or service
The information derived through these activities allows for optimizing future actions and for modifying organizational aspects of the company's performance.

Strategic level decisions. Strategic level decisions set objectives and ensure that those objectives are realized. Business intelligence systems provide information in support of strategic decisions related to the development of future results based on historical results, the profitability of offers (made or received) and the effectiveness of distribution channels. The literature asserts that strategic decisions use business information systems to create forecasts based on historical data, combining it with current performance and then to estimate how
  • 31. conditions will play out in the future. Based on the literature, information provided by business intelligence systems informs these kinds of decisions made at the strategic level:
1. whether to enter new markets
2. the possibility of changing a company's orientation from product-centric to customer-centric
3. the launch of a new product (Watson & Wixom, 2007, p.97)
4. what objectives to set and to follow through on the realization of such established objectives (Olszak & Ziemba, 2007)