KIT601
DATA ANALYTICS
PRAVEEN SACHAN
UNIT I
• Introduction to Data Analytics:
• Sources and nature of data,
• classification of data (structured, semi-structured, unstructured),
• characteristics of data,
• introduction to Big Data platform,
• need of data analytics,
• evolution of analytic scalability,
• analytic process and tools,
• analysis vs reporting,
• modern data analytic tools,
• applications of data analytics.
• Data Analytics Lifecycle:
• Need,
• key roles for successful analytic projects,
• various phases of data analytics lifecycle –
– discovery,
– data preparation,
– model planning,
– model building,
– communicating results,
– operationalization.
1. INTRODUCTION TO DATA
ANALYTICS:
Data Analytics refers to the techniques used to analyze data to enhance productivity
and business gain.
Data is extracted from various sources and is cleaned and categorized to analyze
various behavioral patterns. The techniques and the tools used vary according to the
organization or individual.
Why is Data Analytics important?
• Data Analytics has a key role in improving your business as it is used to gather hidden
insights, generate reports, perform market analysis, and improve business
requirements.
With today’s technology, companies are able to collect tremendous amounts of
data with relative ease. Indeed, many companies now have more data than they
can handle. However, the data are usually meaningless until they are analyzed for
trends, patterns, relationships, and other useful information.
WHAT IS THE ROLE OF DATA
ANALYTICS?
• Gather Hidden Insights – Hidden insights are extracted from the data and
analyzed with respect to business requirements.
• Generate Reports – Reports are generated from the data and passed on to
the relevant teams and individuals, who act on them to grow the
business.
• Perform Market Analysis – Market analysis can be performed to
understand the strengths and weaknesses of competitors.
• Improve Business Requirements – Analyzing data helps a business
better meet customer requirements and improve the customer experience.
WHAT ARE THE TOOLS USED IN DATA
ANALYTICS?
• With the increasing demand for Data Analytics in the market, many tools with
various functionalities have emerged for this purpose. Whether open-source or
built for ease of use, the top tools in the data analytics market are as follows.
• R programming
• Python
• Tableau Public
• QlikView
• SAS
• Microsoft Excel
• RapidMiner
• KNIME (Konstanz Information Miner)
• OpenRefine
• Apache Spark
1.1 SOURCES AND NATURE OF DATA
Different Sources of Data for Data Analysis
• Data collection is the process of acquiring, collecting, extracting, and storing the
voluminous amount of data which may be in the structured or unstructured form like text,
video, audio, XML files, records, or other image files used in later stages of data analysis.
• In the process of big data analysis, “Data collection” is the initial step before starting to
analyze the patterns or useful information in data. The data which is to be analyzed must
be collected from different valid sources.
• Collected data is known as raw data. Raw data is not useful on its own, but once
it is cleaned and analyzed it becomes information, and the information obtained
in this way is known as “knowledge”. Knowledge can take many forms, such as business
knowledge, sales of enterprise products, or disease treatment. The main goal of data
collection is to collect information-rich data.
SOURCES AND NATURE OF DATA
• Data collection starts with asking questions such as what type of data is
to be collected and what the source of collection is. Most collected data
falls into two types:
• “Qualitative data”, which is non-numerical data such as words and
sentences, and mostly focuses on the behavior and actions of a group.
• “Quantitative data”, which is in numerical form and can be calculated using
different scientific tools and sampling data.
The actual data is then further divided mainly into two types known as:
• Primary data
• Secondary data
METHODS OF COLLECTING PRIMARY
DATA:
1. Interview method:
2. Survey method:
3. Observation method:
4. Experimental method:
• The experimental method is the process of collecting data through performing
experiments, research, and investigation. The most frequently used experiment
methods are CRD, RBD, LSD, FD.
– CRD- Completely Randomized design
– RBD- Randomized Block Design
– LSD – Latin Square Design
– FD- Factorial design
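As an illustration of the first design, a Completely Randomized Design assigns treatments to experimental units purely at random. The sketch below is a minimal example, assuming equal replication per treatment; the function name, plot labels, and treatments A/B/C are hypothetical and not part of the original slides.

```python
import random


def completely_randomized_design(units, treatments, seed=0):
    """Randomly assign treatments to units with equal replication (CRD)."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    replications = len(units) // len(treatments)
    # One label per unit: each treatment repeated the same number of times.
    labels = [t for t in treatments for _ in range(replications)]
    # A seeded generator keeps the (hypothetical) layout reproducible.
    random.Random(seed).shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical experiment: 12 plots, 3 treatments, 4 plots per treatment.
plots = [f"plot{i}" for i in range(1, 13)]
layout = completely_randomized_design(plots, ["A", "B", "C"])
```

In a Randomized Block Design the same shuffle would be applied separately inside each block rather than across all units at once.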
SECONDARY DATA:
Secondary data is data that has already been collected and is reused
for some valid purpose. This type of data is derived from previously recorded
primary data, and it has two types of sources: internal sources
and external sources.
• Internal source:
• External source:
• Other sources:
– Sensors data:
– Satellites data:
– Web traffic:
1.2 CLASSIFICATION OF DATA
• Structured data –
Structured data is data whose elements are addressable for effective analysis. It has
been organized into a formatted repository that is typically a
database. Example: Relational data.
• Semi-Structured data –
Semi-structured data is information that does not reside in a relational database but
that has some organizational properties that make it easier to analyze. Example:
XML data.
• Unstructured data –
Unstructured data is data that is not organized in a predefined manner or does not
have a predefined data model, so it is not a good fit for a mainstream relational
database. Example: Word, PDF, Text, Media, logs.
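The three classes can be contrasted with a small sketch using only the Python standard library. The sample records (customer names, cities, the note text) are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, so every value is addressable by row and field.
structured = io.StringIO("id,name,city\n1,Asha,Lucknow\n2,Ravi,Kanpur\n")
table = list(csv.DictReader(structured))

# Semi-structured: tags describe the data, but records may differ in shape
# (the second customer has no <phone> element).
xml_doc = (
    "<customers>"
    "<customer id='1'><name>Asha</name><phone>555-0101</phone></customer>"
    "<customer id='2'><name>Ravi</name></customer>"
    "</customers>"
)
root = ET.fromstring(xml_doc)
customers = [
    {"id": c.get("id"), **{child.tag: child.text for child in c}}
    for c in root
]

# Unstructured: free text with no data model; any structure must be inferred.
note = "Ravi called on Monday to complain about a late delivery."
words = note.lower().rstrip(".").split()
```

The structured rows can be queried directly by column name, the XML records need a check for missing fields, and the free-text note has to be tokenized before anything can be counted or searched.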
1.3 CHARACTERISTICS OF DATA
The seven characteristics that define data quality are:
– Accuracy and Precision
– Legitimacy and Validity
– Reliability and Consistency
– Timeliness and Relevance
– Completeness and Comprehensiveness
– Availability and Accessibility
– Granularity and Uniqueness
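Some of these characteristics can be measured directly. The sketch below is a minimal example, assuming three simple metrics (completeness, uniqueness, validity) defined as ratios between 0 and 1; the records, field names, and the 0 to 120 age range are hypothetical.

```python
# Hypothetical customer records with a missing email, a duplicate id,
# and an impossible age.
records = [
    {"id": 1, "email": "asha@example.com", "age": 29},
    {"id": 2, "email": None, "age": 41},
    {"id": 2, "email": "ravi@example.com", "age": -5},
]


def completeness(rows, field):
    """Share of rows where the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, field, is_valid):
    """Share of rows whose field passes a caller-supplied rule."""
    return sum(is_valid(r[field]) for r in rows) / len(rows)


report = {
    "email_completeness": completeness(records, "email"),
    "id_uniqueness": uniqueness(records, "id"),
    "age_validity": validity(records, "age", lambda a: 0 <= a <= 120),
}
```

Each metric here comes out at two thirds, because exactly one of the three records fails each check.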
1.4 INTRODUCTION TO BIG DATA
PLATFORM
• A big data platform generally consists of servers, databases,
business intelligence and other management utilities and
tools.
• It also supports custom development, querying and integration
with other systems.
• The primary benefit of a big data platform is to reduce the
complexity of multiple vendors and solutions into one cohesive
solution.
• Big data platforms are also delivered through the cloud, where the
provider offers all-inclusive big data solutions and services.
ESSENTIAL COMPONENTS OF BIG DATA
PLATFORM
There are many essential components which are given as follows:
• Data Ingestion, Management, ETL, and Warehouse – It provides
these resources for effective data management and effective data
warehousing, and this manages data as a valuable resource.
• Stream Computing – Helps compute the streaming data that is
used for real-time analytics.
• Analytics/ Machine Learning – Features for advanced analytics
and machine learning.
• Integration – It provides its user with features like integrating big
data from any source with ease.
• Data Governance – It also provides comprehensive security, data
governance, and solutions to protect the data.
• Provides Accurate Data – It comes with analytic tools that help exclude
inaccurate or unanalyzed data. This helps the business make the
right decision by relying on accurate information.
• Scalability – It helps the application scale to analyze ever-growing data, and it
scales to keep analysis efficient. It offers scalable storage capacity.
• Price Optimization – Data analytics with the help of a big data platform provides
insight for B2C and B2B enterprises, which helps the business optimize the prices
it charges accordingly.
• Reduced Latency – With its combination of warehouse, analytics tools, and efficient data
transformation, it helps reduce data latency and provide high throughput.
1.5 NEED OF DATA ANALYTICS
• “Data Analytics refers to qualitative and quantitative techniques and processes
used to enhance productivity and business gain.”
• Data is extracted, identified and categorized in order to identify and analyze behavioral
data and patterns; the techniques used can vary according to a particular business’s
need or requirement.
• Data analytics is the broader term, with analysis as one of its components; analytics
covers the concepts and methods used to carry out the analysis.
WHAT IS DATA ANALYTICS?
• The term data analytics refers to the process of examining datasets to draw conclusions
about the information they contain. Data analytic techniques enable you to take raw data
and uncover patterns to extract valuable insights from it.
• Today, many data analytics techniques use specialized systems and software that
integrate machine learning algorithms, automation and other capabilities.
• Data Scientists and Analysts use data analytics techniques in their research, and
businesses also use it to inform their decisions. Data analysis can help companies better
understand their customers, evaluate their ad campaigns, personalize content, create
content strategies and develop products. Ultimately, businesses can use data analytics to
boost business performance and improve their bottom line.
• For businesses, the data they use may include historical data or new information
they collect for a particular initiative. They may also collect it first-hand from their
customers and site visitors or purchase it from other organizations.
• Data a company collects about its own customers is called first-party data, data a
company obtains from a known organization that collected it is called second-party
data, and aggregated data a company buys from a marketplace is called third-party
data. The data a company uses may include information about an audience’s
demographics, their interests, behaviors and more.
4 WAYS TO USE DATA ANALYTICS
1. Improved Decision Making
2. More Effective Marketing
3. Better Customer Service
4. More Efficient Operations
DATA ANALYTICS TECHNOLOGY
Machine learning:
• Artificial intelligence (AI) is the field of developing and using computer systems that can
simulate human intelligence to complete tasks.
• Machine learning (ML) is a subset of AI that is significant for data analytics and involves
algorithms that can learn on their own.
• ML enables applications to take in data and analyze it to predict outcomes without
someone explicitly programming the system to reach that conclusion.
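To make "predicting outcomes without explicit programming" concrete, here is a minimal sketch of one of the simplest learning methods, a 1-nearest-neighbor classifier. It is only one illustrative technique, not the method the slides prescribe; the training points and the "low"/"high" labels are made up.

```python
import math


def nearest_neighbor_predict(training, query):
    """Return the label of the training point closest to the query.

    No rule for "low" or "high" is written by hand; the prediction
    comes entirely from the labeled examples.
    """
    closest = min(training, key=lambda item: math.dist(item[0], query))
    return closest[1]


# Hypothetical labeled examples: (feature vector, label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

prediction_a = nearest_neighbor_predict(training, (2.0, 1.5))
prediction_b = nearest_neighbor_predict(training, (8.5, 9.5))
```

Adding more labeled examples changes the predictions without any change to the code, which is the sense in which the system "learns" from data.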
Data management:
• Before you can analyze data, you need to have procedures in place
for managing the flow of data in and out of your systems and keeping
your data organized.
• You also need to ensure that your data is high-quality and that you
collect it in a central data management platform (DMP) where it’s
available for use when needed. Establishing a data management
program can help ensure that your organization is on the same page
regarding how to organize and handle data.
Data mining:
• The term data mining refers to the process of sorting through large amounts of
data to identify patterns and discover relationships between data points.
• It enables you to sift through large datasets and figure out what’s relevant.
• You can then use this information to conduct analyses and inform your
decisions.
• Today’s data mining technologies allow you to complete these tasks
exceptionally quickly.
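One classic data mining task is finding items that frequently occur together. The sketch below counts item pairs across a handful of invented shopping baskets and keeps the pairs above an assumed minimum support of 50%; it is a brute-force illustration, not a production algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# Count how many baskets contain each pair of items.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of baskets containing the pair.
min_support = 0.5  # assumed threshold
frequent_pairs = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}
```

With this data only bread and milk appear together often enough to pass the threshold, which is the kind of relationship between data points the slide refers to.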
Predictive analytics:
• Predictive analytics technology helps you analyze historical data to
predict future outcomes and the likelihood of various outcomes
occurring.
• These technologies typically use statistical algorithms and machine
learning.
• More accurate predictions mean businesses can make better
decisions moving forward and position themselves to succeed. They
allow businesses to anticipate their customers’ needs and concerns,
predict future trends and stay ahead of the competition.
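A minimal sketch of predicting from historical data is a least-squares trend line. The monthly sales figures below are invented and deliberately linear so the arithmetic is easy to follow; real data would be noisier and would call for a proper statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales for months 1..6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept
```

For this made-up series the fitted slope is 10 units per month, so the forecast for month 7 continues the trend to 160.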
1.6 EVOLUTION OF ANALYTIC
SCALABILITY
Analytics 1.0 → Need for Business Intelligence:
• This was the rise of the data warehouse, where customer (business) and production
process (transaction) data were centralized into one huge repository such as an eCDW
(Enterprise Consolidated Data Warehouse). Real progress was made in gaining
an objective, deep understanding of important business phenomena, thereby giving
managers the fact-based comprehension to go beyond intuition when making decisions.
• The data in the eCDW was captured, transformed and queried using ETL and BI
tools. The analytics used during this phase were mainly
Descriptive (what happened) and Diagnostic (why something happened).
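The capture, transform and query cycle can be sketched with an in-memory SQLite database standing in for the warehouse. The table name, columns, and raw order rows are hypothetical; a real eCDW would use dedicated ETL and BI tooling.

```python
import sqlite3

# Extract: raw rows as they might arrive from a source system
# (inconsistent casing, stray spaces, amounts as text).
raw_orders = [
    ("2024-01-05", " north ", "120.50"),
    ("2024-01-06", "South", "80"),
    ("2024-01-06", "NORTH", "39.50"),
]


def transform(row):
    """Normalize the region name and convert the amount to a number."""
    order_date, region, amount = row
    return order_date, region.strip().lower(), float(amount)


# Load: insert the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [transform(r) for r in raw_orders],
)

# Descriptive query: what happened, per region?
totals = dict(
    warehouse.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
```

Without the transform step the two spellings of "north" would be reported as separate regions, which is why cleaning comes before querying.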
Analytics 2.0 → Big Data:
Analytics 3.0 → Data Enriched Offerings:
• The pioneering big data firms began investing in analytics to support customer-facing
products, services, and features.
• They attracted viewers to their websites through better search algorithms,
recommendations , suggestions for products to buy, and highly targeted ads, all driven
by analytics rooted in enormous amounts of data.
Analytics 4.0 → Automated Capabilities:
• There have always been four types of analytics:
– Descriptive, which reports on the past;
– Diagnostic, which uses the data of the past to study the present;
– Predictive, which uses insights based on past data to predict the future; and
– Prescriptive, which uses models to specify optimal behaviors and actions.
• Examples of automated capabilities: Neural Machine Translation, Smart Reply, chatbots, and meeting assistants.
Analytics 5.0 → Future of Analytics and What’s Next?
• Instead of pondering “What tasks currently performed by humans will soon
be replaced by machines?”, we could reframe the threat of automation as an opportunity
for augmentation: combining smart humans and smart machines to achieve an
overall better result.
• Most organizations that are exploring “cognitive” technologies — smart machines
that automate aspects of decision-making processes — are just putting a toe in the
water.
• There will be no more manual interventions necessary with just an AI-powered
system to steer your personal day-to-day activities.
1.7 ANALYTIC PROCESS AND TOOLS
• Making Smarter and More Efficient Organization
• Optimize Business Operations by Analyzing Customer Behavior
• Cost Reduction
• New Generation Products
STAGES IN DATA ANALYTICS
• These are the following stages involved in the Data Analytics process:
TYPES OF DATA ANALYTICS
• Descriptive Analytics: It uses data aggregation and data mining to provide insight
into the past and answer: “What has happened?” Descriptive analytics does
exactly what the name implies: it “describes” or summarizes raw data and makes it
interpretable by humans.
• Predictive Analytics: It uses statistical models and forecasting techniques to
understand the future and answer: “What could happen?” Predictive analytics
provides companies with actionable insights based on data. It provides estimates
about the likelihood of a future outcome.
• Prescriptive Analytics: It uses optimization and simulation algorithms to advise on
possible outcomes and answer: “What should we do?” It allows users to
“prescribe” a number of different possible actions and guides them towards a solution.
In a nutshell, this type of analytics is all about providing advice.
• Diagnostic Analytics: It is used to determine why something happened in the past. It
is characterized by techniques such as drill-down, data discovery, data mining and
correlations. Diagnostic analytics takes a deeper look at data to understand the root
causes of the events.
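The difference between describing what happened and diagnosing why can be shown with a small drill-down. The sales figures, months, and regions below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"month": "Jan", "region": "north", "revenue": 100},
    {"month": "Jan", "region": "south", "revenue": 100},
    {"month": "Feb", "region": "north", "revenue": 105},
    {"month": "Feb", "region": "south", "revenue": 60},
]


def total_by(rows, key):
    """Sum revenue grouped by the given field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["revenue"]
    return dict(totals)


# Descriptive: what happened? Revenue fell from January to February.
monthly = total_by(sales, "month")

# Diagnostic: why? Drill down into each month by region and compare.
jan = total_by([r for r in sales if r["month"] == "Jan"], "region")
feb = total_by([r for r in sales if r["month"] == "Feb"], "region")
change = {region: feb[region] - jan[region] for region in jan}
biggest_drop = min(change, key=change.get)
```

The monthly totals only show that revenue dropped; the regional drill-down locates the drop in the south region, which is where a root-cause investigation would start.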
BIG DATA TOOLS
• Some of the tools used for Data Analytics are:
• Hadoop,
• Pig,
• Apache HBase,
• Apache Spark,
• Talend,
• Splunk,
• Apache Hive,
• Kafka.
1.8 ANALYSIS VS. REPORTING
Reporting and analysis are very different in terms of their purpose, tasks, outputs, delivery,
and value.
• Reporting: The process of organizing data into informational summaries in order to monitor
how different areas of a business are performing
• Analysis: The process of exploring data and reports in order to extract meaningful insights,
which can be used to better understand and improve business performance.
• Reporting translates raw data into information.
• Analysis transforms data and information into insights.
1.9 MODERN DATA ANALYTIC TOOLS
• Data Analysis is the technique by which raw data is transformed into useful
statistics, insights, and explanations to make Data-driven business decisions.
Data Analysis has become the cornerstone of modern business operations.
• It is a daunting task to choose the best Data analytics tool since no tool fits every need.
Let’s look at the key factors for choosing between the Data analytics tools and then
explore some of the most popular Data analytics tools available in the market today.
1) WHAT ARE DATA ANALYST TOOLS?
• The term ‘Data analytics tools’ is used to classify software and applications used by
Data Analysts to create and execute analytic processes that help businesses make
smarter, more informed business decisions while minimizing cost and boosting profits.
2) HOW TO CHOOSE A DATA ANALYST
TOOL?
• How do you find one amongst several Data analytics tools that’s a good fit for
your company?
• Start by considering your company’s business requirements and finding out who
will be using the Data analytics tools. Will it be used by seasoned Data Analysts and
Data Scientists or non-technical users who need an intuitive interface?
• Some Data analytics tools provide an immersive code-writing experience, generally
with SQL, while others focus on point-and-click analysis that is better suited to
beginners. The Data analytics software should also offer support for visualizations
relevant to your business goals.
• Consider the ability of Data analytics software to model data. Some tools support a
semantic layer or can perform data modeling themselves. If you
choose a tool that does not, you’ll have to use SQL or a tool like the data
build tool (dbt) to model your data before analysis.
• Finally, take price and licensing into consideration. Some Data analytics tools charge
license or subscription fees, while others are free. The most
expensive Data analytics tools are not always the most comprehensive, and there
are many robust and free Data analytics tools available in the market that shouldn’t be
overlooked.
25 MOST PROMINENT DATA ANALYTICS
TOOLS NEEDED TO BE AN EXPERT DATA
ANALYST
1. R
• R is now one of the most popular analytics tools in the industry. It has surpassed SAS
in usage and is now the Data analytics tool of choice, even for companies that can
easily afford SAS. Over the years, R has become a lot more robust. It handles large
data sets much better than it did a decade ago. It has also become a
lot more versatile.
• 1800 new packages were introduced in R between April 2015 and April 2016, bringing the total
number of R packages to over 8000 at that time. There are some concerns about the sheer
number of packages, but this has certainly added a lot to R’s capabilities. R also
integrates very well with many Big Data platforms, which has contributed to its
success.
2. Python
• Python has been one of the favorite languages of programmers since its inception. The
main reason for its fame is the fact that it’s an easy-to-learn language that is also quite
fast. It developed into one of the most powerful Data analytics tools with the
development of analytical and statistical libraries such as NumPy and SciPy. Today, it
offers comprehensive coverage of statistical and mathematical functions.
• Increasingly, we are seeing programmers and other tech professionals moving into analytics.
Most of them are already familiar with Python, and therefore it has become a
Data analytics tool of choice for many data scientists.
3. Apache Spark
• Spark is an open-source processing engine built with a focus on analytics,
especially on unstructured data or huge volumes of data. Spark has become one of the
most popular Data analytics tools in the last couple of years, for
various reasons, easy integration with the Hadoop ecosystem being one of them.
Spark has its own machine learning library (MLlib), which makes it well suited to analytics as well.
4. Apache Storm
• Storm is the Big Data tool of choice for moving data, that is, data that arrives as a
continuous stream. Spark, by contrast, was designed primarily for batch processing of
data at rest, although it also provides streaming APIs. Storm is ideal for real-time analytics or
stream processing.
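The idea of processing data as it arrives, rather than after it has all been stored, can be sketched in plain Python with generators. This is not the Storm API; the function names and sensor readings are hypothetical, and a real stream would be unbounded.

```python
from collections import deque


def rolling_average(stream, window=3):
    """Yield the average of the most recent `window` values as each arrives."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


def sensor_readings():
    """Stand-in for a continuous source such as a message queue."""
    for reading in [10, 20, 30, 40]:
        yield reading


# Each result is available as soon as its reading arrives;
# list() is used here only to collect the output for inspection.
averages = list(rolling_average(sensor_readings()))
```

Only the last few values are kept in memory, which is what lets this style of computation keep up with a stream that never ends.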
5. PIG and HIVE
• Pig and Hive are integral Data analytics tools in the Hadoop ecosystem that reduce the
complexity of writing MapReduce queries. Both these languages are like SQL (Hive
more so than Pig). Most companies that work with Big Data and leverage the Hadoop
platform use Pig and/or Hive.
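To see what Pig and Hive save you from writing, here is a minimal single-machine sketch of the MapReduce pattern for a word count. It imitates the map, shuffle and reduce phases in plain Python and is not Hadoop code; the input lines are invented.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data tools", "big data platform", "data analytics"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# In a SQL-like language such as Hive's, the same result is roughly one
# statement of the form: SELECT word, COUNT(*) FROM words GROUP BY word
# (the table and column names here are hypothetical).
```

The three hand-written phases collapse into a single declarative query, which is the reduction in complexity the slide describes.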
6. SAS
• SAS continues to be one of the widely used Data analytics tools in the industry. Some
flexibility on pricing from the SAS Institute has helped its cause. SAS continues to be a
robust, versatile and easy to learn tool. SAS has added tons of new modules. Some of the
specialized modules that have been added in the recent past are – SAS analytics for IoT,
SAS Anti-money Laundering, and SAS Analytics Pro for Midsize Business.
7. Tableau
• Tableau is among the most easy-to-learn Data analytics tools that perform an effective job
of slicing and dicing your data and creating great visualizations and dashboards. Tableau
can create better visualizations than Excel and can most definitely handle much more data
than Excel can. If you want interactivity in your plots, then Tableau is surely the way to go.
8. Excel
• Excel is, of course, the most widely used Data analytics software in the world. Whether
you are an expert in R or Tableau, you will still use Excel for the grunt work.
Non-analytics professionals will usually not have access to tools like SAS or R on their
systems. But everyone has Excel. Excel becomes vital when the analytics team
interfaces with the business team.
9. QlikView
• QlikView and Tableau are essentially vying for the top spot amongst the data
visualization giants. QlikView is said to be slightly faster than Tableau and gives
experienced users a bit more flexibility. Tableau has a more intuitive GUI and is easier
to learn.
10. Splunk
• Splunk is more popular than some better-known names such as Cloudera
and Hortonworks. It started as a ‘Google for log files’, meaning its primary use was to
process machine log data. It has now become much more than that. Splunk has great
visualization options, and a web interface makes it easy to use.
11. Microsoft Power BI
• Microsoft Power BI is a top business intelligence platform that offers support for dozens of
data sources. This Data analytics software allows users to create reports, visualizations and
dashboards and share them. For quick delivery, users may combine a group of dashboards
and reports into a Power BI app. Power BI also helps users create and apply
automated machine learning models through Azure Machine Learning.
12. SAP BusinessObjects
• SAP BusinessObjects provides a suite of Data analytics tools for data discovery, analysis,
and reporting. The tools are aimed at less technical users but can also carry out
complex analyses. BusinessObjects integrates with Microsoft Office products, enabling
Business Analysts to move easily back and forth between applications such as Excel and
BusinessObjects reports. It also enables self-service predictive analytics.
13. Sisense
• Sisense is Data analytics software that helps both technical developers and
Business Analysts process and visualize all of their business data. It offers a wide variety
of drag-and-drop tools and interactive dashboards for collaboration. The Sisense
platform’s unique feature is its custom In-Chip technology, which optimizes computation to
use CPU caching instead of slower RAM. This can lead to 10 to 100 times faster
computation for certain workflows.
14. TIBCO Spotfire
• TIBCO Spotfire is a Data analytics software that provides natural language search and
AI-powered data insights. This is a comprehensive platform for viewing reports for both
mobile and desktop applications. Spotfire also offers point-and-click tools for predictive
analytics models.
15. ThoughtSpot
• ThoughtSpot is Data analytics software that allows users to explore Data from
various sources through reports and natural language searches. SpotIQ, its
AI-powered system, automatically seeks insights to help users discover trends they didn’t
know to search for. It also enables users to automatically link tables from various Data
sources to break down Data silos.
16. Google Data Studio
• Google Data Studio is one of the popular free Data analytics tools for dashboarding and data
visualization that automatically integrates with most other Google applications, such as Google
Analytics, Google Ads, and Google BigQuery. Data Studio is perfect for those who need to evaluate
their Google data due to its convergence with other Google services. For example, marketers could
create dashboards to help analyze consumer conversion and retention for their Google Advertising
and Analytics results. Data Studio can run with Data from several other sources as long as the Data is
replicated first to BigQuery using a Data pipeline such as Stitch.
17. Grafana
• Grafana is another free, open-source Data analytics tool for monitoring and observing metrics
across diverse databases and applications. It offers a real-time view into external systems and
alerts users when incidents occur. Grafana is widely used by software and DevOps engineers to
monitor their applications.
18. Redash
• Redash is lightweight and cost-effective Data analytics software for querying data sources and
building visualizations. The code is open source, and for organizations that want to get started quickly, an
inexpensive hosted version is available. At the heart of Redash is a query editor, which offers a simple interface
for managing queries, schemas and integrations. Query results are cached in Redash, and users
can schedule automatic updates.
19. Jupyter Notebook
• Jupyter Notebook is a robust, free, open-source, web-based Data analytics tool that runs
in a browser after installation via the Anaconda platform or Python’s package
manager, pip. It enables developers to create reports that combine live code, data and visualizations. This Data
analytics software supports more than 40 programming languages. Formerly known as IPython
Notebook, Jupyter Notebook was initially developed using Python. It enables developers to make use
of Python’s wide variety of analytics and visualization packages. The tool also has a large group of users
who work in other languages.
20. IBM Cognos
• IBM Cognos is Data analytics software for business intelligence with built-in AI tools that
uncover insights hidden in the data and explain them in plain English. It has automated Data preparation
software to automatically clean and aggregate Data sources, enabling the fast integration
and analysis of Data sources.
21. Mode
• Mode is Data analytics software aimed at giving Data Scientists an easy and iterative
environment. It offers an interactive SQL editor and notebook environment for analysis and
visualization, and collaboration tools for novice users. Mode has a unique Helix Data
engine that streams and stores Data from external databases to allow swift and interactive
analysis. The engine supports up to 10 GB of data in memory.
22. KNIME
• KNIME is the abbreviation for the Konstanz Information Miner and is free, open-source
Data analytics software that supports Data integration, processing, visualization, and
reporting. It integrates Machine Learning and Data mining libraries with minimal or no
programming requirements. KNIME is excellent for Data Scientists who do not
have strong programming skills and need to integrate and process Data for building
Machine Learning and other statistical models. Its graphical interface facilitates
point-and-click analysis and modeling.
23. Looker
• Looker is a cloud-based business intelligence and Data analytics tool. It
automatically generates a Data model by scanning Data schemas and connecting tables and Data
sources. Through an integrated code editor, it allows Data engineers to modify the generated
models.
24. RapidMiner
• RapidMiner is Data analytics software that provides all the technology users need, from
integration and cleaning to Data transformation, before they run predictive analytics and build
statistical models. Nearly all of this is done by users through a simple graphical interface.
RapidMiner can also be extended using R and Python and various third-party plugins
available on the organization’s marketplace.
25. Oracle Analytics Cloud
• Oracle Analytics Cloud is another suite of Cloud-based business intelligence and Data
analytics tools. It focuses on helping big corporations to transform their legacy systems into
a digital cloud platform. Users leverage its wide range of analytical features, from basic
visualizations to Machine Learning algorithms for deriving Data insights.
1.10 APPLICATIONS OF DATA
ANALYTICS
• APPLICATION OF ANALYTICS IN DIFFERENT FIELDS
• Data analytics is used in almost every field. From
online shopping to high-tech industries to government, organizations use data analytics to
help with decision making, budgeting, planning, etc. Data analytics is employed in
areas such as:
1. Transportation
2. Logistics and Delivery
3. Web Search or Internet Web Results
4. Manufacturing
5. Security
6. Education
7. Healthcare
8. Military
1.11 DATA ANALYTICS LIFECYCLE
• Data Analytics Lifecycle defines the roadmap of how information is generated,
collected, processed, used, and analyzed to achieve business goals.
• It offers a systematic way to manage data for converting it into information that can be
used to fulfill organizational and project goals.
• The process provides the direction and methods to extract information from the data
and proceed in the right direction to accomplish business goals.
1.12 NEED OF DATA ANALYTICS
LIFECYCLE
• The Data analytic lifecycle is designed for Big Data problems and data science
projects.
• The cycle is iterative, to reflect how real projects work. To address the distinct requirements of
performing analysis on Big Data, a step-by-step methodology is needed to organize
the activities and tasks involved in acquiring, processing, analyzing, and repurposing
data.
1.13 KEY ROLES FOR SUCCESSFUL
ANALYTIC PROJECTS
• Key Roles for a Data analytics project:
• Business User: The business user is someone who understands the domain area of the
project and usually benefits from the results.
• This user advises and consults the project team on the value of
the results obtained and how the outputs will be operationalized.
• A business manager, line manager, or deep subject matter expert in the project
domain typically fulfills this role.
• Project Sponsor:
– The Project Sponsor is responsible for initiating the project. The sponsor
provides the requirements for the project and presents the core business problem.
– The sponsor generally provides the funding and gauges the degree of value from the final outputs of
the project team.
– This person sets the priorities for the project and frames the desired outputs.
KEY ROLES FOR A DATA
ANALYTICS PROJECT
• Project Manager: This person ensures that the key milestones and objectives of the
project are met on time and at the expected quality.
• Business Intelligence Analyst: The Business Intelligence Analyst provides business
domain expertise based on a detailed and deep understanding of the data, key
performance indicators (KPIs), key metrics, and business intelligence from a reporting
point of view.
• This person generally creates dashboards and reports and knows about the data feeds
and sources.
KEY ROLES FOR A DATA
ANALYTICS PROJECT
• Database Administrator (DBA): The DBA provisions and configures the database
environment to support the analytics needs of the team working on a project.
• Responsibilities may include providing access to key databases or tables and
making sure that the appropriate security levels are in place for the data
repositories.
KEY ROLES FOR A DATA
ANALYTICS PROJECT
• Data Engineer: The data engineer has deep technical skills to assist with tuning SQL
queries for data management and data extraction, and supports data ingestion
into the analytic sandbox.
• The data engineer works closely with the data scientist to help shape data in the
right ways for analysis.
KEY ROLES FOR A DATA
ANALYTICS PROJECT
• Data Scientist:
– The data scientist provides subject matter expertise in analytical techniques and data
modeling, and applies valid analytical techniques to the given business problems.
– The data scientist ensures that the overall analytical objectives are met.
– Data scientists design and execute analytical methods and approaches with the data
available to the project.
1.14 VARIOUS PHASES OF DATA ANALYTICS LIFECYCLE
• Data discovery,
• Data preparation,
• Data model planning,
• Data model building,
• Communicating results, and
• Operationalization.
PHASE 1: DATA DISCOVERY AND
FORMATION
• During this phase, the team learns about the business domain and checks whether
the business unit or organization has worked on similar projects, so that it can draw
on any lessons learned.
• In this phase, the team also evaluates the available technology, people, data, and time.
• For example, when dealing with a small dataset, the team can use Excel.
PHASE 2: DATA PREPARATION AND
PROCESSING
• In this phase, the experts’ focus shifts from business requirements to information
requirements.
• One of the essential aspects of this phase is ensuring data availability for processing.
• The stage encompasses the collection, processing, and cleansing of the
accumulated data.
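The cleansing step mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: the record fields (`customer_id`, `city`, `amount`), the sample values, and the rule of rejecting rows with a missing or non-numeric amount are assumptions made for the example, not part of the lifecycle itself.

```python
# Hypothetical raw records; the field names and values are illustrative only.
raw_records = [
    {"customer_id": "101", "city": " Delhi ", "amount": "250.0"},
    {"customer_id": "102", "city": "mumbai", "amount": ""},        # missing amount
    {"customer_id": "101", "city": " Delhi ", "amount": "250.0"},  # exact duplicate
    {"customer_id": "103", "city": "Pune", "amount": "abc"},       # non-numeric amount
    {"customer_id": "104", "city": "PUNE", "amount": "80"},
]


def clean(records):
    """Return (cleaned records, number of rejected rows)."""
    seen = set()
    cleaned = []
    rejected = 0
    for rec in records:
        # Assumed rule: rows whose amount cannot be parsed are rejected.
        try:
            amount = float(rec["amount"])
        except ValueError:
            rejected += 1
            continue
        # Normalise types and formatting so equal rows compare equal.
        row = (int(rec["customer_id"]), rec["city"].strip().title(), amount)
        if row in seen:  # drop duplicates
            rejected += 1
            continue
        seen.add(row)
        cleaned.append({"customer_id": row[0], "city": row[1], "amount": row[2]})
    return cleaned, rejected


cleaned_records, rejected_count = clean(raw_records)
print(cleaned_records)
print("rejected rows:", rejected_count)
```

In a real project the rejection rules would come from the information requirements agreed in this phase, and rejected rows would usually be logged rather than silently dropped.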
PHASE 3: DESIGN A MODEL
• This phase needs the availability of an analytic sandbox for the team to work with
data and perform analytics throughout the project duration. The team can load data in
several ways.
– Extract, Transform, Load (ETL) – It transforms the data based on a set of business rules
before loading it into the sandbox.
– Extract, Load, Transform (ELT) – It loads the data into the sandbox and then transforms it
based on a set of business rules.
– Extract, Transform, Load, Transform (ETLT) – It combines ETL and ELT, giving two
levels of transformation: one before loading and one after.
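The difference between ETL and ELT is only the order of the load and transform steps, which a short Python sketch can make concrete. Everything here is hypothetical: the sandbox is modelled as a plain dictionary, `extract()` stands in for reading from a source system, and the business rule in `transform()` is invented for the example.

```python
def extract():
    # Stand-in for reading rows from a source system (hypothetical data).
    return [
        {"region": "north", "sales": "120.5"},
        {"region": "south", "sales": "80"},
    ]


def transform(row):
    # Assumed business rule: upper-case the region, parse sales as a number.
    return {"region": row["region"].upper(), "sales": float(row["sales"])}


def etl(sandbox):
    """Extract, transform, then load: only transformed data reaches the sandbox."""
    rows = extract()
    sandbox["sales"] = [transform(r) for r in rows]


def elt(sandbox):
    """Extract, load raw data, then transform inside the sandbox."""
    sandbox["sales_raw"] = extract()
    sandbox["sales"] = [transform(r) for r in sandbox["sales_raw"]]


etl_sandbox, elt_sandbox = {}, {}
etl(etl_sandbox)
elt(elt_sandbox)
print(etl_sandbox)
print(elt_sandbox)
```

Both routes produce the same transformed table, but the ELT sandbox also keeps the raw rows, so the team can revisit them if the business rules change. ETLT would apply one transformation before loading and another afterwards.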
PHASE 4: MODEL BUILDING
• The team uses various statistical modeling methods, such as
• regression techniques,
• decision trees,
• random forest modeling, and
• neural networks.
• The team then performs a trial run to determine whether the model fits the
datasets.
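A trial run of this kind can be sketched with the simplest of the listed methods, a least-squares regression line, fitted on one part of the data and checked on a held-out part. The numbers below are made up for illustration, and the choice of mean absolute error as the fit measure is an assumption; a real project would pick the model and metric during model planning.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Made-up data: the model is built on the training rows and
# trial-run on held-out rows it has not seen.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.1, 4.9, 7.2, 9.0, 10.8]
test_x = [6.0, 7.0]
test_y = [13.1, 14.9]

slope, intercept = fit_line(train_x, train_y)
holdout_error = mean_abs_error(slope, intercept, test_x, test_y)
print("slope:", slope, "intercept:", intercept)
print("mean absolute error on held-out data:", holdout_error)
```

A small held-out error suggests the model fits the data; a large one is a signal to return to model planning or data preparation.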
PHASE 5: RESULT COMMUNICATION
AND PUBLICATION
• This phase aims to determine whether the project results are a success or a failure and
to begin collaborating with key stakeholders. The team identifies the vital findings
of its analysis, measures the associated business value, and creates a
summarized narrative to convey the results to the stakeholders.
PHASE 6: MEASURING OF
EFFECTIVENESS
• In this final step, the team presents an in-depth report with code, briefings, key
findings, and technical documents and papers to the stakeholders. In addition, the
data is moved to a live environment and monitored to measure the effectiveness of
the analysis. If the findings are in line with the objective, the results and reports are
finalized. If they deviate from the set intent, the team moves
backward in the lifecycle to any previous phase to change the input and get a different
outcome.

KIT601 Unit I.pptx

  • 1. KIT601 DATA ANALYTICS P R AV E E N S A C H A N
  • 2. UNIT I • Introduction to Data Analytics: • Sources and nature of data, • classification of data (structured, semi-structured, unstructured), • characteristics of data, • introduction to Big Data platform, • need of data analytics, • evolution of analytic scalability, • analytic process and tools, • analysis vs reporting, • modern data analytic tools, • applications of data analytics.
  • 3. UNIT I • Data Analytics Lifecycle: • Need, • key roles for successful analytic projects, • various phases of data analytics lifecycle – – discovery, – data preparation, – model planning, – model building, – communicating results, – operationalization.
  • 4. 1. INTRODUCTION TO DATA ANALYTICS: Data Analytics refers to the techniques used to analyze data to enhance productivity and business gain. Data is extracted from various sources and is cleaned and categorized to analyze various behavioral patterns. The techniques and the tools used vary according to the organization or individual. Why is Data Analytics important? • Data Analytics has a key role in improving your business as it is used to gather hidden insights, generate reports, perform market analysis, and improve business requirements. With today’s technology, companies are able to collect tremendous amounts of data with relative ease. Indeed, many companies now have more data than they can handle. However, the data are usually meaningless until they are analyzed for trends, patterns, relationships, and other useful information.
  • 5. WHAT IS THE ROLE OF DATA ANALYTICS? • Gather Hidden Insights – Hidden insights from data are gathered and then analyzed with respect to business requirements. • Generate Reports – Reports are generated from the data and are passed on to the respective teams and individuals to deal with further actions for a high rise in business. • Perform Market Analysis – Market Analysis can be performed to understand the strengths and weaknesses of competitors. • Improve Business Requirement – Analysis of Data allows improving Business to customer requirements and experience.
  • 6. WHAT ARE THE TOOLS USED IN DATA ANALYTICS? • With the increasing demand for Data Analytics in the market, many tools have emerged with various functionalities for this purpose. Either open-source or user- friendly, the top tools in the data analytics market are as follows. • R programming • Python • Tableau Public • Qlik View • SAS • Microsoft Excel
  • 7. WHAT ARE THE TOOLS USED IN DATA ANALYTICS? • RapidMiner • KNIME – Konstanz Information Miner (KNIME) • OpenRefine • Apache Spark
  • 8. 1.1 SOURCES AND NATURE OF DATA Different Sources of Data for Data Analysis • Data collection is the process of acquiring, collecting, extracting, and storing the voluminous amount of data which may be in the structured or unstructured form like text, video, audio, XML files, records, or other image files used in later stages of data analysis. • In the process of big data analysis, “Data collection” is the initial step before starting to analyze the patterns or useful information in data. The data which is to be analyzed must be collected from different valid sources. • The data which is collected is known as raw data which is not useful now but on cleaning the impure and utilizing that data for further analysis forms information, the information obtained is known as “knowledge”. Knowledge has many meanings like business knowledge or sales of enterprise products, disease treatment, etc. The main goal of data collection is to collect information-rich data.
  • 9. SOURCES AND NATURE OF DATA • Data collection starts with asking some questions such as what type of data is to be collected and what is the source of collection. Most of the data collected are of two types known as • “qualitative data“ which is a group of non-numerical data such as words, sentences mostly focus on behavior and actions of the group and another one is • “quantitative data” which is in numerical forms and can be calculated using different scientific tools and sampling data. The actual data is then further divided mainly into two types known as: • Primary data • Secondary data
  • 10.
  • 11. METHODS OF COLLECTING PRIMARY DATA: 1. Interview method: 2. Survey method: 3. Observation method: 4. Experimental method: • The experimental method is the process of collecting data through performing experiments, research, and investigation. The most frequently used experiment methods are CRD, RBD, LSD, FD. – CRD- Completely Randomized design – RBD- Randomized Block Design – LSD – Latin Square Design – FD- Factorial design
  • 12. SECONDARY DATA: Secondary data is the data which has already been collected and reused again for some valid purpose. This type of data is previously recorded from primary data and it has two types of sources named internal source and external source. • Internal source: • External source: • Other sources: – Sensors data: – Satellites data: – Web traffic:
  • 13. 1.2 CLASSIFICATION OF DATA • Structured data – Structured data is data whose elements are addressable for effective analysis. It has been organized into a formatted repository that is typically a database. Example: Relational data. • Semi-Structured data – Semi-structured data is information that does not reside in a relational database but that have some organizational properties that make it easier to analyze. Example: XML data. • Unstructured data – Unstructured data is a data which is not organized in a predefined manner or does not have a predefined data model, thus it is not a good fit for a mainstream relational database. Example: Word, PDF, Text, Media, logs.
  • 14. 1.3 CHARACTERISTICS OF DATA The seven characteristics that define data quality are: – Accuracy and Precision – Legitimacy and Validity – Reliability and Consistency – Timeliness and Relevance – Completeness and Comprehensiveness – Availability and Accessibility – Granularity and Uniqueness
  • 15. 1.4 INTRODUCTION TO BIG DATA PLATFORM • Big data platform generally consists of servers, database, business intelligence and other management utilities and tools. • It also supports custom development, querying and integration with other systems. • The primary benefit behind a big data platform is to reduce the complexity of multiple vendors/ solutions into a one cohesive solution. • Big data platform are also delivered through cloud where the provider provides an all inclusive big data solutions and services.
  • 16. ESSENTIAL COMPONENTS OF BIG DATA PLATFORM There are many essential components which are given as follows: • Data Ingestion, Management, ETL, and Warehouse – It provides these resources for effective data management and effective data warehousing, and this manages data as a valuable resource. • Stream Computing – Helps compute the streaming data that is used for real-time analytics. • Analytics/ Machine Learning – Features for advanced analytics and machine learning. • Integration – It provides its user with features like integrating big data from any source with ease. • Data Governance – It also provides comprehensive security, data governance, and solutions to protect the data.
  • 17. ESSENTIAL COMPONENTS OF BIG DATA PLATFORM • Provides Accurate Data – It delivers with analytic tools which in turn helps to omit any inaccurate data that has not been analyzed. This also helps the business to make the right decision by utilizing accurate information. • Scalability – It also helps scale the application to analyze all time climbing data; it sizes to provide efficient analysis. It offers scalable storage capacity. • Price Optimization – Data analytics with the help of a big data platform provides insight for B2C and B2B enterprises which helps the business to optimize the prices they charge accordingly. • Reduced Latency – With the set of the warehouse, analytics tools, and efficient Data transformation, it helps to reduce the data latency and provide high throughput.
  • 18. 1.5 NEED OF DATA ANALYTICS • “Data Analytics refers to qualitative and quantitative techniques and processes used to enhance productivity and business gain.” • Data is extracted, acknowledged and bifurcated to identify and analyze behavioral data, techniques and patterns can be dynamic according to a particular business’s need or requirement. • Data Analytics is a broader term that has analysis as a subhead and analytics is basically the concepts used to do the analysis.
  • 19.
  • 20.
  • 21. WHAT IS DATA ANALYTICS? • The term data analytics refers to the process of examining datasets to draw conclusions about the information they contain. Data analytic techniques enable you to take raw data and uncover patterns to extract valuable insights from it. • Today, many data analytics techniques use specialized systems and software that integrate machine learning algorithms, automation and other capabilities. • Data Scientists and Analysts use data analytics techniques in their research, and businesses also use it to inform their decisions. Data analysis can help companies better understand their customers, evaluate their ad campaigns, personalize content, create content strategies and develop products. Ultimately, businesses can use data analytics to boost business performance and improve their bottom line.
  • 22. WHAT IS DATA ANALYTICS? • For businesses, the data they use may include historical data or new information they collect for a particular initiative. They may also collect it first-hand from their customers and site visitors or purchase it from other organizations. • Data a company collects about its own customers is called first-party data, data a company obtains from a known organization that collected it is called second-party data, and aggregated data a company buys from a marketplace is called third-party data. The data a company uses may include information about an audience’s demographics, their interests, behaviors and more.
  • 23. 4 WAYS TO USE DATA ANALYTICS 1. Improved Decision Making 2. More Effective Marketing 3. Better Customer Service 4. More Efficient Operations
  • 24. DATA ANALYTICS TECHNOLOGY Machine learning: • Artificial intelligence (AI) is the field of developing and using computer systems that can simulate human intelligence to complete tasks. • Machine learning (ML) is a subset of AI that is significant for data analytics and involves algorithms that can learn on their own. • ML enables applications to take in data and analyze it to predict outcomes without someone explicitly programming the system to reach that conclusion.
  • 25. DATA ANALYTICS TECHNOLOGY Data management: • Before you can analyze data, you need to have procedures in place for managing the flow of data in and out of your systems and keeping your data organized. • You also need to ensure that your data is high-quality and that you collect it in a central data management platform (DMP) where it’s available for use when needed. Establishing a data management program can help ensure that your organization is on the same page regarding how to organize and handle data.
  • 26. DATA ANALYTICS TECHNOLOGY Data mining: • The term data mining refers to the process of sorting through large amounts of data to identify patterns and discover relationships between data points. • It enables you to sift through large datasets and figure out what’s relevant. • You can then use this information to conduct analyses and inform your decisions. • Today’s data mining technologies allow you to complete these tasks exceptionally quickly.
  • 27. DATA ANALYTICS TECHNOLOGY Predictive analytics: • Predictive analytics technology helps you analyze historical data to predict future outcomes and the likelihood of various outcomes occurring. • These technologies typically use statistical algorithms and machine learning. • More accurate predictions means businesses can make better decisions moving forward and position themselves to succeed. It allows them to anticipate their customers’ needs and concerns, predict future trends and stay ahead of the competition.
  • 28. 1.6 EVOLUTION OF ANALYTIC SCALABILITY Analytics 1.0 → Need for Business Intelligence: • This was the uprising of Data warehouse where customer (Business) and production processes (Transactions) were centralized into one huge repository like eCDW (Enterprise Consolidated Data Warehouse) . A real progress was established in gaining an objective, deep understanding of important business phenomena — thereby giving managers the fact-based comprehension to go beyond intuition when making decisions. • The data surrounding eCDW was captured, transformed and queried using ETL & BI tools. The type of analytics exploited during this phase was mainly classified as Descriptive (what happened) and Diagnostic (why something happened).
  • 29. EVOLUTION OF ANALYTIC SCALABILITY Analytics 2.0 → Big Data: • Analytics 3.0 → Data Enriched Offerings: • The pioneering big data firms began investing in analytics to support customer-facing products, services, and features. • They attracted viewers to their websites through better search algorithms, recommendations , suggestions for products to buy, and highly targeted ads, all driven by analytics rooted in enormous amounts of data.
  • 30. EVOLUTION OF ANALYTIC SCALABILITY • Analytics 4.0 → Automated Capabilities: • There have always been four types of analytics: – Descriptive, which reports on the past; – Diagnostic, which uses the data of the past to study the present; – Predictive, which uses insights based on past data to predict the future; and – Prescriptive, which uses models to specify optimal behaviors and actions • Neural Machine Translation, Smart Reply, Chat-bots, and Meeting Assistants
  • 31. EVOLUTION OF ANALYTIC SCALABILITY • Analytics 5.0 → Future of Analytics and Whats Next ???: • We could reframe the threat of automation as an opportunity for augmentation: combining smart humans and smart machines to achieve an overall better result. • Now, instead of pondering “What tasks currently employed by humans will soon be replaced by machines?” • Most organizations that are exploring “cognitive” technologies — smart machines that automate aspects of decision-making processes — are just putting a toe in the water. • There will be no more manual interventions necessary with just an AI-powered system to steer your personal day-to-day activities.
  • 32. 1.7 ANALYTIC PROCESS AND TOOLS • Making Smarter and More Efficient Organization • Optimize Business Operations by Analyzing Customer Behavior • Cost Reduction • New Generation Products
  • 33. STAGES IN DATA ANALYTICS • These are the following stages involved in the Data Analytics process:
  • 34. TYPES OF DATA ANALYTICS • Descriptive Analytics: It uses data aggregation and data mining to provide insight into the past and answer: “What has happened?” The descriptive analytics does exactly what the name implies they “describe” or summarize raw data and make it interpretable by humans. • Predictive Analytics: It uses statistical models and forecasts techniques to understand the future and answer: “What could happen?” Predictive analytics provides companies with actionable insights based on data. It provides estimates about the likelihood of a future outcome.
  • 35. TYPES OF DATA ANALYTICS • Prescriptive Analytics: It uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?” It allows users to “prescribe” a number of different possible actions and guides them towards a solution. In a nutshell, this type of analytics is all about providing advice. • Diagnostic Analytics: It is used to determine why something happened in the past. It is characterized by techniques such as drill-down, data discovery, data mining and correlations. Diagnostic analytics takes a deeper look at data to understand the root causes of events.
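The four types of analytics described above can be illustrated on a toy data set. The following is a minimal sketch, not a production method: the monthly sales figures, the stocking options, and the holding and shortage costs are all hypothetical, and the predictive step is assumed to be a simple least-squares trend line.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold) for months 1..6.
sales = [100, 110, 120, 130, 140, 150]
months = list(range(1, len(sales) + 1))

# Descriptive: summarize what has happened.
average_sales = mean(sales)

# Diagnostic: drill down into month-over-month changes to see where growth came from.
month_over_month = [later - earlier for earlier, later in zip(sales, sales[1:])]

# Predictive: fit a least-squares trend line and extrapolate one month ahead.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast_next = intercept + slope * (len(sales) + 1)


# Prescriptive: pick the action with the lowest expected cost.
# The cost figures below are assumptions made for illustration.
def expected_cost(stock, demand, holding=1.0, shortage=5.0):
    """Cost of holding unsold units plus cost of unmet demand."""
    return holding * max(stock - demand, 0) + shortage * max(demand - stock, 0)


stock_options = [140, 160, 180]
best_stock = min(stock_options, key=lambda s: expected_cost(s, forecast_next))
```

Each step answers one of the questions from the slides: what has happened, why, what could happen, and what should we do.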
  • 36. BIG DATA TOOLS • The following are some of the tools used for Data Analytics: • Hadoop, • Pig, • Apache HBase, • Apache Spark, • Talend, • Splunk, • Apache Hive, • Kafka.
  • 37. 1.8 ANALYSIS VS. REPORTING Reporting and analysis are very different in terms of their purpose, tasks, outputs, delivery, and value. • Reporting: The process of organizing data into informational summaries in order to monitor how different areas of a business are performing • Analysis: The process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance. • Reporting translates raw data into information. • Analysis transforms data and information into insights.
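The distinction between reporting and analysis can be made concrete with a small sketch. The order records, field layout, and function names below are hypothetical: the report only organizes the data into a summary, while the analysis step explores a relationship in the data (here, a Pearson correlation between ad spend and revenue).

```python
from collections import defaultdict

# Hypothetical order records: (region, ad_spend, revenue).
orders = [
    ("North", 10, 120),
    ("North", 20, 210),
    ("South", 5, 40),
    ("South", 15, 160),
]


def revenue_report(rows):
    """Reporting: organize raw data into a summary (total revenue per region)."""
    totals = defaultdict(int)
    for region, _spend, revenue in rows:
        totals[region] += revenue
    return dict(totals)


def pearson(xs, ys):
    """Analysis helper: Pearson correlation coefficient of two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


report = revenue_report(orders)
# Analysis: does revenue move together with ad spend?
spend_revenue_corr = pearson([o[1] for o in orders], [o[2] for o in orders])
```

The report turns raw data into information; the correlation is a first step towards an insight that someone could act on.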
  • 39. 1.9 MODERN DATA ANALYTIC TOOLS • Data Analysis is the technique by which raw data is transformed into useful statistics, insights, and explanations to make Data-driven business decisions. Data Analysis has become the cornerstone of modern business operations. • It is a daunting task to choose the best Data analytics tool since no tool fits every need. Let’s look at the key factors for choosing between the Data analytics tools and then explore some of the most popular Data analytics tools available in the market today.
  • 40. 1) WHAT ARE DATA ANALYST TOOLS? • The term ‘Data analytics tools’ is used to classify software and applications used by Data Analysts to create and execute analytic processes that help businesses make smarter, more informed business decisions while minimizing cost and boosting profits.
  • 41. 2) HOW TO CHOOSE A DATA ANALYST TOOL? • How do you find one amongst several Data analytics tools that’s a good fit for your company? • Start by considering your company’s business requirements and learning who will be using the Data analytics tools. Will they be used by seasoned Data Analysts and Data Scientists, or by non-technical users who need an intuitive interface? • Some Data analytics tools provide an immersive code-writing experience, generally with SQL, while others focus on point-and-click analysis best suited to beginners. The Data analytics software should also offer support for visualizations relevant to your business goals.
  • 42. 2) HOW TO CHOOSE A DATA ANALYST TOOL? • Consider the ability of Data analytics software to model data. Some support a syntactic and semantic layer or can perform data modeling themselves. If you do not wish to use one that does, you’ll have to use SQL or Data analytics tools like the data build tool (dbt) to model your data before analysis. • Finally, take price and licensing into consideration. Some Data analytics tools charge license or subscription fees, while others are free. The most expensive Data analytics tools are not always the most comprehensive, and there are many robust and free Data analytics tools available in the market that shouldn’t be overlooked.
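Modeling data with SQL before analysis, as mentioned above, can be sketched with Python’s built-in sqlite3 module. The table name raw_orders, the view name customer_revenue, and the sample rows are hypothetical; the view plays the role of a modeled layer that downstream tools would query instead of the raw table.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (
        order_id INTEGER,
        customer TEXT,
        amount_cents INTEGER,
        status TEXT
    );
    INSERT INTO raw_orders VALUES
        (1, 'alice', 1250, 'paid'),
        (2, 'bob',    800, 'cancelled'),
        (3, 'alice',  750, 'paid');

    -- Modeled layer: one row per customer, paid orders only,
    -- amounts converted from cents to currency units.
    CREATE VIEW customer_revenue AS
    SELECT customer,
           COUNT(*) AS paid_orders,
           SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer;
    """
)

# Analysts query the modeled view rather than the raw table.
modeled_rows = conn.execute(
    "SELECT customer, paid_orders, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
```

Keeping the business rules (which orders count, which units are used) in one modeled view means every report built on it applies the same definitions.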
  • 43. 25 MOST PROMINENT DATA ANALYTICS TOOLS NEEDED TO BE AN EXPERT DATA ANALYST 1. R • R is one of the most popular analytics tools in the industry. It has surpassed SAS in usage and is now the Data analytics tool of choice, even for companies that can easily afford SAS. Over the years, R has become a lot more robust: it handles large data sets much better than it did a decade ago, and it has also become a lot more versatile. • 1800 new packages were introduced in R between April 2015 and April 2016, taking the total number of R packages past 8000 at that time. There are some concerns about the sheer number of packages, but this has certainly added a lot to R’s capabilities. R also integrates very well with many Big Data platforms, which has contributed to its success.
  • 44. DATA ANALYTICS TOOLS 2. Python • Python has been one of the favorite languages of programmers since its inception. The main reason for its popularity is that it is an easy-to-learn language that is also quick to develop in. It grew into one of the most powerful Data analytics tools with the development of analytical and statistical libraries like NumPy, SciPy etc. Today, it offers comprehensive coverage of statistical and mathematical functions. • Increasingly, we are seeing programmers and other tech professionals moving into analytics. Most of them are already familiar with Python, and therefore it has become a Data analytics tool of choice for many data scientists.
  • 45. DATA ANALYTICS TOOLS 3. Apache Spark • Spark is another open-source processing engine that is built with a focus on analytics, especially on unstructured data or huge volumes of data. Spark has become one of the most popular Data analytics tools in the last couple of years, for several reasons, easy integration with the Hadoop ecosystem being one of them. Spark has its own machine learning library, which makes it well suited to analytics as well.
  • 46. DATA ANALYTICS TOOLS 4. Apache Storm • Storm is the Big Data tool of choice for moving data, or when the data comes in as a continuous stream. Spark is oriented primarily towards batch processing of data at rest, although it also offers streaming modules. Storm is ideal for real-time analytics or stream processing. 5. PIG and HIVE • Pig and Hive are integral Data analytics tools in the Hadoop ecosystem that reduce the complexity of writing MapReduce jobs. Both these languages are like SQL (Hive more so than Pig). Most companies that work with Big Data and leverage the Hadoop platform use Pig and/or Hive.
  • 47. DATA ANALYTICS TOOLS 6. SAS • SAS continues to be one of the widely used Data analytics tools in the industry. Some flexibility on pricing from the SAS Institute has helped its cause. SAS continues to be a robust, versatile and easy-to-learn tool. SAS has added many new modules. Some of the specialized modules added in the recent past are SAS Analytics for IoT, SAS Anti-Money Laundering, and SAS Analytics Pro for Midsize Business. 7. Tableau • Tableau is among the easiest Data analytics tools to learn, and it does an effective job of slicing and dicing your data and creating great visualizations and dashboards. Tableau can create better visualizations than Excel and can handle much more data than Excel can. If you want interactivity in your plots, then Tableau is a strong choice.
  • 48. DATA ANALYTICS TOOLS 8. Excel • Excel is, of course, the most widely used Data analytics software in the world. Whether you are an expert in R or Tableau, you will still use Excel for the grunt work. Non-analytics professionals will usually not have access to tools like SAS or R on their systems, but everyone has Excel. Excel becomes vital when the analytics team interfaces with the business team. 9. QlikView • QlikView and Tableau are essentially vying for the top spot amongst the data visualization giants. QlikView is said to be slightly faster than Tableau and gives experienced users a bit more flexibility. Tableau has a more intuitive GUI and is easier to learn.
  • 49. DATA ANALYTICS TOOLS • 10. Splunk • Splunk is more popular than some of the better-known Data analytics tools like Cloudera and Hortonworks. It started as a ‘Google for log files’, which means its primary use was to process machine log file data. It has now become much more than that. Splunk has great visualization options, and its web interface makes it easy to use. • 11. Microsoft Power BI • Microsoft Power BI is a top business intelligence platform that offers support for dozens of data sources. This Data analytics software allows users to create reports, visualizations and dashboards and publish them. For quick delivery, users may combine a group of dashboards and reports into a Power BI app. Power BI also helps users create and deploy automated machine learning models through its integration with Azure Machine Learning.
  • 50. DATA ANALYTICS TOOLS • 12. SAP BusinessObjects • SAP BusinessObjects provides a suite of Data analytics tools for data discovery, analysis, and reporting. The tools are designed for less technical users but can also carry out complex analyses. BusinessObjects integrates with Microsoft Office products, enabling Business Analysts to move easily back and forth between applications such as Excel and BusinessObjects reports. It also enables self-service predictive analytics. • 13. Sisense • Sisense is a Data analytics software aimed at helping both technical developers and business analysts process and visualize all of their business data. It offers a wide variety of drag-and-drop tools and interactive dashboards for collaboration. The Sisense platform’s unique feature is its custom in-chip technology, which optimizes computation to use CPU caching instead of slower RAM. This can lead to 10-100 times faster computation for certain workflows.
  • 51. DATA ANALYTICS TOOLS • 14. TIBCO Spotfire • TIBCO Spotfire is a Data analytics software that provides natural language search and AI-powered data insights. It is a comprehensive platform for viewing reports in both mobile and desktop applications. Spotfire also offers point-and-click tools for building predictive analytics models. • 15. ThoughtSpot • ThoughtSpot is a Data analytics software that allows users to explore Data from various sources through reports and natural language searches. SpotIQ, its AI-powered system, automatically seeks insights to help users discover trends they didn’t know to search for. It also enables users to automatically link tables from various Data sources to break down Data silos.
  • 52. DATA ANALYTICS TOOLS • 16. Google Data Studio • Google Data Studio is one of the popular free Data analytics tools for dashboarding and data visualization that automatically integrates with most other Google applications, such as Google Analytics, Google Ads, and Google BigQuery. Data Studio is well suited to those who need to evaluate their Google data because of its integration with other Google services. For example, marketers could create dashboards to help analyze consumer conversion and retention from their Google Ads and Analytics results. Data Studio can also work with Data from several other sources as long as the Data is first replicated to BigQuery using a Data pipeline such as Stitch. • 17. Grafana • Grafana is another free, open-source Data analytics software for monitoring and observing metrics across diverse databases and applications. It offers a real-time view into external processes and alerts users when incidents occur. Grafana is widely used by DevOps engineers to monitor their applications.
  • 53. DATA ANALYTICS TOOLS • 18. Redash • Redash is a lightweight and cost-effective Data analytics software for querying data sources and building visualizations. The code is open source, and for organizations that want to get started quickly, an inexpensive hosted version is available. The heart of Redash is a query editor, which offers a quick interface for managing queries, schemas and integrations. Query results are cached in Redash, and users can schedule automatic updates. • 19. Jupyter Notebook • Jupyter Notebook is one of the robust free, open-source, web-based Data analytics tools that runs in a browser after installation using the Anaconda platform or Python’s package manager, pip. It enables developers to produce reports that combine live code, data and visualizations. This Data analytics software supports more than 40 programming languages. Formerly known as IPython Notebook, Jupyter Notebook was initially developed using Python. It enables developers to make use of Python’s wide variety of analytics and visualization packages. The tool also has a large group of users who work in other languages.
  • 54. DATA ANALYTICS TOOLS • 20. IBM Cognos • IBM Cognos is a Data analytics software for business intelligence with built-in AI tools that surface insights hidden in the data and explain them in plain English. It has automated Data preparation software that cleans and aggregates Data sources, enabling fast integration and analysis. • 21. Mode • Mode is a Data analytics software aimed at providing Data Scientists with an easy, iterative environment. It offers an interactive SQL editor and notebook environment for analysis and visualization, and collaboration tools for novice users. Mode has a unique Helix Data engine that streams and stores Data from external databases to allow swift and interactive analysis. It supports analysis of up to 10 GB of data in memory.
  • 55. DATA ANALYTICS TOOLS • 22. KNIME • KNIME is the abbreviation for the Konstanz Information Miner and is a free, open-source Data analytics software that supports Data integration, processing, visualization, and reporting. It integrates Machine Learning and Data mining libraries with minimal or no programming requirements. KNIME is excellent for Data Scientists who do not have strong programming skills and need to integrate and process Data for building Machine Learning and other statistical models. Its graphical interface facilitates point-and-click analysis and modeling. • 23. Looker • Looker is one of the cloud-based business intelligence and Data analytics tools. It automatically generates a Data model by scanning Data schemas and inferring the relationships between tables and Data sources. Through an integrated code editor, it allows Data engineers to modify the generated models.
  • 56. DATA ANALYTICS TOOLS • 24. RapidMiner • RapidMiner is a Data analytics software that provides all the technology users need, from Data integration and cleaning to Data transformation, before they run predictive analytics and build statistical models. Nearly all of this is done through a simple graphical interface. RapidMiner can also be extended using R and Python and various third-party plugins available on the company’s marketplace. • 25. Oracle Analytics Cloud • Oracle Analytics Cloud is another suite of Cloud-based business intelligence and Data analytics tools. It focuses on helping big corporations move their legacy systems to a digital cloud platform. Users leverage its wide range of analytical features, from basic visualizations to Machine Learning algorithms, for deriving Data insights.
  • 57. 1.10 APPLICATIONS OF DATA ANALYTICS • APPLICATION OF ANALYTICS IN DIFFERENT FIELDS • Data analytics is used not in just one or two fields but in almost every field around us. Whether in online shopping, hi-tech industries, or government, organizations use data analytics to help with decision making, budgeting, planning, etc. Data analytics is employed in areas such as: 1. Transportation 2. Logistics and Delivery 3. Web Search or Internet Web Results 4. Manufacturing 5. Security 6. Education 7. Healthcare 8. Military
  • 59. 1.11 DATA ANALYTICS LIFECYCLE • Data Analytics Lifecycle defines the roadmap of how information is generated, collected, processed, used, and analyzed to achieve business goals. • It offers a systematic way to manage data for converting it into information that can be used to fulfill organizational and project goals. • The process provides the direction and methods to extract information from the data and proceed in the right direction to accomplish business goals.
  • 60. 1.12 NEED OF DATA ANALYTICS LIFECYCLE • The Data analytics lifecycle is designed for Big Data problems and data science projects. • The cycle is iterative, to reflect how real projects proceed. To address the distinct requirements of performing analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks involved in acquiring, processing, analyzing, and repurposing data.
  • 61. 1.13 KEY ROLES FOR SUCCESSFUL ANALYTIC PROJECTS • Key Roles for a Data analytics project: • Business User: The business user understands the domain area of the project and usually benefits from the results. • This user advises and consults with the project team on the value of the results obtained and on how the outputs will be operationalized. • A business manager, line manager, or deep subject matter expert in the project domain typically fulfills this role.
  • 62. KEY ROLES FOR A DATA ANALYTICS PROJECT • Project Sponsor: – The Project Sponsor is responsible for initiating the project. The sponsor provides the actual requirements for the project and presents the core business problem. – The sponsor generally provides the funding and gauges the degree of value delivered by the final output of the project team. – This person sets the priorities for the project and clarifies the desired output.
  • 63. KEY ROLES FOR A DATA ANALYTICS PROJECT • Project Manager: This person ensures that the key milestones and objectives of the project are met on time and at the expected quality. • Business Intelligence Analyst: The Business Intelligence Analyst provides business domain expertise based on a detailed and deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting point of view. • This person generally creates dashboards and reports and knows about the data feeds and sources.
  • 64. KEY ROLES FOR A DATA ANALYTICS PROJECT • Database Administrator (DBA): The DBA provisions and configures the database environment to support the analytics needs of the team working on a project. • The DBA’s responsibilities may include providing access to key databases or tables and making sure that the appropriate security levels are in place for the data repositories.
  • 65. KEY ROLES FOR A DATA ANALYTICS PROJECT • Data Engineer: The data engineer applies deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox. • The data engineer works jointly with the data scientist to help shape data in the right ways for analysis.
  • 66. KEY ROLES FOR A DATA ANALYTICS PROJECT • Data Scientist: – The data scientist provides subject matter expertise in analytical techniques and data modelling, and in applying the correct analytical techniques to a given business problem. – The data scientist ensures that the overall analytical objectives are met. – Data scientists design and apply analytical methods and approaches using the data available for the project.
  • 67. 1.14 VARIOUS PHASES OF DATA ANALYTICS LIFECYCLE • Data discovery, • Data preparation, • Data model planning, • Data model building, • Communicating results, and • Operationalization.
  • 68. PHASE 1: DATA DISCOVERY AND FORMATION • During this phase, the team learns about the business domain and checks whether the business unit or organization has worked on similar projects from which it can draw lessons learned. • In this phase, the team also evaluates the available technology, people, data, and time. • For example, while dealing with a small dataset, the team can use Excel.
  • 69. PHASE 2: DATA PREPARATION AND PROCESSING • In this phase, the experts’ focus shifts from business requirements to information requirements. • One of the essential aspects of this phase is ensuring data availability for processing. • The stage encompasses the collection, processing, and cleansing of the accumulated data.
  • 70. PHASE 3: DESIGN A MODEL • This phase needs the availability of an analytic sandbox for the team to work with data and perform analytics throughout the project duration. The team can load data in several ways. – Extract, Transform, Load (ETL) – It transforms the data based on a set of business rules before loading it into the sandbox. – Extract, Load, Transform (ELT) – It loads the data into the sandbox and then transforms it based on a set of business rules. – Extract, Transform, Load, Transform (ETLT) – It combines ETL and ELT, giving two levels of transformation: one before loading and one after.
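The difference between the loading strategies above lies in where the transformation happens relative to the load. The following is a minimal sketch with hypothetical records and a single hypothetical business rule (keep paid orders, normalize customer names); the sandbox is modeled as a plain Python list or dictionary rather than a real database.

```python
# Hypothetical raw records as they arrive from a source system.
raw = [
    {"customer": " Alice ", "amount": "12.50", "status": "paid"},
    {"customer": "BOB", "amount": "8.00", "status": "cancelled"},
]


def transform(rows):
    """Apply the business rule: keep paid orders, clean names, parse amounts."""
    return [
        {"customer": r["customer"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r["status"] == "paid"
    ]


def etl(source):
    """ETL: transform first, then load only the cleaned rows into the sandbox."""
    sandbox = []
    sandbox.extend(transform(source))
    return sandbox


def elt(source):
    """ELT: load the raw rows untouched, then transform inside the sandbox."""
    sandbox = {"raw": list(source)}
    sandbox["clean"] = transform(sandbox["raw"])
    return sandbox
```

With ELT the raw rows stay available in the sandbox, so the team can revisit the business rules later without extracting the data again; with ETL only the cleaned rows are kept.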
  • 71. PHASE 4: MODEL BUILDING • The team uses various statistical modeling methods such as • regression techniques, • decision trees, • random forest modeling, and • neural networks, and performs a trial run to determine whether the model fits the datasets.
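A trial run of the kind described above can be sketched with the simplest of the listed methods, a linear regression. The data points, the 6/2 train and holdout split, and the choice of mean absolute error as the fit measure are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and observed values."""
    slope, intercept = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical observations that roughly follow y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 17.1]

# Train on the first six points and hold out the last two for the trial run.
train_x, test_x = xs[:6], xs[6:]
train_y, test_y = ys[:6], ys[6:]

model = fit_line(train_x, train_y)
holdout_error = mean_abs_error(model, test_x, test_y)
```

A small holdout error suggests that the model fits the data; a large one is a signal to return to model planning or data preparation.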
  • 72. PHASE 5: RESULT COMMUNICATION AND PUBLICATION • This phase aims to determine whether the project results are a success or a failure and to start collaborating with the major stakeholders. The team identifies the key findings of its analysis, measures the associated business value, and creates a summarized narrative to convey the results to the stakeholders.
  • 73. PHASE 6: MEASURING EFFECTIVENESS • In this final step, the team presents an in-depth report with code, briefings, key findings, and technical documents and papers to the stakeholders. In addition, the data is moved to a live environment and monitored to measure the effectiveness of the analysis. If the findings are in line with the objective, the results and reports are finalized. If they deviate from the objective, the team moves back in the lifecycle to any previous phase to change the input and get a different outcome.