Data Science and Business Analytics
Dr. M. Inbavalli
Vice Principal & Head, Research Department of Computer Science
Marudhar Kesari Jain College for Women
Vaniyambadi-635751
Overview
• Evolution of Data
• Data Science
• Business Analytics
• Applications
• AI, ML, DL, Data science – Relationship
• Tools for Data Science
• Life cycle of data science with case study
• Algorithms for Data Science
• Data Science Research Areas
• Future of Data Science
Data All Around
• Data has become the most abundant resource today
• Data is exploding in nearly every domain
• Large volumes of data are being collected and warehoused
• Web data, e-commerce
• Financial transactions, bank/credit transactions
• Online trading and purchasing
• Social networks
• Data All Around (cont)
• Sensing devices and sensor networks can monitor everything 24/7, from temperature to pollution to vital signs
• Increasingly sophisticated smartphones
• The Internet and social networks make it easy to publish data
• Scientific experiments and simulations produce astronomical volumes of data
• Internet of Things (IoT)
• Datafication: taking all aspects of life and turning them into data (e.g., what you like/enjoy has been turned into a stream of your "likes")
• Data Science – Why all the excitement?
• How much data do we have?
• Data volumes are expected to keep growing rapidly
• Over 2.5 quintillion bytes of data are created every single day.
What can you do with traffic prediction data?
Crowdsourcing + physical modeling + sensing + data assimilation
Source: Institute for Transportation Studies
• How to handle that data?
• Data is just like crude oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; likewise, data must be broken down and analyzed for it to have value.
• How can we extract interesting, actionable insights and scientific knowledge?
• Data Science: why the excitement?
• Data science is the science that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data to create data products.
• Turn data into data products.
• Data Science: why the excitement? (cont)
Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many fields, such as science, engineering, economics, politics, finance, and education.
• Computer science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics: mathematical modeling
• Statistics: statistical and stochastic modeling, probability
Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data.
• Data Science: why the excitement? (cont)
• Data science is a blend of tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.
• It focuses on statistical modeling, machine learning, management and analysis of data sets, and data acquisition.
• Data science makes use of several statistical procedures.
• These procedures range from data transformations and data modeling to statistical operations (descriptive and inferential statistics) and machine learning modeling.
• To obtain useful predictions from the models, it is essential to understand the underlying patterns in the data. Furthermore, optimization techniques can be used to meet the business requirements of the user.
• Data Science: why the excitement? (cont)
• Using various statistical tools, a data scientist develops models. These models help clients in the decision-making process and also support demand generation initiatives.
Data Science also covers:
• Data Integration.
• Distributed Architecture.
• Automating Machine learning.
• Data Visualization.
• Dashboards and BI.
• Data Engineering.
• Deployment in production mode
• Automated, data-driven decisions.
Example: Search
• Google earns around $50 bn/year from marketing, 97% of the company's revenue.
• Sponsored search uses an auction: a pure competition among marketers trying to win access to consumers.
• In other words, it is a competition between models of consumers (their likelihood of responding to the ad) and of determining the right bid for the item.
• There are around 30 billion search requests a month, and perhaps a trillion events of history across search providers.
• Google AdWords and AdSense
Data Science Applications
• Transaction databases → recommender systems (Netflix), fraud detection (security and privacy)
• Wireless sensor data → smart home, real-time monitoring, Internet of Things
• Text data, social media data → product reviews and consumer satisfaction (Facebook, Twitter, LinkedIn), e-discovery
• Software log data → automatic troubleshooting (Splunk)
• Genotype and phenotype data → Epic, 23andMe, patient-centered care, personalized medicine
• Other Applications
• Banks make smarter decisions through fraud detection, management of customer data, risk modeling, real-time predictive analytics, customer segmentation, etc.
• Fraud detection covers credit card, insurance, and accounting fraud.
• Banks can analyze customers' investment patterns and cycles and suggest offers that suit each customer.
• Risk modeling through data science lets banks assess their overall performance.
• In real-time and predictive analytics, banks use machine learning algorithms to improve their analytics strategy.
Other Applications
• Customer sentiment analysis techniques can boost social media interaction, increase feedback, and analyze customer reviews.
• Manufacturing: IoT has enabled companies to predict potential problems, monitor systems, and analyze continuous streams of data.
• Uber uses data science for price optimization and to provide better experiences to its customers. Using powerful predictive tools, it predicts the price based on parameters such as weather patterns, availability of transport, customers, etc.
Data
• Measurable units of information gathered or captured from the activity of people, places, and things.
• Data is generated from different sources, such as financial logs, text files, multimedia forms, sensors, and instruments.
• We need to understand
• which data to use
• how to organize the data, and so on.
• Both structured and unstructured data must be prepared for the analytics team to use in model building.
• Types of Data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF), …
• Streaming Data
What do we do with the data?
• Aggregation and Statistics
• Data warehousing and OLAP
• Indexing, Searching, and Querying
• Keyword based search
• Pattern matching (XML/RDF)
• Knowledge discovery
• Data Mining
• Statistical Modeling
• Example: Data Science
• Companies learn your secrets, shopping patterns, and preferences
• E.g., can we know if a child likes animation games, even if they don’t want us to know?
• Building and maintaining a data warehouse is a key skill that a data engineer must have.
• Data engineers build pipelines that extract data from multiple sources and then manipulate it to make it usable.
• Business analytics (BA) is the practice of iterative, methodical exploration of an
organization's data, with an emphasis on statistical analysis.
Business analytics is used by companies committed to data-driven decision-making.
• BA activities must be anchored to a strategically relevant business question to be
answered by using data analysis.
• Data Science and Business Analytics
• Data science, or analytics, is the process of deriving insights from data in order to make optimal decisions.
• Data science and analytics techniques include basic statistics, regression, simulation and optimization modeling, data mining and machine learning, text analytics, artificial intelligence, and visualization.
• Data science focuses on data modelling and data warehousing to track the ever-growing data set. The information extracted through data science applications is used to guide business processes and reach organisational goals.
Feature | Databases | Data Science
Data volume | Modest | Massive
Examples | Bank records, personnel records, census, medical records | Online clicks, GPS logs, tweets, building sensor readings
Priorities | Consistency, error recovery, auditability | Speed, availability, query richness
Structure | Strong (schema) | Weak or none (text)
Properties | Transactions, ACID | CAP theorem (2 of 3), eventual consistency
Realizations | SQL | NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
Feature | Business Intelligence (BI) | Data Science
Data sources | Structured (usually SQL, often a data warehouse) | Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
Approach | Statistics and visualization | Statistics, machine learning, graph analysis, natural language processing (NLP)
Focus | Past and present | Present and future
Tools | Pentaho, Microsoft BI, QlikView, R | RapidMiner, BigML, Weka, R
Aspect | Data Science | Machine Learning | Artificial Intelligence
Tools | SAS, Tableau, Apache Spark, MATLAB, SQL | Amazon Lex, IBM Watson Studio, Microsoft Azure ML Studio | TensorFlow, scikit-learn, Keras, Amazon Lex, Google Cloud Platform, DataRobot
Data and methods | Deals with structured and unstructured data | Uses statistical models | Uses logic and decision trees
Popular examples | Fraud detection and healthcare analysis | Recommendation systems (such as Spotify) and facial recognition | Chatbots and voice assistants
Main applications | Detection of credit card fraud and ATM theft, disease prediction, pattern identification, etc. | Online recommender systems, Google search algorithms, Facebook auto friend-tagging suggestions, etc. | Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
• Relationship between Data Science, Artificial Intelligence and Machine Learning
• Machine Learning for Predictive Reporting
• Used to study transactional data and make valuable predictions.
• Also known as supervised learning.
• Can be implemented to suggest the most effective courses of action for a company.
• Machine Learning for Pattern Discovery
• Used to set parameters in various data reports.
• This is unsupervised learning, where there are no pre-decided parameters.
Artificial intelligence can be described as a loop of perception, planning, action, and feedback:
Perception > Planning > Action > Feedback of Perception
Data science uses different parts of this loop to solve specific problems.
• In the first step, perception, data scientists try to identify patterns with the help of the data.
• In planning, there are two aspects:
• Finding all possible solutions
• Finding the best solution among all solutions
• Machine learning is hard to understand as a standalone subject; it is best understood in the context of its environment.
AI is the tool that helps data science get results and solutions for specific problems. However, machine learning is what helps in achieving that goal.
Example: Google’s search engine is a product of data science. It uses predictive analysis, a system used by artificial intelligence, to deliver intelligent results to users.
• Tools for Data Science
• Reporting and Business Intelligence
• Predictive Modelling and Machine Learning
• Artificial Intelligence
• Data Science Tools for Big Data (Volume)
• Data from 1 GB to 10 GB: traditional tools such as Excel, Access, SQL databases, etc.
• More than 10 GB: Hadoop, Hive
• Tools for Handling Variety
• Voluminous
• Customer feedback may vary in length, sentiment, and other factors.
• Examples of SQL databases are Oracle, MySQL, and SQLite, whereas popular NoSQL databases include MongoDB, Cassandra, etc.
• NoSQL databases are seeing wide adoption because of their ability to scale and handle dynamic data.
• Tools for Handling Velocity
• Velocity is the speed at which the data is captured.
• It includes both real-time and non-real-time data.
• Examples of real-time data
• Sensor data collected by self-driving cars, used to trigger automatic actions
• CCTV
• Stock trading
• Fraud detection for credit card transactions
• Network data: social media (Facebook, Twitter, etc.)
Tools
• Apache Kafka: real-time data pipelines
• Apache Storm: can process up to 1 million tuples per second and is highly scalable
• Amazon Kinesis: licensed and powerful
• Apache Flink: high performance, fault tolerance, and efficient memory management
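To make the idea of processing data as it arrives more concrete, here is a minimal sketch of a rolling average over a stream of readings. It uses only the Python standard library and does not use the API of Kafka, Storm, Kinesis, or Flink; the function name `rolling_mean` and the temperature values are made up for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each one arrives."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # the oldest reading is dropped automatically
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # this reading is about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


# Hypothetical temperature readings arriving one at a time.
stream = [20.0, 22.0, 24.0, 26.0]
print(list(rolling_mean(stream, window=2)))  # [20.0, 21.0, 23.0, 25.0]
```

Because the function is a generator, it never needs the whole stream in memory, which is the property that matters when data arrives continuously.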
• Reporting and BI tools: Excel, QlikView, Tableau, MicroStrategy, Power BI, Google Analytics, Dundas, Sisense, etc.
• Predictive analytics and machine learning tools: Python, R, Apache Spark, Julia, Jupyter Notebooks
• Frameworks for deep learning: TensorFlow, PyTorch, Keras, and Caffe
• AI tools: AutoKeras, Google Cloud AutoML, IBM Watson, DataRobot, H2O’s Driverless AI, and Amazon Lex
• Licensed tools: SAS, SPSS, MATLAB
Lifecycle of Data Science
• Role of Data Scientist
• Identifying the data-analytics problems that offer the greatest opportunities to the
organization
• Determining the correct data sets and variables
• Collecting large sets of structured and unstructured data from disparate sources
• Cleaning and validating the data to ensure accuracy, completeness, and uniformity
• Devising and applying models and algorithms to mine the stores of big data
• Analyzing the data to identify patterns and trends
• Interpreting the data to discover solutions and opportunities
• Communicating findings to stakeholders using visualization and other means
• Phase 1: Discovery
• Understand the various specifications, requirements, priorities, and required budget.
• This phase requires the ability to ask the right questions.
• Frame the business problem and formulate initial hypotheses (IH) to test.
• Phase 2: Data preparation
• Perform data cleaning, transformation, and visualization. This helps you spot outliers and establish relationships between the variables. R can be used for this step.
• Phase 3: Model planning
• Determine the methods and techniques to draw the relationships between variables.
• These relationships set the base for the algorithms in the next phase.
• Apply exploratory data analysis (EDA) using various statistical formulas and visualization tools.
• R has a complete set of modeling capabilities and provides a good environment for building interpretive models.
• SQL Analysis Services can perform in-database analytics using common data mining functions and basic predictive models.
• SAS/ACCESS can be used to access data from Hadoop and is used for creating repeatable and reusable model flow diagrams.
• Phase 4: Model building
• Develop datasets for training and testing purposes.
• Apply learning techniques such as classification, association, and clustering to build the model.
Example:
1. Classification (decision trees)
2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN)
3. Association rules
4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM)
5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting)
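The first step of model building, developing separate training and testing datasets, can be sketched as follows. This is a minimal illustration using only the standard library; the function name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions made for the example, not values from the slides.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 20 record identifiers.
records = list(range(20))
train, test = train_test_split(records)
print(len(train), len(test))  # 15 5
```

The model is fitted on `train` only, and `test` is held back so that its performance is measured on data it has not seen.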
• Phase 5: Operationalize
• Analyze the data to identify patterns and trends
• Interpret the results
• Deliver final reports, briefings, code, and technical documents
• Run a pilot project
• Phase 6: Communicate results
• Identify all the key findings, communicate them to the stakeholders, and determine whether the results of the project are a success or a failure
• Basic statistics
• 1. Random variables, sampling
• 2. Distributions and statistical measures
• 3. Hypothesis testing
• Overview of linear algebra
• 1. Linear algebra and matrix computations
• 2. Functions, derivatives, convexity
• Modeling techniques: regression
• 1. Mathematical modeling process
• 2. Linear regression
• 3. Logistic regression
• Data visualization and visual analytics
• 1. Visual analytics
• 2. Visualizations in Python and visual analytics in IBM Watson Analytics
• Data mining and machine learning
• 1. Classification (decision trees)
• 2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN)
• 3. Association rules
• 4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM)
• 5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting)
• Simulation modeling
• 1. Random number generation
• 2. Monte Carlo simulations
• 3. Simulation in IPython
• Real time example
• Case Study: Diabetes Prevention
• What if we could predict the occurrence of diabetes and take appropriate
measures beforehand to prevent it?
• 1. You can refer to the sample data below.
• Step 1: Discovery
• Attributes:
• npreg – Number of times pregnant
• glucose – Plasma glucose concentration
• bp – Blood pressure
• skin – Triceps skinfold thickness
• bmi – Body mass index
• ped – Diabetes pedigree function
• age – Age
• income – Income
• Step 2: Data Preparation
• Once we have the data, we need to clean and prepare it for analysis.
• The data has many inconsistencies, such as missing values, blank columns, abrupt values, and incorrect data formats, which need to be cleaned.
• We have organized the data into a single table under different attributes, making it look more structured.
• Step 2 (cont)
• This data has a lot of inconsistencies.
• In the column npreg, “one” is written in words, whereas it should be in numeric form, like 1.
• In the column bp, one of the values is 6600, which is impossible (at least for humans), as bp cannot reach such a huge value.
• The income column is blank and also makes no sense in predicting diabetes. It is therefore redundant and should be removed from the table.
• We clean and preprocess this data by removing the outliers, filling in the null values, and normalizing the data types. This is data preprocessing.
• Finally, we get clean data that can be used for analysis.
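The cleaning steps described above can be sketched in a few lines of Python. This is an illustration only: the sample rows, the `WORD_TO_NUMBER` lookup, the blood-pressure cut-off of 300, and the choice of the median as the fill value are all assumptions made for the example and do not come from the original dataset.

```python
import statistics

WORD_TO_NUMBER = {"zero": 0, "one": 1, "two": 2, "three": 3}  # illustrative subset
MAX_PLAUSIBLE_BP = 300  # assumed cut-off for an impossible reading


def clean_record(record):
    """Return a cleaned copy of one patient record."""
    # Drop the blank, irrelevant income column.
    cleaned = {k: v for k, v in record.items() if k != "income"}

    # Normalise npreg: accept numbers, digit strings, or number words.
    npreg = cleaned.get("npreg")
    if isinstance(npreg, str):
        text = npreg.strip().lower()
        cleaned["npreg"] = WORD_TO_NUMBER[text] if text in WORD_TO_NUMBER else int(text)

    # Treat impossible blood-pressure values as missing.
    bp = cleaned.get("bp")
    if bp is not None and not 0 < bp <= MAX_PLAUSIBLE_BP:
        cleaned["bp"] = None
    return cleaned


def fill_missing(records, column):
    """Replace None in `column` with the median of the observed values."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = statistics.median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]


# Made-up rows that reproduce the three problems named in the slides.
raw = [
    {"npreg": 6, "glucose": 148, "bp": 72, "income": None},
    {"npreg": "one", "glucose": 85, "bp": 6600, "income": None},
    {"npreg": 2, "glucose": 183, "bp": 64, "income": None},
]
cleaned = fill_missing([clean_record(r) for r in raw], "bp")
print(cleaned[1])  # {'npreg': 1, 'glucose': 85, 'bp': 68.0}
```

Marking the outlier as missing first and filling it afterwards keeps the impossible value from distorting the fill value itself.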
• Step 3: Model Planning
• Load the data into the analytical sandbox and apply various statistical functions.
• R has functions such as describe (from the Hmisc package), which gives us the number of missing values and unique values.
• We can also use the summary function, which gives us statistical information such as the mean, median, range, min, and max values.
• Then, we use visualization techniques such as histograms, line graphs, and box plots to get a fair idea of the distribution of the data.
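For readers working in Python rather than R, the same kind of summary can be approximated with the standard library. The sketch below is not R's describe or summary; the function name `summarize` and the glucose values are invented for illustration.

```python
import statistics


def summarize(values):
    """Summarise one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


# Hypothetical glucose column with one missing entry.
glucose = [148, 85, None, 183, 89, 85]
for name, value in summarize(glucose).items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, or a surprising min or max, is usually the first hint that a column still contains outliers.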
• Step 4: Model Building
• Use a supervised learning technique to build the model here.
• Step 5: Deliver the Model
• Check the model with sample data.
• Data
○ Data tables and data types
○ Operations on tables
○ Basic plotting
○ Tidy data / the ER model
○ Relational operations
○ SQL
• Wrangling
○ Data acquisition (load and scrape)
○ EDA Vis / grammar of graphics
○ Data cleaning (text, dates)
○ EDA: Summary statistics
○ Data analysis with optimization (derivatives)
○ Data transformations
○ Missing data
• Modeling
○ Univariate probability and statistics
○ Hypothesis testing
○ Multivariate probability and statistics (joint and conditional probability, Bayes' theorem)
○ Data analysis with geometry (vectors, inner products, gradients and matrices)
○ Linear regression
○ Logistic regression
○ Gradient descent (batch and stochastic)
○ Trees and random forests
○ K-NN
○ Naïve Bayes
○ Clustering
○ PCA
• Sample Algorithms for Data Science Analytics
Regression
• The most popular technique for fitting a regression model is least squares, which finds the best-fitting line by minimizing the sum of squared differences between observed and predicted values.
• The model is fitted to historical data
Example:
• Weather forecasting
• Assessing risk
Tools
• TensorFlow and PyTorch
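As a small worked illustration of least squares, the sketch below fits a straight line to a handful of points using the closed-form formulas slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). It uses plain Python rather than TensorFlow or PyTorch, and the function name `fit_line` and the data points are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: spread of x around its mean; Sxy: how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

Once the line is fitted, a prediction for a new x is simply `slope * x + intercept`.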
• Logistic Regression
• Logistic regression is similar to linear regression, but it is used when the output is binary (i.e., when the outcome can have only two possible values). The prediction is passed through a non-linear S-shaped function called the logistic function, g(), which yields a probability between 0 and 1.
• Figure: a logistic regression curve showing the probability of passing an exam versus hours spent studying
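The logistic function is g(z) = 1 / (1 + e^(-z)). The sketch below evaluates it for the exam example; the coefficients (intercept -4.0, slope 2.0) are hypothetical values chosen for illustration and are not fitted to any real data.

```python
import math


def logistic(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so this cannot overflow
    return ez / (1.0 + ez)


def pass_probability(hours, intercept, slope):
    """Probability of passing, given hours studied and (assumed) model coefficients."""
    return logistic(intercept + slope * hours)


# Hypothetical coefficients: the probability crosses 0.5 at 2 hours of study.
for hours in (0, 1, 2, 3, 4):
    print(hours, round(pass_probability(hours, intercept=-4.0, slope=2.0), 3))
```

The output rises monotonically from near 0 towards 1, which is the S-shape described above; a yes/no prediction is obtained by thresholding the probability, commonly at 0.5.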
• Decision Trees
• Decision trees can be used for both regression and classification tasks.
• Categorical variable decision tree: predicts, for example, whether a customer will pay the renewal premium to an insurance company (yes/no).
• Continuous variable decision tree: predicts, for example, customer income based on occupation, product, and various other variables.
• Example algorithms: C4.5, CART
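To show how a tree chooses a split, here is a minimal sketch that scores candidate thresholds on one numeric feature by Gini impurity, the criterion CART uses for classification. It finds a single split only and is not a full C4.5 or CART implementation; the feature (years as a customer) and the labels are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for a more mixed group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: years as a customer vs. whether the renewal premium was paid.
years = [1, 2, 3, 6, 7, 8]
paid = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(years, paid))  # (4.5, 0.0)
```

A full tree repeats this search over every feature and then recurses into the left and right groups until a stopping rule is met.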
• Naive Bayes
• A classification technique
• It estimates the prior probability of each class and, via Bayes' theorem, the conditional probability of each class given the values of x. It is often used for classification problems with a binary yes/no outcome.
Example:
Text classification, spam filtering, sentiment analysis
Recommendation systems
Types
Gaussian Naive Bayes
Multinomial Naive Bayes
Bernoulli Naive Bayes
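The spam-filtering use case can be sketched with a small multinomial Naive Bayes classifier. This is a minimal illustration under stated assumptions: the four training messages are made up, words are split on whitespace, add-one (Laplace) smoothing is used, and words never seen in training are ignored.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs is a list of (text, label) pairs; returns the counts the model needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # unseen words carry no evidence here
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training messages.
model = train([
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
])
print(predict(model, "free cash prize"))     # spam
print(predict(model, "agenda for meeting"))  # ham
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow for long messages.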
• SVM
• KNN
• K-means
• Dimensionality Reduction
• ANN
• Feed-forward networks: multilayer perceptrons
• Convolutional neural networks: classification, object detection, and image segmentation
• Hierarchical object extractors
What do Data Scientists do?
• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more ….
A Data Scientist must possess
• Mathematics and Applied Mathematics
• Applied Statistics/Data Analysis
• Solid Programming Skills (R, Python, Julia, SQL)
• Data Mining
• Database Storage and Management
• Machine Learning and Discovery
• Data Science Research Areas
• Machine learning
• Artificial intelligence
• Deep learning
• Databases
• Statistics
• Optimization
• Natural language processing
• Computer vision
• Speech processing
• Privacy
• Ethics
• Energy consumption
• Cloud computing
• IoT
• Social media
• Blockchain, etc.
• Future of Data Science and Analytics
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Simplilearn
 
Data Analytics and Business Intelligence
Data Analytics and Business IntelligenceData Analytics and Business Intelligence
Data Analytics and Business Intelligence
Chris Ortega, MBA
 

Was ist angesagt? (20)

Introduction to Business Intelligence
Introduction to Business IntelligenceIntroduction to Business Intelligence
Introduction to Business Intelligence
 
Business Intelligence Presentation - Data Mining (2/2)
Business Intelligence Presentation - Data Mining (2/2)Business Intelligence Presentation - Data Mining (2/2)
Business Intelligence Presentation - Data Mining (2/2)
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Implementing business intelligence
Implementing business intelligenceImplementing business intelligence
Implementing business intelligence
 
Data analytics
Data analyticsData analytics
Data analytics
 
Introduction To Predictive Analytics Part I
Introduction To Predictive Analytics   Part IIntroduction To Predictive Analytics   Part I
Introduction To Predictive Analytics Part I
 
Business Intelligence - Conceptual Introduction
Business Intelligence - Conceptual IntroductionBusiness Intelligence - Conceptual Introduction
Business Intelligence - Conceptual Introduction
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
 
Ppt
PptPpt
Ppt
 
An introduction to Business intelligence
An introduction to Business intelligenceAn introduction to Business intelligence
An introduction to Business intelligence
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?
 
[MPKD1] Introduction to business analytics and simulation
[MPKD1] Introduction to business analytics and simulation[MPKD1] Introduction to business analytics and simulation
[MPKD1] Introduction to business analytics and simulation
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data Warehouses
 
Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015Creating a Data-Driven Organization, Crunchconf, October 2015
Creating a Data-Driven Organization, Crunchconf, October 2015
 
Business analytics awareness presentation
Business analytics  awareness presentationBusiness analytics  awareness presentation
Business analytics awareness presentation
 
BUSINESS INTELLIGENCE
BUSINESS INTELLIGENCEBUSINESS INTELLIGENCE
BUSINESS INTELLIGENCE
 
Business analytics
Business analyticsBusiness analytics
Business analytics
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Data Analytics and Business Intelligence
Data Analytics and Business IntelligenceData Analytics and Business Intelligence
Data Analytics and Business Intelligence
 

Ähnlich wie Data science and business analytics

Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
ssuser0413ec
 

Ähnlich wie Data science and business analytics (20)

Data Science Training in Chandigarh h
Data Science Training in Chandigarh    hData Science Training in Chandigarh    h
Data Science Training in Chandigarh h
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
01-introduction.ppt the paper that you can unless you want to join me because...
01-introduction.ppt the paper that you can unless you want to join me because...01-introduction.ppt the paper that you can unless you want to join me because...
01-introduction.ppt the paper that you can unless you want to join me because...
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyfBig Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
 
Gse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-sharedGse uk-cedrinemadera-2018-shared
Gse uk-cedrinemadera-2018-shared
 
Introductions to Business Analytics
Introductions to Business Analytics Introductions to Business Analytics
Introductions to Business Analytics
 
Data Science
Data ScienceData Science
Data Science
 
A picture is worth a thousand words
A picture is worth a thousand wordsA picture is worth a thousand words
A picture is worth a thousand words
 
Trends in data analytics
Trends in data analyticsTrends in data analytics
Trends in data analytics
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 

Kürzlich hochgeladen

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Kürzlich hochgeladen (20)

Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...

Data science and business analytics

  • 1. Data science and business analytics Dr. M. Inbavalli, Vice Principal & Head, Research Department of Computer Science, Marudhar Kesari Jain College for Women, Vaniyambadi-635751
  • 2. Overview • Evolution of Data • Data Science • Business Analytics • Applications • AI, ML, DL, Data science – Relationship • Tools for Data Science • Life cycle of data science with case study • Algorithms for Data Science • Data Science Research Areas • Future of Data Science
  • 3. Data All Around • Data has become the most abundant thing today • Explosion of data, in pretty much every domain • Lots of data is being collected and warehoused • Web data, e-commerce • Financial transactions, bank/credit transactions • Online trading and purchasing • Social Network
  • 4. • Data All Around • Sensing devices and sensor networks that can monitor everything 24/7, from temperature to pollution to vital signs • Increasingly sophisticated smartphones • The Internet and social networks make it easy to publish data • Scientific experiments and simulations produce astronomical volumes of data • Internet of Things (IoT) • Datafication: taking all aspects of life and turning them into data (e.g., what you like/enjoy has been turned into a stream of your "likes")
  • 5. • Data Science – Why all the excitement?
  • 6.
  • 7.
  • 8. • How Much Data Do We have? • Data volumes expected to get much worse • Over 2.5 quintillion bytes of data are created every single day.
  • 9. How Much Data Do We Have? • What can you do with the traffic prediction data? • Crowdsourcing + physical modeling + sensing + data assimilation (from the Institute for Transportation Studies)
  • 10. • How to handle that data? • Data is just like crude oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so data must be broken down, analyzed for it to have value. • How to extract interesting actionable insights and scientific knowledge?
  • 11. •Data Science why excitement? • Data Science is the science which uses computer science, statistics and machine learning, visualization and human-computer interactions to collect, clean, integrate, analyze, visualize, interact with data to create data products. • Turn data into data products.
  • 12. • Data Science why excitement? • Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many areas such as science, engineering, economics, politics, finance, and education. • Computer Science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI • Mathematics: mathematical modeling • Statistics: statistical and stochastic modeling, probability • Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data.
  • 13. • Data Science why excitement? (cont.) • Data Science is a blend of tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data. • It focuses on statistical modeling, machine learning, management and analysis of data sets, and data acquisition. • Data Science makes use of several statistical procedures, ranging from data transformations and data modeling to statistical operations (descriptive and inferential statistics) and machine learning modeling. • To obtain predictive responses from the models, it is essential to understand the underlying patterns in the data. Furthermore, optimization techniques can be used to meet the business requirements of the user.
  • 14. •Data Science why excitement?(cont) • Using various statistical tools, a Data Scientist has to develop models. With the help of these models, they help their clients in the decision-making process. Furthermore, these models support demand generation initiatives. Data Science also covers: • Data Integration. • Distributed Architecture. • Automating Machine learning. • Data Visualization. • Dashboards and BI. • Data Engineering. • Deployment in production mode • Automated, data-driven decisions.
  • 15. Example: Search • Google earns around $50 bn/year from marketing, 97% of the company's revenue. • Sponsored search uses an auction – a pure competition for marketers trying to win access to consumers. • In other words, a competition for models of consumers – their likelihood of responding to the ad – and of determining the right bid for the item. • There are around 30 billion search requests a month, and perhaps a trillion events of history between search providers. • Google AdWords and AdSense
  • 16. Data Science Applications • Transaction Databases → Recommender systems (Netflix), Fraud Detection (Security and Privacy) • Wireless Sensor Data → Smart Home, Real-time Monitoring, Internet of Things • Text Data, Social Media Data → Product Review and Consumer Satisfaction (Facebook, Twitter, LinkedIn), E-discovery • Software Log Data → Automatic Troubleshooting (Splunk) • Genotype and Phenotype Data → Epic, 23andMe, Patient-Centered Care, Personalized Medicine
  • 17. • Other Applications • Banks make smarter decisions through fraud detection, management of customer data, risk modeling, real-time predictive analytics, customer segmentation, etc. • Fraud detection covers credit card, insurance, and accounting fraud. • Banks can analyze customers' investment patterns and cycles and suggest offers that suit them. • Risk modeling through data science lets banks assess their overall performance. • In real-time and predictive analytics, banks use machine learning algorithms to improve their analytics strategy.
  • 18. Other Applications • Customer sentiment analysis techniques can boost social media interaction, boost feedback, and analyze customer reviews. • Manufacturing: IoT enables companies to predict potential problems, monitor systems, and analyze continuous streams of data. • Uber uses data science for price optimization and to provide better experiences to its customers. Using powerful predictive tools, it accurately predicts the price based on parameters such as weather patterns, availability of transport, customers, etc.
  • 19. Data • Measurable units of information gathered or captured from the activity of people, places and things. • Data is generated from different sources such as financial logs, text files, multimedia forms, sensors, and instruments. • We need to understand which data to use, how to organize the data, and so on. • Structured and unstructured data must be prepared for use by the analytics team for model building. • Types of Data • Relational Data (Tables/Transaction/Legacy Data) • Text Data (Web) • Semi-structured Data (XML) • Graph Data: Social Network, Semantic Web (RDF), … • Streaming Data
  • 20. What do we do with the Data? • Aggregation and Statistics • Data warehousing and OLAP • Indexing, Searching, and Querying • Keyword-based search • Pattern matching (XML/RDF) • Knowledge discovery • Data Mining • Statistical Modeling • Example of Data Science • Companies learn your secrets, shopping patterns, and preferences • E.g., can we know whether a child likes animation games, even if they don't want us to know? • Building and maintaining a data warehouse is a key skill that a Data Engineer must have.
  • 21. • Data engineers build pipelines which extract data from multiple sources and then manipulate it to make it usable. • Business analytics (BA) is the practice of iterative, methodical exploration of an organization's data, with an emphasis on statistical analysis. Business analytics is used by companies committed to data-driven decision-making. • BA activities must be anchored to a strategically relevant business question to be answered by using data analysis.
  • 22. • Data Science and Business Analytics • Data science or analytics is the process of deriving insights from data in order to make optimal decisions. • It draws on techniques such as basic statistics, regression, simulation and optimization modeling, data mining and machine learning, text analytics, artificial intelligence and visualization. • Data science focuses on data modelling and data warehousing to track the ever-growing data set. The information extracted through data science applications is used to guide business processes and reach organisational goals.
  • 23.
  • 24. Databases vs. Data Science • Data volume: Databases – modest; Data Science – massive • Examples: Databases – bank records, personnel records, census, medical records; Data Science – online clicks, GPS logs, tweets, building sensor readings • Priorities: Databases – consistency, error recovery, auditability; Data Science – speed, availability, query richness • Structure: Databases – strong (schema); Data Science – weak or none (text) • Properties: Databases – transactions, ACID*; Data Science – CAP* theorem (2/3), eventual consistency • Realizations: Databases – SQL; Data Science – NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
  • 25. Business Intelligence (BI) vs. Data Science • Data sources: BI – structured (usually SQL, often a data warehouse); Data Science – both structured and unstructured (logs, cloud data, SQL, NoSQL, text) • Approach: BI – statistics and visualization; Data Science – statistics, machine learning, graph analysis, natural language processing (NLP) • Focus: BI – past and present; Data Science – present and future • Tools: BI – Pentaho, Microsoft BI, QlikView, R; Data Science – RapidMiner, BigML, Weka, R
  • 26.
  • 27.
  • 28. Data Science vs. ML vs. AI • Tools: Data Science – SAS, Tableau, Apache Spark, MATLAB, SQL; ML – Amazon Lex, IBM Watson Studio, Microsoft Azure ML Studio; AI – TensorFlow, scikit-learn, Keras, Amazon Lex, Google Cloud Platform, DataRobot • Basis: Data Science deals with structured and unstructured data; Machine Learning uses statistical models; Artificial Intelligence uses logic and decision trees • Popular examples: Data Science – fraud detection and healthcare analysis; ML – recommendation systems such as Spotify, and facial recognition; AI – chatbots and voice assistants • Main applications: Data Science – credit card fraud, ATM theft, disease prediction, pattern identification, etc.; ML – online recommender systems, Google search algorithms, Facebook auto friend-tagging suggestions, etc.; AI – Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
  • 29. • Relationship between Data Science, Artificial Intelligence and Machine Learning • Machine Learning for Predictive Reporting: used to study transactional data to make valuable predictions; also known as supervised learning; implemented to suggest the most effective courses of action for any company. • Machine Learning for Pattern Discovery: used to set parameters in various data reports; this is unsupervised learning, where there are no pre-decided parameters. • Artificial Intelligence represents a loop of perception, planning, action and feedback: Perception > Planning > Action > Feedback of Perception. • Data Science uses different parts of this loop to solve specific problems.
  • 30. • For instance, in the first step, Perception, data scientists try to identify patterns with the help of the data. • In Planning, there are two aspects: finding all possible solutions, and finding the best solution among them. • Machine learning, rather than being taken as a standalone subject, is best understood in the context of its environment. • AI is the tool that helps data science get results and solutions for specific problems; machine learning is what helps in achieving that goal. • Example: Google's search engine is a product of data science. It uses predictive analysis, a system used by artificial intelligence, to deliver intelligent results to users.
  • 31.
  • 32. • Tools for Data Science • Reporting and Business Intelligence • Predictive Modelling and Machine Learning • Artificial Intelligence • Data Science Tools for Big Data (Volume) • Data from 1 GB to 10 GB: traditional tools such as Excel, Access, SQL, etc. • More than 10 GB: Hadoop, Hive • Tools for Handling Variety
  • 33. • Voluminous • Customer feedback may vary in length, sentiment, and other factors. • Examples of SQL databases are Oracle, MySQL and SQLite, whereas NoSQL includes popular databases such as MongoDB, Cassandra, etc. • These NoSQL databases are seeing wide adoption because of their ability to scale and handle dynamic data.
  • 34. • Tools for Handling Velocity • Velocity is the speed at which the data is captured. • It includes both real-time and non-real-time data. • Examples of real-time data • Sensor data collected by self-driving cars (automatic actions) • CCTV • Stock trading • Fraud detection for credit card transactions • Network data: social media (Facebook, Twitter, etc.) • Tools • Apache Kafka: real-time data pipelines • Apache Storm: can process up to 1 million tuples per second and is highly scalable • Amazon Kinesis: licensed and powerful • Apache Flink: high performance, fault tolerance, and efficient memory management
  • 35. • Reporting and BI Tools: Excel, QlikView, Tableau, MicroStrategy, Power BI, Google Analytics, Dundas, Sisense, etc. • Predictive Analytics and Machine Learning Tools: Python, R, Apache Spark, Julia, Jupyter Notebooks • Frameworks for Deep Learning: TensorFlow, PyTorch, Keras and Caffe • AI Tools: AutoKeras, Google Cloud AutoML, IBM Watson, DataRobot, H2O's Driverless AI, and Amazon's Lex • Licensed tools: SAS, SPSS, MATLAB
  • 36. Lifecycle of Data Science
  • 37. • Role of Data Scientist • Identifying the data-analytics problems that offer the greatest opportunities to the organization • Determining the correct data sets and variables • Collecting large sets of structured and unstructured data from disparate sources • Cleaning and validating the data to ensure accuracy, completeness, and uniformity • Devising and applying models and algorithms to mine the stores of big data • Analyzing the data to identify patterns and trends • Interpreting the data to discover solutions and opportunities • Communicating findings to stakeholders using visualization and other means
  • 38. • Phase 1: Discovery • Understand the various specifications, requirements, priorities and required budget. • This requires the ability to ask the right questions. • Frame the business problem and formulate initial hypotheses (IH) to test. • Phase 2: Data preparation • Perform data cleaning, transformation, and visualization, for example in R. This helps you spot outliers and establish relationships between the variables. • Phase 3: Model planning • Determine the methods and techniques to draw out the relationships between variables. • These relationships set the base for the algorithms in the next phase. • Apply Exploratory Data Analysis (EDA) using various statistical formulas and visualization tools.
  • 39. • R has a complete set of modeling capabilities and provides a good environment for building interpretive models. • SQL Analysis services can perform in-database analytics using common data mining functions and basic predictive models. • SAS/ACCESS can be used to access data from Hadoop and is used for creating repeatable and reusable model flow diagrams.
  • 40. • Phase 4: Model building • Develop datasets for training and testing purposes. • Use various learning techniques such as classification, association and clustering to build the model. • Examples: 1. Classification (decision trees) 2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN) 3. Association rules 4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM) 5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting)
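The first step of model building, developing separate training and testing datasets, can be sketched in a few lines of Python. This is a minimal illustration only: the function name `train_test_split`, the 25% test fraction and the fixed seed are assumptions made here, not details from the slides.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(rows)              # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: 20 record ids standing in for patient rows.
train_rows, test_rows = train_test_split(list(range(20)))
```

The model is then fitted on `train_rows` only, and `test_rows` is held back to check how well it generalizes.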
  • 41.
  • 42. • Phase 5: Operationalize • Analyze the data to identify patterns and trends • Interpret results • Deliver final reports, briefings, code and technical documents • Run a pilot project • Phase 6: Communicate results • Identify all the key findings, communicate them to the stakeholders, and determine whether the results of the project are a success or a failure
  • 43. • Basic statistics: 1. Random variables, sampling 2. Distributions and statistical measures 3. Hypothesis testing • Overview of linear algebra: 1. Linear algebra and matrix computations 2. Functions, derivatives, convexity • Modeling techniques (regression): 1. Mathematical modeling process 2. Linear regression 3. Logistic regression • Data visualization and visual analytics: 1. Visual analytics 2. Visualizations in Python and visual analytics in IBM Watson Analytics
  • 44. • Data visualization and visual analytics: 1. Visual analytics 2. Visualizations in Python and visual analytics in IBM Watson Analytics • Data mining and machine learning: 1. Classification (decision trees) 2. Clustering (K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN) 3. Association rules 4. Advanced supervised machine learning algorithms (Naive Bayes, k-NN, SVM) 5. Intro to ensemble learning algorithms (Random Forest, Gradient Boosting) • Simulation modeling: 1. Random number generation 2. Monte Carlo simulations 3. Simulation in IPython
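As a small illustration of the simulation-modeling topics (random number generation and Monte Carlo simulation), the sketch below estimates π by sampling random points in the unit square. The example problem and the name `estimate_pi` are chosen here for illustration; the slides do not specify a particular simulation.

```python
import random

def estimate_pi(num_samples, seed=0):
    """Monte Carlo estimate of pi from the fraction of random points
    in the unit square that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)          # seeded generator for repeatable runs
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area is pi/4, so scale the hit fraction by 4.
    return 4.0 * inside / num_samples

pi_estimate = estimate_pi(100_000)
```

The estimate improves slowly, with an error that shrinks roughly as one over the square root of the number of samples.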
  • 45. • Real time example • Case Study: Diabetes Prevention • What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it? • 1. You can refer to the sample data below. • Step 1: Discovery • Attributes: • npreg – Number of times pregnant • glucose – Plasma glucose concentration • bp – Blood pressure • skin – Triceps skinfold thickness • bmi – Body mass index • ped – Diabetes pedigree function • age – Age • income – Income
  • 46. • Step 2 Data Preparation • once we have the data, we need to clean and prepare the data for data analysis. • data has a lot of inconsistencies like missing values, blank columns, abrupt values and incorrect data format which need to be cleaned. • we have organized the data into a single table under different attributes – making it look more structured.
  • 47. • Step 2 (cont.) • This data has a lot of inconsistencies. • In the column npreg, “one” is written in words, whereas it should be in numeric form, i.e. 1. • In the column bp, one of the values is 6600, which is impossible (at least for humans), as bp cannot reach such a huge value. • The income column is blank and is also irrelevant to predicting diabetes. • Therefore, it is redundant and should be removed from the table. • We clean and preprocess this data by removing the outliers, filling in the null values and normalizing the data types. This is data preprocessing. • Finally, we get clean data which can be used for analysis.
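The cleaning steps described in this slide can be sketched in plain Python. The rows below are invented for illustration (the original sample table is not reproduced in the transcript), and the plausibility bound `BP_MAX`, the word-to-number map and the choice of mean imputation are all assumptions of this sketch rather than the author's method.

```python
# Hypothetical raw rows; every cell arrives as text.
raw_rows = [
    {"npreg": "6",   "glucose": "148", "bp": "72",   "income": ""},
    {"npreg": "one", "glucose": "85",  "bp": "66",   "income": ""},
    {"npreg": "8",   "glucose": "183", "bp": "6600", "income": ""},
    {"npreg": "1",   "glucose": "",    "bp": "64",   "income": ""},
]

WORD_TO_NUMBER = {"one": 1}   # assumed mapping; extend as needed
BP_MAX = 300                  # assumed upper bound for a plausible bp value

def to_number(text):
    """Convert a cell to float; return None for blanks or unparseable text."""
    text = text.strip().lower()
    if text in WORD_TO_NUMBER:
        return float(WORD_TO_NUMBER[text])
    try:
        return float(text)
    except ValueError:
        return None

def clean(rows):
    cleaned = []
    for row in rows:
        # Drop the blank income column and normalize every value to a number.
        record = {k: to_number(v) for k, v in row.items() if k != "income"}
        # Treat impossible blood-pressure readings as missing.
        if record["bp"] is not None and record["bp"] > BP_MAX:
            record["bp"] = None
        cleaned.append(record)
    # Fill each missing value with the mean of the observed values in its column.
    for col in cleaned[0]:
        observed = [r[col] for r in cleaned if r[col] is not None]
        mean = sum(observed) / len(observed)
        for r in cleaned:
            if r[col] is None:
                r[col] = mean
    return cleaned

clean_rows = clean(raw_rows)
```

Here the outlier is replaced rather than the whole row being deleted; dropping the row would be an equally reasonable choice on a larger dataset.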
  • 48. • Step 3 Model Planning • Load the data into the analytical sandbox and apply various statistical functions. • R has functions like describe, which gives us the number of missing values and unique values. • We can also use the summary function, which gives us statistical information such as the mean, median, range, min and max values. • Then, we use visualization techniques like histograms, line graphs and box plots to get a fair idea of the distribution of the data.
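The slide refers to R's describe and summary functions. As a rough Python counterpart (not the R functions themselves), the standard `statistics` module can produce the same kind of figures. The glucose values and the helper name `summarize` are hypothetical.

```python
import statistics

# Hypothetical glucose readings, used only to illustrate the summary.
glucose = [148, 85, 183, 89, 137, 116]

def summarize(values):
    """Return the basic summary statistics mentioned in the slide."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

summary = summarize(glucose)
```

A large gap between mean and median, or an implausible min or max, is often the first hint of outliers worth plotting.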
  • 49. • Step 4 Model Building • supervised learning technique to build a model here.
  • 50.
  • 51. • Step 5 Deliver the Model • Check with sample data. • Data: ○ Data tables and data types ○ Operations on tables ○ Basic plotting ○ Tidy data / the ER model ○ Relational operations ○ SQL wrangling ○ Data acquisition (load and scrape) ○ EDA vis / grammar of graphics ○ Data cleaning (text, dates) ○ EDA: summary statistics ○ Data analysis with optimization (derivatives) ○ Data transformations ○ Missing data
  • 52. • Modeling ○ Univariate probability and statistics ○ Hypothesis testing ○ Multivariate probability and statistics (joint and conditional probability, Bayes' theorem) ○ Data analysis with geometry (vectors, inner products, gradients and matrices) ○ Linear regression ○ Logistic regression ○ Gradient descent (batch and stochastic) ○ Trees and random forests ○ K-NN ○ Naïve Bayes ○ Clustering ○ PCA
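Of the modeling topics listed, batch gradient descent is easy to show in a few lines. The sketch below fits a line y = w·x + b by repeatedly stepping against the gradient of the mean squared error. The data, learning rate and step count are illustrative assumptions, not values from the slides.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Batch gradient descent for y = w*x + b, minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
w, b = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

The stochastic variant mentioned in the slide would update the parameters from one randomly chosen point per step instead of the full batch.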
  • 53. • Sample Algorithms for Data Science analytics • Regression • The most popular technique for fitting a regression is least squares, which calculates the best-fitting line based on historical data. • Examples: weather forecasting, assessing risk • Tools: TensorFlow and PyTorch
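For a single input variable, the least-squares best-fitting line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. A minimal sketch, with a hypothetical helper name `fit_line` and made-up data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```

With real, noisy data the fitted line will not pass through every point; it is the line that minimizes the sum of squared vertical distances.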
  • 54. • Logistic Regression • Logistic regression is similar to linear regression, but it is used when the output is binary (i.e. when outcome can have only two possible values). The prediction for this final output will be a non-linear S-shaped function called the logistic function, g(). • Graph of a logistic regression curve showing probability of passing an exam versus hours studying
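The S-shaped logistic function can be written directly. In the sketch below the coefficients for the hours-studied example are invented for illustration; in practice they would be estimated from data.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def pass_probability(hours, intercept=-4.0, slope=1.5):
    """Probability of passing given hours studied.
    The intercept and slope are hypothetical, not fitted values."""
    return logistic(intercept + slope * hours)

p_low = pass_probability(1.0)    # few hours studied
p_high = pass_probability(4.0)   # many hours studied
```

A binary prediction is obtained by thresholding the probability, commonly at 0.5.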
  • 55. • Decision Trees • Decision trees can be used for both regression and classification tasks. • Categorical variable decision tree: predict whether a customer will pay his renewal premium with an insurance company (yes/no). • Continuous variable decision tree: predict customer income based on occupation, product, and various other variables. • Examples: C4.5, CART • Naive Bayes • A classification technique • It measures the probability of each class, and the conditional probability of each class given values of x. This algorithm is used for classification problems to reach a binary yes/no outcome.
  • 56. • Examples: text classification, spam filtering, sentiment analysis, recommendation systems • Types: Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes • SVM • KNN • K-means • Dimensionality Reduction
  • 57. • ANN • Feed-forward: multilayer perceptrons • Convolutional neural networks: classification, object detection, or even image segmentation • Hierarchical object extractors
  • 58. What do Data Scientists do? • National Security • Cyber Security • Business Analytics • Engineering • Healthcare • And more ….
  • 59. A Data Scientist must possess • Mathematics and Applied Mathematics • Applied Statistics/Data Analysis • Solid Programming Skills (R, Python, Julia, SQL) • Data Mining • Database Storage and Management • Machine Learning and discovery
  • 60. • Data Science Research Areas • Machine learning • Artificial intelligence • Deep learning • Databases • Statistics • Optimization • Natural language processing • Computer vision • Speech processing • Privacy • Ethics • Energy consumption • Cloud computing • IoT • Social media • Blockchain, etc.
  • 61.
  • 62. • Future of Data Science and Analytics