Talk on Big Data Driven Solutions to
Combat COVID-19
National Level Webinar at
Ethiraj College for Women (Autonomous),
Chennai.
Dr. S. Balakrishnan,
Professor and Head,
Department of Computer Science and Business Systems,
Sri Krishna College of Engineering and Technology,
Coimbatore, Tamil Nadu.
OUTLINE
• Introduction
• Big Data Preparation
• Types of Tools Used in Big Data
• Top Big Data Technologies
• Research Projects
INTRODUCTION
• Big data may well be the next big thing in the IT world.
• Big data burst upon the scene in the first decade of the 21st century.
• The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning.
• Like many new information technologies, big data can bring about:
  – dramatic cost reductions,
  – substantial improvements in the time required to perform a computing task, or
  – new product and service offerings.
WHAT IS BIG DATA?
• ‘Big data’ is similar to ‘small data’, but bigger in size.
• Because the data is bigger, it requires different approaches:
  – techniques, tools and architecture
  – with the aim of solving new problems, or old problems in a better way
• Big data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
WHAT IS BIG DATA
• Walmart handles more than 1 million customer transactions every hour.
• Facebook handles 40 billion photos from its user base.
• Decoding the human genome originally took 10 years; now it can be achieved in one week.
SOME MAKE IT 4V’S
WHY BIG DATA
• Key enablers for the growth of big data:
  – Increase of storage capacities
  – Increase of processing power
  – Availability of data (different data types)
WHY BIG DATA
• Facebook generates 10 TB of data daily.
• Twitter generates 7 TB of data daily.
• IBM claims 90% of today’s stored data was generated in just the last two years.
HOW IS BIG DATA DIFFERENT?
1) Automatically generated by a machine (e.g. a sensor embedded in an engine)
2) Typically an entirely new source of data (e.g. use of the internet)
3) Not designed to be friendly (e.g. text streams)
BIG DATA SOURCES
• Users
• Applications
• Systems
• Sensors
• Large and growing files (big data files)
DATA GENERATION POINTS: EXAMPLES
• Mobile devices
• Microphones
• Readers/scanners
• Science facilities
• Programs/software
• Social media
• Cameras
THE STRUCTURE OF BIG DATA
• Structured
  – Most traditional data sources
• Semi-structured
  – Many sources of big data
• Unstructured
  – Video data, audio data
BIG DATA TECHNOLOGY
BIG DATA PREPARATION
Perception vs. Reality
• The perception is that you spend most of your time on analytics.
• In reality, you will devote much more time and effort to importing, profiling, cleansing, repairing, standardizing, and enriching your data.
DATA CLEANSING
• Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
• Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.
• After cleansing, a data set will be consistent with other similar data sets in the system.
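The batch-scripting route to cleansing can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's method: the field names (`name`, `cases`, `reported`), the two accepted date layouts, and the rejection rules are all assumptions made for the example.

```python
from datetime import datetime

# Date layouts we are willing to correct into ISO form (an assumption for this sketch).
ACCEPTED_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(raw):
    """Return the date in YYYY-MM-DD form, or None if no accepted layout fits."""
    for fmt in ACCEPTED_DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def cleanse(records):
    """Split records into (clean, rejected): correct what we can, remove the rest."""
    clean, rejected = [], []
    for rec in records:
        name = (rec.get("name") or "").strip()
        reported = parse_date((rec.get("reported") or "").strip())
        try:
            cases = int(rec.get("cases"))
        except (TypeError, ValueError):
            cases = None  # corrupt count, cannot be repaired
        if not name or reported is None or cases is None or cases < 0:
            rejected.append(rec)  # detected as corrupt or inaccurate: remove
            continue
        # Corrected record: trimmed name, integer count, one date format.
        clean.append({"name": name.title(), "cases": cases, "reported": reported})
    return clean, rejected
```

Keeping the rejected records, rather than silently dropping them, makes it possible to review what was removed and why.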
DATA CLEANSING VS VALIDATION
Cleansing
• Data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities.
Validation
• Validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).
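The difference between strict and fuzzy validation can be illustrated with a short Python sketch. The six-digit postal code rule, the reference list `KNOWN_CITIES`, and the similarity cutoff of 0.8 are assumptions chosen for the example; `difflib.get_close_matches` is from the Python standard library.

```python
import difflib

# Hypothetical list of known entities to validate against.
KNOWN_CITIES = ["Chennai", "Coimbatore", "Madurai"]


def validate_strict(postal_code):
    """Strict: accept only a six-digit numeric code; reject everything else."""
    code = postal_code.strip()
    return code if len(code) == 6 and code.isdigit() else None


def validate_fuzzy(city, cutoff=0.8):
    """Fuzzy: correct a value that partially matches a known entity."""
    candidate = city.strip().title()
    matches = difflib.get_close_matches(candidate, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For instance, `validate_strict("6410O8")` rejects the code because it contains a letter, while `validate_fuzzy("Coimbatour")` corrects the misspelling to the closest known city. Lowering the cutoff corrects more values but raises the risk of a wrong match.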
• Data cleansing may also involve activities such as harmonization and standardization of data.
• For example, harmonization expands short codes (st, rd, etc.) into actual words (street, road, etc.).
• Standardization of data is a means of changing a reference data set to a new standard, e.g. the use of standard codes.
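Harmonization of short codes can be sketched as a dictionary lookup applied word by word. The mapping `SHORT_CODES` below is a hypothetical example; a real one would come from the reference data for the domain.

```python
import re

# Hypothetical mapping of short codes to full words.
SHORT_CODES = {"st": "street", "rd": "road", "ave": "avenue"}


def harmonize(address):
    """Expand known short codes in an address, keeping the original capitalization."""
    def expand(match):
        word = match.group(0)
        full = SHORT_CODES.get(word.lower().rstrip("."))
        if full is None:
            return word  # not a known short code: leave untouched
        return full.capitalize() if word[0].isupper() else full

    # Match alphabetic words, optionally followed by an abbreviation dot.
    return re.sub(r"\b[A-Za-z]+\.?", expand, address)
```

A plain lookup like this cannot tell "St" for street from "St" for saint, so ambiguous codes usually need context or manual review.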
WHY DATA PREPARATION IS NEEDED
• It significantly reduces the time needed to ingest and prepare new data sets for multiple downstream processes.
• It also shapes and improves your business data, and renders your ecosystem simple, scalable, and automated.
DATA BEFORE PROCESSING
• You work with a mishmash of data sources.
• Your content is inconsistent, incomplete and in a variety of formats.
• It takes you weeks to process your data and write custom scripts to clean up the mess.
• You need an efficient strategy to harvest and analyze data from social media and sales transactions.
• You have only a vague idea of the categories of information that your data might provide.
DATA AFTER PROCESSING
• A data preparation tool provides you with a large set of data repair, transformation, and enrichment options that require zero coding or scripting.
• It enables you to see data transformations and the result of script automation in real time, with a set of smart and interactive tools and features.
PREPARATION LIFE CYCLE
INGEST
• What are your data sources?
• Are they office documents, social media, or clickstream logs? If so, you need to ingest your data before you can effectively analyze and enrich it.
• To make sense of all the data you have, you must define a structure and correlate the disparate data sets.
• This important step involves both understanding and standardizing your data.
WHAT YOU CAN DO TO INGEST AND MEND YOUR DATA
• Statistical Profiling: Create standard statistical analysis of numerical data, and frequency and term analysis of text data.
• Process: Handle multiple formats of data sources, whether their content is structured, semi-structured, or unstructured.
• Cleanse: Remove nonessential characters and standardize date formats.
• Repair: Find and fix inconsistencies.
• Detect Schema: Identify schema and metadata that is explicitly defined in headers, fields, or tags.
• Identify Duplicates: Find and flag duplicates in your data so you can reduce the size of your data pool.
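Two of these steps, statistical profiling and duplicate identification, are simple enough to sketch with the Python standard library. The function names and the normalization rule for duplicates (trim whitespace, ignore case) are assumptions made for this example.

```python
import statistics
from collections import Counter


def profile_numeric(values):
    """Standard statistical summary of a numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def term_frequency(texts, top=3):
    """Frequency analysis of text data: the most common lower-cased terms."""
    words = [word for text in texts for word in text.lower().split()]
    return Counter(words).most_common(top)


def flag_duplicates(rows):
    """Pair each row with True if an equivalent row appeared earlier."""
    seen = set()
    flagged = []
    for row in rows:
        # Normalize so that differences in case and spacing do not hide duplicates.
        key = tuple(str(value).strip().lower() for value in row)
        flagged.append((row, key in seen))
        seen.add(key)
    return flagged
```

Flagging rather than deleting duplicates leaves the decision about which copy to keep to a later step.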
ENRICH
• After you’ve cleansed your data, you can leverage patterns and knowledge-based classifications to understand the domains found in your data sets.
• Use the wide variety of known categories and the vast array of reference data to analyze and recognize content without relying on any metadata.
• After classifying your data sets, enrich them with related entities from the reference knowledge service, and extract embedded entities found in your data. This semantically enriches and correlates your data.
PUBLISH
GOVERN
• As you ingest, enrich, and publish your data, cloud service providers offer an intuitive, user-interface-driven dashboard page to monitor all transform activity on your data sets.
AUTOMATE
Automate the Process
• There are two ways to automate the process.
• First, you can use the scheduler to set your transformations to run on a daily, weekly, or monthly basis against a predetermined data source.
• Second, by using APIs you can automate the entire data preparation process, from file movement to preparation to publishing.
TYPES OF TOOLS USED IN BIG DATA
• Where is processing hosted?
  – Distributed servers / cloud (e.g. Amazon EC2)
• Where is data stored?
  – Distributed storage (e.g. Amazon S3)
• What is the programming model?
  – Distributed processing (e.g. MapReduce)
• How is data stored and indexed?
  – High-performance schema-free databases (e.g. MongoDB)
• What operations are performed on data?
  – Analytic / semantic processing
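The MapReduce programming model named above can be sketched on a single machine with the classic word-count example. This is a teaching sketch in plain Python, not Hadoop's API: a real framework runs the map and reduce phases in parallel across many machines and performs the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each document is mapped independently and each key is reduced independently, both phases can be distributed without changing the logic.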
TYPES OF BIG DATA TECHNOLOGIES
• Big data technology is mainly classified into two types:
• Operational big data technologies
  – All about the normal day-to-day data that we generate.
• Analytical big data technologies
  – Like an advanced version of big data technologies.
  – A little more complex than operational big data.
A FEW EXAMPLES OF OPERATIONAL BIG DATA TECHNOLOGIES ARE AS FOLLOWS:
A FEW EXAMPLES OF ANALYTICAL BIG DATA TECHNOLOGIES ARE AS FOLLOWS:
BIG DATA ANALYTICS AND DATA SCIENCES
TOP BIG DATA TECHNOLOGIES
• Top big data technologies are divided into four fields:
  – Data Storage
  – Data Mining
  – Data Analytics
  – Data Visualization
DATA STORAGE - HADOOP
• The Hadoop framework was designed to store and process data in a distributed data processing environment on commodity hardware, with a simple programming model.
• It can store and analyze data spread across different machines at high speed and low cost.
• Developed by: Apache Software Foundation (release 1.0 in December 2011; the project itself dates from 2006)
• Written in: Java
• Current stable version: Hadoop 3.1.1
COMPANIES USING HADOOP:
COMPANIES USING MONGODB:
COMPANIES USING RAINSTOR:
DATA MINING - PRESTO
• Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes.
• It allows querying data in Hive, Cassandra, relational databases and proprietary data stores.
• Developed by: Facebook, open-sourced in 2013
• Written in: Java
• Current stable version: Presto 0.22
COMPANIES USING PRESTO:
DATA ANALYTICS - KAFKA
• Apache Kafka is a distributed streaming platform.
• A streaming platform has three key capabilities:
  – Publish and subscribe to streams of records
  – Store streams of records durably
  – Process streams of records as they occur
• This is similar to a message queue or an enterprise messaging system.
• Developed by: LinkedIn, open-sourced in 2011 and now an Apache Software Foundation project
• Written in: Scala, Java
• Current stable version: Apache Kafka 2.2.0
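How a log-based streaming platform differs from a plain message queue can be shown with a toy in-memory model. The class `TopicLog` and its methods are hypothetical names invented for this sketch and are not Kafka's client API. The point it illustrates is that records are stored in an append-only log and each consumer tracks its own read position, so several consumers can read the same stream independently.

```python
class TopicLog:
    """Toy append-only log with per-consumer offsets (illustration only)."""

    def __init__(self):
        self._records = []   # the stored stream of records
        self._offsets = {}   # consumer name -> index of the next record to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch
```

In a traditional queue a record is gone once one consumer has taken it; here a consumer that joins later still sees the full history.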
COMPANIES USING KAFKA:
OTHER DATA ANALYTICS TOOLS
SPLUNK
KNIME
SPARK
R-LANGUAGE
BLOCKCHAIN
DATA VISUALIZATION - TABLEAU
• Tableau is a powerful and fast-growing data visualization tool used in the business intelligence industry.
• Data analysis is very fast with Tableau, and the visualizations it creates take the form of dashboards and worksheets.
• Developed by: Tableau Software (founded in 2003; publicly listed on 17 May 2013)
• Written in: Java, C++, Python, C
• Current stable version: Tableau 8.2
COMPANIES USING TABLEAU:
OTHER DATA VISUALIZATION TOOL:
PLOTLY
EMERGING BIG DATA TECHNOLOGIES - TENSORFLOW
• TensorFlow has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state of the art in machine learning and lets developers easily build and deploy machine-learning-powered applications.
• Developed by: Google Brain team; first released in 2015, with the 2.0 line following in 2019
• Written in: Python, C++, CUDA
• Current stable version: TensorFlow 2.0 beta
COMPANIES USING TENSORFLOW:
OTHER EMERGING TECHNOLOGY TOOLS
BEAM
DOCKER
AIRFLOW
KUBERNETES
APPLICATION OF BIG DATA ANALYTICS
• Homeland Security
• Smarter Healthcare
• Multi-channel Sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Search Quality
RISKS OF BIG DATA
• Organizations can be overwhelmed by the volume of data
  – Need the right people solving the right problems
• Costs can escalate too fast
  – It isn’t necessary to capture 100% of the data
• Privacy is a concern for many sources of big data
  – Self-regulation (data compression)
  – Legal regulation
HOW BIG DATA IMPACTS IT
• Big data is a disruptive force, presenting opportunities along with challenges to IT organizations.
• By 2015, big data was projected to create 4.4 million IT jobs, 1.9 million of them in the US alone.
• In 2017, data scientist was the No. 1 job in Harvard’s ranking.
BENEFITS OF BIG DATA
• Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time.
• Technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it.
• Technologies such as MapReduce, Hive and Impala enable you to run queries without changing the data structures underneath.
RESEARCH PROJECTS RELATED TO COVID-19
• Project 1: Worldwide COVID-19 Outbreak Data Analysis and Prediction
• Methods:
  – Real-time data is queried and visualized, then the queried data is used for Susceptible-Exposed-Infectious-Recovered (SEIR) predictive modelling.
  – SEIR modelling forecasts the COVID-19 outbreak within and outside China based on daily observations.
  – Queried news is also analyzed and classified into negative and positive sentiment, to understand the influence of the news on people’s behavior, both politically and economically.
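The SEIR model mentioned above moves people between four compartments: susceptible (S), exposed (E), infectious (I) and recovered (R). A minimal sketch of the standard equations, integrated with simple Euler steps, is shown below. The function name and the parameter values in the usage note are illustrative assumptions and are not fitted to COVID-19 data or taken from the project.

```python
def simulate_seir(population, exposed, infectious, beta, sigma, gamma, days, dt=0.1):
    """Integrate the standard SEIR equations with Euler steps.

    beta  : transmission rate per day
    sigma : rate at which exposed people become infectious (1 / incubation period)
    gamma : recovery rate (1 / infectious period)
    Returns a list of (day, S, E, I, R) tuples, one per day.
    """
    s = float(population - exposed - infectious)
    e, i, r = float(exposed), float(infectious), 0.0
    history = [(0.0, s, e, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((float(day), s, e, i, r))
    return history
```

A call such as `simulate_seir(1_000_000, 10, 1, beta=0.5, sigma=0.2, gamma=0.1, days=60)` returns one row per day. In a forecasting setting, beta, sigma and gamma would be estimated from the daily observations rather than chosen by hand.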
RESEARCH PROJECTS
• Project 2: Short-Term Applications of Artificial Intelligence and Big Data: Tracking and Diagnosing COVID-19 Cases.
• Project 3: Short-Term Applications of Artificial Intelligence and Big Data: A Quick and Effective Pandemic Alert.
• Project 4: Modeling of disease activity, potential growth and areas of spread.
• Project 5: Modeling of the utilization of operating theaters and clinics, with manpower projections.
THANK YOU

Big Data Driven Solutions to Combat Covid-19 Webinar

  • 1. Talk on Big Data Driven Solutions to Combat COVID-19. National Level Webinar at Ethiraj College for Women (Autonomous), Chennai. Dr. S. Balakrishnan, Professor and Head, Department of Computer Science and Business Systems, Sri Krishna College of Engineering and Technology, Coimbatore, Tamil Nadu.
  • 2. OUTLINE: Introduction; Big Data Preparation; Types of Tools Used in Big Data; Top Big Data Technologies; Research Projects.
  • 3. INTRODUCTION: Big Data may well be the Next Big Thing in the IT world. Big data burst upon the scene in the first decade of the 21st century. The first organizations to embrace it were online and startup firms. Firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. Like many new information technologies, big data can bring about dramatic cost reductions, substantial improvements in the time required to perform a computing task, or new product and service offerings.
  • 4. WHAT IS BIG DATA? ‘Big data’ is similar to ‘small data’, but bigger in size. Because the data is bigger, it requires different approaches (techniques, tools, and architecture), with the aim of solving new problems or solving old problems in a better way. Big data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
  • 5. WHAT IS BIG DATA: Walmart handles more than 1 million customer transactions every hour. Facebook handles 40 billion photos from its user base. Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
  • 6. SOME MAKE IT 4 V’S
  • 9. WHY BIG DATA: The growth of big data is driven by an increase in storage capacities, an increase in processing power, and the availability of data (of different data types).
  • 10. WHY BIG DATA: Facebook generates 10 TB of data daily. Twitter generates 7 TB of data daily. IBM claims 90% of today’s stored data was generated in just the last two years.
  • 11. HOW IS BIG DATA DIFFERENT? 1) It is automatically generated by a machine (e.g. a sensor embedded in an engine). 2) It is typically an entirely new source of data (e.g. use of the internet). 3) It is not designed to be friendly (e.g. text streams).
  • 13. DATA GENERATION POINTS, EXAMPLES: mobile devices, microphones, readers/scanners, science facilities, programs/software, social media, cameras.
  • 14. THE STRUCTURE OF BIG DATA: Structured: most traditional data sources. Semi-structured: many sources of big data. Unstructured: video data, audio data.
  • 16. BIG DATA PREPARATION: Perception vs. Reality. The perception is that you spend most of your time on analytics. In reality, you will devote much more time and effort to importing, profiling, cleansing, repairing, standardizing, and enriching your data.
  • 17. DATA CLEANSING: Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting. After cleansing, a data set will be consistent with other similar data sets in the system.
  • 18. DATA CLEANSING VS VALIDATION. Cleansing: data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. Validation: the validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).
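The strict and fuzzy validation styles described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the city list, the six-digit postal code rule, and the function names `strict_validate` and `fuzzy_correct` are hypothetical and do not come from the talk.

```python
import difflib
import re

# Hypothetical reference list of known entities (illustrative data only).
KNOWN_CITIES = ["Chennai", "Coimbatore", "Madurai", "Salem"]

# Assumption: a valid postal code is exactly six digits.
POSTAL_CODE = re.compile(r"\d{6}")


def strict_validate(record):
    """Strict validation: accept a record only if its postal code is well formed."""
    return bool(POSTAL_CODE.fullmatch(record.get("postal_code", "")))


def fuzzy_correct(city, known=KNOWN_CITIES, cutoff=0.8):
    """Fuzzy validation: return the closest known city, or None if nothing is close.

    The comparison is case-sensitive; 'cutoff' is the minimum similarity ratio.
    """
    matches = difflib.get_close_matches(city, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(strict_validate({"postal_code": "600008"}))  # well-formed code is accepted
    print(strict_validate({"postal_code": "6000"}))    # too short, so it is rejected
    print(fuzzy_correct("Chenai"))                     # misspelling mapped to a known city
```

Strict rules are cheap and predictable but discard records; fuzzy matching keeps more records at the risk of an occasional wrong correction, so the cutoff deserves tuning against real data.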
  • 19. Data cleansing may also involve activities such as harmonization and standardization of data. For example, harmonization expands short codes (st, rd, etc.) to actual words (street, road, etc.). Standardization of data is a means of changing a reference data set to a new standard, for example the use of standard codes.
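Harmonization of short codes, as in the street/road example on this slide, can be sketched as a lookup applied word by word. The `SHORT_CODES` table and the `harmonize` function are hypothetical names for illustration; a real table would come from reference data.

```python
import re

# Hypothetical short-code table (illustrative, not exhaustive).
SHORT_CODES = {"st": "street", "rd": "road", "ave": "avenue"}


def harmonize(address):
    """Expand known short codes (with or without a trailing dot) to full words."""

    def expand(match):
        word = match.group(0)
        # Look up the lowercase word without its trailing dot; keep unknown words as-is.
        return SHORT_CODES.get(word.lower().rstrip("."), word)

    # Match alphabetic words only, so the "st" in "1st" is left untouched.
    return re.sub(r"\b[A-Za-z]+\.?", expand, address)


if __name__ == "__main__":
    print(harmonize("12 Gandhi Rd"))  # 12 Gandhi road
    print(harmonize("5 Main St."))    # 5 Main street
```

Expansions are emitted in lowercase here; a production version would also need to decide how to treat ambiguous codes such as "St" for "Saint".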
  • 20. WHY DATA PREPARATION IS NEEDED: It significantly reduces the amount of time needed to ingest and prepare new data sets for multiple downstream processes. It also shapes and improves your business data, and renders your ecosystem simple, scalable, and automated.
  • 21. DATA BEFORE PROCESSING: You work with a mishmash of data sources. Your content is inconsistent, incomplete, and in a variety of formats. It takes you weeks to process your data and write custom scripts to clean up the mess. You need an efficient strategy to harvest and analyze data from social media and sales transactions. You have only a vague idea of the categories of information that your data might provide.
  • 22. DATA AFTER PROCESSING: A data preparation service provides you with a large set of data repair, transformation, and enrichment options that require zero coding or scripting. It enables you to see data transformations and the result of script automation in real time with a set of smart and interactive tools and features.
  • 24. INGEST: What are your data sources? Are they office documents, social media, or clickstream logs? If so, you need to ingest your data before you can effectively analyze and enrich it. To make sense of all the data you have, you must define a structure and correlate the disparate data sets. This important step involves both understanding and standardizing your data.
  • 25. WHAT YOU CAN DO TO INGEST AND MEND YOUR DATA. Statistical Profiling: Create standard statistical analysis of numerical data, and frequency and term analysis of text data. Process: Handle multiple formats of data sources, whether their content is structured, semi-structured, or unstructured. Cleanse: Remove nonessential characters and standardize date formats. Repair: Find and fix inconsistencies. Detect Schema: Identify schema and metadata that is explicitly defined in headers, fields, or tags. Identify Duplicates: Find and flag duplicates in your data so you can reduce the size of your data pool.
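Several of the steps listed on this slide (statistical profiling, term analysis, date standardization, duplicate flagging) can be sketched with the Python standard library alone. The accepted date formats, the field names, and all function names below are assumptions made for illustration, not part of the talk.

```python
import statistics
from collections import Counter
from datetime import datetime

# Assumption: incoming dates use one of these three layouts.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def standardize_date(text):
    """Cleanse: return the date in ISO format, or None if no known layout fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def profile_numbers(values):
    """Statistical profiling of a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def term_frequencies(texts):
    """Term analysis of text data: count lowercase whitespace-separated words."""
    return Counter(word for text in texts for word in text.lower().split())


def flag_duplicates(records, key_fields):
    """Identify duplicates: mark every record whose key was already seen."""
    seen = set()
    flagged = []
    for record in records:
        key = tuple(str(record[field]).strip().lower() for field in key_fields)
        flagged.append({**record, "is_duplicate": key in seen})
        seen.add(key)
    return flagged


if __name__ == "__main__":
    print(standardize_date("25/03/2020"))  # 2020-03-25
    rows = [{"name": "Asha", "city": "Chennai"}, {"name": "asha ", "city": "chennai"}]
    print(flag_duplicates(rows, ["name", "city"]))
```

Flagging rather than deleting duplicates keeps the step reversible, which matters when the matching key turns out to be too coarse.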
  • 26. ENRICH: After you’ve cleansed your data, you can leverage any patterns and knowledge-based classifications to understand the domains found in your data sets. Use the wide variety of known categories and the vast array of reference data to analyze and recognize content without relying on any metadata. After classifying the data sets, enrich them with related entities from the reference knowledge service, and extract embedded entities found in your data. This semantically enriches and correlates your data.
  • 28. GOVERN: As you ingest, enrich, and publish your data, cloud service providers offer an intuitive, user-interface-driven dashboard page to monitor all transform activity on your data sets.
  • 29. AUTOMATE: There are two ways to automate the process. First, you can use the scheduler to set your transformations to run on a daily, weekly, or monthly basis against a predetermined data source. Second, by using APIs you can automate the entire data preparation process, from file movement to preparation to publishing.
  • 30. TYPES OF TOOLS USED IN BIG DATA: Where is processing hosted? Distributed servers / cloud (e.g. Amazon EC2). Where is data stored? Distributed storage (e.g. Amazon S3). What is the programming model? Distributed processing (e.g. MapReduce). How is data stored and indexed? High-performance schema-free databases (e.g. MongoDB). What operations are performed on data? Analytic / semantic processing.
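The MapReduce programming model named on this slide can be illustrated with a single-process word count. This sketch only mimics the three phases (map, shuffle, reduce) in memory; it is not Hadoop code, and the function names are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data tools", "big data"]))
    # {'big': 2, 'data': 2, 'tools': 1}
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data across the network; the per-key independence of the reduce step is what makes that distribution possible.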
  • 31. TYPES OF BIG DATA TECHNOLOGIES: Big data technology is mainly classified into two types. Operational big data technologies deal with the normal day-to-day data that we generate. Analytical big data technologies are like an advanced version of big data technologies, and are a little more complex than operational big data.
  • 32. A FEW EXAMPLES OF OPERATIONAL BIG DATA TECHNOLOGIES ARE AS FOLLOWS:
  • 33. A FEW EXAMPLES OF ANALYTICAL BIG DATA TECHNOLOGIES ARE AS FOLLOWS:
  • 34. BIG DATA ANALYTICS AND DATA SCIENCES
  • 35. TOP BIG DATA TECHNOLOGIES: Top big data technologies are divided into four fields: data storage, data mining, data analytics, and data visualization.
  • 37. DATA STORAGE - HADOOP: The Hadoop framework was designed to store and process data in a distributed data processing environment on commodity hardware with a simple programming model. It can store and analyse data spread across different machines at high speed and low cost. Developed by: Apache Software Foundation (initial release in 2006; version 1.0 in December 2011). Written in: Java. Current stable version: Hadoop 3.1.1
  • 38. COMPANIES USING HADOOP; COMPANIES USING MONGODB; COMPANIES USING RAINSTOR
  • 39. DATA MINING - PRESTO: Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. It allows querying data in Hive, Cassandra, relational databases, and proprietary data stores. Developed by: Facebook, open-sourced in 2013. Written in: Java. Current stable version: Presto 0.22
  • 41. DATA ANALYTICS - KAFKA: Apache Kafka is a distributed streaming platform. A streaming platform has three key capabilities: publishing and subscribing to streams of records, storing streams of records durably, and processing streams of records as they occur. The publish/subscribe part is similar to a message queue or an enterprise messaging system. Developed by: LinkedIn, open-sourced in 2011 and now maintained by the Apache Software Foundation. Written in: Scala, Java. Current stable version: Apache Kafka 2.2.0
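The message-queue comparison on this slide can be made concrete with a toy in-memory log. This is not the Kafka API: `TopicLog`, `publish`, and `poll` are hypothetical names, and the sketch only shows the idea that records are appended to a stored log while each consumer keeps its own read position.

```python
class TopicLog:
    """Toy append-only log with independent per-consumer read offsets."""

    def __init__(self):
        self._records = []   # stored stream of records, in arrival order
        self._offsets = {}   # consumer name -> index of the next record to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    log.publish({"district": "Chennai", "new_cases": 12})
    log.publish({"district": "Coimbatore", "new_cases": 3})
    print(log.poll("dashboard"))  # both records
    print(log.poll("dashboard"))  # empty list: nothing new for this consumer
    print(log.poll("archiver"))   # both records again: offsets are per consumer
```

Unlike a classic queue, reading does not remove a record, so several independent consumers can process the same stream at their own pace.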
  • 42. COMPANIES USING KAFKA. OTHER DATA ANALYTICS TOOLS: SPLUNK, KNIME, SPARK, R-LANGUAGE, BLOCKCHAIN
  • 43. DATA VISUALIZATION - TABLEAU: Tableau is a powerful and fast-growing data visualization tool used in the business intelligence industry. Data analysis is very fast with Tableau, and the visualizations created are in the form of dashboards and worksheets. Developed by: Tableau, 17 May 2013. Written in: Java, C++, Python, C. Current stable version: Tableau 8.2
  • 44. COMPANIES USING TABLEAU. OTHER DATA VISUALIZATION TOOL: PLOTLY
  • 45. EMERGING BIG DATA TECHNOLOGIES - TENSORFLOW: TensorFlow has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in machine learning and lets developers easily build and deploy machine-learning-powered applications. Developed by: Google Brain Team, first released in 2015. Written in: Python, C++, CUDA. Current stable version: TensorFlow 2.0 beta
  • 46. COMPANIES USING TENSORFLOW. OTHER EMERGING TECHNOLOGY TOOLS: BEAM, DOCKER, AIRFLOW, KUBERNETES
  • 47. APPLICATION OF BIG DATA ANALYTICS: homeland security, smarter healthcare, multi-channel sales, telecom, manufacturing, traffic control, trading analytics, search quality.
  • 48. RISKS OF BIG DATA: You can be overwhelmed by the data. You need the right people solving the right problems. Costs can escalate too fast. It isn’t necessary to capture 100% of the data. Privacy is a concern with many sources of big data, addressed through self-regulation (data compression) and legal regulation.
  • 49. HOW BIG DATA IMPACTS IT: Big data is a disruptive force, presenting opportunities as well as challenges to IT organizations. By 2015, 4.4 million IT jobs were projected in big data, 1.9 million of them in the US alone. In 2017, data scientist was the No. 1 job in Harvard’s ranking.
  • 50. BENEFITS OF BIG DATA: Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse; it’s about the ability to make better decisions and take meaningful actions at the right time. Fast forward to the present, and technologies like Hadoop give you the scale and flexibility to store data before you know how you are going to process it. Technologies such as MapReduce, Hive, and Impala enable you to run queries without changing the data structures underneath.
  • 51. RESEARCH PROJECTS RELATED TO COVID-19. Project 1: World-wide COVID-19 Outbreak Data Analysis and Prediction. Methods: Real-time data is queried and visualized, and the queried data is then used for Susceptible-Exposed-Infectious-Recovered (SEIR) predictive modelling. SEIR modelling is used to forecast the COVID-19 outbreak within and outside of China based on daily observations. Queried news is also analyzed and classified into negative and positive sentiment, to understand the influence of the news on people’s behavior, both politically and economically.
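For readers unfamiliar with the SEIR model mentioned in Project 1, the sketch below integrates the standard SEIR equations with simple Euler steps. It is not the project's code: the function name `simulate_seir` and every parameter value in the usage example are assumptions chosen for illustration and are not fitted to COVID-19 data.

```python
def simulate_seir(population, exposed, infectious, recovered,
                  beta, sigma, gamma, days, steps_per_day=10):
    """Integrate the SEIR equations with fixed-step Euler updates.

    beta  : transmission rate per day
    sigma : rate at which exposed people become infectious (1 / incubation period)
    gamma : recovery rate (1 / infectious period)
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    s = float(population - exposed - infectious - recovered)
    e, i, r = float(exposed), float(infectious), float(recovered)
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not estimates for any real outbreak).
    run = simulate_seir(population=1_000_000, exposed=0, infectious=10, recovered=0,
                        beta=0.5, sigma=0.2, gamma=0.1, days=30)
    for day in (0, 10, 20, 30):
        s, e, i, r = run[day]
        print(f"day {day:2d}: infectious = {i:,.0f}")
```

In a forecasting setting the rates would be estimated from daily case observations rather than fixed by hand, and a more careful integrator could replace the Euler steps.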
  • 52. RESEARCH PROJECTS. Project 2: Short-Term Applications of Artificial Intelligence and Big Data: Tracking and Diagnosing COVID-19 Cases. Project 3: Short-Term Applications of Artificial Intelligence and Big Data: A Quick and Effective Pandemic Alert. Project 4: Modeling of disease activity, potential growth, and areas of spread. Project 5: Modeling of the utility of operating theaters and clinics with manpower projections.