DATA SCIENCE BIG DATA
Yazan abu al failat
AGENDA
• Data science and scientists
• Cycle of big data management
• Big data architecture
WHAT IS DATA SCIENCE
• Dealing with unstructured and structured data, Data
Science is a field that encompasses anything related
to data cleansing, preparation, and analysis.
• Data Science is an umbrella term for techniques used
when trying to extract insights and information from
data.
WHAT DO DATA SCIENTISTS DO?
• Data scientists combine statistics, mathematics,
programming, and problem-solving with ingenious
ways of capturing data and analyzing it to find
patterns, along with the activities of cleansing,
preparing, and aligning the data.
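As a minimal illustration of the cleansing and preparing activities mentioned above, the sketch below normalizes a few raw records in plain Python. The field names and sample records are hypothetical and are not taken from the slides.

```python
# Hypothetical raw records: inconsistent casing, stray whitespace, a missing value.
raw_records = [
    {"name": "  Alice ", "city": "AMMAN", "age": "34"},
    {"name": "bob", "city": " irbid", "age": ""},
    {"name": "  Alice ", "city": "AMMAN", "age": "34"},  # duplicate of the first record
]


def clean(record):
    """Trim whitespace, normalize casing, and convert age to int (or None if missing)."""
    age = record["age"].strip()
    return {
        "name": record["name"].strip().title(),
        "city": record["city"].strip().title(),
        "age": int(age) if age else None,
    }


def prepare(records):
    """Clean every record and drop duplicates while keeping the original order."""
    seen = set()
    result = []
    for rec in map(clean, records):
        key = (rec["name"], rec["city"], rec["age"])
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result


cleaned = prepare(raw_records)
```

After this step, `cleaned` holds two records with consistent casing, and the missing age is represented explicitly as `None` rather than as an empty string.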
THE CYCLE OF BIG DATA MANAGEMENT
(FUNCTIONAL REQUIREMENTS)
Implementation
WHAT ELSE IS NEEDED TO
SUPPORT THE FUNCTIONAL REQUIREMENTS
• Your needs will depend on the nature of the analysis
you are supporting.
• Performance
• The right amount of computational power and
speed.
• Your architecture also has to have the right amount of
redundancy so that you are protected from
unanticipated latency and downtime.
• To address these points, consider the questions on
the next slide.
ASK THE FOLLOWING
• How much data will my organization need to manage today
and in the future?
• How often will my organization need to manage data in
real time or near real time?
• How much risk can my organization afford? Is my industry
subject to strict security requirements?
• How important is speed to my need to manage data?
• How certain or precise does the data need to be?
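The first question, how much data must be managed today and in the future, can be turned into a rough estimate. The sketch below assumes simple compound annual growth and a fixed number of stored copies; both functions and all figures are hypothetical illustrations, not values from the slides.

```python
def projected_volume_tb(current_tb, annual_growth_rate, years):
    """Logical data volume after `years`, assuming compound annual growth."""
    return current_tb * (1 + annual_growth_rate) ** years


def raw_capacity_tb(logical_tb, copies):
    """Raw storage needed when every byte is stored `copies` times for redundancy."""
    return logical_tb * copies


# Assumed inputs: 100 TB today, growing 50% per year, planned three years ahead.
future_tb = projected_volume_tb(current_tb=100, annual_growth_rate=0.5, years=3)
needed_tb = raw_capacity_tb(future_tb, copies=3)
```

Under these assumptions the logical volume grows from 100 TB to 337.5 TB, and keeping three copies of everything brings the raw requirement to 1012.5 TB.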
BIG DATA ARCHITECTURE
• A big data management
architecture must
include a variety of
services that enable
companies to make
use of myriad data
sources in a fast and
effective manner.
INTERFACES
• What makes big data big is the fact that it relies on picking
up lots of data from lots of sources.
• Therefore, open application programming interfaces (APIs)
will be core to any big data architecture.
• Keep in mind that interfaces exist at every level and
between every layer of the stack.
• Without integration services, big data can’t happen.
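One way to picture the role of interfaces is a common contract that every data source implements, so that an integration step can treat all sources alike. The sketch below is an assumption-laden illustration: the class names `DataSource`, `CsvTextSource`, and `InMemorySource` and the function `integrate` are hypothetical and do not refer to any real product or library.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface that every source adapter implements."""

    @abstractmethod
    def fetch(self):
        """Return a list of records as dictionaries."""


class CsvTextSource(DataSource):
    """Adapter for comma-separated text with a header row (no quoting support)."""

    def __init__(self, text):
        self.text = text

    def fetch(self):
        lines = self.text.strip().splitlines()
        header = lines[0].split(",")
        return [dict(zip(header, line.split(","))) for line in lines[1:]]


class InMemorySource(DataSource):
    """Adapter for records that are already available as dictionaries."""

    def __init__(self, records):
        self.records = records

    def fetch(self):
        return list(self.records)


def integrate(sources):
    """Collect records from all sources, tagging each record with its origin."""
    merged = []
    for source in sources:
        for record in source.fetch():
            merged.append({**record, "source": type(source).__name__})
    return merged


sources = [
    CsvTextSource("id,event\n1,login\n2,purchase"),
    InMemorySource([{"id": "3", "event": "logout"}]),
]
records = integrate(sources)
```

Because `integrate` depends only on the `fetch` contract, adding a new source means writing one more adapter rather than changing the integration code.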
REDUNDANT PHYSICAL INFRASTRUCTURE
• Without the availability of robust physical infrastructures, big data
would probably not have emerged as such an important trend.
• To support an unanticipated or unpredictable volume of data, a
physical infrastructure for big data has to be different than that for
traditional data.
• The physical infrastructure is based on a distributed computing
model. This means that data may be physically stored in many
different locations and can be linked together through networks,
the use of a distributed file system, and various big data analytic tools
and applications.
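The idea that data is physically stored in many locations can be sketched with hash partitioning, one common way of deciding which machine holds which record. This is a simplified illustration under assumed node names and keys; real distributed file systems use more elaborate placement schemes.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick a node deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Group keys by the node that is responsible for storing them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement


# Hypothetical cluster of three machines and twelve record keys.
nodes = ["node-a", "node-b", "node-c"]
keys = [f"record-{i}" for i in range(12)]
placement = distribute(keys, nodes)
```

Any client that knows the node list can recompute `node_for_key` and find a record without consulting a central index, which is what lets the separate locations behave as one store.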
REDUNDANT PHYSICAL INFRASTRUCTURE
• Redundancy is important because we are dealing with so much data from
so many different sources.
• Redundancy comes in many forms. If your company has created a private
cloud, you will want to have redundancy built within the private
environment so that it can scale out to support changing workloads.
• If your company wants to contain internal IT growth, it may use external
cloud services to augment its internal resources. In some cases, this
redundancy may come in the form of a Software as a Service (SaaS) offering
that allows companies to do sophisticated data analysis as a service.
• The SaaS approach offers lower costs, quicker startup, and seamless
evolution of the underlying technology.
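One concrete form of redundancy, added here as an illustration rather than taken from the slides, is keeping several copies of each block of data on different machines. The sketch below uses a simple round-robin placement; the node names, block names, and both functions are hypothetical.

```python
def place_replicas(block_ids, nodes, replication_factor=3):
    """Assign each block to `replication_factor` distinct nodes, round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            nodes[(i + r) % len(nodes)] for r in range(replication_factor)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a healthy node."""
    failed = set(failed_nodes)
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


nodes = ["n1", "n2", "n3", "n4"]
blocks = ["b0", "b1", "b2", "b3", "b4"]
placement = place_replicas(blocks, nodes, replication_factor=2)
survivors = readable_blocks(placement, failed_nodes=["n2"])
```

With two copies per block, losing any single node leaves every block readable; losing two nodes that hold both copies of a block makes that block unavailable, which is why the replication factor is a trade-off between cost and risk.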
SECURITY INFRASTRUCTURE
• The more important big data analysis becomes to
companies, the more important it will be to secure that data.
• You will need to take into account
• who is allowed to see the data
• under what circumstances they are allowed to see the data.
• You will need to be able to verify the identity of users
as well as protect the identity of the people the data
describes, such as patients.
• These types of security requirements need to be part of the
big data architecture.
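The two concerns above, who may see which data and how to protect the identity of patients, can be sketched with a role-based field filter and a keyed hash used as a pseudonym. The roles, field names, and key below are hypothetical; a real system would also need authentication, auditing, and proper key management.

```python
import hashlib
import hmac

# Hypothetical policy: which roles may read which fields.
READ_POLICY = {
    "clinician": {"patient_id", "diagnosis", "age"},
    "analyst": {"diagnosis", "age"},
}

# Demo key only; a real deployment would load this from a secrets store.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(patient_id, secret_key):
    """Replace an identifier with a keyed hash so records stay linkable but anonymous."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def view_record(record, role, secret_key):
    """Return only the fields the role may see; unknown roles are rejected."""
    allowed = READ_POLICY.get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} may not read records")
    visible = {k: v for k, v in record.items() if k in allowed}
    if "patient_id" not in allowed:
        # Roles without access to the real ID get a stable pseudonym instead.
        visible["patient_ref"] = pseudonymize(record["patient_id"], secret_key)
    return visible


record = {"patient_id": "P-1001", "diagnosis": "asthma", "age": 42}
analyst_view = view_record(record, "analyst", SECRET_KEY)
clinician_view = view_record(record, "clinician", SECRET_KEY)
```

The analyst can still group records belonging to the same patient through `patient_ref`, but cannot recover the original identifier without the key.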
OPERATIONAL DATA SOURCES
• Traditionally, an operational data source consisted of highly structured
data managed by a company in a relational database.
• But as the world changes, it is important to
understand that operational data now has to
encompass a broader set of data sources,
including unstructured sources such as
social media.
OPERATIONAL DATA SOURCES
• New approaches to data management are emerging
in the big data world, including document,
graph, columnar, and geospatial database architectures.
• These are collectively referred to as NoSQL, or “not only SQL,” databases.
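To make two of these architectures concrete, the sketch below stores the same records in a document layout and in a columnar layout using plain Python structures. It illustrates the data models only, under hypothetical sample records, and is not how any particular NoSQL product is implemented.

```python
import json

# Hypothetical sample records.
rows = [
    {"id": 1, "name": "Alice", "tags": ["vip"]},
    {"id": 2, "name": "Bob", "tags": []},
]

# Document model: each record is stored whole, keyed by its id.
document_store = {row["id"]: json.dumps(row) for row in rows}

# Columnar model: each field is stored as its own column of values.
column_store = {field: [row[field] for row in rows] for field in rows[0]}


def get_document(doc_id):
    """Fetch one complete record, the access pattern a document store favors."""
    return json.loads(document_store[doc_id])


def column_values(field):
    """Fetch one field across all records, the access pattern a column store favors."""
    return column_store[field]
```

Reading a whole record touches a single entry in the document layout, while scanning one field across all records touches a single list in the columnar layout.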
OPERATIONAL DATA SOURCES
CHARACTERISTICS
• All these operational data sources have several characteristics in
common:
✓ They represent systems of record that keep track of the critical
data required for real-time, day-to-day operation of the business.
✓ They are continually updated based on transactions happening
within business units and from the web.
✓ For these sources to provide an accurate representation of the
business, they must blend structured and unstructured data.
✓ These systems also must be able to scale to support thousands of
users on a consistent basis. These might include transactional
e-commerce systems, customer relationship management systems, or
call center applications.
PERFORMANCE
• Your data architecture also needs to perform in
concert with your organization’s supporting
infrastructure.
• A complex analytical model might take days to run on a
traditional server configuration. However, using a
distributed computing model, what took days might
now take minutes.
• Performance might also determine the kind of
database you would use.
PERFORMANCE – GRAPH DATABASE
• In some situations a graph database might be a better choice, as it is
specifically designed to separate the “nodes,” or entities, from
their “properties,” or the information that defines each entity, and
the “edges,” or relationships between nodes.
• Using the right database will also improve performance.
Graph databases are typically used in scientific and
technical applications.
GRAPH EXAMPLE
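A small property graph can be sketched in plain Python as follows. The people, company, and relationship names are hypothetical, and the structure only illustrates the node, property, and edge concepts; it is not the API of any graph database.

```python
# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"type": "person", "age": 34},
    "bob": {"type": "person", "age": 41},
    "acme": {"type": "company", "sector": "retail"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]


def neighbors(node_id, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relationship]


def colleagues(node_id):
    """Find other nodes that share an employer with the given node."""
    employers = set(neighbors(node_id, "WORKS_AT"))
    return sorted(
        src
        for src, rel, dst in edges
        if rel == "WORKS_AT" and dst in employers and src != node_id
    )
```

For example, `neighbors("alice", "KNOWS")` follows a single edge, while `colleagues("alice")` answers a relationship question by walking edges instead of joining tables.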
ORGANIZING DATA SERVICES AND
TOOLS
• Not all the data that organizations use is operational.
• A growing amount of data comes from a variety of sources
that aren’t quite as organized or straightforward, including
data that comes from machines or sensors, and massive
public and private data sources.
ORGANIZING DATA SERVICES AND TOOLS
• In the past, most companies weren’t able to either capture or store
this vast amount of data. It was simply too expensive or too
overwhelming.
• Even if companies were able to capture the data, they did not have
the tools to do anything about it. Very few tools could make sense of
these vast amounts of data. The tools that did exist were complex to
use and did not produce results in a reasonable time frame.
• In the end, those who really wanted to analyze this data were
forced to work with snapshots of data. This has the undesirable
effect of missing important events because they were not in a
particular snapshot.
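The snapshot problem can be shown with a toy example: a short-lived spike that is visible in the full stream of readings disappears when only periodic snapshots are kept. The readings, the 15-minute snapshot interval, and the threshold are all hypothetical.

```python
def detect_spikes(readings, threshold):
    """Return the minutes at which a reading exceeds the threshold."""
    return [minute for minute, value in readings if value > threshold]


# One hour of per-minute readings with a single short-lived spike at minute 37.
readings = [(minute, 20) for minute in range(60)]
readings[37] = (37, 95)

# A snapshot keeps only every 15th reading (minutes 0, 15, 30, 45).
snapshot = [reading for reading in readings if reading[0] % 15 == 0]

full_result = detect_spikes(readings, threshold=80)
snapshot_result = detect_spikes(snapshot, threshold=80)
```

Analyzing the full stream finds the spike at minute 37, whereas the same analysis on the snapshot finds nothing, because minute 37 was never captured.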
MAPREDUCE, HADOOP, AND BIG TABLE
• Emerging companies with very large user networks needed to find new
technologies that would allow them to store, access, and analyze huge
amounts of data in near real time, so that they could monetize the
benefits of owning this much data about participants in their networks.
• Their resulting solutions are transforming the data management
market.
• The innovations MapReduce, Hadoop, and Big Table ushered in a new
generation of data management.
• These technologies address one of the most fundamental
problems: the capability to process massive amounts of data.
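The MapReduce programming model can be sketched in a single process with a word count, the usual introductory example. This is a simplified illustration of the map, shuffle, and reduce steps only; it is not how Hadoop distributes, schedules, or recovers work across machines, and the function names are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the combined output."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data is big", "data about data"])
```

Because each call to `map_phase` depends only on its own document, and each key is reduced independently, both phases can in principle be spread across many machines, which is what makes the model suited to massive data sets.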
ACTIVITY
What are MapReduce, Big Table,
and Hadoop?

