Use of Big Data Architecture
Atif Farid Mohammad, PhD
Data Science Professor, Adjunct
UNC Charlotte
Big Data
• How big is it?
• Does it matter?
• ?
• ?
• ?
• ?
• There can be many more questions…
Word of Caution
• Kindly avoid thinking in SQL mode
• For the duration of this talk…
Differences… RDBMS vs. Hadoop
RDBMS
• Schema: required on write
• Speed: reads are fast
• Governance: standard and structured
• Processing: limited, no data processing
• Data types: structured
Hadoop
• Schema: required on read
• Speed: writes are fast
• Governance: loosely structured
• Processing: processing coupled with data
• Data types: multi- and unstructured
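A minimal sketch of the schema difference listed above, written in plain Python. The record layout (driver_id, speed) and all class and function names are hypothetical and only serve the illustration; neither store is meant to model a real RDBMS or HDFS.

```python
import json

# Hypothetical schema used only for illustration: field name -> required type.
SCHEMA = {"driver_id": str, "speed": float}


def validate(record):
    """Raise ValueError if the record does not match SCHEMA."""
    if not isinstance(record, dict):
        raise ValueError("record is not a mapping")
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    return record


class SchemaOnWriteStore:
    """RDBMS-style: every record is checked before it is stored."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        self.rows.append(validate(record))  # bad data is rejected at write time

    def read(self):
        return list(self.rows)  # reads need no further checking


class SchemaOnReadStore:
    """Hadoop-style: raw lines are appended as-is; structure is applied on read."""

    def __init__(self):
        self.lines = []

    def write(self, raw_line):
        self.lines.append(raw_line)  # cheap append, no validation

    def read(self):
        parsed = []
        for line in self.lines:
            try:
                parsed.append(validate(json.loads(line)))
            except ValueError:
                continue  # skip lines that do not fit the schema chosen at read time
        return parsed
```

The write path of the second store does no work beyond appending, which is one way to read "writes are fast"; the cost of interpreting the data moves to whoever reads it.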
Differences between IT Systems and Hadoop
Attribute | IT Systems | Hadoop
Data Size | Gigabytes | Petabytes/Zettabytes
Access | Batch & Interactive | Batch
CRUD | Read & Write Many Times | Write Once, Read Many Times
Structure | Static | Dynamic
Integrity | Normalization | De-Normalization
Scalability | Non-Linear | Linear
A Scenario to Understand Big Data
• A trucking company collects… using…???
• GPS
• Speed
• Acceleration
• Stopping
• Normal
• Too quick
• Driving too close to other vehicles
Which standard technologies would you use?
Hadoop Ecosystem Utilization
• Flume to get raw sensor data
• Sqoop to transport data to HDFS about
• Driver
• Vehicle
• HCatalog to hold all schema definitions
• Hive to analyze gas mileage
• Pig to compute a risk factor for each truck driver based on his/her related events
• Spark to create data sets by applying machine learning
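The per-driver risk factor could be sketched as follows. This is plain Python rather than Pig, and the definition used here (share of a driver's events that are not "normal") is an assumption made for illustration; the talk does not define the formula. The event names and driver IDs are hypothetical.

```python
from collections import defaultdict

# Hypothetical event records: (driver_id, event_type).
# "normal" marks uneventful readings; anything else counts as risky here.
EVENTS = [
    ("D1", "normal"),
    ("D1", "stopping too quickly"),
    ("D1", "normal"),
    ("D1", "normal"),
    ("D2", "following too closely"),
    ("D2", "stopping too quickly"),
]


def risk_factors(events):
    """Return {driver_id: risky_events / total_events} (an assumed definition)."""
    total = defaultdict(int)
    risky = defaultdict(int)
    for driver, event_type in events:
        total[driver] += 1
        if event_type != "normal":
            risky[driver] += 1
    return {driver: risky[driver] / total[driver] for driver in total}


if __name__ == "__main__":
    for driver, factor in sorted(risk_factors(EVENTS).items()):
        print(driver, factor)
```

In a Pig script the same shape would appear as a GROUP BY driver followed by a per-group aggregate.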
Another Example - Bank
Data Acquisition
• Input
• Multiple user event feeds (browsing activities, search, etc.) per time period
User | Time | Event | Source
U1 | T0 | Visited bank site | Server logs
U1 | T1 | Searched for “Credit Cards” | Search logs
U1 | T2 | Browsed banking services | Web server logs
U1 | T3 | Saw an e-mail sent link | Link advertising logs
U1 | T4 | Used OLTP | Web server logs
U1 | T5 | Clicked on an ad for “some insurance” | Ad logs, click server logs
Data Acquisition for the Landing Zone
[Figure: event feeds of user events pass through map operations (project relevant event attributes; filter irrelevant events; tag and transform: categorization, topic, …) to produce normalized events (NE), which are stored in HDFS.]
Data Acquisition for the Landing Zone
• Output:
• Single normalized feed containing all events for all users per time period
User | Time | Event | Tag
U1 | T0 | Content browsing | Web clicks by a Bank’s user
U2 | T2 | Search query | Category: Credit Card
… | … | … | …
U23 | T23 | OLTP usage | Drop event
U36 | T36 | Bank’s site page click | Category: Some product
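The map operations for the landing zone (project, filter, tag and transform) can be sketched in a few lines of Python. The field names, the tagging rules and the decision to filter out anything tagged "Drop event" are assumptions for illustration; the talk only names the three steps.

```python
# Hypothetical raw events from several feeds; field names are illustrative.
RAW_EVENTS = [
    {"user": "U1", "time": "T0", "event": "visited Bank Site",
     "source": "server logs", "ip": "10.0.0.1"},
    {"user": "U1", "time": "T1", "event": "searched for Credit Cards",
     "source": "search logs", "ip": "10.0.0.1"},
    {"user": "U1", "time": "T4", "event": "Used OLTP",
     "source": "web server logs", "ip": "10.0.0.1"},
]

KEEP_FIELDS = ("user", "time", "event")


def tag(event_text):
    """Return (event_type, tag); the rules are made up for illustration."""
    text = event_text.lower()
    if "searched" in text:
        category = "Credit Card" if "credit card" in text else "Other"
        return "Search query", "Category: " + category
    if "oltp" in text:
        return "OLTP usage", "Drop event"
    return "Content browsing", "Web clicks by a Bank's user"


def normalize(raw):
    """Map one raw event to a normalized event, or None if it is dropped."""
    projected = {k: raw[k] for k in KEEP_FIELDS}   # project relevant attributes
    event_type, label = tag(projected["event"])    # tag and transform
    if label == "Drop event":                      # filter irrelevant events
        return None
    return {"user": projected["user"], "time": projected["time"],
            "event": event_type, "tag": label}


def normalized_feed(raw_events):
    """Apply the map operation to every event and keep the survivors."""
    return [ne for ne in map(normalize, raw_events) if ne is not None]
```

Because each event is handled independently, the same function could run as a map task over many feed files in parallel.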
Feature and Target Generation for the Discovery Zone
• Features:
  • Summaries of user activities over a time window
  • Aggregates, moving averages, rates, etc. over moving time windows
  • Support online updates to existing features
• Targets:
  • Constructed in the offline model training phase
  • Typically user actions in the future time period indicating interest
  • Clicks/click-through on financial product offerings and content
  • Site and page visits
  • Conversion events
  • Deposit, withdrawal, quote requests, etc.
  • Sign-ups to newsletters, registrations, etc.
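A minimal sketch of a feature that is summarized over a moving time window and supports online updates. The class name and the choice of one count per period are assumptions for illustration, not part of the talk.

```python
from collections import deque


class WindowedCounter:
    """Keeps per-period event counts for the last `window` periods.

    Supports online updates: adding a new period evicts the oldest one.
    """

    def __init__(self, window):
        self.counts = deque(maxlen=window)

    def add_period(self, count):
        self.counts.append(count)

    def total(self):
        """Aggregate over the current window."""
        return sum(self.counts)

    def moving_average(self):
        """Mean count per period over the current window."""
        return sum(self.counts) / len(self.counts) if self.counts else 0.0

    def rate(self, period_hours):
        """Events per hour over the window, given the length of one period."""
        hours = len(self.counts) * period_hours
        return sum(self.counts) / hours if hours else 0.0
```

For example, with a three-period window and daily counts 2, 4, 6, 8, the oldest count (2) has been evicted, so the total is 18, the moving average is 6.0, and the rate for 24-hour periods is 0.25 events per hour.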
Feature Generation for Discovery Zone
[Figure: normalized events (NE 1 … NE 9) are read from HDFS and aggregated. Map tasks emit (user, event) pairs; reduce tasks collect all events for each user (U1, U2) and compute summaries over the user event history (aggregates within window, time and event weighted averages, event rates, …), which form the feature set.]
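The grouping step (map events by user, reduce each user's history into summaries) might look like the sketch below. It runs in a single Python process; the event tuples, the integer time index and the specific summaries are hypothetical stand-ins for what a real MapReduce job would compute.

```python
from collections import defaultdict

# Hypothetical normalized events: (user, time_index, event_type).
NORMALIZED = [
    ("U1", 0, "Content browsing"),
    ("U2", 1, "Search query"),
    ("U1", 2, "Search query"),
    ("U2", 3, "Content browsing"),
    ("U1", 5, "Search query"),
]


def map_event(event):
    """Key each event by its user."""
    user, time_index, event_type = event
    return user, (time_index, event_type)


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_user(user, events, window_start, window_end):
    """Summarize one user's events that fall inside [window_start, window_end)."""
    in_window = [e for t, e in events if window_start <= t < window_end]
    counts = defaultdict(int)
    for event_type in in_window:
        counts[event_type] += 1
    span = window_end - window_start
    return {"user": user,
            "events_in_window": len(in_window),
            "event_rate": len(in_window) / span,
            "counts": dict(counts)}


def feature_set(events, window_start, window_end):
    grouped = shuffle(map(map_event, events))
    return {user: reduce_user(user, evs, window_start, window_end)
            for user, evs in grouped.items()}
```

Calling `feature_set(NORMALIZED, 0, 4)` yields one feature record per user, each built only from that user's events inside the window.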
Modeling Workflow within the Discovery Zone
[Figure: Training phase: user event history → data acquisition → feature generation and target generation → features and targets → model training → weights. Evaluation phase: user event history → data acquisition → feature generation and target generation → features and targets → model scoring → scores → evaluation.]
Batch Scoring for Discovery Results
[Figure: user event history → data acquisition → feature generation → features; features and weights → model scoring → scores → online serving systems.]
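A sketch of batch scoring with a set of trained weights. The talk does not say which model family is used; a logistic model is assumed here purely for illustration, and the feature names and weight values are hypothetical.

```python
import math

# Hypothetical weights, as an offline training phase might produce them.
WEIGHTS = {"bias": -1.0, "searches_7d": 0.8, "page_visits_7d": 0.1}


def score(features, weights):
    """Logistic score in (0, 1); features missing for a user count as 0."""
    z = weights.get("bias", 0.0)
    for name, weight in weights.items():
        if name != "bias":
            z += weight * features.get(name, 0.0)
    return 1.0 / (1.0 + math.exp(-z))


def batch_score(user_features, weights):
    """Score every user in one pass; the result would go to the serving systems."""
    return {user: score(feats, weights) for user, feats in user_features.items()}
```

Only the weights and the freshly generated features are needed at scoring time, which is why the batch path in the figure skips target generation.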
Discovery Zone Pipeline System Estimation
Component | Data Processed | Time Estimation
Data Acquisition | ~1 TB per time period | 2–3 hours
Feature and Target Generation | ~1 TB * size of feature window | 4–6 hours
Model Training | ~50–100 GB | 1–2 hours for 100s of models
Scoring | ~500 GB | 1 hour
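As a quick piece of arithmetic on these estimates, the sketch below adds up the stage times. It assumes the four stages run one after another, which the talk does not state; if stages overlap, the end-to-end time would be shorter.

```python
# (low, high) hours per stage, copied from the estimates above.
STAGE_HOURS = {
    "Data Acquisition": (2, 3),
    "Feature and Target Generation": (4, 6),
    "Model Training": (1, 2),
    "Scoring": (1, 1),
}


def total_hours(stages):
    """Sum the low and high estimates, assuming strictly sequential stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


if __name__ == "__main__":
    print(total_hours(STAGE_HOURS))
```

Under that assumption the full pipeline takes between 8 and 12 hours per run.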
Requirements Extraction Process
• A two-step process is used for requirements extraction:
1) Extract specific requirements and map them to the reference architecture based on each application’s
characteristics, such as:
a) data sources (data size, file formats, rate of growth, at rest or in motion, etc.),
b) data lifecycle management (curation, conversion, quality check, pre-analytic processing, etc.),
c) data transformation (data fusion/mashup, analytics),
d) capability infrastructure (software tools, platform tools, hardware resources such as storage and
networking), and
e) data usage (processed results in text, table, visual, and other formats).
f) All architecture components are informed by the goals and use case description.
g) Security & privacy has a direct mapping.
2) Aggregate all specific requirements into high-level generalized requirements that are
vendor-neutral and technology-agnostic.
Business Intelligence
Extensive processes and costs:
• Data Analyses
• Data Cleansing
• Entity Relationship Modeling
• Dimensional Modeling
• Database Design & Implementation
• Database Population through ETL/ELT
• Downstream Applications linkage - Metadata
• Maintaining the processes
[Figure labels: Cloud; Source Data]
Big Data Edge from Data Warehouse
[Figure: data marts and analytical databases.]
[Figure: BI reference architecture with the following areas.
Data Sources: Enterprise, Unstructured, Informational, External (Supplier, Orders, Product, Promotions, Customer, Location, Invoice, ePOS, Other).
Data Integration: Extraction, Transformation, Load / Apply, Synchronization, Transport / Messaging, Information Integrity.
Data Repositories: Operational Data Stores, Data Warehouse, Data Marts, Staging Areas, Metadata.
Analytics: Query & Reporting, Data Mining, Modeling, Scorecard, Visualization, Embedded Analytics.
Access: Web Browser, Portals, Devices (e.g. mobile), Web Services; Collaboration, Business Applications.
Cross-cutting: Data Flow and Workflow; Metadata Management; Security and Data Privacy; System Management and Administration; Network Connectivity, Protocols & Access Middleware; Hardware & Software Platforms.]
[Figure overlay, Hadoop components: HDFS as the Data Lake and single source; analytical data marts; HCatalog for metadata management; Sqoop; MapReduce/Pig for load / apply; HCatalog & Pig; transport / messaging. Can work with most ETL tools on the market.]
Reference Architecture
HCatalog – Hadoop metadata repository and management
service that provides a centralized way for data processing systems
to understand the structure and location of the data stored within
Apache Hadoop.
Extraction is an application used to transfer data, usually from
relational databases to a flat file, which can then be transported to the
landing area of a data warehouse and ingested into the BI/DW environment.
Reference Architecture
Extraction
Sqoop – A command-line interface application for transferring data between relational
databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query,
as well as saved jobs that can be run multiple times to import updates made to a database
since the last import. Exports can be used to put data from Hadoop into a relational database.
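The idea behind a saved incremental job can be sketched in plain Python. This is not Sqoop's API or implementation; the class and its fields are hypothetical and cover only append-style loads, where a monotonically increasing check column identifies new rows.

```python
class IncrementalImportJob:
    """Saved job that remembers the highest check-column value already imported."""

    def __init__(self, check_column, last_value=0):
        self.check_column = check_column
        self.last_value = last_value

    def run(self, source_rows, target):
        """Copy rows newer than last_value into target; return how many were copied."""
        new_rows = [row for row in source_rows
                    if row[self.check_column] > self.last_value]
        target.extend(new_rows)
        if new_rows:
            # Remember where this run stopped so the next run skips these rows.
            self.last_value = max(row[self.check_column] for row in new_rows)
        return len(new_rows)
```

Running the same job object again after the source has grown imports only the rows added since the previous run, which is the behaviour the description of saved jobs refers to.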
[Figure: Extraction. Current BI: Source, database extract, sftp, Target. Proposed BI: Source, Sqoop, Target.]
MapReduce – A framework for writing applications that process large amounts of
structured and unstructured data in parallel across large clusters of machines in a very reliable
and fault-tolerant manner.
Pig – A platform for processing and analyzing large data sets. Pig consists of a high-level
language (Pig Latin) for expressing data analysis programs paired with the MapReduce
framework for processing these programs.
Transformation
[Figure: Current BI: Landing, Staging, DW and DM, linked by complex ETL steps. Proposed BI: HDFS and DM, linked by MapReduce/Pig.]
Load / Apply
[Figure: Current BI: Staging, DW and DM. Proposed BI: DM.]
Synchronization
Synchronization – The ETL process takes source data from staging, transforms it using
business rules and loads it into the central repository (DW). In this scenario, in order to retain
information integrity, one has to put in place a synchronization check and correction mechanism.
HDFS as a Single Source – In the proposed solution HDFS acts as a single source of
data, so there is no danger of desynchronization. Inconsistencies resulting from duplicated or
inconsistent data will be reconciled with the assistance of HCatalog and proper data governance.
[Figure: Current: Source, Landing, Staging, DW and DM, with synchronization. Proposed: Source, HDFS and DM.]
Information Integrity
Current – Currently there is no special approach to data quality other than what is
embedded in the ETL processes and logic. There are tools and approaches to
implement QA & QC.
Hadoop – A more focused approach. While we use HDFS as one big “Data Lake”, QA
and QC will be applied at the Data Mart level, where the actual transformations will
occur, hence reducing the overall effort. QA & QC will be an integral part of Data
Governance and augmented by usage of HCatalog.
Data Repositories
[Figure: the current data repositories (Operational Data Stores, Data Warehouse, Data Marts, Staging Areas, Metadata) shown against HDFS and HCatalog in the proposed design.]
HCatalog Metadata Management
HCatalog – A Hadoop metadata repository and management service
that provides a centralized way for data processing systems to understand
the structure and location of the data stored within Apache Hadoop.
Reference Architecture
Hadoop Distributed File System (HDFS) – A reliable and distributed Java-based file
system that allows large volumes of data to be stored and rapidly accessed across large
clusters of commodity servers
[Figure: the reference architecture with the proposed Hadoop components in place. HCatalog provides metadata management; the data repositories are HDFS, HCatalog and analytical data marts; data integration uses Sqoop, MapReduce/Pig, load / apply, HDFS as single source, HCatalog & Pig, and transport / messaging, and can work with Informatica. Data sources, analytics, access and the cross-cutting layers are unchanged.]
Reference Architecture
Capability | Current BI | Proposed BI | Expected Change
Data Sources | Source Applications | Source Applications | No
Data Integration
Extraction from Source | DB Export | Sqoop | One-to-one change
Transport/Messaging | SFTP | SFTP | No
Staging Area Transformations/Load | Complex ETL Code | None required | Eliminated
Extract from Staging | Complex ETL Code | None required | Eliminated
Transformation for DW | Complex ETL Code | None required | Eliminated
Load to DW | Complex ETL, RDBMS | None required | Eliminated
Extract from DW, Transformation and Load to DM | Complex ETL code & process to feed DM | MapReduce/Pig | Simplified transformations from HDFS to DM
Data Quality, Balance & Controls | Embedded ETL Code | MapReduce/Pig in conjunction with HCatalog; can also coexist with Informatica | Yes
Reference Architecture
Map Reduce
Map Operation
MAP: Input data → <key, value> pair
[Figure: the data collection is split (split 1, split 2, …, split n) to supply multiple processors. Each Map emits <key, value> pairs such as <web, 1>, <weed, 1>, <green, 1>, <sun, 1>, <moon, 1>, <land, 1>, <part, 1>, <web, 1>, <green, 1>, …]
Reduce Operation
MAP: Input data → <key, value> pair
REDUCE: <key, value> pair → <result>
[Figure: the data collection is split (split 1, split 2, …, split n) to supply multiple processors. Each Map emits a list of <key, value> pairs (<web, 1>, <weed, 1>, <green, 1>, …), and the Reduce tasks turn these pairs into results.]
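The map and reduce operations can be sketched as a word count in plain Python. Everything runs in one process here; in Hadoop each split would be handled by a separate map task and the framework would move the pairs to the reducers. The function names are illustrative.

```python
from collections import defaultdict


def map_words(split):
    """MAP: input text -> list of <word, 1> pairs."""
    return [(word, 1) for word in split.split()]


def reduce_counts(pairs):
    """REDUCE: <word, 1> pairs -> <word, total> results."""
    totals = defaultdict(int)
    for word, value in pairs:
        totals[word] += value
    return dict(totals)


def word_count(splits):
    """Run map over every split, then reduce all emitted pairs."""
    pairs = []
    for split in splits:  # each split could go to a different processor
        pairs.extend(map_words(split))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(word_count(["web weed green", "sun moon land part", "web green"]))
```

With the three example splits, "web" and "green" each end up with a count of 2 and every other word with a count of 1.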
[Figure: a terabyte-sized collection of words (Cat, Bat, Dog, other words) is divided into splits; each split goes through map and combine, and the reduce tasks write the output files part0, part1 and part2.]
MapReduce
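The split, map, combine, reduce pipeline with partitioned output might be sketched as follows. The use of CRC32 as the partitioning hash is an assumption chosen because it is stable across runs (Python's built-in string hash is randomized per process); it is not a claim about Hadoop's own partitioner. Function names are illustrative.

```python
import zlib
from collections import Counter


def map_split(split):
    """Map: emit <word, 1> for every word in the split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Combine: pre-aggregate one mapper's output before it is sent to reducers."""
    local = Counter()
    for word, n in pairs:
        local[word] += n
    return list(local.items())


def partition(word, num_reducers):
    """Stable hash so that a given word always reaches the same reducer."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def run(splits, num_reducers=3):
    """Return one output dict per reducer, named part0, part1, ..."""
    buckets = [Counter() for _ in range(num_reducers)]
    for split in splits:
        for word, n in combine(map_split(split)):
            buckets[partition(word, num_reducers)][word] += n  # reduce: sum counts
    return {f"part{i}": dict(bucket) for i, bucket in enumerate(buckets)}
```

The combiner shrinks what each mapper sends over the network (for "cat bat cat" it sends two pairs instead of three), and the partition function guarantees that all partial counts for one word meet in the same output file.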
[Figure: large-scale data splits are parsed and hashed (parse-hash); Map emits <key, 1> pairs; the reducers (say, Count) write the partitions P-0000, P-0001 and P-0002, holding count1, count2 and count3.]
Web References
• “MapReduce: Simplified Data Processing on Large Clusters”, Jeffrey Dean and
Sanjay Ghemawat, December 2004.
http://labs.google.com/papers/mapreduce.html
• “Scalable SQL”, ACM Queue, Michael Rys, April 19, 2011
http://queue.acm.org/detail.cfm?id=1971597
• “a practical guide to noSQL”, Posted by Denise Miura on March 17, 2011 at
http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/
Thank you
Questions…

Weitere ähnliche Inhalte

Was ist angesagt?

Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Zaloni
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Rittman Analytics
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data ArchitectureZaloni
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...Romeo Kienzler
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.Richard Vermillion
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationZaloni
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewGreat Wide Open
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkDataWorks Summit
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentDataWorks Summit
 
Webinar: Is Spark Hadoop's Friend or Foe?
Webinar: Is Spark Hadoop's Friend or Foe? Webinar: Is Spark Hadoop's Friend or Foe?
Webinar: Is Spark Hadoop's Friend or Foe? Zaloni
 
Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Zaloni
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
StreamCentral Technical Overview
StreamCentral Technical OverviewStreamCentral Technical Overview
StreamCentral Technical OverviewRaheel Retiwalla
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeDataWorks Summit
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data LakeRobert Chong
 

Was ist angesagt? (20)

Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data Architecture
 
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Refe...
 
The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.The Warranty Data Lake – After, Inc.
The Warranty Data Lake – After, Inc.
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
 
Webinar: Is Spark Hadoop's Friend or Foe?
Webinar: Is Spark Hadoop's Friend or Foe? Webinar: Is Spark Hadoop's Friend or Foe?
Webinar: Is Spark Hadoop's Friend or Foe?
 
Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...Understanding Metadata: Why it's essential to your big data solution and how ...
Understanding Metadata: Why it's essential to your big data solution and how ...
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
StreamCentral Technical Overview
StreamCentral Technical OverviewStreamCentral Technical Overview
StreamCentral Technical Overview
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 

Andere mochten auch

Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
 
Apache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceApache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceBikas Saha
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetupAlex Zeltov
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
TriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache SentryTriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache Sentrytrihug
 
Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016
Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016
Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016Sergio Fernández
 
Introduction to sentry
Introduction to sentryIntroduction to sentry
Introduction to sentrymozillazg
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hPrecisely
 
Apache Zeppelin 소개
Apache Zeppelin 소개Apache Zeppelin 소개
Apache Zeppelin 소개KSLUG
 
Interactive Data Science Notebooks with Apache Zeppelin
Interactive Data Science Notebooks with Apache ZeppelinInteractive Data Science Notebooks with Apache Zeppelin
Interactive Data Science Notebooks with Apache ZeppelinGeorg Sorst
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetupnvvrajesh
 
A gentle intro of Apache zeppelin
A gentle intro of Apache zeppelinA gentle intro of Apache zeppelin
A gentle intro of Apache zeppelinAhyoung Ryu
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Clusterahortonworks
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 

Andere mochten auch (20)

Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
 
Apache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceApache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data Science
 
Oozie meetup - HA
Oozie meetup - HAOozie meetup - HA
Oozie meetup - HA
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
TriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache SentryTriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache Sentry
 
Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016
Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016
Introduction to Apache Beam (incubating) - DataCamp Salzburg - 7 dec 2016
 
Introduction to sentry
Introduction to sentryIntroduction to sentry
Introduction to sentry
 
April 2014 HUG : Apache Sentry
April 2014 HUG : Apache SentryApril 2014 HUG : Apache Sentry
April 2014 HUG : Apache Sentry
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-h
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
 
Apache Zeppelin 소개
Apache Zeppelin 소개Apache Zeppelin 소개
Apache Zeppelin 소개
 
Interactive Data Science Notebooks with Apache Zeppelin
Interactive Data Science Notebooks with Apache ZeppelinInteractive Data Science Notebooks with Apache Zeppelin
Interactive Data Science Notebooks with Apache Zeppelin
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetup
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
A gentle intro of Apache zeppelin
A gentle intro of Apache zeppelinA gentle intro of Apache zeppelin
A gentle intro of Apache zeppelin
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 

Big data meet_up_08042016

  • 1. Use of Big Data Architecture Atif Farid Mohammad, PhD Data Science Professor, Adjunct UNC Charlotte
  • 2. Big Data • How big is it? • Does it matter? • ? • ? • ? • ? • There can be many more questions…
  • 3. Word of Caution • Kindly avoid thinking in SQL mode • For this talk’s time period…
  • 4.
  • 5.
  • 6. VS
  • 8. RDBMS • Schema • Required on the Write • Speed • Reads are Fast • Governance • Standard and Structured • Processing • Limited, No Data Processing • Data Types • Structured
  • 9. Hadoop • Schema • Required on the Read • Speed • Writes are Fast • Governance • Loosely Structured • Processing • Processing coupled with data • Data Types • Multi and Unstructured
  • 10. RDBMS vs. Hadoop
      • Schema: Required on the Write (RDBMS) vs. Required on the Read (Hadoop)
      • Speed: Reads are Fast (RDBMS) vs. Writes are Fast (Hadoop)
      • Governance: Standard and Structured (RDBMS) vs. Loosely Structured (Hadoop)
      • Processing: Limited, No Data Processing (RDBMS) vs. Processing coupled with data (Hadoop)
      • Data Types: Structured (RDBMS) vs. Multi and Unstructured (Hadoop)
  • 11. Differences between IT Systems and Hadoop (Attribute: IT Systems vs. Hadoop)
      • Data Size: Gigabytes vs. Petabytes/Zettabytes
      • Access: Batch & Interactive vs. Batch
      • CRUD: Read & Write Many Times vs. Write Once, Read Many Times
      • Structure: Static vs. Dynamic
      • Integrity: Normalization vs. De-Normalization
      • Scalability: Non-Linear vs. Linear
  • 12. A Scenario to Understand Big Data • A Trucking Company collects… Using…???
  • 13. A Scenario to Understand Big Data… • GPS • Speed • Acceleration • Stopping • Normal • Too Quick • Driving Too Close to Other Vehicles
  • 14. What Standard Technologies Will You Use?
  • 15. Hadoop Ecosystem Utilization • Flume to get raw sensor data • Sqoop to transport data to HDFS about • Driver • Vehicle • HCatalog to hold all schema definitions • Hive to analyze Gas Mileage • Pig to compute a Risk Factor for each Truck Driver based on his/her related events • Spark to create Data Sets by applying Machine Learning
  • 17. Data Acquisition • Input • Multiple user event feeds (browsing activities, search etc.) per time period
      • Columns: User | Time | Event | Source
      • U1 | T0 | visited Bank Site | Server logs
      • U1 | T1 | searched for “Credit Cards” | Search logs
      • U1 | T2 | browsed Banking Services | Web server logs
      • U1 | T3 | Saw an e-Mail sent link | Link advertising logs
      • U1 | T4 | Used OLTP | Web server logs
      • U1 | T5 | clicked on an ad for “some insurance” | Ad logs, click server logs
  • 18. Data Acquisition for the Landing Zone (diagram) • Event Feeds: user events • Map Operations: project relevant event attributes; filter irrelevant events; tag and transform (categorization, topic, …) • Output: Normalized Events (NE) stored in HDFS
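The map operations named on slide 18 (project relevant attributes, filter irrelevant events, tag and transform) can be sketched in plain Python. This is only an illustration under stated assumptions: the sample records, the `KEEP` field list, and the rule-based `tag` function are hypothetical and are not taken from the talk, which runs these steps as Hadoop map tasks over HDFS.

```python
# Hypothetical raw events, loosely modeled on the slide 17 table.
RAW_EVENTS = [
    {"user": "U1", "time": 0, "event": "visited Bank Site",
     "source": "Server logs", "ip": "10.0.0.1"},
    {"user": "U1", "time": 1, "event": "searched for Credit Cards",
     "source": "Search logs", "ip": "10.0.0.1"},
    {"user": "U1", "time": 4, "event": "Used OLTP",
     "source": "Web server logs", "ip": "10.0.0.1"},
]

# Assumed set of attributes worth keeping after projection.
KEEP = ("user", "time", "event")


def tag(event_text):
    """Hypothetical rule-based tagger; returns None for events to drop."""
    text = event_text.lower()
    if "oltp" in text:
        return None  # slide 19 marks OLTP usage as a dropped event
    if "search" in text:
        return "Search query"
    return "Content browsing"


def normalize(events):
    """Map-style pass: project, filter, then tag each raw event."""
    for raw in events:
        projected = {key: raw[key] for key in KEEP}  # project attributes
        label = tag(projected["event"])
        if label is None:                            # filter irrelevant events
            continue
        projected["tag"] = label                     # tag and transform
        yield projected


normalized = list(normalize(RAW_EVENTS))
for row in normalized:
    print(row)
```

Because each event is handled independently, the same function could in principle run as one map task per input split.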
  • 19. Data Acquisition for the Landing Zone • Output: • Single normalized feed containing all events for all users per time period
      • Columns: User | Time | Event | Tag
      • U1 | T0 | Content browsing | Web clicks by a Bank’s user
      • U2 | T2 | Search query | Category: Credit Card
      • … | … | … | …
      • U23 | T23 | OLTP usage | Drop event
      • U36 | T36 | Bank’s site page click | Category: Some product
  • 20. Feature and Target Generation for the Discovery Zone
      • Features:
          • Summaries of user activities over a time window
          • Aggregates, Moving averages, Rates etc. over moving time windows
          • Support online updates to existing features
      • Targets:
          • Constructed in the offline model training phase
          • Typically user actions in the future time period indicating interest
          • Clicks/Click-through financial product offering and content
          • Site and page visits
          • Conversion events
              • Deposit, Withdrawal, Quote requests etc.
              • Sign-ups to newsletters, Registrations etc.
  • 21. Feature Generation for Discovery Zone (diagram) • Input: normalized events NE 1 … NE 9 from HDFS • Aggregate normalized events: Map 1, Map 2, Map 3 emit (user, event) pairs such as (U1, Event 1), (U1, Event 2), (U2, Event 1), (U2, Event 3) • Reduce 1 receives all events for U1; Reduce 2 receives all events for U2 • Output: Feature Set with summaries over user event history: aggregates within window, time and event weighted averages, event rates, …
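The map/reduce flow on slide 21 (maps emit per-user pairs, each reducer sees all events for one user and summarizes them) can be sketched as follows. This is a minimal single-process sketch, not the talk's implementation: the event tuples, the `now` and `window` parameters, and the chosen summaries are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical normalized events: (user, time, event_type).
EVENTS = [
    ("U1", 1, "click"), ("U2", 2, "search"), ("U1", 3, "search"),
    ("U1", 8, "click"), ("U2", 9, "click"),
]


def map_phase(events):
    """Emit (key, value) pairs keyed by user."""
    for user, time, kind in events:
        yield user, (time, kind)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(user, history, now, window):
    """Summarize one user's events inside the window (now - window, now]."""
    recent = [(t, k) for t, k in history if now - window < t <= now]
    clicks = sum(1 for _, k in recent if k == "click")
    return {
        "user": user,
        "events_in_window": len(recent),   # aggregate within window
        "clicks_in_window": clicks,
        "event_rate": len(recent) / window,  # events per time unit
    }


grouped = shuffle(map_phase(EVENTS))
features = {
    user: reduce_phase(user, history, now=10, window=8)
    for user, history in grouped.items()
}
print(features)
```

Keying by user is what lets each reducer compute window summaries without seeing any other user's history.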
  • 22. Modeling Workflow within the Discovery Zone (diagram) • Training Phase: User event history → Data Acquisition → Target generation and Feature generation → Targets and Features → Model Training → Weights • Evaluation Phase: User event history → Data Acquisition → Target generation and Feature generation → Targets and Features → Model Scoring → Scores → Evaluation
  • 23. Batch Scoring for Discovery Results (diagram) • User event history → Data Acquisition → Feature generation → Features → Model Scoring (with Weights) → Scores → Online Serving Systems
  • 24. Discovery Zone Pipeline System Estimation
      • Columns: Component | Data Processed | Time Estimation
      • Data Acquisition | ~1 TB per time period | 2–3 hours
      • Feature and Target Generation | ~1 TB × size of feature window | 4–6 hours
      • Model Training | ~50–100 GB | 1–2 hours for 100’s of models
      • Scoring | ~500 GB | 1 hour
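As a quick arithmetic check on the slide 24 estimates, the per-stage ranges can be summed to get an end-to-end range. The sketch assumes the four stages run strictly one after another, which the slide does not state; if stages overlap, the total would be lower.

```python
# (low, high) time estimates in hours, copied from slide 24.
STAGES = {
    "Data Acquisition": (2, 3),
    "Feature and Target Generation": (4, 6),
    "Model Training": (1, 2),
    "Scoring": (1, 1),
}

# Assumption: stages run sequentially, so their durations add up.
low = sum(lo for lo, _ in STAGES.values())
high = sum(hi for _, hi in STAGES.values())
print(f"End-to-end estimate: {low} to {high} hours")
```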
  • 25. Requirements Extraction Process • A two-step process is used for requirement extraction:
      1) Extract specific requirements and map them to the reference architecture based on each application’s characteristics, such as:
          a) data sources (data size, file formats, rate of growth, at rest or in motion, etc.)
          b) data lifecycle management (curation, conversion, quality check, pre-analytic processing, etc.)
          c) data transformation (data fusion/mashup, analytics)
          d) capability infrastructure (software tools, platform tools, hardware resources such as storage and networking)
          e) data usage (processed results in text, table, visual, and other formats)
          f) all architecture components informed by Goals and use case description
          g) Security & Privacy has a direct map
      2) Aggregate all specific requirements into high-level generalized requirements that are vendor-neutral and technology-agnostic.
  • 26. Cloud Business Intelligence • Extensive processes and costs: • Data Analyses • Data Cleansing • Entity Relationship Modeling • Dimensional Modeling • Database Design & Implementation • Database Population through ETL/ELT • Downstream Applications linkage - Metadata • Maintaining the processes • Big Data Edge from Data Warehouse • Diagram labels: Source Data, Data Marts, Analytical Databases
  • 27. Metadata Management - HCatalog (diagram) • The reference architecture with Hadoop components overlaid: Sqoop, MapReduce/Pig, Load / Apply, Transport / Messaging, HDFS as Single Source (Data Lake), HCatalog, Analytical Data Marts • HCatalog & Pig can work with most ETL tools on the market
  • 28. Reference Architecture (diagram)
      • Data Sources: Enterprise, Unstructured, Informational, External (Supplier, Orders, Product, Promotions, Customer, Location, Invoice, ePOS, Other)
      • Data Integration: Extraction, Transformation, Load / Apply, Synchronization, Transport / Messaging, Information Integrity
      • Data Repositories: Operational Data Stores, Data Warehouse, Data Marts, Staging Areas, Metadata
      • Analytics: Query & Reporting, Data Mining, Modeling, Scorecard, Visualization, Embedded Analytics
      • Access: Web Browser, Portals, Devices (ex.: mobile), Web Services
      • Collaboration; Business Applications
      • Data Flow and Workflow
      • Supporting layers: Metadata Management; Security and Data Privacy; System Management and Administration; Network Connectivity, Protocols & Access Middleware; Hardware & Software Platforms
  • 29. Reference Architecture: Data Integration, Current BI vs. Proposed BI
      • Extraction
          • Current BI: Extraction is an application used to transfer data, usually from relational databases to a flat file, which can then be transported to a landing area of a Data Warehouse and ingested into the BI/DW environment.
          • Proposed BI: Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Exports can be used to put data from Hadoop into a relational database.
          • Diagram labels: Source, Extract, Target, Database extract, sftp, Sqoop
      • Transport / Messaging
          • HCatalog: A Hadoop metadata repository and management service that provides a centralized way for data processing systems to understand the structure and location of the data stored within Apache Hadoop.
      • Transformation
          • MapReduce: A framework for writing applications that process large amounts of structured and unstructured data in parallel across large clusters of machines in a very reliable and fault-tolerant manner.
          • Pig: A platform for processing and analyzing large data sets. Pig consists of a high-level language (Pig Latin) for expressing data analysis programs, paired with the MapReduce framework for processing these programs.
          • Diagram labels: Current BI: Landing, Staging, DW, DM, with Complex ETL between stages; Proposed BI: HDFS, DM, with MapReduce/Pig
      • Load / Apply
          • Diagram labels: Current BI: Staging, DW, DM; Proposed BI: DM
      • Synchronization
          • Current BI: The ETL process takes source data from staging, transforms it using business rules, and loads it into the central repository (DW). In this scenario, to retain information integrity, one has to put in place a synchronization check and correction mechanism.
          • HDFS as a Single Source: In the proposed solution HDFS acts as a single source of data, so there is no danger of desynchronization. Inconsistencies resulting from duplicated or inconsistent data will be reconciled with the assistance of HCatalog and proper data governance.
          • Diagram labels: Source, Landing, Staging, DW, DM, HDFS
      • Information Integrity
          • Current: There is currently no special approach to data quality other than what is embedded in the ETL processes and logic. There are tools and approaches to implement QA & QC.
          • Hadoop: A more focused approach. While HDFS is used as one big “Data Lake”, QA and QC will be applied at the Data Mart level, where the actual transformations occur, hence reducing the overall effort. QA & QC will be an integral part of Data Governance, augmented by the use of HCatalog.
  • 30. Reference Architecture: Data Repositories and Metadata Management
      • Data Repositories (Operational Data Stores, Data Warehouse, Data Marts, Staging Areas, Metadata): HDFS, HCatalog
      • Hadoop Distributed File System (HDFS): A reliable and distributed Java-based file system that allows large volumes of data to be stored and rapidly accessed across large clusters of commodity servers.
      • HCatalog Metadata Management: HCatalog is a Hadoop metadata repository and management service that provides a centralized way for data processing systems to understand the structure and location of the data stored within Apache Hadoop.
  • 31. Reference Architecture with Hadoop components (diagram) • Data Integration: Sqoop, MapReduce/Pig, Load / Apply, Transport / Messaging • Data Repositories: HDFS as Single Source, HCatalog, Analytical Data Marts • HCatalog Metadata Management • HCatalog & Pig can work with Informatica
  • 32. Reference Architecture: Current BI vs. Proposed BI
      Capability | Current BI | Proposed BI | Expected Change
      Data Sources | Source Applications | Source Applications | No
      Data Integration: Extraction from Source | DB Export | Sqoop | One-to-one change
      Transport/Messaging | SFTP | SFTP | No
      Staging Area Transformations/Load | Complex ETL code | None required | Eliminated
      Extract from Staging | Complex ETL code | None required | Eliminated
      Transformation for DW | Complex ETL code | None required | Eliminated
      Load to DW | Complex ETL, RDBMS | None required | Eliminated
      Extract from DW, Transformation and Load to DM | Complex ETL code & process to feed DM | MapReduce/Pig | Simplified transformations from HDFS to DM
      Data Quality, Balance & Controls | Embedded ETL code | MapReduce/Pig in conjunction with HCatalog; can also coexist with Informatica | Yes
  • 34. Map Operation. MAP: input data → <key, value> pairs. The data collection is split (split 1, split 2, …, split n) to supply multiple processors, and each split is handled by its own Map task. Each Map task emits a KEY/VALUE list such as: web 1, weed 1, green 1, sun 1, moon 1, land 1, part 1, web 1, green 1, …
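The map step on this slide can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not Hadoop code: the helper names `split_data` and `map_words` are hypothetical, the word list is taken from the slide, and the choice of three splits is arbitrary.

```python
def split_data(words, n_splits):
    """Cut the input into contiguous chunks, one chunk per (simulated) mapper."""
    size = max(1, -(-len(words) // n_splits))  # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_words(split):
    """MAP: turn each input word into a <key, value> pair of the form (word, 1)."""
    return [(word, 1) for word in split]


# Sample words from the slide; three splits is an assumption for illustration.
words = ["web", "weed", "green", "sun", "moon", "land", "part", "web", "green"]
splits = split_data(words, 3)

# Each split is mapped independently, which is what allows parallel execution.
mapped = [map_words(split) for split in splits]

for number, pairs in enumerate(mapped, start=1):
    print(f"split {number}: {pairs}")
```

Note that the mapper does no counting: repeated words such as "web" simply produce repeated (word, 1) pairs, and combining them is left to the reduce step.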
  • 35. Reduce Operation. MAP: input data → <key, value> pairs. REDUCE: <key, value> pairs → <result>. The data collection is split (split 1, split 2, …, split n) to supply multiple processors; each split goes to a Map task, and the Map outputs feed the Reduce tasks.
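A minimal sketch of the reduce step, assuming the map output is a flat list of (word, 1) pairs like the ones shown on the Map Operation slide. The function names `shuffle` and `reduce_counts` are hypothetical, and the grouping step stands in for the shuffle that a real framework performs between map and reduce.

```python
from collections import defaultdict

# Assumed map output: one (word, 1) pair per input word.
mapped_pairs = [
    ("web", 1), ("weed", 1), ("green", 1), ("sun", 1), ("moon", 1),
    ("land", 1), ("part", 1), ("web", 1), ("green", 1),
]


def shuffle(pairs):
    """Group all values that share a key, so each key is reduced exactly once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """REDUCE: collapse the values of one key into a single <result>."""
    return key, sum(values)


results = dict(reduce_counts(key, values) for key, values in shuffle(mapped_pairs).items())
print(results)
```

Summation is used here because the slides count words; any operation that combines the values for one key would fit the same shape.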
  • 37. Count example. Diagram elements: large-scale data splits; Map emitting <key, 1>; Parse-hash steps (one per split); Reducers (say, Count); output partitions P-0000, P-0001, P-0002 with counts count1, count2, count3.
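One way to read the "Parse-hash" boxes is as hash partitioning: each key is hashed to pick the reducer, and therefore the output partition, that will count it. The sketch below works under that assumption and is not taken from the slides. `partition_for` and `count_by_partition` are hypothetical names, three reducers matches the three partitions shown, and `zlib.crc32` is used only because it gives the same hash on every run.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 3  # matches the three partitions P-0000 .. P-0002 on the slide


def partition_for(key, num_reducers=NUM_REDUCERS):
    """Hash the key to choose a reducer; the same key always lands in the same partition."""
    index = zlib.crc32(key.encode("utf-8")) % num_reducers
    return f"P-{index:04d}"


def count_by_partition(pairs):
    """Route each <key, 1> pair to its partition and let that partition's Count reducer sum it."""
    partitions = {}
    for key, value in pairs:
        partitions.setdefault(partition_for(key), Counter())[key] += value
    return partitions


words = ["web", "weed", "green", "sun", "moon", "land", "part", "web", "green"]
pairs = [(word, 1) for word in words]
partitions = count_by_partition(pairs)

for name in sorted(partitions):
    print(name, dict(partitions[name]))
```

Because the partition depends only on the key, every occurrence of a word reaches the same reducer, so each reducer can produce final counts without consulting the others.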