Networking Issues For Big Data

Raj Jain
Washington University in Saint Louis
Saint Louis, MO 63130
Jain@cse.wustl.edu
These slides and audio/video recordings of this class lecture are at:
http://www.cse.wustl.edu/~jain/cse570-13/
Washington University in St. Louis

http://www.cse.wustl.edu/~jain/talks/m_11nbd.htm

11-1

©2013 Raj Jain
Overview
1. Why, What, and How of Big Data: It's all because of advances in networking
2. Recent Developments in Networking and their role in Big Data (Virtualization, SDN, NFV)
3. Networking needs Big Data
Big Data Enabled by Networking
[Figure: layered diagram with the labels Big Data, Large Storage, Fast Computing, Cloud, Virtualization, Networking]

MapReduce
Software framework to process massive amounts of unstructured data by distributing it over a large number of inexpensive processors
• Map: Takes a set of data and divides it for computation
• Reduce: Takes the output from Map and outputs the result
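The Map, Shuffle, and Reduce steps can be sketched on a single machine as a word count. This is a minimal illustration, not the Google or Hadoop implementation; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the mappers run sequentially here rather than on many processors.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_chunks):
    """Shuffle: group the values emitted by all mappers by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different processor.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data needs networking",
                     "networking needs big data too"])
print(counts)
```

The shuffle step is where the network matters most: every mapper may have to send data to every reducer.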
[Figure: Input → Map (three mappers) → Shuffle → Reduce (two reducers) → Output]

Ref: J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI 2004,
http://research.google.com/archive/mapreduce-osdi04.pdf
Hadoop
An open source implementation of MapReduce
• Named by Doug Cutting at Yahoo after his son's yellow plush elephant
• Hadoop Distributed File System (HDFS) requires data to be broken into blocks. Each block is stored on 2 or more data nodes on different racks.
• Name node: Manages the file system name space → keeps track of blocks on various Data Nodes.
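The idea of breaking data into blocks and keeping a block map with replicas on different racks can be sketched as follows. This is only an illustration: the round-robin placement rule, the function names, and the rack and node names are assumptions, not the actual HDFS placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Break the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes_by_rack, replicas=2):
    """Build a block map: each block goes to `replicas` data nodes,
    every replica on a different rack (hypothetical round-robin rule)."""
    racks = sorted(nodes_by_rack)
    if replicas > len(racks):
        raise ValueError("need at least as many racks as replicas")
    block_map = {}
    for b in range(num_blocks):
        chosen = []
        for r in range(replicas):
            rack = racks[(b + r) % len(racks)]   # distinct rack per replica
            nodes = nodes_by_rack[rack]
            chosen.append((rack, nodes[b % len(nodes)]))
        block_map[b] = chosen
    return block_map

nodes_by_rack = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"], "rack3": ["dn5"]}
blocks = split_into_blocks(b"x" * 1000, block_size=256)
block_map = place_replicas(len(blocks), nodes_by_rack)
for block_id, locations in block_map.items():
    print(block_id, locations)
```

The dictionary returned by place_replicas plays the role of the name node's block map: it records where each block lives but holds no data itself.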


[Figure: The Name Node (name space, block map) and four Data Nodes (DN) holding replicated copies of blocks B1 to B4; arrows labeled Write and Replicate]
Hadoop (Cont)
• Job Tracker: Assigns MapReduce jobs to task tracker nodes that are close to the data (same rack)
• Task Tracker: Keeps the work as close to the data as possible.
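A locality-aware assignment of this kind can be sketched as a three-level preference: same node, then same rack, then anywhere. The function assign_task and the node and rack names are hypothetical; the real Hadoop scheduler is more elaborate.

```python
def assign_task(block_locations, free_trackers, rack_of):
    """Pick a task tracker for a task whose input block lives on
    `block_locations`: prefer the same node, then the same rack."""
    data_nodes = set(block_locations)
    data_racks = {rack_of[node] for node in data_nodes}
    for tracker in free_trackers:
        if tracker in data_nodes:
            return tracker, "node-local"
    for tracker in free_trackers:
        if rack_of[tracker] in data_racks:
            return tracker, "rack-local"
    if free_trackers:
        return free_trackers[0], "off-rack"   # data must cross racks
    return None, "no free tracker"

rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
print(assign_task(["n1"], ["n3", "n1"], rack_of))
print(assign_task(["n1"], ["n3", "n2"], rack_of))
print(assign_task(["n1"], ["n3"], rack_of))
```

Only the off-rack case sends the input block through the inter-rack switches, which is why the scheduler tries the other two options first.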


[Figure: Four racks, each with a switch and two DN+TT nodes; the racks also host the Job Tracker, the Name Node, the secondary Job Tracker, and the secondary Name Node. DN = Data Node, TT = Task Tracker, NN = Name Node]
Networking Requirements for Big Data
1. Code/Data Collocation: The data for map jobs should be at the processors that are going to map.
2. Elastic bandwidth: to match the variability of volume
3. Fault/Error Handling: If a processor fails, its task needs to be assigned to another processor.
4. Security: Access control (authorized users only), privacy (encryption), threat detection, all in real-time in a highly scalable manner
5. Synchronization: The map jobs should be comparable so that they finish together. Similarly, reduce jobs should be comparable.
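Requirements 3 and 5 can be illustrated with a small sketch. It assumes a parallel phase ends only when its slowest task ends, and that tasks of a failed processor go to the least-loaded survivor; the function names and the task sizes are made up for illustration.

```python
def makespan(task_sizes, rate=1.0):
    """A parallel phase finishes only when its slowest task finishes."""
    return max(size / rate for size in task_sizes)

def reassign_failed(assignments, failed, healthy):
    """Move every task of the failed processor to the least-loaded
    healthy processor (hypothetical policy)."""
    result = {p: list(assignments.get(p, [])) for p in healthy}
    for task in assignments.get(failed, []):
        target = min(healthy, key=lambda p: len(result[p]))
        result[target].append(task)
    return result

# Same total work (400 units), very different finish times.
balanced = makespan([100, 100, 100, 100])
skewed = makespan([10, 10, 10, 370])
print(balanced, skewed)

assignments = {"p1": ["t1", "t2"], "p2": ["t3"], "p3": ["t4"]}
recovered = reassign_failed(assignments, failed="p1", healthy=["p2", "p3"])
print(recovered)
```

With comparable map jobs the phase ends after 100 time units; with one oversized job the other three processors sit idle while the phase runs for 370.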

Recent Developments in Networking
1. High-Speed: 100 Gbps Ethernet → 400 Gbps → 1000 Gbps → Cheap storage access. Easy to move big data.
2. Virtualization
3. Software Defined Networking
4. Network Function Virtualization
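To see what the step from 100 Gbps to 1000 Gbps means for moving big data, a back-of-the-envelope calculation helps. The 1 PB data set size and the assumption of a fully utilized link with no protocol overhead are illustrative choices, not figures from the talk.

```python
def transfer_time_seconds(data_bytes, link_gbps, efficiency=1.0):
    """Time to push `data_bytes` over a link of `link_gbps` gigabits/s.
    `efficiency` (assumed 1.0 here) is the usable fraction of the link."""
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)

PETABYTE = 10**15  # decimal petabyte, an assumed example size

for gbps in (100, 400, 1000):
    hours = transfer_time_seconds(PETABYTE, gbps) / 3600
    print(f"{gbps:>5} Gbps: {hours:.1f} hours")
```

Under these assumptions one petabyte takes roughly 22 hours at 100 Gbps and a little over 2 hours at 1000 Gbps.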

Virtualization (Cont)


Recent networking technologies and standards allow:
1. Virtualizing Computation
2. Virtualizing Storage
3. Virtualizing Rack Storage Connectivity
4. Virtualizing Data Center Storage
5. Virtualizing Metro and Global Storage

1. Virtualizing Computation
[Figure: A data center made up of three subnets]
Initially data centers consisted of multiple IP subnets
• Each subnet = One Ethernet Network
• Ethernet addresses are globally unique and do not change
• IP addresses are locators and change every time you move
• If a VM moves inside a subnet → No change to IP address → Fast
• If a VM moves from one subnet to another → Its IP address changes → All connections break → Slow → Limited VM mobility
IEEE 802.1ad-2005 Ethernet Provider Bridging (PB) and IEEE 802.1ah-2008 Provider Backbone Bridging (PBB) allow Ethernets to span long distances → Global VM mobility
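The subnet rule above can be checked with Python's standard ipaddress module. This is a simplified sketch: the function move_keeps_ip and the example addresses are hypothetical, and it only models the addressing rule, not the migration itself.

```python
import ipaddress

def move_keeps_ip(vm_ip, source_subnet, target_subnet):
    """Return True if the VM's address is valid at both locations,
    i.e. the move does not force an IP change."""
    ip = ipaddress.ip_address(vm_ip)
    src = ipaddress.ip_network(source_subnet)
    dst = ipaddress.ip_network(target_subnet)
    return ip in src and ip in dst

# Move inside one subnet: address is kept, connections survive.
print(move_keeps_ip("10.0.1.25", "10.0.1.0/24", "10.0.1.0/24"))
# Move to another subnet: address must change, connections break.
print(move_keeps_ip("10.0.1.25", "10.0.1.0/24", "10.0.2.0/24"))
# One Ethernet (one subnet) stretched over two sites.
print(move_keeps_ip("10.0.1.25", "10.0.0.0/16", "10.0.0.0/16"))
```

The third call corresponds to the PB/PBB case: because the Ethernet spans both sites, source and target are the same subnet and the VM keeps its address.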

2. Virtualizing Storage


Initially data centers used Storage Area Networks (Fibre
Channel) for server-to-storage communications and Ethernet
for server-to-server communication
[Figure: Servers connect to each other through an Ethernet fabric and to storage through a separate Fibre Channel fabric]

IEEE added 4 new standards to make Ethernet offer low-loss, low-latency service like Fibre Channel:
• Priority-based Flow Control (IEEE 802.1Qbb-2011)
• Enhanced Transmission Selection (IEEE 802.1Qaz-2011)
• Congestion Control (IEEE 802.1Qau-2010)
• Data Center Bridging Exchange (IEEE 802.1Qaz-2011)
Result: Unified networking → Significant CapEx/OpEx saving

3. Virtualizing Rack Storage Connectivity
• MapReduce jobs are assigned to the nodes that have the data
• Job tracker assigns jobs to task trackers in the rack where the data is.
• High-speed Ethernet can get the data in the same rack.
• Peripheral Component Interconnect (PCI) Special Interest Group (SIG)'s Single Root I/O Virtualization (SR-IOV) allows a storage device to be virtualized and shared among multiple VMs.
[Figure: A physical machine (pM) hosting several VMs; one PCIe device is presented as multiple virtual PCIe (vPCIe) devices]
Multi-Root IOV
• PCI-SIG Multi-Root I/O Virtualization (MR-IOV) standard allows one or more PCIe cards to serve multiple servers and VMs in the same rack
• Fewer adapters → Less cooling. No adapters → Thinner servers


[Figure: Several physical machines (pM), each hosting VMs, share PCIe cards through a PCIe fabric; each card is presented as multiple virtual PCIe (vPCIe) devices]
4. Virtualizing Data Center Storage



• IEEE 802.1BR-2012 Virtual Bridge Port Extension (VBE) allows multiple switches to combine into a very large switch
• Storage and computers located anywhere in the data center appear as if connected to the same switch

[Figure: A parent switch with port extenders connecting VMs (through vSwitches) and storage, together forming one distributed vSwitch]
5. Virtualizing Metro Storage


Data center interconnection standards:
• Virtual Extensible LAN (VXLAN),
• Network Virtualization using GRE (NVGRE), and
• Transparent Interconnection of Lots of Links (TRILL)
→ These allow data centers located far apart to appear to be on the same Ethernet
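As one concrete case, VXLAN makes remote sites look like one Ethernet by prepending an 8-byte header carrying a 24-bit VXLAN Network Identifier (VNI) to the original Ethernet frame and sending the result over UDP/IP. The sketch below packs only that header; the helper names are hypothetical, and the outer Ethernet, IP, and UDP headers are omitted.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field is valid

def pack_vxlan_header(vni):
    """8-byte VXLAN header: flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def unpack_vni(header):
    """Recover the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8

def encapsulate(inner_ethernet_frame, vni):
    """VXLAN payload = VXLAN header + original Ethernet frame.
    The result would then be carried in a UDP datagram between sites."""
    return pack_vxlan_header(vni) + inner_ethernet_frame

packet = encapsulate(b"\x00" * 64, vni=5000)   # dummy 64-byte inner frame
print(packet[:8].hex(), unpack_vni(packet))
```

Because the VNI has 24 bits, about 16 million isolated virtual networks can share the same physical infrastructure.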
[Figure: Data Center 1 and Data Center 2 connected so that they appear as one Ethernet]
Ref: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-04, http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-03,
RFC 5556
Virtualizing the Global Storage



• Energy Sciences Network (ESnet) uses a virtual switch to connect members located all over the world
• Virtualization → Fluid networks → The world is flat → You draw your network → Everything is virtually local

Ref: I. Monga, “Software Defined Networking for Big-data Science,”
http://www.es.net/assets/pubs_presos/Monga-WAN-Switch-SC12SRS.pdf
Software Defined Networking
[Figure: Network manager → Policies → Controller → Control of the network devices]

• Centralized Programmable Control Plane
• Allows automated orchestration (provisioning) of a large number of virtual resources (machines, networks, storage)
• Large Hadoop topologies can be created on demand
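The idea of a centralized programmable control plane can be sketched with a toy controller that installs forwarding rules on many switches from one policy. The Switch and Controller classes are hypothetical and do not correspond to OpenFlow or any real controller API.

```python
class Switch:
    """Data plane only: looks up a destination in its flow table."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}              # destination -> output port

    def forward(self, dst):
        # Unknown destinations are referred to the controller.
        return self.flow_table.get(dst, "send-to-controller")

class Controller:
    """Central control plane: one place decides the rules for all switches."""
    def __init__(self):
        self.switches = {}

    def register(self, switch):
        self.switches[switch.name] = switch

    def apply_policy(self, policy):
        """policy: {switch_name: {destination: output_port}}"""
        for name, rules in policy.items():
            self.switches[name].flow_table.update(rules)

controller = Controller()
s1, s2 = Switch("s1"), Switch("s2")
controller.register(s1)
controller.register(s2)

# One policy call provisions a path for a (hypothetical) Hadoop node.
controller.apply_policy({"s1": {"hadoop-node-7": 3}, "s2": {"hadoop-node-7": 1}})
print(s1.forward("hadoop-node-7"), s2.forward("hadoop-node-7"), s1.forward("unknown"))
```

Creating a large topology on demand then amounts to generating a larger policy, not configuring each device by hand.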

Network Function Virtualization (NFV)
• Fast standard hardware → Software-based devices: virtual networking modules (DHCP, Firewall, DNS, …) running on standard processors
• Modules can be combined to create any combination of functions for data privacy, access control, …
• Virtual Machine implementation → Quick provisioning
• Standard Application Programming Interfaces (APIs) → Networking App Market
→ Privacy and Security for Big Data in multi-tenant clouds
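Combining software modules into a chain can be sketched with plain functions that each take and return a packet. The firewall, nat, and chain helpers, the packet fields, and the addresses are hypothetical; real virtual network functions run as VMs, not as Python closures.

```python
def firewall(blocked_ports):
    """Network function: drop packets addressed to blocked ports."""
    def fn(packet):
        return None if packet["dst_port"] in blocked_ports else packet
    return fn

def nat(public_ip):
    """Network function: rewrite the source address to a public one."""
    def fn(packet):
        return {**packet, "src_ip": public_ip}
    return fn

def chain(*functions):
    """Compose network functions; a None result means the packet was dropped."""
    def run(packet):
        for f in functions:
            packet = f(packet)
            if packet is None:
                return None
        return packet
    return run

# A per-tenant chain built purely in software.
tenant_chain = chain(firewall({23}), nat("203.0.113.7"))
allowed = tenant_chain({"src_ip": "10.0.0.5", "dst_port": 443})
dropped = tenant_chain({"src_ip": "10.0.0.5", "dst_port": 23})
print(allowed, dropped)
```

Each tenant in a multi-tenant cloud can get a different chain without any change to the underlying hardware.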
[Figure: Router = DHCP + NAT + QoS + Forwarding, each function running as a VM on a hypervisor]
Big Data for Networking
Today's data center:
• Tens of tenants
• Hundreds of switches and routers
• Thousands of servers
• Hundreds of administrators
Tomorrow:
• 1k of clients
• 10k of pSwitches
• 100k of vSwitches
• 1M of VMs
• Tens of administrators
→ Need to monitor traffic patterns and rearrange virtual networks connecting millions of VMs in real-time
→ Managing clouds is a real-time big data problem.
• Internet of things → Big Data generation and analytics
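Monitoring traffic patterns is itself an aggregation job over flow records. A minimal sketch, assuming each record is a (source VM, destination VM, bytes) tuple; the record format, the function top_talkers, and the sample values are made up for illustration.

```python
from collections import Counter

def top_talkers(flow_records, n=3):
    """Sum bytes per (source VM, destination VM) pair and return the
    n heaviest pairs, e.g. as candidates for placement on the same rack."""
    bytes_per_pair = Counter()
    for src_vm, dst_vm, nbytes in flow_records:
        bytes_per_pair[(src_vm, dst_vm)] += nbytes
    return bytes_per_pair.most_common(n)

flows = [
    ("vm1", "vm2", 500),
    ("vm3", "vm4", 200),
    ("vm1", "vm2", 700),
    ("vm5", "vm6", 900),
]
heaviest = top_talkers(flows, n=2)
print(heaviest)
```

With millions of VMs the same aggregation no longer fits on one machine, which is what makes cloud management a big data problem in its own right.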
Summary
1. I/O virtualization allows all storage in the rack to appear local to any VM in that rack → Solves the co-location problem of MapReduce
2. Network virtualization allows storage anywhere in the data center or even other data centers to appear local
3. Software defined networking allows orchestration of a large number of resources → Dynamic creation of Hadoop clusters
4. Network function virtualization will allow these clusters to have special functions and security in multi-tenant clouds.

Acronyms

















ADCOM   Advanced Computing and Communications
API     Application Programming Interface
CapEx   Capital Expenditure
DARPA   Defense Advanced Research Projects Agency
DHCP    Dynamic Host Configuration Protocol
DN      Data Node
DNS     Domain Name System
DoD     Department of Defense
DOE     Department of Energy
ESnet   Energy Sciences Network
GDP     Gross Domestic Product
GRE     Generic Routing Encapsulation
HDFS    Hadoop Distributed File System
IEEE    Institute of Electrical and Electronics Engineers
IOV     I/O Virtualization
IP      Internet Protocol
Acronyms (Cont)

















LAN        Local Area Network
MR-IOV     Multi-Root I/O Virtualization
NAT        Network Address Translation
NFV        Network Function Virtualization
NN         Name Node
NSA        National Security Agency
OpEx       Operational Expenses
PB         Provider Bridging
PBB        Provider Backbone Bridging
PCI-SIG    PCI Special Interest Group
PCI        Peripheral Component Interconnect
PCIe       PCI Express
pM         Physical Machine
pSwitches  Physical Switches
QoS        Quality of Service
RFC        Request for Comments
Acronyms (Cont)










SDN      Software Defined Networking
SR-IOV   Single Root I/O Virtualization
TRILL    Transparent Interconnection of Lots of Links
TT       Task Tracker
USGS     United States Geological Survey
VBE      Virtual Bridge Port Extension
VM       Virtual Machine
vSwitch  Virtual Switch
WAN      Wide-Area Network

Networking Issues For Big Data

  • 1. Networking Issues For Big Data. Raj Jain, Washington University in Saint Louis, Saint Louis, MO 63130, Jain@cse.wustl.edu. These slides and audio/video recordings of this class lecture are at: http://www.cse.wustl.edu/~jain/cse570-13/ Washington University in St. Louis http://www.cse.wustl.edu/~jain/talks/m_11nbd.htm ©2013 Raj Jain
  • 2. Overview 1. Why, What, and How of Big Data: It’s all because of advances in networking 2. Recent Developments in Networking and their role in Big Data (Virtualization, SDN, NFV) 3. Networking needs Big Data
  • 3. Big Data Enabled by Networking [Figure: stack with the labels Big Data, Large Storage, Fast Computing, Cloud, Virtualization, Networking]
  • 4. MapReduce: Software framework to process massive amounts of unstructured data by distributing it over a large number of inexpensive processors. Map: Takes a set of data and divides it for computation. Reduce: Takes the output from Map and outputs the result. [Figure: Input ⇒ three Map tasks ⇒ Shuffle ⇒ two Reduce tasks ⇒ Output] Ref: J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” OSDI 2004, http://research.google.com/archive/mapreduce-osdi04.pdf
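The Map, Shuffle, and Reduce stages on this slide can be illustrated with a small single-process word count. This is only a sketch: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce calls on many machines rather than in one loop.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(mapped_outputs):
    """Shuffle: group values by key across the output of every mapper."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # In a cluster, each chunk would be mapped on a different processor.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "Big networks"])` counts "big" twice and each of the other words once. The shuffle step is where the network matters most, because every mapper may have to send data to every reducer.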
  • 5. Hadoop: An open source implementation of MapReduce. Named by Doug Cutting at Yahoo after his son’s yellow plush elephant. Hadoop Distributed File System (HDFS) requires data to be broken into blocks. Each block is stored on 2 or more data nodes on different racks. Name node: Manages the file system name space and keeps track of blocks on various Data Nodes. [Figure: a Name Node holding the Name Space and Block Map; blocks B1 to B4 are written and replicated across four Data Nodes (DN)]
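The rule that each block lives on two or more data nodes on different racks can be sketched as a toy placement function. This is not HDFS's actual placement policy; the round-robin choice, the rack layout, and the names `place_replicas` and `build_block_map` are assumptions made for illustration.

```python
def place_replicas(block_index, racks, replicas=2):
    """Pick `replicas` data nodes for one block, each on a different rack.

    `racks` maps a rack name to its list of data-node names
    (a hypothetical layout, not read from a real cluster).
    """
    rack_names = sorted(racks)
    if replicas > len(rack_names):
        raise ValueError("not enough racks for the requested replica count")
    chosen = []
    for i in range(replicas):
        # Consecutive offsets modulo the rack count never repeat a rack.
        rack = rack_names[(block_index + i) % len(rack_names)]
        nodes = racks[rack]
        chosen.append((rack, nodes[block_index % len(nodes)]))
    return chosen

def build_block_map(num_blocks, racks, replicas=2):
    """Name-node view: block id -> list of (rack, data node) holding it."""
    return {f"B{b + 1}": place_replicas(b, racks, replicas)
            for b in range(num_blocks)}
```

Calling `build_block_map(4, {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]})` gives a block map in which every block has one copy in each rack, so losing a whole rack still leaves a replica reachable.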
  • 6. Hadoop (Cont) Job Tracker: Assigns MapReduce jobs to task tracker nodes that are close to the data (same rack). Task Tracker: Keeps the work as close to the data as possible. [Figure: racks of DN+TT nodes connected through switches, together with the Job Tracker, Name Node, secondary Job Tracker, and secondary Name Node. DN = Data Node, TT = Task Tracker, NN = Name Node]
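A minimal sketch of the locality preference described here: given where a block's replicas live and which task trackers have a free slot, prefer a tracker in the same rack as the data. The slide only mentions the same-rack case; the extra node-local tier, the fallback order, and the function name `pick_task_tracker` are assumptions for illustration.

```python
def pick_task_tracker(block_replicas, free_trackers):
    """Choose a task tracker for a map task over one block.

    block_replicas: list of (rack, node) pairs that hold the block.
    free_trackers:  list of (rack, node) pairs with a free task slot.
    Preference order (assumed): same node, then same rack, then any.
    """
    replica_nodes = {node for _, node in block_replicas}
    replica_racks = {rack for rack, _ in block_replicas}
    for rack, node in free_trackers:
        if node in replica_nodes:
            return (rack, node), "node-local"
    for rack, node in free_trackers:
        if rack in replica_racks:
            return (rack, node), "rack-local"
    if free_trackers:
        # Data must cross the inter-rack switches: the slowest option.
        return free_trackers[0], "off-rack"
    return None, "no free tracker"
```

The second element of the result records how far the data has to travel, which is the quantity the job tracker is trying to minimize.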
  • 7. Networking Requirements for Big Data 1. Code/Data Collocation: The data for map jobs should be at the processors that are going to map. 2. Elastic bandwidth: to match the variability of volume. 3. Fault/Error Handling: If a processor fails, its task needs to be assigned to another processor. 4. Security: Access control (authorized users only), privacy (encryption), threat detection, all in real-time in a highly scalable manner. 5. Synchronization: The map jobs should be comparable in size so that they finish together. Similarly, reduce jobs should be comparable.
  • 8. Recent Developments in Networking 1. High-Speed: 100 Gbps Ethernet ⇒ 400 Gbps ⇒ 1000 Gbps ⇒ Cheap storage access. Easy to move big data. 2. Virtualization 3. Software Defined Networking 4. Network Function Virtualization
  • 9. Virtualization (Cont) Recent networking technologies and standards allow: 1. Virtualizing Computation 2. Virtualizing Storage 3. Virtualizing Rack Storage Connectivity 4. Virtualizing Data Center Storage 5. Virtualizing Metro and Global Storage
  • 10. 1. Virtualizing Computation. Initially data centers consisted of multiple IP subnets. Each subnet = One Ethernet network. Ethernet addresses are globally unique and do not change. IP addresses are locators and change every time you move. If a VM moves inside a subnet ⇒ No change to IP address ⇒ Fast. If a VM moves from one subnet to another ⇒ Its IP address changes ⇒ All connections break ⇒ Slow ⇒ Limited VM mobility. IEEE 802.1ad-2005 Ethernet Provider Bridging (PB) and IEEE 802.1ah-2008 Provider Backbone Bridging (PBB) allow Ethernets to span long distances ⇒ Global VM mobility. [Figure: a data center containing several subnets]
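The subnet rule on this slide can be checked with Python's standard `ipaddress` module. This is a small sketch with made-up addresses; the helper name `move_keeps_ip` is hypothetical.

```python
import ipaddress

def move_keeps_ip(vm_ip, destination_subnet):
    """Return True if the VM's current IP is still valid after the move.

    True  -> the VM stays inside its subnet: address and connections survive.
    False -> the VM needs a new IP address, so open connections break.
    """
    return ipaddress.ip_address(vm_ip) in ipaddress.ip_network(destination_subnet)
```

For instance, `move_keeps_ip("10.0.1.25", "10.0.1.0/24")` is true, while moving the same VM to `10.0.2.0/24` returns false. Stretching one Ethernet (and therefore one subnet) over long distances is what removes the second case.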
  • 11. 2. Virtualizing Storage. Initially data centers used Storage Area Networks (Fibre Channel) for server-to-storage communications and Ethernet for server-to-server communication. [Figure: servers attached to both an Ethernet fabric and a Fibre Channel fabric that leads to storage] IEEE added 4 new standards to make Ethernet offer low loss, low latency service like Fibre Channel: Priority-based Flow Control (IEEE 802.1Qbb-2011); Enhanced Transmission Selection (IEEE 802.1Qaz-2011); Congestion Control (IEEE 802.1Qau-2010); Data Center Bridging Exchange (IEEE 802.1Qaz-2011). Result: Unified networking ⇒ Significant CapEx/OpEx saving
  • 12. 3. Virtualizing Rack Storage Connectivity. MapReduce jobs are assigned to the nodes that have the data. Job tracker assigns jobs to task trackers in the rack where the data is. High-speed Ethernet can get the data in the same rack. The Peripheral Component Interconnect (PCI) Special Interest Group (SIG)’s Single Root I/O Virtualization (SR-IOV) allows a storage device to be virtualized and shared among multiple VMs. [Figure: a physical machine (pM) whose VMs share one PCIe device through virtual PCIe (vPCIe) interfaces]
  • 13. Multi-Root IOV. PCI-SIG Multi-Root I/O Virtualization (MR-IOV) standard allows one or more PCIe cards to serve multiple servers and VMs in the same rack. Fewer adapters ⇒ Less cooling. No adapters ⇒ Thinner servers. [Figure: several physical machines (pM), each hosting VMs, connected through a PCIe fabric to shared PCIe cards that expose vPCIe interfaces]
  • 14. 4. Virtualizing Data Center Storage. IEEE 802.1BR-2012 Virtual Bridge Port Extension (VBE) allows multiple switches to combine into a very large switch. Storage and computers located anywhere in the data center appear as if connected to the same switch. [Figure: a parent switch and port extenders forming one distributed vSwitch that connects VMs and storage]
  • 15. 5. Virtualizing Metro Storage. Data center interconnection standards: Virtual Extensible LAN (VXLAN), Network Virtualization using GRE (NVGRE), and Transparent Interconnection of Lots of Links (TRILL) ⇒ allow data centers located far away to appear to be on the same Ethernet. [Figure: Data Center 1 and Data Center 2 joined into one Ethernet] Ref: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-04, http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-03, RFC 5556
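To make the VXLAN idea concrete, the sketch below builds the 8-byte VXLAN header that is prepended to an inner Ethernet frame, following the layout later published as RFC 7348 (8 flag bits, 24 reserved bits, a 24-bit VXLAN Network Identifier, 8 reserved bits). The outer UDP, IP, and Ethernet headers are omitted, and the function names are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value

def vxlan_header(vni):
    """Pack the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame (bytes) with a VXLAN header."""
    return vxlan_header(vni) + inner_frame

def decapsulate(payload):
    """Split a VXLAN payload back into (vni, inner_frame)."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[8:]
```

Because the inner frame travels unchanged inside the encapsulation, two data centers that share a VNI can exchange Ethernet frames as if they were on the same LAN, which is the effect this slide describes.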
  • 16. Virtualizing the Global Storage. Energy Sciences Network (ESNet) uses a virtual switch to connect members located all over the world. Virtualization ⇒ Fluid networks ⇒ The world is flat ⇒ You draw your network ⇒ Everything is virtually local. Ref: I. Monga, “Software Defined Networking for Big-data Science,” http://www.es.net/assets/pubs_presos/Monga-WAN-Switch-SC12SRS.pdf
  • 17. Software Defined Networking. Centralized programmable control plane. Allows automated orchestration (provisioning) of a large number of virtual resources (machines, networks, storage). Large Hadoop topologies can be created on demand. [Figure labels: Network Manager, Policies, Controller, Control]
  • 18. Network Function Virtualization (NFV). Fast standard hardware ⇒ Software-based devices: virtual networking modules (DHCP, Firewall, DNS, …) running on standard processors. Modules can be combined to create any combination of functions for data privacy, access control, … Virtual Machine implementation ⇒ Quick provisioning. Standard Application Programming Interfaces (APIs) ⇒ Networking App Market ⇒ Privacy and security for big data in multi-tenant clouds. [Figure: Router = DHCP, NAT, Forwarding, and QoS modules running in VMs on a hypervisor]
  • 19. Big Data for Networking. Today’s data center: tens of tenants; hundreds of switches and routers; thousands of servers; hundreds of administrators. Tomorrow: 1k of clients; 10k of pSwitches; 100k of vSwitches; 1M of VMs; tens of administrators. Need to monitor traffic patterns and rearrange virtual networks connecting millions of VMs in real-time ⇒ Managing clouds is a real-time big data problem. Internet of things ⇒ Big Data generation and analytics
  • 20. Summary 1. I/O virtualization allows all storage in the rack to appear local to any VM in that rack ⇒ Solves the co-location problem of MapReduce. 2. Network virtualization allows storage anywhere in the data center or even other data centers to appear local. 3. Software defined networking allows orchestration of a large number of resources ⇒ Dynamic creation of Hadoop clusters. 4. Network function virtualization will allow these clusters to have special functions and security in multi-tenant clouds.
  • 21. Acronyms. ADCOM: Advanced Computing and Communications; API: Application Programming Interface; CapEx: Capital Expenditure; DARPA: Defense Advanced Research Projects Agency; DHCP: Dynamic Host Configuration Protocol; DN: Data Node; DNS: Domain Name System; DoD: Department of Defense; DOE: Department of Energy; ESNet: Energy Sciences Network; GDP: Gross Domestic Product; GRE: Generic Routing Encapsulation; HDFS: Hadoop Distributed File System; IEEE: Institute of Electrical and Electronics Engineers; IOV: I/O Virtualization; IP: Internet Protocol
  • 22. Acronyms (Cont). LAN: Local Area Network; MR-IOV: Multi-Root I/O Virtualization; NAT: Network Address Translation; NFV: Network Function Virtualization; NN: Name Node; NSA: National Security Agency; OpEx: Operational Expenditure; PB: Provider Bridging; PBB: Provider Backbone Bridging; PCI-SIG: PCI Special Interest Group; PCI: Peripheral Component Interconnect; PCIe: PCI Express; pM: Physical Machine; pSwitch: Physical Switch; QoS: Quality of Service; RFC: Request for Comments
  • 23. Acronyms (Cont). SDN: Software Defined Networking; SR-IOV: Single Root I/O Virtualization; TRILL: Transparent Interconnection of Lots of Links; TT: Task Tracker; USGS: United States Geological Survey; VBE: Virtual Bridge Port Extension; VM: Virtual Machine; vSwitch: Virtual Switch; WAN: Wide-Area Network