SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Experiences in providing
secure multi-tenant Lustre
access to OpenStack
Peter Clapham <pc7@sanger.ac.uk>
Wellcome Sanger Institute
Sanger Science
Scientific Research
Programmes
Core Facilities
HPC and Cloud computing are
complementary
Traditional HPC
● Highest possible performance
● A mature and centrally
managed compute platform
● High performance Lustre
filesystems for data intensive
analysis
Flexible Compute
● Full segregation of projects
ensures data security
● Developers no longer tied to a
single stack
● Reproducibility through
containers / images and
infrastructure-as-code
But there’s a catch or two...
• Large number of traditional/legacy pipelines
• They require a performant shared POSIX filesystem, while cloud
workloads support object stores
• We do not always have the source code or expertise to migrate
• We need multi-gigabyte per second performance
• The tenant will have root
• and could impersonate any user, but Lustre trusts the client’s identity
assertions, just like NFSv3
• The solution must be simple for the tenant and administrator
Lustre hardware
6+ year old hardware
• 4x Lustre object storage servers
• Dual Intel E5620 @ 2.40GHz
• 256GB RAM
• Dual 10G network
• lustre: 2.9.0.ddnsec2
• https://jira.hpdd.intel.com/browse/LU-9289 (landed
in 2.10)
• OSTs from DDN SFA-10k
• 300x SATA, 7200rpm , 1TB spindles
We have seen this system reach 6 GByte/second in production
Lustre 2.9 features
• Each tenant’s I/O can be squashed to their own unique UID/GID
• Each tenant is restricted to their own subdirectory of the Lustre
filesystem
It might be possible to treat general access outside of OpenStack as a
separate tenant with:
• a UID space reserved for a number of OpenStack tenants
• only a subdirectory exported for standard usage
Logical layout Public network
tcp32769 Provider Network
Lustre router
Lustre serverstcp0
Tenant network
tcp32770 Provider Network
Tenant network
Lustre server
Per-tenant UID mapping
Allows UIDs from a set of NIDs to be mapped to another set of UIDs
These commands are run on the MGS:
lctl nodemap_add ${TENANT_NAME}
lctl nodemap_modify --name ${TENANT_NAME} --property trusted --value 0
lctl nodemap_modify --name ${TENANT_NAME} --property admin --value 0
lctl nodemap_modify --name ${TENANT_NAME} --property squash_uid --value
${TENANT_UID}
lctl nodemap_modify --name ${TENANT_NAME} --property squash_gid --value
${TENANT_UID}
lctl nodemap_add_idmap --name ${TENANT_NAME} --idtype uid --idmap
1000:${TENANT_UID}
Lustre server:
Per-tenant subtree restriction
Constraints client access to a subdirectory of a filesystem.
mkdir /lustre/secure/${TENANT_NAME}
chown ${TENANT_NAME} /lustre/secure/${TENANT_NAME}
Set the subtree root directory for the tenant:
lctl set_param -P
nodemap.${TENANT_NAME}.fileset=/${TENANT_NAME}
Lustre server:
Map nodemap to network
Add the tenant network range to the Lustre nodemap
lctl nodemap_add_range --name ${TENANT_NAME} --range  [0-
255].[0-255].[0-255].[0-255]@tcp${TENANT_UID}
And this command adds a route via a Lustre network router. This is run on all
MDS and OSS (or the route added to /etc/modprobe.d/lustre.conf)
lnetctl route add --net tcp${TENANT_UID} --gateway
${LUSTRE_ROUTER_IP}@tcp
In the same way a similar command is needed on each client using TCP
Openstack:
Network configuration
neutron net-create LNet-1 --shared --provider:network_type vlan 
--provider:physical_network datacentre --provider:segmentation_id 
${TENANT_PROVIDER_VLAN_ID}
neutron subnet-create --enable-dhcp --dns-nameserver 172.18.255.1 --
no-gateway 
--name LNet-subnet-1 --allocation-pool
start=172.27.202.17,end=172.27.203.240  172.27.202.0/23
${NETWORK_UUID}
openstack role create LNet-1_ok
Squashing roles continued
For each tenant user that needs to create instances attached to this Lustre
network:
openstack role add --project ${TENANT_UUID} --user
${USER_UUID} ${ROLE_ID}
BUT as the openstack cli and server tools mature, we can now use
RBAC:
openstack network rbac create --type network --action
access_as_shared --target-project PROJECT_UUID NETWORK_UUID
Openstack policy
Simplify automation by minimal change to /etc/neutron/policy.json
"get_network": "rule:get_network_local"
/etc/neutron/policy.d/get_networks_local.jsonthen defines the
new rule:
{
"get_network_local": "rule:admin_or_owner or rule:external
or rule:context_is_advsvc or rule:show_providers or ( not
rule:provider_networks and rule:shared )"
}
Openstack policy/etc/neutron/policy.d/provider.json is used to define networks
and their mapping to roles, and allow access to the provider network.
{
"net_LNet-1": "field:networks:id=d18f2aca-163b-4fc7-a493-
237e383c1aa9",
"show_LNet-1": "rule:net_LNet-1 and role:LNet-1_ok",
"net_LNet-2": "field:networks:id=169b54c9-4292-478b-ac72-
272725a26263",
"show_LNet-2": "rule:net_LNet-2 and role:LNet-2_ok",
"provider_networks": "rule:net_LNet-1 or rule:net_LNet-2",
"show_providers": "rule:show_LNet-1 or rule:show_LNet-2"
}
Restart Neutron - can be disruptive!
1.75 pt
Physical router
configuration
• Repurposed Nova compute node
• RedHat 7.3
• Lustre 2.9.0.ddnsec2
• Mellanox ConnectX-4 (2*25GbE)
• Dual Intel E5-2690 v4 @ 2.60GHz
• 512 GB RAM
Connected in a single rack so packets from other racks will have to
transverse the spine. No changes from default settings.
Client virtual machines
• 2 CPU
• 4 GB RAM
• CentOS Linux release 7.3.1611 (Core)
• Lustre: 2.9.0.ddnsec2
• Two NICs
• Tenant network
• Tenant-specific Lustre provider network
Single client read
performance
Filesets and uid mapping have no effect
Instance size has little effect
Single client write
performance
Multiple VMs, aggregate
write performance, metal
LNet routers
Multiple VMs, aggregate
read performance, metal
LNet routers
Virtualised Lustre routers
• We could see that bare metal Lustre routers gave acceptable
performance
• We wanted to know if we could virtualise these routers
• Each tenant could have their own set of virtual routers
• Fault isolation
• Ease of provisioning routers
• No additional cost
• Increases east-west traffic, but that’s OK.
Logical layout
Public network
Tenant network
tcp32769 Provider Network
Lustre router
Lustre servers
tcp0
Provider Network
Tenant network
Lustre router
tcp32770 Provider Network
Improved security
As each tenant has its own set of Lustre routers:
• The traffic to a different tenant does not go to a shared router
• A Lustre router could be compromised without directly compromising
another tenant’s data - the filesystem servers will not route data for
@tcp1 to the router @tcp2
• Either a second Lustre router or the Lustre servers would need to be
compromised to intercept or reroute the data
Port security...
The routed Lustre provider network ( tcp32769 etc) required that port security
was disabled on the virtual Lustre router ports.
neutron port-list | grep 172.27.70.36 | awk '{print $2}'
08a1808a-fe4a-463c-b755-397aedd0b36c
neutron port-update --no-security-groups 08a1808a-fe4a-463c-
b755-397aedd0b36c
neutron port-update 08a1808a-fe4a-463c-b755-397aedd0b36c  --
port-security-enabled=False
http://kimizhang.com/neutron-ml2-port-security/
But ! this was due to a race condition in Liberty release. We can now avoid
this by adding the lustre provider network interface when the instance is
created
Virtual Lnet routers:
Sequential performance
Virtual Lnet routers:
Random Performance
Asymmetric routing?
Public network
Lustre routers
Lustre servers
tcp0
Tenant network
tcp32770 Provider Network
Hostile system
(e.g. laptop)
tcp0
tcp0
http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html
Conclusions
• Follow our activities on http://hpc-news.sanger.ac.uk
• Isolated POSIX islands can be deployed to OpenStack with Lustre
2.9+
• Performance is acceptable
• Lustre routers require little CPU and memory
• Physical routers work and can give good locality for network usage
• Virtual routers work, can scale and give additional security benefits
• Next steps:
• Improve configuration automation
• Understand the neutron port security issue
• Improve network performance (MTU, OpenVSwitch etc).
Acknowledgements
DDN: Sébastien Buisson, Thomas Favre-Bulle, Richard Mansfield, James Coomer
Sanger Informatics Systems Group: Pete Clapham, James Beal, John Constable, Helen Cousins,
Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon

Weitere ähnliche Inhalte

Was ist angesagt?

Open daylight and Openstack
Open daylight and OpenstackOpen daylight and Openstack
Open daylight and OpenstackDave Neary
 
L3HA-VRRP-20141201
L3HA-VRRP-20141201L3HA-VRRP-20141201
L3HA-VRRP-20141201Manabu Ori
 
OpenStack networking (Neutron)
OpenStack networking (Neutron) OpenStack networking (Neutron)
OpenStack networking (Neutron) CREATE-NET
 
Introduction to Software Defined Networking and OpenStack Neutron
Introduction to Software Defined Networking and OpenStack NeutronIntroduction to Software Defined Networking and OpenStack Neutron
Introduction to Software Defined Networking and OpenStack NeutronSana Khan
 
Neutron behind the scenes
Neutron   behind the scenesNeutron   behind the scenes
Neutron behind the scenesinbroker
 
OpenStack Networking and Automation
OpenStack Networking and AutomationOpenStack Networking and Automation
OpenStack Networking and AutomationAdam Johnson
 
OpenStack Neutron Tutorial
OpenStack Neutron TutorialOpenStack Neutron Tutorial
OpenStack Neutron Tutorialmestery
 
Open stack Architecture and Use Cases
Open stack Architecture and Use CasesOpen stack Architecture and Use Cases
Open stack Architecture and Use CasesAhmad Tfaily
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationCumulus Networks
 
OpenStack DVR_What is DVR?
OpenStack DVR_What is DVR?OpenStack DVR_What is DVR?
OpenStack DVR_What is DVR?Yongyoon Shin
 
OpenStack Neutron Havana Overview - Oct 2013
OpenStack Neutron Havana Overview - Oct 2013OpenStack Neutron Havana Overview - Oct 2013
OpenStack Neutron Havana Overview - Oct 2013Edgar Magana
 
OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석Yongyoon Shin
 
MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...
MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...
MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...IJNSA Journal
 
How deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performanceHow deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performanceCumulus Networks
 
Open stack networking_101_update_2014
Open stack networking_101_update_2014Open stack networking_101_update_2014
Open stack networking_101_update_2014yfauser
 
Openstack Basic with Neutron
Openstack Basic with NeutronOpenstack Basic with Neutron
Openstack Basic with NeutronKwonSun Bae
 

Was ist angesagt? (19)

Open daylight and Openstack
Open daylight and OpenstackOpen daylight and Openstack
Open daylight and Openstack
 
L3HA-VRRP-20141201
L3HA-VRRP-20141201L3HA-VRRP-20141201
L3HA-VRRP-20141201
 
OpenStack networking (Neutron)
OpenStack networking (Neutron) OpenStack networking (Neutron)
OpenStack networking (Neutron)
 
Introduction to Software Defined Networking and OpenStack Neutron
Introduction to Software Defined Networking and OpenStack NeutronIntroduction to Software Defined Networking and OpenStack Neutron
Introduction to Software Defined Networking and OpenStack Neutron
 
Mininet demo
Mininet demoMininet demo
Mininet demo
 
Neutron behind the scenes
Neutron   behind the scenesNeutron   behind the scenes
Neutron behind the scenes
 
Quic illustrated
Quic illustratedQuic illustrated
Quic illustrated
 
OpenStack Networking and Automation
OpenStack Networking and AutomationOpenStack Networking and Automation
OpenStack Networking and Automation
 
OpenStack Neutron Tutorial
OpenStack Neutron TutorialOpenStack Neutron Tutorial
OpenStack Neutron Tutorial
 
Open stack Architecture and Use Cases
Open stack Architecture and Use CasesOpen stack Architecture and Use Cases
Open stack Architecture and Use Cases
 
NetDevOps 202: Life After Configuration
NetDevOps 202: Life After ConfigurationNetDevOps 202: Life After Configuration
NetDevOps 202: Life After Configuration
 
OpenStack DVR_What is DVR?
OpenStack DVR_What is DVR?OpenStack DVR_What is DVR?
OpenStack DVR_What is DVR?
 
OpenStack Neutron Havana Overview - Oct 2013
OpenStack Neutron Havana Overview - Oct 2013OpenStack Neutron Havana Overview - Oct 2013
OpenStack Neutron Havana Overview - Oct 2013
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
 
OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석OpenStack networking-sfc flow 분석
OpenStack networking-sfc flow 분석
 
MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...
MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...
MICRO ROTOR ENHANCED BLOCK CIPHER DESIGNED FOR EIGHT BITS MICRO-CONTROLLERS (...
 
How deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performanceHow deep is your buffer – Demystifying buffers and application performance
How deep is your buffer – Demystifying buffers and application performance
 
Open stack networking_101_update_2014
Open stack networking_101_update_2014Open stack networking_101_update_2014
Open stack networking_101_update_2014
 
Openstack Basic with Neutron
Openstack Basic with NeutronOpenstack Basic with Neutron
Openstack Basic with Neutron
 

Ähnlich wie Enabling a Secure Multi-Tenant Environment for HPC

Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean WinnCouch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean WinnTrevor Roberts Jr.
 
Private cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austinPrivate cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austinChiradeep Vittal
 
DevOops - Lessons Learned from an OpenStack Network Architect
DevOops - Lessons Learned from an OpenStack Network ArchitectDevOops - Lessons Learned from an OpenStack Network Architect
DevOops - Lessons Learned from an OpenStack Network ArchitectJames Denton
 
(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014
(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014
(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014Amazon Web Services
 
slides-frnog34.pdf
slides-frnog34.pdfslides-frnog34.pdf
slides-frnog34.pdfjnbrains
 
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud BoundariesGDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud BoundariesJames Anderson
 
D108636GC10_les01.pptx
D108636GC10_les01.pptxD108636GC10_les01.pptx
D108636GC10_les01.pptxSuresh569521
 
[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN
[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN
[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDNOpenStack Korea Community
 
20151222_Interoperability with ML2: LinuxBridge, OVS and SDN
20151222_Interoperability with ML2: LinuxBridge, OVS and SDN20151222_Interoperability with ML2: LinuxBridge, OVS and SDN
20151222_Interoperability with ML2: LinuxBridge, OVS and SDNSungman Jang
 
Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)Sneeker Yeh
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
TakeDownCon Rocket City: Bending and Twisting Networks by Paul Coggin
TakeDownCon Rocket City: Bending and Twisting Networks by Paul CogginTakeDownCon Rocket City: Bending and Twisting Networks by Paul Coggin
TakeDownCon Rocket City: Bending and Twisting Networks by Paul CogginEC-Council
 
Dockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and NovaDockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and Novaclayton_oneill
 

Ähnlich wie Enabling a Secure Multi-Tenant Environment for HPC (20)

Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean WinnCouch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
Couch to OpenStack: Neutron (Quantum) - August 13, 2013 Featuring Sean Winn
 
Neutron DVR
Neutron DVRNeutron DVR
Neutron DVR
 
Private cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austinPrivate cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austin
 
nested-kvm
nested-kvmnested-kvm
nested-kvm
 
DevOops - Lessons Learned from an OpenStack Network Architect
DevOops - Lessons Learned from an OpenStack Network ArchitectDevOops - Lessons Learned from an OpenStack Network Architect
DevOops - Lessons Learned from an OpenStack Network Architect
 
Neutron qos overview
Neutron qos overviewNeutron qos overview
Neutron qos overview
 
(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014
(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014
(ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014
 
10 sdn-vir-6up
10 sdn-vir-6up10 sdn-vir-6up
10 sdn-vir-6up
 
slides-frnog34.pdf
slides-frnog34.pdfslides-frnog34.pdf
slides-frnog34.pdf
 
Mikro tik
Mikro tikMikro tik
Mikro tik
 
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud BoundariesGDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
GDG Cloud Southlake #9 Secure Cloud Networking - Beyond Cloud Boundaries
 
D108636GC10_les01.pptx
D108636GC10_les01.pptxD108636GC10_les01.pptx
D108636GC10_les01.pptx
 
[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN
[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN
[OpenStack 하반기 스터디] Interoperability with ML2: LinuxBridge, OVS and SDN
 
20151222_Interoperability with ML2: LinuxBridge, OVS and SDN
20151222_Interoperability with ML2: LinuxBridge, OVS and SDN20151222_Interoperability with ML2: LinuxBridge, OVS and SDN
20151222_Interoperability with ML2: LinuxBridge, OVS and SDN
 
Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)Introduction to netlink in linux kernel (english)
Introduction to netlink in linux kernel (english)
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
SDN approach.pptx
SDN approach.pptxSDN approach.pptx
SDN approach.pptx
 
OpenStack sdn
OpenStack sdnOpenStack sdn
OpenStack sdn
 
TakeDownCon Rocket City: Bending and Twisting Networks by Paul Coggin
TakeDownCon Rocket City: Bending and Twisting Networks by Paul CogginTakeDownCon Rocket City: Bending and Twisting Networks by Paul Coggin
TakeDownCon Rocket City: Bending and Twisting Networks by Paul Coggin
 
Dockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and NovaDockerizing the Hard Services: Neutron and Nova
Dockerizing the Hard Services: Neutron and Nova
 

Mehr von inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 

Mehr von inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Kürzlich hochgeladen

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Enabling a Secure Multi-Tenant Environment for HPC

  • 1. Experiences in providing secure multi-tenant Lustre access to OpenStack Peter Clapham <pc7@sanger.ac.uk> Wellcome Sanger Institute
  • 3. HPC and Cloud computing are complementary Traditional HPC ● Highest possible performance ● A mature and centrally managed compute platform ● High performance Lustre filesystems for data intensive analysis Flexible Compute ● Full segregation of projects ensures data security ● Developers no longer tied to a single stack ● Reproducibility through containers / images and infrastructure-as-code
  • 4. But there’s a catch or two... • Large number of traditional/legacy pipelines • They require a performant shared POSIX filesystem, while cloud workloads support object stores • We do not always have the source code or expertise to migrate • We need multi-gigabyte per second performance • The tenant will have root • and could impersonate any user, but Lustre trusts the client’s identity assertions, just like NFSv3 • The solution must be simple for the tenant and administrator
  • 5. Lustre hardware 6+ year old hardware • 4x Lustre object storage servers • Dual Intel E5620 @ 2.40GHz • 256GB RAM • Dual 10G network • lustre: 2.9.0.ddnsec2 • https://jira.hpdd.intel.com/browse/LU-9289 (landed in 2.10) • OSTs from DDN SFA-10k • 300x SATA, 7200rpm , 1TB spindles We have seen this system reach 6 GByte/second in production
  • 6.
  • 7. Lustre 2.9 features • Each tenant’s I/O can be squashed to their own unique UID/GID • Each tenant is restricted to their own subdirectory of the Lustre filesystem It might be possible to treat general access outside of OpenStack as a separate tenant with: • a UID space reserved for a number of OpenStack tenants • only a subdirectory exported for standard usage
  • 8. Logical layout Public network tcp32769 Provider Network Lustre router Lustre serverstcp0 Tenant network tcp32770 Provider Network Tenant network
  • 9. Lustre server Per-tenant UID mapping Allows UIDs from a set of NIDs to be mapped to another set of UIDs These commands are run on the MGS: lctl nodemap_add ${TENANT_NAME} lctl nodemap_modify --name ${TENANT_NAME} --property trusted --value 0 lctl nodemap_modify --name ${TENANT_NAME} --property admin --value 0 lctl nodemap_modify --name ${TENANT_NAME} --property squash_uid --value ${TENANT_UID} lctl nodemap_modify --name ${TENANT_NAME} --property squash_gid --value ${TENANT_UID} lctl nodemap_add_idmap --name ${TENANT_NAME} --idtype uid --idmap 1000:${TENANT_UID}
  • 10. Lustre server: Per-tenant subtree restriction Constraints client access to a subdirectory of a filesystem. mkdir /lustre/secure/${TENANT_NAME} chown ${TENANT_NAME} /lustre/secure/${TENANT_NAME} Set the subtree root directory for the tenant: lctl set_param -P nodemap.${TENANT_NAME}.fileset=/${TENANT_NAME}
  • 11. Lustre server: Map nodemap to network Add the tenant network range to the Lustre nodemap lctl nodemap_add_range --name ${TENANT_NAME} --range [0- 255].[0-255].[0-255].[0-255]@tcp${TENANT_UID} And this command adds a route via a Lustre network router. This is run on all MDS and OSS (or the route added to /etc/modprobe.d/lustre.conf) lnetctl route add --net tcp${TENANT_UID} --gateway ${LUSTRE_ROUTER_IP}@tcp In the same way a similar command is needed on each client using TCP
  • 12. Openstack: Network configuration neutron net-create LNet-1 --shared --provider:network_type vlan --provider:physical_network datacentre --provider:segmentation_id ${TENANT_PROVIDER_VLAN_ID} neutron subnet-create --enable-dhcp --dns-nameserver 172.18.255.1 -- no-gateway --name LNet-subnet-1 --allocation-pool start=172.27.202.17,end=172.27.203.240 172.27.202.0/23 ${NETWORK_UUID} openstack role create LNet-1_ok
  • 13. Squashing roles continued For each tenant user that needs to create instances attached to this Lustre network: openstack role add --project ${TENANT_UUID} --user ${USER_UUID} ${ROLE_ID} BUT as the openstack cli and server tools mature, we can now use RBAC: openstack network rbac create --type network --action access_as_shared --target-project PROJECT_UUID NETWORK_UUID
  • 14. Openstack policy Simplify automation by minimal change to /etc/neutron/policy.json "get_network": "rule:get_network_local" /etc/neutron/policy.d/get_networks_local.jsonthen defines the new rule: { "get_network_local": "rule:admin_or_owner or rule:external or rule:context_is_advsvc or rule:show_providers or ( not rule:provider_networks and rule:shared )" }
  • 15. Openstack policy/etc/neutron/policy.d/provider.json is used to define networks and their mapping to roles, and allow access to the provider network. { "net_LNet-1": "field:networks:id=d18f2aca-163b-4fc7-a493- 237e383c1aa9", "show_LNet-1": "rule:net_LNet-1 and role:LNet-1_ok", "net_LNet-2": "field:networks:id=169b54c9-4292-478b-ac72- 272725a26263", "show_LNet-2": "rule:net_LNet-2 and role:LNet-2_ok", "provider_networks": "rule:net_LNet-1 or rule:net_LNet-2", "show_providers": "rule:show_LNet-1 or rule:show_LNet-2" } Restart Neutron - can be disruptive! 1.75 pt
  • 16. Physical router configuration • Repurposed Nova compute node • RedHat 7.3 • Lustre 2.9.0.ddnsec2 • Mellanox ConnectX-4 (2*25GbE) • Dual Intel E5-2690 v4 @ 2.60GHz • 512 GB RAM Connected in a single rack so packets from other racks will have to transverse the spine. No changes from default settings.
  • 17. Client virtual machines • 2 CPU • 4 GB RAM • CentOS Linux release 7.3.1611 (Core) • Lustre: 2.9.0.ddnsec2 • Two NICs • Tenant network • Tenant-specific Lustre provider network
  • 18. Single client read performance Filesets and uid mapping have no effect Instance size has little effect
  • 20. Multiple VMs, aggregate write performance, metal LNet routers
  • 21. Multiple VMs, aggregate read performance, metal LNet routers
  • 22. Virtualised Lustre routers • We could see that bare metal Lustre routers gave acceptable performance • We wanted to know if we could virtualise these routers • Each tenant could have their own set of virtual routers • Fault isolation • Ease of provisioning routers • No additional cost • Increases east-west traffic, but that’s OK.
  • 23. Logical layout Public network Tenant network tcp32769 Provider Network Lustre router Lustre servers tcp0 Provider Network Tenant network Lustre router tcp32770 Provider Network
  • 24. Improved security As each tenant has its own set of Lustre routers: • The traffic to a different tenant does not go to a shared router • A Lustre router could be compromised without directly compromising another tenant’s data - the filesystem servers will not route data for @tcp1 to the router @tcp2 • Either a second Lustre router or the Lustre servers would need to be compromised to intercept or reroute the data
  • 25. Port security... The routed Lustre provider network ( tcp32769 etc) required that port security was disabled on the virtual Lustre router ports. neutron port-list | grep 172.27.70.36 | awk '{print $2}' 08a1808a-fe4a-463c-b755-397aedd0b36c neutron port-update --no-security-groups 08a1808a-fe4a-463c- b755-397aedd0b36c neutron port-update 08a1808a-fe4a-463c-b755-397aedd0b36c -- port-security-enabled=False http://kimizhang.com/neutron-ml2-port-security/ But ! this was due to a race condition in Liberty release. We can now avoid this by adding the lustre provider network interface when the instance is created
  • 28. Asymmetric routing? Public network Lustre routers Lustre servers tcp0 Tenant network tcp32770 Provider Network Hostile system (e.g. laptop) tcp0 tcp0 http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html
  • 29. Conclusions • Follow our activities on http://hpc-news.sanger.ac.uk • Isolated POSIX islands can be deployed to OpenStack with Lustre 2.9+ • Performance is acceptable • Lustre routers require little CPU and memory • Physical routers work and can give good locality for network usage • Virtual routers work, can scale and give additional security benefits • Next steps: • Improve configuration automation • Understand the neutron port security issue • Improve network performance (MTU, OpenVSwitch etc).
  • 30. Acknowledgements DDN: Sébastien Buisson, Thomas Favre-Bulle, Richard Mansfield, James Coomer Sanger Informatics Systems Group: Pete Clapham, James Beal, John Constable, Helen Cousins, Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon