RNP 5th J-PAS 11-Nov-2012
1. RNP
Brazilian National Education and Research Network
ICT Support for Large-scale Science
5th J-PAS collaboration meeting / 11-Sep-2012
Leandro N. Ciuffo Alex S. Moura
leandro.ciuffo@rnp.br alex.moura@rnp.br
3. Qualified as a non-profit Social Organization (OS)
• Maintained by federal public resources
• Government budget includes items to cover costs of the network and also RNP operating costs
• Support by MCTI, MEC and now MinC (Culture)
• RNP monitored by MCTI, CGU and TCU
• Additional projects supported by sectorial funds directly or through the management contract, plus MS (Health)
• “Research Unit” of the Ministry of S&T
[Figure labels: INPE - Instituto Nacional de Pesquisas Espaciais; LNCC - Laboratório Nacional de Computação Científica; INPA - Instituto Nacional de Pesquisas da Amazônia]
http://www.mct.gov.br/index.php/content/view/741.html
4. Build and operate R&E networks
• Maintenance and continued renewal of infrastructure
• RNP backbone of 2000 has been renewed 3 times (2004, 2005 and
2011), with large increases in maximum link capacity from 25 Mbps to 10
Gbps (factor of 400)
• Metro networks have been built in capital cities to provide access to
Point of Presence (PoP) at 1 Gbps or more
www.redecomep.rnp.br
• International capacity has increased from 300 Mbps in 2002 to over 20
Gbps since 2009 (factor of 70). RNP has also played a major role in building
the RedCLARA (Latin American regional) network, linking R&E networks from
more than 12 countries – www.redclara.net
• Testbed networks for network experimentation, especially
project GIGA (with CPqD) since 2003 and the EU-BR FIBRE project (2011-
2014)
5. Ipê Network
RNP Backbone
[Map labels: Boa Vista, Macapá, Fortaleza, Manaus, Salvador, Brasilia, São Paulo, Rio de Janeiro, Florianópolis, Porto Alegre; external links: Commodity Internet, RedCLARA (to Europe), Americas Light (to USA)]
Bandwidth
Minimum: 20 Mbps
Maximum: 10 Gbps
Aggregated: 250 Gbps
http://www.rnp.br/backbone/
7. Metro Networks
• 23 cities operational
• 6 under deployment
• 13 planned
• 1980 km
[Diagram labels: Institution A, Institution B, Institution C, RNP PoP, to Ipê network]
http://redecomep.rnp.br
12. Science paradigms evolution
Data-intensive research
unify theory, experiment and simulation at scale. “big data”
Computational Simulations
simulating complex phenomena. “in silico”
Theoretical Modeling
e.g. Kepler's and Newton's laws
Empirical Science
describing natural phenomena
13. Key components of a new research infrastructure
[Diagram labels: Scientific portal; Local services and workflows; Big Data Processing; Registering; Discovering; Publishers; Harvesting & Indexing; Users; Publishing; Information Visualization; Data Repositories; Instruments]
15. Network Requirements Workshop
1. What science is being done?
2. What instruments and facilities are used?
3. What is the process/workflow of science?
4. How are the use of instruments and facilities, the process of
science, and other aspects of the science going to change
over the next 5 years?
5. What is coming beyond 5 years out?
6. Are new instruments or facilities being built, or are there
other significant changes coming?
20. Hybrid Networks
• Since the beginning of the Internet, NRENs have provided the routed IP
service
• Since around 2002, NRENs have begun to provide two network services:
– routed IP (traditional Internet)
– end-to-end virtual circuits (a.k.a. “lightpaths”)
– This lightpath service is intended for users with high QoS needs, usually
guaranteed bandwidth, which is implemented by segregating
their traffic from the general routed IP traffic.
• The GLIF organisation (www.glif.is) coordinates international
21. High bandwidth research connectivity
(lightpaths for supporting international collaboration)
GLIF world map, 2011 http://www.glif.is
22. GLIF links in South America
• RNP networks
– Ipê backbone (29,000 km)
– metro networks in state capitals
• GIGA optical testbed, from RNP and CPqD
– links 20 research institutions in 7 cities (750 km)
• KyaTera research network in S. Paulo
– links research institutions in 11 cities (1500 km)
25. Why?
• R&E networks in Brazil, and especially RNP, are
funded by government agencies to provide quality
network services to the national R&E community
• In most cases, this is handled normally by providing
R&E institutions with a connection to our networks,
which operate standard Internet services of good
quality.
• However, there are times when this is not enough…
26. Network Requirements and
Expectations
• Expected data transfer rates
• As a first step in improving your network
performance, it is critical to have a baseline
understanding of what speed you should expect
from your network connection under ideal
conditions.
• The following shows how long it takes to transfer
1 Terabyte of data across various speed networks:
10 Mbps network: 300h (12.5 days)
100 Mbps network: 30h
1Gbps network: 3h
10Gbps network: 20min
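The baseline can be recomputed for any data size and link speed. The following is a minimal sketch, assuming a decimal terabyte (10^12 bytes) and a link running at full line rate with no protocol overhead; the function name is illustrative. It yields theoretical lower bounds, so the rounded figures on the slide come out somewhat higher than these values.

```python
def transfer_time_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and disk limits."""
    return size_bytes * 8 / rate_bits_per_s / 3600


TERABYTE = 10**12  # assumption: decimal terabyte

# Link speeds from the slide, in bits per second.
links = [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]

for label, rate in links:
    hours = transfer_time_hours(TERABYTE, rate)
    print(f"{label:>8}: {hours:8.2f} h ({hours * 60:9.1f} min)")
```

Comparing a measured transfer against this lower bound gives a quick indication of how far a connection is from ideal conditions.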
28. Inadequate performance for critical
applications
In some cases, the standard Internet services are not
good enough for high-performance or data-intensive
projects.
Sensitive to perturbations caused by security devices:
- Numerous cases of firewalls causing problems
- Often difficult to diagnose
- Router filters can often provide equivalent security without the performance impact
Science and Enterprise network requirements are in conflict
30. Remedies which can be
applied
Tuning of networking software is generally necessary on
high bandwidth and long latency data connections,
because of the peculiarities of TCP implementations
In the case of high QoS requirements it is often
necessary to use lightpaths, to avoid interference with
cross traffic
In many cases, both these
approaches are required
31. The Cipó Experimental Service
• We are now beginning to deploy dynamic circuits as an
experimental service on our network
– This will also interoperate with similar services in other networks.
32. Getting support
• If you need advice or assistance with these network
problems, it is important to get in touch with network
support
1. At your own institution
2. At your state network provider
www.rnp.br/pops/index.php
3. In the case of specific circuit (lightpath) services, you
may contact RNP directly at pd@rnp.br
37. Network Diagnostic Tool (NDT)
• Test the bandwidth from your computer to an
RNP PoP
• São Paulo: http://ndt.pop-sp.rnp.br
• Rio de Janeiro: http://ndt.pop-rj.rnp.br
• Florianópolis: http://ndt.pop-sc.rnp.br
38. Recommended Approach
• On a high-speed network it takes less time to transfer 1 Terabyte of data
than one might expect.
• It is usually sub-optimal to try and get 900 megabits per second of
throughput on a 1 gigabit per second network path in order to move one
or two terabytes of data per day. The disk subsystem can also be a
bottleneck - simple storage systems often have trouble filling a 1 gigabit
per second pipe.
• In general it is not a good idea to try to completely saturate the network,
as you will likely end up causing problems for both yourself and others
trying to use the same link. A good rule of thumb is that for periodic
transfers it should be straightforward to get throughput equivalent to 1/4 to
1/3 of a shared path that has nominal background load.
• For example, if you know your receiving host is connected to 1 Gbps
Ethernet, then a target speed of 150-200 Mbps is reasonable. You can
adjust the number of parallel streams (as described on the tools page)
that you are using to achieve this.
• Many labs and large universities are connected at speeds of at least 1
Gbps, and most LANs are at least 100 Mbps, so if you don't get at least
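
As a rough check on the targets discussed in this slide, the average rate needed to move a given daily volume can be computed directly. This is a minimal sketch, assuming decimal terabytes and a transfer spread evenly over 24 hours; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 3600


def required_mbps(terabytes_per_day: float) -> float:
    """Average rate in Mbps needed to move the given daily volume."""
    bits_per_day = terabytes_per_day * 10**12 * 8  # assumption: decimal terabytes
    return bits_per_day / SECONDS_PER_DAY / 1e6


for volume in (1, 2):
    print(f"{volume} TB/day needs about {required_mbps(volume):.1f} Mbps on average")
```

Under these assumptions, one or two terabytes per day need an average of roughly 93 to 185 Mbps, well below the 900 Mbps that would saturate a 1 Gbps path.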
39. Performance using TCP
• There are 3 important variables (among others)
that affect TCP performance: packet loss, latency
(or RTT - Round Trip Time), and buffer size/window.
All are interrelated.
• The optimal buffer size is twice the product of the
bandwidth and the one-way delay of the link/connection.
Since the RTT is twice the one-way delay, this is:
• buffer size = bandwidth x RTT
• e.g.: if the result of ping is 50 ms and the end-to-end
network is all 1G or 10G Ethernet, the TCP receive
buffers (an operating system parameter) should be:
• 0.05 s x (1 Gbit/s / 8 bits per byte) = 6.25 MBytes
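
The buffer calculation can be repeated for other paths. This is a minimal sketch of the bandwidth x RTT rule from this slide; the function name is illustrative and the result is in decimal megabytes, as on the slide.

```python
def tcp_buffer_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Buffer size = bandwidth x RTT, converted from bits to bytes."""
    return bandwidth_bits_per_s * rtt_seconds / 8


# The slide's example: 50 ms RTT on a 1 Gbps path, then the same RTT at 10 Gbps.
for label, bandwidth in [("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    size = tcp_buffer_bytes(bandwidth, 0.050)
    print(f"{label}, 50 ms RTT: {size / 1e6:.2f} MBytes")
```

At the same RTT, a 10 Gbps path needs ten times the buffer of a 1 Gbps path, which is why default operating system limits are often too small for long, fast links.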
40. TCP Congestion Avoidance
Algorithms
• The TCP reno congestion avoidance algorithm was the default in all
TCP implementations for many years. However, as networks got
faster and faster it became clear that reno would not work well for
high bandwidth delay product networks. To address this a number
of new congestion avoidance algorithms were developed, including:
• reno: Traditional TCP used by almost all other operating systems.
(default)
• cubic: CUBIC-TCP
• bic: BIC-TCP
• htcp: Hamilton TCP
• vegas: TCP Vegas
• westwood: optimized for lossy networks
• Most Linux distributions now use cubic by default, and Windows
now uses compound tcp. If you are using an older version of Linux,
be sure to change the default from reno to cubic or htcp.
• More details can be found at:
http://en.wikipedia.org/wiki/TCP_congestion_avoidance_algorithm
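
To see which algorithm a Linux host is currently using, the kernel exposes the setting under /proc/sys. This is a minimal sketch, assuming a Linux host; the helper names are illustrative, and on other systems the functions simply return None or an empty list.

```python
from pathlib import Path
from typing import List, Optional


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return a sysctl value such as net.ipv4.tcp_congestion_control,
    or None if the file cannot be read (for example on a non-Linux host)."""
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def available_algorithms(root: str = "/proc/sys") -> List[str]:
    """List the congestion control algorithms currently available to the kernel."""
    value = read_sysctl("net.ipv4.tcp_available_congestion_control", root)
    return value.split() if value else []


print("current:  ", read_sysctl("net.ipv4.tcp_congestion_control"))
print("available:", available_algorithms())
```

An algorithm such as htcp only appears in the available list once its kernel module has been loaded.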
42. MTU Issues
• Jumbo Ethernet frames can increase
performance by a factor of 2-4.
• The ping tool can be used to verify the MTU size.
• For example, on Linux you can do:
• ping -s 8972 -M do -c 4 10.200.200.12
• Other tools that can help verify the MTU size are
scamper and tracepath
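
The payload size of 8972 in the ping example follows from the MTU being tested. This is a minimal sketch, assuming IPv4 without options (20-byte header) and the 8-byte ICMP echo header; the function name is illustrative.

```python
IPV4_HEADER_BYTES = 20  # assumption: no IP options
ICMP_HEADER_BYTES = 8


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(ping_payload_for_mtu(9000))  # jumbo frames: 8972
print(ping_payload_for_mtu(1500))  # standard Ethernet: 1472
```

With the -M do option (do not fragment), a ping of that size succeeds only if every hop on the path supports the tested MTU.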
43. Say No to scp: Why you should avoid scp over a
WAN
• In a Unix environment scp, sftp, and rsync are commonly
used to copy data between hosts.
• While these tools work fine in a local environment, they
perform poorly on a WAN.
• The openssh versions of scp and sftp have a built-in 1 MB
buffer (previously only 64 KB in openssh versions older than
4.7) that severely limits performance on a WAN.
• rsync is not part of the openssh distribution, but typically uses
ssh as transport (and is subject to the limitations imposed by
the underlying ssh implementation).
• DO NOT USE THESE TOOLS if you need to transfer large
data sets across a network path with a RTT of more than
around 25ms.
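
The effect of a fixed buffer on WAN throughput can be estimated from the window size and the RTT. This is a minimal sketch, assuming at most one buffer of data is in flight per round trip and that 1 MB and 64 KB mean 2^20 and 2^16 bytes; the function name is illustrative and the result is an upper bound, not a measurement.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


ONE_MB = 2**20        # assumption: binary megabyte
SIXTY_FOUR_KB = 2**16

for rtt_ms in (1, 25, 53):
    rtt = rtt_ms / 1000
    print(
        f"RTT {rtt_ms:>2} ms: 1 MB buffer <= {max_throughput_mbps(ONE_MB, rtt):8.1f} Mbps, "
        f"64 KB buffer <= {max_throughput_mbps(SIXTY_FOUR_KB, rtt):7.1f} Mbps"
    )
```

On a local network with a 1 ms RTT the bound is far above typical link speeds, but at 25 ms and beyond it falls to a fraction of a 1 Gbps path, which matches the slide's advice.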
44. Why you should avoid scp over a WAN
(cont.)
• The following results are typical: scp is 10x slower
than single stream GridFTP, and 50x slower than
parallel GridFTP.
• Sample Results
Berkeley, CA to Argonne, IL (near Chicago).
RTT = 53 ms, network capacity = 10Gbps.
46. A Simple Science DMZ
• A simple Science DMZ has several essential components. These include
dedicated access to high-performance wide area networks and advanced
services infrastructures, high-performance network equipment, and
dedicated science resources such as Data Transfer Nodes. Here is a
diagram of a simple Science DMZ showing these components and data
paths:
47. Science DMZ: Supercomputer Center Network
• The diagram below illustrates a simplified supercomputer center
network. While this may not look much like the previous simple
Science DMZ diagram, the same principles are used in its
design.
48. Science DMZ: Big Data Site
• For sites that handle very large data volumes (e.g. for big experiments such as the LHC), individual data
transfer nodes are not enough.
• Data transfer clusters are needed: groups of machines serve data from multi-petabyte data stores.
• The same principles of the Science DMZ apply - dedicated systems are used for data transfer, and the
path to the wide area is clean, simple, and easy to troubleshoot. Test and measurement are integrated in
multiple locations to enable fault isolation. This network is similar to the supercomputer center example in
that the wide area data path covers the entire network front-end.
49. Data Transfer Node (DTN)
• Computer systems used for wide area data transfers perform far
better if they are purpose-built and dedicated to the function of wide
area data transfer. These systems, which we call Data Transfer
Nodes (DTNs), are typically PC-based Linux servers built with high-
quality components and configured specifically for wide area data
transfer.
• ESnet has assembled a reference implementation of a host that
can be deployed as a DTN or as a high-speed GridFTP test
machine.
• The host can fill a 10Gbps network connection with disk-to-disk
data transfers using GridFTP.
• The total cost of this server was around $10K, or $12.5K with the
more expensive RAID controller. If your DTN node is used only as
a data cache using RAID0 instead of a reliable storage server using
RAID5, you can get by with the less expensive RAID controller.
• Key aspects of the configuration include: recent version of the
50. DTN Hardware Description
• Chassis: AC SuperMicro SM-936A-R1200B 3U 19" Rack Case with
Dual 1200W PS
• Motherboard: SuperMicro X8DAH+F version 1.0c
• CPU: 2 x Intel Xeon Nehalem E5530 2.4GHz
• Memory: 6 x 4GB DDR3-1066MHz ECC/REG
• I/O Controller: 2 x 3ware SAS 9750SA-8i (about $600) or 3ware SAS
9750-24i4e (about $1500)
• Disks: 16 x Seagate 500GB SAS HDD 7,200 RPM ST3500620SS
• Network Controller: Myricom 10G-PCIE2-8B2-2S+E
• Linux Distribution
• Most recent distribution of CentOS Linux
• Install 3ware driver: http://www.3ware.com/support/download.asp
• Install ext4 utilities: yum install e4fsprogs.x86_64
51. DTN Tuning
• Add to /etc/sysctl.conf, then run sysctl -p
    # standard TCP tuning for 10GE
    net.core.rmem_max = 33554432
    net.core.wmem_max = 33554432
    net.ipv4.tcp_rmem = 4096 87380 33554432
    net.ipv4.tcp_wmem = 4096 65536 33554432
    net.ipv4.tcp_no_metrics_save = 1
    net.core.netdev_max_backlog = 250000
• Add to /etc/rc.local
    # increase the size of data the kernel will read ahead (this favors sequential reads)
    /sbin/blockdev --setra 262144 /dev/sdb
    /sbin/blockdev --setra 262144 /dev/sdc
    /sbin/blockdev --setra 262144 /dev/sdd
    # increase txqueuelen
    /sbin/ifconfig eth2 txqueuelen 10000
    /sbin/ifconfig eth3 txqueuelen 10000
    # make sure cubic and htcp are loaded
    /sbin/modprobe tcp_htcp
    /sbin/modprobe tcp_cubic
    # set default to htcp
    /sbin/sysctl net.ipv4.tcp_congestion_control=htcp
    # with the Myricom 10G NIC, increasing interrupt coalescing helps a lot:
52. DTN Tuning (cont.)
• Tools
• Install a data transfer tool such as GridFTP - see the
GridFTP quick start page. Information on other tools can be
found on the tools page.
• Performance Results for this configuration
• Back-to-Back Testing using GridFTP
- memory to memory, 1 10GE NIC: 9.9 Gbps
- memory to memory, 4 10GE NICs: 38 Gbps
- disk to disk: 9.6 Gbps (1.2 GBytes/sec) using large files on
all 3 disk partitions in parallel
56. Thank you / Obrigado!
Leandro Ciuffo - leandro.ciuffo@rnp.br
Alex Moura - alex@rnp.br
Twitter: @RNP_pd