2. Protocol Stack Summary
“turtles all the way down”
each layer uses the services of the layer below and provides a
service to the layer above
Python lets you work at the layer of your choice
programs are “cleaner” the higher the layer you use
layers work by “hiding” the layers below behind function calls
3. What lies under Socket?
TCP/UDP
IP
“link layer”
Internet Protocol Stack
4. Networking
About “sharing” resources.
Compare to sharing of disk, IO devices, etc done by programs
running on a computer
Computer Example: OS is the master controller
Network Example: Each participant “plays by the rules” but no
“controller”
5. Networking
All network cards share the same ethernet cable
all wireless transmitters share the same frequency channels
fundamental unit of sharing is “packet”
individual packets carry addressing info sufficient to arrive at
final destination.
6. Addressing
two layers: one hop at a time and end-to-end
single hop addressing performed by link layer
end-to-end addressing is IP
process addressing is TCP or UDP
8. Domain Name Service
DNS converts host names into host IP addresses.
corresponds to directory assistance
address = socket.gethostbyname(name);
9. How DNS Works
gethostbyname() first looks in /etc/hosts
if this fails then it looks in /etc/resolv.conf for the address of
“directory assistance”, also called a DNS Server.
sends the request to this address
Observation: If your DNS server is down, you won't get
anywhere on the Internet.
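A minimal sketch of the lookup call, written in Python 3 syntax (the listings in these slides are Python 2). The helper name resolve() is made up for illustration; socket.gethostbyname() and socket.gaierror are the real library names.

```python
import socket

def resolve(name):
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        # Raised when neither /etc/hosts nor the DNS server can answer.
        return None

# A dotted-quad string is returned unchanged, without any DNS query.
print(resolve('127.0.0.1'))
```

Returning None instead of raising is only a convenience here; a real program may prefer to let the exception propagate.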
10. Routing
Each time a packet arrives at a new node a decision must be
made at that node as to where to send the packet next.
Guiding principle of routing on the Internet is that each time a
packet “hops” from one node to another it is always one hop
closer to its final destination.
Exercise: Difference between host and node.
11. Lots of Reading
The classic text is TCP/IP Illustrated: Vol I by Richard Stevens.
PDF file available on the web at books.google.com among
other places
We will concentrate on Chapters 1-4, 9, 11, 14, 17-19.
14. GoogleMaps
googlemaps library (3rd party) uses
urllib, uses
httplib, uses
Socket, uses
TCP, IP,
Ethernet
[Figure: the same stack drawn as two groups]
GoogleMaps, URL, HTTP, Socket: the part of the protocol stack
inside the actual program itself
TCP, IP and Ethernet: the OS part of the protocol stack
15. APIs vs Sockets:
well-tested
written by experts
common practice to use them
we still need to understand Sockets to
appreciate things that depend upon them
16. Wireshark:
lets you look at packets crossing the wire
needs root permissions
easy to filter out unneeded traffic
I saved some traffic and you can view it with Wireshark (see
course web page).
17. Highest Level API Example:
Fetch a JSON document without realizing it:
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 1 - search1.py
# Not even clear you are using a web service
from googlemaps import GoogleMaps
address = '207 N. Defiance St, Archbold, OH'
print GoogleMaps().address_to_latlng(address)
25. # search4.py output
HTTP/1.1 200 OK
Content-Type: text/javascript; charset=UTF-8
Vary: Accept-Language
Date: Wed, 21 Jul 2010 16:10:38 GMT
Server: mafe
Cache-Control: private, x-gzip-ok=""
X-XSS-Protection: 1; mode=block
Connection: close
{
"name": "207 N. Defiance St, Archbold, OH",
"Status": {
"code": 200,
"request": "geocode"
},
"Placemark": [ {
...
"Point": {
"coordinates": [ -84.3063479, 41.5228242, 0 ]
}
} ]
}
[Slide callouts on the output above: “data transmitted by web server”;
“data read into program variable”]
26. Things We've Seen:
protocols stacked on top of one another
higher level protocols using services of lower levels
programs get more specific and harder to maintain the lower
down you go
the idea behind high-level protocols is precisely to hide lower
levels
there's a whole lot going on below Socket.
27. The Stack:
Fundamental unit of shared information is the packet.
Typical packet structure:
Transmitted as a single unit (but serially)
Routing is generally at the packet level
Things packets contain: data, addresses, layering,
sequencing, protocol bytes, checksums
ethernet packets are called frames.
Packet layout, outermost header first:
ethernet header | IP header | TCP/UDP header | program data
28. Ethernet:
14-byte header
addresses: two 6-byte addresses – source and destination
type: 2 bytes – 0x0800 == IP datagram
the two network cards involved can process the header
without using the CPU, RAM, etc.
cable length (100m) and MTU
CSMA/CD
Some of the details:
http://serverfault.com/questions/422158/what-is-the-in-the-wire-size-of-a-ethernet-frame-1518-or-1542
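The 14-byte header layout can be sketched with the struct module (Python 3 syntax). The frame below is built by hand for illustration and is not captured traffic; the function name parse_ethernet_header() is hypothetical.

```python
import struct

def parse_ethernet_header(frame):
    """Split the 14-byte Ethernet header off the front of a raw frame."""
    # Destination address comes first on the wire, then source, then type.
    dst, src, ethertype = struct.unpack('!6s6sH', frame[:14])

    def as_mac(raw):
        return ':'.join('%02x' % byte for byte in raw)

    return as_mac(dst), as_mac(src), ethertype, frame[14:]

# Hand-built example frame: two 6-byte addresses, type 0x0800, then payload.
frame = (bytes.fromhex('00c017c214f3') + bytes.fromhex('001f2907e46a')
         + struct.pack('!H', 0x0800) + b'IP datagram goes here')
dst, src, ethertype, payload = parse_ethernet_header(frame)
print(dst, src, hex(ethertype), len(payload))
```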
29. IP Addresses:
32 bits: a.b.c.d
network address – n bits; host id – (32-n) bits
sometimes the network part has a subnet component; sometimes
the subnet component is carved out of the host ID bits
a.b == 137.140 == New Paltz network address
a.b.c == 137.140.8 == CS subnet at New Paltz
the part of the network address that is not subnet identifies an
organization like New Paltz.
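The network/host split is just a bit mask. A small sketch in Python 3, using the New Paltz addresses from this slide; the helper name split_address() is made up for illustration.

```python
import ipaddress

def split_address(address, prefix_len):
    """Return (network part, host part) of an IPv4 address as integers."""
    addr = int(ipaddress.IPv4Address(address))
    # prefix_len network bits set to 1, the remaining host bits set to 0
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return addr & mask, addr & ~mask & 0xFFFFFFFF

# n = 16: the organization's network address (a.b)
network16, host16 = split_address('137.140.8.104', 16)
# n = 24: network plus subnet (a.b.c)
network24, host24 = split_address('137.140.8.104', 24)
print(ipaddress.IPv4Address(network16), host16)
print(ipaddress.IPv4Address(network24), host24)
```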
31. IP Special Addresses:
127.*.*.*: local to the current machine
10.*.*.*, 172.16-31.*.*, 192.168.*.*: private subnets.
none of these addresses are found on the larger Internet.
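A sketch (Python 3) that classifies an address against exactly the ranges listed on this slide. The function name and the label 'routable' are made up for illustration.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network('127.0.0.0/8')
PRIVATE = [ipaddress.ip_network(n)
           for n in ('10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16')]

def classify(address):
    """Label an IPv4 address as loopback, private, or routable."""
    ip = ipaddress.ip_address(address)
    if ip in LOOPBACK:
        return 'loopback'
    if any(ip in net for net in PRIVATE):
        return 'private'
    return 'routable'

for sample in ('127.0.0.1', '10.1.2.3', '172.31.0.1', '192.168.1.124',
               '137.140.8.104'):
    print(sample, classify(sample))
```

The /12 prefix on 172.16.0.0 is what covers 172.16 through 172.31.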
32. IP Routing:
Guiding principle: after each hop you are one step closer to
your destination
typical local routing table contains a default entry pointing to
the Internet together with one entry for each local subnet the
host belongs to.
[pletcha@archimedes PPT]$ netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags Iface
0.0.0.0 192.168.1.1 0.0.0.0 UG wlan0
192.168.1.0 0.0.0.0 255.255.255.0 U wlan0
192.168.122.0 0.0.0.0 255.255.255.0 U virbr0
33. IP Routing Next Hop Algorithm:
Search Destination column of table entries with H-flag set
which is an exact match to Destination IP in packet
If found and Flag is G or H then Gateway is next hop;
otherwise Destination IP is next hop.
If not found then calculate (Dest IP & Genmask), a bitwise AND,
for each entry that is not the default. If (Dest IP & Genmask) ==
Destination column entry then if Flag is G or H then Gateway is
next hop; otherwise Destination IP is next hop.
Otherwise use the default entry. Flag is almost always G so
Gateway is next hop IP.
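The three steps can be sketched in Python 3 against the routing table from the earlier netstat -nr output. Two assumptions to note: the sketch treats only the G flag as meaning "forward via the Gateway", and it walks the steps in the order given here rather than doing the longest-prefix match a real kernel performs. All function names are hypothetical.

```python
import ipaddress

# (Destination, Gateway, Genmask, Flags) rows from the netstat -nr example
ROUTES = [
    ('0.0.0.0',       '192.168.1.1', '0.0.0.0',       'UG'),
    ('192.168.1.0',   '0.0.0.0',     '255.255.255.0', 'U'),
    ('192.168.122.0', '0.0.0.0',     '255.255.255.0', 'U'),
]

def to_int(dotted):
    return int(ipaddress.IPv4Address(dotted))

def next_hop(dest_ip, routes=ROUTES):
    """Return the next-hop IP address for dest_ip, or None if no route."""
    dest = to_int(dest_ip)

    def hop(gateway, flags):
        # Assumption: G means indirect delivery through the gateway.
        return gateway if 'G' in flags else dest_ip

    # Step 1: host entries (H flag) that match the destination exactly.
    for destination, gateway, genmask, flags in routes:
        if 'H' in flags and to_int(destination) == dest:
            return hop(gateway, flags)
    # Step 2: network entries, skipping the default entry.
    for destination, gateway, genmask, flags in routes:
        mask = to_int(genmask)
        if mask != 0 and (dest & mask) == to_int(destination):
            return hop(gateway, flags)
    # Step 3: the default entry.
    for destination, gateway, genmask, flags in routes:
        if to_int(genmask) == 0 and to_int(destination) == 0:
            return hop(gateway, flags)
    return None

print(next_hop('192.168.1.50'))   # on a local subnet
print(next_hop('8.8.8.8'))        # falls through to the default entry
```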
34. IP Routing Next Hop Algorithm:
Once you have the next hop IP you need to determine the
next hop ethernet.
The Address Resolution Protocol (ARP) converts the next hop
IP into a next hop ethernet address. The arp command for viewing
the ARP cache has more recently been replaced by the ip neigh
command.
Exercise: Read up on ARP in TCP/IP Illustrated.
[pletcha@archimedes PPT]$ ip neigh
137.140.39.139 dev enp0s25 lladdr 00:c0:17:c2:14:f3 STALE
137.140.193.250 dev wlp3s0 lladdr 00:1f:29:07:e4:6a STALE
137.140.39.250 dev enp0s25 lladdr 00:21:a0:39:65:00 DELAY
35. ARP Example
● From my laptop (137.140.8.104) I try to locate joyous
(137.140.8.101)
● Because of my routing table I know it is locally connected so
137.140.8.101 is “next hop”.
[pletcha@archimedes PPT]$ ping 137.140.8.101
PING 137.140.8.101 (137.140.8.101) 56(84) bytes of data.
64 bytes from 137.140.8.101: icmp_seq=1 ttl=64 time=0.266 ms
^C
[pletcha@archimedes PPT]$ netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 137.140.8.250 0.0.0.0 UG 0 0 0 enp0s25
137.140.8.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s25
137.140.192.0 0.0.0.0 255.255.254.0 U 0 0 0 wlp3s0
38. Packet Fragmentation:
The Internet Protocol Suite supports 64k packets but specific
IP networks support much smaller packets.
Ethernet networks support 1500 byte packets.
IP headers contain a Don't Fragment (DF) flag, set by sender.
– DF not set, then a router can fragment a packet too
large to be forwarded on a particular interface.
– DF set, router sends an ICMP message to original
sender so sender can fragment the original message
and try again.
UDP: DF unset by OS
TCP: DF set by OS
39. Packet Fragmentation (continued):
Each subnet has an MTU – Maximum Transmission Unit.
Path MTU = min hop MTU over all hops in a path
DSL providers make MTU = 1492.
– Initially many service providers used MTU = 1500 and
disabled ICMP so never knew their “large” traffic was
being dropped.
TCP/IP Illustrated discusses how fragmentation actually
happens (Read Section 11.5).
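A rough sketch (Python 3) of the path MTU rule and of how many fragments a router would produce. It assumes a 20-byte IP header with no options, and relies on the fact that fragment offsets are counted in 8-byte units; the function names are made up for illustration.

```python
import math

IP_HEADER = 20  # bytes; assumes no IP options

def path_mtu(hop_mtus):
    """Path MTU is the smallest MTU of any hop along the path."""
    return min(hop_mtus)

def fragment_count(datagram_size, mtu):
    """Fragments needed for an IP datagram of datagram_size bytes
    (header included) crossing a link with the given MTU."""
    if datagram_size <= mtu:
        return 1
    payload = datagram_size - IP_HEADER
    # Every fragment but the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)

mtu = path_mtu([1500, 1492, 1500])   # an Ethernet, DSL, Ethernet path
print(mtu, fragment_count(4000, mtu))
```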
42. net_py
Reliable or Unreliable
Do you need service that keeps packets in order and
guarantees delivery?
Are your packets “individual” and so can arrive in any order or
possibly even not at all?
Example:
Suppose you plan to send a single packet and get a single reply.
In this example, you would only need to worry about
non-delivery (and hence non-reply).
43. UDP
Short, self-contained requests and responses.
Real-time traffic like voice.
Author says it is not used often but that is not to say it is not
useful.
A single server port can receive packets from millions of
distinct clients over time with no additional memory
allocation beyond original setup.
congested network routers tend to be more sympathetic to
UDP traffic compared to TCP traffic since they know that
the latter will be retransmitted if dropped by the router.
You can add your own “99% reliability” more cheaply
(performance) than using TCP's 100% reliability.
easier to use
45. Addresses and Ports
Ports allow multiple programs to use the same TCP/IP stack
Packets come in and go up the same stack. They are
“demultiplexed” at the top of the stack by port number to
different programs.
The pair (IP address:port number) is called a socket and
identifies a program connected to the internet, running on
some machine.
Example socket: www.mysite.com:8080
Every packet sent on the internet contains a quadruple
that identifies the connection:
(sourceIP: source port, destIP: dest port)
46. Port Ranges
Well-known: 0-1023
Registered: 1024-49151: used by large companies
The Rest: 49152-65535, used by us
How to find a well-known or registered port number
Check out the IANA (www.iana.org)
>>> import socket
>>> socket.getservbyname('domain')
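The IANA ranges can be captured in a few lines of Python 3. The function name port_range() and the label strings are made up for illustration; the boundaries 1023 and 49151 are the IANA ones.

```python
def port_range(port):
    """Name the IANA range that a TCP or UDP port number falls in."""
    if not 0 <= port <= 65535:
        raise ValueError('port numbers are 16 bits: 0-65535')
    if port <= 1023:
        return 'well-known'
    if port <= 49151:
        return 'registered'
    return 'dynamic/private'

for port in (53, 1060, 52011):
    print(port, port_range(port))
```

Port 1060, used by the example servers in these slides, falls in the registered range, while the ephemeral client ports the OS hands out (such as 52011) usually come from the top range.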
47. Python Virtual Environment
● It is a good idea to create your own virtual environment for
the various python programming projects found in this
course.
● This gives you an easy way to install special python
packages just for your own use and leave the main python
install on your computer alone.
● This is a useful skill for future python development you might
do.
● Here is an on-line tutorial; read it carefully and follow the
instructions.
http://dabapps.com/blog/introduction-to-pip-and-virtualenv-python/
http://virtualenvwrapper.readthedocs.org/en/latest/
48. Using virtualenvwrapper
● The virtualenv tutorial suggests that inside each project
directory (say Project1) there should be a project-specific
virtual environment (Project1/env).
● There is an alternative approach suggested by
virtualenvwrapper – an extension of virtualenv.
● When using virtualenvwrapper your file hierarchy looks
like:
● All the executable code in .virtualenvs is from the wrapper
package.
Projects:
ArcGIS BenchMark GeoIP Memcached QGIS Rabbit
.virtualenvs:
ArcGIS hook.log.1 postmkvirtualenv premkvirtualenv
Benchmark initialize postrmproject prermproject
BenchMark Memcached postrmvirtualenv prermvirtualenv
GeoIP postactivate preactivate QGIS
get_env_details postdeactivate predeactivate Rabbit
hook.log postmkproject premkproject README
49. Sockets
Python makes calls to the lower level operating system-level
calls that implement the networking functionality
The python interface is slightly OO
Python's contact to these OS calls is through a program
construct called a socket
Sockets are “files” just like everything else and are accessed
via a file descriptor, which in python you never see but in C it
is all you see.
Sockets are the file descriptor
We read and write to sockets the same way we do files
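Although Python normally hides it, the file descriptor can be inspected. A tiny sketch in Python 3:

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    fd = s.fileno()        # the small integer a C program would work with
    print('file descriptor:', fd)

closed_fd = s.fileno()     # a closed socket reports -1
print('after close:', closed_fd)
```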
50. net_py
import socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# AF_INET is a “family” type
# SOCK_DGRAM means UDP and not TCP
MAX = 65535 # packet size
PORT = 1060 # port used by our server
if sys.argv[1:] == ['server']: # are these two lists equal?
    s.bind(('127.0.0.1', PORT)) # specify your local socket address
                                # since you are a server
    print 'Listening at', s.getsockname() # returns a tuple
    while True:
        data, address = s.recvfrom(MAX) # wait for traffic
        print 'The client at', address, 'says', repr(data) # data.toString()
        s.sendto('Your data was %d bytes' % len(data), address)
        # send reply to original sender
51. net_py
elif sys.argv[1:] == ['client']:
    print 'Address before sending:', s.getsockname()
    # 0.0.0.0:0 means no port on any interface
    s.sendto('This is my message', ('127.0.0.1', PORT))
    print 'Address after sending', s.getsockname()
    # bind is automatic, but only to a port number
    # 0.0.0.0:34567 means port number 34567 on any interface
    data, address = s.recvfrom(MAX) # overly promiscuous - see Chapter 2
    # will accept a datagram from anyone and not just the server
    print 'The server', address, 'says', repr(data)
else:
    print >>sys.stderr, 'usage: udp_local.py server|client'
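For experimenting without two terminals, the same request/reply exchange can be sketched in a single Python 3 process over the loopback interface. This is an illustration under assumptions, not the book's listing: the function name udp_round_trip() is made up, and port 0 is used so the OS picks a free port instead of 1060.

```python
import socket

def udp_round_trip(message):
    """Send one datagram to a loopback 'server' socket and return its reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(('127.0.0.1', 0))     # port 0: let the OS choose
        server.settimeout(2.0)
        client.settimeout(2.0)
        # First sendto() binds the client implicitly to an ephemeral port.
        client.sendto(message, server.getsockname())
        data, address = server.recvfrom(65535)
        reply = ('Your data was %d bytes' % len(data)).encode('ascii')
        server.sendto(reply, address)     # reply to the original sender
        answer, _ = client.recvfrom(65535)
        return answer
    finally:
        client.close()
        server.close()

print(udp_round_trip(b'This is my message'))
```

Note that Python 3 sockets carry bytes, so messages are written as b'...' rather than plain strings.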
52. net_py
elif sys.argv[1:] == ['client']:
    print 'Address before bind:', s.getsockname()
    s.bind(('', 55000)) # bind() takes a single (host, port) tuple
    print 'Address after bind:', s.getsockname()
    s.sendto('This is my message', ('127.0.0.1', PORT))
    print 'Address after sending', s.getsockname()
    # the explicit bind above fixed the port number
    # 0.0.0.0:55000 means port number 55000 on any interface
    data, address = s.recvfrom(MAX) # overly promiscuous - see Chapter 2
    # will accept a datagram from anyone and not just the server
    print 'The server', address, 'says', repr(data)
else:
    print >>sys.stderr, 'usage: udp_local.py server|client'
53. Ports
Remote access to a socket is through its port number.
Access from the program that opens the socket is through the
file descriptor.
54. Socket
Author says “both IP address and port number start as all
zeros – a new socket is a blank slate”. This is not entirely true.
recv() vs recvfrom():
recv(): does not return sender address and typical of client
code because clients typically go to a single server and so
“know” where any data they receive comes from.
recvfrom() returns sender address and is typical of server
code because servers receive data from many clients and
typically need to reply to a client request with a packet sent
right back to the same client. The socket.sendto() function
takes the address returned by recvfrom().
55. Congestion:
If you timeout and resend and the problem is that the server
is down, you will add useless traffic to the network and cause
congestion that will slow everyone down.
Best answer is each time you timeout extend the timeout
interval so that eventually you are resending a packet only
once per hour or more.
exponential backoff: delay *= 2
What about giving up?
What about trying forever?
best to give up only if you need a timely answer or not at all
weather icon example
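One way to express "keep doubling, but never wait longer than an hour" is a generator of retry delays. A sketch in Python 3; the name backoff_delays() and its defaults are assumptions chosen to match the numbers on this slide.

```python
from itertools import islice

def backoff_delays(initial=0.1, factor=2, ceiling=3600.0):
    """Yield an endless sequence of retry delays, in seconds."""
    delay = min(initial, ceiling)
    while True:
        yield delay
        # exponential backoff, capped at the ceiling (once per hour here)
        delay = min(delay * factor, ceiling)

print(list(islice(backoff_delays(), 5)))
print(list(islice(backoff_delays(initial=1000, ceiling=3600), 4)))
```

A client that must eventually give up can simply stop iterating after a fixed number of delays; one that tries forever keeps pulling values and settles at the ceiling.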
56. More Complications:
Suppose a successful request/reply exchange takes 200 ms
on average.
Client can use this information to set a minimum timeout
delay of at least 200 ms.
What does the server do with duplicate requests?
1) Reply again
2) Don't bother to repeat reply (How does the server know a previous reply
got to the original sender?).
In both cases the server must keep track of request identifiers
so it knows what requests have already been received.
If client also keeps track of request IDs and whether or not
they have been replied to then the client can quietly drop
duplicate replies.
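A sketch (Python 3) of the server-side bookkeeping for option 1: duplicate requests get the same reply again, but the work is done only once. The class name ReplyCache and its methods are hypothetical, and the cache grows without bound, which a real server would have to address.

```python
class ReplyCache:
    """Remember replies by (client address, request ID)."""

    def __init__(self):
        self.replies = {}
        self.handled = 0      # how many requests were actually processed

    def handle(self, client, request_id, payload):
        key = (client, request_id)
        if key not in self.replies:
            # First time we see this request: do the real work.
            self.handled += 1
            self.replies[key] = 'Your data was %d bytes' % len(payload)
        # Duplicates fall through and simply get the stored reply again.
        return self.replies[key]

cache = ReplyCache()
first = cache.handle(('127.0.0.1', 52011), 7, b'This is my message')
again = cache.handle(('127.0.0.1', 52011), 7, b'This is my message')
print(first, again == first, cache.handled)
```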
57. net_py
[Figure: request/reply timeline in which a reply is lost. Callouts:
“reply lost”; “no need to pass off to higher level but must keep
data ID to know it has already been forwarded”; “optional”]
58. Reliable Code
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 2 - udp_remote.py
# UDP client and server for talking over the network
import random, socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
MAX = 65535
PORT = 1060
# usage: udp_remote.py server [ <interface> ]
if 2 <= len(sys.argv) <= 3 and sys.argv[1] == 'server':
    # server side
    interface = sys.argv[2] if len(sys.argv) > 2 else '' # two single quotes
    s.bind((interface, PORT)) # '' is the same as '0.0.0.0' == all interfaces
    print 'Listening at', s.getsockname()
    while True:
        data, address = s.recvfrom(MAX) # server always needs client address
        if random.randint(0, 1): # flip a coin and reply only if heads
            print 'The client at', address, 'says:', repr(data)
            s.sendto('Your data was %d bytes' % len(data), address)
        else: # tails
            print 'Pretending to drop packet from', address
59. Reliable Code
# Usage: udp_remote.py client <host>
elif len(sys.argv) == 3 and sys.argv[1] == 'client':
    hostname = sys.argv[2]
    s.connect((hostname, PORT)) # no packets exchanged
    print 'Client socket name is', s.getsockname()
    delay = 0.1 # initial retry delay
    while True: # keep resending if no reply
        s.send('This is another message')
        print 'Waiting up to', delay, 'seconds for a reply'
        s.settimeout(delay) # so recv() will block only so long
        try:
            data = s.recv(MAX) # blocking <delay> seconds
        except socket.timeout: # timeout expired
            delay *= 2 # double timeout delay time
            if delay > 2.0: # time to stop all this nonsense
                raise RuntimeError('I think the server is down')
        else:
            break # we are done after one success,
                  # and can stop looping
    print 'The server says', repr(data)
61. Connecting vs Implicit Connecting
• In Listing 2-1 we used sendto(msg,address) on the client and
an implicit binding happened when the first datagram was sent.
• In Listing 2-2 we used send(msg), so the socket first had to be
connected explicitly with connect(address), which binds it and at
the same time indicates where the message is to be sent.
• In the first situation we could send to various different servers
by modifying the address argument. In the latter we can only
send to the address we originally connected to.
s.connect((remote_host, remote_PORT))
...
s.send(data)    # check netstat -an | grep udp at this point
62. How things look after s.sendto(msg,address):
[pletcha@archimedes ~]$ python
>>> import socket
>>> s=socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
>>> s.sendto('my message',('wyvern.cs.newpaltz.edu',50000))
10
>>> s.getsockname()
('0.0.0.0', 52011)
[pletcha@archimedes ~]$ netstat -an | grep udp
udp 0 0 0.0.0.0:55188 0.0.0.0:*
udp 0 0 0.0.0.0:47503 0.0.0.0:*
udp 0 0 0.0.0.0:52011 0.0.0.0:*
udp 0 0 0.0.0.0:60386 0.0.0.0:*
udp 0 0 192.168.122.1:53 0.0.0.0:*
anyone can send data to
my port 52011.
63. Sneaking into the conversation
[pletcha@archimedes ~]$ python
>>> import socket
>>> s=socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
>>> s.sendto('my message',('wyvern.cs.newpaltz.edu',50000))
10
>>> s.getsockname()
('0.0.0.0', 33897)
>>> data,address = s.recvfrom(4000)
>>> print data
Fake reply
[pletcha@archimedes ~]$ python
>>> import socket
>>> s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
>>> s.sendto('Fake reply',('127.0.0.1',33897))
10
use the bound port number
as your destination
64. How things look after s.connect(address):
> python
>>> import socket
>>> s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
>>> s.connect(('wyvern.cs.newpaltz.edu',50000))
[pletcha@archimedes ~]$ netstat -an | grep udp
udp 0 0 0.0.0.0:55188 0.0.0.0:*
udp 0 0 137.140.8.104:51400 137.140.4.187:50000 ESTABLISHED
udp 0 0 0.0.0.0:47503 0.0.0.0:*
udp 0 0 0.0.0.0:60386 0.0.0.0:*
udp 0 0 192.168.122.1:53 0.0.0.0:*
no one can send data to my port 51400 except what I connected to
(I really didn't connect since no data was sent)
65. Exercise
• Repeat the “sneak into the conversation” example running the
client software of Listing 2-2 and see that your message never
gets read by the application. It is rejected by UDP.
What happens is that Ethernet and IP recognize the sneaky packet
as intended for this machine. IP forwards the packet to UDP. UDP
looks up the destination port and sees that it will only accept
data from (wyvern, 50000).
• Lesson to learn: If you want to use sendto() instead of
connect() followed by send(), then you can use recvfrom() and
look at each sender address to be sure it is an address you
recognize.
• Exercise: Think of a UDP application that would have you
sending data to several destinations so you could expect
answers back from them all.
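The "check each sender address" idea for an unconnected socket can be sketched in Python 3 on the loopback interface. All names here (recv_from_expected(), demo(), the impostor socket) are made up for illustration; the demo sends a fake reply first and the genuine one second.

```python
import socket

def recv_from_expected(sock, expected, bufsize=65535):
    """Return the next datagram that really comes from `expected`,
    silently discarding datagrams from any other address."""
    while True:
        data, address = sock.recvfrom(bufsize)
        if address == expected:
            return data

def demo():
    server, client, impostor = (
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(3))
    try:
        server.bind(('127.0.0.1', 0))
        client.bind(('127.0.0.1', 0))
        client.settimeout(2.0)
        # The impostor only needs to know the client's bound port.
        impostor.sendto(b'Fake reply', client.getsockname())
        server.sendto(b'Real reply', client.getsockname())
        return recv_from_expected(client, server.getsockname())
    finally:
        for sock in (server, client, impostor):
            sock.close()

print(demo())
```

A source address can itself be forged, so this check is weaker than it looks; it only filters out casual or accidental traffic.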
66. Request IDs
• Sending a packet ID with every packet makes it easier to
identify replies and know what request they are replying to.
• (See homework)
• Packet IDs are some protection against spoofing. Client that
doesn't call connect() can be exposed to unexpected
(spoofing) traffic. If the unwelcome traffic doesn't know the
packet ID it is not possible to fake a response.
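A sketch (Python 3) of tagging each request with a random ID and accepting only replies that echo it. The wire format, an 8-byte ID followed by the payload, is an assumption invented for this example, as are the function names.

```python
import secrets

ID_LENGTH = 8  # bytes; hypothetical wire format: ID first, then payload

def make_request(payload):
    """Return (request_id, datagram) for a new request."""
    request_id = secrets.token_bytes(ID_LENGTH)
    return request_id, request_id + payload

def match_reply(request_id, datagram):
    """Return the reply payload if the datagram echoes request_id,
    otherwise None (a stale, duplicate, or spoofed reply)."""
    if datagram[:ID_LENGTH] == request_id:
        return datagram[ID_LENGTH:]
    return None

request_id, datagram = make_request(b'This is my message')
genuine = match_reply(request_id, request_id + b'Your data was 18 bytes')
# A forger who cannot see our traffic has to guess the ID.
forged = match_reply(request_id, bytes(ID_LENGTH) + b'Fake reply')
print(genuine, forged)
```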
67. Binding to Interfaces
• We have used bind() to listen on 127.0.0.1 or on '' (the empty
string), which means all network interfaces.
• We can specify an interface if we know its IP address
(remember /sbin/ifconfig -a)
• Connecting to the College using VPN I have 4 network
interfaces on my laptop
68. 4 Network Interfaces at once:
[pletcha@archimedes 02]$ ifconfig -a
cscotun0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST> mtu 1399
inet 137.140.108.133 netmask 255.255.255.224 destination 137.140.108.1 ...
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0 ...
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255 ...
wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.124 netmask 255.255.255.0 broadcast 192.168.1.255 ...
69. Ports are bound to Interfaces
[pletcha@archimedes 02]$ cat all_interfaces.sh
#!/usr/bin/bash
./udp_remote.py server 127.0.0.1&
sleep 1; ./udp_remote.py server 137.140.108.133&
sleep 1; ./udp_remote.py server 192.168.122.1&
sleep 1; ./udp_remote.py server &
sleep 1; ./udp_remote.py server 192.168.1.124&
[pletcha@archimedes 02]$ ./all_interfaces.sh
Listening at ('127.0.0.1', 1060)
Listening at ('137.140.108.133', 1060)
Listening at ('192.168.122.1', 1060)
Traceback (most recent call last):
File "./udp_remote.py", line 13, in <module>
s.bind((interface, PORT))
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
[pletcha@archimedes 02]$ Listening at ('192.168.1.124', 1060)
70. local host vs remote host:
• Local packets can arrive destined for 127.0.0.1 but they can
also arrive locally by using the machine IP address, say
192.168.122.1.
• By binding to 127.0.0.1, external clients can not talk to you.
• By binding to 192.168.122.1 both internal and external clients
can talk to you.
• Lesson to Learn: Binding means specifying both a port
number and a network interface (or all interfaces), so the basic
data structure is not just a port but an (IP address, port) pair, in
other words, a socket.
71. Two Interface problem:
• Suppose a machine has two external interfaces –
192.168.122.1 and 192.168.1.124.
• If we open a socket at (192.168.1.124,1060) but try to send
data to (192.168.122.1, 1060) what will happen? Apparently it
depends on the OS.
• On my laptop I get:
[pletcha@archimedes 02]$ ps -ef | grep remote
pletcha 28577 3450 0 18:07 pts/1 python ./udp_remote.py server 192.168.122.1
[pletcha@archimedes 02]$ ./udp_remote.py client 192.168.1.124
Client socket name is ('192.168.1.124', 48271)
Waiting up to 0.1 seconds for a reply
Traceback (most recent call last):
File "./udp_remote.py", line 33, in <module>
data = s.recv(MAX)
socket.error: [Errno 111] Connection refused
74. TCP
workhorse of the Internet – 1974
- data stream (mule train versus conveyor belt)
most traffic suited to TCP:
- long conversations like SSH, large exchanges like HTTP
Reliability: packet sequence numbers and retransmission.
TCP Sequence Number: byte number of first byte in data
package plus 1 for SYN and 1 for FIN
Retransmitted packets can begin at different packet
boundaries.
Initial SeqNum is randomly chosen to confound “villains”.
75. TCP is bursty
TCP does not keep the sender and receiver in lock-step.
Sender can send multiple packets with no acknowledgment
of any being received.
78. TCP Overwhelmed
[Figure: sender/receiver timeline over time, showing ACK 15 and ACK 40]
Receiver, if overwhelmed, tells the sender to shut the window
down by advertising a new, smaller window size in the TCP
header.
79. TCP Congested
[Figure: sender/receiver timeline over time, showing ACK 15 and ACK 40]
Receiver, if it sees a missing packet, assumes congestion and
shuts the window down as in the case of being overwhelmed.
80. When to use TCP
Most of the time.
When is it inappropriate?
- single request/reply exchange – TCP uses 9 packets in total
- too many clients – each connection costs memory allocation,
port allocation, etc.
- real time like audio chat; retransmitted voice is discarded
because conversation has moved on.
[Figure labels: “normally compressed voice”; “highly compressed
voice from previous and next packet”]
81. net_py
TCP Socket
Listen Sockets and Data Sockets.
Listen Sockets: like UDP, open and waiting for SYN
Data Sockets always involve a connection.
UDP socket.connect() shows intention of data exchange
between two particular processes and is optional
TCP socket.connect() does the same and is fundamental.
(localIP,localport,remoteIP, remoteport)
82. net_py
[Figure: client and server socket calls, in time order]
server: listen = socket.listen() (passive socket; only accepts SYN)
client: c_data = socket.connect()
server: s_data = listen.accept() (blocks; called in a loop)
client: c_data.send()   server: s_data.recv()
The active sockets c_data and s_data are identical in nature,
identified by (clntIP,clntPort,servIP,servPort), i.e. a connection.
NOTE: Hundreds or thousands of active sockets are possible on server with
same (srvIP,srvPort) pair but only one passive socket with the same pair.
83. Example:
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 3 - tcp_sixteen.py
# Simple TCP client and server that send and receive 16 octets
import socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# doesn't know yet if active or passive
HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060
def recvall(sock, length): # used by both client and server
    data = ''
    while len(data) < length: # building a complete message
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data
84. Example:
if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT)) # bind to whatever we wish <HOST>
    s.listen(1)
    while True:
        print 'Listening at', s.getsockname()
        sc, sockname = s.accept()
        print 'We have accepted a connection from', sockname
        print 'Socket connects', sc.getsockname(), 'and', sc.getpeername()
        message = recvall(sc, 16)
        print 'The incoming sixteen-octet message says', repr(message)
        sc.sendall('Farewell, client')
        sc.close()
        print 'Reply sent, socket closed'
85. socket.SO_REUSEADDR
NOTEs:
1: When a server passive socket is closed, TCP does not allow it to be
reused for several minutes to give all clients a chance to see that any future
listener is a new listener. If you want to rerun the program multiple times (as
in testing) then allow immediate reuse of the same passive socket.
2: If a popular server goes down you would want it to come back up immediately
and so for the same reason we set the SO_REUSEADDR socket option.
86. Example:
elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT)) # can fail; unlike UDP
    print 'Client has been assigned socket name', s.getsockname()
    s.sendall('Hi there, server')
    reply = recvall(s, 16)
    print 'The server said', repr(reply)
    s.close()
else:
    print >>sys.stderr, 'usage: tcp_local.py server|client [host]'
NOTE: Calling send(), TCP might not send all bytes immediately and
returns the number actually sent;
calling sendall() asks that all data be sent asap and that happens before
the function call returns. Notice that program does not capture sendall()
return value.
NOTE: recvall() loops until all expected data is received.
88. Exercises:
Create two sockets, one UDP and the other TCP, and call
connect on both of them to a non-existent remote socket.
See what happens.
[pletcha@archimedes 03]$ python
Python 2.7.3 (default, Jul 24 2012, 10:05:38)
>>> import socket
>>> udp_s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
>>> tcp_s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
>>> udp_s.connect(('137.140.215.244',1060))
# [pletcha@archimedes 03]$ netstat -an | grep 215
# udp 0 0 192.168.1.124:42809 137.140.215.244:1060 ESTABLISHED
>>> tcp_s.connect(('192.169.1.124',1060))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 110] Connection timed out
>>>
89. TCP send()
The Stack has space so send() dumps all its data into a TCP
buffer and returns, even though nothing has been sent. From
now on, it is TCP's headache.
TCP buffer is full or Stack is busy so send() blocks
TCP buffer has some space but not enough for all send()
wants to transmit so send() moves as much data as it can
into the TCP buffer and returns that number. It is the
program's responsibility to recall send() with data starting at
the first byte not previously accepted by TCP.
90. TCP send() code:
bytes_sent = 0
while bytes_sent < len(message):
    message_remaining = message[bytes_sent:]
    bytes_sent += s.send(message_remaining)
# python has a sendall() function that does this for us.
91. TCP recv()
Works like send():
If no data is available then recv() blocks.
If plenty is available then recv() gives back just what the call
asks for.
If some data is available but not as much as you ask for, you are
given what is available and need to build up your own complete
message by repeatedly calling recv().
93. No TCP recvall()?
Because we hardly ever know in advance the precise size we
are expecting. For example, suppose we send out an HTTP
GET request to maps.google.com
The data that comes back consists of a header, a blank line
and some data.
The header may or may not contain a Content-Length field.
If it does we know exactly how many bytes to read so could
use a recvall() function. If not then we just need to read until
we feel we've come to the end of the data.
In Listing 1.5 this means until we come to the end of a
syntactically correct JSON data object, serialized.
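The "does the header tell us how much to read?" decision can be sketched in Python 3. The function names are made up for illustration, and the parsing is deliberately simplistic (no folded headers, no chunked transfer encoding).

```python
def split_response(raw):
    """Split raw HTTP response bytes at the blank line: (header, body)."""
    header, _, body = raw.partition(b'\r\n\r\n')
    return header, body

def content_length(header):
    """Return the Content-Length value as an int, or None if absent."""
    for line in header.split(b'\r\n')[1:]:        # skip the status line
        name, sep, value = line.partition(b':')
        if sep and name.strip().lower() == b'content-length':
            return int(value.strip())
    return None

with_length = (b'HTTP/1.1 200 OK\r\n'
               b'Content-Length: 5\r\n'
               b'\r\n'
               b'hello')
without_length = (b'HTTP/1.1 200 OK\r\n'
                  b'Content-Type: text/javascript; charset=UTF-8\r\n'
                  b'Connection: close\r\n'
                  b'\r\n'
                  b'{ }')
for raw in (with_length, without_length):
    header, body = split_response(raw)
    print(content_length(header), body)
```

When the result is an integer, a recvall()-style loop can read exactly that many body bytes; when it is None, the client has to keep reading until the server closes the connection or the data is recognizably complete.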
98. TCP sliding window
Sliding-window protocol without selective or negative
acknowledgment.
window “belongs” to sender; each end of a connection has a
window
n1: byte (sequence) number of next byte to be transmitted for
the first time
n2: byte (sequence) number of next byte to be acknowledged
B: data already transmitted but not acknowledged
[Figure: data stream to be transmitted, split into regions A, B and C
and marked with n1 and n2; arrows show the direction of data flow
(sender to receiver) and the window direction]
99. TCP Acknowledgment:
TCP does not “negative” acknowledge; ie, send a message
that something is missing
TCP does not “selectively” acknowledge; ie, send a message
that a specific byte range has been received
TCP acknowledgment number means TCP has received all
bytes up to but not including the ACK number.
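The cumulative acknowledgment rule can be sketched in Python 3. The function ack_number() is hypothetical, segments are given as (sequence number, length) pairs, and sequence-number wraparound is ignored.

```python
def ack_number(next_expected, segments):
    """Return the ACK number after receiving the given segments.

    next_expected: sequence number of the first byte not yet received
    segments: iterable of (seq, length) pairs, in any order
    """
    ack = next_expected
    for seq, length in sorted(segments):
        if seq > ack:
            break                      # a gap: later data cannot be ACKed
        ack = max(ack, seq + length)   # all bytes below ack have arrived
    return ack

print(ack_number(1, [(1, 14)]))             # bytes 1..14 arrived
print(ack_number(1, [(1, 14), (15, 25)]))   # bytes 1..39 arrived
print(ack_number(1, [(15, 25)]))            # first segment missing
```

Because there is no negative or selective acknowledgment in this model, the last call can only repeat the old ACK number; the sender has to infer the loss.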
100. TCP Flow Control
TCP sends a window size field in its header. This field is
used by the end that receives the TCP segment to control its
own sliding window size.
Typical window size is 4096. If the receiver end of a
connection sends a TCP header with a smaller number the
sender slows down transmission of new bytes until more
bytes are acknowledged by the other end (hence n1 – n2
shrinks)
101. TCP Options
MSS: Maximum Segment Size. Usually sent during 3-way
handshake so that both ends will know how big (or small)
future segments should be.
106. TCP MSL (Maximum Segment Lifetime)
MSL is the maximum time that a segment might expect to live
on the Internet while moving between the two hosts of a
connection.
When TCP performs an active close, and sends the final
ACK, that connection must stay in the TIME_WAIT state for
twice the MSL.
Prevents delayed segments from an earlier incarnation of the
same connection (srcIP,srcPort,destIP,destPort) from being
interpreted as part of the new connection
109. Domains
Top-level Domains: .edu, .com, .ca, etc
Domain Name: newpaltz.edu, etc
Fully Qualified Domain Name: wyvern.cs.newpaltz.edu.
Owning a domain name gives you the right to create FQDNs
with your Domain Name at the end. Technically any name that
ends with a '.' is also considered fully qualified.
Hostname: wyvern.
110. net_py
/etc/resolv.conf
How domain name searches start:
If your search name is not fully qualified, the resolver asks the
name servers specified for each of the following names in turn:
<your search name>
<your search name>.cs.newpaltz.edu
<your search name>.acsl.newpaltz.edu
<your search name>.newpaltz.edu
(GeoIP)[pletcha@archimedes pypcap-1.1]$ cat /etc/resolv.conf
# Generated by NetworkManager
domain newpaltz.edu
search cs.newpaltz.edu acsl.newpaltz.edu newpaltz.edu
nameserver 137.140.1.98
nameserver 137.140.1.102
111. net_py
Example:
Searching for my.website.cam will search for
my.website.cam
my.website.cam.cs.newpaltz.edu
my.website.cam.acsl.newpaltz.edu
my.website.cam.newpaltz.edu
But searching for my.website.cam. will search for
my.website.cam
only.
112. net_py
Example 2:
Searching for argos searches for:
argos.cs.newpaltz.edu # fails
argos.acsl.newpaltz.edu # fails
argos.newpaltz.edu # succeeds
This is why you can use host names only on campus.
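The search rules from the last three slides can be sketched as a small function, assuming Python 3. The name candidate_names is hypothetical, and this is a simplified model of what the resolver does: trying the bare name last for a dot-free name is common resolver behaviour that the slides do not show.

```python
def candidate_names(name, search_list):
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith('.'):
        # Fully qualified: only the name itself is tried.
        return [name[:-1]]
    expanded = [name + '.' + suffix for suffix in search_list]
    if '.' in name:
        # Looks like a domain name already: try it as-is first.
        return [name] + expanded
    # Bare host name: search suffixes first (assumption: bare name last).
    return expanded + [name]


search = ['cs.newpaltz.edu', 'acsl.newpaltz.edu', 'newpaltz.edu']
for name in ('my.website.cam', 'my.website.cam.', 'argos'):
    print(name, '->', candidate_names(name, search))
```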
113. net_py
Socket Names Used By:
mysocket.accept(): returns a tuple whose 2nd entry is a
remote address
mysocket.bind(address): binds to a local address so
outgoing packets have this source address.
mysocket.connect(address): indicates the remote address
that packets using this connection must either go to or come
from.
mysocket.getpeername(): Returns the remote address the
socket is connected to
mysocket.getsockname(): returns the address of this
socket
mysocket.recvfrom(): UDP: returns data and the address of
the sender
mysocket.sendto(data,address): UDP, indicates the receiver
address.
114. net_py
Five Socket Properties:
socket.AF_INET: Internet Address Family => kind of network
and transport – UDP or TCP.
socket.AF_UNIX: like the internet but between processes on
the same host
socket.SOCK_DGRAM: packet service (UDP for AF_INET)
socket.SOCK_STREAM: stream service (TCP for AF_INET)
3rd argument, not used in this book
socket.IPPROTO_TCP: use TCP for SOCK_STREAM.
socket.IPPROTO_UDP: use UDP for SOCK_DGRAM
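A small sketch, assuming Python 3.4 or later, that turns the numeric values of these properties back into readable names. The helper describe() and the PROTO_NAMES table are hypothetical; socket.AddressFamily and socket.SocketKind are the enums the standard library provides.

```python
import socket

# Only the two protocols discussed on this slide are named here.
PROTO_NAMES = {socket.IPPROTO_TCP: 'TCP', socket.IPPROTO_UDP: 'UDP'}


def describe(family, kind, proto):
    """Translate a (family, type, protocol) triple into readable names."""
    return (socket.AddressFamily(family).name,
            socket.SocketKind(kind).name,
            PROTO_NAMES.get(proto, str(proto)))


print(describe(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP))
print(describe(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP))
print('TCP is protocol', socket.IPPROTO_TCP, 'and UDP is protocol', socket.IPPROTO_UDP)
```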
115. net_py
IPv6:
Will replace IPv4 when we run out of IPv4 addresses.
IPv6 has more services than IPv4.
>>> import socket
>>> print socket.has_ipv6
# tells you if your machine is capable of ipv6; not
# if it is enabled.
116. net_py
Address Resolution:
One way of avoiding specifying destination IP addresses, etc,
is to let the socket module tell you what you need to know.
We are asking, “How can we connect to the web server on
host gatech.edu?”
>>> import socket
>>> from pprint import pprint
>>> infolist = socket.getaddrinfo('gatech.edu','www')
>>> pprint(infolist)
[(2, 1, 6, '', ('130.207.244.244', 80)),
 (2, 2, 17, '', ('130.207.244.244', 80))]
# all the interfaces on .244.244 associated with 'www'
>>> ftpca = infolist[0] # == (2, 1, 6, '', ('130.207.244.244', 80))
>>> ftp = ftpca[0:3] # == (2, 1, 6)
>>> s = socket.socket(*ftp) # unpack a tuple with *
# s = socket.socket(ftp[0],ftp[1]) would do just as well
>>> s.connect(ftpca[4]) # ('130.207.244.244', 80)
117. net_py
NOTE:
The www service (HTTP) is officially registered for both TCP (6) and UDP (17).
gatech.edu is an alias for this host. It also has a “canonical”
name, but we didn't ask for this so it wasn't returned.
By calling socket.getaddrinfo() we don't need to use
socket.AF_INET, etc, in our code.
118. net_py
Asking socket.getaddrinfo() for help in binding.
Problem: Provide an address for bind() without deciding this
yourself.
Example: Suppose you want to open a server (passive) using port 1060
>>> from socket import getaddrinfo
>>> import socket
>>> getaddrinfo(None,1060,0,socket.SOCK_STREAM,0,socket.AI_PASSIVE)
[(2, 1, 6, '', ('0.0.0.0', 1060)), (10, 1, 6, '', ('::', 1060, 0, 0))]
>>> getaddrinfo(None,1060,0,socket.SOCK_DGRAM,0,socket.AI_PASSIVE)
[(2, 2, 17, '', ('0.0.0.0', 1060)), (10, 2, 17, '', ('::', 1060, 0, 0))]
# we are asking here, where to bind to. We get an IPv4 and an IPv6 answer
# in both cases, TCP and UDP.
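The same question can be asked from a script. A minimal sketch, assuming Python 3; the helper name passive_addresses is hypothetical, and the addresses returned depend on the machine (IPv4 only, IPv6 only, or both).

```python
import socket


def passive_addresses(port, kind=socket.SOCK_STREAM):
    """Ask getaddrinfo() where a server for `port` could bind."""
    # host=None together with AI_PASSIVE means "the wildcard address".
    return socket.getaddrinfo(None, port, 0, kind, 0, socket.AI_PASSIVE)


for family, kind, proto, canonname, sockaddr in passive_addresses(1060):
    print(family, kind, proto, sockaddr)
```

Each tuple can be handed to socket.socket(family, kind, proto) and its sockaddr to bind(), so the server code never names AF_INET or AF_INET6 itself.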
119. net_py
Asking socket.getaddrinfo() for help in binding.
Problem: Want a particular address for bind().
Example: Suppose you want to open a server (passive) using port 1060
and a known local IP address.
>>> from socket import getaddrinfo
>>> import socket
>>> getaddrinfo('127.0.0.1',1060,0,socket.SOCK_STREAM,0)
[(2, 1, 6, '', ('127.0.0.1', 1060))]
>>> getaddrinfo('localhost',1060,0,socket.SOCK_STREAM,0)
[(10, 1, 6, '', ('::1', 1060, 0, 0)), (2, 1, 6, '', ('127.0.0.1', 1060))]
# we are asking here, where to bind to. We get an IPv4 and an IPv6 answer
120. net_py
Asking getaddrinfo() about services
The rest of the uses of getaddrinfo() are outward looking.
socket.AI_ADDRCONFIG: filter out things you can't use
socket.AI_V4MAPPED: if IPv6 was requested but only IPv4 addresses
exist, return them as IPv4-mapped IPv6 addresses
If you don't specify certain address types you get multiple
addresses
>>> getaddrinfo('ftp.kernel.org','ftp',0,socket.SOCK_STREAM,0,
... socket.AI_ADDRCONFIG | socket.AI_V4MAPPED)
[(2, 1, 6, '', ('149.20.4.69', 21))]
>>> getaddrinfo('iana.org','www',0,socket.SOCK_STREAM,0)
[(2, 1, 6, '', ('192.0.43.8', 80)), (10, 1, 6, '', ('2001:500:88:200::8', 80, 0, 0))]
122. net_py
Alternative Methods:
Older methods are hard-wired for IPv4 so should be avoided.
gethostbyname()
getfqdn()
gethostbyaddr()
getprotobyname()
getservbyname()
getservbyport()
socket.gethostbyname(socket.getfqdn())
# gives you the primary IP address of this machine
123. net_py
Using getaddrinfo() and getsockaddr():
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 4 - www_ping.py
# Find the WWW service of an arbitrary host using getaddrinfo().
import socket, sys
if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: www_ping.py <hostname_or_ip>'
    sys.exit(2)
hostname_or_ip = sys.argv[1]
try:
    infolist = socket.getaddrinfo(
        hostname_or_ip, 'www', 0, socket.SOCK_STREAM, 0,
        socket.AI_ADDRCONFIG | socket.AI_V4MAPPED | socket.AI_CANONNAME,
        )
except socket.gaierror, e:
    print 'Name service failure:', e.args[1]
    sys.exit(1)
124. net_py
Using getaddrinfo() and getsockaddr():
info = infolist[0] # per standard recommendation, try the first one
socket_args = info[0:3]
address = info[4]
s = socket.socket(*socket_args)
try:
    s.connect(address)
except socket.error, e:
    print 'Network failure:', e.args[1]
else:
    print 'Success: host', info[3], 'is listening on port 80'
125. net_py
Just to be sure 1:
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 4 - forward_reverse.py
import socket, sys
if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: forward_reverse.py <hostname>'
    sys.exit(2)
hostname = sys.argv[1]
try:
    infolist = socket.getaddrinfo(
        hostname, 0, 0, socket.SOCK_STREAM, 0,
        socket.AI_ADDRCONFIG | socket.AI_V4MAPPED | socket.AI_CANONNAME )
except socket.gaierror, e:
    print 'Forward name service failure:', e.args[1]
    sys.exit(1)
info = infolist[0] # choose the first, if there are several addresses
canonical = info[3]
socketname = info[4]
ip = socketname[0]
126. net_py
Just to be sure 2:
if not canonical:
    print 'WARNING! The IP address', ip, 'has no reverse name'
    sys.exit(1)
print hostname, 'has IP address', ip
print ip, 'has the canonical hostname', canonical
# Lowercase for case-insensitive comparison, and chop off hostnames.
forward = hostname.lower().split('.')
reverse = canonical.lower().split('.')
if forward == reverse:
    print 'Wow, the names agree completely!'
    sys.exit(0)
127. net_py
Just to be sure 3:
# Truncate the domain names, which now look like ['www', 'mit', 'edu'],
# to the same length and compare. Failing that, be willing to try a
# compare with the first element (the hostname?) lopped off if both of
# them are the same length.
length = min(len(forward), len(reverse))
if (forward[-length:] == reverse[-length:]
    or (len(forward) == len(reverse)
        and forward[-length+1:] == reverse[-length+1:]
        and len(forward[-2]) > 2)): # avoid thinking '.co.uk' means a match!
    print 'The forward and reverse names have a lot in common'
else:
    print 'WARNING! The reverse name belongs to a different organization'
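The name comparison in forward_reverse.py can be tried without any DNS lookup by wrapping it in a function. A sketch assuming Python 3; the function name names_related and its three return strings are made up here, while the comparison itself is the one from the listing.

```python
def names_related(hostname, canonical):
    """Classify two host names as 'identical', 'related' or 'different'."""
    forward = hostname.lower().split('.')
    reverse = canonical.lower().split('.')
    if forward == reverse:
        return 'identical'
    length = min(len(forward), len(reverse))
    if (forward[-length:] == reverse[-length:]
        or (len(forward) == len(reverse)
            and forward[-length + 1:] == reverse[-length + 1:]
            and len(forward[-2]) > 2)):   # avoid treating '.co.uk' as a match
        return 'related'
    return 'different'


print(names_related('www.mit.edu', 'WWW.MIT.EDU'))
print(names_related('mit.edu', 'www.mit.edu'))
print(names_related('foo.co.uk', 'bar.co.uk'))
```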
131. net_py
DNS Packet
Q/R: 0/1
Opcode: Name or pointer
AA: Answer is authoritative(1)
TC: truncated
RD: Recursion desired (1)
RA: Recursion available (1)
rcode: 0 – ok, 3 – invalid name from AA
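A sketch, assuming Python 3, of how these fields sit in the 12-byte DNS header. The function name parse_dns_header and the sample header are invented for illustration; the bit positions are the standard DNS layout (QR in the top bit of the flags word, rcode in the bottom four bits).

```python
import struct


def parse_dns_header(packet):
    """Unpack the fixed 12-byte DNS header into a dictionary."""
    ident, flags, qd, an, ns, ar = struct.unpack('!6H', packet[:12])
    return {
        'id': ident,
        'qr': (flags >> 15) & 1,        # 0 = query, 1 = response
        'opcode': (flags >> 11) & 0xF,
        'aa': (flags >> 10) & 1,        # answer is authoritative
        'tc': (flags >> 9) & 1,         # truncated
        'rd': (flags >> 8) & 1,         # recursion desired
        'ra': (flags >> 7) & 1,         # recursion available
        'rcode': flags & 0xF,           # 0 = ok, 3 = invalid name
        'counts': (qd, an, ns, ar),     # questions, answers, authority, additional
    }


# Hypothetical response header: flags 0x8180, 1 question, 1 answer, 2 + 2 others.
sample = struct.pack('!6H', 0x1234, 0x8180, 1, 1, 2, 2)
print(parse_dns_header(sample))
```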
136. net_py
How to Interpret Previous Slide:
Blue area is the entire response packet.
Bytes 2D-35 are the number of Questions (1), Answers (1),
Authoritative Answers (2), Additional Answers(2).
Byte 36 is the length of the first label in the string
Query Type: 00 0C
Query Class: 00 01 (ends on byte 54)
And so on ...
48.1.140.137.in-addr.arpa
137. net_py
Handling Repeated Strings:
Bytes 55 and 56 ought to be the beginning of the same IP
address string repeated. Instead there are two bytes – C0 0C.
When the two most significant bits of a byte are set (so its first
hex digit is C or higher), the remaining 14 bits of that byte and
the next one are a pointer into the response packet.
In our case the pointer is 0C. So count C bytes into the packet
and guess where this gets you – right to the beginning of the
original version of 48.1.140.137.in-addr.arpa,
namely, 02 34 38 01 31 03 31 34 30 03 ...
And so on ...
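A minimal sketch of following such a pointer, assuming Python 3. The helpers encode_name and read_name are hypothetical, offsets are counted from the start of the DNS message (not the Ethernet frame shown on the slide), and there is no protection against malicious pointer loops.

```python
def encode_name(name):
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = b''
    for label in name.split('.'):
        out += bytes([len(label)]) + label.encode('ascii')
    return out + b'\x00'


def read_name(packet, offset):
    """Decode the name at `offset`; return (name, offset just past it)."""
    labels = []
    resume = None                      # where parsing continues after a pointer
    while True:
        length = packet[offset]
        if (length & 0xC0) == 0xC0:    # top two bits set: a 14-bit pointer
            if resume is None:
                resume = offset + 2
            offset = ((length & 0x3F) << 8) | packet[offset + 1]
        elif length == 0:              # zero-length label ends the name
            if resume is None:
                resume = offset + 1
            return '.'.join(labels), resume
        else:
            offset += 1
            labels.append(packet[offset:offset + length].decode('ascii'))
            offset += length


question = encode_name('48.1.140.137.in-addr.arpa')
# 12-byte header (zeros here), the question name, type 00 0C, class 00 01,
# then the answer's name stored as the pointer C0 0C.
packet = bytes(12) + question + b'\x00\x0c\x00\x01' + b'\xc0\x0c'
print(read_name(packet, 12))
print(read_name(packet, 12 + len(question) + 4))
```

Both calls return the same name; the second one gets it by jumping back to offset 0x0C.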
138. net_py
When to use DNS directly:
The authors of our text suggest we stick to getaddrinfo() for all
our addressing needs except one – find the mail server for a
remote email address.
Suppose you want to send an email to someone but your own
mail server is down. Since you normally use your own email
server to route the email to the destination mail server, your
email can't be sent unless you can by-pass your own email
server.
139. net_py
Email Servers: What protocols are involved in email.
[Figure: your client (browser, thunderbird) --SMTP--> “your” mail server --SMTP--> “their” mail server --POP3 or IMAP--> their client (perhaps not running right now)]
140. net_py
Email Servers: How to by-pass your local mail server.
[Figure: your client (a python program that knows how to send an email to a remote mail server, Chapter 13) --SMTP--> “their” mail server --POP3 or IMAP--> their client (perhaps not running right now); “your” mail server is by-passed]
141. net_py
How to send an email without your local email server:
Ask the DNS server for the MX resource record of the remote
domain name.
The reply should come back with the domain name and/or IP
address of the mail server for that domain.
Build up your email message with the necessary header fields
and send it off to the remote email server (port numbers: 25
and 465).
142. net_py
How to find a remote Mail Server:
Ask the DNS server for the MX resource record of the remote
domain name.
[pletcha@archimedes 04]$ cat dns_mx.py
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 4 - dns_mx.py
# Looking up a mail domain - the part of an email address after the `@`
import sys, DNS
if len(sys.argv) != 2:
    print >>sys.stderr, 'usage: dns_mx.py <hostname>'
    sys.exit(2)
143. net_py
def resolve_hostname(hostname, indent=0):
    """Print an A or AAAA record for `hostname`; follow CNAMEs if necessary."""
    indent = indent + 4
    istr = ' ' * indent
    request = DNS.Request()
    # Query for `hostname` (not sys.argv[1]) so that MX targets and CNAME
    # targets passed in by the callers are the names actually resolved.
    reply = request.req(name=hostname, qtype=DNS.Type.A)
    if reply.answers:
        for answer in reply.answers:
            print istr, 'Hostname', hostname, '= A', answer['data']
        return
    reply = request.req(name=hostname, qtype=DNS.Type.AAAA)
    if reply.answers:
        for answer in reply.answers:
            print istr, 'Hostname', hostname, '= AAAA', answer['data']
        return
    reply = request.req(name=hostname, qtype=DNS.Type.CNAME)
    if reply.answers:
        cname = reply.answers[0]['data']
        print istr, 'Hostname', hostname, 'is an alias for', cname
        resolve_hostname(cname, indent)
        return
    print istr, 'ERROR: no records for', hostname
144. net_py
def resolve_email_domain(domain):
    """Print mail server IP addresses for an email address @ `domain`."""
    request = DNS.Request()
    reply = request.req(name=domain, qtype=DNS.Type.MX)
    if reply.answers:
        print 'The domain %r has explicit MX records!' % (domain,)
        print 'Try the servers in this order:'
        datalist = [ answer['data'] for answer in reply.answers ]
        datalist.sort() # lower-priority integers go first
        for data in datalist:
            priority = data[0]
            hostname = data[1]
            print 'Priority:', priority, ' Hostname:', hostname
            resolve_hostname(hostname)
    else:
        print 'Drat, this domain has no explicit MX records'
        print 'We will have to try resolving it as an A, AAAA, or CNAME'
        resolve_hostname(domain)
DNS.DiscoverNameServers()
resolve_email_domain(sys.argv[1])
145. net_py
Send a simple email:
Following slide, from Chapter 13, sends a simple email
directly to the remote email server.
Exercise: Combine the previous two programs.
146. net_py
Sending a simple email message:
[pletcha@archimedes 13]$ cat simple.py
#!/usr/bin/env python
# Basic SMTP transmission - Chapter 13 - simple.py
import sys, smtplib
if len(sys.argv) < 4:
    print "usage: %s server fromaddr toaddr [toaddr...]" % sys.argv[0]
    sys.exit(2)
server, fromaddr, toaddrs = sys.argv[1], sys.argv[2], sys.argv[3:]
# A blank line must separate the header fields from the message body.
message = """To: %s
From: %s
Subject: Test Message from simple.py

Hello,

This is a test message sent to you from the simple.py program
in Foundations of Python Network Programming.
""" % (', '.join(toaddrs), fromaddr)
s = smtplib.SMTP(server)
s.sendmail(fromaddr, toaddrs, message)
print "Message successfully sent to %d recipient(s)" % len(toaddrs)
148. Bytes and Octets, ASCII and Unicode
Early on bytes could be anywhere from 5 to 9 bits so octet
came into use to tell us exactly what we were talking about.
Today bytes are universally 8 bits so we have two names
for the same thing.
Unicode (over a million possible code points) is an expansion of
ASCII (7-bit codes).
Authors recommend always using Unicode for strings (but
don't follow their own advice).
elvish = u'Namárië!'
149. Unicode 2 Network
Unicode characters that need to be transmitted across a
network are sent as octets.
We need a Unicode2Network conversion scheme.
Enter 'utf-8'
For example, the utf-8 encoding of the character ë is the two
octets C3 AB.
Understand that in the string below, when printed,
printables are themselves and unprintables are \xnn where nn
is a hexadecimal value.
>>> elvish = u'Namárië!'
>>> elvish.encode('utf-8')
'Nam\xc3\xa1ri\xc3\xab!'
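The same round trip in Python 3, as a minimal sketch. It assumes Python 3, where every str is Unicode and encode() returns a bytes object, which is what a socket actually sends; the string is written with escapes so the example does not depend on the file's own encoding.

```python
elvish = 'Nam\u00e1ri\u00eb!'          # the same text as u'Namárië!' above

encoded = elvish.encode('utf-8')       # text -> octets for the network
print(encoded)
print(len(elvish), 'characters become', len(encoded), 'octets')

decoded = encoded.decode('utf-8')      # octets -> text on the receiving side
print(decoded == elvish)
```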
150. Other Encodings
There are many choices for encoding schemes.
utf-16: '\xff\xfe' represents byte order and all other characters
are represented in 2 octets, typically <p>\x00 where <p>
means “printable”
>>> elvish.encode('utf-16')
'\xff\xfeN\x00a\x00m\x00\xe1\x00r\x00i\x00\xeb\x00!\x00'
>>> elvish.encode('cp1252')
'Nam\xe1ri\xeb!'
>>> elvish.encode('idna')
'xn--namri!-rta6f'
>>> elvish.encode('cp500')
'\xd5\x81\x94E\x99\x89SO'
151. Decodings:
Upon receipt, byte streams need to be decoded. To do this the
encoding needs to be understood and then things are easy.
>>> print 'Nam\xe1ri\xeb!'.decode('cp1252')
Namárië!
152. Decodings:
Note that if you are not “printing”, decode returns a
universal (Unicode) representation of the original string.
>>> 'Nam\xe1ri\xeb!'.decode('cp1252')
u'Nam\xe1ri\xeb!'
>>> print 'Nam\xe1ri\xeb!'.decode('cp1252')
Namárië!
>>> '\xd5\x81\x94E\x99\x89SO'.decode('cp500')
u'Nam\xe1ri\xeb!'
>>> 'xn--namri!-rta6f'.decode('idna')
u'nam\xe1ri\xeb!'
>>> '\xff\xfeN\x00a\x00m\x00\xe1\x00r\x00i\x00\xeb\x00!\x00'.decode('utf-16')
u'Nam\xe1ri\xeb!'
>>> 'Nam\xc3\xa1ri\xc3\xab!'.decode('utf-8')
u'Nam\xe1ri\xeb!'
153. Do it yourself; or not!
If you use high-level protocols (and their libraries) like HTTP
encoding is done for you.
If not, you'll need to do it yourself.
154. Not supported:
ASCII is a 7-bit code so can't be used to encode some things.
>>> elvish.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 3:
ordinal not in range(128)
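If the receiver really can only take ASCII, the codec can be told what to do with the characters it cannot encode. A sketch assuming Python 3; which error handler is appropriate depends on the protocol, and both choices shown here lose or alter information.

```python
elvish = 'Nam\u00e1ri\u00eb!'

try:
    elvish.encode('ascii')
except UnicodeEncodeError as e:
    # e.start and e.end delimit the offending characters in e.object
    print('cannot encode', repr(e.object[e.start:e.end]), 'at position', e.start)

replaced = elvish.encode('ascii', errors='replace')           # lossy
escaped = elvish.encode('ascii', errors='backslashreplace')   # keeps the code points
print(replaced)
print(escaped)
```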
155. Variable length encodings:
Some codecs have different encodings of characters in
different lengths.
Example, utf-16 uses either 16 or 32 bits to encode a
character.
utf-16 adds prefix bytes - \xff\xfe.
All these things make it hard to pick out individual characters
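A quick way to see the variable lengths, sketched for Python 3. The four sample characters are arbitrary choices; 'utf-16-le' is used for the per-character count so that the byte-order prefix does not get counted.

```python
samples = ['a', '\u00eb', '\u20ac', '\U0001F600']   # a, e-diaeresis, euro sign, an emoji

for ch in samples:
    utf8_len = len(ch.encode('utf-8'))
    utf16_len = len(ch.encode('utf-16-le'))   # little-endian, no prefix bytes
    print('U+%04X: %d octet(s) in utf-8, %d in utf-16' % (ord(ch), utf8_len, utf16_len))

# Plain 'utf-16' prepends a two-octet byte order mark to the data.
bom = 'a'.encode('utf-16')[:2]
print(bom)
```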
156. Network Byte Order
Either big-endian or little-endian.
Typically needed for binary data. Text is handled by encoding
(and knowing where your message ends (framing)).
Problem: Send 4253 across a network connection
Solution 1: Send '4253'
Problem: Need to convert string <--> number. Lots of
arithmetic.
Still, lots of situations do exactly this (HTTP, for example,
since it is a text protocol)
We used to use dense binary protocols but less and less.
157. How does Python see 4253?
Python stores a number as binary, we can look at its hex
representation as follows:
Each hex digit is 4 bits.
Computers store this value in memory using big-endian (most
significant bits first) or little-endian (least significant bits first)
format.
>>> hex(4253)
'0x109d'
158. Python's perspective on a religious war.
Python is agnostic.
'<': little-endian
'>': big-endian
'i': integer
'!': network perspective (big-endian)
>>> import struct
>>> struct.pack('<i',4253)
'\x9d\x10\x00\x00'
>>> struct.pack('>i',4253)
'\x00\x00\x10\x9d'
>>> struct.pack('!i',4253)
'\x00\x00\x10\x9d'
>>> struct.unpack('!i','\x00\x00\x10\x9d')
(4253,)
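The same packing in Python 3, where struct works on bytes objects. A minimal sketch; int.to_bytes() and int.from_bytes() are shown as an alternative from the standard library that the slides do not use.

```python
import struct

packed = struct.pack('!i', 4253)           # network (big-endian) byte order
print(packed)

(value,) = struct.unpack('!i', packed)     # unpack() always returns a tuple
print(value)

# The int type can do the same conversion without the struct module.
same = (4253).to_bytes(4, 'big')
little = int.from_bytes(b'\x9d\x10\x00\x00', 'little')
print(same == packed, little)
```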
160. Framing
UDP does framing for you. Data is transmitted in the same
chunks in which it is received from the application.
In TCP you have to frame your own transmitted data.
Framing answers the question, “When is it safe to stop calling
recv()?”
161. Simple Example: Single Stream
Send data with no reply
import socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060
if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    print 'Listening at', s.getsockname()
    sc, sockname = s.accept()
    print 'Accepted connection from', sockname
    sc.shutdown(socket.SHUT_WR)
    message = ''
    while True:
        more = sc.recv(8192) # arbitrary value of 8k
        if not more: # socket has closed when recv() returns ''
            break
        message += more
    print 'Done receiving the message; it says:'
    print message
    sc.close()
    s.close()
162. Simple Example
elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    s.shutdown(socket.SHUT_RD)
    s.sendall('Beautiful is better than ugly.\n')
    s.sendall('Explicit is better than implicit.\n')
    s.sendall('Simple is better than complex.\n')
    s.close()
else:
    print >>sys.stderr, 'usage: streamer.py server|client [host]'
163. Simple Example: Streaming in both directions; one RQ, one
RP
Important caveat: Always complete streaming in one direction
before beginning in the opposite direction. If not, deadlock can
happen.
164. Simple Example: Fixed Length Messages
In this case use TCP's sendall() and write your own recvall().
Rarely happens.
def recvall(sock, length):
    data = ''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data
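A Python 3 variant of recvall() that can be run without a server, as a sketch. It assumes socket.socketpair() is available to stand in for a TCP connection, and it collects chunks in a list instead of concatenating strings.

```python
import socket


def recvall(sock, length):
    """Keep calling recv() until exactly `length` bytes have arrived."""
    chunks = []
    received = 0
    while received < length:
        more = sock.recv(length - received)
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (received, length))
        chunks.append(more)
        received += len(more)
    return b''.join(chunks)


a, b = socket.socketpair()       # two connected sockets in one process
a.sendall(b'0123456789')
first = recvall(b, 4)            # a fixed-length 4-byte message
second = recvall(b, 6)           # followed by a fixed-length 6-byte message
a.close()
try:
    recvall(b, 1)                # nothing left and the peer has closed
    eof_seen = False
except EOFError:
    eof_seen = True
b.close()
print(first, second, eof_seen)
```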
165. Simple Example: Delimit Message with Special Characters.
Use a character outside the range of possible message
characters unless the message is binary.
Authors' recommendation is to use this only if you know the
message “alphabet” is limited.
If you need to use message characters then “escape” them
inside the message.
Using this approach has issues – recognizing an escaped
character, removing the escaping upon arrival and message
length.
166. Simple Example: Prefix message with its length
Popular with binary data.
Don't forget to “frame” the length itself.
What if this is your choice but you don't know in advance the
length of the message? Divide your message up into known
length segments and send them separately. Now all you need
is a signal for the final segment.
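Length-prefixed framing can be sketched as follows, assuming Python 3 and a socket.socketpair() in place of a real connection. The names put_block and get_block are hypothetical; a zero-length block is used as the signal for the final segment, and the length itself is framed as a fixed 4-byte integer in network byte order.

```python
import socket
import struct

header = struct.Struct('!I')     # 4-byte unsigned length, network byte order


def recvall(sock, length):
    """Read exactly `length` bytes or raise EOFError."""
    data = b''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data


def put_block(sock, message):
    """Send one block: its length first, then the bytes themselves."""
    sock.sendall(header.pack(len(message)) + message)


def get_block(sock):
    """Receive one block; an empty result means 'no more blocks'."""
    (length,) = header.unpack(recvall(sock, header.size))
    return recvall(sock, length)


a, b = socket.socketpair()
for text in (b'Beautiful is better than ugly.',
             b'Explicit is better than implicit.',
             b''):               # the empty block ends the message
    put_block(a, text)

received = []
while True:
    block = get_block(b)
    if not block:
        break
    received.append(block)
a.close()
b.close()
print(received)
```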
167. Listing 5-2.
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 5 - blocks.py
# Sending data one block at a time.
import socket, struct, sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = sys.argv.pop() if len(sys.argv) == 3 else '127.0.0.1'
PORT = 1060
format = struct.Struct('!I') # for messages up to 2**32 - 1 in length
def recvall(sock, length):
    data = ''
    while len(data) < length:
        more = sock.recv(length - len(data))
        if not more:
            raise EOFError('socket closed %d bytes into a %d-byte message'
                           % (len(data), length))
        data += more
    return data
169. Listing 5-2.
elif sys.argv[1:] == ['client']:
    s.connect((HOST, PORT))
    s.shutdown(socket.SHUT_RD)
    put(s, 'Beautiful is better than ugly.')
    put(s, 'Explicit is better than implicit.')
    put(s, 'Simple is better than complex.')
    put(s, '')
    s.close()
else:
    print >>sys.stderr, 'usage: streamer.py server|client [host]'
170. HTTP Example:
• Uses a delimiter - '\r\n\r\n' – for the header and a Content-Length
field in the header for possibly purely binary data.
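The two framing techniques that HTTP combines can be sketched like this, assuming Python 3. The function split_http_message is hypothetical and deliberately simplified: it ignores chunked transfer encoding and treats a missing Content-Length as an empty body.

```python
def split_http_message(data):
    """Return (headers, body, leftover), or None if `data` is incomplete."""
    head, separator, rest = data.partition(b'\r\n\r\n')
    if not separator:
        return None                      # the header delimiter has not arrived
    lines = head.decode('iso-8859-1').split('\r\n')
    headers = {}
    for line in lines[1:]:               # lines[0] is the request/status line
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get('content-length', 0))
    if len(rest) < length:
        return None                      # body still incomplete
    return headers, rest[:length], rest[length:]


stream = b'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhelloNEXT'
print(split_http_message(stream))
print(split_http_message(stream[:30]))
```

The delimiter tells us where the text header stops; the length field tells us where the (possibly binary) body stops, so the body never needs escaping.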
171. Pickles:
• Pickle is the native serialization built into Python.
• Serialization is used to send objects that include pointers
across the network where the pointers will have to be rebuilt.
• Pickling is a mix of text and data:
>>> import pickle
>>> pickle.dumps([5,6,7])
'(lp0\nI5\naI6\naI7\na.'
• At the other end:
>>> pickle.loads('(lp0\nI5\naI6\naI7\na.An apple day')
[5, 6, 7]
172. Pickles:
• Problem in network case is that we can't tell how many bytes
of pickle data were consumed before we get to what follows
(“An apple day”).
• If we use load() function on a file instead, then the file pointer
is maintained and we can ask its location.
• Remember that Python lets you turn a socket into a file
object – makefile().
>>> from StringIO import StringIO
>>> f = StringIO('(lp0\nI5\naI6\naI7\na.An apple day')
>>> pickle.load(f)
[5, 6, 7]
>>> f.pos
18
>>> f.read()
'An apple day'
>>>
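The same experiment in Python 3, as a sketch: io.BytesIO replaces StringIO, pickles are bytes, and tell() replaces the pos attribute. Keep in mind that unpickling data received from an untrusted peer can run arbitrary code, so this only suits peers you trust.

```python
import io
import pickle

pickled = pickle.dumps([5, 6, 7])
stream = io.BytesIO(pickled + b'An apple day')   # pickle followed by other data

obj = pickle.load(stream)       # stops at the end of the pickle
consumed = stream.tell()        # the file position tells us how much was used
rest = stream.read()

print(obj, consumed, rest)
```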
173. JSON
• Popular and easily allows data exchange between software
written in different languages.
• Does not support framing.
• JSON supports Unicode but not binary (see BSON)
• See Chapter 18
>>> import json
>>> json.dumps([51,u'Namárië!'])
'[51, "Nam\\u00e1ri\\u00eb!"]'
>>> json.loads('{"name": "lancelot", "quest" : "Grail"}')
{u'quest': u'Grail', u'name': u'lancelot'}
>>>
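Although JSON has no framing of its own, a receiver can read until one syntactically complete object has been parsed. A sketch assuming Python 3, using json.JSONDecoder.raw_decode(), which returns the decoded object together with the index where it ended; the buffer contents are made up for illustration.

```python
import json

decoder = json.JSONDecoder()

# Two JSON documents back to back, as they might arrive on a stream.
buffer = '[51, "Nam\\u00e1ri\\u00eb!"]{"quest": "Grail"}'

first, end = decoder.raw_decode(buffer)          # parse one object, note where it ended
second, end2 = decoder.raw_decode(buffer, end)   # continue from that index

print(first, end)
print(second, end2 == len(buffer))
```

If the buffer holds only part of an object, raw_decode() raises ValueError, which a receiver can treat as "call recv() again".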
174. XML
• Popular and easily allows data exchange between software
written in different languages.
• Does not support framing.
• Best for text documents.
• See Chapter 10
175. Compression
• Time spent transmitting is usually much longer than the time spent
pre- and post-processing the exchanged data, so compression can pay off.
• HTTP lets client and server decide whether to compress or
not.
• zlib is self-framing. Start feeding it a compressed data stream
and it will know when the stream has come to an end.
>>> import zlib
>>> data = zlib.compress('sparse')+'.'+zlib.compress('flat')+'.'
>>> data
'x\x9c+.H,*N\x05\x00\t\r\x02\x8f.x\x9cK\xcbI,\x01\x00\x04\x16\x01\xa8.'
>>> len(data)
28
# the '.' separators were not compressed
176. Compression
• Suppose the previous data arrives in 8-byte chunks.
• We are still expecting more data.
>>> dobj = zlib.decompressobj()
>>> dobj.decompress(data[0:8]), dobj.unused_data
('spars', '')
# empty unused_data indicates we haven't reached EOF
>>> dobj.decompress(data[8:16]), dobj.unused_data
('e', '.x')
# non-empty unused_data says we consumed the first compressed stream
# and some data was left unused.
177. Compression
• Skip over the '.' and start to decompress the rest of the
compressed data
>>> dobj2 = zlib.decompressobj()
>>> dobj2.decompress('x'), dobj2.unused_data
('', '')
>>> dobj2.decompress(data[16:24]), dobj2.unused_data
('flat', '')
>>> dobj2.decompress(data[24:]), dobj2.unused_data
('', '.')
# the unused data is the final '.'; the point is, the stuff we have gathered so far,
# '' + 'flat' + '',
# consists of all the data compressed by the 2nd use of zlib.compress()
NOTE: Used in the regular way, zlib provides its own framing.
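The chunk-by-chunk experiment above can be wrapped in a function, sketched here for Python 3. The name read_one_stream is hypothetical; the eof attribute of the decompression object is used to detect the end of a stream, which is a little more direct than testing unused_data.

```python
import zlib


def read_one_stream(chunks):
    """Feed chunks to zlib until one stream ends; return (payload, leftover)."""
    dobj = zlib.decompressobj()
    pieces = []
    for chunk in chunks:
        pieces.append(dobj.decompress(chunk))
        if dobj.eof:                         # zlib found the end of the stream
            return b''.join(pieces), dobj.unused_data
    raise EOFError('data ran out before the compressed stream was complete')


data = zlib.compress(b'sparse') + b'.' + zlib.compress(b'flat') + b'.'

# Data arriving in 8-byte chunks, as on the slides.
chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
payload, leftover = read_one_stream(chunks)
print(payload, leftover)

# Everything at once: the leftover is exactly what follows the first stream.
whole_payload, whole_leftover = read_one_stream([data])
print(whole_payload, whole_leftover)
```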
178. Network Exceptions:
• Many possibilities, some specific (socket.timeout) and some
generic (socket.error).
• Homework: Write two short python scripts; one that opens a
UDP socket connected to a remote socket. The second
program tries to send data to the previous socket but will fail
since its socket is not the one the other was “connected” to.
Find out the exact error that Python returns, along with the
value of ErrNo.
• Familiar exceptions – socket.gaierror, socket.error,
socket.timeout.
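A small sketch, assuming Python 3, that provokes one specific exception safely on the loopback interface. The helper wait_for_datagram is hypothetical; note that in Python 3 socket.error is simply another name for OSError, and the specific exceptions are subclasses of it.

```python
import socket


def wait_for_datagram(sock, seconds):
    """Return (data, address), or None if nothing arrives in time."""
    sock.settimeout(seconds)
    try:
        return sock.recvfrom(1024)
    except socket.timeout:           # the specific exception
        return None


s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(('127.0.0.1', 0))             # port 0: let the OS pick a free port
result = wait_for_datagram(s, 0.1)   # nobody is sending, so this times out
s.close()
print(result)

# The generic exception catches the specific ones too.
print(socket.error is OSError,
      issubclass(socket.timeout, OSError),
      issubclass(socket.gaierror, OSError))
```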
180. net_py
Security:
Before you send data you want to be sure of the machine with
which you are communicating – getaddrinfo()
Once you are in communication with another host you want to
be sure that no one is “listening in” on the conversation –
Transport Layer Security (TLS).
181. net_py
Computer Security:
It is a bad world out there; believe it.
Criminals, script “kiddies”, governments, militaries.
Authors' suggestions:
Test your code: Ned Batchelder's coverage.
Isolate your code: virtualenv
Write as little as possible: rely on third party libraries like
googlemaps
Use a high-level language: python
Learn about known attack techniques: cross-site scripting, SQL
injection, privilege escalation, viruses, trojan horses, etc.
Spend time verifying data that has traversed the Internet.
182. net_py
IP Access Rules:
We used to trust everyone: finger, whois, telnet, echo,
timed, ...
Effective protection restricts who can access your service.
TCP Wrappers: /etc/hosts.allow and /etc/hosts.deny.
Safest way is to deny all (ALL) and selectively allow some.
Man page reading: (man 5 hosts.allow)
Access will be granted when a (daemon,client) pair
matches an entry in the /etc/hosts.allow file.
Otherwise, access will be denied when a (daemon,client)
pair matches an entry in the /etc/hosts.deny file.
Otherwise, access will be granted.
183. net_py
Some Rules
Deny all (/etc/hosts.deny):
ALL: ALL
And then allow specifics (/etc/hosts.allow):
ALL: 127.0.0.1
sshd: ALL
portmap: 192.168.7
184. net_py
Why Not Build Filtering into Python?
We could pattern-match on IP addresses.
Sys Admins now use firewalls, rather than trust that individual
services are well-protected by their own code.
IP address restrictions are not enough, although they can be
effective against denial-of-service attacks.
Some protections should be at the edge of your network
Some protections are better provided by your OS (iptables).
Simple python example:
sc, sockname = s.accept()
if not sockname[0].startswith('192.168.'):
    raise RuntimeError('can not connect from other networks')
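A variant of the same check using the standard ipaddress module, sketched for Python 3. The ALLOWED list and the name client_allowed are hypothetical; the loopback network is added only as an example of a second rule. Unlike a string prefix test, this understands network masks.

```python
import ipaddress

# Hypothetical policy: the local 192.168 network plus the loopback network.
ALLOWED = [ipaddress.ip_network('192.168.0.0/16'),
           ipaddress.ip_network('127.0.0.0/8')]


def client_allowed(sockname):
    """Decide whether the (host, port) pair returned by accept() may connect."""
    address = ipaddress.ip_address(sockname[0])
    return any(address in network for network in ALLOWED)


print(client_allowed(('192.168.7.4', 50321)))
print(client_allowed(('10.0.0.1', 50321)))
```

As the slide says, this kind of filtering is usually better left to a firewall or the OS; the sketch only shows what the in-program version would look like.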
185. net_py
Cleartext on the Network
Possible attacks:
sniffing: somebody watches your traffic while sitting in a
coffee shop or sets up near a popular tourist site - tcpdump or
wireshark
effectiveness depends on amount of traffic.
usernames and passwords are visible; either customer or
backend for a “replay” attack
log messages can be intercepted – can see what “errors”
look like (perhaps attacker's own mistakes)
log message might include tracebacks.
Might break into the database server itself if webserver2db
traffic is visible.
186. net_py
Flank attack:
What if someone can see or manipulate your DNS service?
By redirecting traffic to yourdb.example.com an attacker
can find out userID/PW pairs although a fake db server will
soon run out of “answers”
What if the fake database server forwards db requests to
the real database and then logs all answers (man-in-the-
middle). This even works with one-time passwords unlike
“replay”.
Insert SQL queries into the data stream and download an
entire database.
This can all happen even with no compromising of the
server or network itself; just interfere with the naming
service.
187. net_py
Or Controls a Network Gateway
All the previous attacks are possible even if DNS is safe.
188. net_py
TLS:
TLS uses public-key encryption: Two keys, one private and
one public.
Each key is a few kb of data put in a data file with base64
encoding.
Features of public-key encryption:
Anyone can generate a key pair (private,public)
If someone uses your public key to encrypt data then only
someone holding your private key can decrypt.
If the private key is used to encrypt then any copy of the public
key can decrypt it. Data is not secret but identity of sender is
confirmed.
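The two directions of public-key encryption can be tried with toy numbers, sketched here for Python 3. This is textbook RSA with the tiny primes 61 and 53, chosen only so the arithmetic is easy to follow; real keys are thousands of bits long and real TLS adds padding and hashing, so none of this is usable for actual security.

```python
# Toy key pair (assumption: textbook RSA with tiny primes, for illustration only).
p, q = 61, 53
n = p * q                       # 3233, part of both keys
phi = (p - 1) * (q - 1)         # 3120
e = 17                          # public exponent
d = 2753                        # private exponent: (e * d) % phi == 1

message = 65                    # a "message" small enough to fit below n

# Anyone encrypts with the public key; only the private key decrypts.
cipher = pow(message, e, n)
plain = pow(cipher, d, n)

# The owner "encrypts" with the private key; anyone can check with the public key.
signature = pow(message, d, n)
verified = pow(signature, e, n) == message

print(cipher, plain, verified)
```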
189. net_py
TLS Use of Public-key Encryption:
Certificate Authority System: Lets servers prove who they
really are and lets a server and a client communicate securely.
Symmetric-key Encryption is faster. TLS is used to set up a
symmetric key and then both ends switch over to the
symmetric key.
Details: what is the strongest symmetric key both ends
support?
In TLS, the terms “server” and “client” only identify who
speaks first about encryption and who speaks second.
190. net_py
TLS Verifies Identities:
Could someone perform a “man-in-the-middle” attack
encrypting to you, decrypting momentarily to store the data
exchanged and then re-encrypting to send data to the other
end.
TLS must perform identity check.
Servers start by sharing a public key. The key they distribute
has been “signed” for them by a certificate authority (CA) (you
pay for this).
A CA sets up their own key pair and then begins “signing”
anyone else's public key using the CA's private key. Signing
involves encrypting a hash of the server's public key.
A server sends out its public key along with the signed version
of the same key
191. net_py
TLS Verifies Identities:
The client now uses the CA's public key to decrypt the signed
data.
The decrypted info says that you can trust anyone calling
themselves mysite.com if their public key hashes to xyz.
It is possible that this is coming from a host trying to inject
itself into the conversation.
Suppose a third party sends out a server's public key and the
same server's certificate? The client can decrypt the certificate
using the CA's public key (so it knows the certificate is
authentic). The decrypted certificate says who you “should be”
talking to.
At this point the client sends back a symmetric key encrypted
with the server's public key. If the certificate didn't come from
the original server then the receiver won't be able to get the
symmetric key and continue the conversation.
192. net_py
TLS Verifies Identities
Clients trust this process because they trust the CA to keep its
own private key secure.
The CA is also trusted to ensure that the pair ( mysite.com,
server public key) is real.
Clients can keep copies of signed certificates for comparison
during future exchanges (so no need to decrypt, etc).
If you control, as a server, who your clients are you can sign
your own certificates with a new key and physically move the
certificate to each client. This way you save money.
You can also sign your public key with itself but then who can
trust you?
193. net_py
Installing SSL for Python
Create a new virtual environment and install two packages
$ pip install backports.ssl_match_hostname
$ pip-2.5 install ssl # for Python 2.5 only
195. net_py
How to Code TLS:
Some Client code(connected to a secure web server):
[pletcha@archimedes 06]$ cat sslclient.py
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 6 - sslclient.py
# Using SSL to protect a socket in Python 2.6 or later
import os, socket, ssl, sys
from backports.ssl_match_hostname import match_hostname, CertificateError
try:
    script_name, hostname = sys.argv
except ValueError:
    print >>sys.stderr, 'usage: sslclient.py <hostname>'
    sys.exit(2)
# First we connect, as usual, with a socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((hostname, 443))
196. net_py
How to Code TSL;
# Next, we turn the socket over to the SSL library!
ca_certs_path = os.path.join(os.path.dirname(script_name), 'certfiles.crt')
sslsock = ssl.wrap_socket(sock, ssl_version=ssl.PROTOCOL_SSLv3,
                          cert_reqs=ssl.CERT_REQUIRED, ca_certs=ca_certs_path)
# Does the certificate that the server proffered *really* match the
# hostname to which we are trying to connect? We need to check.
try:
    match_hostname(sslsock.getpeercert(), hostname)
except CertificateError, ce:
    print 'Certificate error:', str(ce)
    sys.exit(1)
# From here on, our `sslsock` works like a normal socket. We can, for
# example, make an impromptu HTTP call.
sslsock.sendall('GET / HTTP/1.0\r\n\r\n')
result = sslsock.makefile().read() # quick way to read until EOF
sslsock.close()
print 'The document https://%s/ is %d bytes long' % (hostname, len(result))
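The listing above is Python 2 and relies on ssl.wrap_socket() and an explicit match_hostname() call. As a hedged sketch of how the same client might look in Python 3, where an SSLContext does the certificate and hostname checks for you: the function names make_client_context and fetch_length are hypothetical, and the optional cafile argument plays the role of certfiles.crt.

```python
import socket
import ssl

def make_client_context(cafile=None):
    # create_default_context() turns on certificate verification and
    # hostname checking; cafile (optional) names a file of trusted CA certs.
    return ssl.create_default_context(cafile=cafile)

def fetch_length(hostname, port=443, cafile=None):
    """Return the number of bytes the server sends back for 'GET /'."""
    context = make_client_context(cafile)
    with socket.create_connection((hostname, port)) as sock:
        # server_hostname is what the certificate gets checked against.
        with context.wrap_socket(sock, server_hostname=hostname) as sslsock:
            request = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % hostname
            sslsock.sendall(request.encode('ascii'))
            chunks = []
            while True:
                data = sslsock.recv(4096)
                if not data:          # end-of-file
                    break
                chunks.append(data)
    return len(b''.join(chunks))
```

Calling fetch_length('www.openssl.org') would need a live network connection; a certificate that fails verification raises an ssl.SSLError subclass instead of reaching the request.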
197. net_py
When can we use this code?
We can use it against big sites that will have a certificate
signed by some CA.
It won't work against a site that only provides a self-signed
certificate.
[pletcha@archimedes 06]$ python sslclient.py www.openssl.org
The document https://www.openssl.org/ is 16000 bytes long
I have no available server to test this
198. net_py
When can we use this code?
At New Paltz only some servers have certificates:
It won't work on other servers on campus with a domain name
that does not match the domain name for the certificate we
possess.
[pletcha@archimedes 06]$ python sslclient.py www.newpaltz.edu
The document https://www.newpaltz.edu/ is 50823 bytes long
[pletcha@archimedes 06]$ python sslclient.py wyvern.cs.newpaltz.edu
Certificate error: hostname 'wyvern.cs.newpaltz.edu' doesn't match
either of '*.newpaltz.edu', 'newpaltz.edu'
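Why does www.newpaltz.edu pass while wyvern.cs.newpaltz.edu fails? A wildcard stands for exactly one label of the name. The following sketch is a deliberately simplified matcher written for this slide (simple_match is a hypothetical name, and this is not the algorithm that match_hostname actually implements), but it reproduces the behaviour seen above.

```python
def simple_match(pattern, hostname):
    """Simplified certificate-name check: '*' may replace only the
    leftmost label, and the number of labels must be the same."""
    p_labels = pattern.lower().split('.')
    h_labels = hostname.lower().split('.')
    if len(p_labels) != len(h_labels):
        return False                      # '*.newpaltz.edu' has 3 labels
    if p_labels[0] != '*' and p_labels[0] != h_labels[0]:
        return False
    return p_labels[1:] == h_labels[1:]

for name in ('www.newpaltz.edu', 'wyvern.cs.newpaltz.edu'):
    print(name, simple_match('*.newpaltz.edu', name))
```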
199. net_py
Some Other Situations:
Google provides a certificate for www.google.com as well as
google.com, which is an alias for the same site.
[pletcha@archimedes 06]$ python sslclient.py www.google.com
The document https://www.google.com/ is 47926 bytes long
[pletcha@archimedes 06]$ python sslclient.py google.com
The document https://google.com/ is 47894 bytes long
[pletcha@archimedes 06]$ python sslclient.py maps.google.com
The document https://maps.google.com/ is 47898 bytes long
200. net_py
Server Code:
Client code expressly says the server must send a certificate.
Server code doesn't expect a certificate from the client.
Except sometimes:
# Usual case: the server does not ask the client for a certificate.
sslsock = ssl.wrap_socket(sock, server_side=True,
                          ssl_version=ssl.PROTOCOL_SSLv23,
                          cert_reqs=ssl.CERT_NONE,
                          keyfile="mykeyfile", certfile="mycertfile")
# Sometimes: the server requires a certificate from the client as well.
sslsock = ssl.wrap_socket(sock, server_side=True,
                          ssl_version=ssl.PROTOCOL_SSLv23,
                          cert_reqs=ssl.CERT_REQUIRED,
                          ca_certs=ca_certs_path,
                          keyfile="mykeyfile", certfile="mycertfile")
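For comparison, here is a hedged Python 3 sketch of the same two server configurations using an SSLContext. The helper name make_server_context and its parameters are assumptions for illustration; the file names are whatever key and certificate files you actually have.

```python
import ssl

def make_server_context(certfile=None, keyfile=None,
                        require_client_cert=False, ca_certs=None):
    """Build a server-side context; optionally demand a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if certfile is not None:
        # The server's own certificate and private key.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if require_client_cert:
        # Corresponds to cert_reqs=ssl.CERT_REQUIRED above.
        context.verify_mode = ssl.CERT_REQUIRED
        if ca_certs is not None:
            # CA certificates used to check what the client presents.
            context.load_verify_locations(cafile=ca_certs)
    return context
```

A listening socket's accepted connections would then be wrapped with context.wrap_socket(sock, server_side=True).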
201. net_py
Exercise:
Modify the details of slide 16 to take into account the
expectation that the client as well as the server need supply a
signed certificate.
203. net_py
Scaling up from one client at a time
All server code in the book, up to now, dealt with one client at
a time.
Except our last chatroom homework.
Options for scaling up:
event driven: See chatroom example. problem is its
restriction to a single CPU or core
multiple threads
multiple processes (in Python, this really exercises all
CPUs or cores)
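As a rough illustration of the "multiple threads" option, the sketch below uses the standard socketserver module (Python 3) to give every client its own thread. The handler class, the helper names start_server and ask, and the fallback reply are assumptions made for this example; the question/answer table is the one from the Lancelot module in these slides.

```python
import socket
import socketserver
import threading

QADICT = {
    b'What is your name?': b'My name is Sir Lancelot of Camelot.',
    b'What is your quest?': b'To seek the Holy Grail.',
    b'What is your favorite color?': b'Blue.',
}

class LancelotHandler(socketserver.BaseRequestHandler):
    def handle(self):                      # runs in a thread of its own
        buffer = b''
        while True:
            data = self.request.recv(4096)
            if not data:                   # client closed the connection
                return
            buffer += data
            if buffer.endswith(b'?'):
                self.request.sendall(QADICT.get(buffer, b'I do not know.'))
                buffer = b''

def start_server(host='127.0.0.1', port=0):
    """Start the threaded server in the background; port 0 = any free port."""
    server = socketserver.ThreadingTCPServer((host, port), LancelotHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(address, question):
    with socket.create_connection(address) as sock:
        sock.sendall(question)
        reply = b''
        while not reply.endswith(b'.'):
            data = sock.recv(4096)
            if not data:
                break
            reply += data
        return reply
```

Swapping ThreadingTCPServer for ForkingTCPServer (Unix only) would give the multiple-process variant with the same handler.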
204. net_py
Load Balancing I
Balance the load before requests ever reach your server code, via DNS round-robin:
; zone file fragment
ftp IN A 192.168.0.4
ftp IN A 192.168.0.5
ftp IN A 192.168.0.6
www IN A 192.168.0.7
www IN A 192.168.0.8
; or use this format which gives exactly the same result
ftp IN A 192.168.0.4
    IN A 192.168.0.5
    IN A 192.168.0.6
www IN A 192.168.0.7
    IN A 192.168.0.8
205. net_py
Load Balancing II
Have your own machine front an array of machines with the
same service on each and forward service requests in a
round-robin fashion.
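The round-robin choice itself is tiny. In this sketch the backend addresses are borrowed from the zone file fragment on the previous slide, and make_picker is a hypothetical helper; a real front end would also have to forward the bytes of each request to the chosen machine.

```python
import itertools

BACKENDS = ['192.168.0.4', '192.168.0.5', '192.168.0.6']

def make_picker(backends):
    """Return a function that hands out backends in round-robin order."""
    pool = itertools.cycle(backends)
    return lambda: next(pool)

pick = make_picker(BACKENDS)
for request_number in range(4):
    print('request', request_number, 'goes to', pick())
```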
206. net_py
Daemons and Logging:
“Daemon” means the program is isolated from the terminal in
which it was executed. So if the terminal is killed the program
continues to live.
The Python program supervisord does a good job in this
isolation process and in addition offers the following services:
starts and monitors services
re-starts a service that terminates and stops doing so if the
service terminates several times in a short period of time.
http://www.supervisord.org
supervisord sends stdout and stderr output to a log file system
that cycles through log, log.1, log.2, log.3 and log.4.
207. net_py
Logging continued:
A better solution is to import the standard logging module and
write to a log that way.
logging has the benefit of writing wherever you want: files, a TCP/IP
connection, a printer, whatever.
It can also be customized from a configuration file, here called
logging.conf, by calling the logging.config.fileConfig() function.
import logging
log = logging.getLogger(__name__)
log.error('This is a mistake')
log.error('This is a mistake')
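To show the "write to what you want" point, here is a small Python 3 sketch that attaches a handler to a logger by hand. The logger name 'lancelot.server', the format string and the use of an in-memory buffer are assumptions for the demonstration; in a real server the stream would more likely be a file, or you would use logging.FileHandler.

```python
import io
import logging

def make_logger(name, stream):
    """Return a logger that writes formatted records to the given stream."""
    log = logging.getLogger(name)
    log.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter('%(levelname)s %(name)s: %(message)s'))
    log.addHandler(handler)
    log.propagate = False          # keep the record out of the root logger
    return log

buffer = io.StringIO()             # stands in for a file or a socket
log = make_logger('lancelot.server', buffer)
log.error('This is a mistake')
print(buffer.getvalue(), end='')
```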
208. net_py
Sir Launcelot:
The following is an importable module:
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 7 - lancelot.py
# Constants and routines for supporting a certain network conversation.
import socket, sys
PORT = 1060
qa = (('What is your name?', 'My name is Sir Lancelot of Camelot.'),
      ('What is your quest?', 'To seek the Holy Grail.'),
      ('What is your favorite color?', 'Blue.'))
qadict = dict(qa)
def recv_until(sock, suffix):
    message = ''
    while not message.endswith(suffix):
        data = sock.recv(4096)
        if not data:
            raise EOFError('socket closed before we saw %r' % suffix)
        message += data
    return message
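The listing is Python 2, where sockets deal in str. As a sketch of the same helper for Python 3, where recv() returns bytes, the only real change is that the accumulator and the suffix are bytes. socket.socketpair() is used here purely to try it out without a network.

```python
import socket

def recv_until(sock, suffix):
    """Keep reading from sock until the data ends with suffix (bytes)."""
    message = b''
    while not message.endswith(suffix):
        data = sock.recv(4096)
        if not data:
            raise EOFError('socket closed before we saw %r' % suffix)
        message += data
    return message

# Quick local experiment: the question arrives in two pieces.
left, right = socket.socketpair()
left.sendall(b'What is your')
left.sendall(b' name?')
print(recv_until(right, b'?'))
left.close()
right.close()
```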
209. net_py
Sir Launcelot II:
The following is part of an importable module:
def setup():
    if len(sys.argv) != 2:
        print >>sys.stderr, 'usage: %s interface' % sys.argv[0]
        exit(2)
    interface = sys.argv[1]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((interface, PORT))
    sock.listen(128)
    print 'Ready and listening at %r port %d' % (interface, PORT)
    return sock
211. net_py
Details:
The server has two nested infinite loops – one iterating over
different client/server exchanges and one iterating over the
individual client/server exchange until the client terminates.
The server is very inefficient; it can only serve one client at a
time.
If too many clients try to attach, the connection queue will fill up
and prospective clients will be dropped. Hence the 3WHS (three-way
handshake) will not even begin, let alone complete.
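The two nested loops can be sketched as follows. This is a Python 3 reconstruction under assumptions, not the book's server_simple.py: the names handle_client and client_sock are taken from the trace output later in these slides, server_loop is a hypothetical name, and the question table and recv_until() are repeated so the fragment stands on its own.

```python
import socket

QADICT = {
    b'What is your name?': b'My name is Sir Lancelot of Camelot.',
    b'What is your quest?': b'To seek the Holy Grail.',
    b'What is your favorite color?': b'Blue.',
}

def recv_until(sock, suffix):
    message = b''
    while not message.endswith(suffix):
        data = sock.recv(4096)
        if not data:
            raise EOFError('socket closed before we saw %r' % suffix)
        message += data
    return message

def handle_client(client_sock):
    """Inner loop: one question/answer exchange per pass."""
    try:
        while True:
            question = recv_until(client_sock, b'?')
            client_sock.sendall(QADICT[question])
    except EOFError:                 # the client hung up
        client_sock.close()

def server_loop(listen_sock):
    """Outer loop: one client per pass; everyone else waits in the queue."""
    while True:
        client_sock, sockname = listen_sock.accept()
        handle_client(client_sock)
```

While handle_client() is busy with one client, accept() is not being called, which is exactly why other clients pile up in the listen queue.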
212. net_py
Elementary Client:
This client asks each of the available questions once and only
once and then disconnects.
#!/usr/bin/env python
import socket, sys, lancelot
def client(hostname, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((hostname, port))
    s.sendall(lancelot.qa[0][0])
    answer1 = lancelot.recv_until(s, '.') # answers end with '.'
    s.sendall(lancelot.qa[1][0])
    answer2 = lancelot.recv_until(s, '.')
    s.sendall(lancelot.qa[2][0])
    answer3 = lancelot.recv_until(s, '.')
    s.close()
    print answer1
    print answer2
    print answer3
213. net_py
Elementary Client II:
The rest
It seems fast but is it really?
To test this for real we need some realistic network latency so
shouldn't use localhost.
We also need to measure microsecond behaviour.
if __name__ == '__main__':
    if not 2 <= len(sys.argv) <= 3:
        print >>sys.stderr, 'usage: client.py hostname [port]'
        sys.exit(2)
    port = int(sys.argv[2]) if len(sys.argv) > 2 else lancelot.PORT
    client(sys.argv[1], port)
214. net_py
Tunnel to another machine:
ssh -L 1061:archimedes.cs.newpaltz.edu:1060 joyous.cs.newpaltz.edu
215. net_py
Tunneling:
See page 289 for this feature
Alternatively, here is a good explanation of various possible
scenarios:
http://www.zulutown.com/blog/2009/02/28/ssh-tunnelling-to-remote-servers-and-with-local-address-binding/
http://toic.org/blog/2009/reverse-ssh-port-forwarding/#.Uzr2zTnfHsY
216. net_py
More on SSHD Port Forwarding:
Uses:
– access a backend database that is only visible on the
local subnet
– your ISP gives you a shell account but expects emails
to be sent from their browser mail client to their server
– reverse port forwarding
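The corresponding commands (the last two go together: the -R command
runs on the machine you want to reach, the -p command runs at home):
ssh -L 3306:mysql.mysite.com:3306 user@sshd.mysite.com
ssh -L 8025:smtp.homeisp.net:25 username@shell.homeisp.net
ssh -R 8022:localhost:22 username@my.home.ip.address
ssh -p 8022 username@localhost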
217. net_py
Waiting for Things to Happen:
So now we have traffic that takes some time to actually move
around.
We need to time things.
If your function, say foo(), is in a file called myfile.py then the
script called my_trace.py will time the running of foo() from
myfile.py.
218. net_py
My Experiment
Set up VPN from my home so I have a New Paltz IP address
Use joyous.cs.newpaltz.edu as my remote machine
Have both server and client run on my laptop
[pletcha@archimedes 07]$ ssh -L 1061:137.140.108.130:1060 joyous
[pletcha@archimedes 07]$ python my_trace.py handle_client server_simple.py ''
python my_trace.py client client.py localhost 1061
219. net_py
my_trace.py
#!/usr/bin/env python
# Foundations of Python Network Programming - Chapter 7 - my_trace.py
# Command-line tool for tracing a single function in a program.
import linecache, sys, time
def make_tracer(funcname):
    def mytrace(frame, event, arg):
        if frame.f_code.co_name == funcname:
            if event == 'line':
                _events.append((time.time(), frame.f_code.co_filename,
                                frame.f_lineno))
            return mytrace
    return mytrace
220. net_py
my_trace.py
if __name__ == '__main__':
    _events = []
    if len(sys.argv) < 3:
        print >>sys.stderr, 'usage: my_trace.py funcname other_script.py ...'
        sys.exit(2)
    sys.settrace(make_tracer(sys.argv[1]))
    del sys.argv[0:2] # show the script only its own name and arguments
    try:
        execfile(sys.argv[0])
    finally:
        for t, filename, lineno in _events:
            s = linecache.getline(filename, lineno)
            sys.stdout.write('%9.6f %s' % (t % 60.0, s))
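my_trace.py depends on execfile(), which exists only in Python 2. The sketch below shows the same sys.settrace() idea in Python 3, applied directly to a function object instead of a script. The helper name trace_lines and the sample function foo are assumptions, and time.perf_counter() is swapped in for time.time() because only the differences between timestamps matter here.

```python
import sys
import time

def trace_lines(func, *args, **kwargs):
    """Call func and return (result, events); each event is a
    (timestamp, line_number) pair for one executed line of func."""
    events = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is target:
            if event == 'line':
                events.append((time.perf_counter(), frame.f_lineno))
            return tracer          # keep tracing lines inside this frame
        return None                # ignore every other function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)     # always restore the old tracer
    return result, events

def foo():
    a = 1
    b = 2
    return a + b

result, events = trace_lines(foo)
for (t0, line), (t1, _) in zip(events, events[1:]):
    print('line %d took about %.1f microseconds' % (line, (t1 - t0) * 1e6))
```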
221. net_py
My Output:
43.308772 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
Δ = 83 μs
43.308855 s.connect((hostname, port))
Δ = 644 μs
43.309499 s.sendall(lancelot.qa[0][0])
Δ = 41 μs
43.309540 answer1 = lancelot.recv_until(s, '.') # answers end with '.'
Δ = 214 ms
43.523284 while True:
Δ = 8 μs
43.523292 question = lancelot.recv_until(client_sock, '?')
Δ = 149 ms
43.672060 answer = lancelot.qadict[question]
Δ = 9 μs
43.672069 client_sock.sendall(answer)
Δ = 55 μs
43.672124 while True:
Δ = 4 μs
43.672128 question = lancelot.recv_until(client_sock, '?')
Δ = 72 ms
43.744381 s.sendall(lancelot.qa[1][0])
skip
this iteration
224. net_py
Observations:
Server finds the answer in 10 microseconds (answer =) so
could theoretically answer 100000 questions per second.
Each sendall() takes ~60 microseconds while each recv_until()
takes ~60 milliseconds (1000 times slower).
Since receiving takes so long we can't process more than 16
questions per second with this iterative server.
The OS helps where it can. Notice that sendall() is 1000 times
faster than recv_until(). This is because the sendall() function
doesn't actually block until data is sent and ACKed. It returns
as soon as the data is delivered to the TCP layer. The OS
takes care of guaranteeing delivery.
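The rates quoted above are just reciprocals of the measured times. A minimal sketch of the arithmetic, using the rounded figures from this slide (10 microseconds per lookup, about 60 milliseconds per recv_until()); the helper name max_rate is hypothetical.

```python
def max_rate(seconds_per_item):
    """Upper bound on items per second if each one takes this long."""
    return 1.0 / seconds_per_item

lookup_rate = max_rate(10e-6)    # dictionary lookup: 10 microseconds
recv_rate = max_rate(60e-3)      # recv_until(): about 60 milliseconds

print('lookups per second: %.0f' % lookup_rate)
print('questions per second when waiting on recv_until(): %.1f' % recv_rate)
```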
225. net_py
Observations:
219 milliseconds elapse between the moment the client executes
connect() and the moment the server executes recv_until(). If all
client requests were coming from the same process, sequentially, this
means we could not expect more than 4 sessions per second.
All the while, the server is capable of answering 33000 sessions
per second.
So communication and, most of all, sequentiality really slow
things down.
So much server time not utilized means there has to be a
better way.
15-20 milliseconds for one question to be answered so
roughly 40-50 questions per second. Can we do better than
this by increasing the number of clients?
226. net_py
Benchmarks:
See page 289 for ssh -L feature
Funkload: A benchmarking tool that is written in python and
lets you run more and more copies of something you are
testing to see how things struggle with the increased load.
227. net_py
Test Routine:
Asks 10 questions instead of 3
#!/usr/bin/env python
from funkload.FunkLoadTestCase import FunkLoadTestCase
import socket, os, unittest, lancelot
SERVER_HOST = os.environ.get('LAUNCELOT_SERVER', 'localhost')
class TestLancelot(FunkLoadTestCase): # python syntax for sub-class
    def test_dialog(self): # In Java & C++, receiver objects are implicit;
                           # in python they are explicit (self == this).
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((SERVER_HOST, lancelot.PORT))
        for i in range(10):
            question, answer = lancelot.qa[i % len(lancelot.qa)]
            sock.sendall(question)
            reply = lancelot.recv_until(sock, '.')
            self.assertEqual(reply, answer)
        sock.close()
if __name__ == '__main__':
    unittest.main()
228. net_py
Environment Variables:
You can set a variable from the command line and, by exporting it
(export in bash), make sure it is inherited by all processes later
run from that command line.
The authors explain why they are using environment variables:
“I can not see any way to pass actual arguments through
to tests via Funkload command line arguments”
Run server_simple.py on separate machine.
(BenchMark)[pletcha@archimedes BenchMark]$
export LAUNCELOT_SERVER=joyous.cs.newpaltz.edu
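The inheritance can be checked from Python itself. In this sketch (Python 3; the helper name child_sees is hypothetical) a child interpreter is started with the variable set in its environment, and the child reads it back the same way the test routine does, falling back to 'localhost' when it is missing.

```python
import os
import subprocess
import sys

def child_sees(name, value):
    """Start a child Python process with name=value in its environment
    and return what the child reads back from os.environ."""
    env = dict(os.environ)         # copy of our own environment
    env[name] = value              # what `export NAME=value` would add
    child_code = 'import os; print(os.environ.get("%s", "localhost"))' % name
    output = subprocess.check_output([sys.executable, '-c', child_code],
                                     env=env)
    return output.decode().strip()

print(child_sees('LAUNCELOT_SERVER', 'joyous.cs.newpaltz.edu'))
```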
230. net_py
Testing Funkload:
Big mixup on Lancelot-Launcelot.
(BenchMark)[pletcha@archimedes BenchMark]$
fl-run-test lancelot_tests.py TestLancelot.test_dialog
.
----------------------------------------------------------------------
Ran 1 test in 0.010s
OK
231. net_py
Benchmark run:
Typical cycle output
Cycle #7 with 16 virtual users
------------------------------
* setUpCycle hook: ... done.
* Current time: 2013-04-11T13:46:34.536010
* Starting threads: ................ done.
* Logging for 8s (until 2013-04-11T13:46:44.187746): .
........................... done.
* Waiting end of threads: ................ done.
http://www.cs.newpaltz.edu/~pletcha/NET_PY/test_dialog-20130411T134423/index.html
232. net_py
Interpretation:
Since we are sending 10 questions per connection (test) we
are answering 1320 questions per second.
We greatly outdid the original 16 questions per second in the
sequential test example.
Adding more than 3 or 4 clients really didn't help.
Remember we still only have a single-threaded server. The
reason for the improvement is that clients can be “pipelined”
with several clients getting something done at the same time.
The only thing that can't be in parallel is answering the
question.
233. net_py
Performance:
Adding clients drags down performance
Insurmountable problem: Server is talking to only one client at
a time.
# Clients   # Questions   # Ques/client
3           1320          403
5           1320          264
10          1320          132
15          1320          99
20          1320          66
235. net_py
Event-driven Servers:
The simple server blocks until data arrives. At that point it can
be efficient.
What would happen if we never called recv() unless we knew
data was already waiting?
Meanwhile we could be watching a whole array of connected
clients to see which one has sent us something to respond to.
237. net_py
Event-driven Servers:
while True:
    for fd, event in poll.poll():
        sock = sockets[fd]
        # Remove closed sockets from our list.
        if event & (select.POLLHUP | select.POLLERR | select.POLLNVAL):
            poll.unregister(fd)
            del sockets[fd]
            requests.pop(sock, None)
            responses.pop(sock, None)
        # Accept connections from new sockets.
        elif sock is listen_sock:
            newsock, sockname = sock.accept()
            newsock.setblocking(False)
            fd = newsock.fileno()
            sockets[fd] = newsock
            poll.register(fd, select.POLLIN)
            requests[newsock] = ''
238. net_py
Event-driven Servers:
        # Collect incoming data until it forms a question.
        elif event & select.POLLIN:
            data = sock.recv(4096)
            if not data: # end-of-file
                sock.close() # makes POLLNVAL happen next time
                continue
            requests[sock] += data
            if '?' in requests[sock]:
                question = requests.pop(sock)
                answer = dict(lancelot.qa)[question]
                poll.modify(sock, select.POLLOUT)
                responses[sock] = answer
        # Send out pieces of each reply until they are all sent.
        elif event & select.POLLOUT:
            response = responses.pop(sock)
            n = sock.send(response)
            if n < len(response):
                responses[sock] = response[n:]
            else:
                poll.modify(sock, select.POLLIN)
                requests[sock] = ''
239. net_py
Event-driven Servers:
The main loop calls poll(), which blocks until something/
anything is ready.
The difference is recv() waited for a single client and poll()
waits on all clients.
In the simple server we had one of everything. In this polling
server we have an array of everything; one of each thing
dedicated to each connection.
How poll() works: We tell it what sockets to monitor and what
activity we are interested in on each socket – read or write.
When one or more sockets are ready with something, poll()
returns.
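The register-then-wait pattern can be tried in isolation. This Python 3 sketch uses a connected socket pair instead of real clients; poll_demo is a hypothetical name, and select.poll() is not available on Windows, so treat it as a Unix-only experiment.

```python
import select
import socket

def poll_demo():
    """Show that poll() reports a socket only once data is waiting."""
    left, right = socket.socketpair()
    poller = select.poll()
    # Tell poll() which socket to watch and which activity interests us.
    poller.register(right, select.POLLIN)

    before = poller.poll(0)            # nothing sent yet, timeout 0 ms
    left.sendall(b'What is your name?')
    after = poller.poll(1000)          # wait up to 1000 ms for readiness

    fd = right.fileno()
    left.close()
    right.close()
    return before, after, fd

before, after, fd = poll_demo()
print('before sending:', before)
print('after sending: ', after)
```

Each entry returned by poll() is a (file descriptor, event mask) pair, which is why the server keeps a dictionary from file descriptors back to socket objects.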
240. net_py
Event-driven Servers:
The life-span of one client:
1: A client connects and the listening socket is “ready”. poll() returns and
since it is the listening socket, it must be a completed 3WHS. We accept()
the connection and tell our poll() function we want to read from this connection.
To make sure they never block we set blocking “not allowed”.
2: When data is available, poll() returns and we read a string and append the
string to a dictionary entry for this connection.
3: We know we have an entire question when '?' arrives. At that point we ask
poll() to tell us when we can write to the same connection.
4: Once the socket is ready for writing (poll() has returned) we send as much as
we can of the answer and keep sending until we have sent '.'.
5: Next we swap the client socket back to listening-for-new-data mode.
6: POLLHUP, POLLERR and POLLNVAL events are reported by poll(), so when recv()
receives 0 bytes we close() the socket and let our next poll() report POLLNVAL for it.
241. net_py
server_poll.py benchmark:
# Clients   # Questions   # Ques/client
3           1800          600
5           2500          500
So we see some performance degradation.
http://www.cs.newpaltz.edu/~pletcha/NET_PY/test_dialog-20130412T081140/index.html
242. net_py
We got Errors
Some connections ended in errors – check out listen().
TCP man page:
tcp_max_syn_backlog (integer; default: see below; since Linux 2.2)
The maximum number of queued connection requests which have
still not received an acknowledgement from the connecting
client. If this number is exceeded, the kernel will begin
dropping requests. The default value of 256 is increased to
1024 when the memory present in the system is adequate or
greater (>= 128Mb), and reduced to 128 for those systems with
very low memory (<= 32Mb). It is recommended that if this
needs to be increased above 1024, TCP_SYNQ_HSIZE in
include/net/tcp.h be modified to keep
TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be
recompiled.
socket.listen(backlog)
Listen for connections made to the socket. The backlog argument specifies the
maximum number of queued connections and should be at least 0; the maximum value
is system-dependent (usually 5), the minimum value is forced to 0.
243. net_py
Poll vs Select
poll() code is cleaner but select(), which does the same thing,
is available on Windows.
The author's suggestion: Don't write this kind of code; use an
event-driven framework instead.
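Short of a full framework, Python 3 ships a selectors module that picks the best available mechanism (epoll, kqueue, poll or select) for the platform. The sketch below is only an illustration of that portability layer, not the framework the author has in mind; wait_for_readable is a hypothetical helper.

```python
import selectors
import socket

def wait_for_readable(socks, timeout=1.0):
    """Return the sockets from socks that have data waiting to be read."""
    sel = selectors.DefaultSelector()   # epoll, kqueue, poll or select
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        return [key.fileobj for key, mask in sel.select(timeout)]
    finally:
        sel.close()

# Two connected pairs; only the first one has something to say.
a, b = socket.socketpair()
c, d = socket.socketpair()
a.sendall(b'What is your quest?')
ready = wait_for_readable([b, d])
print('ready sockets:', len(ready))
```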
244. net_py
Non-blocking Semantics
In non-blocking mode, recv() acts as follows:
– If data is ready, it is returned
– If no data has arrived, socket.error is raised
– if the connection is closed, '' is returned.
Why does a closed connection return a value while no data raises an error?
Think about the blocking situation.
– The first and last situations can happen and behave as above.
The second situation won't happen, because a blocking recv() simply waits.
– So in non-blocking mode the second situation had to do something different.
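The three cases are easy to observe with a connected socket pair. This is a Python 3 sketch, so the data is bytes, the closed case returns b'', and the error raised for "no data yet" is BlockingIOError (a subclass of OSError, which is what socket.error is an alias for in Python 3). The helper name try_recv is hypothetical.

```python
import socket

def try_recv(sock):
    """Classify the outcome of one non-blocking recv() call."""
    try:
        data = sock.recv(4096)
    except BlockingIOError:        # no data has arrived yet
        return 'no data yet'
    if data == b'':                # the peer closed the connection
        return 'closed'
    return data                    # data was ready

left, right = socket.socketpair()
right.setblocking(False)

first = try_recv(right)            # nothing has been sent so far
left.sendall(b'Blue.')
second = try_recv(right)           # data is waiting
left.close()
third = try_recv(right)            # connection closed by the peer
right.close()
print(first, second, third)
```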
245. net_py
Non-blocking Semantics:
send() semantics:
– if data is sent, its length is returned
– socket buffers full: socket.error raised
– connection closed: socket.error raised
Last case is interesting. Suppose poll() says a socket is ready
to write but before we call send(), the client sends a FIN.
Listing 7-7 doesn't code for this situation.
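One way to code for that situation is to wrap send() so that both error cases are caught. The sketch below is Python 3 and try_send is a hypothetical helper, not part of Listing 7-7. It is tested on a local socket pair, where a send to a closed peer fails immediately; with a real TCP connection the first send() after the client's FIN may still succeed, and the error only shows up on a later call.

```python
import socket

def try_send(sock, data):
    """Return the number of bytes sent, 0 if the socket buffers are
    full, or None if the connection is gone."""
    try:
        return sock.send(data)
    except BlockingIOError:        # buffers full: try again after poll()
        return 0
    except OSError:                # broken pipe, connection reset, ...
        return None

left, right = socket.socketpair()
left.setblocking(False)
sent = try_send(left, b'Blue.')    # peer is still there
right.close()                      # the client goes away
after_close = try_send(left, b'Blue.')
left.close()
print(sent, after_close)
```

A server built this way would treat None as a cue to unregister the socket and drop its entries from the requests and responses dictionaries.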