Automated Paravirtualization of device drivers in Xen
Nikhil Pujari Vijayakumar M M Sireesh Bolla
Stony Brook University
Abstract:
Xen is an x86 virtual machine monitor which adopts a paravirtualization approach to allow multiple
operating systems to share conventional hardware without sacrificing performance or functionality.
Because paravirtualization presents an idealized virtual machine abstraction rather than the real
hardware interface, guest operating systems must be ported before they can run on the Xen VMM. Instead of
emulating existing hardware devices as in full virtualization, Xen provides abstract devices which
implement a high-level interface for each device category. Since the interfaces are abstract and provide
only generic operations for a device class, extra effort is needed to give guest operating
systems access to the peculiar functionality that the hardware may provide. These non-generic facilities
are typically exposed by device drivers in the form of device-private ioctls, which are generally used for
configuring the devices to enable or disable non-generic functionality, or to collect status information.
As of Xen 3.3, the netfront driver does not implement device private ioctls. User mode configuration
programs like ifconfig which understand the underlying devices use these ioctls. Our method
implements conversion from the local ioctl call in a Linux DomU to a remote call to the Dom0. This is
done by including a generic ioctl wrapper in netfront driver and a watch in the netback driver. This
automates the process of exposing arbitrary functionality provided by the real network hardware to the
DomU through device-private ioctls. Only a specification listing the ioctl numbers and buffer sizes needs
to be provided; a tool reads this specification and writes the corresponding entries to the XenStore.
Motivation:
Paravirtualization is a virtualization technique that presents a software interface to virtual machines that
is similar but not identical to that of the underlying hardware. Paravirtualization may allow the virtual
machine monitor (VMM) to be simpler or virtual machines that run on it to achieve performance closer
to non-virtualized hardware. However, operating systems must be explicitly ported to run on top of a
paravirtualized VMM.
Most full virtualization solutions provide emulated forms of simple devices. The emulated devices are
typically chosen to be common hardware, so it is likely that drivers exist already for any given guest.
Paravirtualized guests, however, need to be modified. Therefore, the requirement for
the virtual environment to use existing drivers disappears. Xen provides abstract devices which
correspond to device categories, e.g. an abstract block device instead of a SCSI or IDE device. Each
device abstraction provides the generic calls corresponding to that device
category, e.g. read and write calls for the block device. This is done to achieve efficient I/O
virtualization as opposed to emulation of devices in full virtualization. One of the important
optimizations included in this approach is the grouping of I/O operations, which improves efficiency.
Hardware manufacturers provide generic functionalities for that device class as well as additional
device specific functionalities. For example, a cd/dvd drive which falls into the block device category
provides the generic read and write capabilities, but also offers the "special"/non-generic
capability of multisession support. An ethernet device may offer special capabilities such as jumbo frames
and checksum calculations, in addition to the generic functionalities of send and receive.
The device drivers provided by Xen to the guest operating systems need to be modified in order to
enable the guests to exploit these special features provided by the hardware. The aim of the project is to
automate this process of modification to the Xen ethernet device drivers as much as possible, given the
specifications of the ethernet device/NIC to be used.
The initial aim of the project was to evaluate the feasibility of porting arbitrary network device drivers
to Xen and whether we could automate the process. After examining Xen's approach to and
implementation of network I/O virtualization, we determined that there is little need to port network
device drivers to Xen, since the real device driver and Xen’s split drivers are two separate components
of the Xen network I/O chain. Existing drivers could be loaded in Xen Dom0 and network I/O would
work without any modifications to Xen split drivers. Hence the project objectives were modified to
exploring and implementing ways to expose device specific functionality to DomU’s and to do that in
an automated way, given a specification of the non-generic functionalities implemented by the real
network driver.
The following sections consist of an overview of the components of Xen's I/O virtualization
architecture which we had to study and use in order to implement our mechanisms to achieve this aim.
Overview of Xen device driver virtualization:
In Xen, Domain0 is a privileged domain, i.e. a privileged VM which hosts the administrative
management and control interface. It provides
the capability to create and terminate other domains (unprivileged domains, or DomU's) and to
control their associated scheduling parameters, physical memory allocations and the access they are
given to the machine's physical disks and network devices. It also supports the creation of the virtual
network interfaces (VIFs) and virtual block devices (VBDs) which are used by the unprivileged
guests.
Xen implements a Split device driver model for device driver virtualization. The Dom0 is in control of
the actual hardware devices and virtual devices are exported for the DomU’s to use. Also some
domains can be given control of particular hardware devices which then become the
driver domains, but this is done only if IOMMU is available, otherwise it compromises security.
The actual hardware driver resides in the driver domain/Dom0 and virtual device driver is split in two
parts viz. the frontend driver and the backend driver. They are separated by a virtual bus, XenBus,
which is roughly modeled after a device bus such as PCI. The backend driver resides in the driver
domain/Dom0 and the frontend driver resides in the guest.
The network frontend driver, i.e. the netfront driver, acts as a device driver for the virtual network. It
communicates with the network backend driver, i.e. the netback driver, through shared memory ring
buffers and an event channel which is used for asynchronous notifications.
The event channel is the analog of hardware interrupts. The netback driver communicates with the
hardware through the actual hardware driver.
XenStore is another important component of Xen architecture. It is a database of configuration
information to be shared between domains. In relation to device drivers, it also fulfills the function of a
device tree, which is generally the result of querying an external bus such as the PCI bus. It is used to
communicate to the frontend driver information about the domain hosting the backend driver, the
shared memory and event channel to be used, and device-specific information.
Xen networking:
The Xen network interface employs two I/O ring buffers, one for incoming packets and one for
outgoing packets. Ring buffers are producer-consumer queues implemented in shared memory. These
ring buffers are used to transmit instructions, while the actual data is transferred through shared
memory pages via the grant mechanism. A grant reference refers to the shared memory page which
acts as a buffer for the actual data transfer.
Each transmission request contains a grant reference and an offset within the granted page. This allows
transmit and receive buffers to be reused, preventing the TLB from needing frequent updates.
A similar arrangement is used for receiving packets. The DomU guest inserts a receive request into
the ring indicating where to store a packet, and the Dom0 component places the contents there.
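The ring layout described above can be sketched as a small producer-consumer queue. The following is a minimal illustrative model, not Xen's actual ring code (which lives in a shared memory page and tracks separate request and response producer indices); the `RingBuffer` name and the request fields are our own.

```python
# Minimal model of an I/O ring: a fixed-size producer-consumer queue.
# Illustrative only -- Xen's real rings live in a shared memory page and
# keep separate request and response producer indices.
class RingBuffer:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.prod = 0  # producer index, advanced by the writer
        self.cons = 0  # consumer index, advanced by the reader

    def push(self, req):
        if self.prod - self.cons == self.size:
            raise BufferError("ring full")
        self.slots[self.prod % self.size] = req
        self.prod += 1

    def pop(self):
        if self.cons == self.prod:
            return None  # ring empty
        req = self.slots[self.cons % self.size]
        self.cons += 1
        return req

# A transmit request carries a grant reference and an offset,
# not the packet data itself.
tx_ring = RingBuffer(4)
tx_ring.push({"gref": 17, "offset": 0})
tx_ring.push({"gref": 18, "offset": 128})
```

Because requests carry only grant references and offsets, the consumer maps the granted page to reach the packet data, which is what keeps the rings small and cheap to share.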
For each new DomU, Xen creates a new pair of "connected virtual ethernet interfaces", with one end in
DomU and the other in Dom0. A Linux DomU sees its end of the pair as eth0. The other end
of the virtual ethernet interface pair exists within Dom0 as the interface vif<id#>.0.
The default Xen configuration uses bridging within Dom0 to allow all domains to appear on the
network as individual hosts. When xend (the Xen daemon) starts, it runs a script named network-bridge
which creates a new bridge named xenbr0. The virtual network interfaces in Dom0 are connected to the
real physical interface using this bridge. The network card runs in promiscuous mode. Each guest
gets its own MAC address assigned to its virtual interface. This allows all the guests to appear on the
network as individual hosts.
A packet arrives at the hardware, is handled by the real ethernet driver, and appears on peth0, the real
ethernet interface. The interface peth0 is bound to the bridge, so the packet is passed to the bridge from
there. This step happens at the Ethernet level; no IP addresses are set on peth0 or the bridge. The
bridge then distributes the packet, just like a switch would: it is passed to the appropriate virtual
interface based on the MAC address and from there it is delivered to the correct guest domain.
XenStore and XenBus :
XenStore is a hierarchical namespace (similar to sysfs or Open Firmware) which is shared between
domains. The interdomain communication primitives exposed by Xen are very low-level (virtual IRQ
and shared memory). XenStore is implemented on top of these primitives and provides some higher
level operations (read a key, write a key, enumerate a directory, notify when a key changes value).
XenStore is a database, hosted by Domain0, that supports transactions and atomic operations. It is
accessible through a Unix domain socket in Dom0, a kernel-level API, or an ioctl interface via
/proc/xen/xenbus.
XenStore is used to store information about the domains during their execution and as a mechanism of
creating and controlling DomU devices.
XenBus provides an in-kernel API used by virtual I/O drivers to interact with XenStore.
There are three main paths in XenStore:
/vm - stores configuration information about domain
/local/domain - stores information about the domain on the local node (domid, etc.)
/tool - stores information for the various tools
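The hierarchical namespace and its path layout can be illustrated with a toy dict-backed store. The helper names mimic the tool API in xs.h, but this is a sketch, not the real implementation, and the example vif entries are illustrative.

```python
# Toy dict-backed model of XenStore's hierarchical namespace.
# Helper names mimic the tool API in tools/xenstore/xs.h (sketch only).
store = {}

def xs_write(path, value):
    store[path] = value

def xs_read(path):
    return store.get(path)

def xs_directory(path):
    # List the immediate children of a path, like xs_directory() does.
    prefix = path.rstrip("/") + "/"
    return sorted({p[len(prefix):].split("/")[0]
                   for p in store if p.startswith(prefix)})

# Example entries a vif backend directory might contain (illustrative).
xs_write("/local/domain/1/backend/vif/1/0/mac", "00:16:3e:aa:bb:cc")
xs_write("/local/domain/1/backend/vif/1/0/bridge", "xenbr0")
```

Treating paths as keys keeps the model flat, while `xs_directory` recovers the tree structure the way tools enumerate backend devices.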
The /local path currently contains only one directory, /local/domain, which is indexed by domain id and
contains the running domain's information. It contains directories for each of the device backends, for
example vbd for block devices and vif for network devices, in the directory /local/domain/<domain-
id>/backend. It contains status entries and entries for the names and ids of the various entities such as
the DomU, the bridge to which it is connected, and the MAC address. This is the directory in which we
can store configuration information specific to our netfront-netback drivers.
All Xen virtual device drivers register themselves with XenBus at initialization. Most initialization and
setup is postponed until XenBus calls the probe function, which is very similar to how the PCI probe
function gets called in real ethernet drivers.
There are two classes of API used to read, write, and modify XenStore. One is for
accessing XenStore from tools, while the other is an in-kernel API used to access XenStore from
inside driver code.
XenStore API for tools:
The whole set of functions can be found in the file /tools/xenstore/xs.h. It contains functions such as
xs_mkdir, xs_read, xs_write, xs_directory, and xs_rm, which respectively create directories, read and
write entries inside directories, read directory contents, and remove entries or directories. These
functions are very similar to the set of POSIX functions for file and directory operations.
These functions can be called from C programs or perl/python scripts to create/modify/destroy entries
in XenStore. Various Xen tools use them to operate on XenStore.
XenStore in-kernel API or XenBus API:
This set of functions can be found in the file /include/xen/xenbus.h. It includes functions such as
xenbus_register_frontend/backend, xenbus_read/write, xenbus_mkdir/rm, xenbus_printf/scanf, and
register/unregister_xenbus_watch, which respectively register frontend/backend drivers, create, modify,
or destroy XenStore entries, and set or unset watches on XenStore entries.
XenStore Transactions:
Transactions provide developers with a method for ensuring that multiple operations on the XenStore
are seen as a single atomic operation. Any time multiple operations must be performed before any
changes are seen by watchers, a transaction must be used to encapsulate the changes.
A transaction is started by calling xenbus_transaction_start(). The XenStore API functions can then
be used to read or write values in the desired entries. The transaction is ended by calling
xenbus_transaction_end().
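The atomicity guarantee can be sketched with a simplified model in which a transaction works on a private copy of the store that is published all at once on commit. The real XenStore additionally detects conflicting concurrent transactions and can abort them, which this sketch omits.

```python
# Simplified transaction semantics: work on a private copy of the store
# and publish every change at once on commit.  The real XenStore also
# detects conflicting concurrent transactions, which this sketch omits.
class Store:
    def __init__(self):
        self.data = {}

    def transaction_start(self):
        return dict(self.data)   # private working copy

    def transaction_end(self, txn):
        self.data = txn          # commit: all changes become visible together

s = Store()
txn = s.transaction_start()
txn["/local/domain/1/ioctl/35312/input"] = "ifreq-bytes"
txn["/local/domain/1/ioctl/35312/return_ready"] = "0"
before_commit = "/local/domain/1/ioctl/35312/input" in s.data  # still hidden
s.transaction_end(txn)
```

Until `transaction_end` runs, no watcher of the store can observe a half-written pair of entries, which is exactly the property the text describes.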
Similar functions exist which can be called from userspace tools to modify or read values from
XenStore.
XenStore Watches:
A watch is functionality provided by XenStore which allows registering callback functions that
are invoked when a particular XenStore entry, or any entry below the directory being watched, is
changed. This allows drivers or applications to respond immediately to changes in the XenStore.
Drivers can register a watch by using the function register_xenbus_watch() which takes as input a
structure of type xenbus_watch which contains the XenStore entry/directory to be watched and a
pointer to the callback function.
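A minimal model of the watch mechanism, assuming a flat path-keyed store; the real register_xenbus_watch() takes a struct xenbus_watch rather than a path and callback directly, so this form is a simplification.

```python
# Toy watch mechanism: a callback fires for any write at or below the
# watched path.  The real register_xenbus_watch() takes a struct
# xenbus_watch; this flat path/callback form is a simplification.
store = {}
watches = []   # list of (watched_path, callback)
fired = []     # paths for which callbacks ran

def register_xenbus_watch(path, callback):
    watches.append((path, callback))

def store_write(path, value):
    store[path] = value
    for wpath, cb in watches:
        if path == wpath or path.startswith(wpath.rstrip("/") + "/"):
            cb(path)

register_xenbus_watch("/local/domain/1/ioctl", fired.append)
store_write("/local/domain/1/ioctl/35312/input", "ifreq-bytes")
store_write("/local/domain/1/vm-name", "guest1")   # outside watch: no event
```

The prefix test is what makes a single watch on the ioctl directory cover every ioctl entry created beneath it.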
Design and Implementation:
Network interfaces are represented inside the Linux kernel by struct net_device. Network drivers
populate the structure and register it with the kernel at initialization time. It is the very core of the
network driver layer and contains all the different types of information pertaining to the interface: the
interface name, hardware information such as the DMA channel and IRQ assigned to the device,
interface information such as the MAC address and flags, and a function dispatch table with functions
such as open, close, transmit, do_ioctl, change_mtu, etc.
The do_ioctl method is generally used to implement non-standard functionality specific to the device.
When the ioctl system call is invoked on a socket, the command number is one of the symbols defined
in <linux/sockios.h>, and the sock_ioctl function directly invokes a protocol-specific function. Any
ioctl command that is not recognized by the protocol layer is passed to the device layer. These device-
related ioctl commands accept a third argument from user space, a struct ifreq *. This structure is
defined in <linux/if.h>. In addition to using the standardized calls, each interface can define its own
ioctl commands. The ioctl implementation for sockets recognizes 16 commands as private to the
interface: SIOCDEVPRIVATE through SIOCDEVPRIVATE+15. When one of these commands is
recognized, dev->do_ioctl is called in the relevant interface driver. The function receives the same
struct ifreq * pointer that the general-purpose ioctl function uses:
int (*do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
The ifr pointer points to a kernel-space address that holds a copy of the structure passed by the user.
After do_ioctl returns, the structure is copied back to user space. Therefore, the driver can use the
private commands to both receive and return data. The device-specific commands can choose to use the
fields in struct ifreq, but they already convey a standardized meaning, and it's unlikely that the driver
can adapt the structure to its needs. The field ifr_data is a caddr_t item (a pointer) that is meant to be
used for device-specific needs. The driver and the program used to invoke its ioctl commands should
agree about the use of ifr_data. This pointer can point to arbitrary configuration data understood both
by the application and driver.
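The private-command dispatch test the socket layer performs can be sketched as follows. SIOCDEVPRIVATE is 0x89F0 in <linux/sockios.h>; `is_device_private` and `CMD_GET_STATS` are our own illustrative names, not kernel definitions.

```python
# The socket layer routes the 16 private commands to dev->do_ioctl.
# SIOCDEVPRIVATE is 0x89F0 in <linux/sockios.h>; is_device_private is
# an illustrative helper, not a kernel function.
SIOCDEVPRIVATE = 0x89F0

def is_device_private(cmd):
    return SIOCDEVPRIVATE <= cmd < SIOCDEVPRIVATE + 16

# A driver might number its own commands relative to the base:
CMD_GET_STATS = SIOCDEVPRIVATE + 3   # hypothetical private command
```

Any command outside this 16-value window is handled by the protocol or device layers through the standardized ioctls instead.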
The rest of the methods are standard methods supported by every interface, and hence by the
netfront interface provided by Xen to the DomU guest. We have written a generic ioctl wrapper
function whose address is assigned to the do_ioctl member of the netfront interface.
Generally the ifr_data field points to a structure or to a buffer with an arbitrary amount of data. If it
points to a structure, then the size of the buffer pointed to by it can be derived from the structure
definition. But if it points to arbitrary data, then the method generally followed by driver developers to
communicate the size is to encode it in the first 4 bytes of the buffer. The driver first reads the size
from the buffer and then reads the rest of the buffer.
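This size-prefix convention can be illustrated in a few lines. The little-endian 4-byte layout used here is an assumption, since each driver defines its own encoding.

```python
import struct

# Size-prefix convention for arbitrary ifr_data buffers: the first 4
# bytes carry the payload length.  The little-endian layout is an
# assumption; each driver defines its own encoding.
def pack_ioctl_buffer(payload: bytes) -> bytes:
    return struct.pack("<I", len(payload)) + payload

def unpack_ioctl_buffer(buf: bytes) -> bytes:
    (size,) = struct.unpack_from("<I", buf)   # read the size first
    return buf[4:4 + size]                    # then the rest of the buffer

buf = pack_ioctl_buffer(b"\x01\x02\x03")
```

This is the convention our wrapper relies on to know how many bytes to copy through the XenStore when the specification marks a buffer size as -1.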
A summary of our implementation is as follows. The specification of non-standard functionalities
implemented by the real network driver is provided in the form of a list of private ioctls (identified by
the command number, which lies between SIOCDEVPRIVATE and SIOCDEVPRIVATE+15)
implemented by the real network driver, together with the sizes of the buffers pointed to by the
ifr_data field. If a buffer holds an arbitrary amount of data, this is encoded in the specification by
putting -1 in the size field. A script reads the list of ioctls and buffer sizes from the specification and
creates corresponding fields in the XenStore in the directory /local/domain/<domid>/ioctl/, which has
been created beforehand to house the ioctls. Under the entry for each ioctl, entries are created for the
input value, the return value, and the return status.
When a private ioctl is invoked from a DomU, the ioctl wrapper function reads the size field from the
corresponding entry. It then reads the whole struct ifreq and starts a transaction to write the ifreq
structure to the input entry under that ioctl in the XenStore. For each ioctl we have included a
return_ready entry under it in the XenStore. It is a boolean entry which indicates whether the return
field is ready, i.e. whether the netback driver has written the ioctl return value in the return entry of
that ioctl. After writing the input ifreq to the input entry, the netfront ioctl wrapper writes 0 to the
return_ready entry and ends the transaction. It then enters a loop polling that entry.
The netback driver registers a watch on the ioctl directory at initialization time, in the netback_init
function. This watch is triggered when the netfront do_ioctl writes the ioctl input to the XenStore. The
watch handler then reads the ifreq structure from the XenStore and calls the real device driver's
do_ioctl function. It invokes the dev_get_by_name function and passes it the name of the real network
interface, "peth0". (The real interface is renamed from eth0 to peth0 when Xend brings up the network
bridge.) This function returns the net_device structure for the real network interface, on which the
do_ioctl function is then invoked.
The return value of the ioctl is passed in the same ifreq structure which was passed to it. The watch
handler then writes this structure back to the return entry under the ioctl in the XenStore and writes the
return status to its entry. The watch then sets the return_ready value under that ioctl in the XenStore.
As soon as return_ready becomes one, the netfront ioctl wrapper exits its polling loop and starts a
transaction to read the return status. If the status indicates that the ioctl call succeeded, it reads the
return value, writes it back to the buffer pointed to by ifreq, ends the transaction, and returns the
status.
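The complete handshake can be simulated in a single process, with a dict standing in for XenStore and a thread standing in for the netback watch. The upper-casing stands in for the real driver's do_ioctl work, and 35312 (the decimal value of SIOCDEVPRIVATE) is used as an example ioctl number; the real code uses XenStore transactions instead of plain dict writes.

```python
import threading
import time

# Single-process simulation of the ioctl forwarding handshake.  A dict
# stands in for XenStore and a thread for the netback watch; upper-casing
# stands in for the real driver's do_ioctl work.
store = {"/ioctl/35312/return_ready": "1"}

def netfront_ioctl(ifreq):
    store["/ioctl/35312/input"] = ifreq
    store["/ioctl/35312/return_ready"] = "0"   # request posted
    while store["/ioctl/35312/return_ready"] != "1":
        time.sleep(0.001)                      # poll, as the wrapper does
    return store["/ioctl/35312/return"]

def netback_watch():
    while store["/ioctl/35312/return_ready"] != "0":
        time.sleep(0.001)                      # wait for the front's write
    req = store["/ioctl/35312/input"]
    store["/ioctl/35312/return"] = req.upper() # stand-in for real do_ioctl
    store["/ioctl/35312/return_ready"] = "1"   # signal the front end

back = threading.Thread(target=netback_watch)
back.start()
result = netfront_ioctl("ifreq-bytes")
back.join()
```

The return_ready flag acts as the synchronization point: the front end flips it to 0 to post a request and the back end flips it to 1 to publish the result.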
Permissions:
DomU guests can be allowed or disallowed to invoke ioctls using the XenStore permissions API. These
permissions can be set from tools, which are perl/python scripts or C programs, from the Dom0. When
we wish to disallow a particular DomU to invoke a particular ioctl we can revoke read and write
permissions for it on the path /local/domain/<domain-id>/ioctl/<ioctl-number>. This can be done by
invoking the following function from a python script in Dom0,
xstransact.SetPermissions(path, { 'dom' : dom1,
'read' : False,
'write' : False });
The corresponding C function is xs_set_permissions. The first entry denotes the domain whose read
and write privileges are being revoked, and the path is the path of the ioctl entry in the XenStore.
When permission is to be granted again, the same function can be invoked with True in the read and
write fields.
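The permission scheme can be sketched with a toy per-path, per-domain rights table; `set_permissions` and `can_read` are illustrative stand-ins for xs_set_permissions and the checks XenStore performs, not real API.

```python
# Toy per-path, per-domain rights table.  set_permissions and can_read
# are illustrative stand-ins for xs_set_permissions and the checks
# XenStore performs.
perms = {}

def set_permissions(path, dom, read, write):
    perms.setdefault(path, {})[dom] = {"read": read, "write": write}

def can_read(path, dom):
    return perms.get(path, {}).get(dom, {}).get("read", False)

ioctl_path = "/local/domain/1/ioctl/35312"
set_permissions(ioctl_path, dom=1, read=False, write=False)  # revoke
```

Revoking read and write on the ioctl path is enough to block the forwarding mechanism for that DomU, since the netfront wrapper cannot post its input or poll return_ready without store access.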
Conclusion:
Our method automates the process of exposing non-generic functionality implemented by the real
network driver in the form of device private ioctls. XenStore provides the infrastructure to encode the
ioctl information and the input and return values. The transactions API ensures that the reads and writes
are consistent and atomic. Since reads and writes to the XenStore are essentially file I/O, they are
slower than the shared-memory data transfer mechanism between the frontend and backend. But
using the data transfer mechanism to transport the input and output of the ioctl would have required
major changes to the Xen networking code. In principle, XenStore is a mechanism to store
configuration information, and ioctls are also generally used for configuration purposes. Moreover,
since ioctls do not transfer data themselves, their execution is not as time critical as that of the data
transmit and receive paths.
Newer NICs provide hardware support for virtualization which enables the NIC to be shared among
different VMs efficiently and safely. Various efforts are underway to enable Xen to use the large set of
diverse and evolving functionalities provided by newer NICs. The Xen netchannel2 protocol
along with a high level network I/O virtualization management system is being developed to address
this need. The manager would relieve users of the need to make decisions and configurations that are
customized to the underlying hardware capabilities. Instead, the manager would allow users to specify
policies at a high level and then determine the appropriate low-level configurations specific to the
particular hardware environment that would implement the policies. Thus, the manager would provide
a clean separation between user-relevant policies, and the hardware and software mechanisms that are
used to implement the policies.

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Kürzlich hochgeladen (20)

So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

is similar but not identical to that of the underlying hardware. Paravirtualization may allow the virtual machine monitor (VMM) to be simpler, or allow the virtual machines that run on it to achieve performance closer to that of non-virtualized hardware. However, operating systems must be explicitly ported to run on top of a paravirtualized VMM.

Most full virtualization solutions provide emulated forms of simple devices. The emulated devices are typically chosen to be common hardware, so it is likely that drivers already exist for any given guest.
Paravirtualized guests, however, need to be modified, so the requirement that the virtual environment reuse existing drivers disappears. Xen provides abstract devices which correspond to device categories; for example, it provides an abstract block device instead of SCSI and IDE devices. This device abstraction provides the generic calls corresponding to that device category, e.g. read and write calls for the block device. This is done to achieve efficient I/O virtualization, as opposed to the emulation of devices in full virtualization. One of the important optimizations included in this approach is the grouping of I/O operations, which improves efficiency.

Hardware manufacturers provide generic functionality for a device class as well as additional device-specific functionality. For example, a CD/DVD drive, which falls into the block device category, provides the generic read and write capabilities but may also offer the "special"/non-generic capability of multisessioning. An Ethernet device may offer special capabilities such as jumbo frames and checksum calculation, in addition to the generic functionality of send and receive. The device drivers provided by Xen to the guest operating systems need to be modified in order to enable the guests to exploit these special features. The aim of the project is to automate this modification of the Xen Ethernet device drivers as much as possible, given the specifications of the Ethernet device/NIC to be used.

The initial aim of the project was to evaluate the feasibility of porting arbitrary network device drivers to Xen and whether we could automate the process. After examining Xen's approach to and implementation of network I/O virtualization, we determined that there is little need to port network device drivers to Xen, since the real device driver and Xen's split drivers are two separate components of the Xen network I/O chain.
Existing drivers can be loaded in the Xen Dom0, and network I/O works without any modifications to the Xen split drivers. Hence the project objectives were changed to exploring and implementing ways to expose device-specific functionality to DomUs, and to do so in an automated way, given a specification of the non-generic functionality implemented by the real network driver. The following sections give an overview of the components of Xen's I/O virtualization architecture which we had to study and use in order to implement our mechanisms.

Overview of Xen device driver virtualization:

In Xen, Domain0 is a privileged domain, i.e. a privileged VM which hosts the administrative management and control interface. It provides the capability to create and terminate other domains (unprivileged domains, or DomUs) and to control their associated scheduling parameters, physical memory allocations, and the access they are given to the machine's physical disks and network devices. It also supports the creation of the virtual network interfaces (VIFs) and virtual block devices (VBDs) which are used by the unprivileged guests.

Xen implements a split device driver model for device driver virtualization. Dom0 is in control of the actual hardware devices, and virtual devices are exported for the DomUs to use. Some domains can also be given control of particular hardware devices, in which case they become driver domains; this is done only if an IOMMU is available, as otherwise it compromises security. The actual hardware driver resides in the driver domain/Dom0, and the virtual device driver is split into two parts: the frontend driver and the backend driver. They are separated by a virtual bus, XenBus, which is roughly modeled after a device bus such as PCI. The backend driver resides in the driver domain/Dom0 and the frontend driver resides in the guest.

The network frontend driver, i.e. the netfront driver, acts as a device driver for the virtual network. It communicates with the network backend driver, i.e. the netback driver, with the help of shared memory ring buffers and an event channel which is used for asynchronous notifications. The event channel is the analog of hardware interrupts. The netback driver communicates with the hardware through the actual hardware driver.

XenStore is another important component of the Xen architecture. It is a database of configuration information to be shared between domains. In relation to device drivers, it also fulfills the function of the device tree which is generally the result of querying an external bus such as the PCI bus.
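The event channel acts as the virtual analog of an interrupt line: one end raises an event, and if the event is unmasked, the handler registered by the other end runs. The following toy model (a Python simulation with invented names, not the Xen API) illustrates that pending/masked/handler discipline:

```python
# Toy model of a Xen event channel: the analog of a hardware interrupt
# line. Notifying sets a pending bit; if the channel is unmasked, the
# handler the other end registered runs, much as netfront and netback
# notify each other after producing requests or responses.

class ToyEventChannel:
    def __init__(self):
        self.handler = None
        self.pending = False
        self.masked = False

    def bind(self, handler):
        """Register the receiving end's handler (invented name)."""
        self.handler = handler

    def notify(self):
        """Sending end raises the event."""
        self.pending = True
        self.deliver()

    def deliver(self):
        """Run the handler if an event is pending and unmasked."""
        if self.pending and not self.masked and self.handler:
            self.pending = False
            self.handler()

events = []
chan = ToyEventChannel()
chan.bind(lambda: events.append("netback: ring has work"))
chan.masked = True
chan.notify()      # masked: event stays pending, handler does not run
chan.masked = False
chan.deliver()     # unmasked: the pending event is now delivered
print(events)
```

The masking step mirrors how a driver can defer event processing and pick up pending work later, without losing the notification.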
XenStore is used to communicate to the frontend driver information about the domain hosting the backend driver, about the shared memory and event channel to be used, and device-specific information.

Xen networking:

The Xen network interface employs two I/O ring buffers, one for incoming packets and one for outgoing ones. Ring buffers are producer-consumer queues implemented in shared memory. The rings are used to transmit instructions, while the actual data is transferred through shared memory pages via the grant mechanism. A grant reference refers to the shared memory page which acts as a buffer for the actual data transfer. Each transmission request contains a grant reference and an offset within the granted page. This allows transmit and receive buffers to be reused, preventing the TLB from needing frequent updates. A similar arrangement is used for receiving packets: the DomU guest inserts a receive request into the ring indicating where to store a packet, and the Dom0 component places the contents there.

For each new DomU, Xen creates a new pair of "connected virtual Ethernet interfaces", with one end in the DomU and the other in Dom0. A Linux DomU sees its end of the pair as eth0; the other end exists within Dom0 as the interface vif<id#>.0. The default Xen configuration uses bridging within Dom0 to allow all domains to appear on the network as individual hosts. When xend (the Xen daemon) starts, it runs a script named network-bridge which creates a new bridge named xenbr0. The virtual network interfaces in Dom0 are connected to the real physical interface using this bridge. The network card runs in promiscuous mode, and each guest gets its own MAC address assigned to its virtual interface, which allows all the guests to appear on the network as individual hosts.

A packet arrives at the hardware, is handled by the real Ethernet driver, and appears on peth0, the real Ethernet interface. The interface peth0 is bound to the bridge, so the packet is passed to the bridge from there. This step happens at the Ethernet level; no IP addresses are set on peth0 or the bridge. The bridge then distributes the packet, just like a switch would: it is passed to the appropriate virtual interface based on the MAC address, and from there it is delivered to the correct guest domain.

XenStore and XenBus:

XenStore is a hierarchical namespace (similar to sysfs or Open Firmware) which is shared between domains. The interdomain communication primitives exposed by Xen are very low-level (virtual IRQs and shared memory).
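The request/response discipline of the I/O rings described above can be simulated as a producer-consumer queue with free-running indices (a sketch for illustration only; the field names are invented, and the real rings are defined by C macros in the Xen headers, with notifications going over the event channel):

```python
# Toy model of a Xen-style I/O ring: a fixed-size circular queue where
# the frontend produces requests and the backend consumes them and
# produces responses. Indices only ever increase; slot = index % size.
# This mirrors the discipline of the rings, not their memory layout.

class ToyRing:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # frontend: next request slot to fill
        self.req_cons = 0   # backend: next request slot to read
        self.rsp_prod = 0   # backend: next response slot to fill
        self.rsp_cons = 0   # frontend: next response slot to read

    def put_request(self, req):
        assert self.req_prod - self.rsp_cons < self.size, "ring full"
        self.slots[self.req_prod % self.size] = req
        self.req_prod += 1  # in Xen, followed by an event-channel notify

    def get_request(self):
        assert self.req_cons < self.req_prod, "no pending requests"
        req = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return req

    def put_response(self, rsp):
        self.slots[self.rsp_prod % self.size] = rsp
        self.rsp_prod += 1

    def get_response(self):
        assert self.rsp_cons < self.rsp_prod, "no pending responses"
        rsp = self.slots[self.rsp_cons % self.size]
        self.rsp_cons += 1
        return rsp

ring = ToyRing()
ring.put_request({"gref": 42, "offset": 0})  # frontend queues a transmit
req = ring.get_request()                     # backend picks it up
ring.put_response({"status": "OK"})          # backend completes it
print(ring.get_response())                   # frontend sees the completion
```

Note that the request carries only a grant reference and an offset, matching the description above: the instruction travels on the ring, while the payload lives in the granted shared page.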
XenStore is implemented on top of these low-level primitives and provides some higher-level operations (read a key, write a key, enumerate a directory, be notified when a key changes value). XenStore is a database, hosted by Dom0, that supports transactions and atomic operations. It is accessible via a Unix domain socket in Dom0, a kernel-level API, or an ioctl interface via /proc/xen/xenbus. XenStore is used to store information about the domains during their execution and as a mechanism for creating and controlling DomU devices. XenBus provides an in-kernel API used by virtual I/O drivers to interact with XenStore.

There are three main paths in XenStore:

/vm - stores configuration information about the domain
/local/domain - stores information about the domain on the local node (domid, etc.)
/tool - stores information for the various tools

The /local path currently contains only one directory, /local/domain, which is indexed by domain id and contains the running domain information. It holds directories for each of the device backends, for example vbd for block devices and vif for network devices, in the directory /local/domain/<domain-id>/backend. These contain status entries and entries for the names and ids of the various entities, such as the DomU, the bridge to which it is connected, and the MAC address. This is the directory in which we can store configuration information specific to our netfront-netback drivers.

All Xen virtual device drivers register themselves with XenBus at initialization. Most initialization and setup is postponed until XenBus calls the probe function, which is very similar to how the PCI probe function gets called in real Ethernet drivers. There are two classes of API used to read/write/modify XenStore: one set for accessing XenStore from tools, and an in-kernel API used to access XenStore from inside driver code.

XenStore API for tools:

The whole set of functions can be found in the file /tools/xenstore/xs.h. It contains functions such as xs_mkdir, xs_read, xs_write, xs_directory and xs_rm, which respectively create directories, read/write entries inside directories, read directory contents, and remove entries/directories. These functions are very similar to the set of POSIX functions for file/directory operations. They can be called from C programs or perl/python scripts to create/modify/destroy entries in the XenStore; various Xen tools use them to operate on it.

XenStore in-kernel API, or XenBus API:

This set of functions can be found in the file /include/xen/xenbus.h.
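To make the semantics of xs_read, xs_write, xs_directory and xs_rm concrete, here is a toy in-memory model of a hierarchical store (a Python simulation for illustration only, not the libxenstore bindings):

```python
# Minimal in-memory model of a hierarchical key/value store with
# XenStore-like operations: write/read a key, list a directory's
# immediate children, remove a subtree. Paths are slash-separated,
# as in /local/domain/<domain-id>/backend.

class ToyStore:
    def __init__(self):
        self.entries = {}          # path -> value

    def write(self, path, value):  # cf. xs_write
        self.entries[path] = value

    def read(self, path):          # cf. xs_read
        return self.entries.get(path)

    def directory(self, path):     # cf. xs_directory
        prefix = path.rstrip("/") + "/"
        kids = {p[len(prefix):].split("/")[0]
                for p in self.entries if p.startswith(prefix)}
        return sorted(kids)

    def rm(self, path):            # cf. xs_rm: remove path and subtree
        prefix = path.rstrip("/") + "/"
        for p in list(self.entries):
            if p == path or p.startswith(prefix):
                del self.entries[p]

store = ToyStore()
store.write("/local/domain/1/backend/vif/0/mac", "00:16:3e:00:00:01")
store.write("/local/domain/1/backend/vif/0/bridge", "xenbr0")
print(store.directory("/local/domain/1/backend/vif/0"))
```

The example paths mirror the vif backend entries described above (MAC address and bridge name); the values themselves are made up for the sketch.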
It includes functions such as xenbus_register_frontend/backend, xenbus_read/write, xenbus_mkdir/rm, xenbus_printf/scanf, and register/unregister_xenbus_watch, which register frontend/backend drivers, create/modify/destroy XenStore entries, and set/unset watches on XenStore entries.

XenStore transactions:

Transactions provide developers with a method for ensuring that multiple operations on the XenStore are seen as a single atomic operation. Any time multiple operations must be performed before any changes are seen by watchers, a transaction must be used to encapsulate the changes. A transaction is started by calling xenbus_transaction_start() on the directory whose contents need to be changed or read. The XenStore API functions can then be used to read/write values in the desired entries, and the transaction is ended by calling xenbus_transaction_end(). Similar functions exist which can be called from userspace tools to modify or read values in the XenStore.

XenStore watches:

A watch is a facility provided by XenStore which allows callback functions to be registered that are invoked when a particular XenStore entry, or any entry below the directory being watched, is changed. This allows drivers or applications to respond immediately to changes in the XenStore. Drivers can register a watch using the function register_xenbus_watch(), which takes as input a structure of type xenbus_watch containing the XenStore entry/directory to be watched and a pointer to the callback function.

Design and Implementation:

Network interfaces are represented inside the Linux kernel by struct net_device. Network drivers populate the structure and register it with the kernel at initialization time. It is the very core of the network driver layer and contains all the different kinds of information pertaining to the interface: the interface name; hardware information such as the DMA channel and IRQ assigned to the device; interface information such as the MAC address and flags; and a function dispatch table with functions such as open, close, transmit, do_ioctl, change_mtu, etc. The do_ioctl method is generally used to implement non-standard functionality specific to the device.
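The role of that dispatch table can be pictured as a record of named operations that the driver fills in at initialization time; the netfront driver supplies such a table for its virtual interface. The following is a simplified Python model of the idea, not the kernel structure:

```python
# Simplified model of the net_device idea: a record holding identity,
# hardware parameters, and a dispatch table of operations filled in by
# the driver at initialization. Unregistered operations fail, just as
# a NULL function pointer means "not supported" in the kernel.

class ToyNetDevice:
    def __init__(self, name, mac):
        self.name = name            # e.g. "eth0"
        self.mac = mac
        self.ops = {}               # open/close/transmit/do_ioctl/...

    def register_op(self, op, fn):
        self.ops[op] = fn

    def call(self, op, *args):
        if op not in self.ops:
            raise NotImplementedError(op)
        return self.ops[op](*args)

dev = ToyNetDevice("eth0", "00:16:3e:00:00:01")
dev.register_op("open", lambda: "up")
dev.register_op("do_ioctl", lambda ifr, cmd: ("handled", cmd))
print(dev.call("open"))
print(dev.call("do_ioctl", None, 0x89F0))
```

Assigning a do_ioctl entry here corresponds to the step, described below, of pointing the netfront interface's do_ioctl member at our generic wrapper.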
When the ioctl system call is invoked on a socket, the command number is one of the symbols defined in <linux/sockios.h>, and the sock_ioctl function directly invokes a protocol-specific function. Any ioctl command that is not recognized by the protocol layer is passed to the device layer. These device-related ioctl commands accept a third argument from user space, a struct ifreq *; this structure is defined in <linux/if.h>.

In addition to the standardized calls, each interface can define its own ioctl commands. The ioctl implementation for sockets recognizes 16 commands as private to the interface: SIOCDEVPRIVATE through SIOCDEVPRIVATE+15. When one of these commands is recognized, dev->do_ioctl is called in the relevant interface driver. The function receives the same struct ifreq * pointer that the general-purpose ioctl function uses:

int (*do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);

The ifr pointer points to a kernel-space address that holds a copy of the structure passed by the user. After do_ioctl returns, the structure is copied back to user space, so the driver can use the private commands both to receive and to return data. The device-specific commands can choose to use the fields in struct ifreq, but those fields already convey a standardized meaning, and it is unlikely that the driver can adapt the structure to its needs. The field ifr_data is a caddr_t item (a pointer) that is meant to be used for device-specific needs. The driver and the program used to invoke its ioctl commands must agree on the use of ifr_data; the pointer can point to arbitrary configuration data understood by both the application and the driver.

The remaining methods are standard ones supported by every interface, and hence by the netfront interface provided by Xen to the DomU guest. We have written a generic ioctl wrapper function whose address is assigned to the do_ioctl member of the netfront interface. Generally, the ifr_data field points to a structure or to a buffer with an arbitrary amount of data.
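The private-command range check performed by the socket layer can be sketched as follows (on Linux, SIOCDEVPRIVATE is 0x89F0; the dispatch logic here is a simplified Python model for illustration, not kernel code):

```python
# Simplified model of socket-ioctl dispatch for device-private commands.
# Linux reserves 16 command numbers for the interface driver:
# SIOCDEVPRIVATE (0x89F0) through SIOCDEVPRIVATE + 15.
SIOCDEVPRIVATE = 0x89F0

def is_device_private(cmd):
    return SIOCDEVPRIVATE <= cmd <= SIOCDEVPRIVATE + 15

def sock_ioctl(dev, cmd, ifr):
    """Pass commands in the private range down to the device's do_ioctl,
    as the protocol layer does for commands it does not recognize."""
    if is_device_private(cmd):
        return dev.do_ioctl(ifr, cmd)       # driver-defined meaning
    raise ValueError("not a device-private command: %#x" % cmd)

class FakeDev:
    def do_ioctl(self, ifr, cmd):
        # A real driver would interpret ifr.ifr_data here.
        return {"cmd": cmd, "handled": True}

print(sock_ioctl(FakeDev(), SIOCDEVPRIVATE + 3, None))
```

Our generic netfront wrapper sits exactly at the position of do_ioctl in this chain: any command in the private range reaches it with the user's struct ifreq in hand.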
If ifr_data points to a structure, the size of the buffer it points to can be derived from the structure definition. If it points to arbitrary data, the method generally followed by driver developers to communicate the size is to encode it in the first 4 bytes of the buffer: the driver first reads the size from the buffer and then reads the rest of the buffer.

A summary of our implementation is as follows. The specification of the non-standard functionality implemented by the real network driver is provided in the form of a list of private ioctls (indicated by the command number, which lies between SIOCDEVPRIVATE and SIOCDEVPRIVATE+15) implemented by the real network driver, together with the sizes of the buffers pointed to by the ifr_data field. If a buffer holds an arbitrary amount of data, this is encoded in the specification by putting -1 in the size field. A script reads the list of ioctls and buffer sizes from the specification and creates corresponding fields in the XenStore in the directory /local/domain/<domid>/ioctl/, which has been created beforehand to house the ioctls. In the entry for each ioctl, entries are created for the input value, the return value, and the return status.

When a private ioctl is invoked from a DomU, the ioctl wrapper function reads the size field from the corresponding entry. It then reads the whole struct ifreq and starts a transaction to write the ifreq structure to the input entry under that ioctl in the XenStore. For each ioctl we have included a return_ready entry under it in the XenStore: a Boolean entry which indicates whether the return field is ready, i.e. whether the netback driver has written the ioctl return value into the return entry of that ioctl. After writing the input ifreq to the input entry, the netfront ioctl wrapper writes 0 to the return_ready entry and ends the transaction. It then enters a loop to poll the return_ready status.

The netback driver registers a watch on the ioctl directory at initialization time, in the netback_init function. This watch is triggered when the netfront do_ioctl writes the ioctl input to the XenStore. The watch handler reads the ifreq structure from the XenStore and calls the real device driver's do_ioctl function. It invokes the dev_get_by_name function, passing it the name of the real network interface, "peth0" (the real interface is renamed from eth0 to peth0 when xend brings up the network bridge). This function returns the net_device structure for the real network interface, on which do_ioctl is then invoked. The return value of the ioctl is passed back in the same ifreq structure which was passed to it. This structure is then written back by the watch handler to the return entry under the ioctl in the XenStore, and the return status is written to its entry. The watch handler then toggles the return_ready value under that ioctl in the XenStore. As soon as return_ready becomes 1, the netfront ioctl wrapper exits its polling loop and starts a transaction to read the return status.
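The round trip just described can be simulated end to end: a spec populates per-ioctl size entries, the frontend writes the input and clears return_ready, a watch drives the backend, which invokes the real driver's ioctl and sets return_ready. Everything below is a Python simulation following the description above, not the actual netfront/netback code: a plain dict stands in for XenStore, and the watch fires synchronously rather than asynchronously as in Xen.

```python
# End-to-end simulation of the ioctl forwarding scheme: a dict stands in
# for XenStore, and the backend "watch" is called synchronously where
# Xen would fire it asynchronously via the real watch mechanism.

SIOCDEVPRIVATE = 0x89F0
IOCTL_DIR = "/local/domain/1/ioctl"
store = {}

# 1. The tool reads the spec (ioctl number, buffer size; -1 means the
#    size is encoded in the first 4 bytes of the buffer) into XenStore.
spec = [(SIOCDEVPRIVATE, 16), (SIOCDEVPRIVATE + 1, -1)]
for cmd, size in spec:
    store["%s/%#x/size" % (IOCTL_DIR, cmd)] = size

# 2. The real driver's do_ioctl: here it just echoes the input uppercased.
def real_do_ioctl(cmd, ifreq):
    return 0, ifreq.upper()          # (status, output buffer)

# 3. Backend side: triggered when the frontend writes an input entry.
def netback_watch(cmd):
    base = "%s/%#x" % (IOCTL_DIR, cmd)
    status, out = real_do_ioctl(cmd, store[base + "/input"])
    store[base + "/return"] = out
    store[base + "/status"] = status
    store[base + "/return_ready"] = 1   # frontend's polling loop sees this

# 4. Frontend wrapper: write input, clear return_ready, wait, read result.
def netfront_do_ioctl(cmd, ifreq):
    base = "%s/%#x" % (IOCTL_DIR, cmd)
    size = store[base + "/size"]        # how much of ifr_data to copy
    store[base + "/input"] = ifreq
    store[base + "/return_ready"] = 0
    netback_watch(cmd)                  # stands in for the async watch
    assert store[base + "/return_ready"] == 1
    return store[base + "/status"], store[base + "/return"]

print(netfront_do_ioctl(SIOCDEVPRIVATE, "get-stats"))
```

In the real implementation the frontend's write and the backend's reply each sit inside a XenStore transaction, and the frontend genuinely polls return_ready; the synchronous call here collapses that handshake for readability.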
If the status indicates that the ioctl call was a success, the netfront wrapper reads the return value, writes it back to the buffer pointed to by ifreq, ends the transaction, and returns the status.

Permissions:

DomU guests can be allowed or disallowed to invoke ioctls using the XenStore permissions API. These permissions can be set from tools, which are perl/python scripts or C programs, in Dom0. To disallow a particular DomU from invoking a particular ioctl, we can revoke its read and write permissions on the path /local/domain/<domain-id>/ioctl/<ioctl-number>. This can be done by invoking the following function from a python script in Dom0:

xstransact.SetPermissions(path, { 'dom' : dom1, 'read' : False, 'write' : False })

The corresponding C function is xs_set_permissions. The first entry denotes the domain whose read/write privileges are being revoked, and path is the path of the ioctl entry in the XenStore. When permission is to be granted again, the same function can be invoked with True in the read and write fields.

Conclusion:

Our method automates the process of exposing non-generic functionality implemented by the real network driver in the form of device private ioctls. XenStore provides the infrastructure to encode the ioctl information and the input and return values, and the transactions API ensures that the reads and writes are consistent and atomic. Since reads and writes to the XenStore are essentially file I/O, they are slower than the shared-memory data transfer mechanism between frontend and backend; but using that data transfer mechanism to transport the input and output of the ioctls would have required major changes to the Xen networking code. In principle, XenStore is a mechanism for storing configuration information, and ioctls are also generally used for configuration purposes. Moreover, since ioctls do not perform data transfer themselves, their execution is not as time-critical as that of data transmit and receive.

Newer NICs provide hardware support for virtualization which enables a NIC to be shared between different VMs efficiently and safely. Various efforts are underway to enable Xen to use the large, diverse, and evolving set of functionality provided by newer NICs. The Xen netchannel2 protocol, along with a high-level network I/O virtualization management system, is being developed to address this need. The manager would relieve users of the need to make decisions and configurations that are customized to the underlying hardware capabilities.
Instead, the manager would allow users to specify policies at a high level and then determine the appropriate low-level, hardware-specific configurations that implement those policies. Thus, the manager would provide a clean separation between user-relevant policies and the hardware and software mechanisms used to implement them.
References:

1) Xen wiki - http://wiki.xensource.com/xenwiki/
2) Running Xen: A Hands-On Guide to the Art of Virtualization - Jeanna N. Matthews, Eli M. Dow, Todd Deshane, Wenjin Hu, Jeremy Bongio, Patrick F. Wilbur, Brendan Johnson
3) Linux Device Drivers - Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini
4) Understanding Linux Network Internals - Christian Benvenuti
5) Xen-devel mailing list - lists.xensource.com/xen-devel
6) Taming Heterogeneous NIC Capabilities for I/O Virtualization - Jose Renato Santos, Yoshio Turner, Jayaram Mudigonda