An Epi Info™ Database on the Internet Cloud
Taha Kass-Hout¹, MD, MS and Eduardo Jezierski²
¹ BioSense Program Manager, Centers for Disease Control and Prevention (CDC), Atlanta, USA
² VP Engineering, InSTEDD, Palo Alto, California, USA
Disclaimer: The findings and conclusions made in this manuscript are those of the authors, and
do not necessarily represent the official positions of the Centers for Disease Control and
Prevention (CDC).
Modern management of disease outbreaks requires collaboration across jurisdictional and
organizational boundaries. While Epi Info™ (1, 2), CDC’s suite of tools for field collection of
disease surveillance and outbreak information, has over 1 million instances worldwide (3), there
exists no easy way to exchange data, making analysis of disease surveillance and outbreaks more
time consuming for epidemiologists. Furthermore, the problem is made more complex by the
proliferation of diverse ad-hoc tools for data collection on multiple platforms, from online
surveys to use of spreadsheets. The non-profit organization known as InSTEDD (Innovative
Support to Emergencies, Diseases, and Disasters) has been facilitating the development of a set
of open source tools to enable data sharing. Partnering with the US Centers for Disease Control and
Prevention (CDC), InSTEDD demonstrated (4) the ability to rapidly, safely, and securely
exchange Epi Info™ data in a virtual, or cloud, computing environment over Internet
infrastructure (e.g., Government cloud (5), Amazon’s EC2/S3, Google cloud/App Engine). A
cloud environment is one in which servers and resources are remotely maintained and their use is
'virtually' available to users, drastically reducing or eliminating the need for capital investment
and additional IT support.
InSTEDD’s Mesh4x (http://code.google.com/p/mesh4x) allows for data synchronization among
different data sources regardless of technology platform or network connectivity. By including
the Mesh4x adapters for Epi Info™, epidemiologists can make their data available to all users in
their distributed project team or across different jurisdictions. In this chapter we demonstrate the
utility of Mesh4x to share data over the Internet cloud where an epidemiologist determines which
subset of her data is exchanged. This technology makes it possible to share data (e.g., during
an outbreak investigation) so that multiple epidemiologists can see each
other’s data, update the information as the outbreak unfolds, and securely exchange data with
one another.
Demonstration
A near-real time data exchange between multiple instances of Epi Info™ was enabled by
configuring Mesh4x for Internet cloud use. A client-based tool (Figure 1) was developed so that
an epidemiologist can easily build and configure the exchange without any prior technical
knowledge. Many epidemiologists are familiar with the foodborne outbreak in Oswego, New
York, U.S.A. on April 18th, 1940. In this outbreak, over half of the participants at a potluck
church supper developed a gastro-intestinal illness. A survey was created and interviews were
conducted with 75 of the 80 people known to have been present to determine the source of the
contamination. The actual church supper was held in Oswego County; however, we
demonstrated the value of data synchronization with an imaginary scenario (6) in which interviews and
data entry were conducted in different localities. We therefore populated fictitious addresses
spread across five counties in upstate New York (Oswego, Jefferson, Lewis, Oneida, and
Wayne) and then used the Mesh4x tool to synchronize data across multiple Epi Info™ instances.
Figure 1: Mesh4x Client for Epi Info™ (Epi Info™ Data Exchange)
Download the Mesh4x client for Epi Info™
You can download the Mesh4x client and sample Oswego database here (under “Demo
Version”; file size is ~36.6 MB): http://code.google.com/p/mesh4x/wiki/EpiInfoMesh4x. Unzip
the contents of the file to your C drive (C:\mesh4x). You can also participate in the active
discussion group here: http://groups.google.com/group/mesh4x
Set up a Collaborative Mesh and Share your Data
You can create a mesh (a collaborative space where you can share and synchronize data with
others) by visiting this site: http://sync.staging.instedd.org/mesh4x
The space is supported by cloud service authentication (Basic or SSL), and users can further
encrypt their storage (e.g., MS Excel, MS Access, Google Docs/Spreadsheets, MS SQL Server,
MySQL).
Give your mesh a name such as Epiinfo (case sensitive), as shown in the following figure:
Then, you will need to create the data feed (i.e., the data you want to share with others). Enter the
name of your data and add your initials at the end of the name (for example, for the Oswego
database we can enter “OswegoTK” in the name field), as shown in the following figure:
You can also define the variables you want plotted on a map by entering the mapping parameters
in the “mappings” field. Here is an example of the variables that can be plotted on a map (please
refer to the sample file “example_Oswego_mappings_to_Cloud_feed_creation.txt” in the
“C:\mesh4x\properties” folder):
<item.title>patient name: {Oswego/Name}</item.title>
<item.description>address: {Oswego/Address}</item.description>
<geo.location>{geoLocation(Oswego/Address)}</geo.location>
<geo.longitude>{geoLongitude(Oswego/Address)}</geo.longitude>
<geo.latitude>{geoLatitude(Oswego/Address)}</geo.latitude>
<patient.ill>{Oswego/ILL}</patient.ill>
<patient.updateTimestamp>{Oswego/DateOnset}</patient.updateTimestamp>
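To make the mapping syntax concrete, the following Python sketch shows one way a `{Table/Field}` placeholder could be filled from a single record. It is an illustration only and not the Mesh4x implementation: the function name `fill_template`, the sample record, and the decision to leave function-style placeholders such as `{geoLatitude(...)}` untouched are all our assumptions.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical record, keyed the way the mapping refers to fields: "Table/Field".
record = {
    "Oswego/Name": "Jane Doe",
    "Oswego/Address": "1 Main St, Oswego, NY",
    "Oswego/ILL": "Yes",
}

# Matches plain placeholders such as {Oswego/Name}; it does not match
# function-style placeholders such as {geoLatitude(Oswego/Address)}.
PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_]+/[A-Za-z0-9_]+)\}")


def fill_template(template, record):
    """Replace each {Table/Field} placeholder with the record's escaped value.

    Unknown fields and function-style placeholders are left as they are.
    """
    def substitute(match):
        key = match.group(1)
        if key not in record:
            return match.group(0)
        return escape(str(record[key]))

    return PLACEHOLDER.sub(substitute, template)


title = fill_template("<item.title>patient name: {Oswego/Name}</item.title>", record)
latitude = fill_template("<geo.latitude>{geoLatitude(Oswego/Address)}</geo.latitude>", record)
print(title)
print(latitude)
```

In this sketch the geocoding placeholders pass through unchanged, since resolving an address to coordinates requires a geocoding service that is outside the scope of the example.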
Currently, there is no data in your collaborative space in the cloud:
http://sync.staging.instedd.org/mesh4x/feeds/Epiinfo/OswegoTK
Start the Mesh4x client (Figure 1) by double-clicking the “epiinfo.jar” file in the “C:\mesh4x”
folder; this will launch the “Epi Info™ Data Exchange” window. Click the “Open configuration
window” option at the bottom of the “Epi Info™ Data Exchange” window to configure the
location of the sample MS Access database. Select the “Data Sources” tab in the configuration
window as shown in the following figure, assign the name “Oswego” in the first field, browse for
the sample data file “Epiinfo.mdb” in the “C:\mesh4x\data” folder by clicking on the symbol (…)
in the next field, select the Oswego table from the drop-down list, and then click “Save”.
Close the “Configuration” window once the steps above are completed by clicking “Close” at the
bottom left of the window.
Click the “Cloud Synchronization” option at the bottom of the “Epi Info™ Data Exchange”
window (Figure 1), enter the URL of the mesh data feed you created earlier in the URL field
(http://sync.staging.instedd.org/mesh4x/feeds/Epiinfo/OswegoTK), and then click the “Synch
Now” button.
In your web browser, refresh the page for the above URL (or click the icon with the cloud and
magnifying glass next to the URL field in the figure above) and you should now be able to see
all of the 47 records.
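If you prefer to check the record count programmatically rather than in a browser, a short Python sketch along the following lines may help. It assumes that the feed is served as RSS or Atom XML, which you should verify against your own feed; the helper names `count_feed_items` and `fetch_feed` and the sample feed text are ours, not part of Mesh4x.

```python
import urllib.request
import xml.etree.ElementTree as ET


def count_feed_items(feed_xml):
    """Count RSS <item> or Atom <entry> elements, ignoring XML namespaces."""
    root = ET.fromstring(feed_xml)
    count = 0
    for element in root.iter():
        # Namespaced tags look like "{uri}entry"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("item", "entry"):
            count += 1
    return count


def fetch_feed(url, timeout=30):
    """Download the feed body as text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# A tiny made-up feed stands in for the real one so the sketch runs offline.
sample_feed = """<rss version="2.0"><channel>
<title>OswegoTK</title>
<item><title>patient name: A</title></item>
<item><title>patient name: B</title></item>
</channel></rss>"""

print(count_feed_items(sample_feed))
```

To check a live feed, pass the result of `fetch_feed(...)` with your own feed URL to `count_feed_items`.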
You can update the data in Epi Info™ then repeat the steps above and your data will be updated
in the cloud. You can share the URL with other team members and they can share their data as
described above (you only need to create the mesh group and the data feed once though). As
records are being updated or new records are added, each member can simply synchronize their
data with the cloud and everyone on the team will be in sync with the latest information.
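The idea behind this kind of synchronization can be illustrated with a deliberately simplified Python sketch in which each record carries a version number and the higher version wins. This is not the algorithm Mesh4x uses, which must also deal with conflicts and deletions; the data layout and the name `merge_records` are assumptions made for illustration.

```python
def merge_records(local, remote):
    """Merge two {record_id: (version, data)} maps, keeping the higher version.

    On a tie the local copy is kept; a real synchronization tool would need
    an explicit conflict-resolution step here.
    """
    merged = dict(local)
    for record_id, (version, data) in remote.items():
        if record_id not in merged or version > merged[record_id][0]:
            merged[record_id] = (version, data)
    return merged


# Hypothetical state: record 1 was updated in the cloud, record 2 exists only
# locally, and record 3 was added by another team member.
local_copy = {1: (1, {"ILL": "No"}), 2: (1, {"ILL": "Yes"})}
cloud_copy = {1: (2, {"ILL": "Yes"}), 3: (1, {"ILL": "No"})}

synced = merge_records(local_copy, cloud_copy)
for record_id in sorted(synced):
    print(record_id, synced[record_id])
```

After the merge, both sides would hold the same three records, with the newer version of record 1.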
Generate Google Earth map
You can create a map and display it in Google Earth (or as KML map layers). A free copy of
Google Earth can be downloaded from http://earth.google.com. Click the “Open Maps” option
at the bottom of the “Epi Info™ Data Exchange” window; the “Epiinfo Map Exchange” window
will appear. Click on the green icon right next to the URL field to download the mapping schema
you defined in an earlier step.
Next, click on the “Open Map” icon in order to automatically geo-code the addresses you have in
your local database and generate a Google Earth map for all the cases as shown in the following
figure:
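For readers curious about what such a map file contains, the Python sketch below builds a minimal KML document with one placemark per case. It is an illustration only and does not reproduce the file the Mesh4x client generates: the function name `build_kml`, the field names, and the coordinates are made up, and the cases are assumed to be geocoded already.

```python
import xml.etree.ElementTree as ET


def build_kml(cases):
    """Return a KML document (as text) with one Placemark per case."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    document = ET.SubElement(kml, "Document")
    for case in cases:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = case["name"]
        ET.SubElement(placemark, "description").text = "ill: " + case["ill"]
        point = ET.SubElement(placemark, "Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, "coordinates").text = f"{case['lon']},{case['lat']}"
    return ET.tostring(kml, encoding="unicode")


# Fictitious, already geocoded cases used only to exercise the function.
cases = [
    {"name": "Case A", "ill": "Yes", "lat": 43.45, "lon": -76.51},
    {"name": "Case B", "ill": "No", "lat": 43.97, "lon": -75.91},
]

kml_text = build_kml(cases)
print(kml_text)
```

Saving `kml_text` to a file with a `.kml` extension should produce something that Google Earth can open.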
Conclusions
The impact of sharing data is that decisions can be made earlier and based on more complete
analysis. Analyzing only subsets of the information about a population affected by an outbreak
can lead to erroneous conclusions about the nature of the event and can waste time and response
resources. We describe an application providing a lightweight solution for sharing public health
data based on cloud computing and peer-to-peer architectures. The significant value of this
application is demonstrated, whereby epidemiologists can rapidly stand up a collaborative
environment for data exchange and system interoperability, especially when conducting an
outbreak investigation or responding to a disaster under austere operating conditions.
References
1. Epi Info™: http://www.cdc.gov/epiinfo
2. CDC takes its epidemiological software open source; Government Health IT, Dec 12, 2008
3. J. Ma et al., New frontiers for health information systems using Epi Info in developing
countries: Structured application framework for Epi Info (SAFE), Int. J. Med. Inform. (2007)
4. Epi Info™ and Mesh4x Prototype: http://kasshout.blogspot.com/2008/12/epi-info-and-mesh4x-prototype.html
5. Towns S. 2009. Google to Launch Government Cloud. Government Technology
http://www.govtech.com/gt/724044 (accessed November 22, 2009).
6. Oswego in the Cloud: Scenario Script: http://www.slideshare.net/kasshout/oswego-in-the-cloud-scenario-script