Secure File Management Using the Public
Cloud
A	Masters	in	Cybersecurity	Practicum	Project	
Cecil	Thornhill	
ABSTRACT
This project explores the history and evolution of document management tools through the emergence of cloud computing, and documents the development of a basic cloud-based web system for the secure transmission and storage of confidential information on a public cloud, following guidance for federal computing systems.
Contents

Introduction
Background of the Driving Problem – Ur to the Cloud
The Cloud in Context – A New Way to Provide IT
Cloud Transformation Drivers
The Federal Cloud & the Secure Cloud Emerge
Designing a Project to Demonstrate Using the Cloud
Planning the Work and Implementing the Project Design
Findings, Conclusions and Next Steps
References
Source Code Listings
Test Document
Introduction
	
This	paper	describes	the	design	and	development	of	a	system	to	support	the	
encrypted	transfer	of	confidential	and	sensitive	Personally	Identifiable	Information	
(PII) and Protected Health Information (PHI) to a commercial cloud based object
storage	system.		This	work	was	undertaken	as	a	Practicum	project	for	the	Masters	in	
Cybersecurity	program,	and	as	such	was	implemented	within	the	time	limits	of	a	
semester	session	and	was	completed	by	a	single	individual.		This	prototype	
represents	a	basic	version	of	a	web-based	system	implemented	on	a	commercial	
cloud	based	object	storage	system.	The	prototype	demonstrates	an	approach	to	
implementation	suitable	for	use	by	government	or	private	business	for	the	
collection of data subject to extensive regulation, such as HIPAA/HITECH healthcare data or critical financial data.
	
A	general	review	of	the	context	of	the	subject	area	and	history	of	document	
management	are	provided	below,	along	with	a	review	of	the	implementation	efforts.	
Findings	and	results	are	provided	both	for	the	implementation	efforts	as	well	as	the	
actual	function	of	the	system.	Due	to	the	restricted	time	available	for	this	project,	
the	scope	was	limited	to	fit	the	schedule.	Only	basic	features	were	implemented	per	
the	design	guidance	documented	below.	To	explore	future	options	for	expansion	of	
the	project	several	experiments	designed	to	further	analyze	the	system	capacity	and	
performance	are	outlined	below.	These	options	represent	potential	future	
directions	to	further	explore	this	aspect	of	secure	delivery	of	information	
technology	functions	using	cloud-based	platforms.	
Background of the Driving Problem – Ur to the Cloud
The need to exchange documents containing important information between individuals and enterprises is a universal necessity in any organized human society. Since the earliest highly organized human cultures, information about both private
and	government	activities	has	been	recorded	on	physical	media	and	exchanged	
between	parties1.	Various	private	and	government	couriers	were	used	to	exchange	
documents	in	the	ancient	and	classical	world.	In	the	West,	this	practice	of	private	
courier	service	continued	after	the	fall	of	Rome.		The	Catholic	Church	acted	as	a	
primary	conduit	for	document	exchange	and	was	itself	a	prime	consumer	of	
document	exchange	services2.		
	
In the West, after the Renaissance, the growth of the modern nation state and the emergence of early commerce and capitalism were both driven by and supportive of the growth of postal services open to private interests. The needs of commerce quickly came to dominate the traffic and shape the evolution of document exchange via physical media3. In the early United States, the critical role of publicly accessible document exchange was widely recognized by the founders of the new democracy. In 1775 the Continental Congress established the US Postal
Service	to	provide	document	communications	services	to	the	emerging	new	
government prior to the Declaration of Independence4.
	
As a new and modern nation, the US depended on cost-effective, efficient document exchange services from the new post office for the growth of its economy5. The growth of the US as a political and economic power unfolded in parallel with the Industrial Revolution in England and Europe, as well as the overall transition of the Western world to what can be described as modern times. New science, new industry and commerce, and new political urgencies all drove the demand for the transmission of documents and messages in ever faster and more cost-effective forms6.
	
It	is	within	this	accelerating	technical	and	commercial	landscape	that	the	digital	age	
is	born	in	the	US	when	Samuel	Morse	publicly	introduces	the	telegraph	to	the	world	
in	1844	with	the	famous	question	“What	Hath	God	Wrought?”	sent	from	the	US	
Capitol to the train station in Baltimore, Maryland7. Morse's demonstration was the
result	of	years	of	experiment	and	effort	by	hundreds	of	people	in	scores	of	countries,	
but	has	come	to	represent	the	singular	moment	of	creation	for	the	digital	era	and	
marks	the	beginning	of	the	struggle	to	understand	and	control	the	issues	stemming	
from	document	transmission	in	the	digital	realm.	All	of	the	issues	we	face	emerge	
from	this	time	forward,	such	as:		
	
• Translation	of	document	artifacts	created	by	people	into	digital	formats	and	
the	creation	of	human	readable	documents	from	digital	intermediary	formats.	
• The	necessity	to	authenticate	the	origin	of	identical	digital	data	sets	and	to	
manage	the	replication	of	copies.	
• The	need	to	enforce	privacy	and	security	during	the	transmission	process	
across	electronic	media.	
		
Many of these problems have similar counterparts in the physical document exchange process, but some, such as the issue of an indefinite number of identical copies, were novel, and all of these issues require different solutions in a physical or a digital environment8. The telegraph was remarkably successful due to its compelling
commercial,	social	and	military	utility.	As	Du	Boff	and	Yates	note	in	their	research:	
	
“By	1851,	only	seven	years	after	the	inauguration	of	the	pioneer	Baltimore-to-
Washington	line,	the	entire	eastern	half	of	the	US	up	to	the	Mississippi	River	was	
connected	by	a	network	of	telegraph	wires	that	made	virtually	instantaneous	
communication	possible.	By	the	end	of	another	decade,	the	telegraph	had	reached	
the west coast, as well9, 10."
	
	
	
The reach of the telegraph went well beyond the borders of the US, and even the shores of any one continent, by 1851. In 1858 Queen Victoria sent President
Buchanan a congratulatory telegram to mark the successful completion of the
Anglo-American	transatlantic	cable	project11.	Digital	documents	now	had	global	
scope,	and	the	modern	era	of	document	exchange	and	management	had	truly	
arrived.	
	
The US Civil War would be largely shaped by the technical impact of the telegraph and railroad. Both the North and South ruthlessly exploited advances in transportation and communication during the conflict12. Centralization of information management and the need for confidentiality, integrity, and availability
all	emerged	as	issues.	Technical	tools	like	encryption	rapidly	became	standard	
approaches	to	meeting	these	needs13.	
	
The	patterns	of	technical	utilization	during	the	war	provided	a	model	for	future	civil	
government	and	military	use	of	digital	communications	and	for	digital	document	
transmission.	The	government’s	use	patterns	then	became	a	lesson	in	the	potential	
for	commercial	use	of	the	technology.		Veterans	of	the	war	went	on	to	utilize	the	
telegraph	as	an	essential	tool	in	post	war	America’s	business	climate.		Rapid	
communication	and	a	faster	pace	in	business	became	the	norm	as	the	US	scaled	up	
its	industry	in	the	late	19th	century.	Tracking	and	managing	documents	became	an	
ever-increasing challenge, along with other aspects of managing the growing and geographically diverse business enterprises that were emerging.
	
By	the	turn	of	the	20th	century	the	telegraph	provided	a	thriving	and	vital	
alternative to the physical transmission of messages and documents. Most messages and documents sent by telegraph were either composed directly as telegraph signals or transcribed by a human operator who read and re-entered the data from the document. However, all of the modern elements of digital
document	communication	existed	and	were	in	some	form	of	use,	including	the	then	
under-utilized	facsimile	apparatus14.	
	
As the 20th century progressed, two more 19th century technologies that would come to have a major impact on document interchange and management continued to evolve in parallel with the telegraph: mechanical/electronic computation and photography. Mechanical computation, tracing its origin to Babbage's Analytical Engine, would become indispensable in tabulating and managing the data needed to run an increasingly global technical and industrial society15. Photography not only provided a new and accurate record of people and events; with the development of fine-grained films in the 20th century, microfilm would become the leading high-density storage medium for documents, and hence for information. Despite some quality drawbacks, the sheer capacity and over 100-
year	shelf	life	of	microfilm	made	it	very	attractive	as	a	document	storage	tool.	By	the	
1930’s	microfilm	had	become	the	bulk	document	storage	medium	of	choice	for	
publications	and	libraries	as	well	as	the	federal	government16.
The	experience	with	early	electronic	computers	in	World	War	II	and	familiarity	with	
microfilm	made	merging	the	two	technologies	appear	as	a	natural	next	step	to	
forward	thinkers.	In	1945	Vannevar	Bush,	the	wartime	head	of	the	Office	of	
Scientific Research and Development (OSRD), would propose the Memex. Memex
was	designed	as	an	associative	information	management	device	combining	
electronic	computer-like	functions	with	microfilm	storage,	but	was	not	fully	digital	
nor	was	it	networked17.	In	many	ways	this	project	pointed	the	way	to	modern	
information	management	tools	that	were	introduced	in	the	1960’s	but	not	fully	
realized	until	the	end	of	the	20th	century.	
	
Bush, V. (1945). As We May Think. The Atlantic Monthly, 176(1), 101-108.
	
The commercial release and rapid adoption of modern computer systems such as the groundbreaking IBM 360 in the 1960's, and the series of mini-computer systems in the 1970's such as the DEC VAX, greatly expanded the use of digital documents and created the modern concept of a searchable database filled with data from these documents. The development of electronic document publishing systems in the 1980's created a "feedback loop" that allowed digital data to flow back into printed documents, generating a need to manage these new documents with the computers used to generate them from the data and user input. The growth of both electronic data exchange and document scanning in the 1990's then began to replace microfilm.
	
Many	enterprises	realized	the	need	to	eliminate	paper	and	only	work	with	
electronic	versions	of	customer	documents.	The	drive	for	more	efficient	and	
convenient	delivery	of	services	as	well	as	the	need	to	reduce	the	cost	of	managing	
paper	records	continues	to	drive	the	demand	for	electronic	document	management	
tools.	By	the	1990’s	large-scale	document	management	and	document	search	
systems	such	as	FileNet	and	its	competitors	began	to	emerge	into	the	commercial	
market. The emergence of fully digital document management systems in widespread use by the turn of the 21st century brings the story of document management into the present day, where we see a predominance of electronic document systems, and an expectation of quick and universal access to both the data and the documents as artifacts in every aspect of life, including private and commercial activities as well as interactions with the government.
	
As the demand for large electronic document management infrastructures grew, the scale of these systems and the related IT infrastructure continued to expand, placing
significant	cost	stress	on	the	enterprise.	There	was	a	boom	in	the	construction	of	
data	centers	to	house	the	infrastructure.		At	the	same	time	that	the	physical	data	
centers	for	enterprises	were	expanding,	a	new	model	of	enterprise	computing	was	
being	developed:	Cloud	Computing.
The Cloud in Context – A New Way to Provide IT
	
In	1999	Salesforce	popularized	the	idea	of	providing	enterprise	applications	
infrastructure	via	a	website,	and	by	2002	Amazon	started	delivering	computation	
and	storage	to	enterprises	via	the	Amazon	Web	Services	platform.	Google,	Microsoft	
and	Oracle	as	well	as	a	host	of	other	major	IT	players	quickly	followed	with	their	
own	version	of	cloud	computing	options.	These	new	cloud	services	offered	the	
speed	and	convenience	of	web	based	technology	with	the	features	of	a	large	data	
center.	An	enterprise	could	lease	and	provision	cloud	resources	with	little	time	and	
no	investment	in	up	front	costs	for	procurement	of	system	hardware.	By	2009	
options	for	cloud	computing	were	plentiful,	but	there	was	as	yet	little	generally	
accepted evidence about the reasons for the shift or even the risks and benefits18.
	
What	made	cloud	systems	different	from	earlier	timeshare	approaches	and	data	
center	leasing	of	physical	space?	Why	were	they	more	compelling	than	renting	or	
leasing equipment? While a detailed examination of all the concepts and considerations leading to the emergence of cloud computing is outside the scope of this paper, there is a broad narrative that can be suggested based on prior historical study of technological change from steam to electricity and then to centralized generation systems. While the analogies may not all be perfect, they can be useful
tools	in	contextualizing	the	question	of	"why	cloud	computing	now?"	
	
In the 19th century, the development of practical steam power drove a revolution in technical change. The nature of mechanical steam power was such that the steam engine was intrinsically local, as mechanical power is hard to transmit across distance19. When electrical generation first emerged at the end of the 19th century, the first electrical applications tended to reproduce this pattern. Long distance distribution of power was hard to achieve, and so many facilities used generators for local power production20.
The nature of electricity was quite different from mechanical power, and so breakthroughs in distribution were rapid. Innovators such as Tesla and Westinghouse quickly developed long distance transmission of electricity. This electrical power distribution breakthrough allowed the rapid emergence of very large centralized power stations; the most significant of these early centers was the Niagara hydroelectric station21. Today, most power is generated in large central stations. Power is transmitted via a complex national grid system. The distribution grid is an amalgam of local and regional grids22. However, this was not the end of the demand for local generators. In fact, more use of electricity led to more demand for local generators, but for non-primary use cases such as emergency power, or for alternate use cases such as remote or temporary power supplies23, 24.
The way local generation was used changed with the shift to the power grid in ways that parallel the shift from local data centers to cloud based data center
operations. While it is true that early computers were more centralized, from the mid 70's on, with the emergence of the mini-computer and then the micro-computer that came to prominence in the 80's, a much more distributed pattern emerged. The mainframe and
mini-computer became the nucleus of emerging local data centers in every enterprise. As
Local Area Networks emerged they reinforced the role of the local data center as a hub
for the enterprise. Most enterprises in the 1980’s and 90’s had some form of local data
center, in a pattern not totally dissimilar to that of early electric generators.
As the networks grew in scale and speed, they began to shift the patterns of local
computing to emphasize connectivity and wider geographic area of service. When the
commercial Internet emerged in the 1990's the stage was set for a radical change, in much
the same way that the development of efficient electrical distribution across a grid
changed the pattern of an earlier technical system. Connectivity became the driving necessity for an enterprise competing to reach its supply chain and customers via the new network tools.
By the turn of the 21st century, firms like Google and Amazon were experimenting with what they came to consider a new type of computer, the Warehouse Scale Computer. By 2009 this was a documented practical new tool, as noted in Google's landmark paper "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines", Luiz André Barroso and Urs Hölzle, Google Inc., 2009. This transition can be considered as similar to the move to centrally generated electrical power sent out via the grid. In a similar manner it will not erase local computer resources but will alter their purpose and use cases25.
	
As	was	the	case	for	the	change	to	more	centralized	electrical	generation,	by	the	early	
21st	century	there	was	considerable	pressure	on	IT	managers	to	consider	moving	
from	local	data	centers	to	cloud	based	systems.	For	both	general	computing	and	for	
document	management	systems	this	pressure	tends	to	come	from	two	broad	source	
categories:	Technical/Process	drivers	and	Cost	drivers.	Technical	drivers	include	
the	savings	in	deployment	time	for	servers	and	systems	at	all	points	in	the	systems	
development	lifecycle,	and	cost	drivers	are	reflected	in	the	reduced	operational	
costs	provided	by	cloud	systems26.	
Cloud Transformation Drivers
Technical and Process drivers also include considerations such as functional performance
and flexible response to business requirements. The need to be responsive in short time
frames as well as to provide the latest trends in functional support for the enterprise
business users and customers favors the quick start up times of cloud based IT services.
The wide scope of the business use case drivers goes beyond the scope of this paper, but
is important to note.
Cost	drivers	favoring	cloud	based	IT	services	are	more	easily	understood	in	the	
context	of	document	management	as	discussed	in	this	paper.	Moving	to	cloud	based	
servers	and	storage	for	document	management	systems	represents	an	opportunity	
to	reduce	the	Total	Cost	of	Ownership	(TCO)	of	the	IT	systems.	These	costs	include	
not	only	the	cost	to	procure	the	system	components	but	also	the	cost	to	operate	
them in a managed environment controlled by the enterprise. Even if it appears there is no compelling functional benefit to be obtained by the use of cloud based systems, the cost factors alone are typically compelling as a driver for the decision to move document management systems from local servers and storage to the cloud.
	
As	an	example	of	the	potential	cost	drivers,	Amazon	and	other	vendors	offer	a	
number	of	TCO	comparison	tools	that	illustrate	the	case	for	cost	savings	from	cloud-
based	operations.	While	the	vendors	clearly	have	a	vested	interest	in	promotion	of	
cloud	based	operations,	these	tools	provide	a	reasonable	starting	point	for	an	
“apples	to	apples”	estimate	of	costs	for	local	CPU	and	storage	vs.	cloud	CPU	and	
storage options. Considering that the nature of document systems is not especially CPU intensive, but is very demanding of storage subsystems, this cost comparison is a good starting point, as it tends to reduce the complexity of the pricing model.
	
For purposes of comparison here, the Amazon TCO model will be discussed below to examine the storage cost implications for a small (1TB) document store. The default model from Amazon starts with an assumption of 1 TB of data that requires "hot" storage (fast access for on-demand application support) and full plus incremental backups, and that grows by 1TB per month in size27. This is a good fit for a modest
document	storage	system	and	can	be	considered	a	“ballpark”	baseline.		
	
Total	Cost	of	Ownership.	(2016).	Retrieved	July	06,	2016,	from	
http://www.backuparchive.awstcocalculator.com/		
	
Amazon's tool estimates this storage to cost about $308,981 per year for a local SAN backed up to tape. The tool estimates the same storage using the cloud option to cost about $37,233 for a year. The cost of local hot storage alone is estimated at $129,300, versus $29,035 for Amazon S3 storage. Based on the author's past experience in federal IT document management systems, these local storage costs are generally within what could be considered reasonable and accurate TCO ranges for a private or federal data center. Processing cost estimates for servers required in the storage solution are also within the range of typical mid-size to large data center costs, based on the author's experience over the past 8 years with federal and private data center projects. Overall, the Amazon tool does appear to produce estimates of local costs that can be considered reasonably viable for planning purposes.
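Restated as simple arithmetic, the quoted estimates imply the following year-one cost ratios; the short JavaScript sketch below only reuses the figures above and assumes no additional pricing data.

// Year-one cost ratios implied by the Amazon TCO estimates quoted above.
const localTotal = 308981;  // $/year, local SAN backed up to tape
const cloudTotal = 37233;   // $/year, equivalent cloud option
const localHot   = 129300;  // $/year, local hot storage only
const s3Hot      = 29035;   // $/year, Amazon S3 hot storage only

console.log((cloudTotal / localTotal * 100).toFixed(1) + '% of the local total cost');  // roughly 12%
console.log((s3Hot / localHot * 100).toFixed(1) + '% of the local hot-storage cost');   // roughly 22%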
	
This rough and quick analysis from the Amazon TCO tool gives a good impression of
the	level	of	cost	savings	possible	with	cloud-based	systems.	It	serves	as	an	example	
of	some	of	the	opportunities	presented	to	IT	managers	faced	with	a	need	to	control
budgets	and	provide	more	services	for	less	cost.		The	potential	to	provide	the	same	
services	for	half	to	¼	the	normal	cost	of	local	systems	is	very	interesting	to	most	
enterprises	as	a	whole.	When	added	to	the	cloud	based	flexibility	to	rapidly	deploy	
and the freedom to scale services up and down, these factors help to explain the
increased	preference	for	cloud	based	IT	deployment.	This	preference	for	cloud	
computing	now	extends	beyond	the	private	sector	to	government	enterprises	
seeking	the	benefits	of	the	new	computing	models	offered	by	cloud	vendors.	
The Federal Cloud & the Secure Cloud Emerge
For the federal customer the transition to Warehouse Scale Computing and the public cloud can be dated to 2011, when the FedRAMP initiative was established. The FedRAMP program is based on policy guidance from President Barack Obama's 2011 paper titled "International Strategy for Cyberspace"28, as well as the "Cloud First" policy authored by US CIO Vivek Kundra29 and the "Security Authorization of Information Systems in Cloud Computing Environments"30 memo from Federal Chief Information Officer Steven VanRoekel. Together these documents framed the proposed revamp of all federal Information Technology systems.

In the introduction to his 2011 cloud security memo, VanRoekel provides some concise notes on the compelling reasons for the federal move to cloud computing:

"Cloud computing offers a unique opportunity for the Federal Government to take advantage of cutting edge information technologies to dramatically reduce procurement and operating costs and greatly increase the efficiency and effectiveness of services provided to its citizens. Consistent with the President's International Strategy for Cyberspace and Cloud First policy, the adoption and use of information systems operated by cloud service providers (cloud services) by the Federal Government depends on security, interoperability, portability, reliability, and resiliency.30"
Collectively, these three documents and the actions they set in motion have transformed the federal computing landscape since 2011. Just as the private sector's use of local computing has begun a rapid shift to the cloud, driven by competition and the bottom line, in the short space of 5 years the entire paradigm for IT in the federal government of the US has shifted radically. It is not unreasonable to expect
that	by	2020,	cloud	computing	will	be	the	norm,	not	the	exception	for	any	federal	IT	
system.	This	transition	offers	huge	opportunities,	but	brings	massive	challenges	to	
implement	secure	infrastructure	in	a	public	cloud	computing	space.	
	
Functionally,	the	conversion	from	physical	to	electronic	documents	has	a	number	of	
engineering	requirements,	but	above	and	beyond	this,	there	are	legal	and	security	
considerations	that	make	any	document	management	system	more	complex	to	
implement than earlier databases of disparate facts. Documents as entities are
more	than	a	collection	of	facts.	They	represent	social	and	legal	relationships	and
agreements.	As	such	the	authenticity,	integrity,	longevity	and	confidentiality	of	the	
document	as	an	artifact	matter.	The	security	and	privacy	implications	of	the	
continued	expansion	of	electronic	exchange	of	data	in	consumer	and	commercial	
financial transactions were incorporated into the rules, regulations and policy
guidance	included	in	the	Gramm-Leach-Bliley	Act	of	199931.		
	
A good example of the wide swath of sensitive data that needs to be protected in both physical and electronic transactions is shown in the "Sensitive Data: Your Money AND Your Life" web page, part of the Safe Computing Pamphlet Series from MIT. As the page notes:
	
“Sensitive	data	encompasses	a	wide	range	of	information	and	can	include:	your	
ethnic	or	racial	origin;	political	opinion;	religious	or	other	similar	beliefs;	
memberships;	physical	or	mental	health	details;	personal	life;	or	criminal	or	civil	
offences.	These	examples	of	information	are	protected	by	your	civil	rights.		
Sensitive	data	can	also	include	information	that	relates	to	you	as	a	consumer,	client,	
employee,	patient	or	student;	and	it	can	be	identifying	information	as	well:	your	
contact	information,	identification	cards	and	numbers,	birth	date,	and	parents’	
names.	32	“	
	
Sensitive	data	also	includes	core	identity	data	aside	from	the	information	about	any	
particular	event,	account	or	transaction,	personal	preferences,	or	self	identified	
category.		Most	useful	documents	supporting	interactions	between	people	and	
business	or	government	enterprises	contain	Personally	Identifiable	Information	
(PII),	which	is	defined	by	the	Government	as:	
	
"...any	information	about	an	individual	maintained	by	an	agency,	including	any	
information	that	can	be	used	to	distinguish	or	trace	an	individual’s	identity,	such	as	
name,	Social	Security	number,	date	and	place	of	birth,	mother’s	maiden	name,	
biometric	records,	and	any	other	personal	information	that	is	linked	or	linkable	to	
an	individual.	33,"		
	
Identity	data	is	a	special	and	critical	subset	of	sensitive	data,	as	identity	data	is	
required	to	undertake	most	of	the	other	transactions,	and	to	interact	with	essential	
financial,	government	or	healthcare	services.	As	such	this	data	must	be	protected	
from	theft	or	alteration	to	protect	individuals	and	society	as	well	as	to	ensure	the	
integrity of other data in any digital system34. In order to protect this PII data, the Government, through the National Institute of Standards and Technology (NIST), defines a number of best practices and security controls that form the basis for sound management of
confidential	information.	35	These	controls	include	such	concepts	as:	
	
• Identification and Authentication	-	uniquely	identifying	and	authenticating	
users	before	accessing	PII
• Access Enforcement	-	implementing	role-based	access	control	and	
configuring	it	so	that	each	user	can	access	only	the	pieces	of	data	necessary	
for	the	user‘s	role.	
• Remote Access Control	-	ensuring	that	the	communications	for	remote	
access	are	encrypted.	
• Event Auditing	-	monitor	events	that	affect	the	confidentiality	of	PII,	such	as	
unauthorized	access	to	PII.		
• Protection of Information at Rest - encryption of the stored information on the storage disks.
	
In	addition	to	these	considerations,	many	enterprises	also	need	to	handle	
documents	that	contain	both	PII	and	medical	records	or	data	from	medical	records,	
or Protected Health Information (PHI). Medical records began to be stored electronically in the 1990's. By the early part of the 21st century this growth in electronic health records resulted in a new set of legislation designed both to encourage the switch to electronic health records and to set up guidelines and policy for managing and exchanging these records. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 created a set of guidelines and regulations for how enterprises must manage PHI36.
	
Building	on	HIPAA,	the	American	Recovery	and	Reinvestment	Act	of	2009	and	the	
Health	Information	Technology	for	Economic	and	Clinical	Health	Act	(HITECH)	of	
2009	added	additional	policy	restrictions,	and	security	requirements	as	well	as	
penalties	for	failure	to	comply	with	the	rules37.	These	regulations	for	PHI	both	
overlap	and	add	to	the	considerations	for	data	and	documents	containing	PII.	
	
The HITECH law increased the number of covered organizations, or "entities," beyond those under the control of the HIPAA legislation:
	
“Previously,	the	rules	only	applied	to	"covered	entities,"	including	such	healthcare	
organizations	as	hospitals,	physician	group	practices	and	health	insurers.	Now,	the	
rules	apply	to	any	organization	that	has	access	to	"protected	health	information.	38”	
	
HITECH also added considerable detail and clarification, as well as new complexity and even more stringent penalties for lack of compliance or data exposures, or "breaches". Under HITECH a breach is defined as:
	
"…the	unauthorized	acquisition,	access,	use	or	disclosure	of	protected	health	
information	which	compromises	the	security	or	privacy	of	such	information,	except	
where	the	unauthorized	person	to	whom	such	information	is	disclosed	would	not	
reasonably	have	been	able	to	retain	such	information.	38"		
	
The	result	of	the	considerations	needed	to	manage	documents	that	might	contain	
Sensitive	Data,	PII	or	PHI	or	any	combination	of	these	elements	is	that	any	
document	management	system	implemented	in	private	or	public	data	centers	must
implement	a	wide	range	of	technical	and	procedural	steps	to	operate	in	a	secure	
manner.	Protection	of	the	security,	privacy	and	integrity	of	the	documents	and	data	
in	those	documents	becomes	a	major	part	of	the	challenge	to	designing,	building	and	
operating any information system. These engineering efforts are essential to business operations; however, they also become part of the cost for any system, and
as	such	can	be	a	considerable	burden	on	the	budget	of	any	enterprise.	
Designing a Project to Demonstrate Using the Cloud
	
It	is	within	this	context	of	providing	a	secure	system	leveraging	cloud-based	benefits	
that	the	practicum	project	described	in	this	paper	was	designed.	The	goal	of	the	
project	was	to	demonstrate	a	viable	approach	to	following	the	policy	guidance	as	
provided	for	federal	IT	systems.	To	achieve	this	goal,	the	first	step	was	to	
understand	the	context	as	outlined	in	the	discussion	above.	The	next	step	was	to	
design	a	system	that	followed	sound	cybersecurity	principles	and	the	relevant	
policy	guidance.	
	
Based	on	the	demand	for	electronic	document	management	in	both	private	and	
government	enterprise,	a	basic	document	management	system	was	selected	as	the	
business	case	for	the	prototype	to	be	developed.	Document	management	provides	
an	opportunity	to	implement	some	server	side	logic	for	the	operation	of	the	user	
interface	and	for	the	selection	and	management	of	storage	systems.	Document	
management	also	provides	a	driving	problem	that	allows	for	clear	utilization	of	
storage	options,	and	thus	can	demonstrate	the	benefits	of	the	cloud	based	storage	
options	that	feature	prominently	in	the	consideration	of	cloud	advantages	of	both	
speed	of	deployment	and	lower	TCO.	These	considerations	were	incorporated	in	the	
decision	to	implement	a	document	management	system	as	the	demonstration	
project.	
	
The	scope	of	the	system	was	also	a	key	consideration.	Given	the	compressed	time	
frame	and	limited	access	to	developer	resources	that	are	intrinsic	to	a	practicum	
project,	the	functional	scope	of	the	document	management	system	would	need	to	be	
constrained.	As	a	solo	developer,	the	range	of	features	that	can	be	implemented	
would	need	to	be	limited	to	the	basic	functions	needed	to	show	proof	of	concept	for	
the system. In this case, these were determined to be:
	
1. The	system	would	be	implemented	on	the	Amazon	EC2	public	cloud	for	the	
compute	tier	of	the	demonstration.	
2. The	system	would	utilize	Amazon	S3	object	storage	as	opposed	to	block	
storage.	
3. The	system	would	be	implemented	using	commercially	available	Amazon	
provided security features for ensuring Confidentiality, Integrity and Availability39.
Dimov,	I.	(2013,	June	20).	Guiding	Principles	in	Information	Security	-	InfoSec	
Resources.	Retrieved	July	09,	2016,	from	
http://resources.infosecinstitute.com/guiding-principles-in-information-security/		
	
4. The	servers	used	for	the	project	would	all	be	Linux	based.	
5. The	system	would	feature	a	basic	web	interface	to	allow	demonstration	of	
the	ability	to	store	documents.	
6. The	system	would	use	Public	Key	Infrastructure	certificates	generated	
commercially	to	meet	the	need	to	support	encryption	for	both	web	and	
storage	components.	
7. The web components of the prototype would use HTTPS to enforce secure connections to the cloud based servers and storage.
8. The	system	would	utilize	a	commercial	web	server	infrastructure	suitable	for	
scaling	up	to	full-scale	operation	but	only	a	single	instance	would	be	
implemented	in	the	prototype.	
9. The	web	components	would	be	implemented	in	a	language	and	framework	
well	suited	to	large-scale	web	operations	with	the	ability	to	handle	large	
concurrent	loads.	
10. Only	a	single	demonstration	customer/vendor	would	be	implemented	in	the	
prototype.	
11. The	group	and	user	structure	would	be	developed	and	implemented	using	
the	Amazon	EC2	console	functions.		
12. Only	the	essential	administrative	and	user	groups	would	be	populated	for	the	
prototype.	
13. The prototype would feature configurable settings for both environment and application values, set by environment variables, files, and Amazon settings tools (a minimal configuration sketch follows this list). The current prototype phase would not introduce the database subsystem expected to be used to manage configuration in a fully production-ready version of the system.
14. Data	files	used	in	the	prototype	would	be	minimal	versions	of	XML	files	
anticipated	to	be	used	in	an	operational	system,	but	would	only	contain	
structure	and	minimal	ID	data	not	full	payloads.		
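As a minimal sketch of the configuration approach in item 13 above (not taken from the project's source listings), application settings of this kind can be read from the shell environment with local defaults; the variable names and default values below are illustrative assumptions.

// config.js - hypothetical configuration module for the prototype.
// Values come from environment variables (or EC2 settings tools);
// the names and defaults are assumptions, not the project's actual settings.
module.exports = {
  region: process.env.AWS_REGION || 'us-east-1',
  bucket: process.env.UPLOAD_BUCKET || 'example-practicum-bucket',
  port: parseInt(process.env.PORT, 10) || 8080
};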
	
In the case of a narrowly scoped prototype such as this demonstration project, it is equally critical to determine what functionality is out of scope. For this system, this list
included	the	following:	
	
• The	web	interface	would	be	left	in	a	basic	state	to	demonstrate	proof	of	
function	only.	Elaboration	and	extension	of	the	GUI	would	be	outside	the	
scope	of	the	work	for	this	prototype	project.	
• There	would	be	no	restriction	on	the	documents	to	be	uploaded.	Filtering	
vendor	upload	would	be	outside	the	scope	of	work	for	this	prototype.	
• Testing	uploads	with	anti-virus/malware	tools	would	be	outside	the	scope	of	
this	prototype	project.
• Security	testing	or	restriction	of	the	client	would	be	outside	the	scope	of	this	
project.	The	URL	to	access	the	upload	function	would	be	open	for	the	
prototype	and	the	infrastructure	for	user	management	would	not	be	
developed	in	the	prototype.	
• Load	testing	and	performance	testing	of	the	prototype	would	be	outside	the	
scope	of	this	phase	of	the	project.	
• No	search	capacity	would	be	implemented	to	index	the	data	stored	in	the	S3	
subsystem	in	the	prototype	project.	
	
Proof	of	concept	was	thus	defined	as:	
	
A) The	establishment	of	the	cloud	based	infrastructure	to	securely	store	
documents.	
B) The	implementation	of	the	required	minimal	web	and	application	servers	
with	the	code	required	to	support	upload	of	documents.	
C) The	successful	upload	of	test	documents	to	the	prototype	system	using	a	
secure	web	service.	
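As an illustration of item C, a hedged sketch of a test client is shown below: it POSTs one small XML test document to the prototype's upload route over HTTPS. The host name is the domain discussed later in this paper, but the route, query parameter, and file name are assumptions rather than the project's documented interface.

// Hypothetical proof-of-concept client: upload one XML test document over HTTPS.
const https = require('https');
const fs = require('fs');

const body = fs.readFileSync('test.xml');           // assumed local test file
const req = https.request({
  hostname: 'juggernit.com',                        // public DNS name used by the project
  path: '/upload?name=test.xml',                    // assumed route and file-name parameter
  method: 'POST',
  headers: { 'Content-Type': 'application/xml', 'Content-Length': body.length }
}, (res) => {
  console.log('Upload returned HTTP status', res.statusCode);
});
req.on('error', (err) => console.error('Upload failed:', err.message));
req.write(body);
req.end();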
	
While the scope of the project may appear modest, and the restrictions for the phase to be implemented in the practicum course period numerous, these scope limitations proved vital to completion of the project in the anticipated period.
The	subtle	challenges	to	implementation	of	this	proof	of	concept	feature	set	proved	
more	than	adequate	to	occupy	the	time	available	and	provided	considerable	scope	
for	learning	and	valuable	information	for	future	projects	based	on	cloud	computing,	
as	detailed	in	the	subsequent	sections	of	this	paper.	
Planning the Work and Implementing the Project Design
	
To move to implementation, the next phase of the Software Development Lifecycle (SDLC), the requirements and scope limitations listed above were used to develop a basic project plan consisting of two main phases:
	
A)	The	technical	implementation	of	the	infrastructure	and	code	through	to	proof	of	
concept.		
B)	The	documentation	of	the	project	work	and	production	of	this	report/paper.	
	
The project management of any implementation process is a critical success factor for any enterprise, no matter how large or small. This is very true for cloud computing projects, as they often represent a significant departure from an enterprise's existing IT systems and processes. This was the case in this project
as	well.	
	
While no formal Gantt or PERT chart was developed for the project plan, as there
was	no	need	to	transmit	the	plan	to	multiple	team	members,	an	informal	breakdown
was	used	to	guide	the	technical	implementation	in	an	attempt	to	keep	it	on	
schedule:	
	
Week	1:	 Establish	the	required	Amazon	EC2	accounts	and	provision	a	basic	
server	with	a	secure	management	account	for	remote	administration	
of	the	cloud	systems.	
Week	2:	 Procure	the	required	PKI	certificates	and	then	configure	the	
certificates	needed	to	secure	access	to	the	servers,	and	any	S3	storage	
used	by	the	system.	Configure	the	S3	Storage.	
Week	3:	 Obtain	and	install	the	required	commercial	web	server	and	
application	server	to	work	together	and	utilize	a	secure	HTTP	
configuration	for	system	access.	Implement	any	language	framework	
needed	for	application	code	development.	
Week	4:	 Research	and	develop	the	required	application	code	to	demonstrate	
file	upload	and	reach	proof	of	concept.	Create	any	required	data	files	
for	testing.	
Weeks	5-8:		 Document	the	project	and	produce	the	final	report/paper.	
	
	
In practice this proposed 8-week schedule slipped by about 4 weeks: about 2 weeks of extra work caused by the complexity and unexpected issues found in the system and code development, and about 2 weeks of delays in the write-up caused by the author's relocation to a new address. These delays in
schedule	are	not	atypical	of	many	IT	projects.	They	serve	to	illustrate	the	
importance	of	both	planning	and	anticipation	of	potential	unexpected	factors	when	
implementing	new	systems	that	are	not	well	understood	in	advance	by	the	teams	
involved.		Allowing	slack	in	any	IT	schedule,	and	especially	those	for	new	systems	is	
key	to	a	successful	outcome	as	it	allows	flexibility	to	deal	with	unexpected	aspects	of	
the	new	system.	
	
The very first task to be undertaken in the execution of the project plan for this project was to establish the required Amazon Elastic Compute Cloud (Amazon EC2)
accounts.	EC2	is	the	basic	cloud	infrastructure	service	provided	by	Amazon.	This	
service	provides	user	management,	security,	system	provisioning,	billing	and	
reporting	features	for	Amazon’s	cloud	computing	platform.	It	is	the	central	point	for	
administration	of	any	hosted	project	such	as	the	prototype	under	discussion	in	this	
paper40.	
	
Because	the	author	was	an	existing	Amazon	customer	with	prior	EC2	accounts,	the	
existing	identification	and	billing	credentials	could	be	used	for	this	project	as	well.	
Both	identity	and	billing	credentials	are	critical	components	for	this	and	any	other	
cloud	based	project	on	Amazon	or	any	other	cloud	vendor.	It	is	axiomatic	that	the	
identity	of	at	least	one	responsible	party,	either	an	individual	or	institution,	must	be	
known	for	the	cloud	vendor	to	establish	systems	and	accounts	in	its	infrastructure.	
This	party	acts	as	the	“anchor”	for	any	future	security	chain	to	be	established.	The
primary	account	will	act	as	the	ultimate	system	owner	and	will	be	responsible	for	
the	system’s	use	or	abuse	and	for	any	costs	incurred.	Below	is	an	example	home	
screen	for	the	author’s	project	on	EC2:	
	
	
	
Responsibility	for	costs	is	the	other	key	aspect	of	the	primary	EC2	account.	While	
cloud	computing	may	offer	cost	savings	benefits,	it	is	by	no	means	a	free	service.	
Every	aspect	of	the	EC2	system	is	monetized	and	tracked	in	great	detail	to	ensure	
correct	and	complete	billing	for	any	features	used	by	an	account	holder.	Some	basis	
for	billing	must	be	provided	at	the	time	any	account	is	established.	In	the	case	of	this	
project	all	expenses	for	the	EC2	features	used	would	be	billed	back	to	the	author’s	
credit	account	previously	established	with	Amazon.		
	
In	any	cloud	project	it	is	vital	that	each	team	member	committing	to	additional	
infrastructure	have	the	understanding	that	there	will	be	a	bill	for	each	feature	used.	
Amazon	and	most	cloud	vendors	offer	a	number	of	planning	and	budgeting	tools	for	
projecting	the	costs	of	features	before	making	a	commitment.	This	is	helpful,	but	is	
not	a	substitute	for	clearly	communicating	and	planning	for	costs	in	advance	among	
the	development	team	members	and	project	owners,	stakeholders	and	managers.	In	
the	case	of	this	project,	while	the	author	did	reference	the	budgeting	tools	to	note	
costs	estimates,	communication	and	decisions	were	simple	due	to	the	singular	team	
size.	Below	is	an	example	of	the	billing	report	console:
	
	
Establishment of the basic account for the project was, as indicated, simple due to the author having an existing EC2 account. To provision a server, it was necessary to
determine	the	configuration	most	appropriate	for	the	project’s	needs,	and	then	
determine	the	Amazon	Availability	Zone	where	the	server	should	be	located.		The	
server	configuration	would	be	decided	by	estimating	the	required	performance	
characteristics	needed	to	host	the	required	software	and	execute	the	application	
features	for	the	anticipated	user	load.		
	
In this case, all these parameters were scoped to be minimal for the prototype to be created, reducing the capacity of the virtual server required. Based on the author's
experience	with	Linux	servers	a	small	configuration	would	meet	the	needs	of	the	
project.	Using	the	descriptive	materials	provided	by	Amazon	detailing	the	server	
performance,	a	modest	configuration	of	server	was	selected	to	host	the	project:	
	
• t2.micro:	1	GiB	of	memory,	1	vCPU,	6	CPU	Credits/hour,	EBS-only,	32	bit	or	
64-bit	platform41	
	
When	the	server	was	provisioned	RedHat	was	selected	as	the	OS.	Other	Linux	
distributions	and	even	Windows	operating	systems	were	available	from	Amazon	
EC2. Red Hat was selected in order to maintain maximum compatibility with the systems currently approved for use in federal production environments, per the author's personal experience. Use of Red Hat Linux also makes
getting	support	and	documentation	of	any	open	source	tools	from	the	Internet	
easier	as	this	is	a	popular	distribution	for	web	based	systems.		Below	is	a	release	
description	from	the	virtual	instance	as	configured	on	EC2	for	this	project:
	
	
By	default	the	server	was	provisioned	in	the	same	zone	as	the	author’s	prior	EC2	
instances,	which	was	us-west-2	(Oregon).	An	Availability	Zone	(zone)	is	the	Amazon	
data	center	used	to	host	the	instance.		Availability	zones	are	designed	to	offer	
isolation	from	each	other	in	the	event	of	service	disruption	in	any	one	zone.	Each	
zone	operates	to	the	published	Service	Level	Agreement	provided	by	Amazon42.	
Understanding	the	concept	of	zone	isolation	and	the	key	provisions	of	the	SLA	
provided	by	a	cloud	vendor	are	important	to	the	success	of	any	cloud	based	project.	
Highly distributed applications, or those needing advanced fault tolerance and load balancing, might choose to host in multiple zones.
	
For the purposes of this project, a single zone and the SLA offered by Amazon were sufficient for successful operation. However, the default zone allocation was
problematic	and	was	the	first	unexpected	implementation	issue.	Almost	all	EC2	
features	are	offered	in	the	main	US	zones,	but	us-east-1	(N.	Virginia)	does	have	a	few	
more	options	available	than	us-west-2	(Oregon).	In	order	to	explore	the	
implications	and	effort	needed	to	migrate	between	zones	and	ensure	access	to	all	
potential	features,	the	author	decided	to	migrate	the	project	server	to	the	us-east-1	
zone.	
	
Migration	involved	a	backup	of	the	configured	server,	which	appeared	to	be	prudent	
operational	activity	anyway.	Following	the	backup,	the	general	expectation	was	that	
the	instance	could	be	restored	directly	in	the	desired	location	and	then	the	old	
instance	could	be	removed.	In	general	this	expectation	proved	to	be	sound,	but	the	
exact	steps	were	not	so	direct.	Some	of	the	complexity	was	strictly	due	to	needing	to	
allow for replication time. Some of the complexity proved to be due to the use of an Elastic IP address that creates a public IP address for the server.
	
An AWS Elastic IP provides a static public IP that can then be associated with any
instance	on	EC2,	allowing	public	DNS	configuration	to	then	be	re-mapped	as	needed	
to	any	collection	of	EC2	servers.		The	author	had	a	prior	Elastic	IP	and	expected	to
just	re-use	it	for	this	project,	but	as	noted	in	the	AWS	EC2	documentation	“An	Elastic	
IP	address	is	for	use	in	a	specific	region	only43”.	This	created	an	issue	when	the	
instance	was	migrated	across	zones.		
	
Once	the	problem	was	understood,	the	solution	was	to	release	the	old	Elastic	IP	and	
generate	a	new	Elastic	IP	that	could	be	mapped	using	DNS.	This	new	Elastic	IP	could	
be associated with the servers now restored to the us-east-1 (N. Virginia) zone. This step
wound	up	taking	quite	a	bit	of	time	to	debug	and	fix	in	the	first	week,	and	was	to	
lead	to	the	next	unexpected	issues	with	DNS.		
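Purely as a hedged illustration of the allocate-and-associate sequence described above, and not a record of how the project performed it, the same steps can be scripted with the AWS SDK for JavaScript; the instance ID below is a placeholder.

// Hypothetical sketch: allocate a new Elastic IP in the current region and
// associate it with the migrated instance. (The old address would be released
// separately with releaseAddress.) The instance ID is a placeholder.
const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' });

ec2.allocateAddress({ Domain: 'vpc' }, (err, addr) => {
  if (err) return console.error(err);
  ec2.associateAddress({
    InstanceId: 'i-0123456789abcdef0',   // placeholder instance ID
    AllocationId: addr.AllocationId
  }, (err2) => {
    if (err2) return console.error(err2);
    console.log('Elastic IP', addr.PublicIp, 'associated');
  });
});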
	
None	of	this	work	was	so	complex	as	to	put	the	project	at	risk.	This	required	IP	
change	does	illustrate	the	fact	that	understanding	the	SLA	and	restrictions	of	each	
cloud	feature	is	critical.		Small	issues	like	requiring	a	change	of	IP	address	can	have	
big	implications	for	other	work	in	a	project.	Decisions	to	provision	across	zones	are	
easy in the cloud, but can have unintended consequences, such as this IP address change and the subsequent DNS work it generated. All of these issues take
resources	and	cost	time	in	a	project	schedule.	
	
An	existing	domain,	Juggernit.com,	already	registered	to	the	author	was	the	
expected target domain. Since one of the requirements for the project was to obtain a Public Key Infrastructure certificate for the project site, it was essential to have a publicly registered Internet domain to use for the PKI. Once the public IP was re-established in the new us-east-
1	zone,	and	connectivity	was	confirmed	by	accessing	the	instance	using	SSL,	the	next	
unexpected	task	was	moving	the	DNS	entries	for	the	instance	from	the	current	
registrar.	This	would	also	include	learning	to	configure	the	Amazon	Elastic	Load	
Balancer	and	then	map	the	domain	to	it.		
	
The	load	balancer	forwards	any	HTTP	or	HTTPS	traffic	to	the	HTTPS	secure	instance.	
The	HTTPS	instance	is	the	final	target	for	the	project.	Amazon	Elastic	Load	
Balancing	is	a	service	that	both	distributes	incoming	application	traffic	across	
multiple	Amazon	EC2	instances,	and	allows	for	complex	forwarding	to	support	
forcing secure access to a domain. In this instance, while the project would not have many servers in the prototype phase, the use of load balancing would reflect the "to be" state of a final production instance and allow secure operations even in the development and preliminary phases of the project used for the practicum scope.
The	load	balancer	configuration	would	require	a	domain	record	of	the	form:	
	
juggerload1-123781548.us-east-1.elb.amazonaws.com	(A	Record)	
	
As noted on the Amazon web site, you should not actually use an "A record" in your DNS for a domain under load balancing:
	
Because	the	set	of	IP	addresses	associated	with	a	LoadBalancer	can	change	over	
time,	you	should	never	create	an	"A	record”	with	any	specific	IP	address.	If	you	want	
to	use	a	friendly	DNS	name	for	your	load	balancer	instead	of	the	name	generated	by
the	Elastic	Load	Balancing	service,	you	should	create	a	CNAME	record	for	the	
LoadBalancer	DNS	name,	or	use	Amazon	Route	53	to	create	a	hosted	zone.	For	more	
information,	see	Using	Domain	Names	With	Elastic	Load	Balancing44.	
	
The	Juggernit.com	domain	was	being	managed	by	Network	Solutions.	Unfortunately	
the GUI used by Network Solutions did not allow for the entry of the CNAME record format needed for EC2. This required moving the domain out of the control of
Network	Solutions	and	into	the	Amazon	Route53	domain	management	service.	The	
Route	53	service	has	a	variety	of	sophisticated	options,	but	most	critically,	it	
interoperates	well	with	other	Amazon	EC2	offerings	including	the	load	balancing	
features45.	
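In this project the record changes were made through the web consoles, as noted below; purely as a hedged sketch, an equivalent CNAME change can also be submitted through the Route 53 API, where the hosted zone ID and record name are placeholders and the load balancer DNS name is the one shown above.

// Hypothetical sketch: UPSERT a CNAME that points the application host name
// at the load balancer's DNS name. Zone ID and record name are placeholders.
const AWS = require('aws-sdk');
const route53 = new AWS.Route53();

route53.changeResourceRecordSets({
  HostedZoneId: 'ZEXAMPLE12345',                     // placeholder hosted zone ID for the domain
  ChangeBatch: {
    Changes: [{
      Action: 'UPSERT',
      ResourceRecordSet: {
        Name: 'www.juggernit.com.',                  // assumed record name
        Type: 'CNAME',
        TTL: 300,
        ResourceRecords: [{ Value: 'juggerload1-123781548.us-east-1.elb.amazonaws.com' }]
      }
    }]
  }
}, (err, data) => {
  if (err) console.error(err);
  else console.log('Change submitted:', data.ChangeInfo.Id);
});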
	
Route 53 is a good example not only of an unexpected issue that must be overcome to migrate to the cloud, but also of how the nature of the cloud platform creates a small "ecosystem" around the cloud vendor. Even when striving for maximum standards
compliance	and	openness,	the	nature	of	the	cloud	platform	offerings	such	as	load	
balancing tend to create interoperation issues with older Internet offerings like
those	for	DNS	from	Network	Solutions,	which	date	from	the	origin	of	the	
commercial	Internet.	The	author	had	used	Network	Solutions	DNS	since	the	late	
1990’s,	but	in	this	instance	there	was	no	quick	path	to	a	solution	other	than	
migration	to	the	Amazon	Route	53	offering.		
	
The	Juggernit.com	domain	would	need	to	be	linked	to	the	public	IP	of	the	instance,	
and	pragmatically	this	was	only	achievable	via	Route	53	services.	Once	the	situation	
was	analyzed	after	consultation	with	both	Network	Solutions	and	Amazon	support,	
the	decision	to	move	to	Route	53	was	made.	The	changes	were	relatively	quick	and	
simple	using	the	Network	Solutions	and	Amazon	web	consoles.	Waiting	for	the	DNS	
changes	to	propagate	imposed	some	additional	time,	but	as	with	the	zone	migration,	
the	delay	was	not	critical	to	the	project	schedule.	
	
With	the	server,	public	IP	address	and	DNS	issues	resolved	PKI	certificate	
generation	could	be	attempted.	The	author	was	relatively	experienced	in	generation	
and	use	of	PKI	credentials,	but	once	again	the	continued	evolution	of	the	Internet	
environment	and	of	cloud	computing	standards	was	to	provide	unexpected	
challenges	to	the	actual	implementation	experience.		
	
There	are	many	vendors	offering	certificates	suitable	for	this	practicum	project,	
including	Amazon’s	own	new	PKI	service.	The	author	selected	Network	Solutions	as	
a	PKI	provider.	Using	another	commercial	certificate	vendor	offered	an	opportunity	
to	explore	the	interoperation	of	Amazon’s	platform	with	other	public	offerings.	
Network	Solutions	also	has	a	long	history	with	the	commercial	Internet	and	has	a	
well-regarded	if	not	inexpensive	certificate	business46.	
	
The	certificates	were	issued	in	a	package	including	both	the	typical	root	certificate	
most	Internet	developers	are	used	to,	as	well	as	a	number	of	intermediate
certificates	that	were	less	familiar	to	the	author.	In	most	cases	inside	an	enterprise,	
certificates	are	issued	for	enterprise	resources	by	trusted	systems	and	all	the	
intermediate	certificates	are	often	in	place	already.	This	was	not	the	case	for	the	
Amazon EC2 infrastructure for this project. In this instance, not only was the root certificate needed, but all the intermediates also had to be manually bundled into the uploaded package47. This was a new process for the author, and management of
intermediate	certificates	represented	another	unexpected	task.	
	
The need to include the intermediate certificates in the upload to Amazon was not immediately apparent, and debugging why uploading just the root certificate did not work (as it had with prior systems) involved a major research effort and many hours of support diagnostics with each vendor involved. To make the issue more complex, the Amazon support team could find documentation for some certificate vendors, and Network Solutions support could find documentation for some cloud service vendors, but neither firm had documents for working with the certificates or cloud services of the other; this was the one case not documented anywhere.
	
The Network Solutions certificates were issued using a new naming format that did not follow the older Network Solutions documentation, making it difficult to identify the proper chaining order. Amazon support was also not certain which order would constitute a working package. A number of orders had to be tried and tested one at a time, with the resulting errors diagnosed for clues to the correct order needed in the concatenation command. On top of this, the Linux command to concatenate, and hence chain, the certificates did not work exactly as written when first attempted. This was due to the text format at the end of the issued certificates: manual editing of the files was needed to fix the incorrect number of delimiters left in the resulting text file.
	
The	final	command	needed	for	the	Amazon	load	balancer	was	determined	to	be:	
	
> amazon_cert_chain.crt; for i in DV_NetworkSolutionsDVServerCA2.crt DV_USERTrustRSACertificationAuthority.crt AddTrustExternalCARoot.crt ; do cat "$i" >> amazon_cert_chain.crt; echo "" >> amazon_cert_chain.crt; done
	
This back-and-forth diagnostic work on certificate chains represented a major unexpected source of complexity and extra work. Again, it did not disrupt the execution schedule beyond a recoverable limit, and the experience with certificate chaining was a valuable learning opportunity on the pragmatic use of PKI tools. The author has subsequently come across a number of federal IT workers encountering these same challenges as more and more systems include components from outside vendors in the internal enterprise infrastructure.
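
One practical way to diagnose this kind of chain problem is to connect to the live site and check whether the presented certificate chain validates against a standard trust store. The following Node sketch is illustrative only and is not part of the project code; the host name matches the project domain:

const tls = require('tls');

// Complete the handshake even if verification fails so the result can be inspected;
// an incomplete chain typically shows up as an unauthorized connection with a
// reason such as "unable to verify the first certificate".
const socket = tls.connect(443, 'www.juggernit.com',
  { servername: 'www.juggernit.com', rejectUnauthorized: false },
  function () {
    console.log('chain authorized:', socket.authorized);
    if (!socket.authorized) {
      console.log('reason:', socket.authorizationError);
    }
    socket.end();
  });

socket.on('error', function (err) {
  console.error('connection error:', err.message);
});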
	
After	the	installation	of	the	certificates,	the	next	major	configuration	tasks	were	the	
installation	and	configuration	of	the	web	server	and	the	application	server	
platforms	on	the	EC2	instance.	Nginx	is	the	web	server	used	on	the	project,	and
Node.JS and the Express framework are used as the application server. Each of these subsystems provided further opportunities for learning as they were installed.
	
Nginx was selected to provide an opportunity to gain experience with this very popular commercial platform, as well as for its reputation for high performance and its excellent ability to scale and support very high traffic web sites. Nginx was designed from the start to address the C10K problem (10,000 concurrent connections) using an asynchronous, non-blocking, event-driven connection-handling algorithm48. This is very different from the approach taken by Apache and many other available web servers. In the author's experience, many web sites that start out with more traditional web servers such as Apache experience significant scale issues as they grow due to high volumes of concurrent users. Starting with Nginx was an attempt to avoid this problem by design, though it made installation and configuration of the web server more complex.
	
The open source version of Nginx was used for the project as a concession to cost management. Downloading the correct code did prove to be somewhat of an issue: it was not easy to find the correct repositories for the current package, and the application then had to be updated before it could function. It was also critical to verify the firewall status once the system was accepting connections.
	
The Amazon install of Red Hat Linux turns out to disable the default firewalls and instead relies on the Amazon built-in firewalls for the site. This actually provides a very feature-rich GUI for firewall configuration, but it is another non-standard operations detail for those familiar with typical Red Hat stand-alone server operations. The firewall was another implementation detail that could not easily be anticipated.
	
After the firewall was sorted out, considerable research remained to determine how to configure the Nginx web server to utilize HTTPS based on the certificates for the domain. Again, the issue turned out to be due to the chaining requirements for the certificate. In this case, Nginx needed a separate and different concatenated package in this format:
	
cat	WWW.JUGGERNIT.COM.crt	AddTrustExternalCARoot.crt	
DV_NetworkSolutionsDVServerCA2.crt	DV_USERTrustRSACertificationAuthority.crt			
>>	cert_chain.crt	
	
After	determining	the	correct	concatenation	format	needed	for	Nginx	and	making	
the	appropriate	uploads	of	concatenated	files,	HTTPS	services	were	available	end	to	
end.	However,	Nginx	does	not	provide	dynamic	web	services.	To	serve	dynamic	
content	it	would	be	necessary	to	install	and	configure	the	Node.JS	Web	Application	
Server	and	the	Express	framework.		
	
Node.JS (Node) is an open source, server-side JavaScript runtime originally developed by Ryan Dahl in 2009 using both original code and
material from the Google V8 JavaScript engine. Most significantly, Node is event-driven and uses a non-blocking I/O model, which makes Node both very fast and very easy to scale. Node is extremely well suited to situations like the C10K problem and to web sites that must scale quickly and efficiently. Being based on JavaScript, Node is object oriented and offers a huge open source support base of modules and libraries, accessed using the Node Package Manager (NPM).
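
The non-blocking model Node uses can be illustrated with a small sketch; the file path here is a placeholder:

const fs = require('fs');

// Non-blocking read: the callback fires when the data is ready, and the
// event loop keeps servicing other requests while the read is in flight.
fs.readFile('/tmp/example.txt', 'utf8', function (err, data) {
  if (err) {
    return console.error('read failed:', err.message);
  }
  console.log('file length:', data.length);
});

console.log('this line prints before the file contents are available');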
	
Express is a minimal and flexible Node.js web application framework based on many of the ideas about web site design and development taken from the Ruby on Rails framework project. Express offers a set of standard libraries and allows users to mix in many other NPM tools to create web sites based on the original Ruby on Rails principle of "convention over configuration" by providing a common structure for web apps49.
	
Installation of Node on the server was done using the standard Red Hat Package Manager tools. Once Node is installed, the Node Package Manager (NPM) system can be used to bootstrap the loading of any other packages, such as the Express framework. In a production system it is expected that the web server and the application server would be hosted on separate hardware instances, but since the practicum was to be subject to only a small load, both servers can run on the same instance of Linux with little impact.
	
While Node comes with its own dynamic web server to respond to requests for dynamic web content, it is not well suited to heavy-duty serving on the front end. Nginx is designed for the task of responding to high volumes of initial user inquiries. The combination of a high performance web server (Nginx) and some number (N) of application server instances (such as Node) is a widely accepted pattern that supports large scale web systems. Implementation of this design pattern was a goal of the prototype, to pre-test integration of all the constituent components even prior to any load testing of the system. Deployment and configuration of Nginx and Node on the single Linux server fulfills this requirement and provides a working model that can be expanded to multiple servers as needed in the future.
	
In order to smoothly transfer web browser requests from users to the application server domain, the web server must act as a reverse proxy for the application server. To accomplish this with Nginx requires the addition of directives inside the "server" section of the Nginx configuration file. These directives instruct the web server to forward web (HTTPS) requests for dynamic pages targeted at the DNS domain from Nginx to Node.JS. This is a relatively standard forwarding setup for Nginx and only required a small amount of research to verify the correct server configuration directives, as shown in this example from the Nginx documentation:
	
server {
    #here is the code to redirect to node on 3000
    location / {
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $http_host;
        proxy_pass "http://127.0.0.1:3000";
    }
}
	
Note that this is just an example for use on localhost with a Node.JS engine running on port 3000 (any port will suffice). The critical issue is to configure Nginx to act as a reverse proxy to the Node.JS engine. Nginx will then send traffic to the configured port for the Node.JS application instance. Node.JS and Express then use a RESTful approach to routing to the application logic based on parsing the URL.
	
The reverse proxy configuration ensures that when traffic comes into the Nginx server in the form "https://juggernit.com/someurl" it will be handled by the appropriate logic section of the Node.JS application as configured in the Express framework. The Express listener catches the traffic on port 3000 and uses the route handler code in Express to parse the URL after the slash and ensure that the proper logic for that route is launched to provide the service requested. This is a well-established RESTful web design pattern, first widely popularized in Ruby on Rails and adopted by a number of web frameworks for languages such as Java, Node or Python.
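
As a concrete illustration of this routing style, the sketch below (with invented paths and handlers, not the project's actual routes) shows how Express maps the portion of the URL after the slash to handler logic:

const express = require('express');
const app = express();

// Express matches the path portion of the URL and runs the matching handler.
app.get('/status', function (req, res) {
  res.send('service is up');
});

app.get('/files/:name', function (req, res) {
  // the ":name" segment of the URL is available as a route parameter
  res.send('you asked for the file named ' + req.params.name);
});

// Listen on the port that Nginx proxies to in the configuration shown above.
app.listen(3000);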
	
Implementing this pattern requires, as a prerequisite, that both Nginx and Node be installed on the server. In addition, the Express framework for web applications used by Node must also be loaded to allow at least a basic test of the forwarding process. All of this code is available as open source, so access to the needed components was not a blocker for the project. Each of these components was first loaded onto the Author's local Unix system (a MacBook Pro running OS X). This allowed for independent and integration testing of the Nginx web server, the Node application server and the Express web framework. By altering the configuration file and adding the appropriate directives as noted above, the reverse proxy configuration and function could be tested locally as well, against the localhost IP address.
	
After	validation	of	the	configuration	requirements	locally	on	the	Author’s	
development	station,	the	web	server	and	application	server	needed	to	both	be	
installed	on	the	cloud	server.	As	noted	above,	Nginx	was	actually	loaded	on	the	
cloud	server	earlier	to	allow	for	configuration	of	the	domain	and	HTTPS	secure	
access	to	the	site.	This	left	only	the	installation	of	the	Node	and	Express	application	
server components. While conceptually easy, in practice loading Node also presented unexpected challenges. The 7.x Red Hat version of Linux installed on the cloud server supports Node through the RPM package manager system; however, the available RPM version was only a 0.10.xx release. The current version of Node at the time of the project was
4.4.x.	The	stable	development	version	installed	on	the	Author’s	local	system	was	
4.4.5	(provided	from	the	Node	web	site).	
There are substantial syntax and function differences between the earlier version of Node and the current version. This required that the Node install on the cloud server be updated, and that proved to require help from the Amazon support team, as following the default upgrade instructions did not work. Again, the delay was not large, but it cost a couple of days between testing, exploration of options, and final correction of the blocking issues. The final install of a current 4.4.x version of Node required a complete uninstall of the default version, as upgrading in place resulted in locked RPM packages.
	
After cleaning up the old install and loading the new Node version, the cloud server was brought up to the required Node version. The Express framework was loaded on the server via the standard command line Node Package Manager (NPM) tool. A simple "Hello World" test web application was created in Express/Node, and the function of both the Nginx and Node servers was validated again.
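
The test application was essentially the canonical Express starter application; a minimal sketch of that kind of "Hello World" app (not the project's exact file) looks like this:

const express = require('express');
const app = express();

// Respond to the root URL with a simple page so that both direct access to
// port 3000 and access through the Nginx reverse proxy can be verified.
app.get('/', function (req, res) {
  res.send('Hello World');
});

app.listen(3000, function () {
  console.log('test application listening on port 3000');
});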
	
To verify web and application server function, an Amazon firewall change was required to allow Node to respond directly to traffic pointed at the IP address of the server on the Node port (3000). This firewall rule addition allowed testing of HTTPS traffic targeted at the domain name, which was served by Nginx, while HTTP traffic directed to the IP address and port 3000 could be tested at the same time, as this traffic was served by the test Node/Express application.
	
To complete the integration, the next step was to reconfigure the Nginx server to act as a reverse proxy. The Nginx configuration file was backed up, the reverse proxy directives shown above were added to it, and Nginx was reloaded to reflect the changes. At this point, Nginx no longer served its default static web page for requests sent to https://juggernit.com. Instead, Nginx forwarded the HTTPS traffic to the Node application server, still under the secure connection, and Node responded with the default "Hello World" page as configured in the Express test application. This state represented a complete integration of Nginx and Node for the project. The server was backed up, and the next stage of work, implementing the upload logic to store data on the Amazon S3 object store, could continue.
	
The	two	major	tasks	required	to	finish	the	site	configuration	and	functional	
completion	of	the	prototype	project	were:	
	
• Establishment of an Amazon S3 storage area (known as a "bucket" on Amazon)
• Coding	server	and	client	logic	to	access	the	S3	storage	via	HTTPS
The	first	of	these	tasks	could	be	accomplished	directly	via	the	Amazon	EC2	
management	console.	For	the	prototype	there	was	no	requirement	for	a	custom	web	
interface	to	create	S3	storage,	and	no	requirement	for	any	automatic	storage	
assignment	or	management.	In	a	fully	realized	production	application	it	is	possible	
that	application	based	management	of	storage	might	be	desirable,	but	this	is	a	
system	feature	requirement	highly	subject	to	enterprise	policy	and	business	case	
needs.	However,	even	when	using	the	Amazon	interface	to	manage	S3	storage	as	in	
this	project,	there	was	still	a	need	to	consider	the	user	and	group	structure	in	order	
to	manage	access	security	to	the	S3	storage.	
	
As	discussed	earlier	in	the	paper,	a	default	EC2	account	assumes	that	the	owner	is	
granted	all	access	to	all	resources	configured	by	that	owner	in	the	Amazon	cloud	
infrastructure.	For	this	reason,	it	is	important	to	create	separate	administrative	
accounts	for	resources	that	require	finer	grained	access	and	might	also	require	
access	restrictions.	In	a	fully	realized	web	application	hosted	on	local	servers,	this	
user	and	group	management	is	often	done	at	the	application	level.	For	this	
prototype	these	considerations	were	to	be	managed	by	the	Amazon	EC2	interface.	
	
Prior to setting up a storage area on the S3 object storage, an administrator group named "admins" was created with full permissions to manage the site resources. Another group called "partners" was created with access to the S3 storage but not to the other site resources used for managing servers. A user named "testone" was then created and added to the "partners" group. The Author used the primary Amazon identity to build and manage the site, but the administrative group was constructed so that any future web based management functions could be separated from the user-oriented functions of the prototype web application.
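
For reference, the same group and user structure could also be created programmatically with the aws-sdk IAM client; the project used the Amazon console, so the following is illustrative only:

const AWS = require('aws-sdk');
const iam = new AWS.IAM();

// Create the "partners" group, then the "testone" user, then link the two.
iam.createGroup({ GroupName: 'partners' }, function (err) {
  if (err) return console.error('createGroup failed:', err);
  iam.createUser({ UserName: 'testone' }, function (err) {
    if (err) return console.error('createUser failed:', err);
    iam.addUserToGroup({ GroupName: 'partners', UserName: 'testone' }, function (err) {
      if (err) return console.error('addUserToGroup failed:', err);
      console.log('user testone added to group partners');
    });
  });
});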
	
With	the	users	and	groups	established,	the	S3	storage	called	“ctprojectbucketone”	
was	created	using	the	standard	Amazon	GUI.		Below	is	a	screenshot	showing	this	
bucket:
To manage access rights, the S3 storage was then assigned a Cross-Origin Resource Sharing (CORS) access policy that allowed GET, POST and PUT requests, as shown in the screenshot below:
	
	
	
The "partners" group was assigned access to this storage by providing them with the resource keys. With the creation of the S3 Object Storage "bucket", the remaining task to reach functional proof of concept for the prototype project was to construct the JavaScript application code to access the S3 storage bucket securely from the Internet.
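
As an illustration of the kind of CORS rules just described, the following sketch applies an equivalent policy with the aws-sdk; the allowed origin and header values here are assumptions, not the project's exact policy:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Allow the browser-based client to issue GET, POST and PUT requests
// against the project bucket from the site's origin.
const params = {
  Bucket: 'ctprojectbucketone',
  CORSConfiguration: {
    CORSRules: [{
      AllowedMethods: ['GET', 'POST', 'PUT'],
      AllowedOrigins: ['https://juggernit.com'],
      AllowedHeaders: ['*']
    }]
  }
};

s3.putBucketCors(params, function (err) {
  if (err) console.error('applying CORS rules failed:', err);
  else console.log('CORS rules applied to bucket');
});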
	
To create the logic for bucket access there were a number of prerequisite steps not emphasized so far. The most significant of these was to develop at least a basic familiarity with Node.JS and JavaScript. While the author possessed some years of experience using JavaScript in a casual manner for other web applications, site development in JavaScript was a very different proposition. Node also has its own "ecosystem" of tools and libraries, much like any emerging open source project. Some understanding of these was also essential to succeed in creating the code required to achieve a proof of concept function for the prototype site.
	
As	a	starting	point	the	main	Node	site,	https://nodejs.org/en/,		provided	an	
essential	reference.	In	addition	the	author	referenced	two	very	useful	textbooks:	
	
• Kiessling, Manuel. "The Node Beginner Book." Available at: http://www.nodebeginner.org (2011).
• Kiessling, Manuel. "The Node Craftsman Book." Available at: https://leanpub.com/nodecraftsman (2015).
	
These	proved	to	be	essential	in	providing	both	background	on	Node,	and	some	
guidance	on	the	use	of	the	Express	application	framework.	In	addition	a	number	of	
other	small	Node	library	packages	were	key	to	creating	the	required	code,	
specifically:
	
• Node Package Manager (NPM) – a Node tool for getting and managing Node packages (libraries of functions). https://www.npmjs.com
• Express – a Node library providing an application framework for RESTful web applications based on the concepts from Ruby on Rails. https://expressjs.com
• Dotenv – a Node library that allows loading environment variables from a configuration file with the extension .env. This was used to allow passing critical values such as security keys for S3 storage in a secure manner from the server to a client. https://www.npmjs.com/package/dotenv
• EJS – a Node library that allows embedded JavaScript in an HTML file. This was used to add the required logic to communicate with the server components of the application and then access the S3 bucket from the client page using values securely passed over HTTPS (see the sketch after this list). https://www.npmjs.com/package/ejs
• AWS-SDK	–	a	Node	library	provided	by	Amazon	to	support	basic	functions	
for	the	S3	storage	service	to	be	accessed	by	Node	code.	
https://www.npmjs.com/package/aws-sdk		
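
The sketch below shows how these pieces fit together in principle: dotenv supplies configuration values, and an EJS-rendered page embeds the values the client-side script needs. The view name and environment variable are placeholders; dotenv.config() is used here, while the project listing later in this paper calls dotenv.load() instead:

require('dotenv').config();                 // load values from the local .env file
const express = require('express');
const app = express();

// Register EJS as the renderer for plain .html views, as the project listing does.
app.engine('html', require('ejs').renderFile);
app.set('views', './views');

// Render an upload page, embedding a non-secret value for the client script.
app.get('/', function (req, res) {
  res.render('upload.html', { bucket: process.env.S3_BUCKET });
});

app.listen(3000);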
	
As a newcomer to Node, the most critical problem for the Author in creating this code was the lack of standard examples of S3 access that used a common approach and offered a sufficiently simple, clear explanation. There are at least dozens of sample approaches to integrating S3 storage in Node projects, but almost all use idiosyncratic sets of differing libraries or fail to address some critical but basic aspect of the prototype such as secure access. There are also a number of very sophisticated and complete examples that are almost incomprehensible to the Node novice. This inability to find a clear and functional pattern to learn from delayed completion of the final steps of the prototype by over a week and a half.
	
After considerable reading, coding, and searching for reference models, the Author finally came across a tutorial from Dr. Will Webberly of the Cardiff University School of Computer Science & Informatics. The author read, studied and analyzed the example provided. The next step was to create several test programs to adapt the approach Dr. Webberly documented for a Heroku cloud instance to a local Node Express project50. After some trial and error, and some correspondence with Dr. Webberly via email, a working set of code emerged.
	
The final proof of concept function was a minimal web application based on the pattern used by Dr. Webberly, running in a cloud based server as an Express application using local variables on the Amazon EC2 server. The server code provides a RESTful service over HTTPS that allows a client web page executing on a remote PC or device to upload to the S3 storage using HTTPS. Below is a screenshot of some of the server side code:
	
	
The	upload	page	logic	is	provided	by	the	project	web	site,	as	is	the	back	end	server	
logic.	Since	the	client	page	is	running	on	a	remote	device,	the	entire	transfer	is	done	
using	client	resources.	The	prototype	project	site	provides	only	context	and	security	
data,	but	is	not	used	to	manage	the	upload.	This	frees	server	side	resources	from	the	
work	of	the	transfer	and	thus	creates	a	higher	performance	distributed	system.	The	
exchange	of	logic	and	credentials	is	all	done	over	the	HTTPS	protocol	with	the	client,	
as	is	the	subsequent	file	upload.	This	provides	a	secure	method	of	access	to	the	
cloud	based	S3	storage.		
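
The heart of this pattern is a server route that returns a short-lived signed S3 request to the client over HTTPS, after which the client uploads the file directly to S3. The following is a sketch of that idea only; the route name, query parameters and environment variable are illustrative rather than the project's exact code (the project's partial listing appears in the Source Code Listings section):

const express = require('express');
const AWS = require('aws-sdk');

const app = express();
const s3 = new AWS.S3();

// The client asks this route for a signed PUT URL, then performs the upload
// itself, so the server never handles the file contents.
app.get('/sign-s3', function (req, res) {
  const params = {
    Bucket: process.env.S3_BUCKET,     // bucket name supplied via the .env file
    Key: req.query['file-name'],       // object name requested by the client
    Expires: 60,                       // signed URL is valid for 60 seconds
    ContentType: req.query['file-type']
  };
  s3.getSignedUrl('putObject', params, function (err, signedUrl) {
    if (err) return res.status(500).end();
    res.json({ signedRequest: signedUrl });
  });
});

app.listen(3000);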
	
Client	side	data	from	the	partner	is	encrypted	in	transfer	and	no	other	parties	
besides	the	partner	and	the	prototype	project	operations	teams	have	access	to	the	
S3	bucket.	For	purposes	of	the	prototype	only	one	client	identity	and	one	bucket	
were	produced.	In	a	fully	realized	system,	there	could	be	unique	buckets	for	each	
client,	subject	to	the	security	and	business	rules	required	by	the	use	case	of	the	
system.	
	
After establishing that the Node logic was in fact working and successfully uploading files to the S3 storage, a small set of sample health records based on the Veterans Administration Disability Benefits Questionnaires (DBQs)51 was constructed.
Below	is	a	sample	of	one	of	these	files:
	
	
These simulated DBQ records were then uploaded as a test and verified as correct by accessing the documents through the Amazon S3 GUI. PDF format was used for the test files to make them directly readable via standard viewing tools.
Here	is	a	screenshot	of	the	uploaded	test	files	in	the	Amazon	S3	bucket:	
	
	
	
This test represents uploading the sort of sensitive and confidential data expected to be collected and managed in any finished system based on the prototype project. While basic in function, the creation and upload of these documents provided the final steps in the implementation of this phase of the prototype project. Below is a screenshot showing the selection of a DBQ for upload using the client side web page:
	
	
	
Storing these files represents the completion of the major design goals of the project, of the implementation phase, and of the prototype project itself.
Findings, Conclusions and Next Steps
	
While achieving the successful secure upload of the test documents to the prototype meets the objectives set out for this project, it represents only the first milestone in extending the system to a more full-featured platform and in exploring additional topics of interest in this area. The architecture implemented offers a good example of the latest non-blocking, asynchronous approach to serving web content. These designs exploit CPU resources in very different ways than traditional code and web frameworks, and there is ample room for scale and load testing to measure the actual capacity of these systems to perform on 64-bit architectures.
	
The asynchronous and distributed, client-controlled approach to storage access also provides an opportunity to test the capacity of the S3 interface to support concurrent access. The results should provide tuning guidance on the number of buckets and the partitioning rules for the S3 storage. A larger scale simulation with many more virtual clients would be a natural approach to measuring the capacity of this use pattern.
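
One possible starting point for such an experiment is sketched below: a script that fires a batch of concurrent test uploads against the bucket and reports the elapsed time. The object keys, payloads and batch size are placeholders, and a real load test would need far more care about distribution, timing and measurement:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Fire N concurrent test uploads and time the batch as a rough concurrency probe.
const N = 50;
const started = Date.now();
const uploads = [];

for (var i = 0; i < N; i++) {
  uploads.push(s3.putObject({
    Bucket: 'ctprojectbucketone',
    Key: 'loadtest/object-' + i + '.txt',
    Body: 'test payload ' + i
  }).promise());
}

Promise.all(uploads)
  .then(function () {
    console.log(N + ' uploads completed in ' + (Date.now() - started) + ' ms');
  })
  .catch(function (err) {
    console.error('upload failed:', err);
  });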
	
The web site functions also offer an opportunity to expand the functionality of the system and demonstrate more advanced, fine-grained access controls supported by the user and group model. At a minimum, a database of administrators and partners can be created both to lock the site down from casual access and to explore the minimal levels of access needed to still meet all functional needs. Driving each role to the absolute lowest level of privilege will likely require trial and error, but it should help ensure the site presents a minimal profile to any potential attackers.
In addition to these operations-oriented future areas of research, once a larger data set is simulated, the ability of the S3 storage to support search indexing of the XML data is a rich area of exploration. There is emerging federal guidance on best practices for metadata tagging of PII and PHI data, and this prototype would allow an easy way to create versions of S3 buckets with a variety of metadata patterns and then determine the most efficient search and index options for each with a higher volume of simulated data. An expanded prototype could act as a test platform for future production systems, revealing both physical and logical performance metrics.
	
Each	of	these	future	options	provides	scope	to	expand	the	project,	but	the	basic	
implementation	also	provides	some	important	benefits:	
	
• The implementation of the system shows that it is pragmatic to store sensitive data on a public cloud based system, using PKI infrastructure to protect the data from both external and cloud vendor access.
• The design of the prototype shows that modest cloud resources can in fact be used to host a site with the capacity to distribute the workload, using HTTPS to secure the data streams and leveraging client resources to support data upload rather than relying on central server capacity alone.
• The prototype shows that it is relatively easy to use Object Storage to acquire semi-structured data such as XML. This validates use of an Object Store as a form of document management tool beyond block storage.
• The establishment of the project in only a few weeks with limited staff hours shows the cost and speed advantages of the cloud as opposed to local physical servers.
• The experience with both the cloud and new web servers and languages demonstrates the importance of flexible scheduling and allowing for the unexpected. Even on projects that leverage many off the shelf components, unexpected challenges often show up and consume time and resources.
	
The prototype produced as a result of this project does meet the guidance for building secure projects on a public infrastructure. It allows PII and PHI data to be transferred to an enterprise via secure web services, and it demonstrates an approach that can satisfy many enterprises and the guidelines for HIPAA and HiTech data handling. The architecture used demonstrates how a scalable web service model can be implemented using a cloud infrastructure by a small team in a limited time. The model provides only a basic proof of concept, but it offers easy opportunities to expand and to explore a number of additional questions. As such the resulting site can be considered a success at meeting its design goals, and the information generated in the site development can be employed by both the Author and others for future work in cloud computing implementation for secure digital document storage.
References
	
1. Oppenheim,	A.	L.	(Ed.).	(1967).	Letters	from	Mesopotamia:	Official	business,	
and	private	letters	on	clay	tablets	from	two	millennia.	University	of	Chicago	
Press.		Page	1-10	
	
2. Fang,	I.	(2014).	Alphabet	to	Internet:	Media	in	Our	Lives.	Routledge.			Page	
90-91	
	
3. Noam,	E.	M.	(1992).	Telecommunications	in	Europe	(pp.	363-368).	New	York:	
Oxford	University	Press.		Page	15-17	
	
4. Moroney,	R.	L.	(1983).	History	of	the	US	Postal	Service,	1775-1982	(Vol.	100).	
The	Service.	
	
5. John,	R.	R.	(2009).	Spreading	the	news:	The	American	postal	system	from	
Franklin	to	Morse.	Harvard	University	Press.		Page	1-25	
	
6. Johnson,	P.	(2013).	The	birth	of	the	modern:	world	society	1815-1830.	
Hachette	UK.	
	
7. Currie,	R.	(2013,	May	29).	HistoryWired:	A	few	of	our	favorite	things.	
Retrieved	May	15,	2016,	from	http://historywired.si.edu/detail.cfm?ID=324		
	
8. Standage,	T.	(1998).	The	Victorian	Internet:	The	remarkable	story	of	the	
telegraph	and	the	nineteenth	century's	online	pioneers.	London:	Weidenfeld	
&	Nicolson.	
	
9. Yates,	J.	(1986).	The	telegraph's	effect	on	nineteenth	century	markets	and	
firms.	Business	and	Economic	History,	149-163.	
	
10. Du	Boff,	R.	B.	(1980).	Business	Demand	and	the	Development	of	the	
Telegraph	in	the	United	States,	1844–1860.	Business	History	Review,	54(04),	
459-479.	
	
11. Gordon,	J.	S.	(2002).	A	thread	across	the	ocean:	the	heroic	story	of	the	
transatlantic	cable.	Bloomsbury	Publishing	USA.	
	
12. Ross,	C.	D.	(2000).	Trial	by	fire:	science,	technology	and	the	Civil	War.	White	
Mane	Pub.	
	
13. Bates,	D.	H.	(1995).	Lincoln	in	the	telegraph	office:	recollections	of	the	United	
States	Military	Telegraph	Corps	during	the	Civil	War.	U	of	Nebraska	Press.
14. Coopersmith,	J.	(2015).	Faxed:	The	Rise	and	Fall	of	the	Fax	Machine.	JHU	
Press.	
	
15. Cortada,	J.	W.	(2000).	Before	the	computer:	IBM,	NCR,	Burroughs,	and	
Remington	Rand	and	the	industry	they	created,	1865-1956.	Princeton	
University	Press.	
	
16. Smith,	E.	(2016,	June	14).	The	Strange	History	of	Microfilm,	Which	Will	Be	
With	Us	for	Centuries.	Retrieved	June	22,	2016,	from	
http://www.atlasobscura.com/articles/the-strange-history-of-microfilm-
which-will-be-with-us-for-centuries		
	
17. Bush, V. (1945). As We May Think. The Atlantic Monthly, 176(1), 101-108.
	
18. Mohamed,	A.	(2015,	November).	A	history	of	cloud	computing.	Retrieved	July	
07,	2016,	from	http://www.computerweekly.com/feature/A-history-of-
cloud-computing		
	
19. Electric	Light	and	Power	System	-	The	Edison	Papers.	(n.d.).	Retrieved	July	13,	
2016,	from	http://edison.rutgers.edu/power.htm		
	
20. The discovery of electricity - CitiPower and Powercor. (n.d.). Retrieved July 13,
2016,	from	https://www.powercor.com.au/media/1251/fact-sheet-
electricity-in-early-victoria-and-through-the-years.pdf		
	
21. Powering	A	Generation:	Power	History	#1.	(n.d.).	Retrieved	July	13,	2016,	
from	http://americanhistory.si.edu/powering/past/prehist.htm		
	
	
22. Electricity	-	Switch	Energy	Project	Documentary	Film	and	...	(n.d.).	Retrieved	
July	13,	2016,	from	
http://www.switchenergyproject.com/education/CurriculaPDFs/SwitchCur
ricula-Secondary-Electricity/SwitchCurricula-Secondary-
ElectricityFactsheet.pdf	
	
23. Tita,	B.	(2012,	November	6).	A	Sales	Surge	for	Generator	Maker	-	WSJ.	
Retrieved	July	13,	2016,	from	
http://www.wsj.com/articles/SB100014241278873248941045781033340
72599870	
	
24. Residential	Generators,	3rd	Edition	-	U.S.	Market	and	World	Data.	(n.d.).	
Retrieved	July	13,	2016,	from	
https://www.giiresearch.com/report/sbi227838-residential-generators-
3rd-edition-us-market-world.html
	
	
25. Barroso,	L.	A.,	Clidaras,	J.,	&	Hölzle,	U.	(2013).	The	datacenter	as	a	computer:	
An	introduction	to	the	design	of	warehouse-scale	machines.	Synthesis	
lectures	on	computer	architecture,	8(3),	1-154.	
	
26. West,	B.	C.	(2014).	Factors	That	Influence	Application	Migration	To	Cloud	
Computing	In	Government	Organizations:	A	Conjoint	Approach.	
	
27. Total	Cost	of	Ownership.	(2016).	Retrieved	July	06,	2016,	from	
http://www.backuparchive.awstcocalculator.com/		
	
28. United	States.	White	House	Office,	&	Obama,	B.	(2011).	International	Strategy	
for	Cyberspace:	Prosperity,	Security,	and	Openness	in	a	Networked	World.	
White	House.	
	
29. Kundra,	V.	(2011).	Federal	cloud	computing	strategy.	
	
30. VanRoekel,	S.	(2011,	December	8).	MEMORANDUM	FOR	CHIEF	
INFORMATION	OFFICERS.	Retrieved	July	13,	2016,	from	
https://www.fedramp.gov/files/2015/03/fedrampmemo.pdf		
	
31. Code,	U.	S.	(1999).	Gramm-Leach-Bliley	Act.	Gramm-Leach-Bliley	Act/AHIMA,	
American	Health	Information	Management	Association.	
	
32. What	is	Sensitive	Data?	Protecting	Financial	Information	...	(2008).	Retrieved	
June	19,	2016,	from	
http://ist.mit.edu/sites/default/files/migration/topics/security/pamphlets/
protectingdata.pdf		
	
33. Government	Accountability	Office	(GAO)	Report	08-343,	Protecting	
Personally	Identifiable	Information,	January	2008,	
http://www.gao.gov/new.items/d08343.pdf	
	
34. Wilshusen, G. C., & Powner, D. A. (2009). Cybersecurity: Continued efforts are needed to protect information systems from evolving threats (No. GAO-10-230T). Government Accountability Office, Washington, DC.
	
35. McCallister,	E.,	Grance,	T.,	&	Scarfone,	K.	(2010,	April).	Guide	to	Protecting	the	
Confidentiality	of	Personally	...	Retrieved	July	13,	2016,	from	
http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf	
	
36. Health Insurance Portability and Accountability Act of 1996, Pub. L. No. 104-191 (1996).
37. Graham,	C.	M.	(2010).	HIPAA	and	HITECH	Compliance:	An	Exploratory	Study	
of	Healthcare	Facilities	Ability	to	Protect	Patient	Health	Information.	
Proceedings	of	the	Northeast	Business	&	Economics	Association.	
	
38. Anderson,	H.	(2010,	February	8).	The	Essential	Guide	to	HITECH	Act.	
Retrieved	June	19,	2016,	from	
http://www.healthcareinfosecurity.com/essential-guide-to-hitech-act-a-
2053		
	
39. Dimov,	I.	(2013,	June	20).	Guiding	Principles	in	Information	Security	-	InfoSec	
Resources.	Retrieved	July	09,	2016,	from	
http://resources.infosecinstitute.com/guiding-principles-in-information-
security/		
	
40. Amazon	Web	Services	(AWS)	-	Cloud	Computing	Services.	(n.d.).	Retrieved	
July	10,	2016,	from	https://aws.amazon.com/		
	
41. EC2	Instance	Types	–	Amazon	Web	Services	(AWS).	(2016).	Retrieved	July	10,	
2016,	from	https://aws.amazon.com/ec2/instance-types/		
	
42. Regions	and	Availability	Zones.	(2016,	January).	Retrieved	July	13,	2016,	
from	http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-
regions-availability-zones.html	
	
43. Elastic	IP	Addresses.	(2016).	Retrieved	July	10,	2016,	from	
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-
addresses-eip.html		
	
44. AWS	|	Elastic	Load	Balancing	-	Cloud	Network	Load	Balancer.	(2016).	
Retrieved	July	10,	2016,	from	
https://aws.amazon.com/elasticloadbalancing/		
	
45. AWS	|	Amazon	Route	53	-	Domain	Name	Server	-	DNS	Service.	(2016).	
Retrieved	July	10,	2016,	from	https://aws.amazon.com/route53/		
	
46. SSL	Security	Solutions.	(2016).	Retrieved	July	10,	2016,	from	
http://www.networksolutions.com/SSL-certificates/index.jsp		
	
47. What	is	the	SSL	Certificate	Chain?	(2016).	Retrieved	July	10,	2016,	from	
https://support.dnsimple.com/articles/what-is-ssl-certificate-chain/	
	
48. Ellingwood,	J.	(2015,	January	28).	Apache	vs	Nginx:	Practical	Considerations	|	
DigitalOcean.	Retrieved	July	10,	2016,	from	
https://www.digitalocean.com/community/tutorials/apache-vs-nginx-
practical-considerations
	
49. Node.js	Introduction.	(2016).	Retrieved	July	10,	2016,	from	
http://www.tutorialspoint.com/nodejs/nodejs_introduction.htm		
	
50. Webberly,	W.	(2016,	May	23).	Direct	to	S3	File	Uploads	in	Node.js	|	Heroku	
Dev	Center.	Retrieved	July	12,	2016,	from	
https://devcenter.heroku.com/articles/s3-upload-node#summary		
	
51. Compensation.	(2013,	October	22).	Retrieved	July	12,	2016,	from	
http://www.benefits.va.gov/compensation/dbq_disabilityexams.asp
Source Code Listings
	
App.js – this is the server side logic for the project:
	
/*	
Cecil	Thornhill	
5/26/2016	
Based	on	code	examples	and	samples	from	Will	Webberly	and	Amazon	for	S3	
uploads	
*/	
	
/*	
In	learning	how	to	interface	to	S3	via	Node	JS	and	JavaScript	I	started	with	code	
from	a	tutorial	provided	by	Dr.	Will	Webberly	who	was	a	computer	science	lecturer	
at	Cardiff	University	and	is	now	CTO	at	Simply	Di	Ideas.	Will	was	kind	enough	to	
correspond with me and address questions on the concepts and use cases involved
in	my	project.	The	original	article	I	referenced	is	at:	
https://devcenter.heroku.com/articles/s3-upload-node#initial-setup	
*/	
	
/*	
This is the main logic for the server side of the proof of concept demo for my project.
The code here supports the features required to allow the client to securely upload a
file to the S3 storage site. The simple proof pages and this core logic do not attempt
to implement any user authentication, authorization or administration of the site.
Those functions are pre-selected via the structure of the users and groups built in
the S3 interface for this demo. All these aspects would be expected in a more
full-featured site design, but are not required to establish the functional proof of
concept for the main secure upload of files functionality.
*/	
	
/*	
	Licensed	under	the	Apache	License,	Version	2.0	(the	"License");	you	may	not	use	
this	file	except	in	compliance	with	the	License.	You	may	obtain	a	copy	of	the	License	
at	
	http://www.apache.org/licenses/LICENSE-2.0	
	Unless	required	by	applicable	law	or	agreed	to	in	writing,	software	distributed	
under	the	License	is	distributed	on	an	"AS	IS"	BASIS,	WITHOUT	WARRANTIES	OR	
CONDITIONS	OF	ANY	KIND,	either	express	or	implied.	See	the	License	for	the	
specific	language	governing	permissions	and	limitations	under	the	License.	
*/
	
/*	
	*	Import	required	packages.	
	*	Packages	should	be	installed	with	"npm	install".	
	*/	
	
/*		
CT - I am using local variables for the development version of this demo site.
Below I require dotenv to allow local config management, so this demo can run
without setting environment variables on the server, which is the more correct
operations practice on a deployed system to prevent exposing the values in the
open production environment. Of course it is much easier to manage local values
from this resource file in the development phase, so that is the way I went for
the current demo code.
*/	
var	dotenv	=	require('dotenv');	
dotenv.load();	
/*	
To ensure that we got the values we expected, I also show the variables now in
process.env (with the values from the .env file added) on the console. Of course
this is not something to do in the final production system.
*/	
	
console.log(process.env)	
	
const	express	=	require('express');	
const	aws	=	require('aws-sdk');	
	
/*	
	*	Set-up	and	run	the	Express	app.	
CT - note we are running on port 3000 in this case. It is important to forward your
web traffic from the NGINX server to the proper port by setting up the reverse proxy
configuration in the NGINX server, so that traffic gets through from the web server
to the application server.
	*/	
const	app	=	express();	
app.set('views',	'./views');	
app.use(express.static('./public'));	
app.engine('html',	require('ejs').renderFile);	
app.listen(process.env.PORT	||	3000);	
	
/*	
	*	Load	the	S3	information	from	the	environment	variables.
Masters Project CThornhill v2 final
Masters Project CThornhill v2 final
Masters Project CThornhill v2 final
Masters Project CThornhill v2 final
Masters Project CThornhill v2 final
Masters Project CThornhill v2 final

More Related Content

Viewers also liked

LinkedTV D6.4 Scenario Demonstrators v2
LinkedTV D6.4 Scenario Demonstrators v2LinkedTV D6.4 Scenario Demonstrators v2
LinkedTV D6.4 Scenario Demonstrators v2LinkedTV
 
MANAGEMENT RESEARCH PROJECT
MANAGEMENT RESEARCH  PROJECTMANAGEMENT RESEARCH  PROJECT
MANAGEMENT RESEARCH PROJECTAllcance Digital
 
Transport Research Project Final Report
Transport Research Project Final ReportTransport Research Project Final Report
Transport Research Project Final ReportKartik Tiwari
 
CRM IN TELESHOPPING INDUSTRY
CRM IN TELESHOPPING INDUSTRYCRM IN TELESHOPPING INDUSTRY
CRM IN TELESHOPPING INDUSTRYShriyansh Gupta
 
MANAGEMENT RESEARCH PROJECT
MANAGEMENT RESEARCH PROJECTMANAGEMENT RESEARCH PROJECT
MANAGEMENT RESEARCH PROJECTERICK MAINA
 
Market Research on Distribution System of Pepsi Project Report
Market Research on Distribution System of Pepsi Project ReportMarket Research on Distribution System of Pepsi Project Report
Market Research on Distribution System of Pepsi Project ReportAbhishek Keshri
 

Viewers also liked (6)

LinkedTV D6.4 Scenario Demonstrators v2
LinkedTV D6.4 Scenario Demonstrators v2LinkedTV D6.4 Scenario Demonstrators v2
LinkedTV D6.4 Scenario Demonstrators v2
 
MANAGEMENT RESEARCH PROJECT
MANAGEMENT RESEARCH  PROJECTMANAGEMENT RESEARCH  PROJECT
MANAGEMENT RESEARCH PROJECT
 
Transport Research Project Final Report
Transport Research Project Final ReportTransport Research Project Final Report
Transport Research Project Final Report
 
CRM IN TELESHOPPING INDUSTRY
CRM IN TELESHOPPING INDUSTRYCRM IN TELESHOPPING INDUSTRY
CRM IN TELESHOPPING INDUSTRY
 
MANAGEMENT RESEARCH PROJECT
MANAGEMENT RESEARCH PROJECTMANAGEMENT RESEARCH PROJECT
MANAGEMENT RESEARCH PROJECT
 
Market Research on Distribution System of Pepsi Project Report
Market Research on Distribution System of Pepsi Project ReportMarket Research on Distribution System of Pepsi Project Report
Market Research on Distribution System of Pepsi Project Report
 

Similar to Masters Project CThornhill v2 final

CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...
CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...
CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...ambitlick
 
Secure Cloud Storage
Secure Cloud StorageSecure Cloud Storage
Secure Cloud StorageALIN BABU
 
Rapport eucalyptus cloud computing
Rapport eucalyptus cloud computingRapport eucalyptus cloud computing
Rapport eucalyptus cloud computingBilal ZIANE
 
Rapport eucalyptus cloud computing
Rapport eucalyptus cloud computingRapport eucalyptus cloud computing
Rapport eucalyptus cloud computingBilal ZIANE
 
Cloud Computing Adoption and the Impact of Information Security
Cloud Computing Adoption and the Impact of Information SecurityCloud Computing Adoption and the Impact of Information Security
Cloud Computing Adoption and the Impact of Information SecurityBelinda Edwards
 
MSc Dissertation on cloud Deekshant Jeerakun
MSc Dissertation on cloud Deekshant JeerakunMSc Dissertation on cloud Deekshant Jeerakun
MSc Dissertation on cloud Deekshant JeerakunDeekshant Jeerakun. MBCS
 
Integrated-Security-Solution-for-the-virtual-data-center-and-cloud
Integrated-Security-Solution-for-the-virtual-data-center-and-cloudIntegrated-Security-Solution-for-the-virtual-data-center-and-cloud
Integrated-Security-Solution-for-the-virtual-data-center-and-cloudJohn Atchison
 
Two competing approaches to hybrid cloud
Two competing approaches to hybrid cloudTwo competing approaches to hybrid cloud
Two competing approaches to hybrid cloudPrincipled Technologies
 
Cloud computing security_perspective
Cloud computing security_perspectiveCloud computing security_perspective
Cloud computing security_perspectivesolaigoundan
 
Flask: Flux Advanced Security Kernel. A Project Report
Flask: Flux Advanced Security Kernel. A Project ReportFlask: Flux Advanced Security Kernel. A Project Report
Flask: Flux Advanced Security Kernel. A Project ReportLuis Espinal
 
Project final report
Project final reportProject final report
Project final reportALIN BABU
 
Cisco Cloud Computing White Paper
Cisco Cloud Computing White PaperCisco Cloud Computing White Paper
Cisco Cloud Computing White Paperlamcindoe
 
Intrusion Detection on Public IaaS - Kevin L. Jackson
Intrusion Detection on Public IaaS  - Kevin L. JacksonIntrusion Detection on Public IaaS  - Kevin L. Jackson
Intrusion Detection on Public IaaS - Kevin L. JacksonGovCloud Network
 
Security issues in cloud
Security issues in cloudSecurity issues in cloud
Security issues in cloudWipro
 
The Death Of Computer Forensics: Digital Forensics After the Singularity
The Death Of Computer Forensics: Digital Forensics After the SingularityThe Death Of Computer Forensics: Digital Forensics After the Singularity
The Death Of Computer Forensics: Digital Forensics After the SingularityTech and Law Center
 

Similar to Masters Project CThornhill v2 final (20)

Seminor Documentation
Seminor DocumentationSeminor Documentation
Seminor Documentation
 
CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...
CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...
CloudAnalyst: A CloudSim-based Tool for Modelling and Analysis of Large Scale...
 
Secure Cloud Storage
Secure Cloud StorageSecure Cloud Storage
Secure Cloud Storage
 
Rapport eucalyptus cloud computing
Rapport eucalyptus cloud computingRapport eucalyptus cloud computing
Rapport eucalyptus cloud computing
 
Rapport eucalyptus cloud computing
Rapport eucalyptus cloud computingRapport eucalyptus cloud computing
Rapport eucalyptus cloud computing
 
Cloud Computing Adoption and the Impact of Information Security
Cloud Computing Adoption and the Impact of Information SecurityCloud Computing Adoption and the Impact of Information Security
Cloud Computing Adoption and the Impact of Information Security
 
MSc Dissertation on cloud Deekshant Jeerakun
MSc Dissertation on cloud Deekshant JeerakunMSc Dissertation on cloud Deekshant Jeerakun
MSc Dissertation on cloud Deekshant Jeerakun
 
Integrated-Security-Solution-for-the-virtual-data-center-and-cloud
Integrated-Security-Solution-for-the-virtual-data-center-and-cloudIntegrated-Security-Solution-for-the-virtual-data-center-and-cloud
Integrated-Security-Solution-for-the-virtual-data-center-and-cloud
 
Two competing approaches to hybrid cloud
Two competing approaches to hybrid cloudTwo competing approaches to hybrid cloud
Two competing approaches to hybrid cloud
 
Microservices.pdf
Microservices.pdfMicroservices.pdf
Microservices.pdf
 
Cloud computing security_perspective
Cloud computing security_perspectiveCloud computing security_perspective
Cloud computing security_perspective
 
Flask: Flux Advanced Security Kernel. A Project Report
Flask: Flux Advanced Security Kernel. A Project ReportFlask: Flux Advanced Security Kernel. A Project Report
Flask: Flux Advanced Security Kernel. A Project Report
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Project final report
Project final reportProject final report
Project final report
 
Cisco Cloud Computing White Paper
Cisco Cloud Computing White PaperCisco Cloud Computing White Paper
Cisco Cloud Computing White Paper
 
Intrusion Detection on Public IaaS - Kevin L. Jackson
Intrusion Detection on Public IaaS  - Kevin L. JacksonIntrusion Detection on Public IaaS  - Kevin L. Jackson
Intrusion Detection on Public IaaS - Kevin L. Jackson
 
venpo045-thesis-report
venpo045-thesis-reportvenpo045-thesis-report
venpo045-thesis-report
 
Security issues in cloud
Security issues in cloudSecurity issues in cloud
Security issues in cloud
 
The Death Of Computer Forensics: Digital Forensics After the Singularity
The Death Of Computer Forensics: Digital Forensics After the SingularityThe Death Of Computer Forensics: Digital Forensics After the Singularity
The Death Of Computer Forensics: Digital Forensics After the Singularity
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 

Masters Project CThornhill v2 final

  • 1. Secure File Management Using the Public Cloud A Masters in Cybersecurity Practicum Project Cecil Thornhill ABSTRACT The Project explores the history and evolution of document management tools through the emergence of cloud computing and documents the development of a basic cloud computing web based system for secure transmission and storage of confidential information on a public cloud following guidance for federal computing systems.
  • 2. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 2 of 46 Introduction ................................................................................................................3 Background of the Driving Problem – Ur to the Cloud ..................................................3 The Cloud in Context – A New Way to Provide IT .........................................................7 Cloud Transformation Drivers......................................................................................8 The Federal Cloud & the Secure Cloud Emerge.......................................................... 10 Designing a Project to Demonstrate Using the Cloud ..................................................13 Planning the Work and Implementing the Project Design ...........................................15 Findings, Conclusions and Next Steps.........................................................................32 References.................................................................................................................34 Source Code Listings ..................................................................................................39 Test Document ..........................................................................................................46
  • 3. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 3 of 46 Introduction This paper describes the design and development of a system to support the encrypted transfer of confidential and sensitive Personally Identifiable Information (PII) and Personal Healthcare Information (PHI) to a commercial cloud based object storage system. This work was undertaken as a Practicum project for the Masters in Cybersecurity program, and as such was implemented within the time limits of a semester session and was completed by a single individual. This prototype represents a basic version of a web-based system implemented on a commercial cloud based object storage system. The prototype demonstrates an approach to implementation suitable for use by government or private business for the collection of data subject to extensive regulation such as HIPAA/HiTech healthcare data, or critical financial data. A general review of the context of the subject area and history of document management are provided below, along with a review of the implementation efforts. Findings and results are provided both for the implementation efforts as well as the actual function of the system. Due to the restricted time available for this project, the scope was limited to fit the schedule. Only basic features were implemented per the design guidance documented below. To explore future options for expansion of the project several experiments designed to further analyze the system capacity and performance are outlined below. These options represent potential future directions to further explore this aspect of secure delivery of information technology functions using cloud-based platforms. Background of the Driving Problem – Ur to the Cloud The need to exchange documents containing important information between individuals, and enterprises is a universal necessity in any organized human society. Since the earliest highly organized human cultures information about both private and government activities has been recorded on physical media and exchanged between parties1. Various private and government couriers were used to exchange documents in the ancient and classical world. In the West, this practice of private courier service continued after the fall of Rome. The Catholic Church acted as a primary conduit for document exchange and was itself a prime consumer of document exchange services2. In the West, after the renaissance the growth of both the modern nation state and the emergence of early commerce and capitalism were both driven by and supportive of the growth of postal services open to private interest. The needs of commerce quickly came to dominate the traffic, and shape the evolution of document exchange via physical media3. In the early United States the critical role of publicly accessible document exchange was widely recognized by the founders of the new democracy. The Continental Congress in1775 established the US Postal
  • 4. Secure File Management Using the Public Cloud Masters of Cybersecurity Practicum Project, ISM 6905– Cecil Thornhill Masters Project CThornhill v2 final.docx7/13/16 Page 4 of 46 Service to provide document communications services to the emerging new government prior to the declaration of independence4. As a new and modern nation cost effective, efficient document exchange services from the new post office were essential to the growth of the US economy5. The growth of the US as a political and economic power unfolds in parallel with the Industrial Revolution in England and Europe as well as the overall transition of the Western world to what can be described as modern times. New science, new industry and commerce and new political urgencies all drive the demand for the transmission of documents and messages in ever faster and more cost effective forms6. It is within this accelerating technical and commercial landscape that the digital age is born in the US when Samuel Morse publicly introduces the telegraph to the world in 1844 with the famous question “What Hath God Wrought?” sent from the US Capitol to the train statin in Baltimore, Maryland7. Morse’s demonstration was the result of years of experiment and effort by hundreds of people in scores of countries, but has come to represent the singular moment of creation for the digital era and marks the beginning of the struggle to understand and control the issues stemming from document transmission in the digital realm. All of the issues we face emerge from this time forward, such as: • Translation of document artifacts created by people into digital formats and the creation of human readable documents from digital intermediary formats. • The necessity to authenticate the origin of identical digital data sets and to manage the replication of copies. • The need to enforce privacy and security during the transmission process across electronic media. Many of these problems have similar counterparts in the physical document exchange process, but some such as the issue of an indefinite number of identical copies were novel and all these issues require differing solutions for a physical or digital environment8. The telegraph was remarkable successful due to its compelling commercial, social and military utility. As Du Boff and Yates note in their research: “By 1851, only seven years after the inauguration of the pioneer Baltimore-to- Washington line, the entire eastern half of the US up to the Mississippi River was connected by a network of telegraph wires that made virtually instantaneous communication possible. By the end of another decade, the telegraph had reached the west coast, as well9, 10 “. The reach of the telegraph went well beyond the borders of the US, or even the shores of any one continent by 1851. That same year Queen Victoria sent president
Buchanan a congratulatory telegram to mark the successful completion of the Anglo-American transatlantic cable project11. Digital documents now had global scope, and the modern era of document exchange and management had truly arrived.

The US Civil War would be largely shaped by the technical impact of the telegraph and railroad. Both the North and South ruthlessly exploited advances in transportation and communication during the conflict12. Centralization of information management and the need for confidentiality, integrity, and availability all emerged as issues. Technical tools like encryption rapidly became standard approaches to meeting these needs13. The patterns of technical utilization during the war provided a model for future civil government and military use of digital communications and for digital document transmission.

The government's use patterns then became a lesson in the potential for commercial use of the technology. Veterans of the war went on to utilize the telegraph as an essential tool in post war America's business climate. Rapid communication and a faster pace in business became the norm as the US scaled up its industry in the late 19th century. Tracking and managing documents became an ever-increasing challenge, along with other aspects of managing the growing and geographically diverse business enterprises that were emerging.

By the turn of the 20th century the telegraph provided a thriving and vital alternative to the physical transmission of messages and documents. Most messages and documents to be sent by telegraph were either entered directly as digital signals sent originally by telegraphy, or transcribed by a human who read and re-entered the data from the document. However, all of the modern elements of digital document communication existed and were in some form of use, including the then under-utilized facsimile apparatus14.

As the 20th century progressed, two more 19th-century technologies that would come to have a major impact on document interchange and management would continue to evolve in parallel with the telegraph: mechanical/electronic computation and photography. Mechanical computation, tracing its origin to Babbage's Analytical Engine, would come to be indispensable in tabulating and managing the data needed to run an increasingly global technical and industrial society15.

Photography not only provided a new and accurate record of people and events, but with the development of fine-grained films in the 20th century, microfilm would come to be the champion of high-density document and hence information storage media. Despite some quality drawbacks, the sheer capacity and over 100-year shelf life of microfilm made it very attractive as a document storage tool. By the 1930's microfilm had become the bulk document storage medium of choice for publications and libraries as well as the federal government16.
The experience with early electronic computers in World War II and familiarity with microfilm made merging the two technologies appear as a natural next step to forward thinkers. In 1945 Vannevar Bush, the wartime head of the Office of Scientific Research and Development (OSRD), would propose the Memex. Memex was designed as an associative information management device combining electronic computer-like functions with microfilm storage, but was not fully digital nor was it networked17. In many ways this project pointed the way to modern information management tools that were introduced in the 1960's but not fully realized until the end of the 20th century. (Bush, V. (1945). As We May Think. The Atlantic Monthly, 176(1), 101-108.)

The commercial release and rapid adoption of modern computer systems such as the groundbreaking IBM 360 in the 1960's, and a series of mini-computer systems in the 1970's such as the DEC VAX, greatly expanded the use of digital documents and created the modern concept of a searchable database filled with data from these documents. The development of electronic document publishing systems in the 1980's created a "feedback loop" that allowed digital data to go back into printed documents, generating a need to manage these new documents with the computers used to generate them from the data and user input.

The growth of both electronic data exchange and document scanning in the 1990's began to replace microfilm. Many enterprises realized the need to eliminate paper and only work with electronic versions of customer documents. The drive for more efficient and convenient delivery of services as well as the need to reduce the cost of managing paper records continues to drive the demand for electronic document management tools. By the 1990's large-scale document management and document search systems such as FileNet and its competitors began to emerge into the commercial market.

The emergence of fully digital document management systems in widespread use by the turn of the 21st century brings the story of document management into the present day, where we see a predominance of electronic document systems, and an expectation of quick and universal access to both the data and documents as artifacts in every aspect of life, including activities that are private, commercial and interactions with the government.

As the demand for large electronic document management infrastructures grew, the scale of these systems and related IT infrastructure continued to expand, placing significant cost stress on the enterprise. There was a boom in the construction of data centers to house the infrastructure. At the same time that the physical data centers for enterprises were expanding, a new model of enterprise computing was being developed: Cloud Computing.
The Cloud in Context – A New Way to Provide IT

In 1999 Salesforce popularized the idea of providing enterprise applications infrastructure via a website, and by 2002 Amazon started delivering computation and storage to enterprises via the Amazon Web Services platform. Google, Microsoft and Oracle as well as a host of other major IT players quickly followed with their own versions of cloud computing options. These new cloud services offered the speed and convenience of web based technology with the features of a large data center. An enterprise could lease and provision cloud resources with little lead time and no up-front investment in procurement of system hardware. By 2009 options for cloud computing were plentiful, but there was as yet little generally accepted evidence about the reasons for the shift or even the risks and benefits18.

What made cloud systems different from earlier timeshare approaches and data center leasing of physical space? Why were they more compelling than renting or leasing equipment? While a detailed examination of all the concepts and considerations leading to the emergence of cloud computing is outside the scope of this paper, there is a broad narrative that can be suggested based on prior historical study of technological change from steam to electricity and then to centralized generation systems. While the analogies may not all be perfect, they can be useful tools in contextualizing the question of "why cloud computing now?"

In the 19th century, the development of practical steam power drove a revolution in technical change. The nature of mechanical steam power was such that the steam engine was intrinsically local, as mechanical power is hard to transmit across distance19. When electrical generation first emerged at the end of the 19th century, the first electrical applications tended to reproduce this pattern. Long distance distribution of power was hard to achieve, and so many facilities used generators for local power production20.

The nature of electricity was quite different from mechanical power, and so breakthroughs in distribution were rapid. Innovators such as Tesla and Westinghouse quickly developed long distance transmission of electricity. This electrical power distribution breakthrough allowed the rapid emergence of very large centralized power stations; the most significant of these early centers was the Niagara hydroelectric station21. Today, most power is generated in large central stations. Power is transmitted via a complex national grid system. The distribution grid is an amalgam of local and regional grids22. However this was not the end of the demand for local generators. In fact, more use of electricity led to more demand for local generators, but for non-primary use cases such as emergency power, or for alternate use cases such as remote or temporary power supplies23, 24.

The way local generation was used changed with the shift to the power grid in ways that can be seen to parallel the shift from local data centers to cloud based data center
operations. While it is true that early computers were more centralized, since the mid 70's and the emergence of the mini-computer and then the micro-computer that came to prominence in the 80's, a much more distributed pattern emerged. The mainframe and mini-computer became the nucleus of emerging local data centers in every enterprise. As Local Area Networks emerged they reinforced the role of the local data center as a hub for the enterprise. Most enterprises in the 1980's and 90's had some form of local data center, in a pattern not totally dissimilar to that of early electric generators.

As the networks grew in scale and speed, they began to shift the patterns of local computing to emphasize connectivity and a wider geographic area of service. When the commercial Internet emerged in the 1990's the stage was set for a radical change, in much the same way that the development of efficient electrical distribution across a grid changed the pattern of an earlier technical system. Connectivity became the driving necessity for an enterprise competing to reach its supply chain and customers via the new network tools.

By the turn of the 21st century, firms like Google and Amazon were experimenting with what they came to consider a new type of computer, the Warehouse Scale Computer. By 2009 this was a documented practical new tool, as noted in Google's landmark paper "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines" (Luiz André Barroso and Urs Hölzle, Google Inc., 2009). This transition can be considered as similar to the move to centrally generated electrical power sent out via the grid. In a similar manner it will not erase local computer resources but will alter their purpose and use cases25.

As was the case for the change to more centralized electrical generation, by the early 21st century there was considerable pressure on IT managers to consider moving from local data centers to cloud based systems. For both general computing and for document management systems this pressure tends to come from two broad source categories: Technical/Process drivers and Cost drivers. Technical drivers include the savings in deployment time for servers and systems at all points in the systems development lifecycle, and cost drivers are reflected in the reduced operational costs provided by cloud systems26.

Cloud Transformation Drivers

Technical and Process drivers also include considerations such as functional performance and flexible response to business requirements. The need to be responsive in short time frames as well as to provide the latest trends in functional support for the enterprise business users and customers favors the quick start-up times of cloud based IT services. The wide scope of the business use case drivers goes beyond the scope of this paper, but is important to note.
Cost drivers favoring cloud based IT services are more easily understood in the context of document management as discussed in this paper. Moving to cloud based servers and storage for document management systems represents an opportunity to reduce the Total Cost of Ownership (TCO) of the IT systems. These costs include not only the cost to procure the system components but also the cost to operate them in a managed environment, controlled by the enterprise. Even if it appears there is no compelling functional benefit to be obtained from the use of cloud based systems, the cost factors alone are typically compelling as a driver for the decision to move document management systems from local servers and storage to the cloud.

As an example of the potential cost drivers, Amazon and other vendors offer a number of TCO comparison tools that illustrate the case for cost savings from cloud-based operations. While the vendors clearly have a vested interest in promotion of cloud based operations, these tools provide a reasonable starting point for an "apples to apples" estimate of costs for local CPU and storage vs. cloud CPU and storage options. Considering that the nature of document systems is not especially CPU intensive, but is very demanding of storage subsystems, this cost comparison is a good starting point, as it tends to reduce the complexity of the pricing model. For purposes of comparison here the Amazon TCO model will be discussed below to examine the storage cost implications for a small (1TB) document store.

The default model from Amazon starts with an assumption of 1 TB of data that requires "hot" storage (fast access for on demand application support), full plus incremental backup, and grows by 1TB per month in size27. This is a good fit for a modest document storage system and can be considered a "ballpark" baseline. (Total Cost of Ownership. (2016). Retrieved July 06, 2016, from http://www.backuparchive.awstcocalculator.com/)

Amazon's tool estimates this storage to cost about $308,981 per year for local SAN backed up to tape. The tool estimates the same storage using the cloud option to cost about $37,233 for a year. The cost of hot storage alone is estimated at $129,300 for local storage and $29,035 for Amazon S3 storage. Based on the author's past experience in federal IT document management systems, these local storage costs are generally within reasonably relevant and accurate TCO cost ranges for private or federal data center storage. Processing cost estimates for servers required in the storage solution are also within the range of typical mid-size to large data center costs, based on the author's experience over the past 8 years with federal and private data center projects. Overall, the Amazon tool does appear to produce estimates of local costs that can be considered reasonably viable for planning purposes.

This rough and quick analysis from the Amazon TCO tool gives a good impression of the level of cost savings possible with cloud-based systems. It serves as an example of some of the opportunities presented to IT managers faced with a need to control budgets and provide more services for less cost.
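The arithmetic behind the comparison is straightforward. The short Node.js sketch below simply restates the figures quoted above and computes the annual savings and percentage reduction; the dollar values are the illustrative estimates from the Amazon TCO calculator, not measured project costs.

const localTotal = 308981;      // local SAN plus tape backup, per year (USD)
const cloudTotal = 37233;       // equivalent cloud option, per year (USD)
const localHotStorage = 129300; // local hot storage only, per year (USD)
const s3HotStorage = 29035;     // Amazon S3 hot storage only, per year (USD)

function compare(label, local, cloud) {
  const savings = local - cloud;
  const pct = ((savings / local) * 100).toFixed(1);
  console.log(label + ': local $' + local + ', cloud $' + cloud +
              ', savings $' + savings + ' (' + pct + '%)');
}

compare('Full solution', localTotal, cloudTotal);
compare('Hot storage only', localHotStorage, s3HotStorage);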
The potential to provide the same services for half to a quarter of the normal cost of local systems is very interesting to most enterprises as a whole. When added to the cloud based flexibility to rapidly deploy and the freedom to scale services up and down, these factors help to explain the increased preference for cloud based IT deployment. This preference for cloud computing now extends beyond the private sector to government enterprises seeking the benefits of the new computing models offered by cloud vendors.

The Federal Cloud & the Secure Cloud Emerge

For the federal customer the transition to Warehouse Scale Computing and the public cloud can be dated to 2011 when the FedRAMP initiative was established. The FedRAMP program is based on policy guidance from President Barack Obama's 2011 paper titled "International Strategy for Cyberspace"28 as well as the "Cloud First" policy authored by US CIO Vivek Kundra29 and the "Security Authorization of Information Systems in Cloud Computing Environments"30 memo from Federal Chief Information Officer Steven VanRoekel. Together these documents framed the proposed revamp of all federal Information Technology systems.

In the introduction to his 2011 cloud security memo, VanRoekel provides some concise notes on the compelling reasons for the federal move to cloud computing:

"Cloud computing offers a unique opportunity for the Federal Government to take advantage of cutting edge information technologies to dramatically reduce procurement and operating costs and greatly increase the efficiency and effectiveness of services provided to its citizens. Consistent with the President's International Strategy for Cyberspace and Cloud First policy, the adoption and use of information systems operated by cloud service providers (cloud services) by the Federal Government depends on security, interoperability, portability, reliability, and resiliency.30"

Collectively, these three documents and the actions they set in motion have transformed the federal computing landscape since 2011, and as the private sector's use of local computing has begun a rapid shift to the cloud driven by competition and the bottom line, in the short space of 5 years the entire paradigm for IT in the US federal government has shifted radically. It is not unreasonable to expect that by 2020, cloud computing will be the norm, not the exception, for any federal IT system. This transition offers huge opportunities, but brings massive challenges to implement secure infrastructure in a public cloud computing space.

Functionally, the conversion from physical to electronic documents has a number of engineering requirements, but above and beyond this, there are legal and security considerations that make any document management system more complex to implement than earlier databases of disparate facts. Documents as an entity are more than a collection of facts. They represent social and legal relationships and
agreements. As such the authenticity, integrity, longevity and confidentiality of the document as an artifact matter.

The security and privacy implications of the continued expansion of electronic exchange of data in consumer and commercial financial transactions were incorporated into the rules, regulations and policy guidance included in the Gramm-Leach-Bliley Act of 199931. A good example of the wide swath of sensitive data that needs to be protected in both physical and electronic transactions is shown in the Sensitive Data: Your Money AND Your Life web page that is part of the Safe Computing Pamphlet Series from MIT. As the page notes:

"Sensitive data encompasses a wide range of information and can include: your ethnic or racial origin; political opinion; religious or other similar beliefs; memberships; physical or mental health details; personal life; or criminal or civil offences. These examples of information are protected by your civil rights. Sensitive data can also include information that relates to you as a consumer, client, employee, patient or student; and it can be identifying information as well: your contact information, identification cards and numbers, birth date, and parents' names.32"

Sensitive data also includes core identity data aside from the information about any particular event, account or transaction, personal preferences, or self-identified category. Most useful documents supporting interactions between people and business or government enterprises contain Personally Identifiable Information (PII), which is defined by the Government as:

"...any information about an individual maintained by an agency, including any information that can be used to distinguish or trace an individual's identity, such as name, Social Security number, date and place of birth, mother's maiden name, biometric records, and any other personal information that is linked or linkable to an individual.33"

Identity data is a special and critical subset of sensitive data, as identity data is required to undertake most of the other transactions, and to interact with essential financial, government or healthcare services. As such this data must be protected from theft or alteration to protect individuals and society as well as to ensure the integrity of other data in any digital system34.

In order to protect this PII data the Government, through the National Institute of Standards and Technology (NIST), defines a number of best practices and security controls that form the basis for sound management of confidential information.35 These controls include such concepts as:

• Identification and Authentication - uniquely identifying and authenticating users before accessing PII
• Access Enforcement - implementing role-based access control and configuring it so that each user can access only the pieces of data necessary for the user's role.
• Remote Access Control - ensuring that the communications for remote access are encrypted.
• Event Auditing - monitoring events that affect the confidentiality of PII, such as unauthorized access to PII.
• Protection of Information at Rest - encryption of the stored information on the storage disks.

In addition to these considerations, many enterprises also need to handle documents that contain both PII and medical records or data from medical records, or Protected Health Information (PHI). Medical records began to be stored electronically in the 1990's. By the early part of the 21st century this growth in electronic health records resulted in a new set of legislation designed to both encourage the switch to electronic health records and to set up guidelines and policy for managing and exchanging these records. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 creates a set of guidelines and regulations for how enterprises must manage PHI36. Building on HIPAA, the American Recovery and Reinvestment Act of 2009 and the Health Information Technology for Economic and Clinical Health Act (HITECH) of 2009 added additional policy restrictions and security requirements as well as penalties for failure to comply with the rules37. These regulations for PHI both overlap and add to the considerations for data and documents containing PII.

The HITECH law increased the number of covered organizations, or "entities," beyond those under the control of the HIPAA legislation:

"Previously, the rules only applied to "covered entities," including such healthcare organizations as hospitals, physician group practices and health insurers. Now, the rules apply to any organization that has access to "protected health information.38"

HITECH also added considerable detail and clarification as well as new complexity and even more stringent penalties for lack of compliance or data exposure or "breaches". Under HITECH a breach is defined as:

"…the unauthorized acquisition, access, use or disclosure of protected health information which compromises the security or privacy of such information, except where the unauthorized person to whom such information is disclosed would not reasonably have been able to retain such information.38"

The result of the considerations needed to manage documents that might contain Sensitive Data, PII or PHI or any combination of these elements is that any document management system implemented in private or public data centers must
implement a wide range of technical and procedural steps to operate in a secure manner. Protection of the security, privacy and integrity of the documents and the data in those documents becomes a major part of the challenge of designing, building and operating any information system. These engineering efforts are essential to business operations; however, they also become part of the cost for any system, and as such can be a considerable burden on the budget of any enterprise.

Designing a Project to Demonstrate Using the Cloud

It is within this context of providing a secure system leveraging cloud-based benefits that the practicum project described in this paper was designed. The goal of the project was to demonstrate a viable approach to following the policy guidance as provided for federal IT systems. To achieve this goal, the first step was to understand the context as outlined in the discussion above. The next step was to design a system that followed sound cybersecurity principles and the relevant policy guidance.

Based on the demand for electronic document management in both private and government enterprise, a basic document management system was selected as the business case for the prototype to be developed. Document management provides an opportunity to implement some server side logic for the operation of the user interface and for the selection and management of storage systems. Document management also provides a driving problem that allows for clear utilization of storage options, and thus can demonstrate the benefits of the cloud based storage options that feature prominently in the consideration of cloud advantages of both speed of deployment and lower TCO. These considerations were incorporated in the decision to implement a document management system as the demonstration project.

The scope of the system was also a key consideration. Given the compressed time frame and limited access to developer resources that are intrinsic to a practicum project, the functional scope of the document management system would need to be constrained. As a solo developer, the range of features that could be implemented would need to be limited to the basic functions needed to show proof of concept for the system. In this case, these were determined to be:

1. The system would be implemented on the Amazon EC2 public cloud for the compute tier of the demonstration.
2. The system would utilize Amazon S3 object storage as opposed to block storage.
3. The system would be implemented using commercially available Amazon provided security features for ensuring Confidentiality, Integrity and Accessibility39.
(Dimov, I. (2013, June 20). Guiding Principles in Information Security. InfoSec Resources. Retrieved July 09, 2016, from http://resources.infosecinstitute.com/guiding-principles-in-information-security/)

4. The servers used for the project would all be Linux based.
5. The system would feature a basic web interface to allow demonstration of the ability to store documents.
6. The system would use Public Key Infrastructure certificates generated commercially to meet the need to support encryption for both web and storage components.
7. The web components of the prototype would use HTTPS to enforce a secure connection to the cloud based servers and storage.
8. The system would utilize a commercial web server infrastructure suitable for scaling up to full-scale operation, but only a single instance would be implemented in the prototype.
9. The web components would be implemented in a language and framework well suited to large-scale web operations with the ability to handle large concurrent loads.
10. Only a single demonstration customer/vendor would be implemented in the prototype.
11. The group and user structure would be developed and implemented using the Amazon EC2 console functions.
12. Only the essential administrative and user groups would be populated for the prototype.
13. The prototype would feature configurable settings for both environment and application values set by environment, files, and Amazon settings tools. The current prototype phase would not introduce a database subsystem expected to be used to manage configuration in a fully production ready version of the system.
14. Data files used in the prototype would be minimal versions of XML files anticipated to be used in an operational system, but would only contain structure and minimal ID data, not full payloads.

In the case of a narrowly scoped prototype such as this demonstration project it is equally critical to determine what functionality is out of scope. For this system this list included the following:

• The web interface would be left in a basic state to demonstrate proof of function only. Elaboration and extension of the GUI would be outside the scope of the work for this prototype project.
• There would be no restriction on the documents to be uploaded. Filtering vendor uploads would be outside the scope of work for this prototype.
• Testing uploads with anti-virus/malware tools would be outside the scope of this prototype project.
• Security testing or restriction of the client would be outside the scope of this project. The URL to access the upload function would be open for the prototype, and the infrastructure for user management would not be developed in the prototype.
• Load testing and performance testing of the prototype would be outside the scope of this phase of the project.
• No search capacity would be implemented to index the data stored in the S3 subsystem in the prototype project.

Proof of concept was thus defined as:

A) The establishment of the cloud based infrastructure to securely store documents.
B) The implementation of the required minimal web and application servers with the code required to support upload of documents.
C) The successful upload of test documents to the prototype system using a secure web service.

While the scope of the project may appear modest and the number of restrictions for the phase to be implemented in the practicum course period numerous, these scope limitations proved vital to completion of the project in the anticipated period. The subtle challenges to implementation of this proof of concept feature set proved more than adequate to occupy the time available and provided considerable scope for learning and valuable information for future projects based on cloud computing, as detailed in the subsequent sections of this paper.

Planning the Work and Implementing the Project Design

To move to implementation, the next phase of the Software Development Lifecycle (SDLC), the requirements and scope limitations listed above were used to develop a basic project plan consisting of two main phases:

A) The technical implementation of the infrastructure and code through to proof of concept.
B) The documentation of the project work and production of this report/paper.

The project management of any implementation process is a critical success factor for any enterprise no matter how large or small. This is very true for cloud computing projects as they often represent a significant departure from existing IT systems and processes for an enterprise. This was the case in this project as well.

While no formal GANTT or PERT chart was developed for the project plan, as there was no need to transmit the plan to multiple team members, an informal breakdown
was used to guide the technical implementation in an attempt to keep it on schedule:

Week 1: Establish the required Amazon EC2 accounts and provision a basic server with a secure management account for remote administration of the cloud systems.
Week 2: Procure the required PKI certificates and then configure the certificates needed to secure access to the servers, and any S3 storage used by the system. Configure the S3 storage.
Week 3: Obtain and install the required commercial web server and application server to work together and utilize a secure HTTP configuration for system access. Implement any language framework needed for application code development.
Week 4: Research and develop the required application code to demonstrate file upload and reach proof of concept. Create any required data files for testing.
Weeks 5-8: Document the project and produce the final report/paper.

In practice this proposed 8-week schedule would slip by about 4 weeks: about 2 weeks of extra work caused by the complexity and unexpected issues found in the system and code development implementation, and about 2 weeks of delays in the write-up caused by the author's relocation to a new address. These delays in schedule are not atypical of many IT projects. They serve to illustrate the importance of both planning and anticipation of potential unexpected factors when implementing new systems that are not well understood in advance by the teams involved. Allowing slack in any IT schedule, and especially those for new systems, is key to a successful outcome as it allows flexibility to deal with unexpected aspects of the new system.

The very first task to be undertaken in the execution of the project plan was to establish the required Amazon Elastic Compute Cloud (Amazon EC2) accounts. EC2 is the basic cloud infrastructure service provided by Amazon. This service provides user management, security, system provisioning, billing and reporting features for Amazon's cloud computing platform. It is the central point for administration of any hosted project such as the prototype under discussion in this paper40.

Because the author was an existing Amazon customer with prior EC2 accounts, the existing identification and billing credentials could be used for this project as well. Both identity and billing credentials are critical components for this and any other cloud based project on Amazon or any other cloud vendor. It is axiomatic that the identity of at least one responsible party, either an individual or institution, must be known for the cloud vendor to establish systems and accounts in its infrastructure. This party acts as the "anchor" for any future security chain to be established. The
primary account will act as the ultimate system owner and will be responsible for the system's use or abuse and for any costs incurred. Below is an example home screen for the author's project on EC2:

Responsibility for costs is the other key aspect of the primary EC2 account. While cloud computing may offer cost savings benefits, it is by no means a free service. Every aspect of the EC2 system is monetized and tracked in great detail to ensure correct and complete billing for any features used by an account holder. Some basis for billing must be provided at the time any account is established. In the case of this project all expenses for the EC2 features used would be billed back to the author's credit account previously established with Amazon.

In any cloud project it is vital that each team member committing to additional infrastructure understands that there will be a bill for each feature used. Amazon and most cloud vendors offer a number of planning and budgeting tools for projecting the costs of features before making a commitment. This is helpful, but is not a substitute for clearly communicating and planning for costs in advance among the development team members and project owners, stakeholders and managers. In the case of this project, while the author did reference the budgeting tools to note cost estimates, communication and decisions were simple due to the single-person team. Below is an example of the billing report console:
Establishment of the basic account for the project was, as indicated, simple due to the author having an existing EC2 account. To provision a server, it was necessary to determine the configuration most appropriate for the project's needs, and then determine the Amazon Availability Zone where the server should be located. The server configuration would be decided by estimating the required performance characteristics needed to host the required software and execute the application features for the anticipated user load. In this case, all these parameters were scoped to be minimal for the prototype to be created, reducing the capacity of the virtual server required. Based on the author's experience with Linux servers a small configuration would meet the needs of the project. Using the descriptive materials provided by Amazon detailing the server performance, a modest configuration of server was selected to host the project:

• t2.micro: 1 GiB of memory, 1 vCPU, 6 CPU Credits/hour, EBS-only, 32-bit or 64-bit platform41

When the server was provisioned Red Hat was selected as the OS. Other Linux distributions and even Windows operating systems were available from Amazon EC2. Red Hat was selected in order to maintain maximum compatibility with the systems currently approved for use in federal production environments, per the author's personal experience. Use of Red Hat Linux also makes getting support and documentation of any open source tools from the Internet easier, as this is a popular distribution for web based systems. Below is a release description from the virtual instance as configured on EC2 for this project:
By default the server was provisioned in the same zone as the author's prior EC2 instances, which was us-west-2 (Oregon). An Availability Zone (zone) is the Amazon data center used to host the instance. Availability zones are designed to offer isolation from each other in the event of service disruption in any one zone. Each zone operates to the published Service Level Agreement provided by Amazon42.

Understanding the concept of zone isolation and the key provisions of the SLA provided by a cloud vendor are important to the success of any cloud based project. Highly distributed applications or those needing advanced fault tolerance and load balancing might choose to host in multiple zones. For the purposes of this project a single zone and the SLA offered by Amazon were sufficient for successful operation.

However, the default zone allocation was problematic and was the first unexpected implementation issue. Almost all EC2 features are offered in the main US zones, but us-east-1 (N. Virginia) does have a few more options available than us-west-2 (Oregon). In order to explore the implications and effort needed to migrate between zones and ensure access to all potential features, the author decided to migrate the project server to the us-east-1 zone.

Migration involved a backup of the configured server, which appeared to be a prudent operational activity anyway. Following the backup, the general expectation was that the instance could be restored directly in the desired location and then the old instance could be removed. In general this expectation proved to be sound, but the exact steps were not so direct. Some of the complexity was strictly due to needing to allow for replication time. Some of the complexity proved to be due to the use of an Elastic IP address that creates a public IP address for the server. An AWS Elastic IP provides a static public IP that can then be associated with any instance on EC2, allowing public DNS configuration to then be re-mapped as needed to any collection of EC2 servers.
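For readers who prefer to script this step rather than use the console, the fragment below is a minimal sketch (not part of the project's code) of allocating a new Elastic IP and associating it with an instance using the AWS SDK for JavaScript. The instance ID is a placeholder, and the region in the client must match where the instance actually runs, which is exactly the restriction that caused the rework described next.

const AWS = require('aws-sdk');
const ec2 = new AWS.EC2({ region: 'us-east-1' }); // Elastic IPs are scoped to a single region

// Allocate a new VPC Elastic IP, then attach it to the migrated instance.
ec2.allocateAddress({ Domain: 'vpc' }, (err, alloc) => {
  if (err) throw err;
  ec2.associateAddress(
    { InstanceId: 'i-0123456789abcdef0', AllocationId: alloc.AllocationId }, // placeholder instance ID
    (err2) => {
      if (err2) throw err2;
      console.log('Associated Elastic IP', alloc.PublicIp);
    }
  );
});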
The author had a prior Elastic IP and expected to just re-use it for this project, but as noted in the AWS EC2 documentation, "An Elastic IP address is for use in a specific region only43". This created an issue when the instance was migrated across zones. Once the problem was understood, the solution was to release the old Elastic IP and generate a new Elastic IP that could be mapped using DNS. This new Elastic IP could be associated with the servers now restored to the us-east-1 (N. Virginia) zone. This step wound up taking quite a bit of time to debug and fix in the first week, and was to lead to the next unexpected issues with DNS.

None of this work was so complex as to put the project at risk. This required IP change does illustrate the fact that understanding the SLA and restrictions of each cloud feature is critical. Small issues like requiring a change of IP address can have big implications for other work in a project. Decisions to provision across zones are easy in the cloud, but can have unintended consequences, such as this IP address change and the subsequent work in DNS that it generated. All of these issues take resources and cost time in a project schedule.

An existing domain, Juggernit.com, already registered to the author, was the expected target domain. Since one of the requirements for the project was to obtain a public key certificate for the project site, it was essential to have a publicly registered Internet domain to use for the PKI. Once the public IP was re-established in the new us-east-1 zone, and connectivity was confirmed by accessing the instance using SSL, the next unexpected task was moving the DNS entries for the instance from the current registrar. This would also include learning to configure the Amazon Elastic Load Balancer and then map the domain to it. The load balancer forwards any HTTP or HTTPS traffic to the HTTPS secure instance. The HTTPS instance is the final target for the project.

Amazon Elastic Load Balancing is a service that both distributes incoming application traffic across multiple Amazon EC2 instances, and allows for complex forwarding to support forcing secure access to a domain. In this instance, while the project would not have many servers in the prototype phase, the use of load balancing would reflect the "to be" state of a final production instance and allow secure operations in even the development and preliminary phases of the project used for the practicum scope. The load balancer configuration would require a domain record of the form:

juggerload1-123781548.us-east-1.elb.amazonaws.com (A Record)

As noted on the Amazon web site, you should not actually use an "A Record" in your DNS for a domain under load balancing:

Because the set of IP addresses associated with a LoadBalancer can change over time, you should never create an "A record" with any specific IP address. If you want to use a friendly DNS name for your load balancer instead of the name generated by
the Elastic Load Balancing service, you should create a CNAME record for the LoadBalancer DNS name, or use Amazon Route 53 to create a hosted zone. For more information, see Using Domain Names With Elastic Load Balancing44.

The Juggernit.com domain was being managed by Network Solutions. Unfortunately the GUI used by Network Solutions did not allow for the entry of the CNAME record formats needed for EC2. This required moving the domain out of the control of Network Solutions and into the Amazon Route 53 domain management service. The Route 53 service has a variety of sophisticated options, but most critically, it interoperates well with other Amazon EC2 offerings including the load balancing features45.

Route 53 is a good example not only of an unexpected issue that must be overcome to migrate to the cloud, but of how the nature of the cloud platform creates a small "ecosystem" around the cloud vendor. Even when striving for maximum standards compliance and openness, the nature of cloud platform offerings such as load balancing tends to create interoperation issues with older Internet offerings like those for DNS from Network Solutions, which date from the origin of the commercial Internet. The author had used Network Solutions DNS since the late 1990's, but in this instance there was no quick path to a solution other than migration to the Amazon Route 53 offering. The Juggernit.com domain would need to be linked to the public IP of the instance, and pragmatically this was only achievable via Route 53 services. Once the situation was analyzed after consultation with both Network Solutions and Amazon support, the decision to move to Route 53 was made. The changes were relatively quick and simple using the Network Solutions and Amazon web consoles. Waiting for the DNS changes to propagate imposed some additional time, but as with the zone migration, the delay was not critical to the project schedule.

With the server, public IP address and DNS issues resolved, PKI certificate generation could be attempted. The author was relatively experienced in the generation and use of PKI credentials, but once again the continued evolution of the Internet environment and of cloud computing standards was to provide unexpected challenges to the actual implementation experience. There are many vendors offering certificates suitable for this practicum project, including Amazon's own new PKI service. The author selected Network Solutions as a PKI provider. Using another commercial certificate vendor offered an opportunity to explore the interoperation of Amazon's platform with other public offerings. Network Solutions also has a long history with the commercial Internet and has a well-regarded if not inexpensive certificate business46.

The certificates were issued in a package including both the typical root certificate most Internet developers are used to, as well as a number of intermediate
certificates that were less familiar to the author. In most cases inside an enterprise, certificates are issued for enterprise resources by trusted systems and all the intermediate certificates are often in place already. This was not the case for the Amazon EC2 infrastructure for this project. In this instance, not only was the root certificate needed, but all the intermediates also had to be manually bundled into the uploaded package47. This was a new process for the author, and management of intermediate certificates represented another unexpected task.

The need to include the intermediate certificates in the upload to Amazon was not immediately apparent, and debugging the reason why uploading just the root certificate did not work (as with prior systems) was going to involve a major research effort and many hours of support diagnostics with each vendor involved. To make the issue more complex, there was documentation the Amazon support team found for some certificate vendors, and there was documentation for cloud service vendors found by Network Solutions support, but neither firm had documents for working with certificates or cloud services from the other – this was the one case not documented anywhere.

The Network Solutions certificates were issued using a new naming format that did not match the older Network Solutions documentation, making it difficult to identify the proper chaining order. Amazon was also not totally sure what order would constitute a working package. A number of orders had to be tried and tested one at a time, and the errors then diagnosed for clues as to the correct order needed in the concatenate command. On top of this, the actual Linux command to concatenate and hence chain the certificates was not exactly correct when attempted. This was due to the text format at the end of the issued certificates. Manual editing of the files was needed to fix the inaccurate number of delimiters left in the resulting text file. The final command needed for the Amazon load balancer was determined to be:

> amazon_cert_chain.crt; for i in DV_NetworkSolutionsDVServerCA2.crt DV_USERTrustRSACertificationAuthority.crt AddTrustExternalCARoot.crt ; do cat "$i" >> amazon_cert_chain.crt; echo "" >> amazon_cert_chain.crt; done

This back and forth diagnostic work for certificate chains represented a major unexpected source of complexity and extra work. Again, this did not disrupt the execution schedule beyond a recoverable limit. The experience with certificate chaining was a valuable learning opportunity on the pragmatic use of PKI tools. The author has subsequently come across a number of federal IT workers encountering these challenges as more and more systems start to include components from outside vendors in the internal enterprise infrastructure.
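Because the chaining problems came down to ordering and malformed delimiters, a small script that builds the bundle and counts the PEM markers can save hours of trial and error. The Node.js fragment below is a sketch of that idea, not part of the project's listed source code; it uses the intermediate certificate file names quoted above, in the order that was finally accepted for the load balancer upload.

const fs = require('fs');

// Intermediate certificates, in the order that worked for the ELB upload.
const parts = [
  'DV_NetworkSolutionsDVServerCA2.crt',
  'DV_USERTrustRSACertificationAuthority.crt',
  'AddTrustExternalCARoot.crt',
];

let chain = '';
for (const file of parts) {
  // Trim and re-append a newline so every block ends cleanly; missing trailing
  // delimiters were what forced the manual edits described above.
  chain += fs.readFileSync(file, 'utf8').trim() + '\n';
}

// Sanity check: each file should contribute exactly one BEGIN/END pair.
const begins = (chain.match(/-----BEGIN CERTIFICATE-----/g) || []).length;
const ends = (chain.match(/-----END CERTIFICATE-----/g) || []).length;
if (begins !== parts.length || ends !== parts.length) {
  throw new Error('Malformed chain: ' + begins + ' BEGIN / ' + ends + ' END blocks');
}

fs.writeFileSync('amazon_cert_chain.crt', chain);
console.log('Wrote amazon_cert_chain.crt containing ' + begins + ' certificates');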
After the installation of the certificates, the next major configuration tasks were the installation and configuration of the web server and the application server platforms on the EC2 instance. Nginx is the web server used on the project, and Node.JS with the Express framework is used as the application server. Each of these subsystems provided further opportunities for learning as they were installed.

Nginx was selected to provide an opportunity to gain experience with this very popular commercial platform, as well as for its reputation for high performance and excellent ability to scale and support very high traffic web sites. Nginx was designed from the start to address the C10K problem (10,000 concurrent connections) using an asynchronous, non-blocking, event-driven connection-handling algorithm48. This is very different from the approach taken by Apache or many other available web servers. In the author's experience, many web sites that start out with more traditional web servers such as Apache experience significant scale issues as they grow due to high volumes of concurrent users. Starting with Nginx was an attempt to avoid this problem by design, though installation and configuration of the web server was more complex.

The open source version of Nginx was used for the project, as a concession to cost management. Downloading the correct code did prove to be somewhat of an issue, as it was not easy to find the correct repositories for the current package, and then it turned out the application had to be updated before it could function. It was also critical to verify the firewall status once the system was providing connections. The Amazon install of Red Hat Linux turns out to disable the default firewalls and instead rely on the Amazon built-in firewall for the site. This actually provides a very feature-rich GUI firewall configuration, but is another non-standard operations detail for those familiar with typical Red Hat stand-alone server operations. The firewall was another implementation detail that could not easily be anticipated.

After the firewall was sorted out there remained considerable research to determine how to configure the Nginx web server to utilize HTTPS based on the certificates for the domain. Again the issue turned out to be due to the chaining requirements for the certificate. In this case, Nginx needed a separate and different concatenated package in this format:

cat WWW.JUGGERNIT.COM.crt AddTrustExternalCARoot.crt DV_NetworkSolutionsDVServerCA2.crt DV_USERTrustRSACertificationAuthority.crt >> cert_chain.crt

After determining the correct concatenation format needed for Nginx and making the appropriate uploads of concatenated files, HTTPS services were available end to end. However, Nginx does not provide dynamic web services. To serve dynamic content it would be necessary to install and configure the Node.JS web application server and the Express framework.

Node.JS (Node) is an open source, server-side JavaScript runtime originally developed by Ryan Dahl in 2009 using both original code and
material from the Google V8 JavaScript engine. Most significantly, Node is event-driven and uses a non-blocking I/O model. This makes Node both very fast and very easy to scale. Node is extremely well suited to situations like the C10K problem and web sites that need to scale quickly and efficiently. Being based on JavaScript, Node is object oriented and offers a huge open source support base of modules and libraries, accessed using the Node Package Manager (NPM).

Express is a minimal and flexible Node.js web application framework based on many of the ideas about web site design and development taken from the Ruby on Rails framework project. Express offers a set of standard libraries and allows users to mix in many other NPM tools to create web sites based on the original Ruby on Rails principle of "convention over configuration" by providing a common structure for web apps49.

Installation of Node on the server was done using the standard Red Hat Package Manager tools. Once Node is installed, the Node Package Manager (NPM) system can be used to bootstrap load any other packages such as the Express framework. In a production system it is expected that the web server and the application server would be hosted on separate hardware instances, but since the practicum was to be subject to only a small load, both servers can run on the same instance of Linux with little impact.

While Node comes with its own dynamic web server to respond to requests for dynamic web content, it is not well suited to heavy-duty serving on the front end. Nginx is designed for the task of responding to high volumes of initial user inquiries. The combination of a high performance web server (Nginx) and some number (N) of application server instances (such as Node) is a widely accepted pattern that supports large scale web systems. Implementation of this design pattern was a goal of the prototype, to pre-test integration of all the constituent components even prior to any load testing of the system. Deployment and configuration of Nginx and Node to the single Linux server fulfills this requirement and provides a working model that can be expanded to multiple servers as needed in the future.

In order to smoothly transfer web browser requests from users to the application server domain, the web server must act as a reverse proxy for the application server. To accomplish this with Nginx requires the addition of directives inside the "server" section of the Nginx configuration file. These commands will instruct the web server to forward web traffic (HTTPS) requests for dynamic pages targeted at the DNS domain from Nginx to Node.JS. This is a relatively standard forwarding configuration for Nginx and only requires a small amount of research to verify the correct server configuration directive, as shown in this example from the Nginx documentation:

server {
    # here is the code to redirect to node on 3000
    location / {
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $http_host;
        proxy_pass "http://127.0.0.1:3000";
    }
}

Note that this is just an example for use on localhost with a Node.JS engine running on port 3000 (any port will suffice). The critical issue is to configure Nginx to act as a reverse proxy to the Node.JS engine. Nginx will then send traffic to the configured port for the Node.JS application instance. Node.JS and Express then use a RESTful approach to routing to the application logic based on parsing the URL. The reverse proxy configuration will ensure that when traffic comes into the Nginx server with the format "HTTPS://Juggernit.com/someurl" it will be handled by the appropriate logic section of the Node.JS application as configured in the Express framework. The Express listener will catch the traffic on port 3000 and use the route handler code in Express to parse the URL after the slash and ensure that the proper logic for that route is launched to provide the service requested. This is a well-established RESTful web design pattern, first widely popularized in Ruby on Rails and adopted by a number of web frameworks for languages such as Java, Node or Python.

Implementing this pattern requires as a pre-requisite that both Nginx and Node be installed on the server to be used. In addition, the Express framework for web applications used by Node must also be loaded to allow at least a basic test of the forwarding process. All of this code is available as open source, so access to the needed components was not a blocker for the project. Each of these components was first loaded onto the author's local Unix system (a MacBook Pro running OS X). This allowed for independent and integration testing of the Nginx web server, the Node application server and the Express web framework. By altering the configuration file and adding the appropriate directives as noted above, the reverse proxy configuration and function could be tested locally as well against the local host IP address.

After validation of the configuration requirements locally on the author's development station, the web server and application server both needed to be installed on the cloud server. As noted above, Nginx was actually loaded on the cloud server earlier to allow for configuration of the domain and HTTPS secure access to the site. This left only the installation of the Node and Express application server components. While conceptually easy, in practice loading Node also proved to provide unexpected challenges. The 7.x Red Hat version of Linux installed on the cloud server supports Node in the RPM package manager system. However the available RPM version was only a 0.10.xx version. The current version of Node is
4.4.x. The stable development version installed on the Author's local system was 4.4.5 (provided from the Node web site). There are substantial syntax and function differences between the earlier version of Node and the current version. This required that the Node install on the cloud server be updated, and that proved to require help from the Amazon support team, as following the default upgrade instructions did not work. Again, the delay was not large, but it cost a couple of days between testing, exploration of options, and final correction of the blocking issues. The final install of a current 4.4.x version of Node required a complete uninstall of the default version, as upgrading resulted in locked RPM packages. After cleaning up the old install and loading the new Node version, the cloud server conformed to the required Node version.

The Express framework was loaded on the server via the standard command line Node Package Manager (NPM) tool. A simple "Hello World" test web application was created in Express/Node and again the function of both the Nginx and Node servers was validated. To verify web and application server function, an Amazon firewall change was required to allow Node to respond directly to traffic directed at the server's IP address and the Node server's port number (3000). This firewall rule addition allowed testing of HTTPS traffic targeted at the domain name, which was served by Nginx. HTTP traffic directed to the IP address and port 3000 could then be tested at the same time, as this traffic was served by the test Node/Express application.

To complete the integration, the next step was to reconfigure the Nginx server to act as a reverse proxy. The Nginx configuration file was backed up, the reverse proxy directives shown above were added to it, and Nginx was reloaded to reflect the changes. At this point, Nginx no longer provided its default static web page in response to requests sent to HTTPS://Juggernit.com. Instead, Nginx forwarded the HTTPS traffic to the Node application server, still under the secure connection, and Node responded with the default "Hello World" page as configured in the Express test application (a minimal sketch of such a test application is shown after the task list below). This state represented a complete integration of Nginx and Node for the project. The server was backed up and the next stage of work, implementing the upload logic to store data on the Amazon S3 object store, could continue. The two major tasks required to finish the site configuration and functional completion of the prototype project were:

• Establishment of an Amazon S3 storage area (known as a "bucket" on Amazon)
• Coding server and client logic to access the S3 storage via HTTPS
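For reference, the kind of "Hello World" test application used for this integration check is only a few lines of Express code. The sketch below is illustrative only and is not the project's exact test file:

const express = require('express');
const app = express();

// Respond to the root route with a simple test page.
app.get('/', function (req, res) {
  res.send('Hello World');
});

// Listen on port 3000, the port the Nginx reverse proxy forwards to.
app.listen(3000);

With Nginx proxying HTTPS traffic for the domain to port 3000, a request to the site returns this page, confirming the full path from web server to application server.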
The first of these tasks, establishing the S3 storage area, could be accomplished directly via the Amazon EC2 management console. For the prototype there was no requirement for a custom web interface to create S3 storage, and no requirement for any automatic storage assignment or management. In a fully realized production application it is possible that application-based management of storage might be desirable, but this is a system feature highly subject to enterprise policy and business case needs.

However, even when using the Amazon interface to manage S3 storage as in this project, there was still a need to consider the user and group structure in order to manage access security to the S3 storage. As discussed earlier in the paper, a default EC2 account assumes that the owner is granted all access to all resources configured by that owner in the Amazon cloud infrastructure. For this reason, it is important to create separate administrative accounts for resources that require finer grained access and might also require access restrictions. In a fully realized web application hosted on local servers, this user and group management is often done at the application level. For this prototype these considerations were managed through the Amazon EC2 interface.

Prior to setting up a storage area on the S3 object storage, an administrator group named "admins" was created with full permissions to manage the site resources. Another group called "partners" was created with access to the S3 storage but not to the other site resources used for server management. A user named "testone" was then created and added to the "partners" group. The Author used the primary Amazon identity to build and manage the site, but the administrative group was constructed so that any future web based management functions could be separated from the user-oriented functions of the prototype web application.

With the users and groups established, the S3 storage called "ctprojectbucketone" was created using the standard Amazon GUI. Below is a screenshot showing this bucket:
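Although the groups, test user and bucket for this project were created by hand in the Amazon console as described above, the same configuration could in principle be scripted. The following sketch uses the aws-sdk Node library purely for illustration; it assumes administrative credentials are already configured on the machine running it, omits the policy attachments that grant the permissions described above, and is not part of the project code:

const aws = require('aws-sdk');

const iam = new aws.IAM();
const s3 = new aws.S3();

// Create the administrative and partner groups described above.
iam.createGroup({ GroupName: 'admins' }, function (err) {
  if (err) console.error(err);
});
iam.createGroup({ GroupName: 'partners' }, function (err) {
  if (err) console.error(err);
});

// Create the test user and place it in the partners group.
iam.createUser({ UserName: 'testone' }, function (err) {
  if (err) return console.error(err);
  iam.addUserToGroup({ UserName: 'testone', GroupName: 'partners' }, function (err) {
    if (err) console.error(err);
  });
});

// Create the project bucket used for the uploads.
s3.createBucket({ Bucket: 'ctprojectbucketone' }, function (err) {
  if (err) console.error(err);
});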
To manage access rights, the S3 storage was then assigned a Cross-Origin Resource Sharing (CORS) access policy that allowed GET, POST and PUT permissions to the S3 storage, as shown below:

The "partners" group was assigned access to this storage by providing them with the resource keys. With the creation of the S3 Object Storage "bucket", the remaining task to reach functional proof of concept for the prototype project was to construct the JavaScript application code to access the S3 storage bucket securely from the Internet.

To create the logic for bucket access there were a number of prerequisite steps not emphasized so far. The most significant of these was to develop at least a basic familiarity with Node.js and JavaScript. While the Author possesses some years of experience using JavaScript in a casual manner for other web applications, site development in JavaScript was a very different proposition. Node also has its own "ecosystem" of tools and libraries, much like any emerging open source project. Some understanding of these was also essential to succeed in creating the code required to achieve a proof of concept function for the prototype site. As a starting point the main Node site, https://nodejs.org/en/, provided an essential reference. In addition the Author referenced two very useful textbooks:

• Kiessling, Manuel. "The Node Beginner Book." Available at [last accessed: 18 March 2013]: http://www.nodebeginner.org (2011).
• Kiessling, Manuel. "The Node Craftsman Book." Available at [last accessed: 25 October 2015]: https://leanpub.com/nodecraftsman (2015).

These proved to be essential in providing both background on Node and some guidance on the use of the Express application framework. In addition, a number of other small Node library packages were key to creating the required code, specifically:
• Node Package Manager (NPM) – a Node tool for getting and managing Node packages (libraries of functions). https://www.npmjs.com
• Express – a Node library providing an application framework for RESTful web applications based on the concepts from Ruby on Rails. https://expressjs.com
• Dotenv – a Node library that loads environment variables from a configuration file with the extension .env. This was used to allow passing critical values such as security keys for S3 storage in a secure manner from the server to a client. https://www.npmjs.com/package/dotenv
• EJS – a Node library that allows embedded JavaScript in an HTML file. This was used to add the required logic to communicate with the server components of the application and then access the S3 bucket from the client page using values securely passed over HTTPS. https://www.npmjs.com/package/ejs
• AWS-SDK – a Node library provided by Amazon that lets Node code access basic functions of the S3 storage service. https://www.npmjs.com/package/aws-sdk

As a newcomer to Node, the Author found the most critical problem in creating this code was the lack of standard examples of S3 access that used a common approach and were explained clearly at a sufficiently simple level. There are at least dozens of sample approaches to integrating S3 storage into Node projects, but almost all use idiosyncratic sets of differing libraries or fail to address some critical but basic aspect of the prototype such as secure access. There are also a number of very sophisticated and complete examples that are almost incomprehensible to the Node novice. This inability to find a clear and functional pattern to learn from caused a delay of over a week and a half in completing the final steps of the prototype.

After considerable reading, coding, and searching for reference models, the Author finally came across a tutorial from Dr. Will Webberly of the Cardiff University School of Computer Science & Informatics. The Author read, studied and analyzed the example provided. The next step was to create several test programs to adapt the approach Dr. Webberly documented for a Heroku cloud instance to a local Node Express project50. After some trial and error and some correspondence with Dr. Webberly via email, a working set of code emerged. The final proof of concept was a minimal web application based on the pattern used by Dr. Webberly, running on a cloud based server as an Express application using local variables on the Amazon EC2 server. The server code provides a RESTful service over HTTPS that allows a client web page executing on a remote PC or device to upload to the S3 storage using HTTPS. Below is a screenshot of some of the server side code:
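The full listing of this server logic appears in the Source Code Listings section at the end of this paper. The heart of the approach is a route that signs an S3 upload request and returns a temporary signed URL to the client, which then uploads the file directly to S3. The sketch below follows the pattern in Dr. Webberly's tutorial; the /sign-s3 route name, query parameter names and response fields are taken from that tutorial rather than copied from the project code, and the bucket environment variable and ACL value shown are assumptions:

const express = require('express');
const aws = require('aws-sdk');

const app = express();
app.listen(process.env.PORT || 3000);

// Signing route: the client asks for a signed URL, then uploads the file itself.
app.get('/sign-s3', function (req, res) {
  const s3 = new aws.S3();
  const fileName = req.query['file-name'];
  const fileType = req.query['file-type'];
  const s3Params = {
    Bucket: process.env.S3_BUCKET,   // bucket name supplied via environment variable
    Key: fileName,
    Expires: 60,                     // the signed URL is only valid for 60 seconds
    ContentType: fileType,
    ACL: 'private'                   // keep uploaded objects private to the bucket owner
  };

  // Ask the AWS SDK for a pre-signed PUT URL for this object.
  s3.getSignedUrl('putObject', s3Params, function (err, data) {
    if (err) {
      console.log(err);
      return res.end();
    }
    const returnData = {
      signedRequest: data,
      url: 'https://' + process.env.S3_BUCKET + '.s3.amazonaws.com/' + fileName
    };
    res.write(JSON.stringify(returnData));
    res.end();
  });
});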
The upload page logic is provided by the project web site, as is the back end server logic. Since the client page runs on a remote device, the entire transfer is done using client resources. The prototype project site provides only context and security data; it is not used to manage the upload itself. This frees server side resources from the work of the transfer and thus creates a higher performance distributed system.

The exchange of logic and credentials with the client is done entirely over the HTTPS protocol, as is the subsequent file upload. This provides a secure method of access to the cloud based S3 storage. Client side data from the partner is encrypted in transit, and no parties other than the partner and the prototype project operations team have access to the S3 bucket. For purposes of the prototype only one client identity and one bucket were produced. In a fully realized system, there could be unique buckets for each client, subject to the security and business rules required by the use case of the system.

After establishing that the Node logic was in fact working and successfully uploaded files to the S3 storage, a small set of sample health records based on the Veterans Administration Disability Benefits Questionnaires (DBQs)51 was constructed. Below is a sample of one of these files:
These simulated DBQ records were then uploaded as a test and verified as correct by accessing the documents through the Amazon S3 GUI. PDF format was used for the test files to make them directly readable with standard viewing tools. Here is a screenshot of the uploaded test files in the Amazon S3 bucket:

This test represents uploading the sort of sensitive and confidential data expected to be collected and managed in any finished system based on the prototype project. While basic in function, the creation and upload of these documents provided the final steps in the implementation of this phase of the prototype project. Below is a screenshot showing the selection of a DBQ for upload using the client side web page:
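On the client side, the page shown above lets the user pick a file, requests a signed URL from the server, and then sends the file directly to S3 over HTTPS. The browser-side sketch below illustrates that flow; it assumes the /sign-s3 endpoint and response fields from the signing-route sketch earlier and is not the exact project page logic:

// Called when the user has selected a DBQ file on the upload page.
function uploadSelectedFile(file) {
  const xhr = new XMLHttpRequest();
  // Ask the server to sign an upload request for this file.
  xhr.open('GET', '/sign-s3?file-name=' + encodeURIComponent(file.name) +
                  '&file-type=' + encodeURIComponent(file.type));
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      const response = JSON.parse(xhr.responseText);
      // PUT the file straight to S3 using the signed request;
      // the project server never handles the file contents.
      const upload = new XMLHttpRequest();
      upload.open('PUT', response.signedRequest);
      upload.onload = function () {
        console.log('Upload complete: ' + response.url);
      };
      upload.send(file);
    }
  };
  xhr.send();
}

Because the PUT goes directly to the signed S3 URL, the transfer consumes client rather than server resources, which is the distributed upload behavior described above.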
Storing these files represents the completion of the major design goals of the project, the completion of the implementation phase, and the completion of the prototype project itself.

Findings, Conclusions and Next Steps

While achieving the successful secure upload of the test documents to the prototype meets the objectives set out for this project, it represents only the first milestone in extending the system to a more full featured platform and exploring additional topics of interest in this area.

The architecture implemented offers a good example of the latest non-blocking, asynchronous approach to serving web content. These designs exploit CPU resources in very different ways than traditional code and web frameworks, and there is ample room for scale and load testing to measure the actual capacity of these systems to perform on 64-bit architectures. The asynchronous, distributed, client-controlled approach to storage access also provides an opportunity to test the capacity of the S3 interface to support concurrent access. The results should provide tuning direction on the number of S3 storage areas and the rules for partitioning them. A larger scale simulation with many more virtual clients would be a natural approach to measuring the capacity of this use pattern.

The web site functions also offer an opportunity to expand the functionality of the system and demonstrate more advanced, fine-grained access controls supported by the user and group model. At a minimum, a database of administrators and partners can be created both to lock the site down from casual access and to explore the minimal levels of access needed to still meet all functional needs. Driving each role to the absolute lowest level of privilege will likely require trial and error, but should help ensure the site presents a minimal profile to any potential attackers.
In addition to these operations-oriented areas of future research, once a larger data set is simulated, the ability of the S3 storage to support search indexing on the XML data is a rich area of exploration. There is emerging federal guidance on best practices for meta-data tagging of PII and PHI data, and this prototype would allow an easy way to create versions of S3 buckets with a variety of meta-data patterns and then determine the most efficient search and index options for each with a higher volume of simulated data. An expanded prototype could act as a test platform for future production systems, revealing both physical and logical performance metrics.

Each of these future options provides scope to expand the project, but the basic implementation also provides some important benefits:

• The implementation of the system shows that it is pragmatic to store sensitive data on a public cloud based system, using PKI infrastructure to protect the data from both external access and access by the cloud vendor.
• The design of the prototype shows that modest cloud resources can in fact be used to host a site with the capacity to distribute the workload, using HTTPS to secure the data streams and leveraging client resources, not just central server capacity, to support data upload.
• The prototype shows that it is relatively easy to use Object Storage to acquire semi-structured data such as XML. This validates use of an Object Store as a form of document management tool beyond block storage.
• The establishment of the project in only a few weeks with limited staff hours shows the cost and speed advantages of the cloud as opposed to local physical servers.
• The experience with both the cloud and new web servers and languages demonstrates the importance of flexible scheduling and allowing for the unexpected. Even on projects that leverage many off-the-shelf components, unexpected challenges often show up and consume time and resources.

The prototype produced as a result of this project does meet the guidance for building secure projects on a public infrastructure. It allows PII and PHI data to be transferred to an enterprise via secure web services, and demonstrates an approach that can satisfy many enterprises and the guidelines for HIPAA and HiTech data handling. The architecture used demonstrates how a scalable web service model can be implemented using a cloud infrastructure by a small team in a limited time. The model provides only a basic proof of concept, but offers easy opportunities to expand and explore a number of additional questions. As such, the resulting site can be considered a success at meeting its design goals, and the information generated in the site development can be employed by both the Author and others for future work in cloud computing implementation for secure digital document storage.
References

1. Oppenheim, A. L. (Ed.). (1967). Letters from Mesopotamia: Official business, and private letters on clay tablets from two millennia. University of Chicago Press. Page 1-10.
2. Fang, I. (2014). Alphabet to Internet: Media in Our Lives. Routledge. Page 90-91.
3. Noam, E. M. (1992). Telecommunications in Europe (pp. 363-368). New York: Oxford University Press. Page 15-17.
4. Moroney, R. L. (1983). History of the US Postal Service, 1775-1982 (Vol. 100). The Service.
5. John, R. R. (2009). Spreading the news: The American postal system from Franklin to Morse. Harvard University Press. Page 1-25.
6. Johnson, P. (2013). The birth of the modern: World society 1815-1830. Hachette UK.
7. Currie, R. (2013, May 29). HistoryWired: A few of our favorite things. Retrieved May 15, 2016, from http://historywired.si.edu/detail.cfm?ID=324
8. Standage, T. (1998). The Victorian Internet: The remarkable story of the telegraph and the nineteenth century's online pioneers. London: Weidenfeld & Nicolson.
9. Yates, J. (1986). The telegraph's effect on nineteenth century markets and firms. Business and Economic History, 149-163.
10. Du Boff, R. B. (1980). Business demand and the development of the telegraph in the United States, 1844–1860. Business History Review, 54(04), 459-479.
11. Gordon, J. S. (2002). A thread across the ocean: The heroic story of the transatlantic cable. Bloomsbury Publishing USA.
12. Ross, C. D. (2000). Trial by fire: Science, technology and the Civil War. White Mane Pub.
13. Bates, D. H. (1995). Lincoln in the telegraph office: Recollections of the United States Military Telegraph Corps during the Civil War. U of Nebraska Press.
14. Coopersmith, J. (2015). Faxed: The Rise and Fall of the Fax Machine. JHU Press.
15. Cortada, J. W. (2000). Before the computer: IBM, NCR, Burroughs, and Remington Rand and the industry they created, 1865-1956. Princeton University Press.
16. Smith, E. (2016, June 14). The Strange History of Microfilm, Which Will Be With Us for Centuries. Retrieved June 22, 2016, from http://www.atlasobscura.com/articles/the-strange-history-of-microfilm-which-will-be-with-us-for-centuries
17. Bush, V. (1945). As we may think. The Atlantic Monthly, 176(1), 101-108.
18. Mohamed, A. (2015, November). A history of cloud computing. Retrieved July 07, 2016, from http://www.computerweekly.com/feature/A-history-of-cloud-computing
19. Electric Light and Power System - The Edison Papers. (n.d.). Retrieved July 13, 2016, from http://edison.rutgers.edu/power.htm
20. The discovery of electricity - CitiPower and Powercor. (n.d.). Retrieved July 13, 2016, from https://www.powercor.com.au/media/1251/fact-sheet-electricity-in-early-victoria-and-through-the-years.pdf
21. Powering A Generation: Power History #1. (n.d.). Retrieved July 13, 2016, from http://americanhistory.si.edu/powering/past/prehist.htm
22. Electricity - Switch Energy Project Documentary Film and ... (n.d.). Retrieved July 13, 2016, from http://www.switchenergyproject.com/education/CurriculaPDFs/SwitchCurricula-Secondary-Electricity/SwitchCurricula-Secondary-ElectricityFactsheet.pdf
23. Tita, B. (2012, November 6). A Sales Surge for Generator Maker - WSJ. Retrieved July 13, 2016, from http://www.wsj.com/articles/SB10001424127887324894104578103334072599870
24. Residential Generators, 3rd Edition - U.S. Market and World Data. (n.d.). Retrieved July 13, 2016, from https://www.giiresearch.com/report/sbi227838-residential-generators-3rd-edition-us-market-world.html
25. Barroso, L. A., Clidaras, J., & Hölzle, U. (2013). The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis lectures on computer architecture, 8(3), 1-154.
26. West, B. C. (2014). Factors That Influence Application Migration To Cloud Computing In Government Organizations: A Conjoint Approach.
27. Total Cost of Ownership. (2016). Retrieved July 06, 2016, from http://www.backuparchive.awstcocalculator.com/
28. United States. White House Office, & Obama, B. (2011). International Strategy for Cyberspace: Prosperity, Security, and Openness in a Networked World. White House.
29. Kundra, V. (2011). Federal cloud computing strategy.
30. VanRoekel, S. (2011, December 8). Memorandum for Chief Information Officers. Retrieved July 13, 2016, from https://www.fedramp.gov/files/2015/03/fedrampmemo.pdf
31. Code, U. S. (1999). Gramm-Leach-Bliley Act. Gramm-Leach-Bliley Act/AHIMA, American Health Information Management Association.
32. What is Sensitive Data? Protecting Financial Information ... (2008). Retrieved June 19, 2016, from http://ist.mit.edu/sites/default/files/migration/topics/security/pamphlets/protectingdata.pdf
33. Government Accountability Office (GAO) Report 08-343, Protecting Personally Identifiable Information, January 2008, http://www.gao.gov/new.items/d08343.pdf
34. Wilshusen, G. C., & Powner, D. A. (2009). Cybersecurity: Continued efforts are needed to protect information systems from evolving threats (No. GAO-10-230T). Government Accountability Office, Washington, DC.
35. McCallister, E., Grance, T., & Scarfone, K. (2010, April). Guide to Protecting the Confidentiality of Personally ... Retrieved July 13, 2016, from http://csrc.nist.gov/publications/nistpubs/800-122/sp800-122.pdf
36. Health Insurance Portability and Accountability Act of 1996, Pub. L. No. 104-191 (1996).
37. Graham, C. M. (2010). HIPAA and HITECH Compliance: An Exploratory Study of Healthcare Facilities Ability to Protect Patient Health Information. Proceedings of the Northeast Business & Economics Association.
38. Anderson, H. (2010, February 8). The Essential Guide to HITECH Act. Retrieved June 19, 2016, from http://www.healthcareinfosecurity.com/essential-guide-to-hitech-act-a-2053
39. Dimov, I. (2013, June 20). Guiding Principles in Information Security - InfoSec Resources. Retrieved July 09, 2016, from http://resources.infosecinstitute.com/guiding-principles-in-information-security/
40. Amazon Web Services (AWS) - Cloud Computing Services. (n.d.). Retrieved July 10, 2016, from https://aws.amazon.com/
41. EC2 Instance Types – Amazon Web Services (AWS). (2016). Retrieved July 10, 2016, from https://aws.amazon.com/ec2/instance-types/
42. Regions and Availability Zones. (2016, January). Retrieved July 13, 2016, from http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
43. Elastic IP Addresses. (2016). Retrieved July 10, 2016, from http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
44. AWS | Elastic Load Balancing - Cloud Network Load Balancer. (2016). Retrieved July 10, 2016, from https://aws.amazon.com/elasticloadbalancing/
45. AWS | Amazon Route 53 - Domain Name Server - DNS Service. (2016). Retrieved July 10, 2016, from https://aws.amazon.com/route53/
46. SSL Security Solutions. (2016). Retrieved July 10, 2016, from http://www.networksolutions.com/SSL-certificates/index.jsp
47. What is the SSL Certificate Chain? (2016). Retrieved July 10, 2016, from https://support.dnsimple.com/articles/what-is-ssl-certificate-chain/
48. Ellingwood, J. (2015, January 28). Apache vs Nginx: Practical Considerations | DigitalOcean. Retrieved July 10, 2016, from https://www.digitalocean.com/community/tutorials/apache-vs-nginx-practical-considerations
49. Node.js Introduction. (2016). Retrieved July 10, 2016, from http://www.tutorialspoint.com/nodejs/nodejs_introduction.htm
50. Webberly, W. (2016, May 23). Direct to S3 File Uploads in Node.js | Heroku Dev Center. Retrieved July 12, 2016, from https://devcenter.heroku.com/articles/s3-upload-node#summary
51. Compensation. (2013, October 22). Retrieved July 12, 2016, from http://www.benefits.va.gov/compensation/dbq_disabilityexams.asp
Source Code Listings

App.js – this is the server side logic for the project:

/*
  Cecil Thornhill 5/26/2016
  Based on code examples and samples from Will Webberly and Amazon for S3 uploads
*/

/*
  In learning how to interface to S3 via Node.js and JavaScript I started with code from a
  tutorial provided by Dr. Will Webberly, who was a computer science lecturer at Cardiff
  University and is now CTO at Simply Di Ideas. Will was kind enough to correspond with me
  and address questions on the concepts and use cases involved in my project. The original
  article I referenced is at:
  https://devcenter.heroku.com/articles/s3-upload-node#initial-setup
*/

/*
  This is the main logic for the server side of the proof of concept demo for my project.
  The code here supports the features required to allow the client to securely load a file
  to the S3 storage site. The simple proof pages and this core logic do not attempt to
  implement any user authentication, authorization or administration of the site. Those
  functions are pre-selected via the structure of the users and groups built in the S3
  interface for this demo. All these aspects would be expected in a more full featured
  site design, but are not required to establish the functional proof of concept for the
  main secure upload of files functionality.
*/

/*
  Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file
  except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software distributed under the
  License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
  either express or implied. See the License for the specific language governing permissions
  and limitations under the License.
*/
/*
 * Import required packages.
 * Packages should be installed with "npm install".
 */

/*
  CT - I am using local variables for the development version of this demo site. Below I
  require dotenv to allow local config management, so this demo can run without setting
  environment variables on the server, which is the more correct final operations
  configuration practice on a deployed system to prevent exposing the values in the open
  production environment. Of course it is much easier to manage local values from this
  resource file in the development phase, so that is the way I went for the current demo code.
*/
var dotenv = require('dotenv');
dotenv.load();

/*
  To ensure that we got the values we expected I also show the variables now in process.env -
  now with the values from the .env file added - on the console. Of course this is not
  something to do in the final production system.
*/
console.log(process.env)

const express = require('express');
const aws = require('aws-sdk');

/*
 * Set up and run the Express app.
 * CT - note we are running on port 3000 in this case. It is important to forward your web
 * traffic from the NGINX server to the proper port via setting up the reverse proxy
 * configuration in the NGINX server, so that traffic gets through from the web server to
 * the application server.
 */
const app = express();
app.set('views', './views');
app.use(express.static('./public'));
app.engine('html', require('ejs').renderFile);
app.listen(process.env.PORT || 3000);

/*
 * Load the S3 information from the environment variables.