1. Today I am going to give you an overview of my new book,
“Data Dynamite: How Liberating Information Will Transform Our
World.”
Originally I was to co-author the book with Vivek Kundra,
Chief Technology Officer of the District of Columbia, and a true
trailblazer in this field. However, fortunately for the US,
unfortunately for me, President Obama chose Vivek to become the
US’s first CIO.
2. I’m convinced I was chosen to write this book through some
sort of cosmic joke, because I’m the least likely person to write a
book on data. You see, I’m right-brained and intuitive. For me, data
used to be good for one thing, and one thing only: figuring the Red
Sox’ batting averages. But in reality, that makes me ideally suited to
write this book, because it’s time that people like me no longer be
disenfranchised when it comes to data. It’s time for data for the rest
of us!
3. When I got interested in data, I found it was pretty hard to get at.
We pay taxes so government can collect data, and you
can bet companies know all about our shopping habits. Our
activities and lives are data’s raw material.
But once it’s collected, most citizens -- and a lot of
employees for that matter -- don’t have a clue where data is stored
or how it’s used. It’s like that last scene in “Raiders of the Lost
Ark,” where the Ark is boxed up and stored in a government
warehouse: you knew it wouldn’t be found again. Substitute a data
warehouse and you’ve got the picture of the too-frequent reality.
4. Today, there are signs of hope. Closely controlled and
long-lost data is being liberated by the growing demand for
transparency.
Perhaps the best example is one of Vivek Kundra’s
primary accomplishments while he was the U.S. CIO: Data.gov.
The government launched it in the spring of 2009 with about 20
data sets. By the end of its first three months in use, more than
100,000 government data sets – many of them valuable real-time
geo-spatial ones – had been uploaded. Now, nearly 400,000 data
sets are hosted on Data.gov, demonstrating how much data has
been trapped in data warehouses, waiting only to be liberated to
serve the common good.
5. The time has come to liberate data!
“Liberating data makes it automatically available to
those who need it (based on their roles and responsibilities), when
and where they need it, in forms they can use, and with freedom to
use as they choose -- while simultaneously protecting security and
privacy."
6. The result will be changes and benefits in every aspect of
our lives, changes that are particularly critical given the current
global challenges. Liberating data will:
• give workforces real-time information
• automate previously manual processes, saving time & increasing
efficiency
• improve government regulatory processes by making access to
reports instantaneous and shareable by all agencies
• reduce corporate regulatory costs
• restore public confidence through transparency
• empower the public as full partners in government and business.
7. However, we are a long way from fully realizing these
benefits. Despite Data.gov and its counterparts in about 20 other
countries, the reality is that, by and large, data has not been
liberated by either government or business -- and when it has
been liberated, we’re often unprepared to capitalize on it.
The potential for transformation is not all that different
from the 1520s, when Martin Luther’s translation of the Bible into
German, and his decision to print copies instead of hand-copying
them, gave most people direct access to the printed word for the
first time. They no longer had to rely on the clergy as intermediaries.
The results were quick and dramatic: Luther’s works not
only led to the Reformation, but to a tremendous push for literacy
and the printed word.
Just as the printing press transformed learning and
people’s access to the word, so too the Internet and a handful of new
web-based tools -- none of them radically innovative by themselves,
but revolutionary when combined -- are making it possible, in many
cases for the first time, for workers and the general public to have
direct access to actionable, valuable data. I believe the benefits and
revolution for numbers will be as dramatic as what Luther set
in motion for words.
8. The first step to begin this transition is a strategic one:
It’s time to switch to data-centric organizations, in which usable
data is accessible to all sorts of applications and devices,
automatically, and all of the organization’s functions are arranged
around the data.
9. The second step to liberate data is to assure that data is
valuable. That means that instead of being captured and
altered by applications, data must remain as “data nuggets,”
accessible to all applications and machines that can act on it. To
create those data nuggets we must “structure” data using XML,
KML or other systems that attach “tags” such as the XBRL ones
you see here, to the numbers. This information about information,
or metadata, transforms mere numbers into valuable data. In this
case, instead of just the number 882,000,000, we now know it
refers to the company’s net income. That income data can flow
automatically, and in real time, to any place where the same tags
are inserted.
These tag systems are universal, open standards,
available to all, at no charge. I want to emphasize standards,
incidentally: it’s precisely because XML, XBRL, and KML are
universally recognized and not proprietary that they are
valuable: they, and the data tagged with them, can be shared by all.
One of the most important aspects of XML and variants is that
once the tags are attached to the data, they remain attached: the
package of metadata and data can be automatically shared by
other applications as well as devices. That reduces errors because
the data doesn’t have to be rekeyed: you get a “single version of the
truth.”
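To make the idea concrete, here is a minimal sketch of how a tag turns a bare number into self-describing data that any application can act on. The element name, namespace URI, and attributes below are illustrative stand-ins for real XBRL taxonomy tags, not an excerpt from an actual filing; only the 882,000,000 net-income figure comes from the example above.

```python
# Sketch: metadata ("tags") travels with the number, so any
# application can interpret 882000000 without re-keying it.
# Names here are illustrative, not a real XBRL taxonomy.
import xml.etree.ElementTree as ET

document = """
<report xmlns:us-gaap="http://example.org/us-gaap/illustrative">
  <us-gaap:NetIncomeLoss contextRef="FY2010" unitRef="USD"
      decimals="-6">882000000</us-gaap:NetIncomeLoss>
</report>
"""

root = ET.fromstring(document)
ns = {"us-gaap": "http://example.org/us-gaap/illustrative"}
fact = root.find("us-gaap:NetIncomeLoss", ns)

print(fact.tag)                # which concept the number represents
print(fact.attrib["unitRef"])  # the unit: USD
print(int(fact.text))          # the bare number: 882000000
```

Because the tag and the number stay packaged together, a second application that understands the same tag system can consume this fact automatically -- the “single version of the truth” mentioned above.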
10. The third step for effective liberating data programs is
to provide users with the Web 2.0-based tools such as Gapminder
(shown here) that will make it possible for them to really capitalize
on that data. Even for trained statisticians, let alone the rest of us,
data visualization tools aid in understanding complex data sets,
relationships, and so on, because they take statistics and portray
them graphically, which makes it easier to understand trends,
possible causality, and other factors. As one of the acknowledged
thought leaders in data visualization, Edward Tufte, says,
“Graphics reveal data. Indeed, graphics can be more precise and
revealing than conventional statistical computations.”
In recent years a number of lower-cost dashboard
applications such as Tableau, as well as free web-based data
visualization tools, such as Many Eyes, have become available,
allowing non-statisticians to easily take data and turn it into a wide
range of highly informative visual representations, while Web 2.0
tools such as tags, threaded discussions and topic hubs encourage
robust discussion of the results. That’s important, too: when data
is discussed by people with differing backgrounds, interests and
skills, aspects of the data are discovered and explored that even the
brightest person, working in isolation, would never uncover.
11. Curiously, although a growing range of government
agencies release public data streams, almost none provide them to
their own workforces, to give workers actionable data precisely
when and where they need it, to do their work more efficiently.
The fourth element of an effective liberating data
strategy is for agencies -- and corporations -- to follow the
District of Columbia's lead, and apply the same strategy behind the
firewall first, giving workers access to the same data they disclose
in public data feeds.
After all, employees may be struggling with
incompatible databases, may need to reach across departmental
“silos” to see if there might be synergies between programs, and
employees from another department may be able to provide new
insights simply because of their differing life experiences and
expertise.
As more young workers, who have never known life
without the Web, join workforces, they’ll naturally ask why tools
they’ve used can’t be used in the workplace. A data graphics project
can empower them and tap their expertise.
Running your organization on the same data feeds that
agencies and companies furnish to the public and others can be a
powerful way of earning public trust: you’re in essence saying, we
stand behind this data; we’re so confident in it that we use the
same data to run our daily operations as we furnish to you.
12. Finally, the cutting edge of liberating data is using it to
invite your customers or citizens to become co-creators of products
and services.
That’s what Beth Noveck, the former Obama Administration
deputy CTO, did prior to joining the Administration, with the Peer-
to-Patent program, which allows interested experts and laymen to
become active partners in the patent review process. They have
already significantly reduced the patent application backlog.
With liberating data, crowdsourcing will become
commonplace and will result in both improved services to the
public and entrepreneurial opportunities.
13. But what if you liberate data and nobody comes? We have to
realize, and deal with, the reality that a majority of the American
population is innumerate, i.e., lacks the skills needed to handle
basic numeric calculations. This rate was probably masked by
indifference during the era when data was hard to obtain, but now
that data is potentially ubiquitous, that high failure rate is
unacceptable.
Fortunately, the same tools that can make data intelligible and
interesting to adults can also be used in the classroom to make dry
numbers come alive and let students learn by playing with
numbers. The private sector should partner with educators to
make this transition a reality, building numeracy and people’s
ability to deal with statistical information.
14. One reason for optimism that a new data-centric society could
overcome innumeracy is the speed with which users of a wide range
of social media have adopted data tagging, and learned to use it
accurately. In this case, use of the #wxreport tag
assures that the National Weather Service’s computers will receive
Tweets referring to breaking local weather observations, making
the public valuable adjuncts to other information sources. If this
kind of alteration in user behavior can happen spontaneously,
imagine what could happen if there were formal programs
designed to increase data numeracy!
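The mechanics of this kind of tag-based routing are simple enough to sketch. The #wxreport tag is the one described above; the function name and sample messages below are invented for illustration, and a real system would of course read from a live feed rather than a hard-coded list.

```python
# Sketch of tag-based routing: keep only the messages that carry
# the #wxreport tag, the convention described above for flagging
# local weather observations. Sample messages are invented.
def weather_reports(tweets):
    """Return only the tweets tagged as weather observations."""
    return [t for t in tweets if "#wxreport" in t.lower()]

stream = [
    "Heavy sleet on Main St, Medfield MA #wxreport",
    "Great game at Fenway tonight!",
    "Funnel cloud spotted west of town #WXreport",
]

for report in weather_reports(stream):
    print(report)
```

Because the tag is a fixed, agreed-upon token, a computer can pick the observations out of millions of unrelated messages with a one-line filter -- the users' tagging discipline does the hard part.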
15. Thank you.
To learn more about liberating data and how to create
the processes and policies to make it a reality, contact:
Stephenson Strategies 335 Main Street, Medfield, MA 02052 (617)
314-7858 D.Stephenson@stephensonstrategies.com