Caption Transcript from “Challenges and Opportunities in Big Data”

On Thursday, March 29, 2012, from 2-3:45 pm ET, federal government science heads from
OSTP, NSF, NIH, DOE, DOD, DARPA and USGS outlined how their agencies are engaged in
Big Data research. The event took place at the American Association for the Advancement of
Science in Washington, D.C.

An archive of this webcast will be available on nsf.gov within two days of this event.

(I have no ownership of this information nor can I attest to its accuracy since it was transcribed
and may have room for error. This information was extracted from
http://live.science360.gov/bigdata/ immediately after the session concluded)


Event ID: 1921909
Event Started: 3/29/2012 1:47:56 PM ET


Please stand by for real time captions.

I welcome all of you. From the looks of the crowd, big data is a big deal. We think it will
be bigger. The Obama administration is announcing new investments in federal agencies, in
research and development related to big data. That is, the enormous volume of data, and the
variety of those data. I am joined by a number of administrative [Indiscernible]. I will introduce
them in a moment. Following their brief presentation [Indiscernible] we will have a few minutes
for questions. We are happy to have as a moderator for that panel Steve Lohr of the New
York Times. We are grateful to him. Before I turn things over to my colleagues I would like to
say a few words. Why data, and then nonprofit organizations may get involved further in
this domain. These data are being [Indiscernible]. Computers running large-scale programs.

Recently, the President's Council of Advisors on Science and Technology concluded in a report
for the president that the federal government is underinvesting in analyzing, sharing and -- what
this data. What matters is our ability to derive new insights, to recognize relationships, to
make accurate predictions. Our ability to move from data to knowledge to action. As we looked at
the action in the White House, following the release of the [Indiscernible] report, the case
became clear for a national big data initiative. The advantages of a cross-government focus on
big data: One, the domain of big data will be important from an economic point of view. It is the
creation of new IT products. How to use big data to make better decisions. Second, it is critical to
accelerating discovery in many domains in science and engineering. Such as in astronomy.
Containing hundreds of millions of celestial objects. Working with big data helps with major
national challenges: security, health, environment and education. Some hospitals are using big
data. Startups are beginning with online courses. These kinds of applications are dependent on
making the most of information we and most of the world are generating every day. We will take
the lead on big data; the government can play an important role. Using big data
approaches to make progress on key national challenges. Challenges -- government challenges
as well. Last year, we had an interagency committee on big data. It has already
identified concrete actions that [Indiscernible] can take. To advance the state of the art in big
data. To solve big problems in science, health, security, environment and more. We will make
additional investments. I want to challenge industries to join with the administration to make
the most of the extraordinary opportunities created by big data. This is not something the
government can do by itself. We need what the president calls an all-hands-on-deck approach.
To have this domain realize. Some companies are already sponsoring big data. Universities are
beginning to create new courses. Entire new courses of study. Organizations like Data
Without Borders are providing programs in data collection and analysis.

It is my pleasure to introduce our speakers.

I do want to recognize one person that you will see later, Con O'Neill, our deputy director.

Let me now turn to Subra Suresh.

Thank you John. Today science gathers data from science, but from theoretical
calculation. At NSF it happens across all fields. At NSF we recognize that data is motivating a
profound transformation in the culture and conduct of scientific research. NSF is a
proud leader in supporting the fundamental science and infrastructure underlying and enabling
the big data revolution. Back in the 1980s, we supported the first
supercomputing centers. Now we are supporting the next generation. Blue
Waters at the University of Illinois, one of the most powerful supercomputers in the world, capable
of quadrillions of calculations per second, just opened to research teams two weeks ago. We
announce seven new efforts. NSF and NIH have joined together in collaboration in the critical
area of core techniques and technologies. We will have new knowledge from large data sets. It
will evaluate new algorithms. Technology and tools to improve data collection and management.
In addition to this cross agency solicitation, I am delighted to announce a $10 million Expeditions
in Computing award to researchers at the University of California, Berkeley. The team will
undertake a complete rethinking of data analysis, integrating algorithms, machines, and people
to develop new ways to turn data into knowledge and insight.

Another $1.2 million award brings together statisticians and biologists to
develop network models and automatic scalable algorithms and tools to tell us more about
protein structures and biological pathways. NSF has aggressively embarked on focused cross
function efforts to address the challenges and seize the opportunities that big data
enable science efforts.

We are calling on members of the science community to use data citation to increase
opportunities for the use and analysis of data sets. This is NSF's commitment to
sustainability of data. To continue NSF's role, for the best and brightest scientists and
engineers we [Indiscernible] a 21 track.

We will announce an opportunity for students: a $2 million research training group
award. This is for undergraduates. To use graphic and accurate [Indiscernible].

This involves many NSF directorates. To teach and [Indiscernible] learning environments. The
data-driven [Indiscernible] are bold and cross disciplinary, and cross governmental. We hope
that this work we do today will lay the groundwork for new enterprises. And fortify the
foundations of US competitiveness. Thank you so much.

[ Applause ]
We just described to you this joint effort with NSF and NIH for bold ideas in the management of
big data. There may have been a time when biologists were small [Indiscernible] with
laboratories cranking out modest volumes of information. Because that is all we had. With the
tools available. We now have arrived in the big time. We are capable of producing big data sets.
And the need for analysis, we are thrilled to join our colleagues to tackle these issues.
I want to say thanks to Karen Remington, helping us -- how we can make the most of this
announcement. Today's advances are driving our kind of science. As I come out of the field of
genomics, it will not surprise you that I will use that as an example [Indiscernible].

Will [Indiscernible] . It just went up on our websites a few minutes ago. A new collaboration. To
try to deal with some of the magnitude of data being produced. We have now formed a class
TranII with Amazon. It is up on the cloud. This is a project that aims to sequence 1000
genomes, but 17 of those are in hand. I can tell you, it has been a challenge for all of the users
who want access to this information. That was the point of doing the project. To have that
access. This is 200 terabytes. That is 16 million file cabinets worth, or 30,000 DVDs. Having this
up in the cloud, will provide substantial advantages to users. We are delighted to have
[Indiscernible] a couple of years . We are happy to announce this collaboration with Amazon.
This will be a launching pad for many kinds of data opportunities. That will include data that will
be coming forward. In large quantities. Especially in cancer research. The Cancer genome is
[Indiscernible] breaking 22,000 gene that genomes. What are the driving mutations?
Anyone working in cancer wants access to that data. It is important to us to get the data in
control. For tools for sharing and managing data. There are other things to mention, national
[Indiscernible] . To support through the common port. One is your --. To have medical
information shared. The most daunting and most exciting of our [Indiscernible], we need a
dynamic inventory of our science resources to provide information. We recognize that that data
day increase the technology -- we are pleased that we have highlighted this and brought it
together. To figure out how we can work, and accept the challenge, embrace the challenge
of big data. We look forward to that continued collaboration. We are delighted to have a chance
to be part of this this afternoon. Thank you very much.

[ Applause ]

Good afternoon. The USGS collects the data on water, earthquakes, geology, climate,
biology, and the earth landscape. So much big data that we are in danger of drowning.
Today I would like to tell you what this innovative center is doing to help scientists come
together in our beautiful center in Fort Collins, Colorado. To gather meaning from that larger
amount of data. Bringing uncommon bedfellows of scientists. Bringing together economists,
biologists -- in government and outside of government to tackle the kinds of problems that are
the headlines of today. By using existing data and rich sources -- to employ new technology to
share and integrate existing data in order to solve those problems. To make progress in
science. It is a proposal-driven process through peer review. The projects are selected from the
John Wesley Powell Center. There is a process open to select new projects for next year. It
will close on April 30. We will also announce the proposals for next year. I
will announce today. To give you an example of the ongoing projects now, and some of our
successes. You will have an idea what the center does.

One problem is the uncertain climate future. We understand that climate models [Indiscernible].
The model is only as good as it can predict the past. If they can only predict the past within a
couple of degrees, because the climate past -- our ability to reconstruct climate situation is only
good to a couple of degrees, we cannot trust that prediction into the future. We came together at
the center, and used a variety of data, to put together the most precise reconstruction of the
[Indiscernible] thermal optimum ever produced. It will be the standard for all models. So in the
future, we can predict future climate better than before. Another ongoing example is
water resources. This is a subject of current headlines. People want to know how much water is
used -- is it making a dent in my local water supply?

This is something the center is undertaking. A proposal -- if it is selected, I should add, for this
round, the proposals are being supported by the Geological Survey, but also supported by
the National Science Foundation. Understanding and managing resilience of global change.
Modeling species response to environmental change. Mercury cycling across North America.
Fibrous [Indiscernible] in the US for human health. Modeling of earthquakes and magnitude.
Thank you very much. We are very proud to be a part of this initiative.

Good afternoon. I would like to start off by thanking you for inviting us. Today the department of
defense is [Indiscernible] . We are placing a big bet on big data. We change the game
[Indiscernible] . To be the first to demonstrate in use [Indiscernible] secure here -- peer to peer.
With computer speed, precision and human [Indiscernible] . This will help our analysts to make
sense of the huge volume of data that our military [Indiscernible] collects . They will also support
multiple missions. We see an opportunity to maneuver and understand the environment. They
will not have to make decisions by themselves. They will know when they can and cannot call
upon the human. There is a revolution emerging on using mass data. [Indiscernible] . This has a
potential to bring together sensing, perception and decision support. Since the invention of
integrated circuits -- in 1959 with a single [Indiscernible] transition. Two processors that are
embedded in cell phones. No technology has greater impact or scale. Information technology is
at the core of defense. We funded [Indiscernible] in the early days. That are used globally.
The department's continued investment in 3-D electronics, computing -- will ensure to extend this
legacy. How we employ that capacity and the capability to use data that is being produced.
Everyone has it [Indiscernible] . It is remarkable [Indiscernible] transition from the early concepts
a few years ago to concepts [Indiscernible] . That dynamic reasoning, to learn from experience
with little training and understanding those tools that recognize trends. Adapt to the real world.
Without relying on human intervention. This must be within a secure framework, put the trust in
the data. And the human trust in the system will be maintained with the system to communicate
very natural [Indiscernible] and allow users to collaborate and reason with the data. In 1950,
Alan Turing proposed this concept. Information on these [Indiscernible] limitation is
available at a new website being launched. We are looking for a generation of new ideas.
[Indiscernible] . The --

When we lived in caves -- [Indiscernible] lined up a tree to get [Indiscernible] to get a better look.
Later it was observation balloons. Rovero -- in recent decades there was [Indiscernible] . From
paper to hard drives to [Video and audio cutting in and out] that data it collected is often
imperfect, incomplete and [Video and audio cutting in and out] .

The Atlantic Ocean is 350 million cubic kilometers in volume. It is 100 billion billion gallons of
water. If each gallon of water represented a [Indiscernible] the Atlantic Ocean could store all of
the data generated by the world. [Video and audio cutting in and out] . We need new
fundamental approaches to big data. That match the needs of users, adaptable to changing
missions and to perform on a timescale that match [Indiscernible]. We announced XDATA.
It is a $25 million program a year. With that program, we seek the equivalent of a radar and
overhead imagery for big data. To provide the DOD [Video and audio cutting in and
out]
One of our roles within the science community, and one of which we are proud, is supporting
construction of major [Indiscernible] over laboratories. These facilities include large-scale
x-ray light sources. And some of the world's fastest supercomputers. More than 26,000
researchers across the nation from the universities to government laboratories make use of
these facilities each year. In fact, an interesting [Indiscernible] substantially your best. Single
experiments conducted can produce terabytes of data per day. Estimated terabytes per second. It
has to reject [Indiscernible] one of 1000 piece of data each nanosecond. The standard output,
and then intercomparison project of terabits today constitutes the fastest collection of data
facilities. The climate community expects that [Indiscernible] to be hundred exabytes by 200 -- to
store share and analyze information. We have been a supporter of innovative research,
analysis of extreme data. Storage and visualization technology. One is the height storage
system. The FastBit data [Indiscernible] used by a major Internet search engine. It is the
winner of 2008 [Video and audio cutting in and out] . To aid the nation site in the analysis and
[Indiscernible] . It will develop new and existing technology for big data. We will partner with
other teams to ensure that the best up to date technology is used throughout our program. This
was based on peer to peer [Indiscernible]. Leading mathematicians, and
program experts from seven universities. It will have a cross range of fields, to probe and mine
their data. I am pleased to say that several members are here today. I would like to thank Ari
and Rob and their team. For their leadership in this area. Again we are grateful to [Video and
audio cutting in and out] . Thank you.

[ Applause ]

We have a few moments for questions. If you would like to direct your question to a specific
speaker, please do so. If you do not, I will probably choose.

I am Jeff. When did you first realize that government was not doing enough on big data?
Was it before that [Indiscernible] report ? Why are not NASA [Video and audio cutting in
and out]

Many individuals had the thought that the federal government was not coming together. It really
came together when the [Indiscernible] report came out . [Video and audio cutting in and out]
NASA and NOAA are not up here [Video and audio cutting in and out] .

This is Bob [Indiscernible] from CNRI. I am sure you know about that international
[Indiscernible]. Science is not something we do alone in this country. I was wondering how you
see the development of a collaborative environment being developed [Video and audio cutting
in and out]

Would you like to answer?

We think about competition and collaboration all of the time with all of our at entities. This is
something where it -- it is a [Indiscernible] thinking . [Video and audio cutting in and out] we
have been engaging in with many of our partners. In the Antarctic we have 15 countries
involved. It is the same with the Arctic Circle. More recently we have been engaging other
countries not just [Indiscernible]. We have biodiversity. For our programs, we have strong
collaborations. The [Video and audio cutting in and out] . There is big data [Video and audio
cutting in and out]
On the earthquake model [Video and audio cutting in and out] it was collaborative [Video and
audio cutting in and out] for an example the lodge Sobotka -- large sum Entre.

What the laboratory actually does [Video and audio cutting in and out] . Every year or so, they
have a model comparison. Between the various models around the world. Things like LAC, and
the data from close all over the world. It is a huge network today.

When you have public access to data [Video and audio cutting in and out] at the earliest
possible moment. It is critical to make that information available. Other projects [Video and
audio cutting in and out] have been doing, also involved multiple countries. It is
critical for data sharing. So the average investigator has access to it.

[Indiscernible-low volume]

We do understand that. The challenge of doing that was pointed out forcefully in that report.

I think it is hot.

The fact that you guys are going to introduce [Indiscernible-low volume] there will be jobs. For
many scientists. What are you predicting [Indiscernible-low volume] . With scientists working with
data. Who are looking for work to be able to work in these areas, not just the new generation but
those in the area now looking for work.

[Video and audio cutting in and out]

This is where the excitement is going to be. [Laughter] . There is a vast quality of [Indiscernible]
career path . There are [Indiscernible] . [Video and audio cutting in and out] . We are determined
to provide those training pathways to increase skilled individuals. And also to [Video and audio
cutting in and out] . To have programs for individuals that need those skills in
mid-career.

Our shortage is not high-performance computers, but rather high-performance people. We
have training to handle more applications, to help industries get used to this [Video and audio
cutting in and out] .

Privacy issues with that. [Video and audio cutting in and out] . There is a consumer bill of rights.
Could you expand on that? On the new challenges and the new thinking, making sure we get the
benefits and minimize the risks.

We know that privacy dealing with it data -- they are thinking about the privacy [Indiscernible].
We talk about this in the Council groups . I think you can expect [Video and audio cutting in and
out] . Our capacity to keep up with the privacy [Indiscernible] .

[Video and audio cutting in and out]

To our panel experts [Video and audio cutting in and out]

[ Applause ]
We have a wonderful panel of experts for you. We have experts from industry and academia.
We are fortunate to have Daphne Koller, from the University [Indiscernible] . Who is an expert
in machine learning and applied [Indiscernible] with the big data and application.

And Alex from [Indiscernible]. Please join me in welcoming our expert panel.

Thank you for coming. Astronomy is a strong one. This is proof. Please describe it and briefly --
what is learned. What is your take on the lessons learned. That can be applied in general to
scientific discovery?

We also call it the cosmic genome project. To map out the whole northern sky. It should be
available for everyone on the planet. It seems like an incredibly large amount of data. We went
through this process, working with Jim Gray of Microsoft. We realized that this is much bigger than
[Indiscernible] . All of the information is available at our fingertips. We can make this data
available for everyone. It was a wonderful experience to try out new [Indiscernible] . How to
create the new data. How the astronomy community adapts to new programming language. I
see the same patterns emerging with genomics. What the data sets and the scientist can do
[Video and audio cutting in and out] . They can create -- all of that data about the whole sky.

What did you find out?

One discovery, one thing we did not think we could measure, was the [Indiscernible] of
the early universe. We could figure out what the big bang would look like.

There is a visual imprint?

It is like taking a big drum and putting sand over it; we have the same picture of the universe.

Let's go with Daphne Koller from Stanford. You engaged in a [Indiscernible] project. About
putting out advanced online programs in college courses for computer science. There has been
a big debate on online education. That is a field lacking in data. Briefly describe, the education
process. What is the potential of applications if you will for education.

We should talk about big education. Where do we get the right training for 21st-century jobs.
Education is a great equalizer, but it has suffered from scarcity and affordability in the
United States and around the world. What we have started at Stanford, offering large
online classes throughout the world. It will equalize society and provide opportunities for people
that would not have access to high-quality education. For the view that had [Indiscernible] . That
set of the courses, online, with assessment, and accomplishment at the end. Provide us
with information on how humans learn. When you track the data from hundreds of thousands
of people engaging in material, answering questions, as they interact with each other. That is a
store of data we have not had in education. In the studies, the typical size is 20 or 50 people. Here
you can actually study human learning when you have 100,000 people. And figure out
what makes people learn and what doesn't. This kind of big data is a surprising and
new opportunity for us to understand how people learn and how to learn better.

In some way this is that technology will live -- nudge. That ultimate -- you are just starting but
this.
There was a paper from Bloom: people were trained using a tutor. It was two standard deviations
above the norm [Video and audio cutting in and out] . You cannot afford an individual tutor
for everyone. If you can get the online environment to have the same personalization as a
human tutoring setting. To pattern recognitions -- get us to those goals that Bloom outlined.

James was a co-author of a basic study done by the McKinsey Global Institute. It had a lot of
attention on many fronts. One thing that you said, we need 149,000 more people
that have deep analytical skills. Give us a flavor -- it is a big gap. Give us a flavor of the
numbers and where they came from. What you see unfolding.

It is very exciting to be talking about big data. [Video and audio cutting in and out] . Workforce
issue is one of those. One thing that was striking in our research, this came from looking at a
combination of what the companies are looking for -- their requirements. The challenges they
talk about. The thing they put at the top of the list was the skill challenge. They can work their
way through technical issues, but the skill challenge is an issue. -- People who can manipulate
that data. When we look -- a combination of skill sets in the workforce today. When we
projected forward to the next five years [Video and audio cutting in and out] that gap for those
skills was at least hundred and 50 -- 159,000 workers. That takes into account everyone being
trained. The next big challenge is the group -- the data-savvy managers. The decision makers.
In the big data world, how you manage things changes dramatically. I know for an example, one
company that takes advantage of big data in a huge way. The CEO actually [Video and audio
cutting in and out] when I make hiring decisions [Video and audio cutting in and out] how do you
run experiments? How do you get insights?

[Video and audio cutting in and out] . It was at least 1 1/2 million gap in who was trained. I am
involved with the US at Berkeley, getting enormous pressure from companies to create data
management courses. How do you educate a new generation? These are the more technical --
the things in big data there are a lot of -- how do you connect structured databases with sources
of data that give you unstructured data. How do you connect location devices that are
[Indiscernible] . There is a lot of data integration that is required. That is different from your
typical database manager. That group, that gap was 300 to 400,000. This resonates with
the challenges companies are having with filling these roles. This is a surprising thing.

Let's give Lucila a chance for the policy am public purchase a patient site that goes along with
privacy issues especially with the health industry. [Video and audio cutting in and out] why don't
you tell us a little bit about privacy.

Imagine every visit, where you collect the data. And try to extract what worked and what
did not work. [Video and audio cutting in and out] . Imagine all doctors -- all of the data available
to you as well as [Indiscernible] . As well as location, and environmental data and so forth. It is
important for us to have this data so the analysts can have a solution [Indiscernible] . It is
important that privacy be preserved. When you go to the doctor, you want your data and privacy
preserved. I believe people that -- are nice tiered -- teary at

We would know why one in 80 kids in America is at risk or diagnosed with autism. What are
the factors? What can we learn when we assemble the data correctly.

So much of this -- it is that technical side. Beyond that, there is the public acceptability. You
mention blood donation [Indiscernible]. That kid who saved another with a rare type.
Nancy's life was saved because of XYZ. As we [Indiscernible] . We established laws and rules
for credit cards, -- if you do the right thing it is a $50 limit. On the social engineering side, no
matter what you do technically [Video and audio cutting in and out] . We have heard
this about electronic health records. These are issues. Are you guys working on this on the
social engineering side?

Oh, they say the surgeon gets the [Indiscernible] . We in informatics do not get any
[Indiscernible] . We are behind the scene. We are trying to discover new things, so the next
generation will not suffer from the same issues. We are developing a consent management
system. In which we do exactly that. We ask the patient, do you want to donate your data, what
kind of data and who would you like to donate to? At one point you may change your mind,
and that should be available to you as a choice. People do not realize how much data they
already donate.

You talk about getting over the subconscious [Indiscernible] at giving data.

For quite a long time, the astronomy community has been quicker in adopting [Indiscernible]
doing the science. The only thing we could do was observe the skies. We could not change the
stars and galaxies. We just could interpret that data. There are people who are doing lab
experiments -- they have several degrees of freedom. Astronomers learn about we saw
something in the sky. We did not have a hypothesis that this was a collapsing nova. We just saw
the [Indiscernible] and made more observations. We try to figure out what is going on. So the
community was very quick to accept existing data sets. We had no subconscious your ears to
work on.

Their response, is collaboration is great that -- or it.

It is clear the same revolution in the biological, life science. The early science
[Indiscernible] was separate from what we have seen now. It is very similar. There are larger
simulations. Some are under [Indiscernible] . When we analyze this data it is the same. It is
important that these computations are done at the supercomputer centers. To turn the
simulation into [Indiscernible] that everyone can play with.

Daphne I do not want to leave you out. You published a paper last year. [Video and audio
cutting in and out]

The idea was to let the data speak for themselves. [Video and audio cutting in and out] rather
than coming in with a particular perspective. Most of pathology looks at the structure and the
shape of the breast cancer itself. The surrounding tissue is more [Indiscernible] than the
cancer cells themselves. This is where the benefits of large data can come to bear. That data
cannot speak and tell us [Indiscernible] .

Part of the assumption, we assume that cell. -- sale. Big data is -- this progression from data to
knowledge is a path -- to have one believe. Your title was --

Let's talk about the micro level. When we look across the different companies and industries.
When you look at companies within the same industry, there is a wide divergence. For
example, you have companies that think they have made a lot of progress. They have gone
from analyzing data. Maybe they have analyzed it every month. They use it to figure out what to
do next month. The companies at the other end, that are analyzing all of the data every day -- in
real time. And using that data to think about what they present to you next. Either on the
website or in the physical store. They have gone from data to action. You see a range of
practices. This takes you back to innovation. You will see it in the results. What different
companies do is quite wide. It will take you to that end of Asian competitive -- innovated
competitive.

If you think about everything from insurance -- for an example. We may sit in the same
demographic, but maybe you drive in the middle of the night, and I drive during the day. That is
data we did not know before. Now we can monitor that. We will find exactly how you drive
when you drive. There are a lot of offerings and systems that are not efficient. There is more
segmentation. When you look across multiple sectors, the productivity potential to make certain
sectors more productive is quite significant. We did some in-depth analysis. When you imagine
what companies are doing using big data is very significant. This is another reason why we think
this is important. What they do point out, why this is interesting -- it is a huge volume of it. All of
this data is digital. Whenever you have digital data, -- the keys what you can do to copy it -- play
with it, experiment is almost unbounded. I was surprised -- if you look at data 20 years ago, that
resided in research institutes. They had all of the data. Today it is different. Most of it is in your
pocket right now. You are carrying it around and capturing data. Who has the data, what do they
do with it. That is a big shift. It is now pervasive across every sector. What you will find, any
company of appreciable size -- who employ 1000 or more people, they have as much
data as the Library of Congress. The question becomes, how do companies leverage this?
The part I like the most, the good news, much of this is going to be surplus. Which will
be HREF it thing. --

If we stayed at a hotel two or three times, they should know who we are. We are setting up
our own expectations. We rely on our cell phones to guide us around, but the big benefit I like is
what it will create for the consumer. The range of practices across countries is
surprising.

The consumer benefit is price haggling. Your negotiations in the economic world are
asymmetric more than ever.

Not only do you know what price to pay, you will know which store actually has it that day. You
will also know how far away you are that day. It gets more interesting.

One thing, on energy. If you think about it, a lot of this is happening without any of us intervening.
The number of things that have a sensor, an IP address, is almost limitless. That means, like in
energy, if you look at how we use energy. We leave things on. Lights, refrigeration,
motors on. Now we can put these things everywhere, that benefit of energy
[Indiscernible]. The potential is significant.

On the privacy side. We have seen so much on my ability to [Indiscernible] data. I did a story a
few years ago on researchers that mashed up information. For 12 percent of the population
they could get down to the nine digits -- their Social Security number. You understand this --
what is the thing to say to people about their privacy?

Mathematically you cannot guarantee that people will not [Indiscernible] the data you
disclose. You can guarantee the risk. If people are willing to accept that risk. What is
the risk -- as I said before, much of that data has been released to, not researchers, not health
care institutions -- people are using the Internet at all times. They are donating data. The very thing
to say, there is a benefit. It is not to one individual. The risk does exist, it can be made minimal.
I used to say, who cares about my health care data? In legislation and policy for insurance --
they cannot use that for a particular purpose and so on.

I have a question for all members of the panel. What concerns you about this? There are a
number of dimensions. In Stanford we are saying about [Video and audio cutting in and out]
there is always a pattern. Is the pattern significant? You have increased risk of
false positives. There is a high level pattern recognition. Models are brittle. This gets
into privacy -- discrimination based on that -- not really who you are. As practitioners in this
world, what worries you?

The concerns you raised are valid. When people misapply the data. You can have
confidence in the pattern. It speaks to the need for fundamental science in pattern recognition.
That plays a critical role in our ability to analyze the kinds of data we are seeing. And to avoid
mismanaging.

I would agree. Quite frankly, from the examples I have seen I am more trusting of the data models
than the intuition-based decision-making you often see. There are a couple of things, one is the
question of discrimination in an economic sense. One thing -- when you disaggregate to such
a level, to offer services to certain demographics. The good news, quite often there are other
providers that are happy with the right economic incentives. The other one I worry about, I am
struck in the private sector how -- in some instances data stops what is it interesting -- what is
going on. Retailers can analyze so many things. When you have correlations better
[Indiscernible]. You will actually get -- sale of diapers are a big way next to the
cans of formula. These are children's diapers. [Laughter]. The point, you get issues that are
practical for retail -- but you do not know what is really going on.

In many times it does happen. That is why many things are called bio markers.
It is important to have good analysts and good people to interpret the data. That is why training
individuals for that particular domain is an interesting proposition. Another thing I want to
remind people, when the inventor of the airplane saw the airplane used in war, he was quite
disappointed. It does not mean the airplane should not have been invented, just for another
purpose. In our community, we are trying to invent something new that is very useful. We count
on the larger community to regulate its use.

It is a classic one in every field. It is a fact finding mission. It is easier to find it. It is more for the
social science and politics.

However big the data sets are, we will still see inherent uncertainties. There is the uncertainty
principle. We have to teach the next generation, and also statistical thinking. We have to be
careful not just to give them a telescope for data but also [Indiscernible].

Before I forget, one early comment, you talk about training of people. Scientists train I-shaped
people. And statisticians train T-shaped people. I gather your career is a good example.

I got there by accident. I was in computer science.

Deep in one area and broad in others [Indiscernible].

We have to start early. I think some of these grants address those.

I am Jim from RPI. You have been talking about math, statistics and computer science. If we want our
students to be analysts, we have to teach about peer, a lot of stuff on the social side.
How do you see that fitting into the education training?

We have not touched on the other part -- that computer science. How we can transform
professionals that are dealing with one data at a time, to understand the whole complexity of the
data. There are several initiatives for training these individuals. I am glad to be part of them.
You do bring, in our case, doctors back to the computer science. So they can be the data
analysts. Then they will marry that [Indiscernible] knowledge with that computer knowledge.

It is that data savvy management. That applies to marketing, behavioral science -- there is a big
second swab. You have to be able to harness.

We need to train pi-shaped lawyers. [Laughter]

I want to address one point. Observation on people adopting these techniques. Not being able
to explain beer and diapers. If you dig into that, beer and diapers are
people who want to stay home and watch football. We work closely with the electronic records.
One thing we face even with companies -- how do you explain in common human
terms, with the language that doctors speak or the business speaks -- how to back [Indiscernible].

You are right. One thing we find with machine learning -- the machine learning is the 25 percent or
so of the work in many cases. 75 percent is looking at the work. It is the human intuition. It is
not to replace the machine, but help dig in the patterns better. That is important
whenever the goal is discovery.

It is one thing for these companies to hire 150,000 new workers, how certain are you they are
going to put their money where their mouth is. Instead of wishful thinking. It would be nice to
have these workers --

The quick answer, right now there are lots of companies that have open reqs -- in retail,
financial services and Internet services. If you talk to Hal Varian from Google, he
will tell you a or is it difficult [Indiscernible] in hiring statisticians. We have a weird situation in the
US economy, there is a shortage of jobs. There are companies that have open [Indiscernible],
you do survey after survey, most companies will say -- we have 40 percent open reqs I cannot
fill because of those particular skills.

Let me add to that. In Silicon Valley it is almost impossible to hire qualified engineers,
developers -- it is a problem with this industry. You cannot find enough people to do the work
required to make the economy continue to grow. I do not think it is the lack of jobs. It is lack of
people with the right skills.

I think the answer, if this is truly a competitive advantage -- you have to hire
them. Or be killed.

I would like to make a comment. Regarding education. I am glad to see additional in the data
announcement that we have several initiatives to support the education for the next generation.
What I would like to point out, when we educate data-savvy or computationally savvy
people -- it has to be bilingual. For example, we at NIH are trying to train the younger
generation -- they need to get their hands wet. If we are only training data-savvy people --
that is okay. When you train that younger generation they have to have both skills or they will
fail.

One thing we saw in our study, a lot of the data -- is unstructured data. Comments people will
write. That are unique, it is specific to the discipline. Or a specific language. Who is contributing
to the unstructured data -- and so forth. Most will tell you their biggest challenge -- capturing
data on their website. It is all unstructured.

In addition to that, when you make predictive models, a lot of data-savvy people -- you collected
the data too late. You need to collect it the way we think is correct. Another question, that health
donation of your personal data as a patient, there is a big movement called PatientsLikeMe. It
has similar outcomes compared [Indiscernible]. My question to you, we do not have good tools
to integrate that private donated data with our [Indiscernible].

Those efforts are under way. There are initiatives to do that. It is to integrate it with the
electronic health. I would like to comment -- we are training -- it is interdisciplinary -- we are not
all the way [Indiscernible]. They are starting their careers. For example, we are training
these individuals as well. We have to attack from all sides. If someone has a health care degree
they can be complemented with a computer skill course. We need more of those.

My name is Karen Keane. I want to go back to a comment, we do not have data in education.
We do have a lot of information in education that is not turned into data. The marginal cost of
electronic data, in long-standing industries -- for the sake of argument. The data, the information
collected is often not collected in a digital format. Often, the information that is
collected can drive decisions. I was interested from a medical perspective -- this is a problem
that the medical -- collecting a broader range of data. Thinking how does an existing group that
needs to be able to process that data -- I can imagine a world where we have Common Core
State Standards. You could create massive data sets if you could only collect the data
across these groups. You would also have to change the way teachers and administrators and a
variety of other people work. Thinking about -- how can we learn from that experience to
maximize the potential for big data in places where it is not being taken advantage of.

As someone who has worked with electronic healthcare records, it is less rosy than you think.
There are records out there in electronic form; when you actually look in there, many fields are
missing. Many fields are wrong. I think the industry has a way to go before one can make
significant use of the information that is in principle -- but it is difficult to get at. In this respect,
the educational industry, so to speak, because it is coming in later, can avoid some of the
mistakes of not putting things in appropriate form. Collect things more systematically -- so
one can make that [Indiscernible] for analysis. We know that the significant patterns that could
[Indiscernible]. It would be a tremendous boost on how to teach our children.

We have made tremendous progress. Making it rosier. There is room for improvement in the
way we are collecting data. And the way we use existing data and transform it into computable
format. That is part of the education we need to inflict not only on the users but also the
professionals who are dealing with healthcare informatics. The need for change it this, I think
there is tremendous progress. Going through steps. As much as we would like to leapfrog many
steps -- we need stages -- for the record to be useful. That information will be sparse. You do
not need all the information. You do not need to ask all of the questions. You need to work with
sparse data. And to count on the computer science community to develop ways to make the data
work for us.

It is to acknowledge the question -- the basis of the question. One thing we found out that was
interesting, when we look across all industries -- there are three categories that emerged.
There were sectors where everything was ready. The investment was there, they were open
and competitive. There was a lot of interaction. All of the things will happen. Then you have
another category in the middle. Healthcare was one. There were challenges that you point
out; it was set up to be an instrumented industry. Equipment that captures data. Payments going
back and forth. The instrumentability is very high. There was incentive, and privacy.
Education was in the third. It was a challenge. There have been relatively few investments in the
system. It is not highly instrumented yet. If you can have standardized tests -- it limits the
extent where it is an instrumented sector. It had the most challenge -- education was the most
challenged sector. At the same time it is a sector where, as an economy, we spend enormous dollars.
We spend a lot. It tends to be more efficient. It is set up not to take advantage. There is a
national challenge, it is one sector -- how do we solve that one? It will not solve itself. I do not
worry about retail, but I worry about education.

I am here writing a story for AOL. Several weeks ago there was the big government
forum. I think there is a disconnect between what you are doing and what intelligence is doing --
CIA. We have 20 systems doing this on a massive scale. A lot of new vendor products -- that
seem to be way beyond what you are talking about here. I think it needs to be informed and
connected with the intelligence community --

We have gone through generations of leadership. And a lot of money.

I do agree with that. The most amazing example of this -- but the defense and intelligence.
The leading edge companies. To look for the most sophisticated models, you do need to look at
agencies -- you also look at high-end retail. That is the [Indiscernible]

[Indiscernible-low volume]

I think we have exhausted our time. Thank you very much.

[Event concluded]

Challenges and opportunities in big data

  • 1. Caption Transcript from “Challenges and Opportunities in Big Data” On Thursday, March 29, 2012, from 2-3:45 pm ET, federal government science heads from OSTP, NSF, NIH, DOE, DOD, DARPA and USGS outlined how their agencies are engaged in Big Data research. The event took place at the American Association for the Advancement of Science in Washington, D.C. An archive of this webcast will be available on nsf.gov within two days of this event. (I have no ownership of this information nor can I attest to its accuracy since it was transcribed and may have room for error. This information was extracted from http://live.science360.gov/bigdata/ immediately after the session concluded) Event ID: 1921909 Event Started: 3/29/2012 1:47:56 PM ET Please stand by for real time captions. I welcome all of you. From the look -- looks of the crowd, big data is a big deal. We think it will be bigger. The Obama administration is announcing new investments in federal agencies. In research and development related to big data. That is in a Norma's the volume of data, and the variety of those data. I am joined by a number of administrative [Indiscernible]. I will introduce them in a moment. Following their brief presentation [Indiscernible] we will have a few minutes for questions. We are happy to have as a moderator from that panel Steve or -- Lour of the New York Times. We are grateful to him. Before I turn things over to my colleagues I would like to say a few words. Why date data and then nonprofit organizations may get involved further in this domain. These data are being [Indiscernible]. Computers running large scales programs. Recently, the Presidents Council on science and technology, it concluded a report for the president that the federal government is under investing in analyzing sharing and -- what this data. What matters, is our ability to arrive from the new insights to recognize relationships to make accurate prediction. 
Our ability to move from date it to knowledge to action. As we look at the action in the White House, following the release of the [Indiscernible] report aired became clear for a national big data initiative. The advantages, a cross government focus on big data. One, the domain of big date it will be important from an economic point of view. It is a creation of new IT products. How to use big data to make better decisions. Second, it is critical to accelerating discovery and many domains in science and engineering. Such as in astronomy. Containing hundreds of millions of celestial objects. Working with big data helps with major national challenge, security, health environment and education. Some hospitals are using big data. Startups are beginning with online courses. These kinds of applications are dependent on making the most of information we and most of the world are generating every day. We will take the lead on big data that the government can play an entire -- important role. Using big data approaches to make progress on key national challenges. Challenges -- government challenges as well. Glassware -- last year, we had it enter agency committee on big data. It has already identify concrete agencies that [Indiscernible] can take. To advance the state of art in big data heard to solve big problems in science, health security environment and more. We will make more additional investments. I want to challenge industries, to join with the ministration to make the most of the extraordinary opportunities created by big data. This is not something the
  • 2. government can do by itself. We need what the president calls and all hands on deck approach. To have this domain realize. Some companies are already sponsoring big data. Universities are beginning to create new courses. An entire new courses of study. Organization, like data without Borders providing programs and data collection and analysis your --. It is my pleasure to introduce our speakers. I do want to recognize one person that you will see later, Con O'Neill are deputy rector. -- Let me now turn to Sresh Subra. Thank you John. Today science gathers data from science, but from Spirit -- theoretical calculation. At NSF it happens across all fields. At NSF we recognize that data is motivated a profound transportation -- transformation in culture and conduct in science research. NSF is a proud leader in supporting the fundamental science and infrastructure underlining and enabling the big data revolution hurt --. Back in the 1980s, we supported the first high performance should -- supercomputing centers heard now we are supporting the next generation. At Blue Waters the University of Illinois, one of the most powerful supercomputer in the world capable of quadrillions of calculations per second just opened two research teams two weeks ago. We announce seven new efforts. NSF and NIH have joined together in collaboration in the critical area of core techniques and technologies. We will have new knowledge from large data sets. It will evaluate new algorithms. Technology and tools to improve data collection and management. In addition to this cross agency solicitation, I am delighted to announce a $10 million expeditions in computing award to researchers at the University of California, Berkeley. The team will undertake a complete rethinking of data analysis, integrating algorithms, machines, and people to develop new ways to turn data into knowledge and insight. 
Another 1.2 million Another $1.2 million awarded brings together statisticians and biologists to develop network models and automatic scalable algorithms and tools to tell us more about protein structures and biological pathways. NSF has aggressively embarked on focused cross function efforts to address the challenges and seize opportunity best opportunities that big data enable science efforts We are calling on members of the science community to use data citation to increase opportunities for the use and analysis of data sets. This is Tran5's commitment per to sustainability of data. To continue Tran5's role, for the best and brightest scientists and engineers we [Indiscernible] a 21 track. We will announce an opportunity for students is a $2 million research research -- training group award. This is for undergraduates. To use graphic and accurate [Indiscernible] . This involves, many NSF directives . To teach and [Indiscernible] learning environments. The data-driven [Indiscernible] are bold and cross disciplinary, and cross governmental. We hope that this work we do today, will lay the groundwork for new enterprises. And to fortify the foundations of US competitiveness. Thank you so much [ Applause ]
  • 3. We just described to you this joint effort with NSF and NIH for bold ideas in the management of big data aired there may have been a time when biologists were small [Indiscernible] with lavatories cranking out modest volumes of information. Because that is all we had. With the tools available. We now have arrived in the big time. We are capable of producing big data sets. And the need for analysis, we are thrilled to join our colleagues as they -- to tackle these issues. I want to say thanks to Karen Remington, helping us -- how we you can make the most of this announcement. Today's advances are driving our kind of science. As I come out of the field of genomics -- it will not Sapporo's you that I will use that as an example [Indiscernible] your --. Will [Indiscernible] . It just went up on our websites a few minutes ago. A new collaboration. To try to deal with some of the magnitude of data being produced. We have now formed a class TranII with Amazon. It is up on the cloud. This is a project that aims to sequence 1000 sequence, but 17 of those are in hand. I can tell you, it has been a challenge for all of the users who want access to this information. That was the point of doing the project. To have that access. This is 200 terabytes. That is 16 million file cabinets worth, or 30,000 DVDs. Having this up in the cloud, will provide substantial advantages to users. We are delighted to have [Indiscernible] a couple of years . We are happy to announce this collaboration with Amazon. This will be a launching pad for many kinds of data opportunities. That will include data that will be coming forward. In large quantities. Especially in cancer research. The Cancer genome is [Indiscernible] breaking 22,000 gene that genomes. What are the driving mutations question Mark anyone working in cancer want access to that data. It is important to us to get the data in control. For tools for sharing and managing data. 
There are other things to mention, national [Indiscernible]. To support through the Common Fund. One is [Indiscernible]. To have medical information shared. The most daunting and most exciting of our [Indiscernible]: we need a dynamic inventory of our science resources to provide information. We recognize that the data [Indiscernible] the technology. We are pleased that we have highlighted this and brought it together, to figure out how we can work, accept the challenge, and embrace the challenge of big data. We look forward to that continued collaboration. We are delighted to have a chance to be part of this this afternoon. Thank you very much. [ Applause ]

Good afternoon. The USGS collects data on water, earthquakes, geology, climate, biology, and the earth's landscape. So much big data that we are in danger of drowning. Today I would like to tell you what this innovative center is doing to help scientists come together, in our beautiful center in Fort Collins, Colorado, to gather meaning from that large amount of data. Bringing together uncommon bedfellows of scientists. Bringing together economists and biologists, in government and outside of government, to tackle the kinds of problems that are the headlines of today. Using existing data and rich sources, and employing new technology to share and integrate existing data, in order to solve those problems and make progress in science. It is a proposal-driven process through peer review. The projects are selected from the John Wesley Powell center. There is a process open to select new projects for next year; it will close on April 30. We will also announce the proposals for next year. I will announce today. To give you an example of the ongoing projects now, and some of our successes, so you will have an idea of what the center does: one problem is the uncertain climate future. We understand that climate models [Indiscernible].
A model is only as good as it can predict the past. If models can only predict the past within a couple of degrees, because our ability to reconstruct the past climate is only good to a couple of degrees, we cannot trust their predictions into the future. We came together at
the center and used a variety of data to put together the most precise reconstruction of the [Indiscernible] thermal optimum ever produced. It will be the standard for all models, so in the future we can predict future climate better than before. Another ongoing example is water resources. This is a subject of current headlines. People want to know how much water is used: is it making a dent in my local water supply? This is something the center is undertaking. A proposal, if it is selected. I should add that, for this round, the proposals are being supported by the Geological Survey, but also supported by the National Science Foundation. Understanding and managing resilience of global change. Modeling species response to environmental change. Mercury cycling across North America. Fibrous [Indiscernible] in the US for human health. Modeling of earthquakes and magnitude. Thank you very much. We are very proud to be a part of this initiative.

Good afternoon. I would like to start off by thanking you for inviting us. Today the Department of Defense is [Indiscernible]. We are placing a big bet on big data. We change the game [Indiscernible]. To be the first to demonstrate and use [Indiscernible] secure peer to peer. With computer speed, precision and human [Indiscernible]. This will help our analysts make sense of the huge volume of data that our military [Indiscernible] collects. They will also support multiple missions. We see an opportunity to maneuver and understand the environment. They will not have to make decisions by themselves. They will know when they can and cannot call upon the human. There is a revolution emerging in using mass data. [Indiscernible]. This has the potential to bring together sensing, perception and decision support. Since the invention of the integrated circuit in 1959, with a single [Indiscernible] transistor, to processors that are embedded in cell phones.
No technology has greater impact or scale. Information technology is at the core of defense. We funded [Indiscernible] in the early days that are used globally. The department's continued investment in 3-D electronics and computing will extend this legacy. How we employ that capacity and the capability to use the data that is being produced. Everyone has it [Indiscernible]. It is remarkable [Indiscernible] transition from the early concepts a few years ago to concepts [Indiscernible]. That dynamic reasoning, to learn from experience with little training, and those tools that recognize trends. Adapt to the real world, without relying on human intervention. This must be within a secure framework that puts trust in the data. And human trust in the system will be maintained with the system's ability to communicate very naturally [Indiscernible] and allow users to collaborate and reason with the data. In 1950, Alan Turing proposed this concept. Information on these [Indiscernible] limitation is available at a new website being launched. We are looking for a generation of new ideas. [Indiscernible].

When we lived in caves, [Indiscernible] climbed up a tree to get [Indiscernible] to get a better look. Later it was observation balloons. In recent decades there was [Indiscernible]. From paper to hard drives to [Video and audio cutting in and out] the data collected is often imperfect, incomplete and [Video and audio cutting in and out]. The Atlantic Ocean is 350 million cubic kilometers in volume. It is 100 billion billion gallons of water. If each gallon of water represented a [Indiscernible], the Atlantic Ocean could store all of the data generated by the water. [Video and audio cutting in and out]. We need new fundamental approaches to big data that match the needs of users, are adaptable to changing missions, and perform on a timescale that matches [Indiscernible]. We announced XDATA. It is a $25 million a year program.
With that program, we seek the equivalent of radar and overhead imagery for big data. To provide the DOD [Video and audio cutting in and out]
One of our roles within the science community, and one that we are proud of, is supporting construction of major [Indiscernible] over laboratories. These facilities include large-scale x-ray light sources and some of the world's fastest supercomputers. More than 26,000 researchers across the nation, from universities to government laboratories, make use of these facilities each year. In fact, an interesting [Indiscernible] substantially your best. Single experiments conducted can produce terabytes of data per day. Estimated terabytes per second. It has to reject [Indiscernible] one of 1000 pieces of data each nanosecond. The standard output, and then intercomparison project of terabits today constitutes the fastest collection of data facilities. The climate community expects that [Indiscernible] to be hundred exabytes by 200 -- to store, share and analyze information. We have been a supporter of innovative research, analysis of extreme data, storage and visualization technology. One is the high-performance storage system. The FastBit data [Indiscernible] used by a major Internet search engine. It is the winner of 2008 [Video and audio cutting in and out]. To aid the nation site in the analysis and [Indiscernible]. It will develop new and existing technology for big data. We will partner with other teams to ensure that the best up-to-date technology is used throughout our program. This was based on peer to peer [Indiscernible]. Leading mathematicians and program experts from seven universities. It will have a cross range of fields, to probe and mine their data. I am pleased to say that several members are here today. I would like to thank Ari and Rob and their team for their leadership in this area. Again we are grateful to [Video and audio cutting in and out]. Thank you. [ Applause ]

We have a few moments for questions. If you would like to direct your question to a specific speaker, please do so.
If you do not, I will probably choose.

I am Jeff. When did you first realize that government was not doing enough on big data? Was it before that [Indiscernible] report? Why are not NASA [Video and audio cutting in and out]

Many individuals had the thought that the federal government was not coming together. It really came together when the PCAST [Indiscernible] report came out. [Video and audio cutting in and out] NASA and NOAA are not up here [Video and audio cutting in and out].

This is Bob [Indiscernible] from CNRI. I am sure you [Indiscernible] about that international [Indiscernible]. Science is not something we do alone in this country. I was wondering how you see the development of a collaborative environment being developed [Video and audio cutting in and out]

Would you like to answer?

We think about competition and collaboration all of the time with all of our entities. This is something where it is a [Indiscernible] thinking. [Video and audio cutting in and out] we have been engaging in with many of our partners. In the Antarctic we have 15 countries involved. It is the same with the Arctic Circle. More recently we have been engaging other countries, not just [Indiscernible]. We have biodiversity. For our programs, we have strong collaborations. The [Video and audio cutting in and out]. There is big data [Video and audio cutting in and out]
On the earthquake model [Video and audio cutting in and out] it was collaborative [Video and audio cutting in and out] for example, the large [Indiscernible]. What the laboratory actually does [Video and audio cutting in and out]. Every year or so, they have a model comparison between the various models around the world. Things like LAC, and the data flows all over the world. It is a huge network today. When you have public access to data [Video and audio cutting in and out] at the earliest possible moment. It is critical to make that information available. Other projects [Video and audio cutting in and out] have been doing also involved multiple countries. It is critical for data sharing, so that the average investigator has access to it.

[Indiscernible-low volume]

We do understand that. The challenge of doing that was pointed out forcefully in that report.

I think it is hot. The fact that you guys are going to introduce [Indiscernible-low volume] there will be jobs for many scientists. What are you predicting [Indiscernible-low volume] with scientists working with data, who are looking for work, to be able to work in these areas, not just the new generation but those in the area now looking for work? [Video and audio cutting in and out]

This is where the excitement is going to be. [Laughter] There is a vast quality of [Indiscernible] career path. There are [Indiscernible]. [Video and audio cutting in and out]. We are determined to provide those training pathways to increase skilled individuals. And also to [Video and audio cutting in and out]. To have programs for individuals that need those skills in mid-career. Our shortage is not high-performance computers, but rather high-performance people. We have training to handle more applications, to help industries get used to this [Video and audio cutting in and out].

Privacy issues with that. [Video and audio cutting in and out].
There is a consumer bill of rights. Could you expand on that? On the new challenges and the new thinking, making sure we get the benefits and minimize the risks.

We know that privacy in dealing with data -- they are thinking about the privacy [Indiscernible]. We talk about this in the Council groups. I think you can expect [Video and audio cutting in and out]. Our capacity to keep up with the privacy [Indiscernible]. [Video and audio cutting in and out] To our panel experts [Video and audio cutting in and out] [ Applause ]
We have a wonderful panel of experts for you. We have experts from industry and academia. We are fortunate to have Daphne Koller, from the University [Indiscernible], who is an expert in machine learning and applied [Indiscernible] with big data and applications. And Alex from [Indiscernible]. Please join me in welcoming our expert panel.

Thank you for coming. Astronomy is a strong one. This is proof. Please describe it briefly, and what was learned. What is your take on the lessons learned that can be applied in general to scientific discovery?

We also call it the cosmic genome project. To map out the whole northern sky. It should be available for everyone on the planet. It seems like an incredibly large amount of data. We went through this process working with Jim Gray at Microsoft. We realized that this is much bigger than [Indiscernible]. All of the information is available at our fingertips. We can make this data available for everyone. It was a wonderful experience to try out new [Indiscernible]. How to create the new data. How the astronomy community adapts to a new programming language. I see the same patterns emerging with genomics. What the data sets and the scientists can do [Video and audio cutting in and out]. They can create all of that data about the whole sky.

What did you find out?

One discovery, one thing we did not think we could measure, was the [Indiscernible] of the early universe. We could figure out what the big bang would look like.

There is a visual imprint?

It is like taking a big drum and putting sand over it; we have the same picture of the universe.

Let's go with Daphne Koller from Stanford. You engaged in a [Indiscernible] project, about putting out advanced online programs in college courses for computer science. There has been a big debate on online education. That is a field lacking in data. Briefly describe the education process.
What is the potential of applications, if you will, for education?

We should talk about big education. Where do we get the right training for 21st-century jobs? Education is a great equalizer, but it has suffered from scarcity and lack of affordability in the United States and around the world. What we have started at Stanford is offering large online classes throughout the world. It will equalize society and provide opportunities for people that would not have access to high-quality education. For the view that had [Indiscernible]. That set of courses, online, with assessment and an accomplishment at the end, provides us with information on how humans learn. When you track the data from hundreds of thousands of people engaging in material, answering questions, as they interact with each other, that is a store of data we have not had in education. In the studies, the typical size is 20 or 50 people. Here you can actually study human learning when you have 100,000 people, and figure out what makes people learn and what doesn't. This kind of big data is a surprising and new opportunity for us to understand how people learn and how to learn better.

In some way this is that technology will nudge. That ultimate -- you are just starting with this.
There was a paper from Bloom: people who were trained using a tutor were two standard deviations above the norm [Video and audio cutting in and out]. You cannot afford an individual tutor for everyone. If you can get the online environment to have the same personalization as a human tutoring setting. Pattern recognition to get us to those goals that Bloom outlined.

James was a co-author of a basic study done by the McKinsey Global Institute. It had a lot of attention on many fronts. One thing that you said: we need 149,000 more people that have deep analytical skills. It is a big gap. Give us a flavor of the numbers and where they came from, and what you see unfolding.

It is very exciting to be talking about big data. [Video and audio cutting in and out]. The workforce issue is one of those. One thing that was striking in our research: this came from looking at a combination of what the companies are looking for, their requirements, the challenges they talk about. The thing they put at the top of the list was the skill challenge. They can work their way through technical issues, but the skill challenge is an issue: people who can manipulate that data. When we look at the combination of skill sets in the workforce today, and when we projected forward to the next five years [Video and audio cutting in and out] that gap for those skills was at least hundred and 50 -- 159,000 workers. That takes into account everyone being trained. The next big challenge is the group of data-savvy managers, the decision makers. In the big data world, how you manage things changes dramatically. I know, for example, one company that takes advantage of big data in a huge way. The CEO actually [Video and audio cutting in and out] when I make hiring decisions [Video and audio cutting in and out] how do you run experiments? How do you get insights? [Video and audio cutting in and out].
It was at least a 1 1/2 million gap in who was trained. I am involved with the US at Berkeley, getting enormous pressure from companies to create data management courses. How do you educate a new generation? These are the more technical things in big data. How do you connect structured databases with sources of data that give you unstructured data? How do you connect location devices that are [Indiscernible]? There is a lot of data integration required that is different from your typical database manager. For that group, that gap was 300 to 400,000. This resonates with the challenges companies are having with filling these roles. This is a surprising thing.

Let's give Lucila a chance for the policy and public [Indiscernible] side that goes along with privacy issues, especially with the health industry. [Video and audio cutting in and out] Why don't you tell us a little bit about privacy?

Imagine every visit, where you collect the data and try to extract what worked and what did not work. [Video and audio cutting in and out]. Imagine all doctors, all of the data available to you, as well as [Indiscernible]. As well as location, environmental data and so forth. It is important for us to have this data so the analysts can have a solution [Indiscernible]. It is important that privacy be preserved. When you go to the doctor, you want your data and privacy preserved. I believe people [Indiscernible]. We would know why one in 80 kids in America is at risk of or diagnosed with autism. What are the factors? What can we learn when we assemble the data correctly?

So much of this is the technical side. Beyond that, there is the public acceptability. You mention blood donation [Indiscernible]: that kid who saved another with a rare type. Nancy's life was saved because of XYZ. As we [Indiscernible]. We established laws and rules
for credit cards: if you do the right thing it is a $50 limit. On the social engineering side, no matter what you do technically [Video and audio cutting in and out]. We have heard this about electronic health records. These are issues. Are you guys working on this on the social engineering side?

[Indiscernible] they say the surgeon gets the [Indiscernible]. We in informatics do not get any [Indiscernible]. We are behind the scenes. We are trying to discover new things, so the next generation will not suffer from the same issues. We are developing a consent management system in which we do exactly that. We ask the patient: do you want to donate your data, what kind of data, and who would you like to donate to? At some point you may change your mind, and that should be available to you as a choice. People do not realize how much data they already donate.

You talk about getting over the subconscious [Indiscernible] at giving data.

For quite a long time, the astronomy community has been quicker in adopting [Indiscernible] doing the science. The only thing we could do was observe the skies. We could not change the stars and galaxies. We could just interpret that data. There are people who are doing lab experiments; they have several degrees of freedom. Astronomers learn because we saw something in the sky. We did not have a hypothesis that this was a collapsing nova. We just saw the [Indiscernible] and made more observations. We try to figure out what is going on. So the community was very quick to accept existing data sets. We had no subconscious [Indiscernible] to work on. Their response is, collaboration is great [Indiscernible]. It is clear there is the same revolution in the biological, the life sciences. The early science [Indiscernible] was separate from what we have seen now. It is very similar. There are larger simulations. Some are under [Indiscernible]. When we analyze this data it is the same.
It is important that these computations are done at the supercomputer centers. To turn the simulation into [Indiscernible] that everyone can play with.

Daphne, I do not want to leave you out. You published a paper last year. [Video and audio cutting in and out]

The idea was to let the data speak for themselves [Video and audio cutting in and out] rather than coming in with any particular perspective. Most of pathology looks at the structure and the shape of the breast cancer itself. The surrounding tissue is more [Indiscernible] than the cancer cells themselves. This is where the benefits of large data can come to bear. That data can now speak and tell us [Indiscernible].

Part of the assumption, we assume that [Indiscernible]. Big data is -- this progression from data to knowledge is a path -- to have one believe. Your title was --

Let's talk about the micro level. When we look across the different companies and industries. When you look at companies within the same industry, there is a wide divergence. For example, you have companies that think they have made a lot of progress. They have gone from analyzing data, maybe they have analyzed it every month, and they use it to figure out what to do next month. Then there are the companies at the other end, that are analyzing all of the data every day, in real time, and using that data to think about what they present to you next. Either on the
website or in the physical store. They have gone from data to action. You see a range of practices. This takes you back to innovation. You will see it in the results. What different companies do is quite wide. It will take you to that innovative competitive [Indiscernible]. If you think about everything from insurance, for example. We may sit in the same demographic, but maybe you drive in the middle of the night, and I drive during the day. That is data we did not know before. Now we can monitor that. We will find exactly how you drive when you drive. There are a lot of offerings and systems that are not efficient. There is more segmentation. When you look across multiple sectors, the potential to make certain sectors more productive is quite significant. We did some in-depth analysis. When you imagine what companies are doing using big data, it is very significant. This is another reason why we think this is important. What they do point out, why this is interesting, is that there is a huge volume of it. All of this data is digital. Whenever you have digital data, the key is that what you can do to copy it, play with it, experiment, is almost unbounded. I was surprised. If you look at data 20 years ago, it resided in research institutes. They had all of the data. Today it is different. Most of it is in your pocket right now. You are carrying it around and capturing data. Who has the data, and what do they do with it? That is a big shift. It is now pervasive across every sector. What you will find is that any company of appreciable size, that employs 1000 or more people, has as much data as the Library of Congress. The question becomes, how do companies leverage this? The part I like the most, the good news, is that much of this is going to be surplus. [Indiscernible]. If we stayed at a hotel two or three times they should know who we are. We are setting up our own expectations.
We rely on our cell phones to guide us around, but the big benefit I like is what it will create for the consumer. The range of practices across countries is surprising. The consumer benefit is price haggling. Your negotiations in the economic world are asymmetric more than ever. Not only do you know what price to pay, you will know which store actually has it that day. You will also know how far away you are that day. It gets more interesting. One thing, on energy. If you think about it, a lot of this is happening without any of us intervening. The number of things that have a sensor, an IP address, is almost limitless. That means, in energy, if you look at how we use energy: we leave things on. Lights, refrigeration, motors on. Now we can put these things everywhere, that benefit of energy [Indiscernible]. The potential is significant.

On the privacy side, we have seen so much on my ability to [Indiscernible] data. I did a story a few years ago on researchers that mashed up information. For 12 percent of the population they could get down to the nine digits, their Social Security number. You understand this. What is the thing to say to people about their privacy?

Mathematically you cannot guarantee that people will not [Indiscernible] the data you disclose. You can guarantee the risk, if people are willing to accept that risk. What is the risk? As I said before, much of that data has been released to, not researchers, not health care institutions. People are using the Internet at all times. They are donating data. The very thing to say is, there is a benefit. It is not to one individual. The risk does exist; it can be made minimal.
I used to say, who cares about my health care data? In legislation and policy for insurance, they cannot use that for a particular purpose, and so on.

I have a question for all members of the panel. What concerns you about this? There are a number of dimensions. In Stanford we are saying about [Video and audio cutting in and out] there is always a pattern. Is the pattern significant? You have increased risk of false positives. There is a high level of pattern recognition. Models are brittle. This gets into privacy: discrimination based on that, not really who you are. As practitioners in this world, what worries you?

The concerns you raised are valid, when people misapply the data. You can have confidence in the pattern. It speaks to the need for fundamental science in pattern recognition. That plays a critical role in our ability to analyze the kinds of data we are seeing, and to avoid mismanaging.

I would agree. Quite frankly, from the examples I have seen, I am more trusting of the data models than the intuition-based decision-making you often see. There are a couple of things. One is the question of discrimination in an economic sense, when you disaggregate to such a level, to offer services to certain demographics. The good news is, quite often there are other providers that are happy with the right economic incentives. The other one I worry about: I am struck in the private sector how, in some instances, data stops [Indiscernible] what is going on. Retailers can analyze so many things. When you have correlations [Indiscernible]. You will actually get -- sales of diapers are up in a big way next to the cans of formula. These are children's diapers. [Laughter]. The point is, you get issues that are practical for retail, but you do not know what is really going on.

Many times it does happen. That is why many things are called biomarkers.
It is important to have good analysts and good people to interpret the data. That is why training individuals for that particular domain is an interesting proposition. Another thing I want to remind people: when the inventor of the airplane saw the airplane used in war, he was quite disappointed. It does not mean the airplane should not have been invented, just for another purpose. In our community, we are trying to invent something new that is very useful. We count on the larger community to regulate its use.

It is a classic one in every field. It is a fact-finding mission. It is easier to find it. It is more for the social sciences and politics. The bigger the data sets are, we will still see inherent uncertainties. There is the uncertainty principle. We have to teach the next generation statistical thinking as well. We have to be careful not just to give them a telescope for data but also [Indiscernible].

Before I forget, one early comment. You talk about training of people. Scientists train I-shaped people, and statisticians train T-shaped people. I gather your career is a good example.

I got there by accident. I was in computer science. Deep in one area and broad in others [Indiscernible]. We have to start early. I think some of these grants address those.
I am Jim from RPI. You have been talking about math, statistics and computers. If we want our students to be analysts, we have to teach about [Indiscernible], a lot of stuff on the social side. How do you see that fitting into the education and training?

We have not touched on the other part, the computer science. How we can transform professionals that are dealing with one piece of data at a time to understand the whole complexity of the data. There are several initiatives for training these individuals. I am glad to be part of them. You do bring, in our case, doctors back to computer science, so they can be the data analysts. Then they will marry that [Indiscernible] knowledge with the computer knowledge.

It is the data-savvy management. That applies to marketing, behavioral science -- there is a big second swab. You have to be able to harness.

We need to train pi-shaped lawyers. [Laughter]

I want to address one point. An observation on people adopting these techniques, and not being able to explain beer and diapers. If you dig into that, beer and diapers are people who want to stay home and watch football. We work closely with electronic records. One thing we face, even with companies: how do you explain, in common human terms, with the language that doctors speak or the business speaks, how to back [Indiscernible].

You are right. One thing we find with machine learning: the machine learning is the 25 percent or so of the work in many cases. 75 percent is looking at the work. It is the human intuition. It is not to replace the machine, but to help dig in the patterns that are better. That is important whenever the goal is discovery.

It is one thing for these companies to hire 150,000 new workers. How certain are you that they are going to put their money where their mouth is, instead of wishful thinking?
It would be nice to have these workers --

The quick answer: right now there are lots of companies that have open reqs, in retail, financial services and Internet services. If you talk to Hal Varian from Google, he will tell you [Indiscernible] difficult [Indiscernible] in hiring statisticians. We have a weird situation in the US economy: there is a shortage of jobs. There are companies that have open [Indiscernible]. You do survey after survey, and most companies will say, we have 40 percent open reqs I cannot fill because of that particular skill.

Let me add to that. In Silicon Valley it is almost impossible to hire qualified engineers and developers. It is a problem with this industry. You cannot find enough people to do the work required to make the economy continue to grow. I do not think it is the lack of jobs. It is the lack of people with the right skills.

I think the answer, if this is truly a competitive advantage: you have to hire them. Or be killed.

I would like to make a comment regarding education. I am glad to see, in addition, in the data announcement that we have several initiatives to support education for the next generation. What I would like to point out: when we educate data savvy or [Indiscernible] say
  • 13. every -- it has to be bilingual her for an example, we at NIH we are trying to train the younger generation am a --, they need to get their hands wet dirt if he only turning data savvy people -- that is okay. When you train that younger generation they have to have pulled skills or they will - - both skills are they will fail. One thing we saw in our study, a lot of the data -- is unstructured data. Comments people will write. That are unique, it is Pacific to the discipline. Or a specific language. Who is contributing to the unstructured data -- and so forth. Most will tell you their biggest challenge -- capturing data on their website. It is all on structured. -- unstructured. In addition to that, when you make predictive models, a lot of data savvy people -- you collected the data too late. You need to collect it the way we think is correct. Another question, that help donation of your personal data is a patient, there is a big movement, called, patient like me. It has similar outcomes compared [Indiscernible]. My question to do , we do not have good tools to integrate that private donated data with our [Indiscernible] . Those efforts are undergoing. There are initiatives to do that. It is to integrate it with the electronic health. I would like to comment -- we are training -- it is interdisciplinary -- we are not all the way [Indiscernible] dirt they are starting their careers. For an example, we are training these individuals as well. We have to attack from all sides. If someone has health care degree they can be complemented with a computer skill course. We need more of those. My name is Karen Keane. I want to go back to a comment, we do not have data in education. We do have a lot of information in a education that is not turned into data. The marginal cost of electronic data, in long-standing industries -- for the sake of argument. The data, the information collected is often not collected in a digital format. 
Often, the information that is collected can drive decisions. I was interested, from a medical perspective -- this is a problem that the medical -- collecting a broader range of data. Thinking how does an existing group that needs to be able to process that data -- I can imagine a world where we now have Common Core State Standards. You could create massive data sets if you could only collect the data across these groups. You would also have to change the way teachers and administrators and a variety of other people work. Thinking about -- how can we learn from that experience to maximize the potential for big data in places where it is not being taken advantage of?

As someone who has worked with electronic healthcare records: it is less rosy than you think. There are records out there in electronic form, but when you actually look in there, many fields are missing. Many fields are wrong. I think the industry has a way to go before one can make significant use of the information that is there in principle but is difficult to get at. In this respect, the educational industry, so to speak, because it is coming in later, can avoid some of the mistakes of not putting things in appropriate form. Collect things more systematically so one can make that [Indiscernible] for analysis. We know that the significant patterns that could [Indiscernible]. I think it would be a tremendous boost on how to teach our children.

We have made tremendous progress, making it rosier. There is room for improvement in the way we are collecting data, and in the way we use existing data and transform it into computable format. That is part of the education we need to inflict not only on the users but also on the professionals who are dealing with healthcare informatics. The need for change in this -- I think there is tremendous progress. Going through steps. As much as we would like to leapfrog many steps, we need stages for the record to be useful.
That information will be sparse. You do not need all the information. You do not need to ask all of the questions. You need to work with
sparse data, and to count on the computer science community to develop ways to make the data work for us.

It is to acknowledge the question -- the basis of the question. One thing we found out that was interesting: when we look across all industries, there are three categories that emerged. There were sectors where everything was ready. The investment was there, they were open and competitive. There was a lot of interaction. All of the things will happen. Then you have another category in the middle, where healthcare was one. There were challenges that you point out, but it was set up to be an instrumented industry. Equipment that captures data. Payments going back and forth. The instrumentability is very high. There was incentive, and privacy. Education was in the third. It was a challenge. There have been relatively few investments in the system. It is not highly instrumented yet. If you can have standardized tests -- it limits the extent to which it is an instrumented sector. Education was the most challenged sector. At the same time it is a sector where, as an economy, we spend enormous dollars. We spend a lot. It tends to be more efficient. It is set up not to take advantage. There is a national challenge in this one sector: how do we solve that one? It will not solve itself. I do not worry about retail, but I worry about education.

I am here writing a story for AOL. Several weeks ago there was the big government forum. I think there is a disconnect between what you are doing and what intelligence is doing -- CIA. We have 20 systems doing this on a massive scale. A lot of new vendor products that seem to be way beyond what you are talking about here. I think it needs to be informed and connected with the intelligence community --

We have gone through generations of leadership. And a lot of money. I do agree with that. The most amazing example of this -- but the defense and intelligence.
The leading edge companies. To look for the most sophisticated models, you do need to look at agencies -- you also look at high-end retail. That is the [Indiscernible] [Indiscernible-low volume]

I think we have exhausted our time. Thank you very much.

[Event concluded]