1. 3rd Socio-Cultural Data Summit
National Defense University
Center for Technology and National Security Policy
2. Admin
• Unclassified conference
• Chatham House Rule
• Lunch in the new fiscal reality (the cafeteria)
• Breaks and open time are built into the schedule so we can continue
discussions or hold sidebars
3. Data Summit(s) Objective
• “Good” data are required for reliable analysis.
− Socio-cultural data of any sort are hard to find.
− When we do find them, they are messy, fragmented,
disorganized, poorly measured, etc.
• These Data Summits aim to foster a community interested in finding,
evaluating, collecting, cleaning up, and thoughtfully integrating
socio-cultural data, and then applying it to practical problems with
scientific rigor.
− Focus on a broad community with as few restrictions as possible.
− Focus on rigor and science without sacrificing the ability to
support real-world applications.
4. Logical Progression of these Data Efforts
1. DataCards: a quick-and-dirty effort to find, tag, and index data of all
sorts for as many audiences as possible, reducing search costs for
socio-cultural data.
2. First Data Summit: Take a first cut at data evaluation criteria and
beat the heck out of it in working groups so that we can start to
evaluate the socio-cultural data we’ve found.
3. Second Data Summit: Expand the aperture on what constitutes
data and relate working group insights back to prior evaluation
criteria and lessons learned for continuing to find and define data.
4. Third Data Summit: Start to tackle the complex issue of “how we
put the data together” once we have found it.
− More working groups, focused on areas where we believe we can
make concrete progress on data integration, cleaning, and fusion.
5. DataCards Overview
• DataCards is a structured wiki-like platform that uses “cards” (like card
catalog cards or baseball cards) to index and describe key details about
socio-cultural (and related) data sources.
• Objectives of DataCards include:
– Make sources of data discoverable.
– Reduce search costs for data.
– Serve as a conduit for discovering and sharing data sources among
non-traditional, academic, NGO, defense, law enforcement, and
intelligence communities.
• Accessing DataCards:
− Commercial Internet: http://www.datacards.org/
− Development Site: http://beta.datacards.org/
− SIPRNet: by request, hosted by OSD CAPE
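The slide does not describe DataCards’ actual schema or search mechanism. As a rough illustration of how card-style indexing cuts search costs, here is a minimal sketch assuming hypothetical card fields (`title`, `url`, `description`, `tags`) and a simple inverted tag index; none of these names come from DataCards itself, and the example cards and URLs are invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataCard:
    """One index card describing a data source (hypothetical fields)."""
    title: str
    url: str
    description: str
    tags: set[str] = field(default_factory=set)


class CardCatalog:
    """Tag-based lookup over cards, in the spirit of a library card catalog."""

    def __init__(self) -> None:
        self._cards: list[DataCard] = []
        # Inverted index: each lowercase tag maps to the positions of cards carrying it.
        self._tag_index: dict[str, set[int]] = defaultdict(set)

    def add(self, card: DataCard) -> None:
        idx = len(self._cards)
        self._cards.append(card)
        for tag in card.tags:
            self._tag_index[tag.lower()].add(idx)

    def find(self, *tags: str) -> list[DataCard]:
        """Return cards carrying every requested tag (AND search, case-insensitive)."""
        if not tags:
            return []
        # .get() avoids creating empty entries in the defaultdict for unknown tags.
        matches = [self._tag_index.get(t.lower(), set()) for t in tags]
        hits = set.intersection(*matches)
        return [self._cards[i] for i in sorted(hits)]


# Usage with invented example cards.
catalog = CardCatalog()
catalog.add(DataCard("Example survey", "https://example.org/survey",
                     "Hypothetical polling data", {"polling", "survey"}))
catalog.add(DataCard("Example map layer", "https://example.org/map",
                     "Hypothetical geospatial layer", {"geospatial"}))
print([c.title for c in catalog.find("survey")])  # ['Example survey']
```

The index trades a little work at card-creation time for lookups that touch only the cards sharing a requested tag, which is the search-cost reduction the slide describes.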
6. DataCards Content/Usage Update
• Total cards: 1,682
(plus 2,416 additional cards pending)
• Total datacards.org users: 537
• Since the .org launch: 5,703 visits; 54,229 pageviews; average time per
visit of 00:10:40; multiple visits from 28 countries
8. Summary of 1st Data Summit
• The data used for applied socio-cultural work for the DoD and other agencies
are generally of poor quality.
• Often general and hard to apply to real-world situations
• Rarely evaluated, and even more rarely evaluated objectively
• Worked on data evaluation criteria so that a “smart person” isn’t needed to
evaluate data sources.
• Smart people were used to create the criteria; “smart people in training”
will apply the ratings.
• The ratings shouldn’t rely on the experience of the rater, but on the
quality of the criteria.
• The effort acknowledged that one size does not fit all requirements, and that
the criteria should be flexible enough to accommodate a variety of conceptions of
what constitutes “data.”
• DataCards helps consumers of socio-cultural data rapidly find the data they
need. The evaluation criteria help them assess the suitability and quality of
candidate data sources for their intended application.
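The slide does not list the actual evaluation criteria. To illustrate the idea that ratings should depend on the criteria rather than the rater, and that the criteria must flex across applications, here is a minimal sketch assuming hypothetical yes/no criteria and hypothetical per-application weight profiles; the criterion names, questions, and weights are invented for illustration only.

```python
# Hypothetical yes/no criteria, each meant to be answerable from a source's
# documentation without expert judgment (not the summit's actual criteria).
CRITERIA = {
    "methodology_documented": "Is the collection methodology documented?",
    "time_period_stated": "Is the time period covered stated?",
    "geography_stated": "Is the geographic coverage stated?",
    "provenance_known": "Is the original collector identified?",
}

# Hypothetical weight profiles: different applications value criteria differently.
PROFILES = {
    "default": {name: 1.0 for name in CRITERIA},
    "geospatial": {
        "methodology_documented": 1.0,
        "time_period_stated": 1.0,
        "geography_stated": 3.0,  # coverage matters most for this use case
        "provenance_known": 1.0,
    },
}


def score(answers: dict[str, bool], profile: str = "default") -> float:
    """Return the weighted share of criteria a source meets, between 0 and 1."""
    weights = PROFILES[profile]
    missing = set(weights) - set(answers)
    if missing:
        # Every criterion must be answered, so the score never hinges on a rater skipping one.
        raise ValueError(f"unanswered criteria: {sorted(missing)}")
    total = sum(weights.values())
    met = sum(w for name, w in weights.items() if answers[name])
    return met / total


# Usage with an invented source that lacks stated geographic coverage.
source = {
    "methodology_documented": True,
    "time_period_stated": True,
    "geography_stated": False,
    "provenance_known": True,
}
print(score(source))                # 0.75
print(score(source, "geospatial"))  # 0.5
```

The same answers yield different scores under different profiles, which is one simple way to keep a single set of objective questions while accommodating varied conceptions of what makes data suitable.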
9. Summary of 2nd Data Summit
• “Data” is a user-defined term; it is not limited to one particular type of data.
DataCards serves a wide user base with varied data needs, and it should seek to
assist with both the discovery and the evaluation of data sources.
• Big data is a growing field of interest within analytical and knowledge
communities. Big data, defined in terms of the complexity, structure, and
size of the data, is not just social media; it is generally transactional in
nature, including financial transactions, SMS messages, and search engine results.
• Many data sources are qualitative in nature and cannot be machine-processed
and analyzed the way quantitative or geospatial data are.
• The most important considerations for users of geospatial data are robust
search capabilities, a short path to finding data, and complete data.
• There is no single way that individuals find data. Discovery is often
project-specific, and individuals tend to establish and follow predictable
patterns of behavior when finding data because certain sources have proven
relevant and trustworthy.
10. What is this Summit About?
• This summit is about getting the mess of socio-cultural “stuff” we
often call data into a usable analytic format.
• The first panel focuses on two innovative approaches to putting data
together for intelligence and analytic purposes, and on a Phase 3 IARPA
program that is rapidly fusing data in support of the intelligence
community’s requirements for integrating disparate data.
• The second panel focuses on two of the major types of data that are
often trumpeted as the silver bullet for understanding all things
socio-cultural: social media and polling/surveys. However, both are also
good case studies in the potential pitfalls of aggregating data without
careful thought about what it is you are putting together.
11. What is this Summit About? (continued)
• The third panel presents three approaches to dealing with
socio-cultural data, in moderate technical detail: applying statistics
to missing data, the dirty work of getting socio-cultural data ready for
a DARPA program, and handling situations where socio-cultural data are
sparse.
• Tomorrow, the fourth panel will focus on scientific and technical
approaches to information extraction and data fusion challenges.
• The fifth panel will offer thoughts on three compelling and
promising areas for socio-cultural data integration: geospatial data
of multiple resolutions, qualitative/subject matter expert-derived
data, and human geography data.
• We’ll end after lunch with a discussion of how we, as a
community, want to proceed with this effort.
12. What Do I Want to Get Out of This Summit?
• Community building and fresh ideas that support
better work with socio-cultural data.
• Feedback on which methods we are missing and which have merit.
• Feedback on what the forward operator needs from a group like
this. This includes the warfighter, but also law enforcement
officers, NGOs, partner nations, foreign service officers, and economic
development professionals: anyone working in the field to make a
difference.