Big Data activities
at the U.S. Census Bureau

Cavan Capps
Big Data Lead
U.S. Census Bureau
February 13, 2014
Prepared for
MIT Libraries Program on Information Science Brown Bag Talk
Feb 2014
Big Data Challenge at the Census Bureau
“Designed Data” vs. “Organic Data”
“The world is now producing large amounts of data ... data from Internet
searches, credit card transactions, retail scanners, and social media.”
“There also are more and more digital administrative data (e.g., tax
records, social security records, Medicare/Medicaid records, food stamp
records, HUD records). Some of these data are not directly linked to the
populations we study; some have item missing data problems; none
offer a real replacement for our surveys, but many will be useful as
auxiliary data sources.”

Big Data Challenge at the Census Bureau

Big Data is about creating information to make Big
Decisions from novel, and often massive, data sources.

Big Data creates new Statistical Agency Challenges

A recent meeting of International Statistical Agencies observed:
1. The volume of data generated outside the government statistical
systems is increasing much faster than the volume of data collected
by the statistical systems; almost all of these data are digitized in
electronic files.

2. As this occurs, the leaders expect that the relative cost, timeliness, and
effectiveness of traditional survey and census approaches of the
agencies may become less attractive.

Big Data creates new Statistical Agency Challenges

A recent meeting of International Statistical Agencies observed:
3. Blending together multiple available data sources
(administrative, commercial electronic transactions and internet webpage data, search frequency data, Twitter, Facebook, etc.) with
traditional surveys and censuses (using paper, telephone, face-to-face interviewing) to create high quality, timely statistics that tell a
coherent story of economic, social and environmental progress must
become a major focus of central government statistical agencies.

4. This requires efficient record linkage capabilities, the building of
master universe frames that act as core infrastructure to the blending
of data sources, and the use of modern statistical modeling to
combine data sources with highest accuracy.
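As an illustration of the record linkage step named in item 4, here is a minimal sketch. It is not the Census Bureau's matching software: the record fields (`id`, `name`, `zip`), the sample records, and the 0.85 threshold are all hypothetical. It blocks on ZIP code and scores name similarity with Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences matter less."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(survey, admin, threshold=0.85):
    """Return (survey_id, admin_id, score) for each survey record's best match.

    The threshold is an assumed cut-off for illustration, not a recommended value.
    """
    links = []
    for s in survey:
        best = None
        for a in admin:
            # Blocking: only compare records that share a ZIP code.
            if s["zip"] != a["zip"]:
                continue
            score = similarity(s["name"], a["name"])
            if best is None or score > best[2]:
                best = (s["id"], a["id"], score)
        if best is not None and best[2] >= threshold:
            links.append(best)
    return links


# Hypothetical records, invented for this example.
survey = [
    {"id": "S1", "name": "Acme Hardware Inc", "zip": "02139"},
    {"id": "S2", "name": "Blue Door Bakery", "zip": "20233"},
]
admin = [
    {"id": "A1", "name": "ACME Hardware, Inc.", "zip": "02139"},
    {"id": "A2", "name": "Red Door Bakery", "zip": "10001"},
]
links = link_records(survey, admin)
```

Blocking keeps the number of comparisons manageable on massive files; production linkage would typically use probabilistic weights over several fields rather than a single name score.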

Big Data creates new Statistical Agency Challenges

A recent meeting of International Statistical Agencies observed:
5. The Agencies will need to develop the analytical capabilities to
distill insights from more integrated views of the world and impart
a stronger systems view across different government and private
sector information systems to provide more geographical and
industry detail.

6. There are growing demands from researchers and policy-related
organizations to analyze the micro-data collected by the agencies, to
extract more timely and detailed information from the data.

Big Data Development Challenges for Statistical Agencies

The Meeting Recommended that Statistical Agencies develop:
1. High-speed, “big data” software/hardware systems for record
   linkage and extraction of key information from massive files.
2. Efficient and sophisticated imputation procedures needed to make
   the combined data sources jointly useful.
3. More use of statistical modeling for statistical estimation, to provide
   more:
   1. Timely estimates
   2. Small area estimates
   3. New measures
4. New ways to give secure access to micro-data for legitimate policy
   and research purposes, to increase the impact of their work.

In Summary, massive challenges for the Statistical Agencies:

1. The Internet and Private E-Transactions are generating data faster
   and more cheaply than Statistical agencies can afford to do.

2. To be reliable sources of information on the Demographics, Economy
   and Social change in the U.S., this information needs to be mashed
   together with traditional surveys and adjusted for bias.

3. The sizes of the files and the number of computations to mash up the
   data will be larger.

4. Spoiled by the Internet, users expect more timely and detailed data
   provided at lower costs.

5. Privacy/Confidentiality must be maintained.
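
One common way to "adjust for bias" (item 2) is post-stratification: reweight an unrepresentative source so its group shares match known population shares. The sketch below is an assumption about what such an adjustment could look like, not a method stated in the talk. The `age_group` and `uses_service` fields, the sample, and the population shares are invented.

```python
from collections import Counter


def poststratify(sample, population_shares, key="age_group"):
    """Return one weight per record: population share / sample share of its group."""
    counts = Counter(record[key] for record in sample)
    n = len(sample)
    return [population_shares[record[key]] / (counts[record[key]] / n)
            for record in sample]


def weighted_mean(sample, weights, value="uses_service"):
    """Weighted mean of a 0/1 indicator."""
    total = sum(w * record[value] for record, w in zip(sample, weights))
    return total / sum(weights)


# Hypothetical organic-data sample that over-represents the "young" group.
sample = (
    [{"age_group": "young", "uses_service": 1}] * 6
    + [{"age_group": "young", "uses_service": 0}] * 2
    + [{"age_group": "old", "uses_service": 1}] * 1
    + [{"age_group": "old", "uses_service": 0}] * 1
)
# Assumed known population shares (e.g. from a census or survey benchmark).
population_shares = {"young": 0.5, "old": 0.5}

weights = poststratify(sample, population_shares)
raw = sum(record["uses_service"] for record in sample) / len(sample)
adjusted = weighted_mean(sample, weights)
print(f"raw={raw:.3f} adjusted={adjusted:.3f}")  # raw=0.700 adjusted=0.625
```

The traditional survey or census supplies the benchmark shares; the organic source supplies volume and timeliness.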
Big Data Projects at the Census Bureau
The Census Bureau “Big Data”
Information Life Cycle

Data Collection
- Multi-Mode Data Survey Collection model
- New Data sources (Web, E-Transactions, Admin Recs)
Data Integration & Analysis
- Record Linkage
- Small Area Estimation modeling & “Now Casting”
Data Release
- Data Review for Release
- Confidentialize data for public release

Big Data

Current Process          | Future Process (exploring)
• Designed Data          | • Designed & Organic Data
• Proprietary Software   | • Next Generation Open-Source & Proprietary Software
• Batch Processing       | • More Parallel Processing
• Long processing times  | • Faster processing times

Big Data Collection: Improving Survey Logistics & Cost

Improving Survey Collection and Imputation Operations (Adaptive
Design)

1. Multi-modal data collection to reduce operational costs of
   data collection
   – More effective use of existing data such as
     administrative records
   – Incorporating new data into decennial operations
     • Paradata from Internet Data Capture
     • Information from Social Media Feeds
2. Edits and Imputations
3. Data Review
Big Data Collection: Evaluating Web Data as Inputs

Potential Internet Data Collection

1. Examine Google & Bing search frequency trend data

2. Examine Twitter and other social media trend data

3. Examine “Web Scraping” of housing data, price data, local
   tax data, crime data, corporate profits, etc.

Big Data Collection : Evaluating Commercial E-Transaction
Input Data

1. Housing:
   – Foreclosures: Use vendor data on new residential properties in
     foreclosure to aid analysis of data on new construction and sales.
   – Building Permits: Web scrape opportunity to access local jurisdictions and
     state agencies posting public records online.

2. Construction:
   – Difficulty obtaining electronic data from numerous state and local agencies
   – Data are needed immediately to tabulate the monthly economic indicators.

3. Retail Sales: Evaluating electronic payment processing to fill data gaps such
   as geographical detail and revenue measures by firm size
   – New data products
   – Improvements to current data quality

Big Data Integration & Analysis: (Current processes)

Data Integration Expertise:
• Record linkage

– Gov’t Admin Records to other Gov’t Admin Records
– Gov’t Admin Records to Gov’t Surveys
– Commercial records to Gov’t Admin Records

• Model based integration

– Small Area Poverty & Income Estimates
– Small Area Health & Income Estimates
– Longitudinal Economic & Housing Dynamics
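
The slides do not say which models sit behind the small area estimates. As a hedged illustration only, a textbook composite (shrinkage) estimator blends a noisy direct survey estimate with a model-based synthetic estimate, weighting the direct estimate less when its sampling variance is large. The function name and all numbers below are hypothetical.

```python
def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Blend a direct survey estimate with a model-based (synthetic) estimate.

    gamma is the weight on the direct estimate; it approaches 1 as the
    sampling variance shrinks relative to the assumed model variance.
    """
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical county poverty rates: a small sample gives a noisy direct estimate,
# so the result leans toward the synthetic value predicted from admin records.
small_sample = composite_estimate(direct=0.30, sampling_var=0.004,
                                  synthetic=0.18, model_var=0.001)

# With a much larger sample (smaller sampling variance) the direct estimate dominates.
large_sample = composite_estimate(direct=0.30, sampling_var=0.0001,
                                  synthetic=0.18, model_var=0.001)
```

In practice the model variance is itself estimated from the data; it is passed in as a known value here to keep the sketch short.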

Big Data Integration & Analysis: Exploring “Now Casting”

Exploring “Now Casting” to improve Statistical Timeliness :
1. Some “real time” Internet data correlates with Official Statistics:
   – Google search data modeled to match BLS unemployment &
     CDC flu spread
   – Univ. of Michigan Twitter unemployment
   – MIT Billion Prices Project match to BLS CPI

2. Census experiments with Gov’t Pension data
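
The simplest form of the “now casting” idea above is to regress a published official series on a timely Internet indicator, then apply the fitted line to the latest indicator value before the official figure is released. The sketch below assumes a single predictor and uses made-up numbers; it is not one of the models cited on this slide.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: a monthly search-frequency index and the official rate (%).
search_index = [40.0, 50.0, 60.0, 70.0]
official_rate = [5.0, 6.0, 7.0, 8.0]

slope, intercept = fit_line(search_index, official_rate)

# The current month's index is available now; the official figure is not yet.
nowcast = slope * 65.0 + intercept
print(f"nowcast = {nowcast:.2f}%")  # nowcast = 7.50%
```

A correlation like this can break down when search behavior shifts, so a nowcast of this kind would need regular re-validation against the official series.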

Big Data Lab

1. Setting up an experimental Cluster

2. Testing performance of Hardware

3. Testing value of Software
   – Open Source Big Data Software:
     Hadoop, Mahout, Distributed R, HBase, Pig, Hive,
     Cassandra, Mongo, Flume, Neo4j, igraph,
     AllegroGraph
   – Internally Developed software:
     TEA, DataWeb, Matching software
On the Horizon, Development of Big Data Center
Research, capacity building and economic Big Data
Processing:

1. Proposal to create a new center that will include members from academia and
   Census staff to:
   1. Help lead Census Bureau work on practices to make sense of Big Data,
      developing principles to apply Big Data to federal statistics.
   2. Facilitate the Census Bureau as an unbiased provider of information collected as Big Data.
   3. Validate new techniques and data sources at a low cost (field staff
      allow us to do ground checks, survey questions).
   4. Lead on methods to integrate Big Data and develop standards.
   5. Provide a way to bring both faculty and graduate
      students to Census to facilitate Big Data capacity building at the Census
      Bureau.

2. We will explore partnerships with others doing research in this area:
   universities and Silicon Valley.

Weitere ähnliche Inhalte

Was ist angesagt?

Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsJOSEPH FRANCIS
 
how you can use data analytics
how you can use data analytics how you can use data analytics
how you can use data analytics Dan Bart
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research reportJULIO GONZALEZ SANZ
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Geoffrey Fox
 
Use of Big Data in Government Sector
Use of Big Data in Government SectorUse of Big Data in Government Sector
Use of Big Data in Government Sectorijtsrd
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)Sonu Gupta
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Onyebuchi nosiri
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Onyebuchi nosiri
 
The Future of Big Data
The Future of Big Data The Future of Big Data
The Future of Big Data EMC
 
Lars Lyberg, Inizio: Rapport från konferensen BigSurv18
Lars Lyberg, Inizio: Rapport från konferensen BigSurv18Lars Lyberg, Inizio: Rapport från konferensen BigSurv18
Lars Lyberg, Inizio: Rapport från konferensen BigSurv18Alf Fyhrlund
 
151111 BASE ELN 151112 CIO Big Data Collaboration
151111 BASE ELN 151112 CIO Big Data Collaboration151111 BASE ELN 151112 CIO Big Data Collaboration
151111 BASE ELN 151112 CIO Big Data CollaborationDr. Bill Limond
 
Federal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp WestFederal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp Westbradstenger
 
Governing Big Data : Principles and practices
Governing Big Data : Principles and practicesGoverning Big Data : Principles and practices
Governing Big Data : Principles and practicesPiyush Malik
 
Big Data Opportunities in Census Bureau Research
Big Data Opportunities in Census Bureau ResearchBig Data Opportunities in Census Bureau Research
Big Data Opportunities in Census Bureau ResearchSudip Bhattacharjee
 

Was ist angesagt? (20)

Big-Data-AryaTadbirNetworkDesigners
Big-Data-AryaTadbirNetworkDesignersBig-Data-AryaTadbirNetworkDesigners
Big-Data-AryaTadbirNetworkDesigners
 
Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analytics
 
how you can use data analytics
how you can use data analytics how you can use data analytics
how you can use data analytics
 
Sample
Sample Sample
Sample
 
Big data analytics, research report
Big data analytics, research reportBig data analytics, research report
Big data analytics, research report
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
Use of Big Data in Government Sector
Use of Big Data in Government SectorUse of Big Data in Government Sector
Use of Big Data in Government Sector
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
 
The Future of Big Data
The Future of Big Data The Future of Big Data
The Future of Big Data
 
Big data-analytics-ebook
Big data-analytics-ebookBig data-analytics-ebook
Big data-analytics-ebook
 
The 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big DataThe 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big Data
 
Lars Lyberg, Inizio: Rapport från konferensen BigSurv18
Lars Lyberg, Inizio: Rapport från konferensen BigSurv18Lars Lyberg, Inizio: Rapport från konferensen BigSurv18
Lars Lyberg, Inizio: Rapport från konferensen BigSurv18
 
151111 BASE ELN 151112 CIO Big Data Collaboration
151111 BASE ELN 151112 CIO Big Data Collaboration151111 BASE ELN 151112 CIO Big Data Collaboration
151111 BASE ELN 151112 CIO Big Data Collaboration
 
Federal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp WestFederal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp West
 
Bigdata Hadoop introduction
Bigdata Hadoop introductionBigdata Hadoop introduction
Bigdata Hadoop introduction
 
Bigdata
BigdataBigdata
Bigdata
 
Governing Big Data : Principles and practices
Governing Big Data : Principles and practicesGoverning Big Data : Principles and practices
Governing Big Data : Principles and practices
 
Big Data Opportunities in Census Bureau Research
Big Data Opportunities in Census Bureau ResearchBig Data Opportunities in Census Bureau Research
Big Data Opportunities in Census Bureau Research
 

Ähnlich wie U.S. Census Bureau's Big Data Activities

Big data analytics and its impact on internet users
Big data analytics and its impact on internet usersBig data analytics and its impact on internet users
Big data analytics and its impact on internet usersStruggler Ever
 
exploit_big_data_v1
exploit_big_data_v1exploit_big_data_v1
exploit_big_data_v1Attila Barta
 
QuickView #3 - Big Data
QuickView #3 - Big DataQuickView #3 - Big Data
QuickView #3 - Big DataSonovate
 
What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?Vincenzo Patruno
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...Piet J.H. Daas
 
Lecture 01-1-IIS.pptx
Lecture 01-1-IIS.pptxLecture 01-1-IIS.pptx
Lecture 01-1-IIS.pptxAsadkhan47384
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.saranya270513
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesT.S. Lim
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAkshata Humbe
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor networkparry prabhu
 
Implementation of application for huge data file transfer
Implementation of application for huge data file transferImplementation of application for huge data file transfer
Implementation of application for huge data file transferijwmn
 

Ähnlich wie U.S. Census Bureau's Big Data Activities (20)

Big data analytics and its impact on internet users
Big data analytics and its impact on internet usersBig data analytics and its impact on internet users
Big data analytics and its impact on internet users
 
exploit_big_data_v1
exploit_big_data_v1exploit_big_data_v1
exploit_big_data_v1
 
QuickView #3 - Big Data
QuickView #3 - Big DataQuickView #3 - Big Data
QuickView #3 - Big Data
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?What does “BIG DATA” mean for official statistics?
What does “BIG DATA” mean for official statistics?
 
big-data.pdf
big-data.pdfbig-data.pdf
big-data.pdf
 
Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 
Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
Lecture 01-1-IIS.pptx
Lecture 01-1-IIS.pptxLecture 01-1-IIS.pptx
Lecture 01-1-IIS.pptx
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
wireless sensor network
wireless sensor networkwireless sensor network
wireless sensor network
 
Implementation of application for huge data file transfer
Implementation of application for huge data file transferImplementation of application for huge data file transfer
Implementation of application for huge data file transfer
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
Unit III.pdf
Unit III.pdfUnit III.pdf
Unit III.pdf
 
IM seminor.pptx
IM seminor.pptxIM seminor.pptx
IM seminor.pptx
 
The Big Data Talent Gap
The Big Data Talent GapThe Big Data Talent Gap
The Big Data Talent Gap
 

Mehr von Micah Altman

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesMicah Altman
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset ConversationMicah Altman
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset ConversationMicah Altman
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer ReviewMicah Altman
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer ReviewMicah Altman
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An OverviewMicah Altman
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral DistrictingMicah Altman
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk Micah Altman
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Micah Altman
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...Micah Altman
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenaryMicah Altman
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...Micah Altman
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 

Mehr von Micah Altman (20)

Selecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategiesSelecting efficient and reliable preservation strategies
Selecting efficient and reliable preservation strategies
 
Well-Being - A Sunset Conversation
Well-Being - A Sunset ConversationWell-Being - A Sunset Conversation
Well-Being - A Sunset Conversation
 
Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...Matching Uses and Protections for Government Data Releases: Presentation at t...
Matching Uses and Protections for Government Data Releases: Presentation at t...
 
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019
 
Well-being A Sunset Conversation
Well-being A Sunset ConversationWell-being A Sunset Conversation
Well-being A Sunset Conversation
 
Can We Fix Peer Review
Can We Fix Peer ReviewCan We Fix Peer Review
Can We Fix Peer Review
 
Academy Owned Peer Review
Academy Owned Peer ReviewAcademy Owned Peer Review
Academy Owned Peer Review
 
Redistricting in the US -- An Overview
Redistricting in the US -- An OverviewRedistricting in the US -- An Overview
Redistricting in the US -- An Overview
 
A Future for Electoral Districting
A Future for Electoral DistrictingA Future for Electoral Districting
A Future for Electoral Districting
 
A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk  A History of the Internet :Scott Bradner’s Program on Information Science Talk
A History of the Internet :Scott Bradner’s Program on Information Science Talk
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 

Kürzlich hochgeladen

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

U.S. Census Bureau's Big Data Activities

  • 1. Big Data activities at the U.S. Census Bureau Cavan Capps Big Data Lead U.S. Census Bureau February 13, 2014 Prepared for MIT Libraries Program on Information Science Brown Bag Talk Feb 2014
  • 2. Big Data Challenge at the Census Bureau “Designed Data” vs. “Organic Data” “The world is now producing large amounts of data.. data from Internet searches, credit card transactions, retail scanners, and social media.” “There also are more and more digital administrative data (e.g., tax records, social security records, Medicare/Medicaid records, food stamp records, HUD records). Some of these data are not directly linked to the populations we study; some have item missing data problems; none offer a real replacement for our surveys, but many will be useful as auxiliary data sources.”
  • 3. Big Data Challenge at the Census Bureau Big Data is about creating information to make Big Decisions from novel, and often massive, data sources.
  • 4. Big Data creates new Statistical Agency Challenges A recent meeting of International Statistical Agencies observed: 1. The volume of data generated outside the government statistical systems is increasing much faster than the volume of data collected by the statistical systems; almost all of these data are digitized in electronic files. 2. As this occurs, the leaders expect that relative cost, timeliness, and effectiveness of traditional survey and census approaches of the agencies may become less attractive.
  • 5. Big Data creates new Statistical Agency Challenges A recent meeting of International Statistical Agencies observed: 3. Blending together multiple available data sources (administrative, commercial electronic transactions and internet webpage data, search frequency data, Twitter, Facebook, etc.) with traditional surveys and censuses (using paper, telephone, face-to-face interviewing) to create high quality, timely statistics that tell a coherent story of economic, social and environmental progress must become a major focus of central government statistical agencies. 4. This requires efficient record linkage capabilities, the building of master universe frames that act as core infrastructure to the blending of data sources, and the use of modern statistical modeling to combine data sources with highest accuracy.
  • 6. Big Data creates new Statistical Agency Challenges A recent meeting of International Statistical Agencies observed: 5. The Agencies will need to develop the analytical capabilities to distill insights from more integrated views of the world and impart a stronger systems view across different government and private sector information systems to provide more geographical and industry detail. 6. There are growing demands from researchers and policy-related organizations to analyze the micro-data collected by the agencies, to extract more timely and detailed information from the data.
  • 7. Big Data Development Challenges for Statistical Agencies The meeting recommended that statistical agencies develop: 1. High-speed “big data” software/hardware systems for record linkage and extraction of key information from massive files. 2. Efficient and sophisticated imputation procedures needed to make the combined data sources jointly useful. 3. More use of statistical modeling for statistical estimation, to provide more (a) timely estimates, (b) small area estimates, and (c) new measures. 4. New ways to give secure access to micro-data for legitimate policy and research purposes, to increase the impact of their work.
  • 8. In Summary, massive challenges for the Statistical Agencies: 1. The Internet and Private E-Transactions are generating data faster and more cheaply than Statistical Agencies can afford to do. 2. To be reliable sources of information on the Demographics, Economy and Social change in the U.S., this information needs to be mashed together with traditional surveys and adjusted for bias. 3. The sizes of the files and the number of computations to mash up the data will be larger. 4. Spoiled by the Internet, users expect more timely and detailed data provided at lower costs. 5. Privacy/Confidentiality must be maintained.
  • 9. Big Data Projects at the Census Bureau The Census Bureau “Big Data” Information Life Cycle: Data Collection (Multi-Mode Data Survey Collection model; New Data sources: Web, E-Transactions, Admin Recs). Data Integration & Analysis (Record Linkage; Small Area Estimation modeling & “Now Casting”). Data Release (Data Review for Release; Confidentialize data for public release).
  • 10. Big Data: Current Process vs. Future Process (exploring)
      – Data: Designed Data (current); Designed & Organic Data (future)
      – Software: Proprietary Software (current); Next Generation Open-Source & Proprietary Software (future)
      – Processing: Batch Processing (current); More Parallel Processing (future)
      – Speed: Long processing times (current); Faster processing times (future)
  • 11. Big Data Collection: Improving Survey Logistics & Cost Improving Survey Collection and Imputation Operations (Adaptive Design) 1. Multi-modal data collection to reduce operational costs of data collection – More effective use of existing data such as administrative records – Incorporating new data into decennial operations • Paradata from Internet Data Capture • Information from Social Media Feeds 2. Edits and Imputations 3. Data Review
  • 12. Big Data Collection: Evaluating Web Data as Inputs Potential Internet Data Collection 1. Examine Google & Bing search frequency trend data 2. Examine Twitter and other social media trend data 3. Examine “Web Scraping” of housing data, price data, local tax data, crime data, corporate profits, etc.
  • 13. Big Data Collection: Evaluating Commercial E-Transaction Input Data 1. Housing: – Foreclosures: Use vendor data on new residential properties in foreclosure to aid analysis of data on new construction and sales. – Building Permits: Web scrape opportunity to access local jurisdictions and state agencies posting public records online. 2. Construction: – Difficulty obtaining electronic data from numerous state and local agencies – Data are needed immediately to tabulate the monthly economic indicators. 3. Retail Sales: Evaluating electronic payment processing to fill data gaps such as geographical detail and revenue measures by firm size – New data products – Improvements to current data quality
  • 14. Big Data Integration & Analysis: (Current processes) Data Integration Expertise: • Record linkage – Gov’t Admin Records to other Gov’t Admin Records – Gov’t Admin Records to Gov’t Surveys – Commercial records to Gov’t Admin Records • Model based integration – Small Area Poverty & Income Estimates – Small Area Health & Income Estimates – Longitudinal Economic & Housing Dynamics
  • 15. Big Data Integration & Analysis: Exploring “Now Casting” Exploring “Now Casting” to improve Statistical Timeliness: 1. Some “real time” Internet data correlates with Official Statistics: – Google search data modeled to match BLS unemployment & CDC Flu spread – Univ. of Michigan Twitter unemployment – MIT Billion Prices Project match to BLS CPI 2. Census experiments with Gov’t Pension data
  • 16. Big Data Lab 1. Setting up an experimental Cluster 2. Testing performance of Hardware 3. Testing value of Software – Open Source Big Data Software: Hadoop, Mahout, Distributed R, HBase, Pig, Hive, Cassandra, Mongo, Flume, Neo4j, I-Graph, AllegroGraph – Internally Developed software: TEA, DataWeb, Matching software
  • 17. On the Horizon, Development of Big Data Center Research, capacity building and economic Big Data Processing: Proposal to create a new center that will include members from academia and Census staff to: 1. Help lead Census Bureau work on practices to make sense of Big Data, developing principles to apply Big Data to federal statistics. 2. Facilitate the Census Bureau as an unbiased provider of information collected as Big Data. 3. Validate new techniques and data sources at a low cost (field staff allow us to do ground checks, survey questions). 4. Lead on methods to integrate Big Data and develop standards. 5. Provide a way to bring both faculty and graduate students to Census to facilitate Big Data capacity building at the Census Bureau. We will explore partnerships with others doing research in this area: universities and Silicon Valley.
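
The slides name record linkage (administrative records to surveys, commercial records to administrative records) several times without showing what it involves. The following is a minimal sketch of one common approach, blocking on a shared key and then fuzzy-matching the remaining fields. It is not the Census Bureau's matching software; the record layout, the field names (`id`, `name`, `address`, `zip`), the sample records, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(admin, survey, threshold=0.85):
    """Link each admin record to its best-scoring survey record.

    Candidates are limited to records sharing a ZIP code (blocking), so the
    comparison does not have to run over every possible pair.
    """
    by_zip = {}
    for rec in survey:
        by_zip.setdefault(rec["zip"], []).append(rec)

    links = []
    for a in admin:
        best = None
        for s in by_zip.get(a["zip"], []):
            # Average of name and address similarity (an arbitrary weighting).
            score = (similarity(a["name"], s["name"])
                     + similarity(a["address"], s["address"])) / 2
            if score >= threshold and (best is None or score > best[2]):
                best = (a["id"], s["id"], score)
        if best is not None:
            links.append(best)
    return links


# Made-up records, for illustration only.
admin_records = [
    {"id": "A1", "name": "Maria Lopez", "address": "12 Oak St.", "zip": "02139"},
    {"id": "A2", "name": "John Smith", "address": "400 Main Street", "zip": "20233"},
]
survey_records = [
    {"id": "S1", "name": "maria lopez", "address": "12 Oak St", "zip": "02139"},
    {"id": "S2", "name": "Joan Smythe", "address": "9 Elm Ave", "zip": "20233"},
]

links = link_records(admin_records, survey_records)
for admin_id, survey_id, score in links:
    print(admin_id, survey_id, round(score, 2))
```

Blocking is what keeps linkage tractable on massive files: only records that agree on the blocking key are compared. Production systems typically replace the simple averaged score with a probabilistic model.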

Editor's notes

  1. This work by Cavan Capps <www.linkedin.com/pub/cavan-capps/12/201/523> is licensed under the Creative Commons Attribution-Share Alike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. The use of Big Data isn’t new for the Census Bureau. We’ve been using administrative records, such as tax data, for decades to improve our collections. However, there is a new generation of Big Data, emerging as the electronic environment flourishes, that we must keep up with. We must research ways to utilize these new data sources in our collections in order to increase efficiencies and to reduce costs and the time it takes to disseminate statistics. At the same time, we must also continue to maintain the quality of the official statistics. I’ll be addressing these aspects throughout my talk today. I was asked today to talk about two specific questions. I’ll address these questions broadly and then share some case studies of how Big Data is being used in our programs at the Census Bureau. I’ll also briefly touch on a Big Data source coming from the Census Bureau and ways the private sector could use our data in concert with Big Data.
  2. Bob Groves called traditional survey data specifically created to measure something “Designed data.” The private sector maintains vast troves of transactional data, much of which is “data exhaust,” or data created as a by-product of other transactions. With the use of mobile phones, much of this data can be associated with individuals and their locations. The public sector in most countries also maintains enormous datasets in the form of census data, health indicators, and tax and expenditure information. … The global internet is currently offering near real-time data on durable and nondurable goods prices, housing sales, and other relevant events. This data exhaust can also be termed “Organic data,” which has its own strengths and weaknesses. The Census Bureau is the largest statistical agency in the U.S. Many of the Nation’s economic indicators and other critical socio-economic measures come from the Census Bureau. Similarly, much of the data we collect and process are critical inputs to major economic indicators and measures produced by other statistical organizations. We cannot afford to ignore the opportunities offered by these new data sources and techniques.
  3. The first question I was asked to consider is whether the Census Bureau is working on Big Data projects and how these differ from other projects. My initial response is a resounding yes: the Census Bureau is incorporating Big Data solutions to improve the efficiency of its operations throughout the information lifecycle. We are exploring new sources of data and processing techniques to improve our products and increase the efficiency of our operations. The intent is for these projects to result in enterprise-wide solutions that support all surveys and census operations across the Census Bureau. This is different from our current processes and technology, which are developed to support individual surveys and census operations. Examples of these efforts include…
  9. Data Collection. For data collection, we are moving to a Multi-Mode Data Collection model (for survey and census data collection) that utilizes different collection modes based on survey response rate, quality, cost and several other factors to effectively collect data. We are architecting a Big Data environment that makes it easier to collect large volumes of data from various sources, integrate with internal and external sources of data, and make real-time decisions about effective collection modes. In terms of Data Analysis, we are researching Big Data methodological techniques, such as modeling or mashing (or integrating) together a variety of data sources, that allow us to work effectively with the Big Data. We’re also exploring technology solutions, such as High Performance and Distributed Computing Environments, to improve the effectiveness and speed of data analytics aided by better visualization techniques that incorporate geographic information. And for Data Release, we are exploring using correlated “Big Data” sources to improve and speed data review and to test that the released data maintain privacy and confidentiality.
  10. Currently most of the Census statistical processing is based on designed surveys or designed measures from administrative data. Most of the processing is batch processing in SAS. Depending on the size of the data, processing times can be lengthy. Most speed improvements have been achieved by increasing the size of the machine. In the future, as more data are combined from various sources of organic as well as designed data, data sizes may grow rapidly. User expectations are also growing: users expect data to be released in a more timely manner, with more geographic, historical and industrial detail. The stress to deliver this information while maintaining strict confidentiality will explode. As a result, new estimation and data processing paradigms are being explored.
  13. The use of alternative data sources such as administrative records or Big Data poses a number of opportunities for improving the current construction statistics produced by the Census Bureau and reducing data collection costs for these programs. For example:ForeclosuresData on residential properties in various stages of foreclosure could aid in our analysis of data on new residential construction and sales. These data are currently collected by a couple of private data vendors (for Bill’s info: CoreLogic and Realty Trac). The Census Bureau has purchased address-level files from a data vendor for analysis related to household surveys, but it did not easily allow for calculation of totals needed for analysis of national data. We also purchased annual totals by state from another vendor for use in data analysis; however, the vendor does not allow purchasers to disseminate data to the public.Manufactured HomesCensus conducts the Manufactured Housing Survey (MHS) for the U.S. Department of Housing and Urban Development, or HUD, to provide data that they are required to collect on manufactured home placements. By law, manufactured homes must be inspected at the factory. These inspections are conducted by the Institute for Building Technology and Safety (IBTS), which provides information on the inspections that becomes the universe and sampling frame for the Manufactured Housing Survey. If we could partner withHUD and IBTS as well as manufactured home manufacturers and dealers to follow through on the inspection forms to collect information on the placement of the home, we could use this information to tabulate data on placements. The data would have no sampling error and data collection costs would be drastically reduced.Public ConstructionOur estimates of construction spending include spending on construction funded by federal, state, and local governments, collected using voluntary surveys. 
Much of the information on government spending can be gleaned from publicly available budget documents. We do this to supplement and benchmark the data that we collect, but we could partner with government agencies that conduct construction (especially at the federal level) to obtain data files that would reduce our data collection costs and improve data quality. We have contacts at most agencies, but we have not yet undertaken a concerted effort to obtain the detailed electronic files that we need. Property OwnersCensus Bureau surveys collect information from homeowners on owner-occupied properties. Data on non-owner-occupied properties are more difficult to obtain because the owner of the property must be located. Various administrative sources such as the Business Register, tax data, and local deed records could provide information on property owners and their individual properties. Data on improvements to non-owner-occupied properties are no longer included in the construction spending estimates because the cost of finding the owners was prohibitive. The Residential Housing Finance Survey (RHFS, a HUD-sponsored survey) had the same problem. Reducing the cost could make it feasible to improve the construction spending estimates and would allow Census to conduct other surveys more cost effectively. However, startup costs to create an up-to-date list of property owners could be significant.Building PermitsThe largest opportunity for using administrative records for the construction area is data on building permits issued by local governments. Issuance of building permits in the U.S. is mostly at the local level, where approximately 20,000 unique jurisdictions issue permits. Some states are capturing data on all permits issued in their states, but this is not as prevalent.Building Permits SurveyCensus conducts a monthly and annual Building Permits Survey (BPS) to obtain data on the numbers of new housing units authorized from local jurisdictions. 
Because of cost and respondent burden concerns, data on nonresidential permits and permits for alterations and repairs are not collected.As more and more jurisdictions computerize their operations and more states begin compiling permit data from their jurisdictions, we have the opportunity to capture individual permits (which are public records) for use in our estimates. Information on individual new residential permits could replace the current Building Permits Survey data collection, and it also has tremendous potential for updating the Master Address File used for many household surveys and for the decennial Census. Staff working on this survey are partnering with colleagues in the Census Bureau’s Geography Division to encourage local governments to work toward providing files of permits. Lists of individual permits would also greatly improve the annual population estimates, which currently rely on the use of statistical algorithms to allocate the Building Permits Survey jurisdiction totals to more local areas. Survey of ConstructionThe Survey of Construction (SOC), which collects data on housing starts and new home sales, requires field representatives to list individual permits in a sample of jurisdictions to create the sampling frame. Use of individual permits received from jurisdictions could eliminate this expensive operation. Use of Certificate of Occupancy permits would also eliminate the need to follow up cases in sample until the building is completed. This would reduce the cost of interviewing by about one-third and save up to $1 million per year.To collect data on spending on nonresidential construction, we currently purchase a list of new projects from a third party vendor each month. This list is incomplete and expensive. If we could acquire data on nonresidential permits from local jurisdictions, it would be much less expensive and more complete.There are many opportunities when looking at Big Data for use in official construction statistics. 
There are also many challenges. We have had discussions with our government counterparts about how we could assist governments with automation and with standardizing the format of their data files, but jurisdictions have local regulations and custom computer systems that make standardization challenging. In addition, these surveys are voluntary, so obtaining all permits in the U.S. would not be feasible without legislative changes. An iterative approach would be needed, starting with obtaining information from large jurisdictions with automated systems that are willing to participate.
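
To illustrate the standardization problem described above, here is a minimal Python sketch of how permit records from jurisdictions with different file layouts might be mapped into one common format. Everything specific in it is an assumption made for illustration: the jurisdiction names (`city_a`, `county_b`), the source field names, the date formats, and the `Permit` schema are hypothetical and do not describe any actual Census Bureau system or jurisdiction file.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Permit:
    """Hypothetical common schema for one building permit."""
    jurisdiction: str
    permit_id: str
    issued: date
    housing_units: int


# Hypothetical per-jurisdiction layouts: which source field holds each
# common-schema value, and how that jurisdiction writes dates.
FIELD_MAPS = {
    "city_a": {
        "permit_id": "PermitNo",
        "issued": "IssueDate",
        "housing_units": "Units",
        "date_format": "%m/%d/%Y",
    },
    "county_b": {
        "permit_id": "id",
        "issued": "date_issued",
        "housing_units": "dwelling_units",
        "date_format": "%Y-%m-%d",
    },
}


def normalize(jurisdiction: str, record: dict) -> Permit:
    """Convert one raw record into the common schema using that jurisdiction's map."""
    layout = FIELD_MAPS[jurisdiction]
    return Permit(
        jurisdiction=jurisdiction,
        permit_id=str(record[layout["permit_id"]]),
        issued=datetime.strptime(record[layout["issued"]], layout["date_format"]).date(),
        housing_units=int(record[layout["housing_units"]]),
    )


def units_by_jurisdiction(permits):
    """Sum authorized housing units per jurisdiction from individual permits."""
    totals = {}
    for permit in permits:
        totals[permit.jurisdiction] = totals.get(permit.jurisdiction, 0) + permit.housing_units
    return totals
```

Under these assumptions, adding a jurisdiction means adding one entry to `FIELD_MAPS` rather than writing new processing code, which fits the iterative approach of starting with large, automated jurisdictions and extending coverage over time.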
  14. The first question I was asked to consider is whether the Census Bureau is working on Big Data projects and how these differ from other projects. My initial response is a resounding yes: the Census Bureau is incorporating Big Data solutions to improve the efficiency of its operations throughout the information lifecycle. We are exploring new sources of data and processing techniques to improve our products and increase the efficiency of our operations. The intent is for these projects to result in enterprise-wide solutions that support all surveys and census operations across the Census Bureau. This is different from our current processes and technology, which are developed to support individual surveys and census operations. Examples of these efforts include…