FROM DATA POINTS TO DATA LAKES
OPEN DATA
DR J ROGEL-SALAZAR
IMPERIAL COLLEGE LONDON AND UNIVERSITY OF HERTFORDSHIRE
j.rogel@physics.org
@quantum_tunnel / @hidden_node
USE #dialogo_opendata
SOCIAL MEDIA
DATA?
LET’S START AT THE BEGINNING
DATA
INTERCONNECTED
KNOWLEDGE
KNOWLEDGE
LINKED
INFORMATION
INFORMATION
STRUCTURED DATA
DATA EVERYWHERE!
• Lots of data is being collected and warehoused
• Scientific studies
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/credit card transactions
• Social networks
HOW MUCH DATA?
• Google processes 100 PB a day (2014)
• Facebook: 600 TB/day (2014)
• Twitter: 100 TB/day (2013/14)
• CERN’s Large Hadron Collider (LHC) generates 15 PB a year
“640K ought to be enough for anybody.”
Source: https://followthedata.wordpress.com/2014/06/24/data-size-estimates/
Image: Maximilien Brice, © CERN
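To put the volumes quoted above on a common footing, here is a rough sketch that converts them into average per-second rates. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a 365-day year; the names `volumes` and `rate_gb_per_s` are illustrative only.

```python
# Convert the daily/yearly volumes quoted on the slide into per-second rates.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

# (volume in bytes, period in seconds), figures as quoted on the slide
volumes = {
    "Google": (100 * PB, SECONDS_PER_DAY),
    "Facebook": (600 * TB, SECONDS_PER_DAY),
    "Twitter": (100 * TB, SECONDS_PER_DAY),
    "LHC": (15 * PB, SECONDS_PER_YEAR),
}


def rate_gb_per_s(num_bytes, seconds):
    """Average throughput in gigabytes per second."""
    return num_bytes / seconds / 10**9


rates = {name: rate_gb_per_s(b, s) for name, (b, s) in volumes.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:,.2f} GB/s")
```

Under these assumptions, 100 PB a day averages out to a little over 1 TB every second, while the LHC figure is under half a gigabyte per second.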
THE EARTHSCOPE
• The Earthscope is also a large science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyses seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more.
1. http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI
TYPES OF DATA
• Relational data (tables/transactions/legacy data)
• Text data (web)
• Semi-structured data (XML)
• Graph data
  • Social networks, Semantic Web (RDF), …
• Streaming data
  • You can only scan the data once
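The "scan once" constraint on streaming data shapes the algorithms you can use. As a minimal sketch, Welford's algorithm computes the mean and sample variance in a single pass without storing the stream. The generator `sensor_readings` and its values are hypothetical.

```python
def running_stats(stream):
    """Single-pass mean and sample variance (Welford's algorithm)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


def sensor_readings():
    """Hypothetical stream: a generator can only be consumed once."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


count, mean, variance = running_stats(sensor_readings())
print(count, mean, variance)
```

Each value is seen exactly once and only three numbers are kept in memory, so the same function works whether the stream has eight items or eight billion.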
WHAT TO DO WITH THESE DATA?
• Aggregation and statistics
  • Data warehouse and OLAP
• Indexing, searching, and querying
  • Keyword-based search
  • Pattern matching (XML/RDF)
• Knowledge discovery
  • Data mining
  • Statistical modeling
THE DATA
• Fundamental to research
• Basis for writing papers
• Important for experiment replication
• Meet contractual/funding requirements
• Settle intellectual property claims
• Defense against a charge of fraud
Images from the front covers of Circulation Research – S. Elliott (Van Eyk Lab)
INDIVIDUAL RESPONSIBILITY
DATA MANAGEMENT
Some aspects to consider:
• Ownership
• Collection
• Storage/protection of confidentiality/sharing
• Interpretation and publication
WHAT IS COPYRIGHT?
- US CONSTITUTION
“To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.”
NOT A TOOL TO CONTROL ALL CONTENT FOREVER IN ALL MEDIA
A SET OF RIGHTS
• The right to reproduce the work
• The right to prepare derivative works
• The right to distribute the work
• The right to perform the work
• The right to display the work
• The right to license any of the above to third parties
HOW?
First, it must meet some basic requirements:
• It must be original.
• It must have some level of creativity.
• It must be in a fixed medium.
In the old days, you would use this symbol: ©
Provide a date and register it.
IT’S INSTANT!
NOWADAYS
Copyright protects…
Writing
Choreography
Music
Visual art
Film
Architectural works
Copyright doesn’t protect…
Ideas
Facts
Data (mostly)
Useful articles (that’s patent)
HOW LONG DOES IT LAST?
FOR NOW…
The life of the author plus 70 years
And then?
THE PUBLIC DOMAIN
GENERAL RULES FOR STATUS
Works No Longer Protected by Copyright
• Published before 1923
• Published between '23 and '63, but it depends.
• Authored by the Federal Government (US)
VERBOSE MODE…
• All works published in the United States before 1923 are in the public domain.
• Works published after 1922, but before 1978 are protected for 95 years from the date of publication. If the work was created, but not published, before 1978, the copyright lasts for the life of the author plus 70 years. However, even if the author died over 70 years ago, the copyright in an unpublished work lasts until December 31, 2002.
• For works published after 1977, the copyright lasts for the life of the author plus 70 years. However, if the work is a work for hire (that is, the work is done in the course of employment or has been specifically commissioned) or is published anonymously or under a pseudonym, the copyright lasts between 95 and 120 years, depending on the date the work is published.
• Lastly, if the work was published between 1923 and 1963, you must check with the U.S. Copyright Office to see whether the copyright was properly renewed. If the author failed to renew the copyright, the work has fallen into the public domain and you may use it.
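The rules above read like a decision procedure, so here is a rough sketch of them as a function. It encodes only the published-work rules exactly as stated on this slide and ignores works for hire, anonymous works and unpublished works. The function name, its parameters and the returned strings are hypothetical, and treating a term as expired once the current year is strictly greater than the final year is an assumption.

```python
def us_copyright_status(pub_year, current_year, renewed=None, author_death_year=None):
    """Classify a published US work using the simplified rules on the slide."""
    if pub_year < 1923:
        return "public domain"
    if pub_year <= 1977:
        if pub_year <= 1963:
            # 1923-1963: status depends on whether the copyright was renewed.
            if renewed is None:
                return "check renewal with the U.S. Copyright Office"
            if not renewed:
                return "public domain"
        # 1923-1977 (renewed where required): 95 years from publication.
        return "public domain" if current_year > pub_year + 95 else "protected"
    # Published after 1977: life of the author plus 70 years.
    if author_death_year is None:
        return "unknown: need the author's year of death"
    return "public domain" if current_year > author_death_year + 70 else "protected"


print(us_copyright_status(1900, 2015))
print(us_copyright_status(1950, 2015, renewed=False))
print(us_copyright_status(1990, 2015, author_death_year=2000))
```

Even this simplified version needs three inputs that are often hard to obtain, which is part of why unlicensed material is hard to share.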
CONFUSED?
Hard to share
WHY SHARE?
ALL OF THEM CAN…
AND SHOULD BE SHARED!
ALL OF THEM WERE BUILT UPON OTHER PEOPLE’S WORK
WHY?
TERENCE TAO BLOG
https://terrytao.wordpress.com
KAGGLE
https://www.kaggle.com/competitions
A SPECTRUM OF RIGHTS
Public Domain (least restrictive) to All Rights Reserved (most restrictive)
WHAT IS OPEN DATA?
SO…
Open data is information that is available for anyone to use, for any purpose, at no cost.
GOOD OPEN DATA
• Can be linked: shared more easily
• Available in a standard format: easily processed
• Guaranteed availability and consistency: easily relied upon
• Traceable: easily trusted
FROM OPEN ACCESS TO OPEN DATA
OVERLEAF
https://www.overleaf.com
DRYAD
http://datadryad.org
DATAVERSE
https://dataverse.harvard.edu
DATA GOV UK
http://data.gov.uk
DATA GOV US
http://www.data.gov
DATOS GOB MEX
http://datos.gob.mx
ANY BIG INSTITUTION COULD PUBLISH OPEN DATA
ANYONE ELSE?
THE GUARDIAN
http://www.theguardian.com/news/datablog/
OPEN DATA 500
http://www.opendata500.com
FIGSHARE
http://figshare.com
UCI MACHINE LEARNING
http://archive.ics.uci.edu/ml/
TRANSPORT FOR LONDON
https://tfl.gov.uk/info-for/open-data-users/
WHAT CAN WE DO WITH IT?
WHERE TO FIND OPEN DATA?
DATAHUB
http://datahub.io
FIGSHARE
http://figshare.com
REGISTRY OF DATA REPOSITORIES
http://www.re3data.org
DATABIB
http://databib.org
DATACITE
https://www.datacite.org
OPENDOAR
http://www.opendoar.org
CKAN
http://ckan.org
GITHUB
https://github.com
GITHUB - DATADRYAD
https://github.com/datadryad
SPIRAL - IMPERIAL COLLEGE
https://spiral.imperial.ac.uk
DSPACE
http://www.dspace.org
SOUNDS GOOD…
NOW WHAT?
SIMPLE GUIDELINES
3 THINGS
• Keep it simple
• Engage early and often
• Address common fears and misunderstandings
4 STEPS
• Choose your dataset(s)
• Licensing
• Make the data available
• Make it discoverable
DATASETS
• Asking the community
• Cost basis
• Ease of release
• Observe peers
LICENSING
Data that doesn’t explicitly have an open license is NOT open data
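The licensing rule can be made concrete with a small check: a dataset counts as open only if its metadata names a recognised open license. This is a minimal sketch; the `license` key, the function name `is_open` and the short list of SPDX-style identifiers are assumptions chosen for illustration, and the list is not exhaustive.

```python
# Illustrative set of open license identifiers (SPDX-style); not exhaustive.
OPEN_LICENSES = {
    "CC0-1.0",
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "PDDL-1.0",
    "ODC-By-1.0",
    "ODbL-1.0",
}


def is_open(metadata):
    """Return True only when the metadata explicitly names an open license."""
    license_id = metadata.get("license")
    return license_id in OPEN_LICENSES


print(is_open({"title": "Bus stops", "license": "ODbL-1.0"}))
print(is_open({"title": "Survey results"}))  # no license stated
```

A dataset with no license field is treated as closed, which mirrors the rule on the slide: silence is not permission.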
COPYRIGHT OVER WORKS YOU CREATE AND ARE ORIGINAL TO YOU.
DATABASE RIGHT OVER COLLECTIONS OF DATA YOU HAVE PUT A SUBSTANTIAL EFFORT INTO OBTAINING, VERIFYING OR PRESENTING (ONLY EU, MEXICO, BRAZIL)
OWNERSHIP
CREATIVE COMMONS LICENSING
KNOW THE TYPES!!
OPEN DATA COMMONS
http://opendatacommons.org
AVAILABILITY
• Data should be complete
• In an (open) machine-readable format
• It should contain metadata
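As a sketch of what these three points can look like in practice, the snippet below writes a complete table to CSV (an open, machine-readable format) and a JSON file of metadata next to it. The file names, metadata fields and sample rows are hypothetical; only the Python standard library is used.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish(rows, fieldnames, metadata, out_dir):
    """Write rows as CSV plus a JSON metadata sidecar; return both paths."""
    out_dir = Path(out_dir)
    data_path = out_dir / "dataset.csv"
    meta_path = out_dir / "dataset.metadata.json"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Record the column names and row count so users can check completeness.
    meta = dict(metadata, columns=fieldnames, row_count=len(rows))
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return data_path, meta_path


# Hypothetical sample data and metadata.
rows = [
    {"station": "A", "temperature_c": 12.5},
    {"station": "B", "temperature_c": 14.1},
]
metadata = {"title": "Example readings", "license": "CC0-1.0", "source": "hypothetical"}

with tempfile.TemporaryDirectory() as tmp:
    data_path, meta_path = publish(rows, ["station", "temperature_c"], metadata, tmp)
    csv_text = data_path.read_text(encoding="utf-8")
    loaded_meta = json.loads(meta_path.read_text(encoding="utf-8"))

print(csv_text)
print(loaded_meta)
```

Keeping the row count and column list in the metadata gives users a cheap way to confirm that the file they downloaded is the complete dataset.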
HOW?
• Your website
• Existing repositories
• Creating your own repository
MAKE IT DISCOVERABLE
• Publish it in public services (Datahub)
• Index it in a catalog (Databib)
• Promote it in your community
• Engage with users
DATA SCIENCE AND DATA LAKES
WHAT IS DATA SCIENCE?
A set of tools and techniques used to extract useful information from data.
An interdisciplinary, problem-oriented subject.
THE INGREDIENTS OF A DATA SCIENTIST
ONE MORE THING!
COMMUNICATION SKILLS
WHAT IS IT?
DATA LAKE
(Diagram labels: Analytics, DW, Hadoop)
YOUR THOUGHTS?
THANKS!
DR J ROGEL-SALAZAR
j.rogel@physics.org
@quantum_tunnel / @hidden_node

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

From Data Points to Data Lakes

  • 1. FROM DATA POINTS TO DATA LAKES OPEN DATA DR J ROGEL-SALAZAR IMPERIAL COLLEGE LONDON AND UNIVERSITY OF HERTFORDSHIRE j.rogel@physics.org @QUANTUM_TUNNEL / @HIDDEN_NODE USE #DIALOGO_OPENDATA SOCIAL MEDIA
  • 2. DATA? LET’S START AT THE BEGINNING DATA INTERCONNECTED KNOWLEDGE KNOWLEDGE LINKED INFORMATION INFORMATION STRUCTURED DATA
  • 3. DATA EVERYWHERE! • Lots of data is being collected and warehoused • Scientific studies • Web data, e-commerce • Purchases at department/grocery stores • Bank/credit card transactions • Social networks HOW MUCH DATA? • Google processes 100 PB a day (2014) • Facebook 600 TB/day (2014) • Twitter 100 TB/day (2013/14) • CERN’s Large Hadron Collider (LHC) generates 15 PB a year 640K ought to be enough for anybody. Source: https://followthedata.wordpress.com/2014/06/24/data-size-estimates/
  • 4. Maximilien Brice, © CERN THE EARTHSCOPE • The Earthscope is also a large science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyses seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. 1. http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI
  • 5. TYPE OF DATA • Relational Data (Tables/Transaction/Legacy Data) • Text Data (Web) • Semi-structured Data (XML) • Graph Data • Social Network, Semantic Web (RDF), … • Streaming Data • You can only scan the data once WHAT TO DO WITH THESE DATA? • Aggregation and Statistics • Data warehouse and OLAP • Indexing, Searching, and Querying • Keyword based search • Pattern matching (XML/RDF) • Knowledge discovery • Data Mining • Statistical Modeling
  • 6. THE DATA • Fundamental to research • Basis for writing papers • Important for experiment replication • Meet contractual/funding requirements • Settle intellectual property claims • Defense against a charge of fraud Images from the front covers of Circulation Research – S. Elliott (Van Eyk Lab) INDIVIDUAL RESPONSIBILITY DATA MANAGEMENT Some aspects to consider: • Ownership • Collection • Storage/protection of confidentiality/sharing • Interpretation and publication
  • 7. WHAT IS COPYRIGHT? - US CONSTITUTION “To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.”
  • 8. NOT A TOOL TO CONTROL ALL CONTENT FOREVER IN ALL MEDIA A SET OF RIGHTS • The right to reproduce the work • The right to prepare derivative works • The right to distribute the work • The right to perform the work • The right to display the work • The right to license any of the above to third parties
  • 9. HOW? First, it must meet some basic requirements: • It must be original. • It must have some level of creativity. • It must be in a fixed medium. In the old days, you would use this symbol: Provide a date and register it. IT’S INSTANT! NOWADAYS
  • 10. Copyright protects… Writing, choreography, music, visual art, film, architectural works. Copyright doesn’t protect… Ideas, facts, data (mostly), useful articles (that’s patent). HOW LONG DOES IT LAST?
  • 11. FOR NOW… The life of the author plus 70 years. And then? THE PUBLIC DOMAIN GENERAL RULES FOR STATUS Works No Longer Protected by Copyright • Published before 1923 • Published between '23 and '63, but it depends. • Authored by the Federal Government (US)
  • 12. VERBOSE MODE… • All works published in the United States before 1923 are in the public domain. • Works published after 1922, but before 1978 are protected for 95 years from the date of publication. If the work was created, but not published, before 1978, the copyright lasts for the life of the author plus 70 years. However, even if the author died over 70 years ago, the copyright in an unpublished work lasts until December 31, 2002. • For works published after 1977, the copyright lasts for the life of the author plus 70 years. However, if the work is a work for hire (that is, the work is done in the course of employment or has been specifically commissioned) or is published anonymously or under a pseudonym, the copyright lasts between 95 and 120 years, depending on the date the work is published. • Lastly, if the work was published between 1923 and 1963, you must check with the U.S. Copyright Office to see whether the copyright was properly renewed. If the author failed to renew the copyright, the work has fallen into the public domain and you may use it. CONFUSED?
  • 13. Hard to share WHY SHARE?
  • 14.
  • 15. ALL OF THEM CAN… AND SHOULD BE SHARED! ALL OF THEM WERE BUILT UPON OTHER PEOPLE’S WORK WHY?
  • 16. TERENCE TAO BLOG https://terrytao.wordpress.com
  • 17. KAGGLE https://www.kaggle.com/competitions Public Domain All Rights Reserved least restrictive most restrictive A SPECTRUM OF RIGHTS
  • 18. WHAT IS OPEN DATA? SO…
  • 19. Open data is information that is available for anyone to use, for any purpose, at no cost. GOOD OPEN DATA • Can be linked: shared more easily • Available in a standard format: easily processed • Guaranteed availability and consistency: easily relied upon • Traceable: easily trusted
  • 20. FROM OPEN ACCESS TO OPEN DATA
  • 21. OVERLEAF https://www.overleaf.com DRYAD http://datadryad.org
  • 22. DATAVERSE https://dataverse.harvard.edu DATA GOV UK http://data.gov.uk
  • 23. DATA GOV US http://www.data.gov DATOS GOB MEX http://datos.gob.mx
  • 24. ANY BIG INSTITUTION COULD PUBLISH OPEN DATA ANYONE ELSE? THE GUARDIAN http://www.theguardian.com/news/datablog/
  • 25. OPEN DATA 500 http://www.opendata500.com FIGSHARE http://figshare.com
  • 26. UCI MACHINE LEARNING http://archive.ics.uci.edu/ml/ TRANSPORT FOR LONDON https://tfl.gov.uk/info-for/open-data-users/
  • 27. WHAT CAN WE DO WITH IT?
  • 28. WHERE TO FIND OPEN DATA?
  • 29. DATAHUB http://datahub.io FIGSHARE http://figshare.com
  • 30. REGISTER OF DATA REPOS http://www.re3data.org DATABIB http://databib.org
  • 31. DATACITE https://www.datacite.org OPENDOAR http://www.opendoar.org
  • 32. CKAN http://ckan.org GITHUB https://github.com
  • 33. GITHUB - DATADRYAD https://github.com/datadryad SPIRAL - IMPERIAL COLLEGE https://spiral.imperial.ac.uk
  • 34. DSPACE http://www.dspace.org SOUNDS GOOD… NOW WHAT?
  • 35. SIMPLE GUIDELINES 3 THINGS • Keep it simple • Engage early and often • Address common fears and misunderstandings
  • 36. 4 STEPS • Choose your dataset(s) • Licensing • Make the data available • Make it discoverable DATASETS • Asking the community • Cost basis • Ease of release • Observe peers
  • 37. LICENSING Data that doesn’t explicitly have an open license is NOT open data COPYRIGHT OVER WORKS YOU CREATE AND ARE ORIGINAL TO YOU. DATABASE RIGHT OVER COLLECTIONS OF DATA YOU HAVE PUT A SUBSTANTIAL EFFORT INTO OBTAINING, VERIFYING OR PRESENTING (ONLY EU, MEXICO, BRAZIL) OWNERSHIP
  • 38. CREATIVE COMMONS LICENSING KNOW THE TYPES!!
  • 39. OPEN DATA COMMONS http://opendatacommons.org AVAILABILITY • Data should be complete • In an (open) machine-readable format • It should contain metadata
  • 40. HOW? • Your website • Existing repositories • Creating your own repository MAKE IT DISCOVERABLE • Publish it in public services (Datahub) • Index it in a catalog (Databib) • Promote it in your community • Engage with users
  • 41. DATA SCIENCE AND DATA LAKES
  • 42. WHAT IS DATA SCIENCE? A set of tools and techniques used to extract useful information from data. An interdisciplinary, problem-oriented subject.
  • 43. THE INGREDIENTS OF A DATA SCIENTIST ONE MORE THING! COMMUNICATION SKILLS
  • 44.
  • 45. WHAT IS IT? DATA LAKE Analytics DW Hadoop YOUR THOUGHTS? THANKS! DR J ROGEL-SALAZAR j.rogel@physics.org @QUANTUM_TUNNEL / @HIDDEN_NODE
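
The licensing and availability guidelines in the slides (an explicit open license, complete data, an open machine-readable format, accompanying metadata) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the list of required metadata keys and the sidecar file layout are hypothetical and do not come from the talk or from any repository's API.

```python
import csv
import json
from pathlib import Path

# Hypothetical minimum set of metadata keys; this is an assumption for the
# sketch, not a formal metadata standard.
REQUIRED_METADATA = ("title", "description", "license", "source")


def find_incomplete_rows(rows, fieldnames):
    """Return the indices of rows that have a missing or empty field."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(name) in (None, "") for name in fieldnames)
    ]


def publish_dataset(rows, fieldnames, metadata, out_dir, name="dataset"):
    """Write rows as CSV plus a JSON metadata sidecar, after basic checks."""
    # Data without an explicit license is not open data, so refuse to publish.
    missing_keys = [k for k in REQUIRED_METADATA if not metadata.get(k)]
    if missing_keys:
        raise ValueError(f"metadata is missing: {missing_keys}")

    # "Data should be complete": reject rows with empty fields.
    incomplete = find_incomplete_rows(rows, fieldnames)
    if incomplete:
        raise ValueError(f"rows with missing values: {incomplete}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{name}.csv"
    meta_path = out_dir / f"{name}.metadata.json"

    # CSV is used here as one example of an open, machine-readable format.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # The sidecar records the metadata together with a description of the file.
    sidecar = dict(metadata, fields=list(fieldnames), row_count=len(rows))
    meta_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return csv_path, meta_path
```

A call such as `publish_dataset(rows, ["station", "count"], {"title": ..., "description": ..., "license": "ODC-BY-1.0", "source": ...}, "out")` would write `out/dataset.csv` and `out/dataset.metadata.json`; the license string is just an example value. Making the result discoverable (publishing it to a public service, indexing it in a catalog) is a separate step that this sketch does not cover.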