Datacamp
                 Data publishing and management
                 for small and medium organizations




Stefan Urbanek                                        March 2010
stefan@knowerce.sk
Introduction and context
Datacamp
Datacamp ETL Manager
Brewery
under development
Datacamp
Fair-play Alliance





             manage data
    with the quality process of an enterprise


publish data like documents in CMS




For Visitors




     data catalogue   searching   sharing




For Owners




data descriptions   import   quality management




For Remixers



              application
         programming interface




Features
Data Storage

                hosted and managed
                     internally




Refillable Datasets

                dataset – container with
                  predefined structure




Metadata

            localizable descriptions of
                datasets and fields




Number of Datasets




     10      100            1 000          10 000+


          small to medium number of datasets
States: new, active (published), suspended (hidden), deleted (closed)
Actions: import, check/publish, publish, suspend, delete, undelete, destroy
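The publishing states and actions named on this slide can be read as a small state machine. The sketch below is only an illustration: the state and action names come from the slide, but which state each action starts from and leads to is an assumption inferred from the diagram layout, and `apply_action` and `import_dataset` are hypothetical helper names.

```python
# Hypothetical sketch: names come from the slide, the exact
# source and target state of each action are assumptions.
TRANSITIONS = {
    ("new", "check/publish"): "active",      # active = published
    ("active", "suspend"): "suspended",      # suspended = hidden
    ("suspended", "publish"): "active",
    ("active", "delete"): "deleted",         # deleted = closed
    ("suspended", "delete"): "deleted",
    ("deleted", "undelete"): "suspended",    # assumed target state
    ("deleted", "destroy"): None,            # dataset is gone for good
}


def import_dataset():
    """An imported dataset starts in the 'new' state."""
    return "new"


def apply_action(state, action):
    """Return the state that follows `action`, or raise if it is not allowed."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"action {action!r} is not allowed in state {state!r}")
    return TRANSITIONS[key]
```

Keeping the transitions in one table makes it easy to reject actions that the diagram does not allow, such as destroying a dataset that has not been deleted first.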
Application
Programming Interface
<?xml version="1.0" encoding="UTF-8"?>
<dataset-description>
  <category-id type="integer">1</category-id>
  <collection-mode></collection-mode>
  <created-at type="datetime">2009-09-13T09:59:51Z</created-at>
  <data-provider></data-provider>
  <data-source-type></data-source-type>
  <database nil="true"></database>
  <format-rule-id type="integer" nil="true"></format-rule-id>
  <granularity></granularity>
  <id type="integer">5</id>
  <identifier>ds_eurodonations</identifier>
  <!-- remaining fields are cut off on the slide -->
</dataset-description>

dataset and field metadata
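A client could turn such a dataset description into native values by honouring the `type` and `nil` attributes. The following is a minimal sketch using only the Python standard library; the sample document is a shortened, closed copy of the response shown on the slide, and `parse_description` is a hypothetical helper, not part of Datacamp.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Shortened copy of the slide's response, closed so that it parses.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<dataset-description>
  <category-id type="integer">1</category-id>
  <created-at type="datetime">2009-09-13T09:59:51Z</created-at>
  <database nil="true"></database>
  <format-rule-id type="integer" nil="true"></format-rule-id>
  <granularity></granularity>
  <id type="integer">5</id>
  <identifier>ds_eurodonations</identifier>
</dataset-description>
"""


def parse_description(xml_bytes):
    """Convert a <dataset-description> document into a dict of typed values."""
    root = ET.fromstring(xml_bytes)
    record = {}
    for child in root:
        text = child.text or ""
        if child.get("nil") == "true":
            value = None                      # explicit null
        elif child.get("type") == "integer":
            value = int(text)
        elif child.get("type") == "datetime":
            value = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        else:
            value = text                      # untyped fields stay strings
        record[child.tag] = value
    return record


description = parse_description(SAMPLE)
```

The `nil` check comes first because a field such as `format-rule-id` carries both `type="integer"` and `nil="true"` with no text to convert.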
raw data

Command Line Tool
     $ datacamp -h
     Usage: tools/datacamp [-h] [OPTIONS] REQUEST [ARGUMENTS]
     Send REQUEST to a Datacamp application and return server reply.
     Options:
       -b url          specify base URL for Datacamp. Default: http://localhost:3000
       -k api_key      specify API key for accessing Datacamp data
       -f format       request different format, if available. Options are: xml
       -g get_method   method of accessing the datacamp: curl (default), wget

     Environment variables:
       DATACAMP_BASE_URL
       DATACAMP_API_KEY
       DATACAMP_FORMAT
       DATACAMP_GET_METHOD

     Example:
       datacamp version
       datacamp datasets
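The help text lists four options and four matching environment variables. One plausible way to combine them is sketched below. The option and variable names and the defaults for the base URL and get method come from the help text; the precedence (option, then environment variable, then default) and `xml` as the default format are assumptions, and `resolve_settings` is a hypothetical helper rather than the tool's actual code.

```python
import os

# Base URL and get method defaults are stated in the help text.
# "xml" as default format is an assumption (it is the only listed option).
DEFAULTS = {
    "base_url": "http://localhost:3000",
    "api_key": None,
    "format": "xml",
    "get_method": "curl",
}

ENV_NAMES = {
    "base_url": "DATACAMP_BASE_URL",
    "api_key": "DATACAMP_API_KEY",
    "format": "DATACAMP_FORMAT",
    "get_method": "DATACAMP_GET_METHOD",
}


def resolve_settings(options=None, environ=None):
    """Merge explicit options, environment variables and defaults.

    Assumed precedence: option > environment variable > default.
    """
    options = options or {}
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        if options.get(name) is not None:
            settings[name] = options[name]
        elif environ.get(ENV_NAMES[name]):
            settings[name] = environ[ENV_NAMES[name]]
        else:
            settings[name] = default
    return settings
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.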



     new applications
The Application
Future
Short-term


     ■ search engine improvement
     ■ expose predicates API
     ■ “known datacamps”
     ■ public quality rating


Long-term


     ■ “business” rules
     ■ UI for change history
     ■ attachments
      ■ scanned invoices, reports, ...



Datacamp ETL Manager


             slightly more technical
Sources: files with data, web
Flow: staging → ETL (extraction, transformation, loading) → datasets → application
Extraction jobs: company register extraction, public procurement extraction, ...
web → extraction manager → staging (temporary or downloaded files) → loading → datasets
List of Jobs

     ■ job identification
     ■ enabled?
     ■ order in which jobs are run
     ■ schedule – days on which jobs are run
     ■ flag jobs to force run again despite scheduling
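The job attributes listed above suggest how a scheduler might pick the jobs for a given day. The sketch below is an assumption-laden illustration, not the ETL Manager's code: the `Job` fields mirror the bullets, weekdays are numbered with Monday as 0, and it is assumed that a disabled job never runs, even when flagged to force a run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Job:
    name: str                                   # job identification
    enabled: bool = True                        # enabled?
    order: int = 0                              # position in the run order
    schedule: frozenset = frozenset(range(7))   # weekdays, Monday = 0
    force_run: bool = False                     # run again despite scheduling


def jobs_to_run(jobs, today):
    """Return enabled jobs that are scheduled or forced today, in run order."""
    weekday = today.weekday()
    due = [
        job for job in jobs
        if job.enabled and (job.force_run or weekday in job.schedule)
    ]
    return sorted(due, key=lambda job: job.order)
```

For example, a job scheduled only for Tuesdays is skipped on a Monday unless its `force_run` flag is set.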

failed
Current State

     ■ map one table to other
     ■ identity mapping for convenience
     ■ append
     ■ update by key
     ■ compare tables
     ■ automatically finalize loading on success
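The loading operations in this list can be illustrated on plain lists of dictionaries. This is a minimal sketch under stated assumptions: tables are in-memory lists of rows, a mapping is a dict from source field name to target field name, and "update by key" is assumed to add rows whose key is not yet present. All function names are hypothetical.

```python
def map_row(row, mapping=None):
    """Rename fields of a source row; with no mapping, copy it as is (identity)."""
    if mapping is None:
        return dict(row)
    return {target: row[source] for source, target in mapping.items()}


def append_rows(target, source, mapping=None):
    """Append every mapped source row to the target table."""
    target.extend(map_row(row, mapping) for row in source)


def update_by_key(target, source, key, mapping=None):
    """Overwrite target rows sharing `key` with a source row; add the rest."""
    index = {row[key]: row for row in target}
    for row in source:
        mapped = map_row(row, mapping)
        if mapped[key] in index:
            index[mapped[key]].update(mapped)
        else:
            target.append(mapped)
            index[mapped[key]] = mapped


def compare_tables(left, right, key):
    """Return keys only in left, keys only in right, and keys whose rows differ."""
    left_index = {row[key]: row for row in left}
    right_index = {row[key]: row for row in right}
    only_left = sorted(left_index.keys() - right_index.keys())
    only_right = sorted(right_index.keys() - left_index.keys())
    changed = sorted(
        k for k in left_index.keys() & right_index.keys()
        if left_index[k] != right_index[k]
    )
    return only_left, only_right, changed
```

A comparison like this could be run between staging and the published dataset before a load is finalized.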

SQL

Future

     ■ ETL jobs without programming
       ■ no SQL, no Ruby
       ■ covers most of the cases
     ■ parallelisation of jobs
     ■ finer scheduling
     ■ mail notification

we have data, what now?
Thank you
Copyrights and Credits
     ■   Silos by Noodle Snacks: http://commons.wikimedia.org/wiki/File:Maria_Cement_Silos.jpg, CC Attribution, Share Alike 3.0 Unported
     ■   Icons by Oxygen Team: http://www.iconfinder.net/search/1/?q=iconset:oxygen, GPL
     ■   Icons by Alessandro Rei, KDE, GPL
     ■   Angel Wings by *Spyrogs, Deviant Art: http://spirogs.deviantart.com/art/Angel-Wings-Tatoo-87089782
     ■   Coins by Mnemo, Wikimedia Commons, http://commons.wikimedia.org/wiki/File:Swedish_coins_20050924.jpg, CC Attribution, Share Alike 3.0
     ■   Folder icon: Benji Garner, Icon set: Rise, Free for commercial use
     ■   Application icon by Sergio Sanchez Lopez, GPL
     ■   Network icon by Everaldo Coelho, Icon set: Crystal Clear, LGPL





Datacamp @ Transparency Camp 2010
