Dockerizing a multi-component Open Data app

Using Docker and Docker Compose to orchestrate a multi-component application with database querying over an HTTP API, data processing with Virtuoso, and statistics visualizations with OpenStreetMap.

Published in: Software

  1. Dockerizing a multi-component Open Data app
     Athens Docker Meetup, June 2016
     Dimitris Negkas, Stergios Tsiafoulis
     dimneg@gmail.com, s.tsiafoulis@gmail.com
  2. Description and Scope
     • LinkedEconomy (http://linkedeconomy.org/) is a publicly available web platform and linked data repository.
     • Its scope is to transform, curate, aggregate, interlink and publish economic data in machine-readable format, to enable:
       • citizen awareness
       • research with unprecedented data
       • evidence-based policy
  3. Data Sources
     Sources currently used:
     • Transparency - DIAVGEIA
     • Central Electronic Registry of Public Procurement - E-Procurement
     • National Strategic Reference Framework (NSRF)
     • Central Market of Thessaloniki (CMT)
     • e-Prices
     • Fuel Prices
     • Municipality of Athens, Municipality of Thessaloniki
     • Government of Australia
  4. Data growth
     • We use OpenLink Virtuoso for 15 different sources, holding nearly 1B triples.
     • We host 27 datasets in CKAN from 15 organizations.
     • The data volume grows every month.
  5. Data processing
     • Each data source is handled and processed separately, because the available data are not provided uniformly or in a machine-readable format.
     • Diavgeia, NSRF and the observatories for product and fuel prices provide a rich API that can easily be queried for machine-readable data in JSON format.
     • For E-Procurement, CMT and the Municipalities of Athens and Thessaloniki there is no API available. We have therefore developed a software module that gathers online information automatically and stores it in a machine-readable format.
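
The two harvesting paths on this slide (JSON from an API, scraping where no API exists) could look roughly like the Python sketch below. It is not the project's actual module: the function names, the `decisions` key and the table layout are hypothetical, and a real harvester would also need fetching, paging and error handling.

```python
import json
from html.parser import HTMLParser


def records_from_api(payload, key="decisions"):
    """Parse a JSON API response and return the list stored under `key`.

    `key` is a hypothetical field name; each real API defines its own schema.
    """
    data = json.loads(payload)
    return list(data.get(key, []))


class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows without <td> cells (e.g. headers) are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def records_from_html(html, columns):
    """Turn the data rows of an HTML table into dicts keyed by `columns`."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return [dict(zip(columns, row)) for row in scraper.rows]
```

Both functions return a list of plain dictionaries, so the later processing steps do not need to know which kind of source a record came from.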
  6. General Architecture
     • Process model
       • Open economic data related to public budgeting, spending and prices are characterized by high volume, velocity, variety and veracity.
       • We have to build custom components under the common logic of transforming static data into linked open data streams.
  7. Process model: Nucleus
     • The nucleus of our approach is semantic modelling, data enrichment and interconnection.
     • Data are stored in raw form (as harvested from the sources), in RDF and in JSON.
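
To make the "raw, RDF and JSON" storage idea concrete, here is a minimal Python sketch that serializes one harvested record both as N-Triples and as JSON. The `http://example.org/` namespace, the property naming scheme and the function names are assumptions for illustration only; the slides do not show the project's actual vocabulary, and this sketch emits every value as a plain string literal.

```python
import json
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not the project's


def escape_literal(value):
    """Escape the characters that N-Triples string literals require."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(record_id, record):
    """Emit one triple per field, with every value as a plain literal."""
    subject = "<%sresource/%s>" % (BASE, quote(record_id, safe=""))
    lines = []
    for key in sorted(record):
        predicate = "<%sontology/%s>" % (BASE, quote(key, safe=""))
        literal = escape_literal(str(record[key]))
        lines.append('%s %s "%s" .' % (subject, predicate, literal))
    return "\n".join(lines)


def to_json(record_id, record):
    """Serialize the same record as a JSON document."""
    doc = {"id": record_id}
    doc.update(record)
    return json.dumps(doc, sort_keys=True)
```

A real pipeline would map fields to an agreed ontology and use typed literals or IRIs where appropriate.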
  8. Process model: Data distribution
     • Enriched data are distributed through five channels:
       1. Data dumps (CKAN)
       2. SPARQL queries
       3. Web
       4. Social media
       5. Structured inputs to Business Intelligence (BI) systems
     • Additionally, data can be further analysed and exchanged with relevant platforms (e.g. SPARQL to R).
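
The SPARQL channel can be consumed over plain HTTP. The Python sketch below builds a GET request for a SPARQL endpoint and flattens a standard SPARQL JSON result. The endpoint URL is an assumption (port 8890 as published in the Virtuoso compose service, with Virtuoso's default `/sparql` path), and the `format` parameter is a Virtuoso convenience rather than part of the SPARQL protocol.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://localhost:8890/sparql"  # assumption: local Virtuoso container


def sparql_url(query, endpoint=ENDPOINT, graph=None):
    """Build a GET URL for a SPARQL query, asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    if graph:
        params["default-graph-uri"] = graph
    return endpoint + "?" + urlencode(params)


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, graph=None):
    """Send the query and return the flattened rows (needs a live endpoint)."""
    with urlopen(sparql_url(query, endpoint, graph), timeout=30) as resp:
        return bindings_to_rows(resp.read().decode("utf-8"))
```

For example, `run_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")` would return up to five rows, provided an endpoint is reachable at the assumed address.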
  9. Process model: Validation and messenger
     • The validation component runs throughout the whole process in order to safeguard high data quality by detecting errors.
     • The messaging component works as an internal messaging and alert system for all components.
  10. Process flow
  11. Infrastructure
                Functionalities / Components      Services / Data sources
      VM1       linkedeconomy.org                 apache, php, mysql, drupal
      VM2       SPARQL endpoint, demo site        OLV, apache, php, mysql, drupal
      VM3       Harvester                         CouchDB, Lucene, apache, mysql / CKAN (Greek Datasets)
      VM4       Harvester, Messenger              mysql, LinkedEconomy dropbox
      VM5       Storage - Secondary triplestore   CouchDB, OLV, CouchDB-Lucene, docker
      VM6       Harvester                         apache, php, mysql, drupal / CKAN (Foreign Datasets)
      VM7       SPARQL endpoint                   OLV (Foreign graphs)
      VM8       Management                        JIRA, mysql, tomcat
      VM9       Dashboard                         front-end, CMS, INSPINIA
      VM10      System administration             VPN, firewalls, etc.
      Physical  Storage - Core triplestore        OLV (Greek graphs)
      As core infrastructure we use ~okeanos, an established cloud-based service provided for the Greek research and academic community.
  12. LinkedEconomy
  13. CKAN
  14. “Hottest” Prices per municipality
  15. Supermarkets Geoinformation
  16. Application System (diagram)
      Components: Small Applications (Java, PHP and UNIX scripts), Di@vgeia, KHMDHS, ePrices, fuelPrices, Virtuoso, CouchDB, Drupal, MySQL, CKAN, QGIS
  17. Dockerize the System (diagram)
      Components: Di@vgeia, KHMDHS, ePrices, Virtuoso, Drupal, MySQL, QGIS Desktop, QGIS Server, CouchDB, Small Applications, CKAN
  18. With Compose 2
  19. Docker MySQL
        version: '2'
        services:
          mysql:
            build: ./mysql-docker/5.6
            container_name: eLodDrupalmySQL
            volumes:
              - /mysql_drupal:/var/lib/mysql
            environment:
              - MYSQL_DATABASE=drupalelod
              - MYSQL_ROOT_PASSWORD=eLodmysqlpass
            restart: on-failure
      Slide notes:
      • volumes: save your data!
      • build: will build the image from your directory.
      • restart: do not use the “always” flag in your development environment!
  20. Docker Drupal
          drupal:
            build: ./docker-drupal
            command:
              - /start.sh
            depends_on:
              - mysql
            container_name: eLodDrupal
            #image: eLodDrupal
            ports:
              - "8081:80"
            volumes:
              - "/data_drupal:/var/www/html"
            links:
              - "mysql"
            environment:
              - MYSQL_DATABASE=drupalelod
              - MYSQL_USER=root
              - MYSQL_PASSWORD=eLodmysqlpass
              - DRUPAL_ADMIN_PW=eLODDR
              - DRUPAL_ADMIN=admin
              - MYSQL_HOST=eLodDrupalmySQL
              - DRUPAL_ADMIN_EMAIL=stetsiafoulis@gmail.com
            restart: on-failure
      Slide notes:
      • depends_on: will start the service only after the MySQL service.
      • links: will link the container with the MySQL container.
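
One caveat on `depends_on`: in Compose file format 2 it only orders container start-up; it does not wait until MySQL is ready to accept connections. A start script such as `/start.sh` may therefore want to poll the database port first. The Python sketch below shows one way to do that. The function name is hypothetical, port 3306 is MySQL's default rather than something shown on the slides, and an open port is only a rough readiness signal.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the Drupal container this could be called as `wait_for_port("eLodDrupalmySQL", 3306)` before the site installation starts, using the host name from `MYSQL_HOST`.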
  21. Docker Virtuoso
          virtuoso:
            build: ./docker-virtuoso
            container_name: eLodVirtuoso
            ports:
              - "8890:8890"
            volumes:
              - /virtuoso/db:/var/lib/virtuoso/db
            environment:
              - DBA_PASSWORD=eLodVir
              - SPARQL_UPDATE=true
              - DEFAULT_GRAPH=http://localhost:8890/DAV
            restart: on-failure
  22. Docker QGIS
          qgisdesktop:
            #image: kartoza/qgis-desktop:2.14
            build: ./qgis-desktop/2.14
            hostname: qgis-server
            volumes:
              # Wherever you want to mount your data from
              - ./gis:/gis
              # Unix socket for X11
              - "/tmp/.X11-unix:/tmp/.X11-unix"
            links:
              - db:db
            environment:
              - DISPLAY=unix:1
            command: /usr/bin/qgis
  23. Build the system
      • Clone the repository from GitHub: https://github.com/stetsiafoulis/eLOD
      • Create the directories where you are going to link your data.
      • Run docker-compose up -d and that's it!
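
The three build steps can be scripted. The Python sketch below is one possible wrapper, under stated assumptions: the directory list is taken from the host paths in the compose `volumes` entries on the earlier slides, the compose file is assumed to sit at the root of the cloned repository, and the function names are hypothetical.

```python
import os
import subprocess

# Host paths taken from the compose "volumes" entries on the slides.
DATA_DIRS = ["/mysql_drupal", "/data_drupal", "/virtuoso/db"]


def create_data_dirs(root="/"):
    """Create the host directories that the containers mount as volumes."""
    created = []
    for directory in DATA_DIRS:
        path = os.path.join(root, directory.lstrip("/"))
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


def build_commands(repo_url="https://github.com/stetsiafoulis/eLOD",
                   checkout_dir="eLOD"):
    """Return the clone and start commands as argument lists."""
    return [
        ["git", "clone", repo_url, checkout_dir],
        ["docker-compose", "up", "-d"],
    ]


def build_system(checkout_dir="eLOD"):
    """Clone, prepare directories, and start all services in the background."""
    clone, up = build_commands(checkout_dir=checkout_dir)
    subprocess.run(clone, check=True)
    create_data_dirs()
    # Assumption: docker-compose.yml lives at the repository root.
    subprocess.run(up, check=True, cwd=checkout_dir)
```

Calling `build_system()` requires git, Docker and docker-compose on the host, plus permission to create directories under `/`.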
  24. Why Docker?
      • Portable
      • Lightweight
      • Move to different cloud infrastructures and to physical servers
      • Run on virtual machines for development and testing
      • Easily scale
      • Easy delivery and deployment
      • Run anywhere (regardless of host distro, physical or cloud)
      • Run anything
  25. What’s Next?
  26. Scaling per Source (diagram)
      Each source (Di@vgeia, KHMDHS) gets its own stack: Virtuoso, Drupal, MySQL, CouchDB, QGIS Server, QGIS Desktop, Small Applications.
  27. Run Small Apps through Docker API (diagram: Small Applications)
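
As an illustration of running a small application through the Docker API, the Python sketch below talks to the Docker Engine's HTTP API over its Unix socket, using the `POST /containers/create` and `POST /containers/{id}/start` endpoints. The helper names are hypothetical, the socket path is Docker's usual default, and the image is assumed to be available locally already.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that uses a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def create_payload(image, command):
    """Build the JSON body for POST /containers/create."""
    return {"Image": image, "Cmd": list(command)}


def run_small_app(image, command, socket_path="/var/run/docker.sock"):
    """Create and start a container; return its ID (needs a running daemon)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("POST", "/containers/create",
                     body=json.dumps(create_payload(image, command)),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        data = json.loads(resp.read())
        if resp.status != 201:
            raise RuntimeError("create failed: %s" % data)
        container_id = data["Id"]
        conn.request("POST", "/containers/%s/start" % container_id)
        resp = conn.getresponse()
        resp.read()
        if resp.status not in (204, 304):
            raise RuntimeError("start failed with HTTP %d" % resp.status)
        return container_id
    finally:
        conn.close()
```

A harvester packaged as an image could then be launched with something like `run_small_app("elod/harvester", ["python", "harvest.py"])`, where both the image name and the command are placeholders.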
  28. Next Steps - Swarm
      Services: Virtuoso, Drupal, MySQL, CouchDB, QGIS Server
      • Cluster management
      • Scaling
      • State reconciliation
      • Multi-host networking
      • Service discovery
      • Load balancing
  29. Next Steps - Consul
      • Health checking
      • Service discovery
      • Multi-datacenter support
  30. Any Questions?
  31. Appendix - Data Sources links
      • LinkedEconomy: http://linkedeconomy.org/ (linkedeconomy@gmail.com)
      Sources currently used:
      • Transparency - DIAVGEIA: https://diavgeia.gov.gr
      • Central Electronic Registry of Public Procurement - E-Procurement (KHMDHS): http://www.eprocurement.gov.gr
      • National Strategic Reference Framework (NSRF): https://www.espa.gr/en
      • Central Market of Thessaloniki (CMT): http://www.kath.gr/
      • e-Prices: http://www.e-prices.gr/
      • Fuel Prices: http://www.fuelprices.gr/
      • Municipality of Athens: https://www.cityofathens.gr/khe/proypologismos
      • Municipality of Thessaloniki: http://www.thessaloniki.gr/portal/page/portal/DioikitikesYpiresies/GenDnsiDioikOikonYpiresion/DnsiDiafanEksipirDimoton/Tmima Diafaneias/AnoiktiDdiathesiDedomenon/DimosiefsiEktelesisProipologismou/ektelesi-proypologismou
      • Government of Australia: http://data.gov.au/
