SFScon17 - Patrick Puecher: "Exploring data with Elasticsearch and Kibana"

South Tyrol Free Software Conference
Nov 23, 2017

  1. Exploring data with Elasticsearch and Kibana. Patrick Puecher, Developer. SFScon, November 10th 2017
  2. Elastic Stack (the ELK Stack): Elasticsearch, Kibana, Beats, Logstash
  3. Elasticsearch
     - Distributed, RESTful search engine
     - Based on Lucene
     - Written in Java
     - Apache License
     - APIs: Indices APIs, Document APIs, Search APIs, …
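All of these APIs are plain HTTP endpoints that take and return JSON. As a minimal sketch, assuming a local node at `http://localhost:9200` and a hypothetical index `demo` with type `doc` (the typed `/<index>/<type>/<id>` URL form of the Elasticsearch 5.x era, matching the `enquiry` type used later in the talk), the Python standard library can prepare a Document API request and a Search API request:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def index_request(index, doc_type, doc_id, document):
    """Document API: PUT /<index>/<type>/<id> stores one JSON document."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, query):
    """Search API: POST /<index>/_search with a query DSL body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


put = index_request("demo", "doc", "1", {"title": "Hello SFScon"})
search = search_request("demo", {"match": {"title": "sfscon"}})
print(put.get_method(), put.full_url)        # PUT http://localhost:9200/demo/doc/1
print(search.get_method(), search.full_url)  # POST http://localhost:9200/demo/_search
# Sending is a separate step, e.g. urllib.request.urlopen(search).read()
```

Nothing goes over the wire until `urllib.request.urlopen()` is called on a request. Newer Elasticsearch versions replace the custom type in the URL with `_doc`.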
  4. Kibana
     - Visualize your data
     - Histograms, line graphs, pie charts, …
     - Time series with Timelion
  5. Logstash
     - Server-side data processing pipeline
     - How Logstash works:
       - Inputs: file, syslog, redis, beats, …
       - Filters: split, mutate (convert, rename, add_field, remove_field), date, …
       - Outputs: elasticsearch, file, email, exec, …
  6. Beats
     - Send data from machines to Logstash and Elasticsearch
     - The Beats family:
       - Filebeat: log files
       - Metricbeat: system and service metrics
       - Packetbeat: network data
       - Winlogbeat: Windows event logs
       - Heartbeat (beta): uptime monitoring
  7. Demo time!
     1. Big Data 4 Tourism
        - Input: CSV file
        - Data processing: Java API
        - Visualizing: Kibana
     2. Instagram data
        - Input: JSON files
        - Data processing: Logstash & jq
        - Visualizing: Kibana
  8. Demo 1: Big Data 4 Tourism
     - Collect and visualize accommodation enquiries and bookings
       - Create an Elasticsearch index
       - Tourism Data Collector (https://github.com/idm-suedtirol/big-data-for-tourism): uploads and processes CSV files; written in Java; open source ツ
       - Kibana to visualize
     - Big Data 4 Tourism working group by IDM Südtirol - Alto Adige
       - Brandnamic, HGV, IDM Südtirol - Alto Adige, Internet Consulting, Limitis, LTS, Peer GmbH, SiMedia …
  9. Demo 1: Create Elasticsearch index

```
PUT /tourism-data_2017
{
  "mappings": {
    "enquiry": {
      "properties": {
        "arrival":            { "type": "date", "format": "epoch_millis||date" },
        "departure":          { "type": "date", "format": "epoch_millis||date" },
        "country.code":       { "type": "keyword" },
        "country.name":       { "type": "keyword" },
        "country.latlon":     { "type": "geo_point" },
        "adults":             { "type": "byte" },
        "children":           { "type": "byte" },
        "destination.code":   { "type": "short" },
        "destination.name":   { "type": "keyword" },
        "destination.latlon": { "type": "geo_point" },
        "category.code":      { "type": "byte" },
        "category.name":      { "type": "keyword" },
        "booking":            { "type": "boolean" },
        "cancellation":       { "type": "boolean" },
        "submitted_on":       { "type": "date", "format": "epoch_millis||date||date_hour_minute_second" },
        "length_of_stay":     { "type": "short" }
      }
    }
  }
}
```

Sample CSV row:

```
"2015-01-01","2015-01-03","","2","0","21027","1","1","0","2015-01-01T01:59:00"
```
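The sample row has no header, so the column order below is an assumption inferred from the mapping (arrival, departure, country code, adults, children, destination code, category code, booking, cancellation, submission time); the real Tourism Data Collector may lay it out differently. A minimal Python sketch of turning such a row into a document for the `tourism-data_2017` index, deriving `length_of_stay` as the number of nights between arrival and departure (also an assumption):

```python
import csv
import io
from datetime import date

# Assumed column order of the CSV export (hypothetical, not confirmed by the talk).
COLUMNS = [
    "arrival", "departure", "country.code", "adults", "children",
    "destination.code", "category.code", "booking", "cancellation",
    "submitted_on",
]


def row_to_document(line):
    """Turn one CSV line into a dict matching the tourism-data_2017 mapping."""
    values = next(csv.reader(io.StringIO(line)))
    raw = dict(zip(COLUMNS, values))
    arrival = date.fromisoformat(raw["arrival"])
    departure = date.fromisoformat(raw["departure"])
    doc = {
        "arrival": raw["arrival"],
        "departure": raw["departure"],
        "adults": int(raw["adults"]),
        "children": int(raw["children"]),
        "destination.code": int(raw["destination.code"]),
        "category.code": int(raw["category.code"]),
        "booking": raw["booking"] == "1",
        "cancellation": raw["cancellation"] == "1",
        "submitted_on": raw["submitted_on"],
        # Derived field (assumption): nights between arrival and departure.
        "length_of_stay": (departure - arrival).days,
    }
    if raw["country.code"]:  # an empty value means "unknown", so leave it out
        doc["country.code"] = raw["country.code"]
    return doc


sample = '"2015-01-01","2015-01-03","","2","0","21027","1","1","0","2015-01-01T01:59:00"'
print(row_to_document(sample))
```

Leaving out the empty country code keeps the `keyword` field absent rather than indexing an empty string.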
  10. Demo 1: Tourism Data Collector
  11. Demo 1: Visualize sample data (I)
  12. Demo 1: Visualize sample data (II)
  13. Demo 2: Instagram data
      - Mission: must-see places for a route planner
      - How to get Instagram posts from South Tyrol?
  16. Demo 2: Instagram data
      1. Get a shape file of South Tyrol (http://geoportal.buergernetz.bz.it/)
      2. Use QGIS to create a regularly-spaced grid of points
      3. Export the points as latitude and longitude coordinates
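The talk uses QGIS for the grid. For readers without QGIS, here is a rough Python approximation (not the talk's method): it lays a regular grid over a bounding box instead of clipping it to the South Tyrol shape file, and the box in the example is an assumed, approximate extent of the province:

```python
import math

# Approximate length of one degree of latitude, in metres (spherical Earth).
METERS_PER_DEG_LAT = 111_320.0


def regular_grid(min_lat, min_lng, max_lat, max_lng, spacing_m):
    """Return (lat, lng) points about spacing_m apart inside a bounding box.

    Simplification: the grid is not clipped to the province boundary,
    which the QGIS workflow does via the shape file.
    """
    step_lat = spacing_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    mid_lat = math.radians((min_lat + max_lat) / 2)
    step_lng = spacing_m / (METERS_PER_DEG_LAT * math.cos(mid_lat))
    rows = int((max_lat - min_lat) / step_lat) + 1
    cols = int((max_lng - min_lng) / step_lng) + 1
    return [
        (min_lat + i * step_lat, min_lng + j * step_lng)
        for i in range(rows)
        for j in range(cols)
    ]


# Assumed, approximate bounding box of South Tyrol.
points = regular_grid(46.2, 10.4, 47.1, 12.5, spacing_m=7000)
print(len(points), points[0])
```

The 7 km spacing is a design choice tied to the `distance=5000` (5 km radius) search used later: circles of radius r centred on a square grid cover every point when the spacing is at most r·√2, about 7.07 km here.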
  17. Demo 2: Instagram rate limits & scopes
      - Global rate limits on the Instagram platform (https://www.instagram.com/developer/limits/): 5000 API calls / hour
      - Scopes: public_content, to read any public profile info and media on a user's behalf (applications no longer accepted) :'(
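The rate limit caps how dense the grid can be. Assuming one `media/search` call per grid point per poll and a single access token, polling every 10 minutes (as the Logstash `schedule` and the cron job in this talk do) costs 6 calls per point per hour. A small sketch of that budget:

```python
CALLS_PER_HOUR = 5000  # global rate limit quoted on the slide


def max_grid_points(poll_interval_min, calls_per_hour=CALLS_PER_HOUR):
    """Largest number of grid points that can each be queried once per poll
    without exceeding the hourly call budget."""
    polls_per_hour = 60 / poll_interval_min
    return int(calls_per_hour // polls_per_hour)


print(max_grid_points(10))  # 833, since 5000 / 6 is about 833.3
```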
  18. Demo 2: Instagram search API

Request: https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000

Response:

```
{
  "data": [
    {
      "id": "1614761577805643016_1157147895",
      "user": {
        "id": "1157147895",
        "full_name": "Marc Hochstaffl",
        "profile_picture": "…",
        "username": "marc_hochstaffl"
      },
      "images": {
        "thumbnail": { "width": 150, "height": 150, "url": "…" },
        "low_resolution": {…},
        "standard_resolution": {…}
      },
      "created_time": "1506714602",
      "caption": { … },
      "user_has_liked": false,
      "likes": { "count": 181 },
      "tags": ["sam", "karposfasttrail", "autumn\ud83c\udf41", "ahrntal", "hundskehljoch"],
      "filter": "Normal",
      "comments": { "count": 3 },
      "type": "image",
      "link": "https://www.instagram.com/p/BZoyRmCDf0I/",
      "location": { "latitude": 47.05, "longitude": 12.06667, "name": "Hundskehljoch", "id": 1033509208 },
      "attribution": null,
      "users_in_photo": []
    }
  ],
  "meta": { "code": 200 }
}
```
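Each grid point becomes one request URL of the form shown above. A small sketch that builds it with `urllib.parse.urlencode`; the optional `min_timestamp` parameter mirrors the one the shell script in this talk appends, and `"key"` is a placeholder for a real access token:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.instagram.com/v1/media/search"


def media_search_url(lat, lng, access_token, distance_m=5000, min_timestamp=None):
    """Build a media/search URL for one grid point (search radius in metres)."""
    params = {"lat": lat, "lng": lng, "access_token": access_token, "distance": distance_m}
    if min_timestamp is not None:
        # Restrict results to posts newer than this Unix time.
        params["min_timestamp"] = min_timestamp
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(media_search_url(47.05, 12.06, "key"))
# https://api.instagram.com/v1/media/search?lat=47.05&lng=12.06&access_token=key&distance=5000
```

Note that this legacy v1 API has since been retired by Instagram, so the sketch only documents the request shape used in the talk.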
  19. Demo 2: Create Elasticsearch index template

```
PUT /_template/instagram
{
  "template": "instagram*",
  "mappings": {
    "_default_": {
      "properties": {
        "images":         { … },
        "carousel_media": { … },
        "geoip":          { "type": "geo_point" },
        "users_in_photo": { … },
        "link":           { … },
        "created_time":   { "type": "date", "format": "strict_date_optional_time||epoch_second" },
        "caption":        { … },
        "type":           { "type": "keyword" },
        "tags":           { "type": "keyword" },
        "filter":         { "type": "keyword" },
        "likes.count":    { "type": "integer" },
        "comments.count": { "type": "integer" },
        "location":       { … },
        "id":             { "type": "keyword" },
        "user":           { … }
      }
    }
  }
}
```
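  20. Demo 2: Grab and store posts using Logstash (I)

```
input {
  http_poller {
    # http_poller needs absolute URLs; one entry per grid point
    urls => {
      insta1 => "https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000"
      insta2 => "https://api.instagram.com/v1/media/search?lat=47.049359378811829&lng=12.105570031601609&access_token=key&distance=5000"
      …
    }
    keepalive => false
    cookies => false
    request_timeout => 30
    schedule => { every => "10m" }
    codec => "json"
  }
}

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    # Monthly index, based on @timestamp (set from created_time by the date filter)
    index => "instagram-%{+YYYYMM}"
    # Re-polled posts overwrite themselves instead of creating duplicates
    document_id => "%{id}"
  }
}
```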
  21. Demo 2: Grab and store posts using Logstash (II)

```
filter {
  # One event per post in the "data" array
  split { field => "data" }

  if [data][id] {
    mutate {
      # Inside one mutate, rename runs before convert, so convert the new names
      rename => {
        "[data][created_time]" => "[created_time]"
        "[data][images]" => "[images]"
        "[data][comments][count]" => "[comments_count]"
        …
        "[data][id]" => "[id]"
        "[data][user]" => "[user]"
        "[data][likes][count]" => "[likes_count]"
      }
      convert => {
        "[comments_count]" => "integer"
        "[likes_count]" => "integer"
      }
      add_field => { "geoip" => "%{[location][latitude]},%{[location][longitude]}" }
      remove_field => ["data", "meta"]
    }
    date {
      match => ["[caption][created_time]", "UNIX"]
      target => "[caption][created_time]"
    }
    # Use the post time as @timestamp
    date {
      match => ["[created_time]", "UNIX"]
      remove_field => ["[created_time]"]
    }
  }
}
```
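To make the filter's effect concrete, here is a rough Python equivalent (an illustration, not part of the talk) applied to one decoded search response: one event per element of `data`, counts renamed to `likes_count` and `comments_count`, a `"lat,lon"` string in `geoip` (a format `geo_point` accepts), and `created_time` parsed as Unix seconds into `@timestamp`. Only a subset of fields is copied, since the full rename list is elided on the slide:

```python
from datetime import datetime, timezone


def flatten_response(response):
    """Emulate split + mutate + date from the Logstash filter on one response."""
    events = []
    for post in response.get("data", []):  # split { field => "data" }
        if not post.get("id"):  # if [data][id] { ... }
            continue
        location = post.get("location") or {}
        event = {
            "id": post["id"],
            "user": post.get("user"),
            "images": post.get("images"),
            "location": location,
            "tags": post.get("tags", []),
            "likes_count": int(post.get("likes", {}).get("count", 0)),
            "comments_count": int(post.get("comments", {}).get("count", 0)),
            # date { match => ["[created_time]", "UNIX"] } sets @timestamp
            "@timestamp": datetime.fromtimestamp(
                int(post["created_time"]), tz=timezone.utc
            ).isoformat(),
        }
        if "latitude" in location and "longitude" in location:
            # add_field => { "geoip" => "lat,lon" }
            event["geoip"] = f"{location['latitude']},{location['longitude']}"
        events.append(event)
    return events
```

Unlike the Logstash config, the sketch skips `geoip` for posts without a location, which would otherwise produce a string Elasticsearch cannot parse as a `geo_point`.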
  22. Demo 2: Grab and store posts using the Linux shell

```bash
#!/bin/bash
insta=(
  'https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000'
  'https://api.instagram.com/v1/media/search?lat=47.049359378811829&lng=12.105570031601609&access_token=key&distance=5000'
)

count=0
while [ "x${insta[count]}" != "x" ]; do
  MIN=$(date -d '11 minutes ago' +"%s")   # only ask for recent posts to reduce bandwidth
  URL="${insta[count]}&min_timestamp=$MIN"
  # Emit an index action plus the post itself for the _bulk API
  curl -s "$URL" \
    | jq -c '.data[]
        | .geoip = ((.location.latitude | tostring) + "," + (.location.longitude | tostring))
        | {"index": {"_index": ("instagram-" + (.created_time | tonumber | gmtime | strftime("%Y%m"))), "_type": "feed", "_id": .id}}, .' \
    | curl -s -H 'Content-Type: application/x-ndjson' -XPOST localhost:9200/_bulk --data-binary @- &   # start in background
  if [ $(( (count + 1) % 20 )) = 0 ]; then   # run at most 20 requests in parallel
    wait
  fi
  count=$((count + 1))
done
wait   # let the last batch finish
```

Use a cron job to run the shell script every 10 minutes!
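The jq expression turns every post into two NDJSON lines for the `_bulk` endpoint: an `index` action naming a monthly index (`instagram-YYYYMM`, from `created_time` in UTC) followed by the post itself with a `geoip` field. A Python sketch of the same payload, as an illustration only; unlike the jq version it skips `geoip` for posts without a location instead of writing `"null,null"`:

```python
import json
from datetime import datetime, timezone


def bulk_body(posts, doc_type="feed"):
    """Build an NDJSON _bulk payload: one index action plus one document per post."""
    lines = []
    for post in posts:
        doc = dict(post)
        loc = post.get("location") or {}
        if "latitude" in loc and "longitude" in loc:
            doc["geoip"] = f"{loc['latitude']},{loc['longitude']}"
        # Monthly index name from the post time in UTC, like gmtime | strftime("%Y%m")
        month = datetime.fromtimestamp(int(post["created_time"]), tz=timezone.utc).strftime("%Y%m")
        action = {"index": {"_index": f"instagram-{month}", "_type": doc_type, "_id": post["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body([{"id": "1_2", "created_time": "1506714602",
                  "location": {"latitude": 47.05, "longitude": 12.06667}}]))
```

Because each action carries the post's `_id`, the 1-minute overlap between the 11-minute `min_timestamp` window and the 10-minute cron interval only overwrites documents rather than duplicating them.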
  23. Demo 2: Visualize posts by date (July - August)
  24. Demo 2: Daily rhythm (1 = Monday … 7 = Sunday); highlighted: Sunday, 2 pm - 7 pm
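A heatmap like this needs two numbers per post: the ISO weekday (1 = Monday … 7 = Sunday, as on the slide) and the hour of day. A minimal sketch that computes and counts them from `created_time`; it defaults to UTC, so pass a local time zone such as `zoneinfo.ZoneInfo("Europe/Rome")` (Python 3.9+) to get South Tyrol wall-clock hours:

```python
from collections import Counter
from datetime import datetime, timezone


def weekday_and_hour(created_time, tz=timezone.utc):
    """Return (ISO weekday, hour) for a Unix timestamp; 1 = Monday, 7 = Sunday."""
    moment = datetime.fromtimestamp(int(created_time), tz=tz)
    return moment.isoweekday(), moment.hour


def daily_rhythm(timestamps, tz=timezone.utc):
    """Count posts per (weekday, hour) cell of the heatmap."""
    return Counter(weekday_and_hour(t, tz) for t in timestamps)


print(weekday_and_hour("1506714602"))  # (5, 19): Friday, 19:00 UTC
```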
  25. Demo 2: Top locations by number of posts (I): Riva del Garda, Trento, Bolzano, Tre Cime, Merano
  26. Demo 2: Top tags by number of posts: snukiefulmartinisisters, giuliavalentina, valentinavignali, valentinavignali, querly_official, igworld_global, manueldietrich, photography
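Rankings like these come from a `terms` aggregation, which Kibana builds behind the scenes. A sketch of the underlying query body and of reading the buckets back. `tags` is a `keyword` field in the index template of this demo; the field holding location names (for example `location.name`) is an assumption, because the template elides the `location` mapping:

```python
import json


def top_terms_query(field, size=10):
    """Query body for the `size` most frequent values of a keyword field."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching posts
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def top_terms(response):
    """Extract (value, number_of_posts) pairs from a search response."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


print(json.dumps(top_terms_query("tags", 5)))
```

The body would be sent to `/instagram*/_search`; buckets come back sorted by `doc_count`, highest first.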
  27. Demo 2: Top travellers
  28. Demo 2: Influencer Trentino
  29. Demo 2: Influencer South Tyrol
  30. Demo 2: Glassy human
  31. Data-Driven Advertising