Data Beats Emotions – How DATEV Generates Business Value with Data-driven Decisions

Four years ago we started building a data analytics platform to learn more about how our customers use our on-premises software and how it behaves in the field with regard to function usage, exception rates, and overall performance.

The talk covers the journey we had to take, from an existing web app statistics tracking system to our current, still-evolving Hadoop-based ETL system. It includes the technologies we currently use and our approach to supporting reporting and dashboards.
This new Hadoop platform collects millions of log files every day, transforms them, enriches them with data warehouse data, and analyzes them. The resulting insights help us make data-driven decisions for portfolio management, UX design, and overall software quality improvements with real business value.

You will hear about the dos and don'ts we learned, what we consider best practices, and the new challenges we face while data volume and management awareness are still growing.


  1. DATEV eG: Data Beats Emotions. How DATEV Generates Business Value with Data-driven Decisions. matthias.mueller@datev.de / @bicaluv
  2. Agenda (slide footer: 22.04.2019, Data Beats Emotions)
     • About the data
     • Processing
     • Business values
     • What's next
  3. DATEV – Company
     • Founded in 1966 as a co-operative organization
     • Main business is software for tax consulting, accounting, and legal practice
     • Our customers are mostly tax consultants and their clients
     • B2B market
     • 7,500 employees (1,800 devs)
     • 1 billion euro annual revenue in 2018
     • A typical tax consultant has around 10 employees; a few have up to 1,500
     • 40,000 co-operative members
     • 160,000 companies using our software on behalf of their tax consultants
  4. DATEV – Software
     • On-premises, running at the customer's site (we do have data center applications, but they are not the focus of this talk)
     • MS Windows based, incl. MS SQL Server
     • 250 different applications
  5. About the data
     • Based on in-memory logs generated for every on-prem application
     • Logs include
       – Clicks / tracked user interactions
       – Exceptions
       – Performance data
       – Plus metadata: OS, screen resolution, touch device, UI themes; no IP!
  6. About the data – General Data Protection Regulation (GDPR) Compliance
     • Personal data tracking requires agreement / consent management
     • Dialog shown to each user: no agreement, no tracking data
     • 2 data schemas from the client
       – Actual data with GUID (Globally Unique Identifier, generated at the client site): Click { GUID, [data] }
       – Agreement with GUID and User ID (for data warehouse joins): Agreement { GUID, UserID, [ true | false ] }
     • Essential for handling the right to be forgotten without requiring big data deletes
  7. About the data – GDPR Compliance
     • Click1 { A1, "File.Open" }
     • Agreement { A1, User42, true }
     • Click2 { A1, "File.Quit" }
     • Big Data World …
  8. About the data – GDPR Compliance
     • Click1 { A1, "File.Open" }
     • Agreement { A1, User42, true }
     • Click2 { A1, "File.Quit" }
     • Big Data World …
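
The two-schema idea on the GDPR slides above can be sketched in a few lines of Python. This is only an illustration under assumptions, not DATEV's implementation: the class names, the field names, and the `forget_user` helper are hypothetical, and the sketch assumes that "forgetting" a user means removing the small agreement record so that the click records keyed by GUID can no longer be joined to a User ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    guid: str    # pseudonymous ID generated at the client site
    action: str  # tracked interaction, e.g. "File.Open"


@dataclass(frozen=True)
class Agreement:
    guid: str
    user_id: str   # only the agreement record carries the User ID
    consent: bool


def join_clicks_to_users(clicks, agreements):
    """Attach a user ID to each click whose GUID has a consenting agreement."""
    guid_to_user = {a.guid: a.user_id for a in agreements if a.consent}
    return [(guid_to_user.get(c.guid), c.action) for c in clicks]


def forget_user(agreements, user_id):
    """Assumed approach: drop only the agreement records of this user."""
    return [a for a in agreements if a.user_id != user_id]


# Example records taken from the slide: GUID "A1" belongs to User42.
clicks = [Click("A1", "File.Open"), Click("A1", "File.Quit")]
agreements = [Agreement("A1", "User42", True)]

linked = join_clicks_to_users(clicks, agreements)
anonymous = join_clicks_to_users(clicks, forget_user(agreements, "User42"))
```

In this sketch the large click data set is never rewritten; after `forget_user`, the same clicks simply resolve to no user.
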
  9. About the data – Current Figures
     • Agreements
     • Startup of every application
     • Consent rate: 83%
     • 60 GB log files per day (decompressed)
     • 200 million events per day (6,000/s)
     • Approx. 50 components with 1,250 dynamic trace points
     • 30 billion total client events in the Hadoop cluster
     • 200,000 unique users per day
  10. [Image] © Galusha Photography / fotolia.com
  11. In early 2015 we tried using an online tracking tool. © kirill_makarov / fotolia.com
  12. …starting in 2016 we experimented with [image]. © Henry Schmitt / fotolia.com
  13. …at the end of 2016 it settled down to be a more mature approach. © joerg dirmeitis / fotolia.com
  14. Actual Processing
     • Diagram labels: On-premises, Internet, HTTPS, Tracking Server, ISA, DMZ, Data Center, Hadoop Cluster, Reporting
     • DEV team of 7, including devs, data scientist, master of ceremony, requirements engineer, and product owner
     • OP team of 2, operating the data center platforms
  15. Actual Processing
     • Diagram labels: On-premises, Internet, HTTPS, Tracking Server, ISA, DMZ, Data Center, Hadoop Cluster, Reporting
  16. Actual Processing – Client (HTTPS)
     • Continuous monitoring of client logs using a ring buffer (remember: no individual agreement, no data)
     • On-premises clients send data every 3 hours (sending times are randomly distributed, based on installation time) → continuous flow of data
     • BTW: we do dogfooding for client-site data tracking, e.g. buffer overruns, CPU, and memory usage
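
A minimal Python sketch of the client behaviour described on the slide above: a fixed-size ring buffer that only records with consent, and an upload schedule on a 3-hour grid anchored at the installation time. The class, its `consent` flag, the overrun counter, and `next_send_time` are hypothetical names chosen for illustration; the actual client is a Windows application and its scheduling details are not given in the slides.

```python
from collections import deque
from datetime import datetime, timedelta

SEND_INTERVAL = timedelta(hours=3)  # "send data every 3 hours" from the slide


class LogRingBuffer:
    """Fixed-size in-memory log; the oldest events are overwritten when full."""

    def __init__(self, capacity, consent):
        self._events = deque(maxlen=capacity)
        self._consent = consent
        self.overruns = 0  # number of events lost to overwriting

    def append(self, event):
        if not self._consent:
            return  # no agreement, no data
        if len(self._events) == self._events.maxlen:
            self.overruns += 1
        self._events.append(event)

    def drain(self):
        """Return the buffered events for upload and empty the buffer."""
        events = list(self._events)
        self._events.clear()
        return events


def next_send_time(installed_at, now):
    """Next upload slot on a 3-hour grid anchored at the installation time.

    Assumes now >= installed_at. Because installation times differ per
    client, the slots of all clients are spread across the 3-hour window.
    """
    intervals_done = (now - installed_at) // SEND_INTERVAL
    return installed_at + (intervals_done + 1) * SEND_INTERVAL
```

For example, a client installed at 01:17 would upload at 04:17, 07:17, 10:17, and so on, so uploads from many clients do not arrive all at once.
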
  17. Actual Processing – Ingestion
     • Proprietary protocol to get from ISA to cluster (DMZ)
     • Transfers incoming unsecured data to the secure data center every 5 minutes → continuous flow of data to the Hadoop edge node
  18. Actual Processing – Ingestion
     • CRON & batch: once every night, data gets processed
       – Decompress
       – Filter (valid timestamp, test data)
       – Store and upload to HDFS in file chunks of 100 MB
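
The nightly ingestion steps on the slide above (decompress, filter, split into chunks) might look roughly like this in Python. Everything specific here is an assumption for illustration: the slides do not state the compression format or record layout, so the sketch assumes gzip-compressed JSON lines with hypothetical `ts` and `test` fields, and it treats a timestamp as valid when it parses and is not in the future. The HDFS upload itself is left out.

```python
import gzip
import json
from datetime import datetime

CHUNK_BYTES = 100 * 1024 * 1024  # "file chunks of 100 MB" from the slide


def is_valid(record, now):
    """Keep records that are not test data and carry a plausible timestamp."""
    if record.get("test"):
        return False
    try:
        return datetime.fromisoformat(record["ts"]) <= now
    except (KeyError, TypeError, ValueError):
        return False


def filtered_lines(compressed_blobs, now):
    """Decompress each blob and yield only the lines that pass the filter."""
    for blob in compressed_blobs:
        for line in gzip.decompress(blob).decode("utf-8").splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip unreadable lines
            if isinstance(record, dict) and is_valid(record, now):
                yield line


def chunked(lines, chunk_bytes=CHUNK_BYTES):
    """Group lines into lists whose encoded size stays within chunk_bytes."""
    chunk, size = [], 0
    for line in lines:
        encoded = len(line.encode("utf-8")) + 1  # +1 for the newline
        if chunk and size + encoded > chunk_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += encoded
    if chunk:
        yield chunk
```

Each yielded chunk would then be written as one file and uploaded; a single line larger than the limit still gets a chunk of its own.
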
  19. Actual Processing – ETL Phase 1
     • CRON & batch: once every night, data gets processed
     • Start Spark job for agreement data
     • Start Spark jobs for hot data (window of 5 days)
       – De-duplicate data
       – Add data that was received late
       – Generate ORC files with data partitioned by day
       – Optimize partitions (e.g. delete outdated partitions due to retention policy)
       – Automated check of internal compliance regulations (data must not contain customer-confidential data)
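
The hot-data handling on the slide above can be illustrated without Spark. The following plain-Python sketch is not the Spark job from the talk; it only shows the logic of rebuilding the partitions of a 5-day window so that late arrivals are merged in and duplicates are dropped. The tuple layout, the function names, the `retention_days` parameter, and the choice to count today as part of the window are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

HOT_WINDOW_DAYS = 5  # "window of 5 days" from the slide


def rebuild_hot_partitions(events, today, window_days=HOT_WINDOW_DAYS):
    """Group de-duplicated events by day for the last `window_days` days.

    `events` is an iterable of (event_id, day, payload) tuples and may
    contain duplicates as well as events that arrived days after they
    were recorded.
    """
    oldest = today - timedelta(days=window_days - 1)
    seen = set()
    partitions = defaultdict(list)
    for event_id, day, payload in events:
        if not (oldest <= day <= today):
            continue  # outside the hot window, partition stays untouched
        if event_id in seen:
            continue  # duplicate delivery
        seen.add(event_id)
        partitions[day].append(payload)
    return dict(partitions)


def drop_expired(partition_days, today, retention_days):
    """Return the partition days that are still within the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in partition_days if d >= cutoff]
```

Rewriting only the partitions inside the window keeps the nightly job bounded, while older partitions are touched only when the retention policy removes them.
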
  20. Actual Processing – ETL Phase 2
     • Start Spark jobs to update data for reports
     • Generate ORC files for the star schema (facts and dimensions)
     • Aggregations and calculations for reporting
     • Update the report tool's files incrementally by reading ORC files using Hive ODBC (external tables)
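
To make the star schema step on the slide above concrete, here is a small plain-Python sketch that splits flat events into one dimension table and a fact table and then aggregates over them. It is an illustration only: the talk uses Spark and ORC files, and the field names (`day`, `resolution`, `clicks`) and the single screen-resolution dimension are hypothetical.

```python
from collections import Counter


def build_star_schema(events):
    """Split flat events into a resolution dimension and a fact table."""
    resolution_dim = {}  # natural value -> surrogate key
    facts = []
    for event in events:
        # Reuse the existing key or assign the next free surrogate key.
        key = resolution_dim.setdefault(event["resolution"], len(resolution_dim) + 1)
        facts.append({"day": event["day"], "resolution_key": key, "clicks": event["clicks"]})
    return resolution_dim, facts


def clicks_per_resolution(resolution_dim, facts):
    """Aggregate the fact table and resolve keys back to readable names."""
    totals = Counter()
    for fact in facts:
        totals[fact["resolution_key"]] += fact["clicks"]
    names = {key: name for name, key in resolution_dim.items()}
    return {names[key]: total for key, total in totals.items()}
```

A report such as "Top 10 Screen Resolution" would then read only the compact fact rows plus the small dimension table instead of the raw events.
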
  21. HDP 2.6.5 Production Cluster (diagram)
     • Data center with Rack 1 and Rack 2, each showing edge, master, and worker nodes
     • Node labels: …0001, …0003, …000, …0015, …0016 and …0002, …0004, …0006, …0013, …0014
     • Node specs as labeled: each 48 cores, 512 GB RAM, RHEL 7, with either 16 TB HDD or 1 TB HDD
  22. Reporting: Guided Analytics using [image]. © Saklakova / fotolia.com
  23. Actual Processing – Reporting: 22 different default reports
     • UX (including click counts)
     • Exceptions
     • Performance
  24. Actual Processing – Reporting Example: Top 10 Screen Resolution
  25. Actual Processing – Reporting Example: Top 10 Screen Resolution by Target Market (chart labels: Clients / Companies, Tax Consultants, Data Warehouse, Other, Lawyers)
  26. Actual Processing – Reporting Example: Program Usage by Target Market (chart labels: Clients / Companies, Tax Consultants, Data Warehouse, Member Count, Member Type, Education Institutes, Public Sector, Lecturer, Other)
  27. [Image] © artiemedvedev / fotolia.com
  28. Business Values
     • UX, e.g. optimized screen resolution
     • Check actual program usage of "paid beta testers"
     • A/B comparison (usage and performance)
     • Proof of sales license bundles
     • Performance anomaly detection, e.g. based on OSs
  29. Business Values
     • Discontinuation of over 10 applications and over 30 features within apps → saves hours in dev and support → €
     • Detailed field analysis for a new application → "saved trouble" from 4,500 customers caused by missing features
     • Counting of real SQL Server licenses in use → saves €
  30. [Image] © bluedesign / fotolia.com
  31. Obstacles (© gustavofrazao / fotolia.com)
     • Too many different reports requested
     • Too many domain/application-specific reports
     • Too much domain-specific know-how required
     • Requests to support more data sources like Splunk, AppDynamics, and online apps
  32. Evolve from Guided Analytics… (diagram labels: On-Prem, Statistics Data, Program Statistics, Add. Data Warehouse, Statistics Team only, Producer, Consumers, POs, Standard Reports)
  33. Self-Service Analytics (© vege / fotolia.com)
     • Decentralize analytics
     • Open report generation for more users
     • Supporting ad hoc SQL queries using Hive 3 + LLAP
     • Supporting Excel (remember: Excel is king for BI)
  34. …to Self-Service Analytics (diagram labels: On-Prem, Online, Statistics Data, Source A, Source B, Source …, Program Statistics, Add. Data Warehouse, Data Abstraction, Data Catalog, Reporting Environment, Data Scientist, Power User, Producers, Consumers, Manager, Data Governance Process, Publishing Workflow)
  35. New Challenges (© Neyro / fotolia.com)
     • Data governance / guidance for KPIs
     • Teaching
     • Data literacy
  36. Self-Service Analytics PoC Example: Exception Path Analysis using Kibana + Elasticsearch (chart label: previous)
  37. Self-Service Analytics PoC Example: Exception Path Analysis using Kibana + Elasticsearch (chart label: previous)
  38. Self-Service Analytics PoC Example: Number of Exceptions on DVD after Release using Qlik Sense (example data only)
  39. Self-Service Analytics PoC Example: Top 5 Exceptions by DVDs using Qlik Sense (example data only)
  40. [Image] © abramsdesign / fotolia.com
  41. [Image] © Brian Jackson / fotolia.com