Overview of modern software ecosystem for big data analysis

Brief summary of the modern software available today that provides the core infrastructure for collecting and analyzing big data from sensors (the "Internet of Everything"). Presented at the Dec 2015 Trillion Sensors Summit in Orlando, FL.

  1. Overview of Modern Software Ecosystem for Big Data Analysis
     Michael Bryzek, mbryzek@alum.mit.edu / @mbryzek
     Co-Founder and Chairman, Flow Commerce
     Co-Founder and ex-CTO, Gilt Groupe
     Trillion Sensors Summit, Dec 9, 2015
  2. Goals
     ● Give an overview of modern practices in software architecture for high-volume big data applications
     ● Encourage reuse of infrastructure that has already been built, so you can focus on analysis and information
  3. Representational State Transfer (REST): a uniform connector interface
     ● Resources ("nouns")
     ● A clear, limited set of methods
     ● Standard conventions (e.g., authorization)
     The cost of integrating the nth service approaches zero.
     Reference: Roy Thomas Fielding's dissertation, https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf
     Examples: Stripe, Twilio, GitHub
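To make the "uniform interface" idea concrete, here is a minimal, framework-free Python sketch: one hypothetical `sensors` resource, a small fixed set of verbs, and a dispatcher that maps `(method, path)` to a handler. It models requests as plain arguments rather than running an HTTP server, and every name here (`SENSORS`, `dispatch`, the status-code choices) is an assumption for illustration. Because every resource exposes the same few verbs, a client that already talks to one resource needs little new code for the next, which is one way to read the claim that the cost of integrating the nth service approaches zero.

```python
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Any]  # (HTTP status code, response body)

# Hypothetical in-memory store backing a "sensors" resource (the noun).
SENSORS: Dict[str, Any] = {}


def get_sensor(sensor_id: str, body: Any) -> Response:
    if sensor_id not in SENSORS:
        return 404, {"error": "not found"}
    return 200, SENSORS[sensor_id]


def put_sensor(sensor_id: str, body: Any) -> Response:
    # PUT replaces the whole representation: 201 on create, 200 on update.
    created = sensor_id not in SENSORS
    SENSORS[sensor_id] = body
    return (201 if created else 200), body


def delete_sensor(sensor_id: str, body: Any) -> Response:
    if sensor_id not in SENSORS:
        return 404, {"error": "not found"}
    del SENSORS[sensor_id]
    return 204, None


# The uniform interface: a small, fixed set of verbs applied to named resources.
ROUTES: Dict[Tuple[str, str], Callable[[str, Any], Response]] = {
    ("GET", "sensors"): get_sensor,
    ("PUT", "sensors"): put_sensor,
    ("DELETE", "sensors"): delete_sensor,
}
COLLECTIONS = {collection for _, collection in ROUTES}


def dispatch(method: str, path: str, body: Any = None) -> Response:
    """Route a request such as ("GET", "/sensors/t1") to the matching handler."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] not in COLLECTIONS:
        return 404, {"error": "unknown resource"}
    collection, resource_id = parts
    handler = ROUTES.get((method.upper(), collection))
    if handler is None:
        return 405, {"error": "method not allowed"}
    return handler(resource_id, body)
```

For example, `dispatch("PUT", "/sensors/t1", {"unit": "C"})` returns `(201, {"unit": "C"})`, while a following `dispatch("POST", "/sensors/t1")` returns status 405 because POST is not part of this resource's interface.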
  4. Frameworks for REST: API first is the most critical design element
     ● http://apidoc.me *
     ● http://swagger.io
     ● http://apiary.io
     Some companies (including Amazon) focus on the API and care very little about the implementation.
     * My personal open source project
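In an API-first workflow the contract is written and reviewed before any implementation exists. The Python sketch below is only an in-process analogy of that idea, not how apidoc, Swagger, or Apiary work: a `typing.Protocol` plays the role of the agreed contract, and client code is written against it alone, so the implementation behind it can change freely. The names `ReadingsApi`, `InMemoryReadings`, and `average_recent` are hypothetical.

```python
from typing import Dict, List, Protocol


class ReadingsApi(Protocol):
    """Hypothetical contract, agreed on before any implementation is written."""

    def latest(self, sensor_id: str) -> float: ...

    def history(self, sensor_id: str, limit: int) -> List[float]: ...


class InMemoryReadings:
    """One interchangeable implementation; it satisfies ReadingsApi structurally."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def record(self, sensor_id: str, value: float) -> None:
        self._data.setdefault(sensor_id, []).append(value)

    def latest(self, sensor_id: str) -> float:
        return self._data[sensor_id][-1]

    def history(self, sensor_id: str, limit: int) -> List[float]:
        values = self._data.get(sensor_id, [])
        # Guard limit <= 0: a slice like values[-0:] would return the whole list.
        return values[-limit:] if limit > 0 else []


def average_recent(api: ReadingsApi, sensor_id: str, n: int) -> float:
    """Client code that depends only on the contract, never on the implementation."""
    values = api.history(sensor_id, n)
    if not values:
        raise ValueError(f"no readings for {sensor_id!r}")
    return sum(values) / len(values)
```

Swapping `InMemoryReadings` for, say, a class that calls a remote service would not require any change to `average_recent`, which is the property the slide attributes to API-first teams.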
  5. JSON: the “fat-free” alternative to XML
  6. JavaScript Object Notation (JSON)
     It's just JavaScript, the most widely adopted programming language in the world.
     Pros: simple, readable, dense
     Cons: still verbose, no strong typing, CPU overhead
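The "no strong typing" drawback shows up as soon as two producers disagree about a field. In this Python sketch (the payloads and the `parse_temperature` helper are hypothetical), both documents are valid JSON and parse without complaint; catching the string-typed temperature is left to hand-written checks in every consumer.

```python
import json

# Two hypothetical payloads for the same "temperature" field.
GOOD = '{"sensor": "t1", "temperature": 21.5}'
BAD = '{"sensor": "t1", "temperature": "21.5"}'


def parse_temperature(payload: str) -> float:
    doc = json.loads(payload)  # both payloads parse: JSON carries no schema
    value = doc["temperature"]
    # The type check has to be written by hand; bool is excluded because it subclasses int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"temperature must be a number, got {type(value).__name__}")
    return float(value)
```

`parse_temperature(GOOD)` returns `21.5`, while `parse_temperature(BAD)` raises `TypeError` only because of the explicit check.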
  7. Binary protocols: ideal for sensor data
     Key features:
     ● A language to describe the schema
     ● Space efficient
     ● Fast serialization / deserialization
     Leading protocols:
     ● Protocol Buffers: https://developers.google.com/protocol-buffers
     ● Avro: https://avro.apache.org/ (tight integration with Hadoop)
     ● Thrift: https://thrift.apache.org/
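A rough feel for the space savings can be had with Python's standard `struct` module, used here as a stand-in for a schema-based binary format; it is not the Protocol Buffers, Avro, or Thrift wire format, which also handle schema evolution and, in Protocol Buffers and Avro, variable-length integer encoding. The fixed layout and the field names are assumptions for this sketch: both sides agree on the layout in advance, so no field names travel with each reading.

```python
import json
import struct

# Hypothetical fixed layout, known to both sides in advance:
# little-endian int64 timestamp (ms), uint32 sensor id, float64 value.
READING = struct.Struct("<qId")


def encode_binary(ts_ms: int, sensor_id: int, value: float) -> bytes:
    return READING.pack(ts_ms, sensor_id, value)


def decode_binary(buf: bytes) -> tuple:
    return READING.unpack(buf)


def encode_json(ts_ms: int, sensor_id: int, value: float) -> bytes:
    # Field names are repeated in every JSON record.
    return json.dumps({"timestamp": ts_ms, "sensor_id": sensor_id, "value": value}).encode()


reading = (1449619200000, 42, 21.5)  # 2015-12-09T00:00:00Z in ms, sensor 42, value 21.5
binary = encode_binary(*reading)
text = encode_json(*reading)
print(len(binary), len(text))  # prints: 20 60
```

The binary record is a third of the size of the JSON one and decodes without any text parsing; the trade-off is that a reader cannot interpret the bytes without the schema.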
  8. Example: Avro schema definition
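As an illustration (a hypothetical schema, not necessarily the one shown on the slide), a sensor reading could be described in Avro's JSON schema syntax as below, written here as a Python dict. The `conforms` function is a deliberately tiny checker written for this sketch and covers only a few primitive types; in practice an Avro library performs validation and serialization from the schema.

```python
import json
from typing import Any, Dict

# Hypothetical record schema in Avro's JSON schema syntax (primitive types only).
SENSOR_READING_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "SensorReading",
    "namespace": "com.example.sensors",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "timestamp", "type": "long", "doc": "epoch milliseconds"},
        {"name": "value", "type": "double"},
    ],
}

# Toy mapping from a few Avro primitive types to Python checks (unsupported types raise KeyError).
_PRIMITIVE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63,
    "double": lambda v: isinstance(v, float),
    "boolean": lambda v: isinstance(v, bool),
}


def conforms(record: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Check a dict against a flat record schema (toy check, not an Avro library)."""
    fields = schema["fields"]
    if set(record) != {f["name"] for f in fields}:
        return False
    return all(_PRIMITIVE_CHECKS[f["type"]](record[f["name"]]) for f in fields)


print(json.dumps(SENSOR_READING_SCHEMA, indent=2))  # the form stored in a .avsc file
```

Because the schema is plain data, it can be versioned and shared, and code generators can produce typed classes from it at service boundaries.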
  9. Data processing and analytics (diagram source: https://aws.amazon.com/iot/)
  10. Data platforms
      ● https://aws.amazon.com/iot/: Amazon Kinesis, S3, Redshift, IoT
      ● http://influxdata.com: open source time series database and analytics platform*
      ● http://confluent.io: data pipeline / real-time processing, built by Jay Kreps
      ● http://spark.apache.org/: UC Berkeley / Cloudera-led effort
      There is currently a lot of activity and investment in both open source and commercial ventures.
      * I am an investor in InfluxData.
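To give a sense of how data reaches one of these platforms, InfluxDB accepts writes in a text "line protocol": a measurement name, optional comma-separated tags, one or more fields, and an optional timestamp (nanoseconds by default). The Python sketch below only formats a single point; it does not talk to a server, supports float fields only, and the measurement, tag names, and helper names are hypothetical. Escaping follows the documented rules for commas, equals signs, and spaces, and tags are sorted by key, which the InfluxDB documentation suggests for write performance.

```python
import math
from typing import Dict


def _escape_key(s: str) -> str:
    # Tag keys, tag values, and field keys: escape commas, equals signs, and spaces.
    return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(s: str) -> str:
    # Measurement names: escape commas and spaces.
    return s.replace(",", "\\,").replace(" ", "\\ ")


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], ts_ns: int) -> str:
    """Format one point as: measurement[,tag=v...] field=v[,field=v...] timestamp"""
    if not fields:
        raise ValueError("a point needs at least one field")
    for key, value in fields.items():
        if not math.isfinite(value):  # NaN and infinity cannot be written
            raise ValueError(f"field {key!r} is not a finite number")
    tag_part = "".join(f",{_escape_key(k)}={_escape_key(v)}"
                       for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape_key(k)}={float(v)!r}" for k, v in fields.items())
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {ts_ns}"
```

For example, `to_line_protocol("temperature", {"site": "orlando", "sensor": "t1"}, {"value": 21.5}, 1449619200000000000)` yields `temperature,sensor=t1,site=orlando value=21.5 1449619200000000000`.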
  11. Summary and recommendations
      Learning from the history of how software evolved on the internet:
      ● Define standards for interconnectivity (à la REST)
         ○ Avoid standards for data types (e.g., ECG)
      ● Choose simplicity as the number one requirement
         ○ Avoid XML
      ● Adopt existing binary protocols, with code generation at the boundaries
         ○ Avoid creating new protocols focused on the last 5-10% of improvement
      ● Adopt existing messaging / storage platforms for large data sets
      Keeping up to date: https://www.thoughtworks.com/radar
  12. Thank you
      Michael Bryzek, mbryzek@alum.mit.edu / @mbryzek
      Co-Founder and Chairman, Flow Commerce
      Co-Founder and ex-CTO, Gilt
      Trillion Sensors Summit, Dec 9, 2015
