My slides on how to use the cloud as a data platform, presented at BigDataWeek 2013 Romania
http://www.eurocloud.ro/en/events/all-there-is-to-know-about-big-data/#.UXZFaUDvlVI
Cloud as a Data Platform
1. Cloud as a Data Platform
What is (Big) Data? Amazon Data Services
2. Andrei Savu
Founder of Axemblr.com
Co-organizer of Bucharest JUG
Lead of Apache Provisionr
Passion for Automation & Data Analysis
Connect with me on LinkedIn
3. @ Axemblr
Data Processing Infrastructure
Deployment Automation on IaaS platforms
Product: Hadoop On-Demand Appliance
Apache Provisionr (Open Source)
Consulting & Professional Services
8. 1. Volume
"Simple models work better with more data"
The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig, and Fernando Pereira, Google
Challenging from a technical perspective
Needs scalable storage
Distributed query engines (massively parallel)
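As a rough illustration of the massively parallel idea, the sketch below counts rows per key by computing a partial aggregate for each partition and then merging the partials. The function names and the `key` field are hypothetical, and threads stand in for the separate nodes a real distributed engine would use; this is not how any particular engine is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(partition):
    """Scatter step: count rows per key inside a single partition."""
    return Counter(row["key"] for row in partition)


def parallel_group_count(partitions, max_workers=4):
    """Gather step: merge the per-partition counts into one result."""
    # Threads only stand in for worker nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_count, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because each partial result is small compared to the raw partition, only the aggregates need to travel over the network.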
9. 2. Velocity
Nothing new for financial traders
Tight feedback loop as competitive advantage
Complex event processing (CEP)
Online stream summarization (estimation)
Online aggregation (key-value stores)
Long term storage for batch processing
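One way to picture online stream summarization is a Count-Min sketch, which estimates per-key counts in fixed memory. The slides do not name a specific technique, so this choice, the class name, and the default sizes are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counts in fixed memory; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Derive one hash per row by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def summarize(events, width=1024, depth=4):
    """Feed an iterable of keys through a sketch, one event at a time."""
    sketch = CountMinSketch(width, depth)
    for key in events:
        sketch.add(key)
    return sketch
```

Memory stays at `width * depth` counters no matter how many events arrive, at the price of occasionally overestimating a count.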
10. 3. Variety
Real-world data is messy and its format evolves over time
Entity Resolution, Language Detection etc.
Mantra: Detect Schema, Annotate, Enrich
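The mantra can be sketched on a list of dictionary records. All three function names, the `_missing` field, and the lookup table are hypothetical and chosen only for this illustration; lookup keys are assumed to be hashable values such as strings.

```python
from collections import defaultdict


def detect_schema(records):
    """Detect: map each field name to the set of type names seen for it."""
    schema = defaultdict(set)
    for record in records:
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


def annotate(record, schema):
    """Annotate: return a copy that lists which known fields are absent."""
    missing = sorted(set(schema) - set(record))
    return {**record, "_missing": missing}


def enrich(record, lookup, key, target):
    """Enrich: add a derived field from a lookup table when the key is known."""
    value = lookup.get(record.get(key))
    if value is None:
        return dict(record)
    return {**record, target: value}
```

A field that maps to more than one type name in the detected schema is a sign that the format changed somewhere along the way.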
12. (Big) data is messy
80% of the effort goes into identifying sources,
integration and cleaning
Messy and disconnected: different systems,
different networks, different departments
Consider data-markets
13. (Big) data has gravity
Tends to attract processing services
The cost of moving may be large
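A back-of-the-envelope calculation shows why moving data can be expensive in time alone. The function below is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 Mbit/s = 10^6 bits per second) and a hypothetical `efficiency` factor for protocol overhead; it ignores transfer fees entirely.

```python
def transfer_hours(size_tb, bandwidth_mbps, efficiency=1.0):
    """Estimate hours needed to move size_tb terabytes over a network link."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8                    # decimal terabytes to bits
    bits_per_second = bandwidth_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600
```

Under these assumptions, `transfer_hours(10, 100)` comes out at roughly 222 hours, more than nine days for 10 TB over a fully used 100 Mbit/s link.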
14. Cloud or in-house?
Cloud:
● for development & exploration
● low usage or variable capacity needs
In-house:
● due to strict regulations
● for performance and cost efficiency
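The trade-off between variable and steady usage can be framed as a break-even point. This is a simplified sketch with hypothetical parameter names: it compares a pay-per-hour cloud rate against a fixed monthly in-house cost for equivalent capacity and ignores staffing, regulation, and performance.

```python
def breakeven_hours(cloud_hourly_rate, inhouse_monthly_cost):
    """Hours of use per month at which cloud and in-house costs are equal."""
    return inhouse_monthly_cost / cloud_hourly_rate


def cheaper_option(hours_used, cloud_hourly_rate, inhouse_monthly_cost):
    """Pick the cheaper option for a given number of hours used per month."""
    limit = breakeven_hours(cloud_hourly_rate, inhouse_monthly_cost)
    return "cloud" if hours_used < limit else "in-house"
```

With made-up prices of 0.5 per cloud hour and 200 per month in-house, the break-even is 400 hours per month: below it the cloud is cheaper, above it in-house wins.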
15. People & Data Science
You need a team that combines: math,
programming and scientific instinct
Building data-science teams
http://radar.oreilly.com/2011/09/building-data-science-teams.html
19. Rule of thumb
"Advice to businesses starting out with big data:
first, decide what problem you want to solve." *
Christer Johnson, IBM’s leader for advanced
analytics in North America
* create data-driven business processes
21. Based on my work at
Magnolia Labs Inc. http://magnolialabs.com/
San Francisco, CA-based company with R&D
in Romania
Various products: RTB (real-time bidding),
Secure Browsing etc.
They are hiring! info@magnolialabs.com