Big data 4 webmonday
1. BigData - ...what is it all
about? Daniel Koller, @dakoller,
http://blog.dakoller.net
2. Data has always been there
http://www.flickr.com/photos/charlestilford/2552654321/in/photostream/
3. A Boeing jet produces 20 TB
data in flight ...
http://www.flickr.com/photos/idfonline/5707336691/in/photostream/
4. A Boeing jet produces 20 TB
data in flight ... PER HOUR!
http://www.flickr.com/photos/idfonline/5707336691/in/photostream/
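To put the 20 TB per hour figure into perspective, it can be converted into a sustained data rate. The sketch below assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify; the function name is only illustrative.

```python
# Convert a volume per hour into a sustained rate in GB/s.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TB = 10**12
GB = 10**9


def sustained_rate_gb_per_s(tb_per_hour: float) -> float:
    """Return the average data rate in GB/s for a given volume in TB per hour."""
    bytes_per_second = tb_per_hour * TB / 3600  # 3600 seconds per hour
    return bytes_per_second / GB


rate = sustained_rate_gb_per_s(20)
print(f"{rate:.2f} GB/s")
```

Under these assumptions the jet would have to move several gigabytes every second, which is far more than a single conventional disk can write.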
5. New dimensions of
data
• Walmart handles more than 1 million customer
transactions every hour, feeding databases of > 2.5 petabytes -
the equivalent of 167 times the information
contained in all the books in the US Library of
Congress.
• Facebook handles 40 billion photos from its user
base.
• Decoding the human genome originally took 10 years to
process; now it can be achieved in one week.
6. ...a kind of summary:
"Tools and techniques to
manage different types of data,
in high volume, in high velocity
with varied requirements to
mine them"
• Size: Scale up and scale out (Terabyte, Petabyte …)
• Stream: Torrent of real-time information
• Structure: Structured; Unstructured (Audio, Video, Text, GeoSpatial); Schema-less structures
• Operation: Massively Parallel Processing (MPP)
7. Which techniques can
you use to handle it?
• Machine Learning
• Natural Language Processing (NLP)
• Cohort Analysis
• Network or Path Analysis
• Visualization
• Predictive Models
• Time-series Analysis
• Crowd Sourcing
• Regression Models
• Sentiment Analysis
• Processing Signals
• Spatial Analytics
8. Techniques: Machine
learning
• "A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E" (Tom M. Mitchell)
• Supervised / unsupervised learning
• Use cases: computer vision, recommender systems (Netflix prize), self-driving cars
• http://en.wikipedia.org/wiki/Machine_learning
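Mitchell's definition can be made concrete with a toy supervised learner. The following is a minimal sketch, not taken from the talk: the task T is classifying numbers as "at least 3" or not, the performance measure P is accuracy on a fixed test set, and the experience E is the list of labeled training examples. The data and the choice of a 1-nearest-neighbor rule are assumptions made purely for illustration.

```python
# Toy 1-nearest-neighbor classifier (illustrative data, not from the slides).
# T: decide whether x >= 3; P: accuracy; E: labeled training examples.

def predict(train, x):
    """Return the label of the training example closest to x."""
    nearest = min(train, key=lambda example: abs(example[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(predict(train, x) == label for x, label in test)
    return hits / len(test)


# Fixed test set: the numbers 0..9 with their true labels.
test_set = [(x, x >= 3) for x in range(10)]

# Little experience: only two labeled examples.
small = [(0, False), (9, True)]
# More experience: two additional examples near the decision boundary.
large = small + [(2, False), (3, True)]

print(accuracy(small, test_set))  # 0.8
print(accuracy(large, test_set))  # 1.0
```

With only two examples the learner misclassifies 3 and 4; after seeing two more examples its accuracy on the same task improves, which is exactly the sense of "learning" in the definition.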
9. Techniques: Natural Language
Processing (NLP)
• Covers:
• stopword identification,
• entity recognition,
• machine translation,
• parsing & chunking of sentences
• Useful wherever user-generated content appears
• Very good support for English, good support for European languages,
limited support for other languages
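As a small illustration of the first item, stopword identification, here is a minimal sketch using only the Python standard library. The stopword list is a tiny hand-picked assumption for English; real NLP toolkits ship much larger, language-specific lists.

```python
import re

# Hypothetical, deliberately tiny English stopword list (assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "in", "of", "and", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that carry little meaning on their own."""
    return [token for token in tokens if token not in STOPWORDS]


tokens = tokenize("The flight to Berlin is late")
print(remove_stopwords(tokens))  # ['flight', 'berlin', 'late']
```

The remaining tokens are what later steps such as entity recognition or sentiment analysis would typically work on.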
10. Techniques: Spatial
Analytics
• Discover geographic
contexts in an
information source
• Requires localizable data
(e.g. location names,
coordinates) of some
quality
• Examples: visualize social
networks, black death in
Europe (see on the
right), Google Flu Trends
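Once data carries coordinates, a basic building block of spatial analytics is the distance between two points on the globe. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the city coordinates are approximate values chosen for illustration and do not come from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate coordinates (assumed for this example).
munich = (48.137, 11.575)
berlin = (52.520, 13.405)

print(round(haversine_km(*munich, *berlin)), "km")
```

With such a function, records can be filtered by radius around a location or grouped by proximity before they are put on a map.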
11. Which techniques can
you use to handle it?
• Machine Learning
• Natural Language Processing (NLP)
• Cohort Analysis
• Network or Path Analysis
• Visualization
• Predictive Models
• Time-series Analysis
• Crowd Sourcing
• Regression Models
• Sentiment Analysis
• Processing Signals
• Spatial Analytics
12. 3 Items to take home
• You can solve tasks now
which previously were
just not possible due to
limited resources.
• State your business
problem before looking
at the data.
• Try a combination of
different techniques to
optimize the result.