This document provides an overview and introduction to big data. It discusses the technical challenges of big data including issues of volume, variety, velocity and veracity. It also discusses solutions like Hadoop, MapReduce, and big data databases. Additionally, it covers big data analytics including different levels of analytics maturity and techniques like data mining, machine learning, and predictive analytics. Finally, it provides resources for learning more about big data including online courses, sandbox environments, open source tools, and public datasets.
1. Big Data, First Steps
First Steps on Big Data
Alexandre Simundi
Apr 2016
2. Agenda
• What’s all this fuss about?
• Technical Challenges & Solutions
• Big Data Analytics
• Where to find knowledge?
3. Software Engineer / Solutions Architect
Over 11 years of experience in the IT market
Passionate about distributed architecture and
big data technologies
Linkedin: https://cl.linkedin.com/in/simundi
Contact: simundi@bdatalabs.com
Twitter: @simundi
Alexandre
Simundi
9. The real revolution is not in the machine
that calculates data but in the data itself
and how we use it
Big Data: a revolution that will transform how we live, work and think
11. 95% of data is unstructured
Structured
Name | Country | Football team | Favorite food
Alexandre | Brazil | Grêmio | Barbecue
Unstructured
Hi, my name is Alexandre and
I’m from Brazil. I support the
best football team in the world:
Grêmio and my favorite food is
barbecue.
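As a rough illustration of the gap between the two forms, the sketch below pulls a structured record out of a free-text sentence with regular expressions. The sentence, the field names, and the patterns are all assumptions made for this example; real unstructured text rarely follows a single template, which is why it is so much harder to analyze.

```python
import re

# Hypothetical free-text input, modelled on the slide's example sentence.
text = (
    "Hi, my name is Alexandre and I'm from Brazil. I support the best "
    "football team in the world: Grêmio and my favorite food is barbecue."
)

# Hand-written patterns (assumptions for this sketch only).
PATTERNS = {
    "name": r"my name is (\w+)",
    "country": r"I'm from (\w+)",
    "football_team": r"football team in the world: (\w+)",
    "favorite_food": r"favorite food is (\w+)",
}


def extract_record(sentence):
    """Return a dict with one entry per field, or None when a pattern misses."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, sentence)
        record[field] = match.group(1) if match else None
    return record


record = extract_record(text)
print(record)
```

A sentence phrased differently would leave some fields as None, which is the practical difference between a table row and free text.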
12. If you can’t analyze all your data, you are blind to its
opportunities
31. Machine Learning
• Statistics + AI
• Predictive Methods
–Use some variables to
predict some unknown or
future values of other
variables
• Descriptive Methods
–Find human-interpretable
patterns that describe the data
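The two families of methods can be sketched with the standard library alone. The predictive half fits a least-squares line and uses it to estimate an unseen value; the descriptive half counts which item pairs occur together most often. The data sets (ad spend vs. sales, shopping baskets) and all function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Predictive: use one variable (spend) to predict another (sales).
spend = [1, 2, 3, 4, 5]    # hypothetical values
sales = [3, 5, 7, 9, 11]   # hypothetical values, exactly 2 * spend + 1
slope, intercept = fit_line(spend, sales)
predicted_sales = slope * 6 + intercept  # estimate for an unseen spend of 6

# Descriptive: find a human-interpretable pattern (items bought together).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1
most_common_pair, occurrences = pair_counts.most_common(1)[0]

print(predicted_sales, most_common_pair, occurrences)
```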
32. Supervised vs Unsupervised
• Supervised
– Learning in the presence of an expert/teacher
– Training data set is labeled with a class value
– Goal: Predict a class or value label
• Unsupervised
– No knowledge of the output class/value
– Data is NOT labeled
– Goal: learn patterns/groupings
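A minimal sketch of the contrast, assuming toy 2-D points: the supervised part predicts a label from labeled training data with a 1-nearest-neighbor rule, and the unsupervised part groups unlabeled points with a naive k-means. Both algorithms are stand-ins chosen for brevity, and the data, labels, and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_predict(training, point):
    """Supervised: return the label of the closest labeled training example."""
    _, label = min(training, key=lambda item: squared_distance(item[0], point))
    return label


def k_means(points, k, iterations=10):
    """Unsupervised: group unlabeled points around k centroids.

    Uses the first k points as initial centroids (a simplification).
    """
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(centroids[i], p))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Supervised: every training point carries a class label.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.0), "large"),
    ((5.2, 4.9), "large"),
]
prediction = nearest_neighbor_predict(labeled, (4.5, 4.7))

# Unsupervised: the same kind of points, but with no labels at all.
unlabeled = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
clusters = k_means(unlabeled, k=2)

print(prediction, clusters)
```

The supervised call returns a named class because the training data supplied one; the unsupervised call only returns groupings, and it is up to a person to decide what each group means.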
48. Take away
1. Big Data is a new sector of the industry
(Photography vs Film)
2. N = All
3. Build up your strengths, catch up on your
weaknesses
4. Take advantage of open source power
5. Give it back
Editor's Notes
A little story...
In 1826, someone managed to capture and record light for the first time.
In 1890, motion picture cameras were invented and companies were established.
90% of the world’s data was created in the last 2 years
Photo with 5%: http://science-all.com/images/mona-lisa/mona-lisa-06.jpg
Tell about the structured and unstructured data.
How powerful
Talk about the scenarios
If you or your business have a Facebook page, a blog, a TripAdvisor page, or any other page where people leave comments, you should be able to analyze this data.
If watching customers online and not watching what they are doing what they are wearing
Flight tickets
At its core, correlation quantifies the statistical relationship between two data values. A strong correlation means that when one of the data values changes, the other is highly likely to change as well.
With correlation there is no certainty, only probability. But if a correlation is strong, the likelihood of a link is high.
Predictions based on correlations lie at the heart of big data.
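One common way to quantify this relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the temperature and sales figures are invented for illustration, and Pearson is only one of several correlation measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical observations: daily temperature and ice cream sales.
temperature = [20, 22, 25, 27, 30]
ice_cream_sales = [200, 220, 260, 265, 310]

r = pearson(temperature, ice_cream_sales)
print(round(r, 3))
```

A value of r close to 1 or -1 indicates a strong link, while a value near 0 indicates little linear relationship. Either way it describes probability, not cause.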
What Is Data Mining?
Combination of AI and statistical analysis to discover information that is “hidden” in the data
History
Emerged late 1980s
Flourished in 1990s
Roots traced back along three family lines:
Classical Statistics, Artificial Intelligence, Machine Learning
AI uses heuristics to simulate the human brain
Machine Learning blends AI heuristics with advanced statistical analysis
What can be hidden in data?
Associations
Sequences
Classifications
Forecasting
Anomalies
Grouping/Clusters/Segments
Data Businesspeople are those who are most focused on the organization and how data projects yield profit.
Data Creatives. We think of Data Creatives as the broadest of data scientists, those who excel at applying a wide range of tools and technologies to a problem, or creating innovative prototypes at hackathons: the quintessential Jack of All Trades.
Data Developers. We think of Data Developers as people focused on the technical problem of managing data: how to get it, store it, and learn from it.
Data Researchers. One of the interesting career paths that leads to a title like “data scientist” starts with academic research in the physical or social sciences, or in statistics.