Aimed at executives and analysts at medium-sized and large companies, Big Data Spain held a talk ahead of the second edition of the conference, on 7 and 8 November 2013.
YouTube video: https://www.youtube.com/watch?v=6HbWErRCD1g
Want to know more?
http://www.paradigmatecnologico.com/
Oscar Méndez, co-founder of www.paradigmatecnologico.com and www.stratio.com, spoke about Big Data from a business point of view and cleared up doubts about the cost and resources needed to take advantage of this technology.
Post-Hadoop v2.0 platforms allow fast, simple deployment of integrated tools for data mining, data processing, data analysis and data visualization. The advances of the last 12 months leave behind the limitations of traditional Business Intelligence systems.
2. Big Data
Is it a real need or just trendy?
Why does it apply to my case?
3. Petabytes: Google 300 PB, Facebook 45 PB, Yahoo! 180 PB
Exabytes: U.S. healthcare
Zettabytes: 1.8 ZB created in 2011; world information 9.57 ZB
Yottabyte, Brontobyte, Geopbyte: still to be reached
I do not have such a big volume of data
A big European company = Terabytes
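To put the figures on this slide on one scale, the quantities can be converted to bytes. This is a minimal sketch, assuming decimal (SI) prefixes; the helper name `to_bytes` is hypothetical, and Brontobyte and Geopbyte are left out because they are informal names rather than SI units.

```python
# Decimal (SI) prefixes are assumed; binary prefixes would use powers of 1024.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21, "YB": 10**24}

def to_bytes(value, unit):
    """Convert a quantity such as (300, "PB") to bytes."""
    return value * UNITS[unit]

google = to_bytes(300, "PB")
# 1.8 ZB written as 18 ZB / 10 so the result stays an exact integer.
created_2011 = to_bytes(18, "ZB") // 10

# How many times the 300 PB figure fits into the 1.8 ZB created in 2011.
print(created_2011 // google)
```

On these figures, the data created in 2011 is several thousand times the largest single-company volume quoted on the slide.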
4. But I could have it, or will:
An ever-increasing amount of data, and more heterogeneous:
Ubiquity, mobility, geolocation, social networks, internet, sensors, M2M
CRMs, call centers, emails, documents, logs, voice…
5. "There were 5 exabytes of information created by the entire world between the
dawn of civilization and 2003. Now that same amount is created every two days."
Google CEO Eric Schmidt
6. Unstructured or semi-structured data, equal to 85% of available data, is not used by companies
This represents the new fuel for companies
7. 83% of the surveyed companies were
able to do things with Big Data that
seemed impossible to achieve before
“The art of possible”
“Impossible is not a fact, it’s an opinion”
8. Value and real ROI are the best KPIs
• Increase in client acquisitions
• Increase in sales
• Resource optimization
• Customer loyalty
15. Extract value from data at any point of its life cycle
• Past: Stored data, Batch mode
• Present: Current data flows, Real time
• Future: Data and future actions, Predictive
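The three points of the data life cycle above can be illustrated with a toy series. This is only a sketch under stated assumptions: the function names and the `sales` figures are hypothetical, and the "predictive" step is a naive linear extrapolation standing in for a real model.

```python
from statistics import mean

def batch_total(stored):
    """Past: process all stored records in one pass (batch mode)."""
    return sum(stored)

def running_mean(stream):
    """Present: update the result as each new value arrives (real time)."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

def forecast_next(history):
    """Future: naive prediction, the last value plus the average step so far."""
    steps = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + mean(steps)

sales = [10, 12, 14, 16]  # hypothetical daily figures

print(batch_total(sales))
print(list(running_mean(sales)))
print(forecast_next(sales))
```

The same data feeds all three functions; what changes is when the computation happens relative to the arrival of the data.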
16. Big volume of data
Get value from Unstructured data
Get value from external data
Need to reduce processing time or cost
Need for Data streaming analysis in real time
Algorithms, prediction or interactive analysis
Transform data into insights and value
Transformation to a Data driven company
20. Iterative and Cyclical
Choose a particular use case with a clear ROI
and time and budget limits
vs Big Bang
Avoid building a generic Big Data system and then implementing projects on top of it
23. CUSTOMER SOLUTION
Big Data 2.0
• Up to 100x faster than Big Data 1.0
• Interactive analysis
• NoSQL with SQL interface
• No need to change the previous way of working
24. Which technology?
Categories shown: Big Data 1.0, Big Data 2.0; Storage, NoSQL, Graph database, Stream Processing; Open Source, Closed based on Open Source, Closed
Products shown: Stratio, Cloudera Impala, Cloudera CDH4*, Hortonworks HDP*, EMC Pivotal HD, VoltDB, Storm, Microsoft HDInsight, C-Store, Apache HBase, MapR Apache Drill, Espresso, Apache CouchDB, Scribe Aurora, SQLStream Platform, Cassandra FS, Apache HDFS, Google BigQuery, IBM InfoSphere BigInsights, DataStax Platform, Hadapt platform, Basho Riak, VMware Redis, HP Vertica, HStreaming Platform, Apache Giraph, Amazon EMR & Redshift, MapR M3-M5-M7, EMC Greenplum, Voldemort, Apache S4, Apache Flume, Kafka, Neo Technology Neo4j*, Intel Hadoop, Memcached, EsperTech ESPER, Hortonworks Stinger, StreamBase Platform, IBM InfoSphere Streams, FlockDB, EMC Isilon OneFS, Apache Cassandra
25. From Big Data 1.0
Batch of new technologies that allow us to extract value out of a dataset which, due to its volume, variety or velocity, was not previously exploited
To Big Data 2.0
“Set of new technologies that extract value from all the available data of a
company”
29. Antena 3, Nubeox: Big Data Recommendation engine
Monitoring of Streaming Videos
Description:
Recommendation engine based not only on the purchase history of the customer, but also on their navigation
Advantages:
Increase in clickthrough
Increasing Conversions
Increase in sales
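The idea of combining purchase history with navigation can be sketched as a small user-overlap recommender. This is not the engine described on the slide; it is a minimal illustration in which the function name `recommend`, the `view_weight` parameter and the sample data are all hypothetical.

```python
from collections import Counter

def recommend(user, purchases, views, view_weight=0.5, top_n=2):
    """Suggest items liked by users with overlapping profiles.

    A purchased item counts 1.0 in a profile; an item that was only
    viewed counts view_weight (an assumed, tunable value).
    """
    def profile(u):
        p = Counter({item: 1.0 for item in purchases.get(u, ())})
        for item in views.get(u, ()):
            p[item] = max(p[item], view_weight)
        return p

    mine = profile(user)
    scores = Counter()
    for other in set(purchases) | set(views):
        if other == user:
            continue
        theirs = profile(other)
        # Similarity: weighted overlap between the two profiles.
        similarity = sum(mine[i] * theirs[i] for i in mine)
        for item, weight in theirs.items():
            if item not in mine:
                scores[item] += similarity * weight
    return [item for item, score in scores.most_common(top_n) if score > 0]

# Hypothetical data: what each user bought and what they browsed.
purchases = {"ana": ["film_a"], "luis": ["film_a", "film_b"], "eva": ["film_c"]}
views = {"ana": ["film_c"], "eva": ["film_c", "film_d"]}

print(recommend("ana", purchases, views))  # uses purchases and navigation
print(recommend("ana", purchases, {}))     # purchases only
```

With navigation included, the user's browsing links her to a second user and surfaces an extra item that purchase history alone would not.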
30. Customizing Web Sites: Behavioural Customization
Description:
Customizing homepages based on user navigation
Analysis and customization of the homepage and site in
real time for each user based on their browsing
Modification of contents, highlights, ads, in real time
based on user history
Advantages:
Over 300% increase in clickthrough
Creating millions of web pages in real time
Increasing Conversions
Increase in sales
Cost ten times lower than other solutions
Recommended links: +79% clicks vs. randomly selected
News Interests: +160% clicks vs. one size fits all
Top Searches: +43% clicks vs. editor selected
31. Personalized Marketing with DataShake integration
Description:
Newsletter development, email-marketing or any
other sent material segmented by individual
preferences
Analyzes and takes into account:
• Financial information and user data
• Navigation and usage information from previous marketing mailings
• Mobile app data (GPS, payments, browsing of
offers…)
• Users’ information from the social networks
Advantages:
Increased clickthrough
Increase in conversions and sales
Natural language processing – semantics and
sentiments
Combines private and public data
32. Complement private structured data with unstructured and
public data
Description:
Complementing the internal data of a company by
combining the structured and the unstructured
data, with the data generated by the web and
social networks, allows us to determine the validity
of the data of our brand, product or company.
The comparison and analysis of internal and
external data (web) increases the value of our data
and allows us to gain a competitive advantage over
our competitors.
Advantages:
It allows sales improvement.
Improves loyalty.
Increases Conversions.
Detects errors or data manipulation.
SEO improvement with regard to users and public data.
Improves marketing and product promotion with regard to trends.
33. BI and data analytics
Description:
Creating and/or complementing BI systems and data analytics
ETL tools and data uploading with a much higher
volume than the traditional ones
Capacity for analysis and visualization of all types of
data, including graphs and new data types
Advantages:
Ability to work with larger datasets without the need
to add or delete
Much faster and more reliable systems
Massive reduction in cost (M € versus k €)
Natural language processing – semantics and
sentiments
A possibility to combine internal data with external
data (private and public data)
34. Telefónica Dynamic Insights (Smart Steps)
Description:
Collect mobile data, anonymised and
aggregated, to understand how segments of
the population collectively behave. Trace
trends and the behaviours of crowds, not
individuals. Use this insight to enlighten the
space between organisations and their
users, enabling them to improve their
propositions, and businesses.
Focus:
By being able to measure real behaviour, in
near real-time, 24/7, 365 days a year, we
can show the actual impact on society,
therefore enabling businesses and local
government to make better decisions.
35. Security and fraud detection
Description:
Analysis of large volumes of data, logs, security
systems, transactional systems
Faster correlation mechanisms and machine learning
algorithms allow early detection of attacks and
security risks with extra care to false positives
Internal fraud detection analyzing data and events
from applications and risk operations
Advantages:
Combines data from transactional systems with the
SIEM to help fight fraud
Tracks and identifies new fraud methods and trends
via user reviews
Fraud detection techniques specified through the use
of built-in patterns
Much larger data volumes and much higher velocity
Combines private and public data
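As a very small illustration of flagging suspicious events in transactional data, the sketch below marks amounts that lie far from the mean. It is a plain z-score check standing in for the correlation and machine learning techniques the slide mentions, not a description of them; the function name, the threshold and the sample amounts are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(amounts, threshold=3.0):
    """Return the indices of amounts more than `threshold` sample
    standard deviations away from the mean."""
    if len(amounts) < 2:
        return []
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]

# Hypothetical card payments: eleven ordinary amounts and one very large one.
amounts = [20, 22, 19, 21, 20, 23, 18, 20, 21, 19, 22, 500]

print(flag_outliers(amounts, threshold=2.0))
```

The threshold controls the trade-off the slide points to: a lower value catches more anomalies at the price of more false positives.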
36. M2M IoT: PARK AIR SYSTEMS
NORWAY (RMMS)
Description:
The Remote Maintenance & Monitoring System (RMMS) provides a powerful, scalable and flexible SCADA system to perform a wide range of tasks required by CNS agents, such as maintenance, supervision, configuration and operation.
Integration of different systems and equipment shall be
possible and straightforward using open standard
protocols, real time monitoring, data storage, testing,
reporting, events notification,…
Focus:
The main task of the RMMS is to provide complete access to the supervised equipment in order to monitor every available parameter, as a means of avoiding sending personnel to the remote location.
Different levels of control over the system are also
provided to cover the requirements of supervision,
maintenance and control.
Five main elements compose the RMM system:
• RCSU: Remote Control and Status Unit.
• TP: Tower Panel.
• RMM: Remote Management & Monitoring.
• LMT/RMT: Local / Remote Management Terminal.
• CMMS: Central Management & Monitoring System.
37. Search Engines
Description:
Big Data Search Assist: Search engines optimized for Big
Data with self-learning improvements based on use
Search engines for websites, intranets, apps
With instant real-time search, single box with natural language processing, suggestions, highlighting, automatic corrections, “did you mean” tips, etc.
Advantages:
Easy management for business users: order of results, filters, etc.
Advanced features of the search engines with a cost ten
times lower than other solutions
Improved performance and scalability compared to
other solutions
Easy to integrate and use
38. ORM and social dialogue
Description:
It gives a full 360º view of a company or brand online, through a tool that integrates the three aspects that define your actual online image:
How am I doing on social networks?:
Do I know how to use Facebook, Twitter, Google+, YouTube, LinkedIn? How many followers do I have, am I an influencer, do I generate content that spreads?
What is my presence and reputation on the Internet:
When it comes to me, how do people talk about me,
what is said, how does it evolve over time, what is my
position on the Internet regarding my competitors in
the different aspects that interest me.
SEO:
Simple and practical analysis of both internal SEO and
external SEO to complement and give an integrated
view of the above aspects of reputation and social
dialogue.
Advantages:
Real improvement of the company or the product by
analysing the evolution over time of the three major
aspects that define your online reputation.
It improves the negative aspects and reinforces the positive ones.
Increase in sales: Helps optimize and follow
marketing campaigns and improve sales.
Improving conversions and attracting new customers.
39. Social Mining
Description:
Analyzing various social networks and
movements, looking for brand penetration,
identifying influencers in conversations and
a static map of associated terms.
Advantages:
Entering the social dialogue and hot topics at the right time multiplies virality by 100
View how a social network moves as time goes by
Lets you know what users are talking about when referring to my products or my brand.
Detection of influencers and detractors
Optimal visualization of the information.
Identification of the tags used most frequently
by the network to improve your SEO.
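Identifying the most frequently used tags can be sketched with a simple counter over post text. This is a minimal example, assuming posts are available as plain strings and that tags are written as `#word`; the function name `top_tags` and the sample posts are hypothetical.

```python
import re
from collections import Counter

def top_tags(posts, n=3):
    """Count hashtags across posts, ignoring case, and return the n most frequent."""
    tags = Counter()
    for post in posts:
        tags.update(tag.lower() for tag in re.findall(r"#(\w+)", post))
    return tags.most_common(n)

# Hypothetical posts mentioning a brand or topic.
posts = [
    "Great talk on #BigData at the #conference",
    "#bigdata from a business point of view",
    "#Hadoop vs #BigData 2.0",
]

print(top_tags(posts, n=1))
```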
40. Social Network Tracking
Description:
Search social networks for comments and mentions related to a particular issue or event, for further evaluation, influencer detection and graphical display of the conversation to facilitate analysis.
Advantages:
Show a real-time event (symposium, forum, seminar, etc.) with visual information.
Get opinions and feelings about a topic in social
networks in real time
Identify the influencers of a hot topic
Risk detection and prevention
Emotional mining: Find the most popular term for given people, brands, events, etc., and from it learn the feelings generated by the most important terms.
41. Web Content Scraping
Description:
Search network content and publications on specific subjects of interest, to detect, filter, collect and process relevant information in semi-real time or batch.
Combined with semantic analysis, this allows content to be detected and classified effectively.
Advantages:
Allows sites to be generated dynamically, without manual intervention or exhaustive searches, from the contents collected and categorized.
Unifies in a single web all the tasks that users have
to do manually, so it saves them money and
generates loyalty.
42. Tele5: Monitoring of logs for Streaming Videos
Description:
Monitoring the download and streaming of videos.
Analysis of streaming
Quality of streaming
Peaks of service and bottlenecks
Advantages:
Problem detection and alerts
Optimization of service
Tracking of campaigns
43. Massive information tagging
Description:
Allows you to label and categorize automatically and
massively, any type of content or information.
Advantages:
Allows searching, categorization and clustering, and extracts value from information that would otherwise be hard to find and use.
Uses state-of-the-art tools to identify entities (NED systems, NERD), combined with entity disambiguation using a Big Data system containing Wikipedia and other sources of information.
Processing speed and data volume capacity superior to those of other systems.
45. It is not about Big Data, it is about getting maximum value from data:
Get all the value data can give
Process and analyze new types of data: unstructured, semi-structured, streams of data
Convert data into big insights
Become a Data driven company
46. “The best way to predict the future is to create it”
Thread of the presentation:
THESIS
Emergence of Big Data 2.0 (paradigm shift, BigQuery)
Requirements: 100X
Need for a non-Hadoop architecture to meet these requirements
OPPORTUNITY
Since it is the only open source non-Hadoop platform, if the thesis is correct it will be: The Open Source Big Data 2.0 Platform
A technological change from Big Data 1.0 to Big Data 2.0: from batch analysis, a 12-year-old technology, to state-of-the-art interactive analysis.
This project is based on the thesis that a technological change is taking place in the Big Data world, one that requires higher performance with interactive analysis and real-time query capabilities. A 100X performance improvement is required to turn the hours needed with previous technologies into a few minutes.
To achieve these capabilities it is necessary to abandon Hadoop, whose architecture is limited by 12-year-old concepts, such as its need for and dependence on disk persistence and its non-optimized writes, which will not allow the 100X performance requirements to be reached.
Instead of following, belatedly, the steps already taken by others, Stratio develops and provides the only open source Big Data platform not based on Hadoop, creating and defining new paradigms and possibilities that have made possible a unique integrated architecture, conceived entirely for the maximum 100X performance currently required, adaptable, and without vendor lock-in.