Framework for Real time Analytics
By Mohsin Hakim
Real Time Analytics
Index
Introduction
Evolving BI and Analytics for Big Data
Impacts to Traditional BI Databases
Challenges
MongoDB with Hadoop
Case Studies
Current Scenario
Introduction
 Analytics falls along a spectrum. At one end of the spectrum sit batch analytical applications, which are used for complex, long-running analyses. They tend to have slower response times (minutes, hours, or even days) and lower availability requirements. Hadoop-based workloads are typical examples of batch analytics.
 At the other end of the spectrum sit real-time analytical applications, which provide lighter-weight analytics very quickly. Latency is low (sub-second) and availability requirements are high (e.g., 99.99%). MongoDB is typically used for real-time analytics.
Business Intelligence (BI) and analytics provide an essential set of technologies and processes
that organizations have relied upon for many years to guide strategic business decisions.
Introduction
Traditional BI platforms share several defining characteristics:
1. Predictable Frequency. Data is extracted from source systems at regular intervals,
typically measured in days, months, and quarters
2. Static Sources. Data is sourced from controlled, internal systems supporting established
and well-defined back-office processes
3. Fixed Models. Data structures are known and modeled in advance of analysis. This
enables the development of a single schema to accommodate data from all of the source
systems, but adds significant time to the upfront design
4. Defined Queries. Questions to be asked of the data (i.e., the reporting queries) are
pre-defined. If not all of the query requirements are known upfront, or requirements
change, then the schema has to be modified to accommodate changes
5. Slow-changing requirements. Rigorous change-control is enforced before the
introduction of new data sources or reporting requirements
6. Limited users. The consumers of BI reports are typically business managers and senior
executives
Evolving BI and Analytics for Big Data
Higher Uptime Requirements
The immediacy of real-time analytics accessed
from multiple fixed and mobile devices places
additional demands on the continuous availability
of BI systems.
Batch-based systems can often tolerate a certain
level of downtime, for example for scheduled
maintenance. Online systems on the other hand
need to maintain operations during both failures
and planned upgrades.
The Need for Speed & Scale
Time to value is everything. For example, having
access to real-time customer sentiment or
logistics tracking is of little benefit unless the data
can be analyzed and reported in real-time. As a
consequence, the frequency of data acquisition,
integration and analysis must increase from days
to minutes or less, placing significant operational
overhead on BI systems.
Agile Analytics and Reporting
With such a diversity of new data sources,
business analysts cannot know all of the
questions they need to ask in advance.
Therefore an essential requirement is that
the data can be stored before knowing how
it will be processed and queried.
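The requirement above, storing data before knowing how it will be queried, is often called schema-on-read. A minimal Python sketch (the event shapes and field names are made up for illustration):

```python
import json
from collections import Counter

# Raw events are captured as-is; no schema is agreed in advance,
# and the two event types don't even share the same fields.
raw_events = [
    '{"type": "click", "page": "/home", "user": "u1"}',
    '{"type": "sensor", "temp_c": 21.5, "device": "d7"}',
    '{"type": "click", "page": "/pricing", "user": "u2"}',
]

# Store without interpretation: parse and keep the documents whole.
store = [json.loads(e) for e in raw_events]

# Later, an analyst decides on a question the data can answer:
# how many events of each type arrived?
counts = Counter(doc["type"] for doc in store)
print(counts)  # Counter({'click': 2, 'sensor': 1})
```

The point is that the query was chosen after ingestion; nothing about the storage step had to anticipate it.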
The Changing Face of Data
Data generated by social, mobile, sensor,
and logging workloads is much more
complex and variably structured than
traditional transaction data from back-office
systems such as ERP, CRM, PoS (Point of Sale)
and Accounts Receivable.
Taking BI to the Cloud
The drive to embrace cloud computing to
reduce costs and improve agility means BI
components that have traditionally relied on
databases deployed on monolithic, scale-up
systems have to be re-designed for the
elastic scale-out, service-oriented
architectures of cloud.
Impacts to Traditional BI Databases
The relational databases underpinning many of today’s traditional BI platforms are not well suited to the requirements of big
data:
• Semi-structured and unstructured data typical in mobile, social and sensor-driven applications cannot be efficiently
represented as rows and columns in a relational database table
• Rapid evolution of database schema to support new data sources and rapidly changing data structures is not
possible in relational databases, which rely on costly ALTER TABLE operations to add or modify table attributes
• Performance overhead of JOINs and transaction semantics prevents relational databases from keeping pace with the
ingestion of high-velocity data sources
• Quickly growing data volumes require scaling databases out across commodity hardware, rather than the scale-up
approach typical of most relational databases
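The ALTER TABLE cost called out above can be contrasted with document flexibility in a small sketch. SQLite stands in for the relational side here, and the table and field names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, action TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'click')")

# A new attribute on the relational side requires a schema migration
# before any row can carry it.
conn.execute("ALTER TABLE events ADD COLUMN device TEXT")
conn.execute("INSERT INTO events VALUES (2, 'view', 'mobile')")

# A document store simply accepts documents with the new field;
# older documents are untouched and no migration runs.
docs = [
    {"id": 1, "action": "click"},
    {"id": 2, "action": "view", "device": "mobile"},
]

rows = conn.execute("SELECT id, action, device FROM events").fetchall()
print(rows)  # [(1, 'click', None), (2, 'view', 'mobile')]
```

On a toy in-memory table the migration is instant; on a production table with billions of rows it is exactly the costly, blocking operation the bullet describes.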
Relational databases’ inability to
handle the speed, size and diversity
of rapidly changing data generated
by modern applications is already
driving the enterprise adoption of
NoSQL and Big Data technologies in
both operational and analytical
roles.
The purpose
• Flume into Hadoop for batch processing, which makes the data stale time-wise; it cannot serve true real-time needs, since results arrive anywhere from a second to several minutes late.
• A Flume engine on the server side, used to make decisions about the current state of affairs.
• Decisions are made based on whatever data is received about a customer’s current condition, without the full history in their user profile, which would enable a much more informed decision.
• State-of-the-art auto-updating charting and report creation with a dashboard UI.
The goal: increase the scalability and performance of organizations using a real-time
analytics platform, with a focus on storing, processing, and analyzing exponentially
growing data using big data technologies.
Challenges
1. Getting data metrics to the right people
Often, social media is treated like the ugly stepchild within the marketing department and real-time
social media analytics are either absent or ignored.
2. Visualization
Visualizing real-time social media analytics is another key element involved in developing insights
that matter.
Simply displaying values graphically helps in making the kinds of fast interpretations necessary for
making decisions with real-time data, but adding more complex algorithms and using models
provides deeper insights, especially when visualized.
3. Unstructured data is challenging
Unlike the survey data firms are used to dealing with, most data (IBM estimates 80%) is
unstructured, meaning it consists of words rather than numbers. And text analytics lags
seriously behind numeric analysis.
4. Increasing signal to noise
Social media data is inherently noisy. Reducing noise enough to even detect a signal is
challenging, especially in real time. Sure, with enough time, new analytics tools can ferret
out the few meaningful signals.
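One common way to pull a signal out of a noisy stream, sketched here with made-up numbers, is to compare each observation against an exponential moving average of the stream and flag large deviations:

```python
def flag_spikes(values, alpha=0.3, threshold=3.0):
    """Flag indices where a value exceeds `threshold` times the running EMA."""
    ema = values[0]  # seed the baseline with the first observation
    spikes = []
    for i, v in enumerate(values[1:], start=1):
        if ema > 0 and v > threshold * ema:
            spikes.append(i)
        ema = alpha * v + (1 - alpha) * ema  # update the smoothed baseline
    return spikes

# Mentions per minute for a brand: mostly noise, one real burst at index 5.
mentions = [10, 12, 9, 11, 10, 95, 90, 12, 11]
print(flag_spikes(mentions))  # [5]
```

Only the first minute of the burst is flagged; after that, the baseline itself has absorbed the spike, which is why real systems layer more sophisticated models on top of a filter like this.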
Top 10 Priorities
1 Enable new fast-paced business practices
2 Don’t expect the new stuff to replace the old stuff
3 Do not assume that all the data needs to be in real time, all the time
4 Correlate real-time data with data from other sources and latencies
5 Start with a proof of value with measurable outcomes
6 As a safe starter project, accelerate successful latent processes into near real time
7 Think about operationalizing analytics
8 Think about the skills you need
9 Examine application business rules to ensure they are ready for real-time data flows
10 Evaluate technology platforms and expertise for availability and reliability
Challenges
Real-Time Analytics is Hard
Can’t Stay Ahead. You need to account for
many types of data, including unstructured
and semi-structured data. And new sources
present themselves unpredictably.
Relational databases aren’t capable of
handling this, which leaves you hamstrung.
Can’t Scale. You need to analyze terabytes
or petabytes of data. You need sub-second
response times. That’s a lot more than a
single server can handle. Relational
databases weren’t designed for this.
Batch. Batch processes are the right
approach for some jobs. But in many cases,
you need to analyze rapidly changing,
multi-structured data in real time. You
don’t have the luxury of lengthy ETL
processes to cleanse data for later.
MongoDB Makes it Easy
Do the Impossible. MongoDB can incorporate any
kind of data – any structure, any format, any
source – no matter how often it changes. Your
analytical engines can be comprehensive and real-time.
Scale Big. MongoDB is built to scale out on
commodity hardware, in your data center or in the
cloud. And without complex hardware or extra
software. This shouldn’t be hard, and with
MongoDB, it isn’t.
Real Time. MongoDB can analyze data of any
structure directly within the database, giving you
results in real time, and without expensive data
warehouse loads.
Why Other Databases Fall Short (and MongoDB Doesn’t)
Most databases make you choose between a flexible data
model, low latency at scale, and powerful access. But
increasingly you need all three at the same time.
 Rigid Schemas. You should be able to analyze unstructured, semi-structured, and
polymorphic data. And it should be easy to add new data. But this data doesn’t
belong in relational rows and columns. Plus, relational schemas are hard to
change incrementally, especially without impacting performance or taking the
database offline.
 Scaling Problems. Relational databases were designed for single-server
configurations, not for horizontal scale-out. They were meant to serve 100s of ops
per second, not 100,000s of ops per second. Even with a lot of engineering hours,
custom sharding layers, and caches, scaling an RDBMS is hard at best and
impossible at worst.
 Takes Too Long. Analyzing data in real time requires a break from the familiar
ETL and data warehouse approach. You don’t have time for lengthy load
schedules, or to build new query models. You need to run aggregation queries
against variably structured data. And you should be able to do so in place, in real
time.
Organizations are using MongoDB for analytics because it
lets them store any kind of data, analyze it in real time,
and change the schema as they go.
New Data. MongoDB’s document model enables you to store and process data
of any structure: events, time series data, geospatial coordinates, text and
binary data, and anything else. You can adapt the structure of a document’s
schema just by adding new fields, making it simple to bring in new data as it
becomes available.
Horizontal Scalability. MongoDB’s automatic sharding distributes data across
fleets of commodity servers, with complete application transparency. With
multiple options for scaling – including range-based, hash-based and location-
aware sharding – MongoDB can support thousands of nodes, petabytes of
data, and hundreds of thousands of ops per second without requiring you to
build custom partitioning and caching layers.
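Hash-based sharding of the kind described above can be illustrated with a toy router. The shard count and keys are made up, and MongoDB's real chunk-based balancer is far more sophisticated than this:

```python
import hashlib

N_SHARDS = 4

def shard_for(shard_key: str) -> int:
    """Map a shard key to a shard by hashing, so writes spread evenly."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % N_SHARDS

# The same key always routes to the same shard, so a read by key
# touches exactly one node.
assert shard_for("user:42") == shard_for("user:42")

# Different keys scatter across the shards.
placement = {k: shard_for(k) for k in ("user:1", "user:2", "user:3")}
print(placement)
```

Hashing deliberately destroys key ordering, which is why MongoDB also offers range-based and location-aware sharding for workloads that query contiguous key ranges.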
Powerful Analytics, In Place, In Real Time. With rich index and query
support – including secondary, geospatial and text search indexes – as well as
the aggregation framework and native MapReduce, MongoDB can run complex
ad-hoc analytics and reporting in place.
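The aggregation framework mentioned above expresses analytics as a pipeline of stages executed inside the database. A sketch of a pipeline counting page views per URL over the last hour; the collection and field names are hypothetical, and with pymongo the list would be passed to `collection.aggregate(pipeline)`:

```python
from datetime import datetime, timedelta, timezone

one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)

# Each dict is one pipeline stage, executed in order inside the database.
pipeline = [
    {"$match": {"type": "pageview", "ts": {"$gte": one_hour_ago}}},  # filter recent events
    {"$group": {"_id": "$url", "views": {"$sum": 1}}},               # count per URL
    {"$sort": {"views": -1}},                                        # most viewed first
    {"$limit": 10},                                                  # top 10 only
]

# With pymongo this would run in place, with no ETL step:
#   for doc in db.events.aggregate(pipeline):
#       print(doc["_id"], doc["views"])
print([next(iter(stage)) for stage in pipeline])  # ['$match', '$group', '$sort', '$limit']
```

Because the pipeline runs where the data lives, there is no extract-and-load step between ingestion and the report.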
MongoDB with Hadoop
The following table provides examples of customers using MongoDB together with Hadoop to power big
data applications.

Company | MongoDB | Hadoop
Ebay | User data and metadata management for product catalog | User analysis for personalized search & recommendations
Orbitz | Management of hotel data and pricing | Hotel segmentation to support building search facets
Pearson | Student identity and access control; content management of course materials | Student analytics to create adaptive learning programs
Foursquare | User data, check-ins, reviews, venue content management | User analysis, segmentation and personalization
Tier 1 Investment Bank | Tick data, quants analysis, reference data distribution | Risk modeling, security and fraud detection
Industrial Machinery Manufacturer | Storage and real-time analytics of sensor data collected from connected vehicles | Preventive maintenance programs for fleet optimization; in-field monitoring of vehicle components for design enhancements
SFR | Customer service applications accessed via online portals and call centers | Analysis of customer usage, devices & pricing to optimize plans

Whether improving customer service, supporting cross-sell and upsell, enhancing business efficiency or
reducing risk, MongoDB and Hadoop provide the foundation to operationalize big data.
Future Trends in Real-Time Data, BI, and
Analytics
Data types handled in real time today. Numerous TDWI surveys have shown that structured
data (which
includes relational data) is by far the most common class of data types handled for BI and
analytic purposes, as well as many operational and transactional ones. It’s no surprise that
structured data bubbled to the top of Figure 16. Other data types and sources commonly
handled in real time today include application logs (33%), event data (26%), semi-structured
data (26%), and hierarchical and raw data (24% each).
Data types to be handled in real time within three years. Looking ahead, a number of data
types are poised for greater real-time usage. Some are in limited use today but will
experience aggressive adoption within three years, namely social media data (38%), Web logs
and clickstreams (34%), and unstructured data (34%). Others are handled in real time today
and will become even more so, namely event (36%), semi-structured (33%), structured (31%),
and hierarchical (30%) data.
Case Studies
MongoDB Integration with BI and Analytics
Tools
To make online big data actionable through dashboards, reports,
visualizations and integration with other data sources, it must be
accessible to established BI and analytics tools. MongoDB offers integration
with more of the leading BI tools than any other NoSQL or online big data
technology, including:
Actuate, Alteryx, Informatica, Jaspersoft, Logi Analytics, MicroStrategy, Pentaho, Qliktech, and SAP Lumira.
WindyGrid
One person, one laptop, and MongoDB’s technology jumpstarted a project that, with
other people joining in, went from prototype to one of the nation’s pioneering projects
to analyze and act on municipal data in real time. In just four months.
WindyGrid put Chicago on the path of revolutionizing how it operates not by replacing
the administrative systems already in place, but by using MongoDB to bring that data
together into a new application. With MongoDB’s flexible data model, WindyGrid doesn’t
have to go back and redo the schema for each new piece of data. Instead, it can evolve
schemas in real time, which is crucial as WindyGrid expands and adds predictive
analytics, growing by millions of pieces of structured and unstructured data each day.
Crittercism is a Mobile Pioneer
Crittercism doesn’t just monitor apps or gather information. Using MongoDB’s powerful built-in
query functions, it analyzes avalanches of unstructured and non-uniform data in real time. It
recognizes patterns, identifies trends, and diagnoses problems. That means Crittercism’s
customers immediately understand the root cause of problems and the impact they’re having on
the business, so they know how to prioritize and correct the problems they’re facing and improve
performance.
The kind of real-time analysis that Crittercism provides customers would be impossible
with traditional databases. Crittercism is using MongoDB’s powerful query functions to
analyze the broad variety of data it collects, in real time, within the database. A more
traditional data warehouse approach, with ETLs and long loading times, can’t match this
type of speed.
At the same time, MongoDB lets Crittercism efficiently handle the tons of data it’s
collecting. During the past two years, the number of requests that Crittercism gathers and
analyzes has jumped from 700 to 45,000 per second. Relational databases have a hard time
scaling to meet these kinds of demands, typically requiring expensive add-on software, or
additional layers of proprietary code, to keep up. With MongoDB, horizontal scalability
across multiple data centers is a native function.
McAfee - Global Cybersecurity
GTI analyzes cyberthreats from all angles, identifying threat relationships, such as malware used in
network intrusions, websites hosting malware, botnet associations, and more. Threat information is
extremely time sensitive; knowing about a threat from weeks ago is useless.
In order to provide up-to-date, comprehensive threat information, GTI needs to quickly process terabytes of
different data types (such as IP addresses or domains) into meaningful relationships:
e.g., is this website good or bad? What other sites have been interacting with it? The success of the cloud-based system also
depends on a bidirectional data flow: GTI gathers data from millions of client sensors and provides real-time intelligence
back to these end products, at a rate of 100 billion queries per month.
McAfee was unable to address these needs and effectively scale out to millions of records with its existing solutions. For
example, the HBase / Hadoop setup made it difficult to run interesting, complex queries, and experienced bugs with the Java
garbage collector running out of memory. Another issue was with sharding and syncing:
Lucene was able to index in interesting ways, but required too much customization.
The team compensated for all the rebuilding and redeploying of Katta shards with “the usual scripting duct tape,” but what
they really needed was a solution that could seamlessly handle the sharding and updating on its own.
McAfee selected MongoDB, which had excellent documentation and a growing community that was “on fire.”
Power Journalism
BuzzFeed, the social news and entertainment company, relies on MongoDB to analyze all performance data
for its content across the social web. A core part of BuzzFeed’s publishing platform, MongoDB exposes
metrics to editors and writers in real time, to help them understand how its content is performing and to
optimize for the social web. The company has been using MongoDB since 2010. Here’s why.
1. Analytics provide more insight, more quickly. BuzzFeed relies on MongoDB for its strategic analytics platform. With apps and
dashboards built on MongoDB, it can pinpoint when content is viewed and how it is shared. With this approach, BuzzFeed quickly
gains insight into how its content performs, nimbly optimizes the user experience for the posts that are performing best, and
delivers critical feedback to its writers and editors.
2. BuzzFeed is data-driven. At BuzzFeed, data drives decision-making and powers the company. MongoDB enables it to
effectively analyze, track and expose a range of metrics to writers and employees. This includes: the number of clicks; how
often and where posts are being shared; which views on different social media properties lead to the most shares; and how
views differ across mobile and desktop.
3. Successful web journalism demands scale. BuzzFeed processes large volumes of data, and the volume increases each year as
the site’s traffic continues to grow. Originally built on a relational data store, BuzzFeed decided to use MongoDB, a more
scalable solution, to collect and track the data it needs with richer functionality than a standard key-value store.
4. Editors gain an edge with access to data in minutes. Fast, easy access to data is critical to helping editors determine what
content will be most shareable in the social media world. With MongoDB, BuzzFeed can expose performance data shortly after
publication, enabling editors to respond quickly by tweaking headlines and determining the best way to promote a post.
5. Setting the infrastructure for new applications. As BuzzFeed continues its efforts to leverage stats and optimization, MongoDB
will feature prominently in the new infrastructure. MongoDB makes it easy to build apps quickly, a requirement as BuzzFeed rolls
out additional products.
Current Scenario
Current Offerings
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
December 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWA
December 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWADecember 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWA
December 2015 - TDWI Checklist Report - Seven Best Practices for Adapting DWA
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Big data
Big dataBig data
Big data
 

Mehr von Mohsin Hakim

Mohsin Hakim summery
Mohsin Hakim summeryMohsin Hakim summery
Mohsin Hakim summeryMohsin Hakim
 
History and Kings in India
History and Kings in IndiaHistory and Kings in India
History and Kings in IndiaMohsin Hakim
 
For freshers presentation
For freshers presentationFor freshers presentation
For freshers presentationMohsin Hakim
 
Engineering - Iinformation for teenagers
Engineering - Iinformation for teenagersEngineering - Iinformation for teenagers
Engineering - Iinformation for teenagersMohsin Hakim
 

Mehr von Mohsin Hakim (8)

MohsinHakim
MohsinHakimMohsinHakim
MohsinHakim
 
Mohsin hakim
Mohsin hakimMohsin hakim
Mohsin hakim
 
Iphone
IphoneIphone
Iphone
 
Mohsin Hakim summery
Mohsin Hakim summeryMohsin Hakim summery
Mohsin Hakim summery
 
History and Kings in India
History and Kings in IndiaHistory and Kings in India
History and Kings in India
 
For freshers presentation
For freshers presentationFor freshers presentation
For freshers presentation
 
Engineering - Iinformation for teenagers
Engineering - Iinformation for teenagersEngineering - Iinformation for teenagers
Engineering - Iinformation for teenagers
 
Job help
Job helpJob help
Job help
 

Kürzlich hochgeladen

HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Kürzlich hochgeladen (20)

HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 

Real Time Analytics

  • 1. Framework for Real-Time Analytics. By Mohsin Hakim.
  • 2. Index: Introduction; Evolving BI and Analytics for Big Data; Impacts to Traditional BI Databases; Challenges; MongoDB with Hadoop; Case Studies; Current Scenario.
  • 3. Introduction. Analytics falls along a spectrum. At one end sit batch analytical applications, used for complex, long-running analyses; they tend to have slower response times (minutes, hours, or days) and lower availability requirements. Hadoop-based workloads are typical examples of batch analytics. At the other end sit real-time analytical applications, which deliver lighter-weight analytics very quickly: latency is low (sub-second) and availability requirements are high (e.g., 99.99%). MongoDB is typically used for real-time analytics. Business Intelligence (BI) and analytics provide an essential set of technologies and processes that organizations have relied on for many years to guide strategic business decisions.
  • 4. Introduction (continued). Traditional BI is characterized by:
1. Predictable frequency. Data is extracted from source systems at regular intervals, typically measured in days, months, and quarters.
2. Static sources. Data is sourced from controlled, internal systems supporting established and well-defined back-office processes.
3. Fixed models. Data structures are known and modeled in advance of analysis. This enables the development of a single schema to accommodate data from all of the source systems, but adds significant time to the upfront design.
4. Defined queries. The questions to be asked of the data (i.e., the reporting queries) are pre-defined. If not all of the query requirements are known upfront, or requirements change, the schema has to be modified to accommodate the changes.
5. Slow-changing requirements. Rigorous change control is enforced before the introduction of new data sources or reporting requirements.
6. Limited users. The consumers of BI reports are typically business managers and senior executives.
  • 5. Evolving BI and Analytics for Big Data.
- Higher uptime requirements. The immediacy of real-time analytics, accessed from multiple fixed and mobile devices, places additional demands on the continuous availability of BI systems. Batch-based systems can often tolerate a certain level of downtime, for example for scheduled maintenance; online systems, on the other hand, need to maintain operations through both failures and planned upgrades.
- The need for speed and scale. Time to value is everything. Access to real-time customer sentiment or logistics tracking is of little benefit unless the data can be analyzed and reported in real time. Consequently, the frequency of data acquisition, integration, and analysis must increase from days to minutes or less, placing significant operational overhead on BI systems.
- Agile analytics and reporting. With such a diversity of new data sources, business analysts cannot know all of the questions they need to ask in advance. An essential requirement is therefore that data can be stored before knowing how it will be processed and queried.
- The changing face of data. Data generated by workloads such as social, mobile, sensor, and logging is much more complex and variably structured than traditional transaction data from back-office systems such as ERP, CRM, PoS (Point of Sale), and Accounts Receivable.
- Taking BI to the cloud. The drive to embrace cloud computing to reduce costs and improve agility means that BI components which have traditionally relied on databases deployed on monolithic, scale-up systems have to be redesigned for the elastic, scale-out, service-oriented architectures of the cloud.
  • 6. Impacts to Traditional BI Databases. The relational databases underpinning many of today's traditional BI platforms are not well suited to the requirements of big data:
- Semi-structured and unstructured data, typical of mobile, social, and sensor-driven applications, cannot be efficiently represented as rows and columns in a relational database table.
- Rapid evolution of the database schema to support new data sources and rapidly changing data structures is not practical in relational databases, which rely on costly ALTER TABLE operations to add or modify table attributes.
- The performance overhead of JOINs and transaction semantics prevents relational databases from keeping pace with the ingestion of high-velocity data sources.
- Quickly growing data volumes require scaling databases out across commodity hardware, rather than the scale-up approach typical of most relational databases.
Relational databases' inability to handle the speed, size, and diversity of rapidly changing data generated by modern applications is already driving enterprise adoption of NoSQL and big data technologies in both operational and analytical roles.
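The schema-rigidity point above can be made concrete. The following is a minimal sketch, not taken from the deck: it uses Python's built-in sqlite3 as a stand-in for any relational database, and plain JSON documents as a stand-in for a document store; the `events` table and its fields are illustrative.

```python
import json
import sqlite3

# Relational side: the schema is fixed up front. When a new data source
# introduces a new attribute, every affected table needs an ALTER TABLE.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, source TEXT, value REAL)")
db.execute("INSERT INTO events (source, value) VALUES ('sensor-1', 21.5)")

# A new source arrives carrying a geolocation attribute: schema change required
# before a single row of the new shape can be stored.
db.execute("ALTER TABLE events ADD COLUMN geo TEXT")
db.execute("INSERT INTO events (source, value, geo) VALUES ('mobile-app', 3.0, '48.1,11.6')")

# Document side: each record carries its own structure, so a new field
# simply appears in new documents, with no migration step.
documents = [
    {"source": "sensor-1", "value": 21.5},
    {"source": "mobile-app", "value": 3.0, "geo": {"lat": 48.1, "lon": 11.6}},
]
serialized = [json.dumps(doc) for doc in documents]  # both shapes coexist side by side
```

The contrast is the point: on the relational side the write is blocked until the DDL change lands, while on the document side old and new shapes are stored together and queries can simply ignore fields a document lacks.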
  • 7. The purpose.
- Flume feeding Hadoop is a batch pipeline, which limits how time-relevant the data can be; it falls short of true real time, with results arriving anywhere from several minutes to, at best, about a second late.
- A server-side Flume engine is used to make decisions about the current state of affairs.
- Decisions are made based only on the data received about customers' current condition, without the history in their user profiles that would enable a much more informed decision.
- State-of-the-art auto-updating charts and report creation with a dashboard UI.
The goal: increase organizations' scalability and performance with a real-time analysis platform focused on storing, processing, and analyzing exponentially growing data using big data technologies.
  • 8. Challenges.
1. Getting data metrics to the right people. Often, social media is treated like the ugly stepchild within the marketing department, and real-time social media analytics are either absent or ignored.
2. Visualization. Visualizing real-time social media analytics is another key element in developing insights that matter. Simply displaying values graphically helps with the fast interpretations needed to make decisions on real-time data, but adding more complex algorithms and models provides deeper insight, especially when visualized.
3. Unstructured data is challenging. Unlike the survey data firms are used to dealing with, most social media data (IBM estimates 80%) is unstructured, meaning it consists of words rather than numbers, and text analytics lags seriously behind numeric analysis.
4. Increasing signal to noise. Social media data is inherently noisy. Reducing noise enough even to detect a signal is challenging, especially in real time. Sure, given enough time, new analytics tools can ferret out the few meaningful signals.
  • 9. Top 10 Priorities.
1. Enable new fast-paced business practices.
2. Don't expect the new stuff to replace the old stuff.
3. Do not assume that all the data needs to be in real time, all the time.
4. Correlate real-time data with data from other sources and latencies.
5. Start with a proof of value with measurable outcomes.
6. As a safe starter project, accelerate successful latent processes into near real time.
7. Think about operationalizing analytics.
8. Think about the skills you need.
9. Examine application business rules to ensure they are ready for real-time data flows.
10. Evaluate technology platforms and expertise for availability and reliability.
  • 10. Challenges. Real-time analytics is hard:
- Can't stay ahead. You need to account for many types of data, including unstructured and semi-structured data, and new sources present themselves unpredictably. Relational databases aren't capable of handling this, which leaves you hamstrung.
- Can't scale. You need to analyze terabytes or petabytes of data with sub-second response times. That's a lot more than a single server can handle, and relational databases weren't designed for it.
- Batch. Batch processing is the right approach for some jobs, but in many cases you need to analyze rapidly changing, multi-structured data in real time. You don't have the luxury of lengthy ETL processes to cleanse data for later.
MongoDB makes it easy:
- Do the impossible. MongoDB can incorporate any kind of data, of any structure, format, or source, no matter how often it changes. Your analytical engines can be comprehensive and real-time.
- Scale big. MongoDB is built to scale out on commodity hardware, in your data center or in the cloud, without complex hardware or extra software. This shouldn't be hard, and with MongoDB, it isn't.
- Real time. MongoDB can analyze data of any structure directly within the database, giving you results in real time, without expensive data warehouse loads.
  • 11. Why Other Databases Fall Short (and MongoDB Doesn't). Most databases make you choose between a flexible data model, low latency at scale, and powerful access. But increasingly you need all three at the same time.
- Rigid schemas. You should be able to analyze unstructured, semi-structured, and polymorphic data, and it should be easy to add new data. But this data doesn't belong in relational rows and columns, and relational schemas are hard to change incrementally, especially without impacting performance or taking the database offline.
- Scaling problems. Relational databases were designed for single-server configurations, not for horizontal scale-out. They were meant to serve hundreds of operations per second, not hundreds of thousands. Even with a lot of engineering hours, custom sharding layers, and caches, scaling an RDBMS is hard at best and impossible at worst.
- Takes too long. Analyzing data in real time requires a break from the familiar ETL and data warehouse approach. You don't have time for lengthy load schedules, or to build new query models. You need to run aggregation queries against variably structured data, in place, in real time.
Organizations are using MongoDB for analytics because it lets them store any kind of data, analyze it in real time, and change the schema as they go:
- New data. MongoDB's document model enables you to store and process data of any structure: events, time series data, geospatial coordinates, text and binary data, and anything else. You can adapt the structure of a document's schema just by adding new fields, making it simple to bring in new data as it becomes available.
- Horizontal scalability. MongoDB's automatic sharding distributes data across fleets of commodity servers, with complete application transparency. With multiple options for scaling, including range-based, hash-based, and location-aware sharding, MongoDB can support thousands of nodes, petabytes of data, and hundreds of thousands of operations per second without requiring you to build custom partitioning and caching layers.
- Powerful analytics, in place, in real time. With rich index and query support, including secondary, geospatial, and text search indexes, as well as the aggregation framework and native MapReduce, MongoDB can run complex ad hoc analytics and reporting in place.
  • 12. MongoDB with Hadoop. The following table provides examples of customers using MongoDB together with Hadoop to power big data applications. Whether improving customer service, supporting cross-sell and upsell, enhancing business efficiency, or reducing risk, MongoDB and Hadoop provide the foundation to operationalize big data.

Customer | MongoDB | Hadoop
Ebay | User data and metadata management for product catalog | User analysis for personalized search & recommendations
Orbitz | Management of hotel data and pricing | Hotel segmentation to support building search facets
Pearson | Student identity and access control; content management of course materials | Student analytics to create adaptive learning programs
Foursquare | User data, check-ins, reviews, venue content management | User analysis, segmentation and personalization
Tier 1 Investment Bank | Tick data, quants analysis, reference data distribution | Risk modeling, security and fraud detection
Industrial Machinery Manufacturer | Storage and real-time analytics of sensor data collected from connected vehicles | Preventive maintenance programs for fleet optimization; in-field monitoring of vehicle components for design enhancements
SFR | Customer service applications accessed via online portals and call centers | Analysis of customer usage, devices & pricing to optimize plans
  • 13. Future Trends in Real-Time Data, BI, and Analytics.
Data types handled in real time today. Numerous TDWI surveys have shown that structured data (which includes relational data) is by far the most common class of data handled for BI and analytic purposes, as well as for many operational and transactional ones, so it is no surprise that structured data bubbled to the top of Figure 16. Other data types and sources commonly handled in real time today include application logs (33%), event data (26%), semi-structured data (26%), and hierarchical and raw data (24% each).
Data types to be handled in real time within three years. Looking ahead, a number of data types are poised for greater real-time usage. Some are in limited use today but will see aggressive adoption within three years, namely social media data (38%), Web logs and clickstreams (34%), and unstructured data (34%). Others are handled in real time today and will become even more so, namely event (36%), semi-structured (33%), structured (31%), and hierarchical (30%) data.
  • 15. MongoDB Integration with BI and Analytics Tools. To make online big data actionable through dashboards, reports, visualizations, and integration with other data sources, it must be accessible to established BI and analytics tools. MongoDB offers integration with more of the leading BI tools than any other NoSQL or online big data technology, including: Actuate, Alteryx, Informatica, Jaspersoft, Logi Analytics, MicroStrategy, Pentaho, Qliktech, and SAP Lumira.
  • 16. WindyGrid. One person, one laptop, and MongoDB's technology jumpstarted a project that, as other people joined in, went from prototype to one of the nation's pioneering projects to analyze and act on municipal data in real time. In just four months. WindyGrid put Chicago on the path to revolutionizing how it operates, not by replacing the administrative systems already in place, but by using MongoDB to bring that data together into a new application. With MongoDB's flexible data model, WindyGrid doesn't have to go back and redo the schema for each new piece of data; instead, it can evolve schemas in real time. That is crucial as WindyGrid expands, adds predictive analytics, and grows by millions of pieces of structured and unstructured data each day.
  • 17. Crittercism Is a Mobile Pioneer. Crittercism doesn't just monitor apps or gather information. Using MongoDB's powerful built-in query functions, it analyzes avalanches of unstructured and non-uniform data in real time. It recognizes patterns, identifies trends, and diagnoses problems. That means Crittercism's customers immediately understand the root cause of problems and the impact they're having on the business, so they know how to prioritize and correct the problems they're facing and improve performance. The kind of real-time analysis Crittercism provides its customers would be impossible with traditional databases. Crittercism uses MongoDB's query functions to analyze the broad variety of data it collects, in real time, within the database; a more traditional data warehouse approach, with ETLs and long loading times, can't match this speed. At the same time, MongoDB lets Crittercism efficiently handle the volumes of data it's collecting. During the past two years, the number of requests that Crittercism gathers and analyzes has jumped from 700 to 45,000 per second. Relational databases have a hard time scaling to meet these kinds of demands, typically requiring expensive add-on software or additional layers of proprietary code to keep up. With MongoDB, horizontal scalability across multiple data centers is a native function.
  • 18. McAfee: Global Cybersecurity. GTI (Global Threat Intelligence) analyzes cyberthreats from all angles, identifying threat relationships such as malware used in network intrusions, websites hosting malware, botnet associations, and more. Threat information is extremely time sensitive; knowing about a threat from weeks ago is useless. To provide up-to-date, comprehensive threat information, GTI needs to quickly process terabytes of different data types (such as IP addresses or domains) into meaningful relationships: is this website good or bad? What other sites have been interacting with it? The success of the cloud-based system also depends on a bidirectional data flow: GTI gathers data from millions of client sensors and provides real-time intelligence back to these end products, at a rate of 100 billion queries per month. McAfee was unable to address these needs and effectively scale out to millions of records with its existing solutions. For example, the HBase/Hadoop setup made it difficult to run interesting, complex queries, and suffered bugs with the Java garbage collector running out of memory. Another issue was sharding and syncing: Lucene could index in interesting ways, but required too much customization. The team compensated for all the rebuilding and redeploying of Katta shards with "the usual scripting duct tape," but what they really needed was a solution that could seamlessly handle the sharding and updating on its own. McAfee selected MongoDB, which had excellent documentation and a growing community that was "on fire."
  • 19. Power Journalism. BuzzFeed, the social news and entertainment company, relies on MongoDB to analyze all performance data for its content across the social web. A core part of BuzzFeed's publishing platform, MongoDB exposes metrics to editors and writers in real time, to help them understand how content is performing and to optimize for the social web. The company has been using MongoDB since 2010. Here's why.
1. Analytics provide more insight, more quickly. BuzzFeed relies on MongoDB for its strategic analytics platform. With apps and dashboards built on MongoDB, it can pinpoint when content is viewed and how it is shared. With this approach, BuzzFeed quickly gains insight into how its content performs, nimbly optimizes the user experience for the posts that are performing best, and delivers critical feedback to its writers and editors.
2. BuzzFeed is data-driven. At BuzzFeed, data drives decision-making and powers the company. MongoDB enables it to effectively analyze, track, and expose a range of metrics to writers and employees. This includes: the number of clicks; how often and where posts are being shared; which views on different social media properties lead to the most shares; and how views differ across mobile and desktop.
3. Successful web journalism demands scale. BuzzFeed processes large volumes of data, and this is increasing each year as the site's traffic continues to grow. Originally built on a relational data store, BuzzFeed decided to use MongoDB, a more scalable solution, to collect and track the data it needs with richer functionality than a standard key-value store.
4. Editors gain an edge with access to data in minutes. Fast, easy access to data is critical to helping editors determine what content will be most shareable in the social media world. With MongoDB, BuzzFeed is able to expose performance data shortly after publication, enabling editors to respond quickly by tweaking headlines and determining the best way to promote.
5. Setting the infrastructure for new applications. As BuzzFeed continues its efforts to leverage stats and optimization, MongoDB will feature prominently in the new infrastructure. MongoDB makes it easy to build apps quickly, a requirement as BuzzFeed rolls out additional products.