IQT QUARTERLY




PREDICTIVE ANALYTICS ALONE
IS NOT THE ANSWER
By Ron Bodkin and Rick Farnell




IQT Quarterly, Spring 2011, Vol. 2 No. 4

What is the Hype About Analytics?

In the last year, the world has seen a significant trend emerge: a much greater capacity to perform analytics as a result of greater data storage and processing power. This trend was underscored in February 2011 in a classic human vs. computer challenge on the TV show Jeopardy! Watson, an analytic supercomputer built using Hadoop, took on two of the greatest human champions and won.

Since the adoption of the Internet in the late 1990s, there has been an exponential increase in the amount of data being produced, much of it less structured than traditional database data, and a surge of data integration, correlating a wider array of data than ever before assembled. Today’s Big Data compute clusters (whether hosted in the cloud or in traditional data centers) are capable of processing massive data sets. Each year brings new milestones in the capacity for processing data, and in the volumes of data that can be usefully integrated. Our connected world is made up of machines, sensors, and humans all producing data onto the connected network. Where is this data going? It is being consumed, stored, and analyzed by tomorrow’s leading organizations.

Think Big Analytics was founded in 2010 to help organizations leverage the power of advanced analytics, making it easier to assemble the right technologies and reduce the time to value gained by applying these techniques. Let’s look at some of the new opportunities that we are seeing at our customer deployments.

The opportunity to re-compute analyses. It’s impressive to analyze data just once, comprehensively and thoroughly, to yield accurate results. But what happens when you later need to analyze the data from a different perspective or to add data you didn’t have at the time of the original analysis? This capability to re-compute an analysis is becoming more powerful as companies invest in data science capabilities. The very role of a data scientist is a new one, having emerged from this explosion in analytic capabilities. Data science is common in genetics research and in the financial trading industry, but data scientists are beginning to work in retail, advertising, manufacturing, and other domains that have traditionally not included scientists or quants. Algorithm development is now optimized not just for text search relevance but also for advertising scenarios, recommendations, complex trading products, understanding sentiment on social networks, determining security risks across multi-channel outlets, and many more areas.

Flexible data. Flexible data is the ability to compute what you want over a data set. Today there are many articles written on “Big Data,” but this trend is not just about the size of data. One organization’s Big Data is another company’s sample set. It’s about flexible data: using data to solve a business problem, often in a way that wasn’t anticipated. What happens when your business purpose changes? Can you get access to the original raw data before it was processed? This concept is core to the flexible data principle. Storing data in its raw format and holding on to it for future analysis was possible a couple of years ago, but it was painfully expensive and time consuming. The most common practice at the time was to pre-compute specific summaries in a data warehouse to answer questions that were anticipated in advance, often at great investment of time and money. Thanks to Facebook, Quantcast, and other web properties that invested in building the open source Hadoop distributed storage and processing system, the rest of the world now has the ability to store, access, and process raw data for a relatively small price with unprecedented scalability and flexibility.

Not a static dashboard any more. Organizations have traditionally invested in analytics to populate static dashboards that reflect metrics deemed to be important at one level of the organization or another. While there are many approaches to building these dashboards, the common theme has been that they are backward looking. What if you wanted to predict what is going to happen in the future? What if your goal was to predict this future in a timely fashion and with high accuracy? What if you wanted to listen to your data and make course corrections to influence these predictions? Leading organizations are designing and developing data science capabilities that can predict their business activities with surprising accuracy. In order to accomplish this, successful companies are making investments in continuous analysis, course correction, flexible data integration, and A/B testing of their algorithms.

Individual level data mining. Traditional analyses are based on aggregating data about the people or events that underlie the data. By contrast, individual level data mining allows investigation down to the details of a single occurrence or a single event, to allow building a deeper understanding of new phenomena. It can also be used to develop algorithms that more effectively predict activity. For example, if you see an increase in communications activity in a certain area, it’s important to be able to drill down to see detailed records of actual events that underlie the activity, as well as being able to re-summarize within a small context to get more details about what is happening (e.g., to spot unusual patterns like anomalous levels of communication from the given area to another area, or unusual call durations).

Predictive models and feedback loop. Predictive models without the ability to monitor actual results are not very useful. Predictive models without the ability to respond based on their predictions are worse. A new cadre of “agile” organizations is building integrated predictive modeling, active listening capabilities, advanced dashboards, and flexible business response procedures. These organizations are more successful because they are able to better harness the explosion of available data and begin to take measures to shape the activity and actions of their clients, partners, and influencers.

Predictive analytics examples. Predictive analytics is the practice of applying machine learning principles to drive operational decision-making. For example:

• In online advertising, firms like Quantcast build lookalikes that are predictive models for what ad impressions are likely to convert, and establish a value for real-time bidding on ad exchanges.
• For search, predictive analytics allows optimized results matched to individual interests.
• Brands can anticipate behaviors and shape the sentiment of customers.
• In IT asset management, organizations can predict problems before systems fail, and proactively schedule repairs.
• In banking and insurance, firms can provide risk models for their field organizations to help with decision-making that matches the overall risk preferences of the firm.
• In manufacturing, organizations can predict factory machine failures and have parts pre-ordered and waiting for service installation to minimize production downtime.
• In retail, companies can better predict a customer’s preferences, offering unique, personalized recommendations and cross-sales opportunities matched to each individual customer. Notably, Netflix and Amazon have made significant advances in their recommendation engines using these technologies.
• In healthcare, organizations can offer predictive care programs matched to a unique individual based on patterns that exist in data for similar patients.
• Financial trading firms can use predictive analytics and algorithms to model trends in the market to optimize trading decisions.
• All organizations can use predictive analytics to arm themselves with models that can determine fraud and security risks, even detecting the smallest of variance.








• Customer retention can be improved greatly by modeling churn and offering discounted services and products to those customers that are at a higher risk to leave. [1]

The real power of these solutions becomes apparent when the business is able to make changes in real-time based on these core predictive capabilities.

How Can a Company Develop an Advanced Analytics Capability?

The Changing Dynamics of Computing

One of the foundational elements of the new analytics is the ability to apply a scalable amount of computing capacity to problems. With the continued progression of Moore’s Law and related increases in computing power, commodity hardware is tremendously powerful nowadays, allowing the application of copious quantities of network bandwidth, storage, CPU, and RAM to distributed computing problems. Notably, some aspects of computing are increasing much faster than others. [2,3,4]

RESOURCE                             ANNUAL GROWTH RATE
Network Bandwidth in Data Center     60%
Disk Storage Density                 60%
CPU Performance                      60%
Disk Transfer Rate                   40%
Random Disk Operations               16%

The increasing density of disk has allowed storage of unprecedented quantities of data, which is one of the key enablers of this trend. Moreover, network bandwidth has grown to the point where servers can now stream reads from their disk at wire speed. When you combine this with the fact that disk transfer rate is lagging storage density, processor performance, and network bandwidth, scaling out becomes vital to allow having enough spindles to sustain high performance data computing. The rapid increase in performance and rapid decrease in cost of Solid State Drives (SSDs) are combining to transform applications that require low latency random reads of data. In particular, for low latency analytic queries SSDs can allow much faster analysis and investigation, and support handling larger data sets.

Another important element of this is the advent of reusable platforms that can be used across many applications and analyses. When Google first introduced a MapReduce computing cluster, there was a rapid adoption of the technique, showing both the power of this kind of analytics and also the importance of having a reusable system that can be shared across applications. Users of Hadoop and other scale-out clusters have had the same experience in adopting these new techniques.

Reference Architecture

The patterns for how data storage and processing are organized for advanced analytics are similar even across different domains. There are three important arenas needed for this data processing:

Event processing: There’s typically a need to respond to incoming interactions within milliseconds, e.g., to flag possible fraud, to bid on an auction, to respond to a routing request, or to make a recommendation.

[1] Seven Reasons You Need Predictive Analytics Today, Prediction Impact, 2010
[2] Rules of Thumb in Data Engineering: http://www.slidefinder.net/r/rules_thumb_data/engineering/1062757
[3] E.g., 100 megabit Ethernet was first available in 1995, and 100 gigabit Ethernet was first available in 2010, representing a CAGR of 58.5%.
[4] http://www.merriam-webster.com/dictionary/moore's%20law
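
The compound annual growth rate (CAGR) quoted in footnote 3 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the ten-year horizon are assumptions chosen for the example, not figures from the article. It also shows why a 60% growth rate in disk density set against a 40% growth rate in disk transfer rate matters: the time needed to scan a full disk keeps rising, which is the argument for scaling out across many spindles.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.585 means 58.5% per year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_factor(annual_rate, years):
    """Total multiplier after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


# Footnote 3: 100 megabit Ethernet (1995) to 100 gigabit Ethernet (2010).
ethernet_cagr = cagr(100e6, 100e9, 2010 - 1995)
print(f"Ethernet bandwidth CAGR: {ethernet_cagr:.1%}")

# Assumed horizon of 10 years (hypothetical, for illustration only).
# Density grows at 60% per year and transfer rate at 40% per year, so the
# time to read an entire disk grows by the ratio of the two factors.
YEARS = 10
scan_time_ratio = growth_factor(0.60, YEARS) / growth_factor(0.40, YEARS)
print(f"Full-disk scan time multiplier after {YEARS} years: {scan_time_ratio:.1f}x")
```

Under these assumptions a single disk takes several times longer to read in full after a decade, even though it holds far more data.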








Typically these interactions call for a fast response based on a model that was previously scored in a cluster. In large volume applications, this response often involves horizontally scaling out a database for reading and writing state, which has been the raison d’être for NoSQL databases. For some applications, there’s a need for more advanced correlation among events, which has led to the development of Complex Event Processing systems.

Batch processing: To respond effectively in near real-time, it’s important to apply analytics in advance, by crunching large amounts of data. This is where scale-out clusters, such as those built on Hadoop MapReduce, really shine. Most directly, this includes the production cycle, which involves updating profiles for items (cookies, placements, content, places, devices, etc.) that can in turn be pushed out for real-time event response and analytics. However, the cluster is also used for a science cycle, which is a process of investigation and improvement that’s used to improve the production cycle: typically new approaches are simulated in the cluster and, when they appear promising, they are A/B tested.

Fast analytics: Both data scientists and business analysts need access to summarized calculations of common values to explore and visualize data, and to make decisions. Some of these values need to be available quickly to facilitate faster iterations and quick decision making (e.g., for reporting and common decision support needs). This kind of analytic information is also typically pre-computed in a cluster in batch, and then exported to a low latency database (whether relational or NoSQL).

A Hadoop cluster becomes a hub of information both from within an organization and from important outside data sources, allowing distillation of information from data. Naturally, data integration is central to making these architectures work, and there are many important technologies and patterns to support integration at the scale and performance required. Managing distributed data is often a challenge: large data sets are slow to move across a WAN, and keeping consistent copies of information among clusters and data centers poses its own challenges. Pooling rich information in one place also makes it important to have effective data security. In regulated industries, there is increased investment in protecting data and providing access controls. Hadoop security now supports client authentication and file-level authorization. Additional security can be provided by encrypting fields at rest and in transit, and with physical separation of data.

These new patterns of computing are driving tremendous innovation. There has been considerable investment in open source technologies such as Hadoop, HBase, MongoDB, Membase, Oozie, Flume, Pig, Hive, and R. As the market has expanded, commercial vendors have expanded the investment, building products like Cloudera Enterprise, MapR Technologies, Datameer, IBM Big Sheets, and Karmasphere. It’s important to have a good breadth of understanding of the technologies when assembling a solution.

Advanced analytics is a fast moving arena, and as such it is highly desirable to build capability iteratively, with a focus on getting results to business decision makers quickly. This allows the organization to learn and adjust the approach, as well as to get quick feedback on techniques and technologies that are working. Naturally, it also allows for a reduced time to value: getting real results from better analysis, and driving a virtuous cycle of improved data that can be used for future experiments.

In summary, advanced analytics have arrived and are having a significant impact across a wide variety of domains. The unprecedented ability to store and analyze data is allowing for a new class of applications, and bringing more data to bear on decisions than was ever before possible.
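
The split between batch processing and event processing in the reference architecture can be sketched in miniature. The example below is a single-process illustration only, not Hadoop code and not the authors' implementation: the event data, the function names, the in-memory dictionary standing in for a low latency database, and the bid formula are all assumptions made up for the example. A batch job in the map/shuffle/reduce style pre-computes per-user profiles from raw events, and the event handler then answers with nothing more than a lookup.

```python
from collections import defaultdict

# Hypothetical raw events as (user_id, action) pairs. In a real deployment
# these would live in a distributed file system, not a Python list.
RAW_EVENTS = [
    ("u1", "view"), ("u1", "purchase"), ("u2", "view"),
    ("u2", "view"), ("u3", "view"), ("u1", "view"),
]


def map_phase(events):
    """Emit one (user_id, (views, purchases)) pair per raw event."""
    for user_id, action in events:
        yield user_id, (int(action == "view"), int(action == "purchase"))


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Fold each user's values into a profile with a simple conversion rate."""
    profiles = {}
    for user_id, values in grouped.items():
        views = sum(v for v, _ in values)
        purchases = sum(p for _, p in values)
        rate = purchases / views if views else 0.0
        profiles[user_id] = {"views": views, "purchases": purchases,
                             "conversion_rate": rate}
    return profiles


def batch_job(events):
    """Batch processing: crunch all raw events ahead of time."""
    return reduce_phase(shuffle(map_phase(events)))


# Stand-in for the low latency store that the batch output is exported to.
PROFILE_STORE = batch_job(RAW_EVENTS)


def handle_event(user_id, default_bid=0.10):
    """Event processing: answer from a precomputed profile, no heavy work."""
    profile = PROFILE_STORE.get(user_id)
    if profile is None:
        return default_bid  # unknown user: fall back to the default
    return round(default_bid * (1.0 + profile["conversion_rate"]), 4)


print(handle_event("u1"), handle_event("u2"), handle_event("unknown"))
```

Because the raw events are retained, the batch job can be re-run with a different reduce step when the question changes, which is the flexible data idea described earlier in the article.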


Ron Bodkin is Founder and CEO of Think Big Analytics, which helps customers leverage new data processing
technologies like Hadoop, NoSQL databases, and R for statistical analysis. Previously Ron was the VP of Engineering for
Quantcast. Each day Quantcast ingests 10 billion events and produces more than a petabyte of data using Hadoop. The
Quantcast MapReduce stack handles production data processing, ad hoc analysis, data mining and machine learning.
Prior to that, Ron was a founder of enterprise consulting companies C-bridge Internet Solutions and New Aspects.

Rick Farnell is President and Co-Founder of Think Big Analytics, and has over 15 years of global consulting and
management experience. Rick has held key positions at several successful technology companies including Sun
Microsystems, SeeBeyond, eXcelon and C-bridge Internet Solutions, where he helped grow the firm to employ over 800
consultants, leading to a successful IPO in 1999. Rick is Founder of Rapid Formation which helps incubate, fund, and
scale startup technology companies.





More Related Content

Viewers also liked

Post nl checkout2_mediact_fas_presentatie_final
Post nl checkout2_mediact_fas_presentatie_finalPost nl checkout2_mediact_fas_presentatie_final
Post nl checkout2_mediact_fas_presentatie_finalTjitte Folkertsma
 
Marketing Plan 4Life Indonesia
Marketing Plan 4Life IndonesiaMarketing Plan 4Life Indonesia
Marketing Plan 4Life IndonesiaHardi Haerudih
 
Tr abajo dioses griegos
Tr abajo dioses griegos Tr abajo dioses griegos
Tr abajo dioses griegos ceiplasoledad
 
Think Big Analytics Corporate Deck Hadoop Summit June 2011
Think Big Analytics Corporate Deck Hadoop Summit June 2011Think Big Analytics Corporate Deck Hadoop Summit June 2011
Think Big Analytics Corporate Deck Hadoop Summit June 2011r_farnell
 
Using computer networks as facility to share information
Using computer networks as facility to share informationUsing computer networks as facility to share information
Using computer networks as facility to share informationguestfe0065
 
kyoto protocol and cdm
kyoto protocol and cdmkyoto protocol and cdm
kyoto protocol and cdmnaman jain
 

Viewers also liked (8)

Post nl checkout2_mediact_fas_presentatie_final
Post nl checkout2_mediact_fas_presentatie_finalPost nl checkout2_mediact_fas_presentatie_final
Post nl checkout2_mediact_fas_presentatie_final
 
Marketing Plan 4Life Indonesia
Marketing Plan 4Life IndonesiaMarketing Plan 4Life Indonesia
Marketing Plan 4Life Indonesia
 
Trabajo ingles
Trabajo inglesTrabajo ingles
Trabajo ingles
 
Tr abajo dioses griegos
Tr abajo dioses griegos Tr abajo dioses griegos
Tr abajo dioses griegos
 
Dioses griegos
Dioses griegosDioses griegos
Dioses griegos
 
Think Big Analytics Corporate Deck Hadoop Summit June 2011
Think Big Analytics Corporate Deck Hadoop Summit June 2011Think Big Analytics Corporate Deck Hadoop Summit June 2011
Think Big Analytics Corporate Deck Hadoop Summit June 2011
 
Using computer networks as facility to share information
Using computer networks as facility to share informationUsing computer networks as facility to share information
Using computer networks as facility to share information
 
kyoto protocol and cdm
kyoto protocol and cdmkyoto protocol and cdm
kyoto protocol and cdm
 

Recently uploaded

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024

Predictive Analytics & Hadoop: InQTel Qtrly Spr 2011 Think Big Bodkin & Farnell

  • 1. IQT QUARTERLY

PREDICTIVE ANALYTICS ALONE IS NOT THE ANSWER
By Ron Bodkin and Rick Farnell

What is the Hype About Analytics?

In the last year, the world has seen a significant trend emerge: a much greater capacity to perform analytics as a result of greater data storage and processing power. This trend was underscored in February 2011 in a classic human vs. computer challenge on the TV show Jeopardy! Watson, an analytic supercomputer built using Hadoop, took on two of the greatest human champions and won.

Since the adoption of the Internet in the late 1990s, there has been an exponential increase in the amount of data being produced, much of it less structured than traditional database data, and a surge of data integration, correlating a wider array of data than was ever assembled before. Today’s Big Data compute clusters (whether hosted in the cloud or in traditional data centers) are capable of processing massive data sets. Each year brings new milestones in the capacity for processing data, and in the volumes of data that can be usefully integrated. Our connected world is made up of machines, sensors, and humans all producing data onto the connected network. Where is this data going? It is being consumed, stored, and analyzed by tomorrow’s leading organizations.

Think Big Analytics was founded in 2010 to help organizations leverage the power of advanced analytics, making it easier to assemble the right technologies and reduce the time to value gained by applying these techniques. Let’s look at some of the new opportunities that we are seeing at our customer deployments.

The opportunity to re-compute analyses. It’s impressive to analyze data just once, comprehensively and thoroughly, to yield accurate results. But what happens when you later need to analyze the data from a different perspective or to add data you didn’t have at the time of the original analysis? This capability to re-compute an analysis is becoming more powerful as companies invest in data science capabilities. The very role of a data scientist is a new one, having emerged from this explosion in analytic capabilities. Data science is common in genetics research and in the financial trading industry, but data scientists are beginning to work in retail, advertising, manufacturing, and other domains that have traditionally not included scientists or quants. Algorithms are now optimized not just for text search relevance but also for advertising scenarios, recommendations, complex trading products, understanding sentiment on social networks, determining security risks across multi-channel outlets, and many more areas.

Flexible data. Flexible data is the ability to compute what you want over a data set. Today there are many articles written on “Big Data,” but this trend is not just about the size of data. One organization’s Big Data is another company’s sample set. It’s about flexible data: using data to solve a business problem, often in a way that wasn’t anticipated. What happens when your business purpose changes? Can you get access to the original raw data before it was processed? This concept is core to the flexible data principle. Storing data in its raw format and holding on to it for future analysis was possible a couple of years ago, but it was

IQT QUARTERLY SPRING 2011 Vol. 2 No. 4
  • 2. painfully expensive and time consuming. The most common practice at the time was to pre-compute specific summaries in a data warehouse to answer questions that were anticipated in advance, often at great investment of time and money. Thanks to Facebook, Quantcast, and other web properties that invested in building the open source Hadoop distributed storage and processing system, the rest of the world now has the ability to store, access, and process raw data for a relatively small price with unprecedented scalability and flexibility.

Not a static dashboard any more. Organizations have traditionally invested in analytics to populate static dashboards that reflect metrics deemed to be important at one level of the organization or another. While there are many approaches to building these dashboards, the common theme has been that they are backward looking. What if you wanted to predict what is going to happen in the future? What if your goal was to predict this future in a timely fashion and with high accuracy? What if you wanted to listen to your data and make course corrections to influence these predictions? Leading organizations are designing and developing data science capabilities that can predict their business activities with surprising accuracy. To accomplish this, successful companies are making investments in continuous analysis, course correction, flexible data integration, and A/B testing of their algorithms.

Individual level data mining. Traditional analyses are based on aggregating data about the people or events that underlie the data. By contrast, individual level data mining allows investigation down to the details of a single occurrence or a single event, to build a deeper understanding of new phenomena. It can also be used to develop algorithms that more effectively predict activity. For example, if you see an increase in communications activity in a certain area, it’s important to be able to drill down to see detailed records of the actual events that underlie the activity, as well as to re-summarize within a small context to get more details about what is happening (e.g., to spot unusual patterns like anomalous levels of communication from the given area to another area, or unusual call durations).

Predictive models and feedback loop. Predictive models without the ability to monitor actual results are not very useful. Predictive models without the ability to respond based on their predictions are worse. A new cadre of “agile” organizations is building integrated predictive modeling, active listening capabilities, advanced dashboards, and flexible business response procedures. These organizations are more successful because they are better able to harness the explosion of available data and begin to take measures to shape the activity and actions of their clients, partners, and influencers.

Predictive analytics examples. Predictive analytics is the practice of applying machine learning principles to drive operational decision-making. For example:

    • In online advertising, firms like Quantcast build lookalikes, which are predictive models for what ad impressions are likely to convert, and establish a value for real-time bidding on ad exchanges.
    • For search, predictive analytics allows optimized results matched to individual interests.
    • Brands can anticipate behaviors and shape the sentiment of customers.
    • In IT asset management, organizations can predict problems before systems fail, and proactively schedule repairs.
    • In banking and insurance, firms can provide risk models for their field organizations to help with decision-making that matches the overall risk preferences of the firm.
    • In manufacturing, organizations can predict factory machine failures and have parts pre-ordered and waiting for service installation to minimize production downtime.
    • In retail, companies can better predict a customer’s preferences, offering unique, personalized recommendations and cross-sales opportunities matched to each individual customer. Notably, Netflix and Amazon have made significant advances in their recommendation engines using these technologies.
    • In healthcare, organizations can offer predictive care programs matched to a unique individual based on patterns that exist in data for similar patients.
    • Financial trading firms can use predictive analytics and algorithms to model trends in the market to optimize trading decisions.
    • All organizations can use predictive analytics to arm themselves with models that can determine fraud and security risks, even detecting the smallest variances.

Vol. 2 No. 4 Identify. Adapt. Deliver. ™
  • 3.
    • Customer retention can be improved greatly by modeling churn and offering discounted services and products to those customers that are at a higher risk to leave. [1]

The real power of these solutions becomes apparent when the business is able to make changes in real-time based on these core predictive capabilities.

How Can a Company Develop an Advanced Analytics Capability?

The Changing Dynamics of Computing

One of the foundational elements of the new analytics is the ability to apply a scalable amount of computing capacity to problems. With the continued progression of Moore’s Law and related increases in computing power, commodity hardware is tremendously powerful nowadays, allowing the application of copious quantities of network bandwidth, storage, CPU, and RAM to distributed computing problems. Notably, some aspects of computing are increasing much faster than others. [2][3][4]

RESOURCE                            ANNUAL GROWTH RATE
Network Bandwidth in Data Center    60%
Disk Storage Density                60%
CPU Performance                     60%
Disk Transfer Rate                  40%
Random Disk Operations              16%

The increasing density of disk has allowed storage of unprecedented quantities of data, which is one of the key enablers of this trend. Moreover, network bandwidth has grown to the point where servers can now stream reads from their disk at wire speed. When you combine this with the fact that disk transfer rate is lagging storage density, processor performance, and network bandwidth, scaling out becomes vital in order to have enough spindles to sustain high performance data computing.

The rapid increase in performance and rapid decrease in cost of Solid State Drives (SSDs) are combining to transform applications that require low latency random reads of data. In particular, for low latency analytic queries SSDs can allow much faster analysis and investigation, and support handling larger data sets.

Another important element of this is the advent of reusable platforms that can be used across many applications and analyses. When Google first introduced a MapReduce computing cluster, there was a rapid adoption of the technique, showing both the power of this kind of analytics and also the importance of having a reusable system that can be shared across applications. Users of Hadoop and other scale-out clusters have had the same experience in adopting these new techniques.

Reference Architecture

The patterns for how data storage and processing are organized for advanced analytics are similar even across different domains. There are three important arenas needed for this data processing:

Event processing: There’s typically a need to respond to incoming interactions within milliseconds, e.g., to flag possible fraud, to bid on an auction, to respond to a routing request, or to make a recommendation.

[1] Seven Reasons You Need Predictive Analytics Today, Prediction Impact 2010
[2] Rules of Thumb in Data Engineering: http://www.slidefinder.net/r/rules_thumb_data/engineering/1062757
[3] E.g., 100 megabit Ethernet was first available in 1995, and 100 gigabit Ethernet was first available in 2010, representing a CAGR of 58.5%.
[4] http://www.merriam-webster.com/dictionary/moore's%20law
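The compound annual growth rate quoted for Ethernet (100 megabit in 1995 to 100 gigabit in 2010) can be checked with a few lines of Python. This is a minimal sketch rather than anything from the article: the function names `cagr` and `scan_time_multiplier` are hypothetical, and the second function rests on a simplifying assumption made here, namely that the time to read a whole disk scales as its capacity divided by its transfer rate. Under that assumption, the gap between 60% density growth and 40% transfer-rate growth is what pushes designs toward more spindles.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.585 means 58.5%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def scan_time_multiplier(density_growth: float, transfer_growth: float, years: float) -> float:
    """Factor by which a full-disk scan slows down after `years`.

    Assumption (not from the article): scan time is proportional to
    capacity / transfer rate, so it grows with the ratio of the two rates.
    """
    return ((1.0 + density_growth) / (1.0 + transfer_growth)) ** years


# 100 megabit Ethernet (1995) to 100 gigabit Ethernet (2010).
ethernet_growth = cagr(100e6, 100e9, 2010 - 1995)
print(f"Ethernet bandwidth CAGR: {ethernet_growth:.1%}")  # prints 58.5%

# Growth rates taken from the table: density 60%, transfer rate 40%.
slowdown = scan_time_multiplier(0.60, 0.40, years=5)
print(f"Full-disk scan slowdown after 5 years: {slowdown:.2f}x")
```

Because the two rates compound, even a modest annual gap adds up: under the stated assumption, a single disk takes roughly twice as long to scan in full after five years, which is one way to read the article's point about needing enough spindles.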
  • 4. Typically these responses involve a fast response based on a model that was previously scored in a cluster. In large volume applications, this response often involves horizontally scaling out a database for reading and writing state, which has been the raison d’être for NoSQL databases. For some applications, there’s a need for more advanced correlation among events, which has led to the development of Complex Event Processing systems.

Batch processing: To respond effectively in near real-time, it’s important to apply analytics in advance, by crunching large amounts of data. This is where scale-out clusters, such as those built on Hadoop MapReduce, really shine. Most immediately, this includes the production cycle, which involves updating profiles for items (cookies, placements, content, places, devices, etc.) that can in turn be pushed out for real-time event response and analytics. However, the cluster is also used for a science cycle, which is a process of investigation and improvement that’s used to improve the production cycle: typically, new approaches are simulated in the cluster and, when they appear promising, they are A/B tested.

Fast analytics: Both data scientists and business analysts need access to summarized calculations of common values to explore and visualize data, and to make decisions. Some of these values need to be available quickly to facilitate faster iterations and quick decision making (e.g., for reporting and common decision support needs). This is another kind of information that is typically pre-computed in a cluster in batch, and then exported to a low latency database (whether relational or NoSQL).

A Hadoop cluster becomes a hub of information, both from within an organization and leveraging important data from outside, allowing distillation of information from data. Naturally, data integration is central to making these architectures work, and there are many important technologies and patterns to support integration at the scale and performance required. Managing distributed data is often a challenge: large data sets are slow to move across a WAN, and keeping consistent copies of information among clusters and data centers poses its own challenges. Pooling rich information in one place also makes it important to have effective data security. In regulated industries, there is increased investment in protecting data and providing access controls. Hadoop security now supports client authentication and file-level authorization. Additional security can be provided by encrypting fields at rest and in transit, and with physical separation of data.

These new patterns of computing are driving tremendous innovation. There has been considerable investment in open source technologies such as Hadoop, HBase, MongoDB, Membase, Oozie, Flume, Pig, Hive, and R. As the market has expanded, commercial vendors have expanded the investment, building products like Cloudera Enterprise, MapR Technologies, Datameer, IBM BigSheets, and Karmasphere. It’s important to have a good breadth of understanding of the technologies when assembling a solution.

Advanced analytics is a fast moving arena, and as such it is highly desirable to build capability iteratively, with a focus on getting results to business decision makers quickly. This allows the organization to learn and adjust the approach, as well as to get quick feedback on analytic techniques and technologies that are working. Naturally, it also allows for a reduced time to value: getting real results from better analysis, and driving a virtuous cycle of improved data that can be used for future experiments.

In summary, advanced analytics have arrived and are having a significant impact across a wide variety of domains. The unprecedented ability to store and analyze data is allowing for a new class of applications, and bringing more data to bear on decisions than was ever possible before.

Ron Bodkin is Founder and CEO of Think Big Analytics, which helps customers leverage new data processing technologies like Hadoop, NoSQL databases, and R for statistical analysis. Previously Ron was the VP of Engineering for Quantcast. Each day Quantcast ingests 10 billion events and produces more than a petabyte of data using Hadoop. The Quantcast MapReduce stack handles production data processing, ad hoc analysis, data mining, and machine learning. Prior to that, Ron was a founder of enterprise consulting companies C-bridge Internet Solutions and New Aspects.

Rick Farnell is President and Co-Founder of Think Big Analytics, and has over 15 years of global consulting and management experience. Rick has held key positions at several successful technology companies including Sun Microsystems, SeeBeyond, eXcelon, and C-bridge Internet Solutions, where he helped grow the firm to employ over 800 consultants, leading to a successful IPO in 1999. Rick is Founder of Rapid Formation, which helps incubate, fund, and scale startup technology companies.
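The reference architecture described in the article (batch pre-computation of profiles, export to a low latency store, and a millisecond-scale lookup at event time) can be illustrated with a toy sketch. Everything named here is hypothetical and chosen only for illustration: `build_profiles`, `ProfileStore`, `respond`, the conversion-rate profile, and the 0.5 bid threshold are assumptions, and an in-memory dict stands in for what would be a NoSQL or relational database fed by a Hadoop job.

```python
from collections import defaultdict


def build_profiles(raw_events):
    """Batch step: aggregate raw (item_id, converted) events into per-item conversion rates."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for item_id, did_convert in raw_events:
        shown[item_id] += 1
        if did_convert:
            converted[item_id] += 1
    return {item_id: converted[item_id] / shown[item_id] for item_id in shown}


class ProfileStore:
    """Stand-in for the low latency database; a plain dict keeps the sketch self-contained."""

    def __init__(self):
        self._profiles = {}

    def load(self, profiles):
        # Swap in the whole batch result at once, as an export job might.
        self._profiles = dict(profiles)

    def get(self, item_id, default=0.0):
        return self._profiles.get(item_id, default)


def respond(store, item_id, threshold=0.5):
    """Event step: a single lookup against pre-computed state, no heavy computation."""
    return "bid" if store.get(item_id) >= threshold else "pass"


# Raw events are kept as-is, so the batch step can be re-run with a different analysis later.
raw_events = [("cookie-a", True), ("cookie-a", True), ("cookie-a", False), ("cookie-b", False)]
store = ProfileStore()
store.load(build_profiles(raw_events))

print(respond(store, "cookie-a"))       # bid
print(respond(store, "cookie-b"))       # pass
print(respond(store, "cookie-unseen"))  # pass
```

The split matters more than the details: the expensive aggregation runs ahead of time over raw data, while the event path does nothing but a keyed read, which is why the article pairs scale-out batch clusters with horizontally scaled databases.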