HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Queries for Infinity Insurance
Transcript of a BriefingsDirect podcast on how a major insurance company is using improved
data architecture to gain a competitive advantage.
Listen to the podcast. Find it on iTunes. Sponsor: HP
Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance
Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your
moderator for this ongoing discussion of IT innovation and how it’s making an
impact on people’s lives.
Once again, we're focusing on how IT leaders are improving their services'
performance to deliver better experiences and payoffs for businesses and end
users alike, and this time we're coming to you directly from the HP Discover 2013
Conference in Las Vegas. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]
Our next innovation case study interview highlights how Infinity Insurance Companies in
Birmingham, Alabama has been deploying a new data architecture to improve productivity for
their analysis and business intelligence (BI).
To learn more about how they are improving their performance and their results for their
business activities, please join me in welcoming our guest, Barry Ralston, the Assistant Vice
President for Data Management at Infinity Insurance Companies. Welcome, Barry.
Barry Ralston: Thanks for having me, Dana.
Gardner: You're welcome. Tell me a bit about the need. What was it that you've been doing with
your BI and data warehousing that prompted you to seek a change?
Ralston: Like many companies in our space, we have constructed an enterprise data warehouse
deployed to a row-store technology. In our case, it was initially Oracle RAC and
then, eventually, the Oracle Exadata engineered hardware/software appliance.
We were noticing that analysis that typically occurs in our space wasn’t really
optimized for execution via that row store. Based on my experience with Vertica,
we did a proof of concept with a couple of other alternative and analytic store-type
databases. We specifically chose Vertica to achieve higher productivity and to
allow us to focus on optimizing queries and extracting value out of the data.
Gardner: Before we learn more about how that’s worked out for you, maybe you could explain
for our listeners’ benefit, what Infinity Insurance Companies does. How big are you, and how
important is data and analysis to you?
Ralston: We are a billion-dollar property and casualty company, headquartered in Birmingham,
Alabama. Like any insurance carrier, data is key to what we do. But one of the things that drew
me to Infinity, after years of being in a consulting role, was the idea of their determination to use
data as a strategic weapon, not just IT as a whole, but data specifically within that larger IT as a
strategic or competitive advantage.
Vertica environment
Gardner: You have quite a bit of internal and structured data. Tell me a bit what happened
when you moved into a Vertica environment, first to the proof of concept and then
into production?
Ralston: For the proof of concept, we took the most difficult or worst-performing queries from our Exadata implementation and moved the entire enterprise data warehouse set into a Vertica deployment on three dual hex-core DL380-class machines, running at the same scale, with the same data, with the same queries.
We took the top 12 worst-performing queries or longest-running queries from the Exadata
implementation, and not one of the proof of concept queries ran less than a hundred times faster.
It was an easy decision to make in terms of the analytic workload, versus trying to use the row-
store technology that Oracle has been based on.
Gardner: Let’s dig into that a bit. I'm not a computer scientist and I don’t claim to fully
understand the difference between row store, relational, and the column-based approach for
Vertica. Give us the quick "Data Architecture 101" explanation of why this improvement is so
impressive?
Ralston: The original family of relational databases -- the current big three are Oracle, SQL
Server and DB2 -- are based on what we call row-storage technologies. They store information in
blocks on disks, writing an entire row at a time.
If you had a record for an insured, you might have the insured's name, the date the policy went into effect, the date the policy next shows a payment, and so on. All those attributes are written at the same time, in series, to a row, which is combined into a block.
So storage has to be allocated in a particular fashion, to facilitate things like updates. It’s an
optimal way of storing data for transaction processing. For now, it’s probably the state-of-the-art
for that. If I am running an accounting system or a quote system, that’s the way to go.
Analytic queries are fundamentally different than transaction-processing queries. Think of the
transaction processing as a cash register. You ring up a sale with a series of line items. Those get
written to that row store database and that works well.
But when I want to know the top 10 products sold to my most profitable 20 percent of customers
in a certain set of regions in the country, those set-based queries don’t perform well without
major indexing. Often, that relates back to additional physical storage in a row-storage
architecture.
Column store databases -- Vertica is a native column store database -- store data fundamentally
differently than those row stores. We break a record down into its individual columns and store each column distinctly. This allows me to do a couple of different things at an architectural level.
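As a rough sketch of the layout difference Ralston describes (illustrative Python only, not Vertica's actual storage engine), the same insured records can be held row-wise or column-wise:

```python
# Illustrative model: the same records laid out row-wise vs. column-wise.
records = [
    ("Smith", "2013-01-01", "CA"),
    ("Jones", "2013-02-15", "CA"),
    ("Brown", "2013-03-30", "NY"),
]

# Row store: all of a record's attributes are written together, in series.
row_store = list(records)

# Column store: each attribute becomes its own independently stored sequence.
last_name, effective_date, state = (list(col) for col in zip(*records))

# A query that needs only last names can now touch a single column.
print(last_name)  # ['Smith', 'Jones', 'Brown']
```

The point is only the layout: in the column form, reading one attribute never requires reading the others.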
Sort, compress, organize
First and foremost, I can sort, compress, and organize the data on disk much more efficiently.
Compression has been recently added to row storage architectures, but in a row-storage database,
you largely have to compress at the entirety of a row.
I can’t choose an optimal compression algorithm for just a date, because in that row, I will have text, numbers, and dates. In a column store, I can apply a specific compression algorithm to the data in each column. So a date gets one algorithm; a monotonically increasing key, like a surrogate key you might have in a dimensional data warehouse, gets a different encoding algorithm, and so on.
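That per-column encoding idea can be sketched with two toy encoders (simplified illustrations, not Vertica's actual algorithms): delta encoding suits a monotonically increasing surrogate key, while run-length encoding suits a sorted, low-cardinality column.

```python
# Toy per-column encoders, chosen to match each column's shape of data.
def delta_encode(keys):
    # Store the first value plus successive differences; a monotone
    # surrogate key collapses to a base value and tiny deltas.
    return [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]

def run_length_encode(values):
    # Store (value, run length) pairs; sorted repetitive data collapses
    # to a handful of runs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

surrogate_keys = [1001, 1002, 1003, 1004, 1005]
states = ["AL", "AL", "AL", "CA", "CA", "NY"]

print(delta_encode(surrogate_keys))   # [1001, 1, 1, 1, 1]
print(run_length_encode(states))      # [('AL', 3), ('CA', 2), ('NY', 1)]
```

A row store cannot do this per attribute, because every block mixes text, numbers, and dates.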
That covers storage and sorting. How data gets retrieved is also fundamentally different, and that's another big penalty for row-storage databases at query time. I could say, "Tell me all the customers that bought a product in California, but I only want to know their last name."
If I have 20 different attributes, a row-storage database actually has to read all the attributes off
of disk. The query engine eliminates the ones I didn’t ask for in the eventual results, but I've
already incurred the penalty of the I/O. This has a huge impact when you think of things like call detail records (CDRs) in telecom, which have 144-some-odd columns.
If I instead ask a column store database, "Give me all the people, with last names, who bought a product in California," I'm essentially asking the database to read two columns off disk, and that's all that's happening. My I/O factors are improved by an order of 10, or, in the case of the CDR, something on the order of 1 in 144.
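The I/O arithmetic behind that CDR example can be written out as a toy calculation (row count assumed for illustration):

```python
# Toy I/O comparison for the telecom CDR example (numbers illustrative).
NUM_COLUMNS = 144      # columns in a call detail record
NUM_ROWS = 1_000_000   # assumed row count, for illustration only
QUERY_COLUMNS = 2      # e.g., last name and state

# A row store reads every attribute of every row, then discards the extras.
row_store_values_read = NUM_ROWS * NUM_COLUMNS

# A column store reads only the columns the query references.
column_store_values_read = NUM_ROWS * QUERY_COLUMNS

print(row_store_values_read // column_store_values_read)  # 72
```

In other words, for this query shape the row store reads 72 times as many values off disk, before any compression gains are counted.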
Gardner: Fundamentally it’s the architecture that’s different. You can’t just go back and increase
your I/O improvements in those relational environments by making it in-memory or cutting
down on the distance between the data and the processing. That only gets you so far, and you can
only throw hardware at it so much. Fundamentally, it’s about the architecture.
Ralston: Absolutely correct. You've seen a lot of these -- I think one of the fun terms around this
is "unnatural acts with data," as to how data gets either scattered or put into a cache or other
things. Every time you introduce one of these mechanisms, you're putting another bottleneck
between near real-time analytics, and getting the data from a source system into a user’s hands
for analytics. Think of a cache. If you’re going to cache, you’ve got to warm that cache up to get
an effect.
If I'm streaming data in from a sensor, real-time location servers, or something like that, I don’t
get a whole lot of value out of the cache to start until it gets warmed up. I totally agree with your
point there, Dana, that it’s all about the architecture.
Gardner: So you’ve gained on speed and scale, and you're able to do things you couldn’t do before with certain types of data. That’s all well and good for us folks who are
interested in computers. What about the people who are interested in insurance? What were you
able to bring back to your company that made a difference for them and their daily business
that’s now allowed you to move beyond your proof of concept into wider production?
Ralston: The great question is what ends up being the business value. In short, leveraging
Vertica, the underlying architecture allows me to create a playing field, if you will, for business analysts. They don’t necessarily have to be data scientists to enjoy it and be able to relate things
that have a business relationship between each other, but not necessarily one that’s reflected in
the data model, for whatever reason.
Performance suffers
Obviously in a row storage architecture, and specifically within dimensional data warehouses,
if there is no index between a pair of columns, your performance begins to suffer. Vertica creates no indexes; it effectively self-indexes the data via sorting and encoding.
So if I have an end user who wants to analyze something that’s never been analyzed before, but
has a semantic relationship between those items, I don’t have to re-architect the data storage for
them to get information back at the speed of their decision.
Gardner: You've been able to apply the Vertica implementation to some of your existing queries
and you’ve gotten some great productivity benefits from that. What about opening this up to
some new types of data, and/or giving your users, the folks in the insurance company, the opportunity to look to external types of queries and learn more about markets where they can apply new insurance products and grow the bottom line, rather than just repave cowpaths?
Ralston: That's definitely part of our strategic plan. Right now, 100 percent of the data being
leveraged at Infinity is structured. We're leveraging Vertica to manage all that structured data, but
we have a plan to leverage Hadoop and the Vertica Hadoop connectors, based on what I'm seeing this week around HAVEn: the idea of being able to work seamlessly with structured and unstructured data from one point.
Insurance is an interesting business in that, as my product and pricing people look for the next
great indicator of risk, we essentially get to ride a wave of that competitive advantage for as long
a period of time as it takes us to report that new rate to a state. The state shares that with our
competitors, and then our competitors have to see if they want to bake into their systems what
we’ve just found.
So we can use Vertica as a competitive hammer, Vertica plus Hadoop to do things that our
competitors aren’t able to do. Then, I’ve delivered what my CIO is asking me in terms of data as
a competitive advantage.
Gardner: Well, great. I'm afraid we will have to leave it there. We've been learning about how
Infinity Insurance Companies has been deploying HP Vertica technology and gaining scale and
speed benefits. And now also setting themselves up for perhaps doing types of queries that they
hadn’t been able to do before.
I’d like to thank our guest for joining us. We've been enjoying the company of Barry Ralston, the
Assistant Vice President for Data Management at Infinity Insurance Companies. Thanks so much,
Barry.
Ralston: Thank you very much.
Gardner: I’d like to thank our audience as well for joining us for this special HP Discover
Performance Podcast, coming to you from the HP Discover 2013 Conference in Las Vegas.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HP-sponsored discussions. Thanks again for listening, and come back next time.
Copyright Interarbor Solutions, LLC,
2005-2013. All rights reserved.
You may also be interested in:
• Defining the New State for Comprehensive Enterprise Security Using CSC Services and
HP Security Technology
• Converged Cloud News from HP Discover: What it means
• Liberty Mutual Insurance melds regulatory compliance and security awareness to better
protect assets, customers, and employees
• With Cloud OS, HP takes up mantle of ambassador to the future of hybrid cloud models
• Right-sizing security and information assurance, a core-versus-context journey at Lake
Health
• Podcast recap: HP Experts analyze and explain the HAVEn big data news from HP
Discover
• Heartland CSO instills novel culture that promotes proactive and open responsiveness to
IT security risks