As eCommerce Sites Harvest Big Data, They Mature the Value from Transactional Benefits to Managing Multiple Data Sets Across Cloud Models
Transcript of a Briefings Direct discussion on how HP Vertica helps a big-data consultancy in its
relationship to a variety of enterprises.
Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android.
Sponsor: HP Enterprise
Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm
Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator
for this ongoing sponsored discussion on IT innovation and how it’s making an
impact on people’s lives.
Once again, we're focusing on how companies are adapting to the new style of IT
to improve IT performance and deliver better user experiences, as well as better
business results.
Our next innovation user interview highlights how a consultant is helping big organizations
better manage their big data and provide the insights they need to thrive in the fast-paced
digital eCommerce environment.
Become a member of myVertica today
Register now
Gain access to the free HP Vertica Community Edition
With that, please join me in welcoming our guest. We are here with Jimmy Mohsin. He is the
Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey.
Welcome, Jimmy.
Jimmy Mohsin: Thank you, Dana. How are you?
Gardner: We've been hearing an awful lot about some extraordinary situations where the
fast-paced environment and data volumes that users are dealing with have left them with a need
for a much better architecture.
Tell me, what are you seeing in the marketplace? How desperate are people to find the right
architecture now that big data is upon them?
Mohsin: There's a lot of interest in trying to deal with large data volumes -- not only large data
volumes, but also data that changes rapidly. Now, there are many companies that have very large
datasets, some in terabytes, some in petabytes, and they're getting live feeds.
The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that
problem, especially if you're using that database as a warehouse and you're reporting against it.
Basically, we have kind of a moving-target situation. With Vertica, what we've seen is the ability
to solve that problem in at least some of the cases that I've come across, and I can talk about
specific use cases in that regard.
Input/output issues
Gardner: Before we get into a specific use case, I'm interested particularly in some of these
input/output issues. People are trying to decide how to move the data around. They're toying with
cloud. They're trying to bring data from more types of traditional repositories. And, as you say,
they're facing new types of data problems with streaming and real-time feeds.
How do you see them beginning this process when they have to handle so many variables? Is it
something that’s an IT architecture, or enterprise architecture, or data architecture? Who's
responsible for this, given that it’s now a rather holistic problem?
Mohsin: In my current project, we ran into that. The problem is that many companies don't even
have a well-defined data-architecture team. Some of them do. You'll find a lot of
companies with an enterprise-architect role, and you'll have some companies
with only a haphazard definition of an architectural group.
Net-net, at least at this point, unless companies are more structured, it becomes
a management issue in the sense that someone at the leadership level needs to
know who has what domain knowledge and then form the appropriate team to
skin this cat.
I know of a recent situation where we had to build a team of four people, and only one was an
architect. But we built a virtual team of four people who were able to assemble and collate all the
repositories that spanned 15 years and four different technology flavors, and then come up with
an approach that resulted in a single repository in Vertica.
So there are no easy answers yet, because organizations just aren't uniformly structured.
Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the
meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity,
and how at least one architecture has been deployed to manage that?
Mohsin: One of my current projects deals with one of the world's largest retailers. It's
eCommerce -- online selling. One of the things they do, in addition to their transactions of
buying and selling, is email campaign management. That means staying in touch with the
customer on the basis of their purchases, their interests, and their profiles.
One of the things we do is see what a certain customer’s buying preferences have been over the
past 90 days. Knowing that and the customer’s profile, we can try to predict what
their buying patterns will be. So we send them a very tailored message in that
regard. In this project, we're dealing with about 150 to 160 million emails a day.
So this is definitely big data.
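The tailoring described here can be sketched in miniature: pick each customer's dominant category from the trailing 90 days of purchases and use it to choose the message. The sample data and field names below are purely illustrative, not the retailer's actual schema, and the comment simply averages the stated daily volume over a day.

```python
from datetime import date, timedelta
from collections import Counter

# Illustrative sample data only; the real system holds the retailer's
# full transactional history.
purchases = [
    ("cust1", date(2015, 5, 1), "shoes"),
    ("cust1", date(2015, 5, 20), "shoes"),
    ("cust1", date(2014, 11, 3), "books"),   # outside the 90-day window
    ("cust2", date(2015, 5, 15), "garden"),
]

def top_category(customer, purchases, today=date(2015, 6, 1)):
    """Most-purchased category in the trailing 90 days, or None."""
    cutoff = today - timedelta(days=90)
    recent = [cat for cust, when, cat in purchases
              if cust == customer and when >= cutoff]
    if not recent:
        return None
    return Counter(recent).most_common(1)[0][0]

# At 150 million emails a day, the average send rate works out to
# roughly 150_000_000 / 86_400 ≈ 1,736 messages per second.
print(top_category("cust1", purchases))  # shoes
```

The windowed aggregation is the essential shape; at 150-160 million emails a day it has to run as a set-based query inside the warehouse rather than customer by customer.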
Here we have online information coming into one warehouse as to what's
happening in the world of buying and selling. Then, behind the scenes, while that
information is being sent to the warehouse, we're trying to do these email campaigns.
This is where the problem becomes fairly complicated. We tried traditional relational database
management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and
performance issues. That's where the big-data world was really beneficial. We were able to
address that problem in a seven-month project that we ran.
Gardner: And this was using Vertica?
Large organization
Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was
Vertica. We saw that there are a whole bunch of big-data vendors. The issue is that many of the
vendors don't have a large organization behind them, and Vertica does. Company
management felt that this was a new big database, but HP was behind it, and the fact that they
also use HP hardware helped a lot.
They chose Vertica. The team I was managing did a proof of concept (POC) and we were able to
demonstrate that Vertica would be able to handle the reporting that is tied to the email campaign
management. We ran a 90-day POC, and the results were so positive that there was an interest in
going live. We went live about 90 days after that.
Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's
used technically. But this email campaign problem almost sounds like a transactional issue, a
complex event processing issue, or a transfer agent scaling issue. How does big data, Vertica, and
analytics come to bear on this particular problem?
Mohsin: It's exactly what you say it is. As we're reporting and pushing out the campaigns, new
information is coming in every half hour, sometimes even more frequently. There's a live feed
that's updating the warehouse. While the warehouse is being updated, we want to report against it
in real time and keep our campaigns going.
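The situation described here -- reports that must run while a live feed keeps loading -- can be sketched with a toy in-memory "warehouse." This only illustrates the shape of the problem (every report should see an internally consistent snapshot despite concurrent writes); it says nothing about how Vertica achieves it internally.

```python
import threading

# Toy stand-in for the warehouse: an append-only list of rows.
rows = []
lock = threading.Lock()

def live_feed(n):
    """Simulates the transactional feed appending rows."""
    for _ in range(n):
        with lock:
            rows.append(1)  # each row worth 1 unit, to make checks easy

def report():
    """Reports against a consistent snapshot of the data."""
    with lock:
        snapshot = list(rows)
    return sum(snapshot), len(snapshot)

feed = threading.Thread(target=live_feed, args=(10_000,))
feed.start()
while feed.is_alive():
    total, count = report()
    assert total == count  # every snapshot is internally consistent
feed.join()
print(report())  # (10000, 10000)
```

Here a lock serializes readers and writers, which is exactly the bottleneck the traditional RDBMS hit; an analytic engine instead lets reads proceed against stable snapshots without blocking loads.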
The key point is that we can't really stop any of these processes. The customers who are
managing the campaigns want to see information very frequently. We can’t even predict when
they would want their information. At the same time, the transactional systems are sending us
live feeds.
The problem we ran into with the traditional RDBMS is that the reporting didn't function when
the live feeds were underway. We couldn't run our backend email campaign reports when new
data was coming in.
One of the benefits Vertica has, due to its basic architecture and its columnar design, is that it's
better positioned to do that. This is what we were able to demonstrate in the live POC, because
nobody was going to take our word for it.
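A simplified illustration of why a columnar layout helps with this kind of reporting: an aggregate such as SUM(amount) scans one contiguous column rather than every field of every row. Real columnar engines like Vertica add compression, projections, and concurrency control on top; this sketch shows only the layout difference.

```python
# Row store: one record per order; an aggregate must walk every
# field of every row.
orders_rows = [
    {"order_id": 1, "customer": "a", "amount": 25.0},
    {"order_id": 2, "customer": "b", "amount": 40.0},
    {"order_id": 3, "customer": "a", "amount": 15.0},
]

# Column store: one array per column; an aggregate over "amount"
# touches only that array.
orders_cols = {
    "order_id": [1, 2, 3],
    "customer": ["a", "b", "a"],
    "amount": [25.0, 40.0, 15.0],
}

row_total = sum(row["amount"] for row in orders_rows)
col_total = sum(orders_cols["amount"])
assert row_total == col_total == 80.0
```

With wide tables and billions of rows, reading one column instead of all of them is the difference between a report that finishes and one that stalls the feed.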
The end user said, "Take a few of our largest clients. Take some of our clients that have a lot of
transactions. Prove that the reports will work for those clients." That's what we did in 30 days.
Then, we extended it, and then in 90 days, we demonstrated the whole thing end to end.
Following that was the go-live.
Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather
than going to a stop, batch-process, analyze, repeat cycle, you've gained a solution to your problem.
But at the same time, it seems like you're getting data into an environment where you can
analyze it and perhaps extract other forms of analysis, in addition to solving your email,
eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to
add a new dimension of analysis to what's going on, and perhaps refine these transactions more
toward a customer-inference benefit.
More than a database
Mohsin: One of the things I like to say internally is that Vertica isn't just a big database;
it's more than a database. It's really a platform, because you have Distributed R and other
tools along with it. When we adopted it and went live with this technology, we first solved
the feeds and speeds problem, but now we're very much positioned to use some of the
capabilities that exist in Vertica.
Distributed R is one of them and inference analysis is another, so that we can
build intelligent reports. To date, we've been building those outside the RDBMS. The RDBMS has no
role in that. With Vertica, I call it more of a data platform. So we definitely will go there, but that
would be our second phase.
As the system starts to function and deliver on the key use cases, the next stage would be to build
more sophisticated reports. We definitely have the requirements and now we have the ability to
deliver.
Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool
available to more of the constituents within this organization so that they could innovate and do
experiments. That’s very powerful stuff indeed.
Is there anything else you can tell us for other organizations that might be facing similar issues
around real-time feeds and the need to analyze and react, now that you have been through this on
this particular project? Are there any lessons learned for others?
If you're facing transactional issues and you haven't thought about a big-data platform as part of
the solution, what do you offer them in terms of maybe lighting a light bulb in their minds
about looking for alternatives to traditional middleware?
Mohsin: Like so many people, we tried to see if anyone else had done this. One of the
issues in big data at least today is that you can’t find a whole slew of clients who have already
gone live and who are in production.
There are lots of people in development, and some are live, but in our space, we couldn't find
anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we
scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that
was a good lesson learned.
The other big thing is the data-migration question. Maybe, to some extent, this problem will
never be solved. It's not so easy to pull data out of legacy database systems. Very few of them
will give you good tools to migrate away from them. They all want you to stay. So we had to
write our own tooling. We scoured the market for it, but we couldn’t find too many options out
there.
Understand your data
So a huge lesson learned was, if you really want to do this, if you want to move to big data, get
a handle on understanding your data. Make sure you have the domain experts in-house. Make
sure you have the tooling in place, however rudimentary it might be, to be able to pull the data
out of your existing database. Once you have it in the file system, Vertica can take it in minutes.
That’s not the problem. The problem is getting it out.
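The home-grown extraction tooling described here tends to look like this in outline: dump each legacy table to a flat file that the target database can then bulk-load. In this sketch, sqlite3 stands in for the legacy RDBMS, and the table and column names are hypothetical.

```python
import csv
import sqlite3

# sqlite3 is a stand-in for the legacy database; in practice this
# would be a connection to the system being migrated away from.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
legacy.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "a@example.com"), (2, "b@example.com")])

def export_table(conn, table, path):
    """Dump one table, header row first, to a CSV file."""
    cur = conn.execute(f"SELECT * FROM {table}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(col[0] for col in cur.description)
        writer.writerows(cur)

export_table(legacy, "customers", "customers.csv")
# The resulting flat file is what a bulk loader then ingests.
```

The hard parts in a real migration are the ones this sketch elides: legacy type quirks, encodings, and volumes that force the export to be chunked and restartable.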
We continue to grapple with that and we have made product enhancement recommendations. But
in fairness to Vertica, this is really not something that Vertica can do much about, because this is
more in the legacy database space.
Gardner: I've heard quite a few people say that, given the velocity with which they're seeing
people move to the cloud, this obviously isn't part of their problem, as the data is already in the
cloud. It's in the standardized architecture that the cloud is built around. If there is a platform-as-
a-service (PaaS) capability, then getting at the data isn't so much of a problem. Or am I not
reading that correctly?
Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are
still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud
is not secure. If you have customer information, if you have personalized data, many
organizations don't want to put it in the cloud.
Slowly, they're moving in that direction. If we were all there, I would completely agree with
you, but since we still have so many on-premises deployments, we're still in a hybrid mode --
some is on-prem, some is in the cloud.
Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a
benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets
together because they're all in the same cloud.
Mohsin: I fundamentally agree with you. I fundamentally believe in the cloud and that it really
should be the way to go. Going through our very recent go-live, there is no way we could have
the same elasticity in an on-premises deployment that we can have in the cloud. I can pick up the
phone, call a cloud provider, and have another machine the next day. I can't do that on-premises.
Again, a simple question of moving all the assets into the cloud, at least in some organizations,
will take several months, if not years.
Gardner: Very good. I'm afraid we will have to leave it there. We have been discussing how a
specific enterprise in the eCommerce space has solved some unique problems using big data and,
in particular, the HP Vertica platform.
That sets the stage for a wider use of big data for transactional problems and live-feed issues. It
also shows why moving to the cloud has some potential benefits for speed, velocity, and dexterity
when it comes to data across multiple data sources and implementations.
So with that, a big thank you to our guest. We've been joined by Jimmy Mohsin, Principal
Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. Thanks,
Jimmy.
Mohsin: Thanks, Dana. Have a great day.
Gardner: And a big thank you to our audience as well, for joining us for this special new-style-
of-IT discussion.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HP sponsored discussions. Thanks again for listening, and come back next time.
Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.