Unlock Big Data's Potential in Financial Services with Hortonworks
CONSULTING SOLUTIONS OUTSOURCING
Unlock Big Data's Potential in Financial Services
Kurt Lueck – Pactera – US ITS Director of BI & Analytics
Chris Hackett – Hortonworks – Enterprise Account Manager
Ajay Singh – Hortonworks – Director of Technical Channels
PARTNER FOR A NEW ERA
Good afternoon, and good morning to those of you on the West Coast.
PACTERA is a very large systems integrator with over 23k employees across 35 offices globally. Our services range from Advisory Services, BI & Analytics (which includes Big Data), CRM, and Digital Media to QA/Testing and Localization. We are an end-to-end consulting firm, both on-shore and off-shore. We are listed on the NASDAQ under the symbol PACT. My role in the organization is to lead the North America BI & Analytics practice.
Make Hadoop an enterprise data platform
- Innovate core platform, data, & operational services
- Integrate deeply with enterprise ecosystem
- Provide world-class enterprise support
Drive 100% open source software development and releases through the core Apache projects
- Address enterprise needs in community projects
- Establish Apache foundation projects as “the standard”
- Promote open community vs. vendor control / lock-in
Enable the Hadoop market to function
- Make it easy for enterprises to deploy at scale
- Be the best at enabling deep ecosystem integration
- Create a pull market with key strategic partners
We’re a plus one. We are here to interoperate with your existing systems and to help you get additional value out of them.
This is similar to the Red Hat model.
Additionally, we are a leading provider of Hadoop training through Hortonworks University, with courses for both development and operations. If required, we can also provide expert consulting services, either directly or through our System Integrator partners. And for anyone looking to get hands-on with Hadoop, we have recently introduced the Hadoop Sandbox program, which lets users download a full instance of HDP together with guided tutorials covering both development and administration topics.
Thanks, Chris. Let's look at Big Data in financial markets and how we approach projects.
The first question, and one I still get asked surprisingly often, is: WHY DO I NEED BIG DATA?
I have the answer down to two reasons: reduce cost, and do something you could not do before.
For many large organizations, simply reducing cost, or at least holding it at current levels, was the deciding factor. Buying one more large vendor appliance to store data was simply too expensive to continue.
The more interesting projects are around doing things that organizations simply could NOT do, or were definitely struggling to do: things like a 360-degree view of the customer and fraud detection, both of which we will cover in detail in this webinar.
Yes, I know adding “Smart” in front of something does not actually make it smart. But it is a great marketing ploy.
Here at Pactera we are branding our industry solutions with the term Smart: Smart Commerce, Smart City, Smart Banking, and so on.
The idea is that current solutions and technology will need a refresh. Big Data is such a game changer that current technology and business processes must be reviewed.
The items highlighted in yellow are areas that we feel should be carefully reviewed for enhanced capabilities with Big Data technology. For example, we feel that new data models will emerge that combine our old way of storing data with new methods.
Now, before you think we have lost our minds: Big Data will not solve every problem in the world.
I know even the Hortonworks team on the line will agree with me that Big Data is part of the solution, but many other existing and new technologies are also part of it. I believe that in the next few years the lines will blur between “Big Data” and traditional database technologies.
We believe every business problem should be addressed with the right technology. Whenever a new technology springs up, some people try to use it for everything. Don’t. Look for technology vendors like Hortonworks that co-exist and play well with your existing vendors. At Pactera we strive to know the technologies beyond the hype. Take a polyglot approach: use the best technology for the problem.
OK, for some of you this may be a new slide.
Big Data has a lot of new and, frankly, kind of funny terms. The basic element is HDFS, the heart of Big Data. It is the storage layer, and I think it is best understood by comparing it to your laptop. You take files and place them into a folder. You don’t care what is in the file, and you don’t build a structure before you put it into the folder. It is exactly the same concept with Big Data.
Now a quick run-through of some of the tools used to work with the data:
- Flume: a tool to ingest streaming data such as log files.
- Sqoop: a tool to get data from, or put data into, relational databases like Oracle or Microsoft SQL Server.
- Hive: a tool for people like myself who want to query data using SQL. (Its query language, HiveQL, is SQL-like rather than fully ANSI SQL.)
- Pig: a scripting language, loosely comparable to T-SQL, PL/SQL, or even Python. It can be extended with user-defined functions written in Java, Python, and other languages.
YARN is a new component in Hadoop 2.0, but I will leave that for another webinar. Just know that it makes Hadoop more scalable and flexible.
All right, let's move on to our first use case.
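The laptop-folder analogy is often called schema-on-read: raw files are stored as-is, and a structure is applied only when the data is queried (Hive external tables defined over an HDFS directory work along these lines). The sketch below illustrates the idea only; a local temporary directory stands in for HDFS, and the file contents, `SCHEMA`, and `read_with_schema` are hypothetical names chosen for illustration, not part of any Hadoop API.

```python
import csv
import tempfile
from pathlib import Path

# A local temp directory stands in for an HDFS directory (assumption for illustration).
landing = Path(tempfile.mkdtemp()) / "transactions"
landing.mkdir()

# Ingest: drop raw files into the folder as-is; no table structure is declared first.
(landing / "day1.csv").write_text("card_id,amount,city\nc1,25.00,Charlotte\n")
(landing / "day2.csv").write_text("card_id,amount,city\nc2,480.00,Atlanta\n")

# Structure is decided only at read time (schema-on-read). Hypothetical schema.
SCHEMA = {"card_id": str, "amount": float, "city": str}


def read_with_schema(folder, schema):
    """Read every CSV file in `folder`, casting each column per `schema`."""
    rows = []
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                rows.append({col: cast(record[col]) for col, cast in schema.items()})
    return rows


rows = read_with_schema(landing, SCHEMA)
```

Because the schema lives in the reader rather than in the storage, a different team could read the same files with a different schema without reloading anything.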
Perhaps there really is no such thing as easy money. Based on declining bank robbery statistics, criminals seem to be realizing that it’s hard to make a living by following in the footsteps of Bonnie and Clyde.
In 2009, there were no fewer than 22 bank robberies in a trio of counties centered on Augusta, Georgia. “It felt like we were the bank robbery capital of the world that year,” Capt. Troy Elwell, of the Aiken County Sheriff’s department, recently told the Augusta Chronicle.
Last year, however, there were “just” eight bank robberies reported in the same area. In fact, the paper noted, the number of bank robberies around the country has been falling steadily for years:
According to the FBI, bank holdups have dropped nearly every year since 2003, when nearly 7,500 robberies were reported nationwide with $77 million taken. In 2011, the last complete year for data, about 5,000 banks reported robberies with $38 million stolen.
So where are the criminals all going? You guessed it: electronic, and quite sophisticated. The money is easier and the sentences are much shorter.
There are many ways a bank can be defrauded, but let's focus our discussion on a commonly understood but difficult-to-solve problem: credit card and ATM fraud.
Moving across the top, there are four buckets of methods to detect fraud:
- Rules-based detection
- Anomaly detection
- Predictive analysis
- Social network analysis
Why is Big Data part of the solution? The main reason: more data enables more analysis, both in real time and over time. If you are thinking, “I thought Big Data was too slow for this type of application,” you are partly correct. On its own, Hadoop is a bit slow for real-time work, but with projects like Stinger and hybrid in-memory approaches, this is a reality today.
Which brings me to the final comment on this page. Financial institutions must take a hybrid approach to fraud, which may start with enhancing their data types. Ultimately, all financial institutions will need to build Big Data solutions into their current IT ecosystem.
Let's break down these four types of fraud detection and look at how Big Data can help.
Rules-based detection is the simplest to understand and implement. Every bank has some form of it in place.
The rules are simple. For example, a rule might state that you cannot simultaneously take out $500 from four different locations, especially if there is no way you could be in all four locations at the same time. You could have a number of rules like this.
But the problem is trickier than it looks. What if I withdraw cash in Charlotte (CLT), take a 45-minute flight to Atlanta (ATL), and withdraw again? That is a perfectly logical sequence of transactions. I could then board another flight and, within two hours or so, take out cash again. Am I doing something wrong, or am I simply a world traveller taking the longest possible route to China?
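A common way to express the “you can’t be in two places at once” rule is an impossible-travel check: compute the distance between consecutive withdrawals and flag the card if covering it would require an implausible speed. This is a minimal sketch under stated assumptions: the 900 km/h ceiling (roughly airliner cruising speed), the `Withdrawal` record, and the airport coordinates are illustrative choices, not a bank’s actual rule set.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # assumption: roughly airliner cruising speed


@dataclass
class Withdrawal:
    """Hypothetical withdrawal record: card, time, and ATM location."""
    card_id: str
    time: datetime
    lat: float
    lon: float


def distance_km(a, b):
    """Great-circle (haversine) distance between two withdrawal locations."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def impossible_travel(prev, curr, max_speed_kmh=MAX_SPEED_KMH):
    """Rule: flag if the card would have had to move faster than max_speed_kmh."""
    km = distance_km(prev, curr)
    hours = abs((curr.time - prev.time).total_seconds()) / 3600
    if hours == 0:
        return km > 1.0  # simultaneous withdrawals at different places
    return km / hours > max_speed_kmh


# Charlotte (CLT) to Atlanta (ATL) is roughly 365 km as the crow flies.
clt = Withdrawal("c1", datetime(2013, 5, 1, 9, 0), 35.214, -80.943)
atl_later = Withdrawal("c1", datetime(2013, 5, 1, 11, 0), 33.6407, -84.4277)  # 2 h later
atl_soon = Withdrawal("c1", datetime(2013, 5, 1, 9, 10), 33.6407, -84.4277)   # 10 min later
```

The CLT-then-ATL traveller passes the rule because about 182 km/h is plausible, while the same pair ten minutes apart is flagged. Note what the rule cannot do: it has no way to tell a legitimate world traveller from a cloned card that happens to move at airline speed, which is the gap the next methods try to close.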
The next item is something we are all familiar with. Why are we familiar with it? Because it is not working well enough... YET. That is why we all get our cards rejected.
Look at the basic example here: we have a number of transactions, and the third one is out of the ordinary. We are looking for outliers, that is, data that do not conform to normal, expected patterns. The criteria for what constitutes an outlier depend on the problem domain.
Big Data is needed for the back-end processing because:
- It typically involves a large amount of data. Think millions upon millions of credit card transactions.
- Much of the data may be unstructured.
Some anomalies are easy to detect: the size of a transaction, its location, its time. Outlier detection generally works on two kinds of data:
- Instance data, where the outlier detection algorithm operates on individual instances of data, e.g., a particular credit card transaction involving a large amount of money spent on an unusual product.
- Sequence data with temporal or spatial relationships, where the goal is to find unusual sequences, e.g., in intrusion detection and cyber security.
Here is a quick outline of how this works. Hadoop is used to continually build your “normal.” Your normal is then stored in an in-memory solution that live transactions can be checked against. Anything outside the normal means a shutdown of your credit card and a series of events that usually involves a phone call.
But this leads us to our third example.
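The two-stage flow just described (a batch job builds each card’s “normal,” which is kept in memory so live transactions can be checked against it) can be sketched as follows. This is an illustration under assumptions: a per-card mean and standard deviation of transaction amounts stands in for “normal,” a z-score threshold of 3 is an arbitrary choice, and a plain dictionary stands in for the in-memory store. Production systems use far richer features than amount alone.

```python
from statistics import mean, pstdev


def build_baselines(history):
    """Batch step (the role Hadoop plays here): per-card mean and std of amounts."""
    by_card = {}
    for card_id, amount in history:
        by_card.setdefault(card_id, []).append(amount)
    return {card: (mean(amounts), pstdev(amounts)) for card, amounts in by_card.items()}


def is_outlier(baselines, card_id, amount, threshold=3.0):
    """Real-time step: compare a live transaction against the stored 'normal'."""
    if card_id not in baselines:
        return False  # assumption: no baseline yet, so defer to rule-based checks
    mu, sigma = baselines[card_id]
    if sigma == 0:
        return amount != mu  # every past amount identical: any change is unusual
    return abs(amount - mu) / sigma > threshold


# Hypothetical purchase history for one card; the dict plays the in-memory store.
history = [("c1", a) for a in (20.0, 25.0, 30.0, 22.0, 28.0)]
baselines = build_baselines(history)
```

With this history the card’s normal is a mean of 25.00 with a standard deviation of about 3.69, so a $27 purchase passes and a $500 purchase is flagged. Rebuilding the baselines periodically is what lets the “normal” drift as a customer’s habits change.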
The next level is predictive analytics.
When someone goes from mundane purchases to high-priced dinners and gifts, are they in love? Or is the card stolen?
Using anomaly detection techniques, we have been able to detect the outlier. But how do we know whether it is a fraudulent transaction or an emerging buying pattern? Your credit card may have been compromised and someone else is using it. Or you have fallen in love and decided to shower him or her with expensive, high-ticket items. We can’t really tell the difference, except that once there are enough data points for this emerging behavior, we stop getting these false positives from our analysis.
This leads to the third bucket: predictive analytics.
Predictive models
Predictive models analyze past performance to assess how likely a customer is to exhibit a specific behavior in the future in order to improve marketing effectiveness. This category also encompasses models that seek out subtle data patterns to answer questions about customer performance, such as fraud detection models. Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction, in order to guide a decision. With advancements in computing speed, individual agent modeling systems can simulate human behavior or reactions to given stimuli or scenarios. The new term for animating data specifically linked to an individual in a simulated environment is avatar analytics.
Descriptive models
Descriptive models quantify relationships in data in a way that is often used to classify customers or prospects into groups. Unlike predictive models, which focus on predicting a single customer behavior (such as credit risk), descriptive models identify many different relationships between customers or products. Descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do.
Descriptive models can be used, for example, to categorize customers by their product preferences and life stage. Descriptive modeling tools can also be used to develop further models that simulate large numbers of individualized agents and make predictions.
Decision models
Decision models describe the relationship between all the elements of a decision (the known data, including results of predictive models; the decision; and the forecast results of the decision) in order to predict the results of decisions involving many variables. These models can be used in optimization, maximizing certain outcomes while minimizing others. Decision models are generally used to develop decision logic or a set of business rules that will produce the desired action for every customer or circumstance.
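To make the hand-off from a predictive model to a decision model concrete, the sketch below fits a tiny logistic-regression fraud model with plain gradient descent and then feeds its probability into an expected-cost decision rule. Everything here is an assumption for illustration: logistic regression is just one common choice of predictive model, the two features (`amount_z`, how unusual the amount is for the card, and `new_category`, whether the merchant category is new for the card), the labelled history, and the `REVIEW_COST` and `FRICTION_COST` figures are all hypothetical.

```python
import math

# Hypothetical labelled history: [amount_z, new_category] -> 1 if confirmed fraud.
X = [[0.1, 0], [0.3, 0], [-0.2, 0], [0.5, 1], [4.0, 1], [3.5, 1], [5.0, 0], [2.8, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Predictive model: fit logistic-regression weights by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fraud_probability(model, x):
    """Score a live transaction with the trained model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


REVIEW_COST = 10.0    # hypothetical cost of an analyst phoning the customer
FRICTION_COST = 50.0  # hypothetical cost of wrongly declining a good customer


def decide(p_fraud, amount):
    """Decision model: pick the action with the lowest expected cost."""
    expected_cost = {
        "approve": p_fraud * amount,                # lose the amount if it is fraud
        "review": REVIEW_COST,                      # call catches fraud before any loss
        "decline": (1 - p_fraud) * FRICTION_COST,   # annoy a legitimate customer
    }
    return min(expected_cost, key=expected_cost.get)


model = train_logistic(X, y)
p = fraud_probability(model, [4.5, 1])  # very unusual amount at a new merchant type
action = decide(p, amount=900.0)
```

The split mirrors the model categories above: the predictive model only estimates how likely fraud is, while the decision model turns that estimate plus business costs into an action such as approve, call the customer, or decline. For example, `decide(0.01, 100.0)` approves, `decide(0.3, 200.0)` routes to review, and `decide(0.95, 1000.0)` declines.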
Knowledge discovery through associative link analysis.
You may think this is a bit futuristic, but I actually stole this graphic from something that was done in 2002. What if I could store everything possible about you: your known business relationships, your friends, and so on?
What if I picked up the fact that you had just been indicted for a fraud crime? I would blacklist you. BUT I would also build a list of your known acquaintances and put them all on a list of highly monitored individuals. In other words, I now EXPECT them to try something, so anything even close to out of the ordinary is shut down immediately.
Far-fetched? Not at all. Does this require Big Data? Yes.
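Building the “known acquaintances” watch list amounts to a graph traversal: starting from the blacklisted person, collect everyone within a few relationship hops. The sketch below uses breadth-first search over an adjacency list. The names, the relationship graph, and the two-hop cutoff are hypothetical; at bank scale the graph would be built from many sources and traversed with distributed graph tooling rather than a Python dictionary.

```python
from collections import deque


def associates_within(graph, start, max_hops=2):
    """Breadth-first search: everyone within max_hops links of `start`, with hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in graph.get(person, ()):
            if other not in seen:
                seen[other] = seen[person] + 1
                queue.append(other)
    return {p: hops for p, hops in seen.items() if p != start}


# Hypothetical relationship graph (business partners, friends, shared addresses, ...).
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}

blacklist = {"alice"}  # e.g. just indicted for fraud
watch = associates_within(graph, "alice", max_hops=2)
```

Here Bob and Carol (one hop) and Dave (two hops) land on the monitored list, while Erin, three hops away, does not. Keeping the hop count lets the monitoring threshold tighten for closer associates.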
What does a Big Data architecture look like that supports these four fraud detection methods?
This is a sample. As you can see, moving from left to right, we are ingesting a wide variety of data, in large volume, at high velocity, so we need several different methods of data ingestion. On the far right we have a variety of tools to put the data to use, ranging from investigation to visual analytics.
Notice the data hubs running along the middle. These will be used by real-time engines to validate transactions.
All right, there is a lot more we could talk about on this slide, but we need to move on to another topic, probably the hottest topic in many industries: the elusive 360-degree view of the customer.
Ajay, all yours.
Early in the presentation, Hortonworks explained the value they can provide. Hortonworks has some fantastic training classes; I know because I have attended some of them. Check out their website under Training and Education for more details.
Pactera provides a full set of services within this space. We have Hortonworks-certified resources who can help you with any of your projects. Our service offerings range from architecture and installation to projects and maintenance.
Pactera offers complete life-cycle solutions within your organization. We offer a free 4-hour executive and technical workshop on-site. We just ask you to fill out a one-page questionnaire to help us understand your expectations.
- The executive workshop covers strategy, planning, and your current and future goals.
- The technical workshop is a deep dive covering end-to-end management and a proper solution architecture based on your current and upcoming goals.
Once the workshops are complete, we will provide you with an assessment of the outcome.
Many of our clients initially engage us with a 2-4 week pilot to ensure the project is put into action. And finally, we offer full life-cycle services in the following areas:
- Benchmark & Monitoring
- Integrations & Migrations
- Implementation & Architecture
- Project Management
- Analytics
- Reporting
We can perform these efforts both on-shore and off-shore.