Pactera is a global consulting firm focused on driving innovation through big data, analytics, mobility and cloud solutions, with over 24,000 employees in 35 offices worldwide. The presentation discusses how retailers can use big data to better understand customer behavior and provide more personalized experiences. It also outlines Pactera's predictive analytics services and how they help clients develop successful predictive strategies and data-driven solutions.
CONSULTING · SOLUTIONS · OUTSOURCING
PARTNER FOR A NEW ERA
Big Data in Retail
Tom Kersnick - Director, Big Data Solutions, Pactera
Challen Bonar - Senior Director, Retail Practice, Pactera
Utilizing Big Data to Predict Customer Behavior
As customers continue to seek out individualized purchasing experiences, companies are realizing that their Big Data initiatives must be optimized to provide the personalized level of service that customers demand.

Spot real-time shifts in customer behavior
This facilitates stronger connections between customers, products, pricing, promotions and sales. If a promotion does not yield the purchases the company hoped for, what can it do to increase sales with a different promotion? To discover meaningful relationships and trends within its data, a company must learn how best to acquire, analyze and act on this new information. The data can inform decisions on bundling services and solutions and on pricing and packaging.

Data categories
Organizations typically divide data into two categories: transactional and sub-transactional. I recommend focusing on sub-transactional, clickstream-based data to identify patterns in user intentions. This data helps your organization make useful predictions about customer behavior.

Pricing models
Tie usage to pricing to better match customer behavior to price points. Using big data solutions, companies can offer services at prices and in packages that align with customer actions, while upselling based on those actions in parallel.

Behavior-based customer offers
Track user clickstream behaviors and show real-time offers to customers based on specific behavior patterns. This architecture lets you create offers and monetize them immediately to drive more revenue. For example, a well-known travel site increases revenue by constantly adjusting prices on a specific route in real time, based on users' reactions and sales.

Preventing customer churn
Customers lost to churn must be replaced, and that is expensive. Keeping customer acquisition costs below annual recurring revenue is important for today's products.
Sub-transactional data allows your organization to identify when customers begin to use your solutions less frequently or when they are about to make specific mistakes using your products. Analyze this information and put it to good use by fine-tuning the user experience to lead to more engaged customers.
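A minimal sketch of spotting declining usage from sub-transactional data, assuming a hypothetical event log of (user, session date) pairs: flag users whose most recent week of activity falls well below their earlier weekly average. The event data, function names and the drop threshold are all illustrative assumptions, not part of any Pactera product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical clickstream event log: (user_id, session_date) pairs.
events = [
    ("u1", date(2014, 1, 6)), ("u1", date(2014, 1, 13)),
    ("u1", date(2014, 1, 20)),
    ("u2", date(2014, 1, 6)), ("u2", date(2014, 1, 7)),
    ("u2", date(2014, 1, 8)), ("u2", date(2014, 1, 27)),
]

def weekly_sessions(events):
    """Count sessions per user per ISO week number."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, day in events:
        counts[user][day.isocalendar()[1]] += 1
    return counts

def churn_risks(events, drop_ratio=0.5):
    """Flag users whose latest active week fell below drop_ratio
    times their earlier weekly average (an assumed heuristic)."""
    flagged = []
    for user, weeks in weekly_sessions(events).items():
        ordered = sorted(weeks)
        if len(ordered) < 2:
            continue
        recent = weeks[ordered[-1]]
        earlier = [weeks[w] for w in ordered[:-1]]
        if recent < drop_ratio * (sum(earlier) / len(earlier)):
            flagged.append(user)
    return flagged

print(churn_risks(events))   # u2 dropped from 3 sessions/week to 1
```

Users surfaced this way are candidates for the fine-tuned experiences and retention offers described above, before they churn.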
Reduce to a classification problem
I'd try to reduce the problem to a classification problem and use existing machine learning tools to get an answer. A proper Big Data solution architecture can help in discovery.

Extract defined and variable features
Look for golden nuggets in unstructured data that you can extract, and define what you want to predict.

Create a training set
Gather as many classified examples as you can get. Example: user x visited 5 pages and spent a total of 4 minutes.

Run existing classification algorithms
Try to predict what a non-classified user did, given just her features alone. A short list of algorithms you can use:
- SVM: not intuitive, but considered by many the best classification algorithm available.
- K-nearest neighbors: very intuitive and simple to program, and the training set can easily be grown iteratively, but usually a bad choice when the number of features is high.
- Decision trees: very fast classification, and the resulting tree is intuitive and readable to humans.

Evaluate your algorithm and get your confusion matrix, which shows how often the algorithm was right, how often it was wrong, and in what way, by using cross-validation on the training set.
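The workflow above can be illustrated with the k-nearest-neighbors option, since it is the "very intuitive and simple to program" one. This is a toy sketch, not a production classifier: the training examples and the (pages visited, minutes on site) feature layout are made up to match the example in the text, and a real system would extract them from sub-transactional data.

```python
import math
from collections import Counter

# Hypothetical training set: (pages_visited, minutes_on_site) -> label.
training = [
    ((5, 4.0), "purchase"),
    ((12, 9.5), "purchase"),
    ((9, 7.0), "purchase"),
    ((1, 0.5), "no_purchase"),
    ((2, 0.8), "no_purchase"),
    ((1, 0.2), "no_purchase"),
]

def knn_predict(features, training, k=3):
    """Classify by majority vote among the k nearest training
    examples (Euclidean distance)."""
    dists = sorted(
        (math.dist(features, x), label) for x, label in training
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# "User x visited 5 pages and spent a total of 4 minutes":
print(knn_predict((5, 4.0), training))
```

Evaluating this with cross-validation (hold out part of the training set, predict it, and tally hits and misses per class) produces the confusion matrix mentioned above.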
Initial investigations
1. Look at the data dictionary to see which data is available.
2. What is the outcome? Is it yes/no? Is it continuous?
3. Decide upon the model required (logistic regression for a yes/no outcome).

Getting the data ready
4. Cross-tabulations on categorical variables to understand the coding and volumes.
5. Summary statistics to understand the distribution of the continuous variables.
6. Ask questions about data quality: remove these variables from any potential models? Or think about imputation? Or obtain accurate data?
7. Convert continuous variables into categorical variables.

Modeling
8. Check for multi-collinearity / correlation between variables (variance inflation factors, or correlation tests).
9. Check for interactions.
10. Choose the type of logistic approach (e.g. forward, backward, stepwise).
11. Choose the baseline attribute for each categorical variable.
12. Create a random variable; it must not step into the model, and something is wrong if it does.
13. Split the dataset into two parts (80%/20%) using random selection without replacement; the larger sample is the build dataset, the smaller is the test dataset.
14. Put all variables from the build dataset (including interactions and the random variable) into the model and run it. Check the odds ratios: do they make sense? Check the coefficients: do they make sense?

Check the model
15. Do diagnostic checks and plots of the fit (e.g. Somers' D, residuals, etc.).
16. Put all variables from the test dataset (including interactions and the random variable) into a new model and run it. Are the coefficients the same as in the model it was built on? Are the odds ratios the same?

Start again
17. Back to the start: fine-tune the grouping of the data, put variables in or take variables out.
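Steps 12 and 13 of the checklist above can be sketched in code. This is a minimal stdlib-only illustration, not anyone's production pipeline: rows are assumed to be plain feature tuples, and the function names and seeds are invented for the example.

```python
import random

def add_random_variable(rows, seed=7):
    """Step 12: append a pure-noise column to every row. If a
    stepwise procedure later selects this column, something is
    wrong with the modelling setup."""
    rng = random.Random(seed)
    return [row + (rng.random(),) for row in rows]

def build_test_split(rows, build_frac=0.8, seed=42):
    """Step 13: split into build/test sets (80%/20%) by random
    selection without replacement."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * build_frac)
    return shuffled[:cut], shuffled[cut:]

rows = [(i, i % 2) for i in range(100)]    # (feature, yes/no outcome)
build, test = build_test_split(add_random_variable(rows))
print(len(build), len(test))               # 80 20
```

Because the shuffle draws without replacement, every row lands in exactly one of the two samples, as step 13 requires.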
Start with a small data sample
You don't need the full width and depth of your data to find interesting things. Starting small saves you a lot of technology headache at the start.

Build an initial model with a simple tool
Use BigML or any other simple-to-use tool to build an initial predictive model that you can understand and quickly integrate. Check this series of blog posts comparing some SaaS machine learning offerings. The important words here are simple, actionable and understandable: you don't want to waste time figuring out how to use a tool, lose time translating and coding the outcomes, or fail to understand the outcomes well enough to act on them.

Check if the model gives you any practical insights
Explore the model. Find its gold, or discard it.

Put the model to action
Use the model to generate predictions and see if it can improve your company's performance. Find a playground in your company to run a test and measure the changes in churn, conversion, risk or whatever you modeled.

Check how more data can improve the model
You can add data in two ways: add more data points to the same dataset, or add more features (new pieces of information) to enhance the model and find new relationships with possibly better performance. In spreadsheet terms: you can add more rows or more columns.

Check if the more sophisticated model beats the previous one
Again, put it to action and see how it performs. Does it improve on the previous results? Iterate: the secret is to try multiple models to see which one gives the best results at this point in time, and continue iterating to find the best fit.

Choose the technology concept suitable for your situation
Now that you've seen some successful implementations of predictive models, you're much better equipped to evaluate vendors' offerings. You have experienced how a cloud-based service saves you annual license fees, investment in hardware, training and so on. Compare that to the more traditional on-site implementations and pick the concept that best fits your needs and budget. Actionable analytics.
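The iterate-and-compare step above can be sketched as a simple champion/challenger loop: score each candidate model on held-out data and keep the best so far. Everything here is hypothetical; the two toy models, the assumed feature layout and the held-out rows are stand-ins for whatever you actually modeled.

```python
def accuracy(predict, holdout):
    """Fraction of held-out rows the model classifies correctly."""
    hits = sum(1 for features, label in holdout if predict(features) == label)
    return hits / len(holdout)

def baseline_model(features):
    """Naive first model: predict the majority class for everyone."""
    return "no_churn"

def challenger_model(features):
    """More sophisticated model using an added feature column
    (assumed layout: features[0] = sessions last week)."""
    return "churn" if features[0] < 2 else "no_churn"

# Hypothetical held-out data: ((sessions_last_week,), actual outcome).
holdout = [((0,), "churn"), ((5,), "no_churn"),
           ((1,), "churn"), ((4,), "no_churn")]

champion, best = None, -1.0
for name, model in [("baseline", baseline_model),
                    ("challenger", challenger_model)]:
    score = accuracy(model, holdout)
    if score > best:                 # keep the best model so far
        champion, best = name, score

print(champion, best)
```

Each iteration (more rows, more columns, a different algorithm) just adds another challenger to the loop; the champion only changes when a model measurably beats it.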
Many think that the explosion in Big Data will create demand for data scientists able to slice and dice data to guide more informed decision making within the organization. Others go a step further, predicting that a chronic data scientist shortage will hold back the full potential of Big Data. For years, the BI and data analytics conversation was framed around how to aggregate massive volumes of data and then unleash the data scientists to find the value. Today, despite the information deluge, enterprise decision makers are often unable to access the data in a useful way: the tools are designed for those who speak the language of algorithms and statistical analysis.

Accelerate natural language capabilities
In laying out several predictions for Big Data and data analytics in the coming years, research firm Gartner forecasts that by 2016, 70 percent of leading BI vendors will have incorporated natural language and spoken-word capabilities. Improving natural language queries is essential to the consumerization of Big Data, as it better enables everyday business users to ask questions of their analytics tool in an intuitive format (text, voice-enabled, etc.) and receive the most relevant and meaningful visualization results.

Look smart (and ask very little of the user)
Users now expect the answer to be presented as a visualization, and they expect to be able to interact with it. It's more than a picture telling a thousand words; it's the picture becoming the interaction.

Create a social learning layer
By building a social learning layer into data analytics tools, user behavior can be "learned" by the tool, so it can anticipate queries and be trained to the specific needs of each user.
By making the tool as seamless as possible for business users, there is less chance the tool will waste time and resources or deliver results different from what the user is looking for.

Deflate Big Data
An enterprise should start with a small set of users it believes will experience the most immediate value.

Focus on the right kind of mobility
Today's Big Data analytics platform must consumerize the user experience by removing spreadsheets and reports, and place the power of analytics in the hands of users of any level of analytics expertise. This consumerized data analytics experience must enable mobility on any smartphone or tablet, with complete flexibility in how data can be visualized. It is this flexibility for visualizing data in real time on mobile devices, rather than just making massive volumes of data accessible to mobile users, that will help data analytics providers realize the most from their mobile investments.

A consumerized Big Data experience is in sight for enterprises, as long as data analytics providers remain focused on the features and functionality that will ultimately matter most to business users. Big Data analytics is leading the charge in reinventing enterprise applications, bringing our consumer-level expectations to our everyday work experience.
Flight cost variant determination
Flight cost variation is one of the algorithmic methods used to increase or decrease revenue based on page views, consumer marketing, and the time a consumer spends on a particular one-way or round-trip flight. The goal is not only to provide alternatives, but to increase or decrease cost while other consumers are also viewing the same flights. The adjustment is determined by sales from all related airlines and competitors during the flight's availability, and the method can be extended to use other sources as well.

Destinations:
- web applications
- mobile applications
- Hadoop
- RDBMS
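A minimal sketch of the idea, not the travel site's actual algorithm: nudge the fare up as concurrent viewers and seat scarcity rise, and down on a route with little interest. The function name and every parameter (demand_step, scarcity_step, the load-factor cutoff) are illustrative assumptions.

```python
def adjust_fare(base_fare, viewers_now, seats_sold, seats_total,
                demand_step=0.02, scarcity_step=0.10):
    """Demand-based fare variation (assumed heuristic):
    - each concurrent viewer of the flight adds demand_step,
    - seat scarcity (load factor) adds up to scarcity_step,
    - a slow, empty route is discounted to stimulate sales."""
    load_factor = seats_sold / seats_total
    multiplier = 1.0 + demand_step * viewers_now + scarcity_step * load_factor
    if viewers_now == 0 and load_factor < 0.3:
        multiplier = 0.95            # discount a route nobody is watching
    return round(base_fare * multiplier, 2)

# Busy route: 10 people viewing, 120 of 150 seats sold.
print(adjust_fare(200.0, viewers_now=10, seats_sold=120, seats_total=150))

# Slow route: nobody viewing, 30 of 150 seats sold.
print(adjust_fare(200.0, viewers_now=0, seats_sold=30, seats_total=150))
```

In the architecture listed above, the viewer counts would come from the web and mobile applications' clickstream, with Hadoop and the RDBMS feeding the sales history behind the load factor.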