Context-Aware Recommender System (CARS) models are trained on datasets of context-dependent user preferences (ratings and context information). Since the number of context-dependent preferences grows exponentially with the number of contextual factors, and certain contextual information is still hard to acquire automatically (e.g., the user's mood, or for whom the user is buying the searched item), it is fundamental to identify and acquire those factors that truly influence user preferences and ratings. In particular, this ensures that (i) the user effort in specifying contextual information is kept to a minimum, and (ii) the system's performance is not degraded by irrelevant contextual information. In this paper, we propose a novel method that, unlike existing ones, directly estimates the impact of context on rating predictions and adaptively identifies the contextual factors worth eliciting from users. Our experimental evaluation shows that it compares favourably to several state-of-the-art context selection methods.
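As a rough illustration of the underlying idea, a contextual factor's influence can be scored by how far mean ratings under its conditions deviate from the global mean rating. This is a simplified heuristic sketch under assumed data structures, not the paper's actual estimator; `factor_relevance` and the toy ratings are illustrative assumptions.

```python
from collections import defaultdict

def factor_relevance(ratings, factor):
    """Score a contextual factor by the weighted mean absolute deviation
    of its per-condition mean ratings from the global mean.
    (Hypothetical heuristic, not the paper's exact method.)"""
    overall = sum(r["rating"] for r in ratings) / len(ratings)
    by_condition = defaultdict(list)
    for r in ratings:
        by_condition[r["context"][factor]].append(r["rating"])
    return sum(
        len(v) * abs(sum(v) / len(v) - overall) for v in by_condition.values()
    ) / len(ratings)

ratings = [
    {"rating": 5, "context": {"weather": "sunny", "mood": "happy"}},
    {"rating": 1, "context": {"weather": "rainy", "mood": "happy"}},
    {"rating": 5, "context": {"weather": "sunny", "mood": "sad"}},
    {"rating": 1, "context": {"weather": "rainy", "mood": "sad"}},
]
# Weather splits the ratings perfectly; mood does not.
scores = {f: factor_relevance(ratings, f) for f in ("weather", "mood")}
```

A factor with a score near zero would not be worth eliciting from users under this heuristic.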
Usability Assessment of a Context-Aware and Personality-Based Mobile Recommen... - Matthias Braunhofer
In this paper we present STS (South Tyrol Suggests), a context-aware mobile recommender system for places of interest (POIs) that integrates some innovative components, including: a personality questionnaire, i.e., a brief and entertaining questionnaire used by the system to learn users’ personality; an active learning module that acquires ratings-in-context for POIs that users are likely to have experienced; and a matrix factorization based recommendation module that leverages the personality information and several contextual factors in order to generate more relevant recommendations.
Adopting a system-oriented perspective, we describe the assessment of the combination of the implemented components. We focus on usability aspects and report the end-user assessment of STS, obtained from a controlled live user study as well as from the log data produced by a larger sample of users who freely downloaded and tried STS through the Google Play Store. The assessment showed that the overall usability of the system falls between "good" and "excellent"; it also helped us to identify potential problems and provided valuable indications for future system improvement.
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags - Matthias Braunhofer
Recommender systems suffer from the new user problem, i.e., the difficulty of making accurate predictions for users who have rated only a few items. Moreover, they usually compute recommendations for items in just one domain, such as movies, music, or books. In this paper we deal with such a cold-start situation by exploiting cross-domain recommendation techniques, i.e., we suggest items to a user in one target domain by using ratings of other users in a completely disjoint auxiliary domain. We present three rating prediction models that make use of information about how users tag items in an auxiliary domain, and how these tags correlate with ratings, to improve the rating prediction task in a different target domain. We show that the proposed techniques can effectively deal with the considered cold-start situation, provided that the tags used in the two domains overlap.
In this presentation we describe a novel context-aware mobile recommender system for places of interest (POIs). Unlike existing systems, which learn users' preferences solely from their past ratings, it also considers their personality, using the Five Factor Model. Personality is acquired by asking users to complete a brief and entertaining questionnaire as part of the registration process, and is then exploited in: (1) an active learning module that actively acquires ratings-in-context for POIs that users are likely to have experienced, hence reducing the stress and annoyance of rating (or skipping) items that the users don't know; and (2) a recommendation model that builds on matrix factorization and can therefore be trained even if a user hasn't rated any items yet.
Context-Aware Points of Interest Suggestion with Dynamic Weather Data Management - Matthias Braunhofer
Weather plays an important role in tourists' decision-making and, for instance, some places or activities should not even be suggested under dangerous weather conditions. In this paper we present a context-aware recommender system, named STS, that computes recommendations suited to the weather conditions at the recommended places of interest (POIs) by exploiting a novel model-based context-aware recommendation technique. In a live user study we compared the performance of the system with a variant that does not exploit weather data when generating recommendations. The results of our experiment show that the proposed approach obtains higher perceived recommendation quality and choice satisfaction.
Hybrid Solution of the Cold-Start Problem in Context-Aware Recommender Systems - Matthias Braunhofer
This document summarizes Matthias Braunhofer's doctoral research on addressing the cold-start problem in context-aware recommender systems. It presents basic context-aware rating prediction models like CAMF-CC and SPF, and proposes novel variants that incorporate additional contextual information like item categories or user demographics. It also describes two approaches to building hybrid context-aware recommender systems - heuristic switching and adaptive weighting. An evaluation compares the performance of these models on three datasets in addressing new user, new item, and new context cold-start situations, finding that hybrid models generally outperform basic models.
This document discusses context-aware recommender systems for mobile devices. It introduces recommender systems and how they are used to help users find relevant information. It describes how mobile recommender systems can take into account contextual information like location and weather to provide personalized recommendations. As a practical example, it outlines the South Tyrol Suggests app, which provides point of interest recommendations for South Tyrol adapted to the user's context. It also discusses the challenges of building context-aware recommender systems and evaluating their performance.
Hybridisation Techniques for Cold-Starting Context-Aware Recommender Systems - Matthias Braunhofer
Context-Aware Recommender Systems (CARSs) suffer from the cold-start problem, i.e., the inability to provide accurate recommendations for new users, items or contextual situations. In this research, we aim at solving this problem by exploiting various hybridisation techniques, from simple heuristic-based solutions to complex adaptive solutions, in order to take advantage of the strengths of different CARS algorithms while avoiding their weaknesses in a given (cold-start) situation. Our initial research based on offline experiments using various contextually-tagged rating datasets has shown that basic CARS algorithms perform very differently in different recommendation scenarios, and that they can be effectively hybridised to achieve an overall optimal performance. Further research is now required to find the optimal method for hybridisation.
Techniques for Context-Aware and Cold-Start Recommendations - Matthias Braunhofer
Context-aware recommender systems better identify interesting items for users by adapting their suggestions to the specific contextual situation, e.g., to the current weather if an excursion is to be recommended. However, the cold-start problem may jeopardise the quality of the recommendations: for users, items or contextual situations that are new to the system, recommendations are hard to compute. We have developed a number of novel techniques to tame this problem, in particular new hybrid algorithms that combine several simpler algorithms in order to exploit their strengths and avoid their weaknesses. We have also developed algorithms for actively identifying the most useful preference information to ask the user for in order to bootstrap the system. Our results, obtained from a series of offline and online experiments, reveal that the proposed techniques can effectively alleviate the cold-start problem of context-aware recommender systems.
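The switching-hybrid idea described above can be sketched very simply: choose a predictor based on how much preference data a user has, falling back to a cold-start-friendly model for new users. This is an illustrative toy under assumed names (the predictor labels and the threshold are not from the thesis):

```python
def switching_hybrid(user_profile, predictors, threshold=5):
    """Heuristic switching hybrid (illustrative): use a demographics-based
    predictor for cold-start users, otherwise the context-aware
    matrix-factorisation model."""
    if len(user_profile["ratings"]) < threshold:
        return predictors["demographic"]
    return predictors["camf"]

predictors = {"demographic": "DemographicPredictor", "camf": "CAMFPredictor"}
cold_user = {"ratings": [4, 5]}
warm_user = {"ratings": [4, 5, 3, 2, 5, 4]}
```

An adaptive-weighting hybrid would instead blend both predictions, with weights learned from each model's observed error in the given situation.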
Alleviating cold-user start problem with users' social network data in recomm... - Eduardo Castillejo Gil
This work explores the possibility of using relevant data from users' social networks to alleviate the cold-user problem in a recommender system domain. The proposed solution extracts the most valuable node in the graph generated by check-ins at venues made with an Android application using the Foursquare API. By obtaining the recommendations for this node, we estimate the probability that some categories are similar to the user's tastes...
Tutorial: Context In Recommender Systems - YONG ZHENG
This document provides an overview of a tutorial on context-aware recommender systems. The tutorial will cover traditional recommendation techniques, context-aware recommendation which incorporates additional contextual information such as time and location, and context suggestion. It includes an agenda with topics, background information on recommender systems and evaluation metrics, and descriptions of techniques for context-aware recommendation including context filtering and modeling.
Context-Aware Recommender System Based on Boolean Matrix Factorisation - Dmitrii Ignatov
In this work we propose and study an approach for collaborative filtering which is based on Boolean matrix factorisation and exploits additional (context) information about users and items. To avoid similarity loss in the case of a Boolean representation, we use an adjusted type of projection of a target user onto the obtained factor space.
We compared the proposed method with an SVD-based approach on the MovieLens dataset. The experiments demonstrate that the proposed method achieves better MAE and Precision and comparable Recall and F-measure. We also report an increase in quality in the presence of context information.
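For readers unfamiliar with Boolean matrix factorisation, its core operation is the Boolean matrix product: a rating is reconstructed as 1 whenever the user and item share at least one factor. A minimal sketch (the factor matrices here are hand-picked toys, not learned as in the paper):

```python
def bool_matmul(P, Q):
    """Boolean matrix product: (P o Q)[u][i] = OR_k (P[u][k] AND Q[k][i])."""
    n, k, m = len(P), len(Q), len(Q[0])
    return [[int(any(P[u][f] and Q[f][i] for f in range(k))) for i in range(m)]
            for u in range(n)]

# Toy binary user-item matrix exactly covered by two Boolean factors
P = [[1, 0], [1, 1], [0, 1]]   # users x factors
Q = [[1, 1, 0], [0, 1, 1]]     # factors x items
R = bool_matmul(P, Q)
```

Learning P and Q so that their Boolean product approximates the observed matrix is the factorisation step; the paper's contribution lies in how a target user is then projected onto the factor space.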
Anomaly detection: Core Techniques and Advances in Big Data and Deep Learning - QuantUniversity
Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance.
In Part II of the Anomaly Detection Series, we discuss the challenges in analyzing temporal datasets and methods for outlier analysis. We focus on single time series and discuss point-outlier and sub-sequence methods.
Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance. In this talk, we will introduce anomaly detection and discuss the various analytical and machine learning techniques used in this field. Through a case study, we will discuss how anomaly detection techniques can be applied to energy data sets. We will also demonstrate, using R and Apache Spark, an application to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
In this workshop, we will cover the core techniques in anomaly detection and discuss advances in Deep Learning in this field.
Through case studies, we will discuss how anomaly detection techniques could be applied to various business problems. We will also demonstrate examples using R, Python, Keras and Tensorflow applications to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
With R, Python, Apache Spark and a plethora of other open source tools, anyone with a computer can run machine learning algorithms in a jiffy! However, without an understanding of which algorithms to choose and when to apply a particular technique, most machine learning efforts turn into trial and error experiments with conclusions like "The algorithms don't work" or "Perhaps we should get more data".
In this lecture, we will focus on the key tenets of machine learning algorithms and how to choose an algorithm for a particular purpose. Rather than just showing how to run experiments in R, Python or Apache Spark, we will provide an intuitive introduction to machine learning with just enough mathematics and basic statistics.
We will address:
• How do you differentiate Clustering, Classification and Prediction algorithms?
• What are the key steps in running a machine learning algorithm?
• How do you choose an algorithm for a specific goal?
• Where does exploratory data analysis and feature engineering fit into the picture?
• Once you run an algorithm, how do you evaluate its performance?
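On that last question, a minimal sketch of evaluating a binary classifier from scratch (illustrative only; in practice a library such as scikit-learn provides these metrics):

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall from confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Which metric matters depends on the goal from the earlier questions: precision when false alarms are costly, recall when misses are.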
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man... - Dr. Cornelius Ludmann
Talk at the Data Streams and Event Processing Workshop at the 16. Fachtagung »Datenbanksysteme für Business, Technologie und Web« (BTW) of the Gesellschaft für Informatik (GI) in Hamburg, Germany. March 3, 2015
Since its debut in 2010, Apache Spark has become one of the most popular Big Data technologies in the Apache open source ecosystem. In addition to enabling processing of large data sets through its distributed computing architecture, Spark provides out-of-the-box support for machine learning, streaming and graph processing in a single framework. Spark has been supported by companies like Microsoft, Google, Amazon and IBM and in financial services, companies like Blackrock (http://bit.ly/1Q1DVJH ) and Bloomberg (http://bit.ly/29LXbPv ) have started to integrate Apache Spark into their tool chain and the interest is growing. Unlike other big-data technologies which require intensive programming using Java etc., Spark enables data scientists to work with a big-data technology using higher level languages like Python and R making it accessible to conduct experiments and for rapid prototyping.
In this talk, we will introduce Apache Spark and discuss the key features that differentiate Apache Spark from other technologies. We will provide examples on how Apache Spark can help scale analytics and discuss how the machine learning API could be used to solve large-scale machine learning problems using Spark’s distributed computing framework. We will also illustrate enterprise use cases for scaling analytics with Apache Spark.
Decision Support Analysis for Software Effort Estimation by Analogy - Tim Menzies
The document discusses decision support for software effort estimation by analogy (EBA). It outlines the basic tasks and decision problems in applying EBA, including searching for analogs, determining similarity measures, and choosing an analogy adaptation strategy. It presents a decision-centric process model for EBA and discusses empirical studies to support decision-making when customizing EBA for a specific dataset. An example EBA method called AQUA+ is analyzed through a comparative study evaluating different attribute weighting heuristics.
Instance Space Analysis for Search Based Software Engineering - Aldeida Aleti
Search-Based Software Engineering is now a mature area with numerous techniques developed to tackle some of the most challenging software engineering problems, from requirements to design, testing, fault localisation, and automated program repair. SBSE techniques have shown promising results, giving us hope that one day it will be possible for the tedious and labour intensive parts of software development to be completely automated, or at least semi-automated. In this talk, I will focus on the problem of objective performance evaluation of SBSE techniques. To this end, I will introduce Instance Space Analysis (ISA), which is an approach to identify features of SBSE problems that explain why a particular instance is difficult for an SBSE technique. ISA can be used to examine the diversity and quality of the benchmark datasets used by most researchers, and analyse the strengths and weaknesses of existing SBSE techniques. The instance space is constructed to reveal areas of hard and easy problems, and enables the strengths and weaknesses of the different SBSE techniques to be identified. I will present on how ISA enabled us to identify the strengths and weaknesses of SBSE techniques in two areas: Search-Based Software Testing and Automated Program Repair. Finally, I will end my talk with future directions of the objective assessment of SBSE techniques.
Anomaly detection techniques aim to identify outliers or anomalies in datasets. Statistical approaches assume a data distribution and detect anomalies that differ significantly. Distance-based approaches measure distances between data points to find outliers that are far from neighbors. Clustering approaches group normal data and detect outliers in small clusters or far from other clusters. Challenges include determining the number of outliers, handling unlabeled data, and scaling to high dimensions where distances become similar.
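The statistical and distance-based approaches described above can each be sketched in a few lines. These are deliberately minimal illustrations on 1-D data (the thresholds and toy dataset are assumptions, not production settings):

```python
import statistics

def zscore_outliers(xs, threshold=2.0):
    """Statistical approach: flag points more than `threshold` standard
    deviations from the mean (assumes roughly normal data)."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return [x for x in xs if sigma and abs(x - mu) / sigma > threshold]

def knn_distance_outliers(points, k=2, cutoff=5.0):
    """Distance-based approach: flag points whose distance to their
    k-th nearest neighbour exceeds `cutoff`."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        if dists[k - 1] > cutoff:
            out.append(p)
    return out

data = [10, 11, 10, 12, 11, 10, 50]
```

Both flag the single extreme value here; in high dimensions, as the text notes, pairwise distances concentrate and the distance-based criterion loses discriminative power.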
Mining System Logs to Learn Error Predictors, Universität Stuttgart, Stuttgar... - Barbara Russo
Predicting system failures can be of great benefit to managers, who gain better command over system performance. The data that systems generate in the form of logs is a valuable source of information for predicting system reliability. As such, there is an increasing demand for tools that mine logs and provide accurate predictions. However, interpreting the information in logs poses some challenges. This talk presents how to effectively mine sequences of logs and provide correct predictions. The approach integrates different machine learning techniques to control for data brittleness, ensure accuracy of model selection and validation, and increase robustness of classification results. We apply the proposed approach to log sequences of 25 different applications of a software system for telemetry of cars.
A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems - Alan Said
The evaluation of recommender systems is crucial for their development. In today's recommendation landscape there are many standardized recommendation algorithms and approaches; however, there exists no standardized method for the experimental setup of evaluation -- not even for widely used measures such as precision and root-mean-squared error. This creates a setting where comparison of recommendation results using the same datasets becomes problematic. In this paper, we propose an evaluation protocol specifically developed with the recommendation use case in mind, i.e. the recommendation of one or several items to an end user. The protocol attempts to closely mimic a scenario of a deployed (production) recommendation system, taking specific user aspects into consideration and allowing a comparison of small and large scale recommendation systems. The protocol is evaluated on common recommendation datasets and compared to traditional recommendation settings found in the research literature. Our results show that the proposed model can better capture the quality of a recommender system than traditional evaluation does, and is not affected by characteristics of the data (e.g., size, sparsity, etc.).
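The two measures named above, precision (here at a cutoff k) and RMSE, can be computed as follows. This is a generic sketch of the standard definitions, not the protocol proposed in the paper:

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def rmse(predicted, actual):
    """Root-mean-squared error over paired rating predictions."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

p = precision_at_k(["a", "b", "c", "d"], relevant={"a", "c", "e"}, k=3)
e = rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0])
```

The paper's point is that even with fixed formulas like these, results diverge unless the train/test split and candidate-item selection are also standardized.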
130321 zephyrin soh - on the effect of exploration strategies on maintenanc... - Ptidej Team
This document presents an empirical study that investigates developers' program exploration strategies. The goal is to understand how developers navigate through a program's entities in order to help them more efficiently. The study analyzes developers' interaction histories to identify common exploration strategies and examines relationships between strategies and other factors like task type and expertise level. The results could help evaluate developer performance, improve comprehension models, and guide less experienced developers.
GTC 2021: Counterfactual Learning to Rank in E-commerce - GrubhubTech
Many ecommerce companies have extensive logs of user behavior such as clicks and conversions. However, if supervised learning is naively applied, systems can suffer from poor performance due to bias and feedback loops. Using techniques from counterfactual learning, we can leverage log data in a principled manner in order to model user behaviour and build personalized recommender systems. At Grubhub, a user journey begins with recommendations, and the vast majority of conversions are powered by recommendations. Our recommender policies can drive user behavior to increase orders and/or profit. Accordingly, the ability to rapidly iterate and experiment is very important. Because of our powerful GPU workflows, we can iterate 200% more rapidly than with counterpart CPU workflows. Developers iterate on ideas with notebooks powered by GPUs. Hyperparameter spaces are explored up to 8x faster with multi-GPU Ray clusters. Solutions are shipped from notebooks to production in half the time with nbdev. With our accelerated DS workflows and Deep Learning on GPUs, we were able to deliver a +12.6% conversion boost in just a few months. In this talk we present modern techniques for industrial recommender systems powered by GPU workflows: first a brief background on counterfactual learning techniques, followed by practical information and data from our industrial application.
By Alex Egg, accepted to Nvidia GTC 2021 Conference
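The counterfactual-learning idea referenced in the abstract is often introduced via the inverse propensity scoring (IPS) estimator, which reweights logged rewards by the logging policy's propensities to estimate a new policy's value offline. A minimal sketch (the log format and toy policy are illustrative assumptions, not Grubhub's system):

```python
def ips_estimate(logs, new_policy):
    """Inverse propensity scoring: estimate the expected reward of a new
    policy from data logged under a different (stochastic) policy.
    Each entry records context, the action shown, the observed reward
    (e.g. a conversion), and the logging policy's propensity."""
    total = 0.0
    for entry in logs:
        # Only entries where the new policy agrees with the logged action
        # contribute, each reweighted by 1 / propensity to correct the bias.
        if new_policy(entry["context"]) == entry["action"]:
            total += entry["reward"] / entry["propensity"]
    return total / len(logs)

logs = [
    {"context": "lunch", "action": "pizza", "reward": 1.0, "propensity": 0.5},
    {"context": "lunch", "action": "sushi", "reward": 0.0, "propensity": 0.5},
    {"context": "dinner", "action": "pizza", "reward": 0.0, "propensity": 0.5},
    {"context": "dinner", "action": "sushi", "reward": 1.0, "propensity": 0.5},
]
always_pizza = lambda ctx: "pizza"
value = ips_estimate(logs, always_pizza)
```

This is what lets logged clicks and conversions be used "in a principled manner": the estimator is unbiased as long as the logging policy gave every action a nonzero propensity.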
Experiments on Design Pattern Discovery - Tim Menzies
The document describes experiments conducted to discover design patterns from source code. It outlines the approach taken by DP-Miner tool, presents experiment data on four Java systems, and evaluates results by calculating precision and recall values. Benchmarks are lacking for accurately evaluating design pattern discovery techniques.
This document presents a method for selectively acquiring contextual information in travel recommender systems to improve recommendations. It proposes acquiring only the most important contextual factors for each user-item pair, rather than all available factors. It describes an algorithm called Largest Deviation that calculates relevance scores for factors based on their impact on rating predictions. An evaluation on two datasets found Largest Deviation achieved better prediction accuracy and ranking quality compared to baseline methods, while acquiring conditions for fewer contextual factors. The selective context acquisition approach allows travel recommender systems to provide more personalized recommendations without needing all available contextual information.
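A Largest-Deviation-style selection can be sketched as scoring each contextual factor by how much its conditions perturb the rating prediction, then keeping the top-scoring factors. This is a hypothetical reconstruction from the summary above, not the paper's exact algorithm; `predict` and the toy predictor are assumptions:

```python
def largest_deviation(predict, user, item, factors, baseline_context, k=1):
    """Score each contextual factor by the absolute deviation its condition
    causes in the rating prediction; return the k highest-scoring factors.
    `predict(user, item, context)` is an assumed rating-prediction function."""
    base = predict(user, item, baseline_context)
    scores = {}
    for factor, condition in factors.items():
        ctx = dict(baseline_context)
        ctx[factor] = condition
        scores[factor] = abs(predict(user, item, ctx) - base)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy predictor: weather shifts the prediction a lot, companion barely.
def toy_predict(user, item, context):
    score = 3.0
    if context.get("weather") == "rainy":
        score -= 1.5
    if context.get("companion") == "family":
        score += 0.1
    return score

selected = largest_deviation(
    toy_predict, "u1", "i1",
    factors={"weather": "rainy", "companion": "family"},
    baseline_context={},
)
```

Only the selected factors would then be elicited from the user, keeping the acquisition effort per user-item pair low.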
Alleviating cold-user start problem with users' social network data in recomm...Eduardo Castillejo Gil
This work explores the possibility of using relevant data from users’
social network to alleviate the cold-user problems in a recommender
system domain. The proposed solution extracts the most valuable
node in the graph generated by check in a venue with an Android
application using the Foursquare API. By obtaining the recommendations to this node we estimate the probability of some categories
to be similar to users tastes...
Tutorial: Context In Recommender SystemsYONG ZHENG
This document provides an overview of a tutorial on context-aware recommender systems. The tutorial will cover traditional recommendation techniques, context-aware recommendation which incorporates additional contextual information such as time and location, and context suggestion. It includes an agenda with topics, background information on recommender systems and evaluation metrics, and descriptions of techniques for context-aware recommendation including context filtering and modeling.
Context-Aware Recommender System Based on Boolean Matrix FactorisationDmitrii Ignatov
In this work we propose and study an approach for collaborative filtering, which is based on Boolean matrix factorisation and exploits additional (context) information about users and items. To avoid similarity loss in case of Boolean representation we use an adjusted type of projection of a target user to the obtained factor space.
We have compared the proposed method with SVD-based approach on the MovieLens dataset. The experiments demonstrate that the proposed method has better MAE and Precision and comparable Recall and F-measure. We also report an increase of quality in the context information presence.
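The MAE reported in the comparison above can be computed as in this minimal sketch (the predictions and ground-truth ratings are made up for illustration):

```python
def mean_absolute_error(predictions, truths):
    """Average absolute difference between predicted and true ratings."""
    return sum(abs(p - t) for p, t in zip(predictions, truths)) / len(truths)

# Illustrative values only: three predicted ratings vs. three true ratings.
print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 1]))  # 0.5
```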
Anomaly detection: Core Techniques and Advances in Big Data and Deep LearningQuantUniversity
Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance.
In Part II of the Anomaly Detection Series, we discuss the challenges in analyzing Temporal datasets and discuss methods for outlier analysis. We focus on single time series and discuss point outlier and sub-sequence methods.
In this talk, we will introduce anomaly detection and discuss the various analytical and machine learning techniques used in this field. Through a case study, we will discuss how anomaly detection techniques could be applied to energy data sets. We will also demonstrate, using R and Apache Spark, an application to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
In this workshop, we will discuss the core techniques in anomaly detection and discuss advances in Deep Learning in this field.
Through case studies, we will discuss how anomaly detection techniques could be applied to various business problems. We will also demonstrate examples using R, Python, Keras and Tensorflow applications to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
With R, Python, Apache Spark and a plethora of other open source tools, anyone with a computer can run machine learning algorithms in a jiffy! However, without an understanding of which algorithms to choose and when to apply a particular technique, most machine learning efforts turn into trial and error experiments with conclusions like "The algorithms don't work" or "Perhaps we should get more data".
In this lecture, we will focus on the key tenets of machine learning algorithms and how to choose an algorithm for a particular purpose. Rather than just showing how to run experiments in R, Python or Apache Spark, we will provide an intuitive introduction to machine learning with just enough mathematics and basic statistics.
We will address:
• How do you differentiate Clustering, Classification and Prediction algorithms?
• What are the key steps in running a machine learning algorithm?
• How do you choose an algorithm for a specific goal?
• Where does exploratory data analysis and feature engineering fit into the picture?
• Once you run an algorithm, how do you evaluate the performance of an algorithm?
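The evaluation step in the last bullet can be sketched with a plain holdout split and a trivial majority-class baseline (the data and the baseline classifier are illustrative, not from the lecture):

```python
def train_test_split(data, test_ratio=0.25):
    """Hold out the last fraction of the data for evaluation."""
    cut = int(len(data) * (1 - test_ratio))
    return data[:cut], data[cut:]

def majority_classifier(train):
    """Trivial baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def accuracy(model, test):
    """Fraction of held-out examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

# Toy labeled data: (example, label) pairs.
data = [("a", 0), ("b", 0), ("c", 1), ("d", 0), ("e", 0), ("f", 1),
        ("g", 0), ("h", 0)]
train, test = train_test_split(data)
model = majority_classifier(train)
print(accuracy(model, test))  # 1.0
```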
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Dr. Cornelius Ludmann
Talk at the Data Streams and Event Processing Workshop at the 16. Fachtagung »Datenbanksysteme für Business, Technologie und Web« (BTW) of the Gesellschaft für Informatik (GI) in Hamburg, Germany. March 3, 2015
Since its debut in 2010, Apache Spark has become one of the most popular Big Data technologies in the Apache open source ecosystem. In addition to enabling processing of large data sets through its distributed computing architecture, Spark provides out-of-the-box support for machine learning, streaming and graph processing in a single framework. Spark has been supported by companies like Microsoft, Google, Amazon and IBM and in financial services, companies like Blackrock (http://bit.ly/1Q1DVJH ) and Bloomberg (http://bit.ly/29LXbPv ) have started to integrate Apache Spark into their tool chain and the interest is growing. Unlike other big-data technologies which require intensive programming using Java etc., Spark enables data scientists to work with a big-data technology using higher level languages like Python and R making it accessible to conduct experiments and for rapid prototyping.
In this talk, we will introduce Apache Spark and discuss the key features that differentiate Apache Spark from other technologies. We will provide examples on how Apache Spark can help scale analytics and discuss how the machine learning API could be used to solve large-scale machine learning problems using Spark’s distributed computing framework. We will also illustrate enterprise use cases for scaling analytics with Apache Spark.
Decision Support Analysis for Software Effort Estimation by AnalogyTim Menzies
The document discusses decision support for software effort estimation by analogy (EBA). It outlines the basic tasks and decision problems in applying EBA, including searching for analogs, determining similarity measures, and choosing an analogy adaptation strategy. It presents a decision-centric process model for EBA and discusses empirical studies to support decision-making when customizing EBA for a specific dataset. An example EBA method called AQUA+ is analyzed through a comparative study evaluating different attribute weighting heuristics.
Instance Space Analysis for Search Based Software EngineeringAldeida Aleti
Search-Based Software Engineering is now a mature area with numerous techniques developed to tackle some of the most challenging software engineering problems, from requirements to design, testing, fault localisation, and automated program repair. SBSE techniques have shown promising results, giving us hope that one day it will be possible for the tedious and labour intensive parts of software development to be completely automated, or at least semi-automated. In this talk, I will focus on the problem of objective performance evaluation of SBSE techniques. To this end, I will introduce Instance Space Analysis (ISA), which is an approach to identify features of SBSE problems that explain why a particular instance is difficult for an SBSE technique. ISA can be used to examine the diversity and quality of the benchmark datasets used by most researchers, and analyse the strengths and weaknesses of existing SBSE techniques. The instance space is constructed to reveal areas of hard and easy problems, and enables the strengths and weaknesses of the different SBSE techniques to be identified. I will present on how ISA enabled us to identify the strengths and weaknesses of SBSE techniques in two areas: Search-Based Software Testing and Automated Program Repair. Finally, I will end my talk with future directions of the objective assessment of SBSE techniques.
Anomaly detection techniques aim to identify outliers or anomalies in datasets. Statistical approaches assume a data distribution and detect anomalies that differ significantly. Distance-based approaches measure distances between data points to find outliers that are far from neighbors. Clustering approaches group normal data and detect outliers in small clusters or far from other clusters. Challenges include determining the number of outliers, handling unlabeled data, and scaling to high dimensions where distances become similar.
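A minimal sketch of the statistical approach described above (the data and the threshold are illustrative):

```python
from statistics import mean, stdev

def zscore_outliers(xs, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean,
    assuming the data is roughly normally distributed."""
    mu, sigma = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - mu) > threshold * sigma]

data = [10, 11, 9, 10, 12, 10, 11, 100]  # 100 is an injected anomaly
print(zscore_outliers(data, threshold=2.0))  # [100]
```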
Mining System Logs to Learn Error Predictors, Universität Stuttgart, Stuttgar...Barbara Russo
Predicting system failures can be of great benefit to managers that get a better command over system performance. Data that systems generate in the form of logs is a valuable source of information to predict system reliability. As such, there is an increasing demand for tools to mine logs and provide accurate predictions. However, interpreting information in logs poses some challenges. This talk presents how to effectively mine sequences of logs and provide correct predictions. The approach integrates different machine learning techniques to control for data brittleness, provide accuracy of model selection and validation, and increase robustness of classification results. We apply the proposed approach to log sequences of 25 different applications of a software system for telemetry of cars.
A Top-N Recommender System Evaluation Protocol Inspired by Deployed SystemsAlan Said
The evaluation of recommender systems is crucial for their development. In today's recommendation landscape there are many standardized recommendation algorithms and approaches; however, there exists no standardized method for experimental setup of evaluation -- not even for widely used measures such as precision and root-mean-squared error. This creates a setting where comparison of recommendation results using the same datasets becomes problematic. In this paper, we propose an evaluation protocol specifically developed with the recommendation use-case in mind, i.e. the recommendation of one or several items to an end user. The protocol attempts to closely mimic a scenario of a deployed (production) recommendation system, taking specific user aspects into consideration and allowing a comparison of small and large scale recommendation systems. The protocol is evaluated on common recommendation datasets and compared to traditional recommendation settings found in research literature. Our results show that the proposed model can better capture the quality of a recommender system than traditional evaluation does, and is not affected by characteristics of the data (e.g., size, sparsity, etc.).
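Measures such as precision, mentioned above, still need a concrete per-cutoff definition in a top-N setting; a minimal precision-at-k sketch (the item names are made up):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items the user actually liked."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k

recommended = ["a", "b", "c", "d"]   # ranked recommendation list
relevant = {"a", "c", "e"}           # items the user liked
print(precision_at_k(recommended, relevant, k=3))  # 2 of 3 hits
```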
130321 zephyrin soh - on the effect of exploration strategies on maintenanc...Ptidej Team
This document presents an empirical study that investigates developers' program exploration strategies. The goal is to understand how developers navigate through a program's entities in order to help them more efficiently. The study analyzes developers' interaction histories to identify common exploration strategies and examines relationships between strategies and other factors like task type and expertise level. The results could help evaluate developer performance, improve comprehension models, and guide less experienced developers.
GTC 2021: Counterfactual Learning to Rank in E-commerceGrubhubTech
Many ecommerce companies have extensive logs of user behavior such as clicks and conversions. However, if supervised learning is naively applied, then systems can suffer from poor performance due to bias and feedback loops. Using techniques from counterfactual learning we can leverage log data in a principled manner in order to model user behaviour and build personalized recommender systems. At Grubhub, a user journey begins with recommendations and the vast majority of conversions are powered by recommendations. Our recommender policies can drive user behavior to increase orders and/or profit. Accordingly, the ability to rapidly iterate and experiment is very important. Because of our powerful GPU workflows, we can iterate 200% more rapidly than with counterpart CPU workflows. Developers iterate ideas with notebooks powered by GPUs. Hyperparameter spaces are explored up to 8x faster with multi-GPUs Ray clusters. Solutions are shipped from notebooks to production in half the time with nbdev. With our accelerated DS workflows and Deep Learning on GPUs, we were able to deliver a +12.6% conversion boost in just a few months. In this talk we hope to present modern techniques for industrial recommender systems powered by GPU workflows. First a small background on counterfactual learning techniques, then followed by practical information and data from our industrial application.
By Alex Egg, accepted to Nvidia GTC 2021 Conference
Experiments on Design Pattern DiscoveryTim Menzies
The document describes experiments conducted to discover design patterns from source code. It outlines the approach taken by DP-Miner tool, presents experiment data on four Java systems, and evaluates results by calculating precision and recall values. Benchmarks are lacking for accurately evaluating design pattern discovery techniques.
This document presents a method for selectively acquiring contextual information in travel recommender systems to improve recommendations. It proposes acquiring only the most important contextual factors for each user-item pair, rather than all available factors. It describes an algorithm called Largest Deviation that calculates relevance scores for factors based on their impact on rating predictions. An evaluation on two datasets found Largest Deviation achieved better prediction accuracy and ranking quality compared to baseline methods, while acquiring conditions for fewer contextual factors. The selective context acquisition approach allows travel recommender systems to provide more personalized recommendations without needing all available contextual information.
Contextual Information Elicitation in Travel Recommender SystemsMatthias Braunhofer
Context-Aware Recommender Systems are advisory applications that exploit users’ preference knowledge contained in datasets of context-dependent user ratings, i.e., ratings augmented with the description of the contextual situation detected when the user experienced the item and rated it. Since the space of context-dependent ratings increases exponentially in size with the number of contextual factors, and because certain contextual information is still hard to acquire automatically (e.g., the user’s mood or the travellers’ group composition), it is fundamental to identify and acquire only those factors that truly influence the user preferences and consequently the ratings and the recommendations. In this paper, we propose a novel method that estimates the impact of a contextual factor on rating predictions and adaptively elicits from the users only the relevant ones. Our experimental evaluation, on two travel-related datasets, shows that our method compares favorably to other state-of-the-art context selection methods.
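The idea of scoring a contextual factor by its impact on rating predictions can be sketched as follows; the prediction function, factor names and scoring rule here are illustrative stand-ins for the paper's trained model and its exact Largest Deviation formula:

```python
def factor_relevance(predict, user, item, conditions):
    """Score a contextual factor by the largest deviation that any of its
    conditions induces on the rating prediction, relative to the
    context-free prediction."""
    baseline = predict(user, item, None)  # prediction with unknown context
    return max(abs(predict(user, item, c) - baseline) for c in conditions)

def select_factors(predict, user, item, factors, k=1):
    """Rank contextual factors by relevance and keep the top-k to elicit."""
    scores = {name: factor_relevance(predict, user, item, conds)
              for name, conds in factors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy model: weather shifts this prediction a lot, companion barely at all.
def toy_predict(user, item, condition):
    effects = {"sunny": 0.8, "rainy": -0.9, "alone": 0.1, "family": -0.1}
    return 3.5 + effects.get(condition, 0.0)

factors = {"weather": ["sunny", "rainy"], "companion": ["alone", "family"]}
print(select_factors(toy_predict, "u1", "poi1", factors, k=1))  # ['weather']
```

Only the factors surviving this ranking would then be requested from the user along with the rating.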
CABT SHS Statistics & Probability - Estimation of Parameters (intro)Gilbert Joseph Abueg
This document provides an overview of inferential statistics and parameter estimation through a lecture presentation. It discusses point estimation and interval estimation to approximate population parameters from sample data. Point estimates are specific numerical values while interval estimates provide a range of values with an associated confidence level. Common point estimators include the sample mean, proportion, and standard deviation. Interval estimates use a confidence level to express the probability that the true population parameter falls within the calculated interval. Formulas are provided for constructing confidence intervals for the population mean with both known and unknown variances.
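A sketch of the interval estimate described above, for the population mean with known standard deviation (the sample values and the 95% z critical value are illustrative; with unknown variance one would substitute the sample standard deviation and a t critical value):

```python
from statistics import mean

def confidence_interval_known_sigma(xs, sigma, z=1.96):
    """95% CI for the population mean: x_bar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / len(xs) ** 0.5
    return mean(xs) - half_width, mean(xs) + half_width

sample = [10, 12, 11, 9, 13, 11, 10, 12]  # sample mean is 11.0
lo, hi = confidence_interval_known_sigma(sample, sigma=2.0)
print((round(lo, 2), round(hi, 2)))
```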
An evaluation of SimRank and Personalized PageRank to build a recommender sys...Paolo Tomeo
The Web of Data is the natural evolution of the World Wide Web from a set of interlinked documents to a set of interlinked entities. It is a graph of information resources interconnected by semantic relations, thereby yielding the name Linked Data. The proliferation of Linked Data is for sure an opportunity to create a new family of data-intensive applications such as recommender systems. In particular, since content-based recommender systems are based on the notion of similarity between items, the selection of the right graph-based similarity metric is of paramount importance to build an effective recommendation engine. In this paper, we review two existing metrics, SimRank and PageRank, and investigate their suitability and performance for computing similarity between resources in RDF graphs and investigate their usage to feed a content-based recommender system. Finally, we conduct experimental evaluations on a dataset for musical artists and bands recommendations, thus comparing our results with two other content-based baselines and measuring their performance with precision and recall, catalog coverage, items distribution and novelty metrics.
This document discusses uncertainty analysis and the importance of recording assumptions. It explains that single point estimates are often inaccurate, and it is better to estimate variables as ranges or distributions. Tools like sensitivity analysis can help identify the key drivers of uncertainty. Monte Carlo analysis incorporates the uncertainty ranges into simulations to generate outputs. All estimates and assumptions must be thoroughly documented in files like the Master Data Assumption List, as models may be audited. Recording assumptions provides evidence for results and allows others to understand and validate the analysis.
Aleksandar Kapisoda: The semantic approach for tracking scientific publicationsSemantic Web Company
The document discusses Boehringer Ingelheim Pharma's development of a publication tracking system using semantic technologies. It aims to automatically import publication data, perform data curation, and enable advanced visualization and analysis. Some key challenges include cleaning noisy author and institution data, adding internal BI data, and linking to external impact factors. The system utilizes tools like PoolParty, Virtuoso, and SPARQL to semantically enrich and link publication data. It is meant to provide advanced analytics beyond what was possible in their previous manually curated system.
R is a programming language and software environment for statistical analysis and graphics. It is used for data manipulation, statistics, and graphics. R allows users to create functions (like spells for wizards) and relies on functions developed by statistical researchers. While initially developed in the 1990s, R has grown significantly with over 800 add-on packages. Data mining involves exploring large datasets to discover patterns and make predictions. Common techniques in R include classification, clustering, association rule mining, and decision trees.
The document discusses key concepts in quantitative research methods and data analytics covered in a university course. It outlines the course content, which includes topics like data visualization, the normal distribution, and hypothesis testing. It then details the course assessments, which include a mid-term assignment and final coursework report worth 30% and 70% respectively. The final report involves selecting a topic, collecting and analyzing data using R Studio, and reporting the results in a 2000 word paper with sections on introduction, data, results, and conclusion.
The document discusses combining user experience research and web analytics to gain a more holistic understanding of users. It begins by outlining the speaker's background and defining key terms. It then explores why combining methods is beneficial by noting their individual strengths and weaknesses in capturing both qualitative and quantitative insights. The document also examines why these areas are not routinely combined, then provides three opportunities for integration: 1) Using customer research to inform analytics metrics 2) Leveraging analytics to drive user research 3) Integrating the areas throughout the product lifecycle to continually optimize the user experience. Case studies and tips for getting started with each opportunity are also presented.
Representative Of The Populationseek Your Dream/Tutorialoutletdotcomapjk512
ECO 301: Decision-Making Analysis Paper Guidelines and Rubric
Overview
Your final project for this course is a detailed analysis of a specific problem statement. How economic themes, such as demand, production, cost, and market
structure relate to a particular company will be a focus of this analysis. You will analyze these components with quantitative techniques,
The document introduces the Multidimensional Poverty Assessment Tool (MPAT), which assesses ten dimensions of rural poverty through household surveys. It describes how MPAT works by collecting perception data through surveys, transforming the data into indicator scores, and combining scores to composite indicators. The document also summarizes how MPAT was developed with expert input, tested in multiple countries, evaluated independently, and can be implemented through the MPAT Excel spreadsheet which automatically calculates results.
A proposal for the inclusion of accessibility criteria in the publishing work...adaptabit
The document proposes including accessibility criteria in the publishing workflow of images in biomedical academic articles. It discusses how visual content is important but often inaccessible. It then outlines a behavior change wheel model to intervene at different points in the submission process, such as educating authors, improving tools, and introducing validation steps. Checklists are provided to help make figures accessible by including detailed descriptions and explanations for labels, colors, adjustments, scales, and more. The overall goal is to ensure images are born digitally accessible.
Andrea Dal Pozzolo is a data scientist passionate about machine learning, data mining, and applying statistical techniques. He has a Ph.D. in computer science from Université Libre de Bruxelles and has worked on credit card fraud detection. Currently he is a quantitative consultant at Ernst & Young applying machine learning algorithms to conduct risk and credit/market risk modeling.
Early Analysis and Debuggin of Linked Open Data CubesEnrico Daga
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and so understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. There are several aspects of RDF, the Data Cube vocabulary and linked data that can help with these issues however, including that they make the data "self-descriptive". Here, we attempt to answer the question "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation without prior knowledge of the data content to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
This document summarizes a presentation on structural equation modeling (SEM) given by Scott MacLean of Nulink Analytics. The presentation covered key concepts in SEM including the use of latent variables and indicators to model constructs that cannot be directly observed. It provided examples of formative versus reflective measurement models and discussed the importance of properly specifying these models. The presentation also addressed topics like goodness of fit indices and analyzing models with formative factors. It concluded with software and training suggestions for working with SEM.
Similar to Parsimonious and Adaptive Contextual Information Acquisition in Recommender Systems (20)
Discover the benefits of outsourcing SEO to Indiadavidjhones387
Discover the benefits of outsourcing SEO to India! From cost-effective services and expert professionals to round-the-clock work advantages, learn how your business can achieve digital success with Indian SEO solutions.
HijackLoader Evolution: Interactive Process HollowingDonato Onofri
CrowdStrike researchers have identified a HijackLoader (aka IDAT Loader) sample that employs sophisticated evasion techniques to enhance the complexity of the threat. HijackLoader, an increasingly popular tool among adversaries for deploying additional payloads and tooling, continues to evolve as its developers experiment and enhance its capabilities.
In their analysis of a recent HijackLoader sample, CrowdStrike researchers discovered new techniques designed to increase the defense evasion capabilities of the loader. The malware developer used a standard process hollowing technique coupled with an additional trigger that was activated by the parent process writing to a pipe. This new approach, called "Interactive Process Hollowing", has the potential to make defense evasion stealthier.
Securing BGP: Operational Strategies and Best Practices for Network Defenders...APNIC
Md. Zobair Khan, Network Analyst and Technical Trainer at APNIC, presented 'Securing BGP: Operational Strategies and Best Practices for Network Defenders' at the Phoenix Summit held in Dhaka, Bangladesh from 23 to 24 May 2024.
Honeypots Unveiled: Proactive Defense Tactics for Cyber Security, Phoenix Sum...APNIC
Adli Wahid, Senior Internet Security Specialist at APNIC, delivered a presentation titled 'Honeypots Unveiled: Proactive Defense Tactics for Cyber Security' at the Phoenix Summit held in Dhaka, Bangladesh from 23 to 24 May 2024.
Parsimonious and Adaptive Contextual Information Acquisition in Recommender Systems
1. IntRS’15 - September 2015, Vienna, Austria
Parsimonious and Adaptive Contextual Information Acquisition in Recommender Systems
Matthias Braunhofer¹, Ignacio Fernández-Tobías² and Francesco Ricci¹
¹ Free University of Bozen - Bolzano, Piazza Domenicani 3, 39100 Bolzano, Italy ({mbraunhofer,fricci}@unibz.it)
² Universidad Autónoma de Madrid, C / Francisco Tomás y Valiente 11, 28049 Madrid, Spain (ignacio.fernandezt@uam.es)
2. Outline
• Introduction
• Related Works
• Selective Context Acquisition
• Experimental Evaluation and Results
• Conclusions and Future Work
4. Context-Aware Recommender Systems
• Context-Aware Recommender Systems (CARSs) aim to provide better recommendations by exploiting contextual information (e.g., weather)
• Rating prediction function: R: Users × Items × Context → Ratings
[Figure: toy user-item rating matrices, one per contextual situation, with observed ratings and unknown entries marked “?”]
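One possible (deliberately simplified, assumed) reading of R: Users × Items × Context → Ratings is a global mean plus user, item and item-in-context biases estimated from context-tagged ratings; this is an illustration, not the model used in the paper:

```python
from collections import defaultdict

class ContextAwareBaseline:
    """Global mean plus per-user, per-item and per-(item, condition)
    average deviations; a deliberately simple stand-in for a CARS model."""

    def fit(self, ratings):
        # ratings: list of (user, item, condition, rating) tuples
        self.mu = sum(r for _, _, _, r in ratings) / len(ratings)
        bu, bi, bic = defaultdict(list), defaultdict(list), defaultdict(list)
        for u, i, c, r in ratings:
            bu[u].append(r - self.mu)
            bi[i].append(r - self.mu)
            bic[(i, c)].append(r - self.mu)
        avg = lambda xs: sum(xs) / len(xs)
        self.bu = {u: avg(v) for u, v in bu.items()}
        self.bi = {i: avg(v) for i, v in bi.items()}
        self.bic = {k: avg(v) for k, v in bic.items()}
        return self

    def predict(self, user, item, condition=None):
        # R(user, item, context): unknown keys fall back to a zero bias
        return (self.mu + self.bu.get(user, 0.0) + self.bi.get(item, 0.0)
                + self.bic.get((item, condition), 0.0))

# Toy context-tagged ratings for two users, two POIs and a weather factor.
ratings = [("u1", "museum", "rainy", 5), ("u2", "museum", "sunny", 2),
           ("u1", "lake", "sunny", 4), ("u2", "lake", "rainy", 1)]
model = ContextAwareBaseline().fit(ratings)
```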
5. Challenges for CARSs
• Identification of the contextual factors that influence user preferences and the decision-making process, and hence are worth collecting from the users along with their ratings
• Development of a predictive model for predicting the user’s ratings for items under various contextual situations
• Design of a human-computer interaction layer on top of the predictive model
7. Example: STS (South Tyrol Suggests)
STS provides context-aware suggestions for Places Of Interest (POIs) in South Tyrol, Italy
15. Example: STS w/o Selective Context Acquisition
Don’t. All contextual factors are requested.
30. IntRS’15 - September 2015, Vienna, Austria
Example
STS w/ Selective Context Acquisition
Do. Only relevant contextual factors are requested.
Outline
• Introduction
• Related Work
• Selective Context Acquisition
• Experimental Evaluation and Results
• Conclusions and Future Work
Context Selection
A Priori (i.e., Before Collecting Ratings)
• (Baltrunas et al., 2012): development of a web survey in which users were asked to evaluate the influence of contextual conditions on POI categories
• This made it possible to identify the relevant contextual factors for different POI categories (using the mutual information statistic)
• Pros: ratings can be acquired under relevant contextual conditions
• Cons: artificial setting; the survey requires extra effort from the user
Context Selection
A Posteriori (i.e., After Collecting Ratings)
• (Odić et al., 2013): provision of several statistic-based methods for detecting relevant context, i.e., unalikeability, entropy, sample variance, the χ² test and the Freeman–Halton test
• Results show a significant difference between the prediction of ratings in context detected as relevant and in context detected as irrelevant
• Pros: can improve rating prediction
• Cons: irrelevant context is still acquired in the rating acquisition phase
[Figure: rating prediction results for relevant context, unclassified context and irrelevant context, compared with baseline predictors]
Parsimonious & Adaptive Context Acquisition
• Main idea: for each user-item pair (u, i), identify the contextual factors that, when acquired together with u’s rating for i, most improve the overall system
• Heuristic: acquire the contextual factors that have the largest impact on rating prediction
• Example: [Bar chart: impact scores of Season, Weather, Temperature and Daytime for the pair (Alice, Skiing), on a 0.000–0.500 scale]
How to quantify this impact?
CARS Prediction Model
• We use a new variant of Context-Aware Matrix Factorization (CAMF) (Baltrunas et al., 2011) that treats contextual conditions similarly to either item or user attributes
• Advantage: it allows capturing latent correlations and patterns between a potentially wide range of knowledge sources ⟹ ideal for deriving the usefulness of contextual factors

ȓuic1,...,ck = (qi + Σa∈A(i)∪C(i) xa)ᵀ ⋅ (pu + Σb∈A(u)∪C(u) yb) + r̄i + bu

where:
qi: latent factor vector of item i
A(i): set of conventional item attributes (e.g., genre)
C(i): set of contextual item attributes (e.g., weather)
xa: latent factor vector of item attribute a
pu: latent factor vector of user u
A(u): set of conventional user attributes (e.g., age)
C(u): set of contextual user attributes (e.g., mood)
yb: latent factor vector of user attribute b
r̄i: average rating for item i
bu: baseline for user u
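As a rough illustration, the prediction rule above can be sketched in a few lines of NumPy. The function and argument names are illustrative, not the authors’ implementation: item and user latent vectors are augmented with the latent vectors of their (conventional and contextual) attributes before taking the dot product.

```python
import numpy as np

def predict_rating(q_i, x_attrs, p_u, y_attrs, r_bar_i, b_u):
    """CAMF-style prediction for one (user, item, context) triple.

    q_i, p_u   : latent factor vectors of item i and user u
    x_attrs    : latent vectors x_a for attributes a in A(i) ∪ C(i)
    y_attrs    : latent vectors y_b for attributes b in A(u) ∪ C(u)
    r_bar_i    : average rating for item i
    b_u        : baseline for user u
    """
    # item side: q_i plus the latent vectors of all item attributes/conditions
    item_vec = q_i + sum(x_attrs, np.zeros_like(q_i))
    # user side: p_u plus the latent vectors of all user attributes/conditions
    user_vec = p_u + sum(y_attrs, np.zeros_like(p_u))
    # dot product, plus item average rating and user baseline
    return float(item_vec @ user_vec) + r_bar_i + b_u
```

Adding a contextual condition thus simply adds one more latent vector to the corresponding side of the dot product, which is what makes it easy to compare predictions with and without a condition.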
Largest Deviation
• Computes a personalized relevance score for a contextual factor Cj and a user-item pair (u, i)
• Given (u, i), it first measures the “impact” of each contextual condition cj ∈ Cj by calculating the absolute deviation between the rating prediction when the condition holds (i.e., ȓuicj) and the predicted context-free rating (i.e., ȓui):

ŵuicj = fcj ⋅ |ȓuicj − ȓui|,

where fcj is the normalized frequency of cj
• Finally, it takes the average of these individual scores over the contextual conditions to yield a single relevance score for the contextual factor Cj
Illustrative Example
• ȓAlice Skiing Sunny = 5
• ȓAlice Skiing = 3.5
• 20% of ratings are tagged with Sunny (i.e., fSunny = 0.2)
• ŵAlice Skiing Sunny = 0.2 ⋅ |5 − 3.5| = 0.3
Datasets

                        CoMoDa    TripAdvisor
Domain                  Movies    POIs
Rating scale            1-5       1-5
Ratings                 2,098     4,147
Users                   112       3,916
Items                   1,189     569
Contextual factors      12        3
Contextual conditions   49        31
User attributes         4         2
Item features           7         12
CoMoDa contextual factors: time, daytype, season, location, weather, social, mood, …
CoMoDa user attributes: age, gender, city, country
CoMoDa item features: director, country, language, year, budget, genres, actors
TripAdvisor contextual factors: type, month and year of the trip
TripAdvisor user attributes: user location, member type
TripAdvisor item features: item type, amenities, item locality, price range, hotel class, …
Evaluation Procedure
Overview
• Repeated random sub-sampling validation (20 times):
• Randomly partition the ratings into three subsets: training set (25%), candidate set (50%) and testing set (25%)
• For each user-item pair (u, i) in the candidate set, compute the N most relevant contextual factors and transfer the corresponding rating and context information ruic from the candidate set to the training set as ruic', with c' ⊆ c containing the contextual conditions associated with these factors
• Measure user-averaged MAE (U-MAE), Precision@10 and Recall@10 on the testing set, after training the prediction model on the new extended training set
• Repeat
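The partition-and-transfer steps above can be sketched as follows. This is a simplified sketch under stated assumptions: ratings are tuples of user, item, value and a context dict, and `top_factors` is a hypothetical function returning a pair’s contextual factors ranked by relevance.

```python
import random

def split_ratings(ratings, seed=0):
    """Randomly partition ratings into training (25%), candidate (50%)
    and testing (25%) sets, as in one round of the sub-sampling procedure."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    a, b = n // 4, n // 4 + n // 2
    return shuffled[:a], shuffled[a:b], shuffled[b:]

def transfer(candidate, top_factors, n_factors):
    """Move each candidate rating to the training set, keeping only the
    contextual conditions of its N most relevant factors (c' ⊆ c)."""
    transferred = []
    for (u, i, rating, context) in candidate:
        keep = set(top_factors(u, i)[:n_factors])
        reduced = {f: cond for f, cond in context.items() if f in keep}
        transferred.append((u, i, rating, reduced))
    return transferred
```

With N = 2 and a ranking that puts Season and Weather first, the Alice/Skiing rating from the example below would keep only its Winter and Sunny conditions.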
Evaluation Procedure
Example
• user-item pair: (Alice, Skiing)
• rating in candidate set: rAlice Skiing Winter, Sunny, Warm, Morning = 5
• top two contextual factors: Season and Weather
• rating transferred to training set: rAlice Skiing Winter, Sunny = 5
Baseline Methods for Evaluation
• Mutual Information (Baltrunas et al., 2012): given a user-item pair (u, i), it computes the relevance score for the contextual factor Cj as the normalized mutual information between Cj and the ratings for items belonging to i’s category
• Freeman-Halton Test (Odić et al., 2013): calculates the relevance of a contextual factor Cj using the Freeman-Halton test, i.e., Fisher’s exact test extended to contingency tables larger than 2 × 2
• Minimum Redundancy Maximum Relevance (mRMR) (Peng et al., 2005): ranks each contextual factor Cj according to its relevance to the rating variable and its redundancy with the other contextual factors
• Random: randomly chooses the top N contextual factors for a user-item pair
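As one concrete illustration of how such a baseline can score a factor, here is a sketch of empirical mutual information between observed ratings and one factor’s conditions, normalized here by the rating entropy (the exact normalization used by the baseline is an assumption; the function name is illustrative):

```python
from collections import Counter
from math import log2

def mutual_information(ratings, conditions):
    """Empirical mutual information I(R; C) between a rating variable
    and the conditions of one contextual factor, normalized by the
    rating entropy H(R). ratings[k] and conditions[k] are co-observed."""
    n = len(ratings)
    pr = Counter(ratings)         # marginal counts of rating values
    pc = Counter(conditions)      # marginal counts of conditions
    joint = Counter(zip(ratings, conditions))  # joint counts
    mi = sum((k / n) * log2((k / n) / ((pr[r] / n) * (pc[c] / n)))
             for (r, c), k in joint.items())
    h_r = -sum((k / n) * log2(k / n) for k in pr.values())
    return mi / h_r if h_r else 0.0
```

A factor whose conditions fully determine the rating scores 1.0; a factor statistically independent of the rating scores 0.0.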
Evaluation Results
U-MAE
[Charts: U-MAE vs. number of selected contextual factors (1-4 on CoMoDa, 1-3 on TripAdvisor) for Largest Deviation, Mutual Information, Freeman-Halton, mRMR, Random and All features]
Evaluation Results
Precision@10
[Charts: Precision@10 vs. number of selected contextual factors (1-4 on CoMoDa, 1-3 on TripAdvisor) for the same methods]
Evaluation Results
Recall@10
[Charts: Recall@10 vs. number of selected contextual factors (1-4 on CoMoDa, 1-3 on TripAdvisor) for the same methods]
Evaluation Results
Practical Implications
• Using Largest Deviation, we know that we can ask only for the contextual factors C1, C2 and C3 when we ask user u to rate item i
Conclusions
• Identifying which contextual factors should be acquired from the user upon rating an item is an important and practical problem for CARSs
• We tackled this problem with a new method that asks the user to specify those contextual factors that, if considered in the CARS prediction model, would produce a rating prediction most different from the context-free prediction
• Results from our offline experiment confirm that the proposed parsimonious context acquisition strategy elicits ratings with contextual information that improve the recommendation performance more than the compared context selection methods
Future Work
• Evaluate the performance of employing an Active Learning method for adaptively selecting both the item to rate and the contextual information to add
• Understand how the proposed method can be extended to generate requests for contextual data that take into account possible correlations between contextual factors
• Update the evaluation procedure so that it can also be used on rating datasets for which only a subset of contextual factors is known
• Integrate the developed method into our STS app and perform a live user study