Distribution Problems in Recommender Systems


Traditional machine learning and collaborative filtering pay little attention to the sources of the data they use. The distribution behind the training data, the distribution behind the algorithm’s output, and the distribution behind the ground truth are often completely different from one another, and almost unrelated to the target distribution: the true ratings across all items for every user.


Differences in Distributions and Their Effect on Recommendation System Performance
Why Collaborative Filtering Doesn’t Scale
(portions reference Prismatic’s Silicon Valley talk)

History of Recommendation

Overfitting
Diagram labels:
• Distribution of All Items Across Users
• Distribution of All Items Across All Users in the Future
• Concrete Set of Past Items Across Users
• Concrete Set of Future Items Across Users

Recommender Systems Dilemma
Diagram labels:
• Set of All Items Possible
• Set of Items Known to Users in the Future
• Set of Items Known to Users in the Past
• Set of Items Recommended By Recommenders
• Items Viewed Or Liked in the Future
• Items Users Viewed Or Rated in the Past
• Items Seen in Ground Truth Without Changes in Item Access
• ??????

Collaborative Filtering in Music
• Construct correlations between items from the set of past known items
• Generate an estimated distribution for past users across all items
• Hope the ‘errors’ correspond to items users will like in the future
• The gap between the distributions grows with the scale of the data
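
The steps above can be sketched roughly as item-to-item collaborative filtering. This is a minimal illustration, not the method used in the talk: the toy ratings, the choice of cosine similarity, and the function names `item_vectors`, `cosine`, and `predict` are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Only items a user actually
# encountered appear here; everything else is simply unobserved.
ratings = {
    "ann": {"beatles": 5, "norah_jones": 4},
    "bob": {"beatles": 4, "norah_jones": 5, "obscure_band": 2},
    "cat": {"beatles": 5, "obscure_band": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> user ratings."""
    vectors = {}
    for user, items in ratings.items():
        for item, value in items.items():
            vectors.setdefault(item, {})[user] = value
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user, item, ratings, vectors):
    """Similarity-weighted average of the ratings the user has already given."""
    num = den = 0.0
    for known_item, value in ratings[user].items():
        sim = cosine(vectors[item], vectors[known_item])
        num += sim * value
        den += abs(sim)
    return num / den if den else None

vectors = item_vectors(ratings)
estimate = predict("ann", "obscure_band", ratings, vectors)
print(f"estimated rating for ann on obscure_band: {estimate:.2f}")
```

Note that the estimate for an item the user never saw is built entirely from items the user did see, which is where the gap between the known-items distribution and the all-items distribution enters.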

Resulting Biases
• Huge number of items: 50%+ of users only ever saw 20 songs a month out of 3 million
• Massive gap between the all-items distribution and the known-items distribution
• Cross-validation ground truth assumes those 50%+ of users only ever saw the new top 20 songs in the new set
• Results are supposed to reflect what users would do if they knew all of the sets
• Continuous user testing assumes ‘all items seen’ distributions, but the only new items users see are the recommended ones
• User data itself is a biased subset of the whole
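
To put the exposure gap in numbers, here is a small piece of arithmetic using the two figures from the slide (20 songs a month, 3 million songs). The "months to cover the catalog" figure is a hypothetical illustration that assumes a user never sees the same song twice.

```python
catalog_size = 3_000_000      # songs in the catalog (figure from the slide)
songs_seen_per_month = 20     # songs a typical user encounters per month

# Fraction of the catalog a user is exposed to in one month.
exposure = songs_seen_per_month / catalog_size
print(f"fraction of catalog seen per month: {exposure:.6%}")   # 0.000667%

# Hypothetical: months needed to see everything at that rate, with no repeats.
months_to_cover = catalog_size / songs_seen_per_month
print(f"months to cover the catalog: {months_to_cover:,.0f}")  # 150,000
```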

First Generation Problems
• Everyone likes The Beatles or Norah Jones
• These artists are extremely frequent in biased data sets
• Since everyone has listened to them before, everyone gets them recommended
• Recommendations usually repeat the top 40 of the data collection
• Users might like novel recommendations, but those will never appear in the cross-validation evaluation set, because users never saw them
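
The following simulation is a rough sketch of why offline evaluation favors the top 40. Everything in it is an assumption for illustration: the catalog size, the rule that users are exposed only to item ids 0 to 39, the random "true likes", and the two hypothetical recommenders `popular` and `novel`.

```python
import random

random.seed(0)

N_ITEMS = 1000   # hypothetical catalog size
N_USERS = 200    # hypothetical number of users
TOP = 40         # users are only ever exposed to item ids 0..39

# Assumption: each user truly likes a random 10% of the whole catalog.
true_likes = {u: set(random.sample(range(N_ITEMS), N_ITEMS // 10))
              for u in range(N_USERS)}
exposed = set(range(TOP))

# Logged "ground truth": liked items the user was actually exposed to.
observed = {u: likes & exposed for u, likes in true_likes.items()}

def hit_rate(recommend, truth):
    """Fraction of users with at least one recommended item in `truth`."""
    hits = sum(1 for u in truth if recommend(u) & truth[u])
    return hits / len(truth)

def popular(u):
    """Always recommend the same ten heavily exposed items."""
    return set(range(10))

def novel(u):
    """Oracle: ten unexposed items the user truly likes."""
    return set(sorted(true_likes[u] - exposed)[:10])

print("logged hit rate, popular:", hit_rate(popular, observed))
print("logged hit rate, novel  :", hit_rate(novel, observed))
print("true hit rate, novel    :", hit_rate(novel, true_likes))
```

Under these assumptions the oracle that recommends only unseen items scores zero against the logged data even though every one of its recommendations is something the user would like.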

Problems Over Time
• The ground truth is heavily biased by recommendations controlling the set of known items
• Machine learning, including collaborative filtering, learns the algorithm’s distribution more than users’ preferences
• Performance bias
• Future ground truth comes from those who stayed in the system
• They liked the system
• It doesn’t represent those who were unhappy and left
• This biases the data toward keeping existing users happy, without regard to ex-users
• In extreme cases, even new users are discarded
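
The performance bias can be illustrated with a tiny worked example. The satisfaction scores and the churn rule (users scoring below 3 leave) are hypothetical, chosen only to show how the measured average shifts once unhappy users stop producing data.

```python
# Hypothetical satisfaction scores (1-5) for the full user base at launch.
all_users = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5]

def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)

# Assumption: users scoring below 3 churn and no longer generate feedback.
stayed = [s for s in all_users if s >= 3]

print(f"mean satisfaction, everyone: {mean(all_users):.2f}")  # 3.08
print(f"mean satisfaction, stayed  : {mean(stayed):.2f}")     # 4.14
```

Any model trained or evaluated on the later data sees only the second number.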

Best Solution So Far
Diagram labels: Past Data; Idealized Future Distribution; Idealized Function; Feature Value => Rating

Best Solution So Far
• Requires all items to be categorized and quantized
• Requires accuracy and general agreement on these values
• (Socially defined versus absolute)
• At least all features are present in all sets
• Transforms recommendation into optimization and personalization
• Set of items with the highest score for a user
• Ability to predict poorly performing product or agent solutions
• Better able to incorporate additional data
• Prediction is usually linear time in the number of items
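
A feature-based scorer of the kind described above might look like the sketch below. The feature names, the item values, the per-user weights, and the linear form of `score` are all assumptions for illustration; the slides only state that a function maps feature values to a rating.

```python
import heapq

# Hypothetical quantized item features: (tempo, acousticness, vocals) in [0, 1].
items = {
    "song_a": (0.9, 0.1, 0.8),
    "song_b": (0.2, 0.9, 0.7),
    "song_c": (0.5, 0.5, 0.1),
    "song_d": (0.8, 0.2, 0.9),
}

# Hypothetical learned per-user weights over the same features.
user_weights = (1.0, -0.5, 0.5)

def score(features, weights):
    """Assumed linear scoring function: feature values => rating score."""
    return sum(f * w for f, w in zip(features, weights))

def top_k(items, weights, k):
    """Single pass over all items, keeping the k highest-scoring names."""
    return heapq.nlargest(k, items, key=lambda name: score(items[name], weights))

best = top_k(items, user_weights, 2)
print(best)  # ['song_a', 'song_d']
```

Because every item has the same features, an item nobody has seen yet can be scored the same way as a popular one, and the cost of a prediction grows with the number of items rather than with the number of user-item interactions.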

Evaluation Adjustments
• No Replacement for Real World A/B testing
• Machine Learning for evaluation, not just the question
• Hidden dependencies and ‘cheating’
Diagram labels: Learned Algorithm; Model Training; Evaluation Model; Model Training; Business Objective; Ground Truth


