SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Why the amazon sellers are buying the RTX 3080:
Dynamic pricing with RL
What is dynamic pricing?
Dynamic Pricing is a process of automated price adjustment for products or services in real-
time to maximise income and other economic performance indicators.
Why e-commerce?
By 2040, around 95% of all purchases globally are expected to be made online.
Dynamic Pricing for e-commerce: benefits
● Stay ahead of the competitors. Automation of competitors’ prices monitoring, allows
to quickly adapt to the dynamic environment.
● Increase in profits. By analysing the market, you can adjust the price of a product
to generate more revenues. If the demand for a product is low, you can boost it by lowering
the price, and if it is a peak season for a product, you can increase the price
without altering sales volume.
● Gain market insights. The continuous market examination allows retailers to stay aware of
prevailing market trends and get insights into consumers behaviour, which lead to better
decision-making.
Sales forecasting
Sales data is time-series data that
contains prices along with respective
sales and other features that could be useful in
driving sales (like inventory, adversarial spent,
competitors' prices, discounts, etc).
Greedy strategy for Dynamic Pricing
A typical solution is to discretise a continuous interval into a countable number of possible
prices and choose the price that leads to the highest income or another objective.
Reinforcement Learning basics: MDP
MDP - is a discrete-time stochastic control process. It consists of a state space S, action space
A, reward space R (subset of real numbers), and dynamics function p.
Reinforcement Learning basics: Agent-Environment interface
Agent and environment (MDP) interacts at each time steps t = 0, 1, 2, … . The agent performs an
action a, based on current state s. MDP receive a and respond with reward r and next state s’.
Together they produce the following sequence
The overall process can be
visualised with this picture:
Reinforcement Learning basics: Goals and Rewards
RL’s objective is to maximise the cumulative reward, which the agent receives in the long run
(reward hypothesis).
To formalise this idea, the notion of return should be defined. The return is some function of a
sequence of rewards. In the case of episodic tasks (those that terminate), it can be defined as:
In case of continuing tasks, we should also consider convergence properties of the infinite
sums. That’s way the following return is used:
Reinforcement Learning basics: Policy and Value functions
Policy - is a mapping from states to probabilities of selecting each possible action. Informally,
policy defines the rules of moving on MDP.
We can reformulate the reinforcement learning task as finding “the best” policy. The question is
what it means “the best”?
To answer this question the notions of state-value and action-value functions are defined:
Reinforcement Learning basics: Bellman optimality equation
We said that policy pi’ is better than policy pi if
It can be shown, that there exists a policy that is better than any other policy. Its state-values
and action-values should satisfy the following equations – Bellman optimality equations for v
and q
Reinforcement Learning for dynamic pricing
In terms of RL concepts, actions are all possible prices, and states are market conditions,
except for the current price of the product or service.
Usually, it is incredibly problematic to train an agent from an interaction with a real-world
market. There are two main reasons – time and exploration.
An alternative approach is to use a simulator of the environment.
Simulated sales data
In example we use simulated sales rather than real ones:
This allows us to analytically find a greddy policy and compare it with the RL agent’s
performance.
Experiments
The graph below shows the performance of the random and greedy agents.
Episodic task of 52 steps yields the following results:
Tabular Q-Learning
Q-learning is an off-policy temporal-difference control algorithm. Its main purpose is to
iteratively find the action values of an optimal policy (optimal action values).
The following updated formula is used:
This updated formula can also be treated as an iterative way of solving Bellman optimality
equations.
Also, before running this algorithm, we should discretise continuous variables.
Tabular Q-Learning: Results
As we can see, this approach outperforms a random agent, but cannot outperform a greedy
agent.
Deep Q-Network
Deep Q-Network (DQN) algorithm is based on the same idea as Tabular Q-learning. The main
difference is that DQN uses a parametrised function to approximate optimal action values.
The optimisation objective at iteration i looks as follows:
The gradient of the objective is as follows:
Deep Q-Network: Results
As we can see, DQN outperforms all other agents. Also, it was trained on a smaller number of
episodes.
Policy Gradients
Instead of learning optimal action values and moving greedy with respect to them, policy
gradients directly parametrise and optimise a policy.
The difficulty here is that an optimisation objective, depends on a dynamics function p, which is
unknown.
That is why policy gradients use the fact that the gradient of the objective is an expected
value of a random variable, which is approximated while acting in the environment.
Policy Gradients: Results
Policy gradients outperform a greedy agent but do not perform as well as DQN. Likewise, policy
gradients require far more episodes then DQN.
Conclusions
● Dynamic pricing can help retail players to stay
ahead of the market.
● RL-based dynamic pricing solves a problem
of greedy decision-making.
● Training agents on real market is time
consuming and can result in a loss of funds.
That's way a simulator of the environment is
used.

Weitere ähnliche Inhalte

Was ist angesagt?

Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenSmarten Augmented Analytics
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)pauldix
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
Markov decision process
Markov decision processMarkov decision process
Markov decision processchauhankapil
 
Combination portfolio
Combination portfolioCombination portfolio
Combination portfolioTaiLiLuo
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learningSKS
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 
Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313Slideshare
 
Economics1 216181522-111024050500-phpapp01
Economics1 216181522-111024050500-phpapp01Economics1 216181522-111024050500-phpapp01
Economics1 216181522-111024050500-phpapp01pecmba11
 
Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15
Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15
Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15MLconf
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 

Was ist angesagt? (16)

Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - Smarten
 
Marginal analysis
Marginal analysisMarginal analysis
Marginal analysis
 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 
Combination portfolio
Combination portfolioCombination portfolio
Combination portfolio
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learning
 
Generalized Reinforcement Learning
Generalized Reinforcement LearningGeneralized Reinforcement Learning
Generalized Reinforcement Learning
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
System Design & testing
System Design  & testing System Design  & testing
System Design & testing
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313
 
Economics1 216181522-111024050500-phpapp01
Economics1 216181522-111024050500-phpapp01Economics1 216181522-111024050500-phpapp01
Economics1 216181522-111024050500-phpapp01
 
Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15
Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15
Raji Balasuubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 

Ähnlich wie Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic pricing with RL

Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIMikko Mäkipää
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfVaishnavGhadge1
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysYasutoTamura1
 
Modern Recommendation for Advanced Practitioners part2
Modern Recommendation for Advanced Practitioners part2Modern Recommendation for Advanced Practitioners part2
Modern Recommendation for Advanced Practitioners part2Flavian Vasile
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learningazzeddine chenine
 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxMohibKhan79
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSVijaylakshmi
 
Reinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic TradingReinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic TradingSynaptonIncorporated
 
decision making in Lp
decision making in Lp decision making in Lp
decision making in Lp Dronak Sahu
 
Naive Reinforcement algorithm
Naive Reinforcement algorithmNaive Reinforcement algorithm
Naive Reinforcement algorithmSameerJolly2
 
rlpptgroup3-231018180804-0c05fb2f789piutt
rlpptgroup3-231018180804-0c05fb2f789piuttrlpptgroup3-231018180804-0c05fb2f789piutt
rlpptgroup3-231018180804-0c05fb2f789piutt201roopikha
 
Deep Reinforcement learning
Deep Reinforcement learningDeep Reinforcement learning
Deep Reinforcement learningCairo University
 
Approaches in Analyzing Alternatives.pptx
Approaches in Analyzing Alternatives.pptxApproaches in Analyzing Alternatives.pptx
Approaches in Analyzing Alternatives.pptxRara599668
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptxRithikRaj25
 
Reinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperReinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperDataScienceLab
 
Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)Bean Yen
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningElias Hasnat
 

Ähnlich wie Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic pricing with RL (20)

Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part II
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
 
Modern Recommendation for Advanced Practitioners part2
Modern Recommendation for Advanced Practitioners part2Modern Recommendation for Advanced Practitioners part2
Modern Recommendation for Advanced Practitioners part2
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 
Policy gradient
Policy gradientPolicy gradient
Policy gradient
 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptx
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic TradingReinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic Trading
 
decision making in Lp
decision making in Lp decision making in Lp
decision making in Lp
 
Naive Reinforcement algorithm
Naive Reinforcement algorithmNaive Reinforcement algorithm
Naive Reinforcement algorithm
 
rlpptgroup3-231018180804-0c05fb2f789piutt
rlpptgroup3-231018180804-0c05fb2f789piuttrlpptgroup3-231018180804-0c05fb2f789piutt
rlpptgroup3-231018180804-0c05fb2f789piutt
 
Deep Reinforcement learning
Deep Reinforcement learningDeep Reinforcement learning
Deep Reinforcement learning
 
Approaches in Analyzing Alternatives.pptx
Approaches in Analyzing Alternatives.pptxApproaches in Analyzing Alternatives.pptx
Approaches in Analyzing Alternatives.pptx
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
 
Reinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperReinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine Sweeper
 
Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
ai.pptx
ai.pptxai.pptx
ai.pptx
 

Mehr von Lviv Startup Club

Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...Lviv Startup Club
 
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Lviv Startup Club
 
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Lviv Startup Club
 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Lviv Startup Club
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Lviv Startup Club
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Lviv Startup Club
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Lviv Startup Club
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Lviv Startup Club
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Lviv Startup Club
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Lviv Startup Club
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Lviv Startup Club
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Lviv Startup Club
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Lviv Startup Club
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Lviv Startup Club
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Lviv Startup Club
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Lviv Startup Club
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Lviv Startup Club
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Lviv Startup Club
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Lviv Startup Club
 

Mehr von Lviv Startup Club (20)

Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
Artem Bykovets: 4 Вершники апокаліпсису робочих стосунків (+антидоти до них) ...
 
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
Dmytro Khudenko: Challenges of implementing task managers in the corporate an...
 
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
Sergii Melnichenko: Лідерство в Agile командах: ТОП-5 основних психологічних ...
 
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
Mariia Rashkevych: Підвищення ефективності розроблення та реалізації освітніх...
 
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
Mykhailo Hryhorash: What can be good in a "bad" project? (UA)
 
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
Oleksii Kyselov: Що заважає ПМу зростати? Розбір практичних кейсів (UA)
 
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
Yaroslav Osolikhin: «Неідеальний» проєктний менеджер: People Management під ч...
 
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
Mariya Yeremenko: Вплив Генеративного ШІ на сучасний світ та на особисту ефек...
 
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
 
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
Alexander Marchenko: Проблеми росту продуктової екосистеми (UA)
 
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
Oleksandr Grytsenko: Save your Job або прокачай скіли до Engineering Manageme...
 
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
Yuliia Pieskova: Фідбек: не лише "як", але й "коли" і "навіщо" (UA)
 
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)Nataliya Kryvonis: Essential soft skills to lead your team (UA)
Nataliya Kryvonis: Essential soft skills to lead your team (UA)
 
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
Volodymyr Salyha: Stakeholder Alchemy: Transforming Analysis into Meaningful ...
 
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
Anna Chalyuk: 7 інструментів та принципів, які допоможуть зробити вашу команд...
 
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
Oksana Smilka: Цінності, цілі та (де) мотивація (UA)
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
Andrii Skoromnyi: Чому не працює методика "5 Чому?" – і яка є альтернатива? (UA)
 
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
Maryna Sokyrko & Oleksandr Chugui: Building Product Passion: Developing AI ch...
 

Kürzlich hochgeladen

VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
DEPED Work From Home WORKWEEK-PLAN.docx
DEPED Work From Home  WORKWEEK-PLAN.docxDEPED Work From Home  WORKWEEK-PLAN.docx
DEPED Work From Home WORKWEEK-PLAN.docxRodelinaLaud
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.Aaiza Hassan
 
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service DewasVip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewasmakika9823
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Roland Driesen
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxAndy Lambert
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Neil Kimberley
 
Socio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptxSocio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptxtrishalcan8
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdfRenandantas16
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Delhi Call girls
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 

Kürzlich hochgeladen (20)

VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.
 
DEPED Work From Home WORKWEEK-PLAN.docx
DEPED Work From Home  WORKWEEK-PLAN.docxDEPED Work From Home  WORKWEEK-PLAN.docx
DEPED Work From Home WORKWEEK-PLAN.docx
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service DewasVip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
Vip Dewas Call Girls #9907093804 Contact Number Escorts Service Dewas
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023
 
Socio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptxSocio-economic-Impact-of-business-consumers-suppliers-and.pptx
Socio-economic-Impact-of-business-consumers-suppliers-and.pptx
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 

Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic pricing with RL

  • 1. Why the amazon sellers are buying the RTX 3080: Dynamic pricing with RL
  • 2. What is dynamic pricing? Dynamic Pricing is a process of automated price adjustment for products or services in real- time to maximise income and other economic performance indicators.
  • 3. Why e-commerce? By 2040, around 95% of all purchases globally are expected to be made online.
  • 4. Dynamic Pricing for e-commerce: benefits ● Stay ahead of the competitors. Automation of competitors’ prices monitoring, allows to quickly adapt to the dynamic environment. ● Increase in profits. By analysing the market, you can adjust the price of a product to generate more revenues. If the demand for a product is low, you can boost it by lowering the price, and if it is a peak season for a product, you can increase the price without altering sales volume. ● Gain market insights. The continuous market examination allows retailers to stay aware of prevailing market trends and get insights into consumers behaviour, which lead to better decision-making.
  • 5. Sales forecasting Sales data is time-series data that contains prices along with respective sales and other features that could be useful in driving sales (like inventory, adversarial spent, competitors' prices, discounts, etc).
  • 6. Greedy strategy for Dynamic Pricing A typical solution is to discretise a continuous interval into a countable number of possible prices and choose the price that leads to the highest income or another objective.
  • 7. Reinforcement Learning basics: MDP MDP - is a discrete-time stochastic control process. It consists of a state space S, action space A, reward space R (subset of real numbers), and dynamics function p.
  • 8. Reinforcement Learning basics: Agent-Environment interface Agent and environment (MDP) interacts at each time steps t = 0, 1, 2, … . The agent performs an action a, based on current state s. MDP receive a and respond with reward r and next state s’. Together they produce the following sequence The overall process can be visualised with this picture:
  • 9. Reinforcement Learning basics: Goals and Rewards RL’s objective is to maximise the cumulative reward, which the agent receives in the long run (reward hypothesis). To formalise this idea, the notion of return should be defined. The return is some function of a sequence of rewards. In the case of episodic tasks (those that terminate), it can be defined as: In case of continuing tasks, we should also consider convergence properties of the infinite sums. That’s way the following return is used:
  • 10. Reinforcement Learning basics: Policy and Value functions Policy - is a mapping from states to probabilities of selecting each possible action. Informally, policy defines the rules of moving on MDP. We can reformulate the reinforcement learning task as finding “the best” policy. The question is what it means “the best”? To answer this question the notions of state-value and action-value functions are defined:
  • 11. Reinforcement Learning basics: Bellman optimality equation We said that policy pi’ is better than policy pi if It can be shown, that there exists a policy that is better than any other policy. Its state-values and action-values should satisfy the following equations – Bellman optimality equations for v and q
  • 12. Reinforcement Learning for dynamic pricing In terms of RL concepts, actions are all possible prices, and states are market conditions, except for the current price of the product or service. Usually, it is incredibly problematic to train an agent from an interaction with a real-world market. There are two main reasons – time and exploration. An alternative approach is to use a simulator of the environment.
  • 13. Simulated sales data In example we use simulated sales rather than real ones: This allows us to analytically find a greddy policy and compare it with the RL agent’s performance.
  • 14. Experiments The graph below shows the performance of the random and greedy agents. Episodic task of 52 steps yields the following results:
  • 15. Tabular Q-Learning Q-learning is an off-policy temporal-difference control algorithm. Its main purpose is to iteratively find the action values of an optimal policy (optimal action values). The following updated formula is used: This updated formula can also be treated as an iterative way of solving Bellman optimality equations. Also, before running this algorithm, we should discretise continuous variables.
  • 16. Tabular Q-Learning: Results As we can see, this approach outperforms a random agent, but cannot outperform a greedy agent.
  • 17. Deep Q-Network Deep Q-Network (DQN) algorithm is based on the same idea as Tabular Q-learning. The main difference is that DQN uses a parametrised function to approximate optimal action values. The optimisation objective at iteration i looks as follows: The gradient of the objective is as follows:
  • 18. Deep Q-Network: Results As we can see, DQN outperforms all other agents. Also, it was trained on a smaller number of episodes.
  • 19. Policy Gradients Instead of learning optimal action values and moving greedy with respect to them, policy gradients directly parametrise and optimise a policy. The difficulty here is that an optimisation objective, depends on a dynamics function p, which is unknown. That is why policy gradients use the fact that the gradient of the objective is an expected value of a random variable, which is approximated while acting in the environment.
  • 20. Policy Gradients: Results Policy gradients outperform a greedy agent but do not perform as well as DQN. Likewise, policy gradients require far more episodes then DQN.
  • 21. Conclusions ● Dynamic pricing can help retail players to stay ahead of the market. ● RL-based dynamic pricing solves a problem of greedy decision-making. ● Training agents on real market is time consuming and can result in a loss of funds. That's way a simulator of the environment is used.