Colloquium: Sajad Modaresi, University of North Carolina at Chapel Hill
18 Mar @ 3:30 pm – 4:30 pm
Sajad Modaresi
Kenan-Flagler Business School
University of North Carolina at Chapel Hill
Exploration Optimization for Dynamic Assortment Personalization under Linear Preferences
Abstract
We study the dynamic assortment personalization problem of an online retailer that adaptively customizes assortments based on customers’ attributes to learn their preferences and maximize revenue. We assume a linear relationship between product utilities and customer attributes, which governs customers’ preferences for products. The coefficient matrix characterizing these linear relationships is unknown to the retailer, who as a result faces the classic exploration (learning preferences) vs. exploitation (earning revenue) trade-off. We show that there are price-driven and linearity-driven efficiencies that can be leveraged for exploration. More specifically, we show that not all products and customer profiles need to be explored in order to recover the optimal assortments and maximize revenue. We use these insights to prove an instance-dependent lower bound on the regret (i.e., the expected revenue loss relative to a clairvoyant retailer) of any admissible policy. We show that this regret lower bound depends on the optimal objective value of a linear program, which we call the Regret Lower Bound (RLB) problem. Even though the RLB problem is a linear program, solving it in practice can be challenging because it depends non-trivially on parameters unknown to the retailer. To address this issue, we propose a proxy for the RLB problem, which we call the exploration-optimization problem. We show that this proxy can be formulated as a Mixed Integer Linear Program (MILP) that can be tackled effectively with state-of-the-art solvers. Finally, we design efficient learning policies that identify an efficient exploration set by solving either the RLB problem or the exploration-optimization problem. To illustrate the practical value of the proposed policies, we consider a setting calibrated with a dataset from a large Chilean retailer. We show that, in addition to faster running times, our proposed policies outperform the Thompson sampling benchmark in terms of regret (revenue).
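For concreteness, the sketch below writes out the linear preference model and the regret notion described in the abstract. All notation (the attribute vectors x_i, the coefficient vectors θ_j, the horizon T, and the policy π) is ours, chosen purely for illustration, and is not taken from the talk materials.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Minimal sketch of the linear preference model from the abstract
% (illustrative notation only, not the speaker's).
% x_i     : attribute vector of customer profile i (observed by the retailer)
% theta_j : coefficient vector for product j; the rows of the matrix Theta,
%           which is unknown to the retailer and must be learned.
\[
  u_{ij} = \theta_j^{\top} x_i ,
  \qquad
  \Theta = \begin{pmatrix} \theta_1^{\top} \\ \vdots \\ \theta_n^{\top} \end{pmatrix}
  \ \text{unknown to the retailer.}
\]
% Regret of an assortment policy pi over a horizon of T customers:
% expected revenue lost relative to a clairvoyant retailer that knows
% Theta and offers each arriving customer the optimal assortment S^*(x_t).
\[
  \operatorname{Regret}_T(\pi)
  = \mathbb{E}\left[ \sum_{t=1}^{T}
      \bigl( r\bigl(S^{*}(x_t)\bigr) - r\bigl(S_t^{\pi}\bigr) \bigr) \right].
\]
\end{document}
```

In this notation, the instance-dependent lower bound mentioned in the abstract constrains how slowly Regret_T(π) can grow for any admissible policy π, and the RLB linear program characterizes that bound.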