Dynamic Assortment Optimization. From Learning to Earning
As information infrastructure improves and becomes better able to handle incoming real-time purchase data, the need for data-driven, automated assortment policies arises. In general, a seller of products has to consider multiple aspects, e.g., inventory management, demand management, budget constraints, pricing strategy, and assortment planning. Proper management of these aspects can have a large impact on the success of the seller's undertaking. In particular, sales data can be used effectively to maximize profit. This thesis considers assortment optimization, where an assortment is a subset of all products from which a customer chooses a product to purchase. The main question covered by this thesis is: how can a seller determine the optimal assortment of products, that is, the subset which yields the highest expected profit, based on sales data? In particular, we consider dynamic assortment optimization over a finite time horizon during which we can adjust the offered assortment. To focus on the aspect of learning customers' preferences, we consider a sequential decision framework. The sequential decisions within this finite time window are based on past purchase behavior and are described by a policy. In this thesis, we provide policies for a variety of settings. In addition, we analyze the performance of our policies by mathematically deriving bounds on the regret: the accumulated expected loss due to offering suboptimal assortments.