Center for Applied Mathematics Colloquium

Alex SlivkinsMicrosoft Research NYC
Adversarial Bandits with Knapsacks

Friday, November 22, 2019 - 3:30pm
Rhodes 655

"Bandits with Knapsacks" (BwK) is a general model for multi-armed bandits under supply/budget constraints, with motivating examples ranging from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. The objective now is to minimize the "competitive ratio": the ratio of the benchmark reward to algorithm's reward, as regret minimization is no longer feasible.

We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new algorithm for the stochastic version of the problem, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter. This approach also leads to several extensions, both for stochastic and adversarial versions.

Joint work with Nicole Immorlica, Karthik A. Sankararaman, and Robert Schapire.

Alex Slivkins is a Principal Researcher at Microsoft Research New York City. Previously he was a researcher at MSR Silicon Valley after receiving his Ph.D. from Cornell and a brief postdoc at Brown. His research interests are in algorithms and theoretical computer science, spanning machine learning theory, algorithmic economics, and networks. Alex is particularly interested in exploration-exploitation tradeoff and online machine learning, and their manifestations in mechanism design and human computation. His work has been recognized with the best paper award at ACM EC 2010, the best paper nomination at WWW 2015, and the best student paper award at ACM PODC 2005. He is finishing a book, Introduction to Multi-armed Bandits, to be published with Foundations and Trends in Machine Learning.