Timing as an Action: Learning When to Observe and Act

Abstract

In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure, and handling them intelligently yields statistical advantages. From a model-based perspective, these gains stem from the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
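To make the timing-as-an-action interface concrete, here is a minimal sketch (not the paper's code) of how a composite (action, delay) decision might wrap a standard one-step environment: the agent commits to an action and a delay, the action is repeated for that many steps, and only the final observation is revealed. The `env` interface and all names are illustrative assumptions.

```python
class TimingAsActionWrapper:
    """Illustrative wrapper: each decision is a (action, delay) pair,
    with no intermediate observations during the delay."""

    def __init__(self, env, max_delay):
        self.env = env            # assumed to expose reset() and step(action)
        self.max_delay = max_delay

    def reset(self):
        return self.env.reset()

    def step(self, action, delay):
        """Apply `action` for `delay` consecutive steps, observing only at the end."""
        assert 1 <= delay <= self.max_delay
        total_reward, done, obs = 0.0, False, None
        for _ in range(delay):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        # Only the final state is returned, mirroring e.g. a follow-up visit
        # at which a treatment plan is revisited.
        return obs, total_reward, done
```

Under this framing, the naive approach would treat every (action, delay) pair as a distinct atomic action with its own transition model, whereas exploiting the structure means only the one-step dynamics need to be estimated, which is the source of the statistical advantage described in the abstract.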

Publication
AISTATS 2024