Understanding Transformers as Algorithmic Executors: Theory and Applications
Y5-205 (YEUNG)
ABSTRACT
Transformers have demonstrated impressive capabilities on complex reasoning tasks. However, most existing studies of reasoning focus on natural language problems or formal domains such as coding; numerical reasoning tasks, such as scientific prediction and time-series forecasting, remain less studied. In this talk, we examine transformers in the more mathematically concrete setting of numerical reasoning, where models reason by effectively executing learned algorithms based on the input context. We will first present recent applications of transformers to scientific prediction, and then discuss theoretical learning guarantees for transformers in the in-context learning setting. Notably, our work identifies the following phenomenon: even when guided by a teacher-provided reasoning path, a model may instead learn a different, more efficient reasoning strategy.
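To make the in-context learning setting concrete, the sketch below illustrates one standard instantiation (an assumed example for this announcement, not the task or method from the talk): the prompt consists of (x, y) pairs drawn from an unseen noiseless linear task, and the "learned algorithm" a trained transformer might implicitly execute on that context is stood in for here by least-squares regression. All names (`w_true`, `x_query`, the dimensions) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# In-context prompt: k (x, y) pairs from an unseen linear task y = w_true . x.
# (Hypothetical setup; the talk's actual tasks may differ.)
d, k = 5, 20
w_true = rng.normal(size=d)
X = rng.normal(size=(k, d))
y = X @ w_true

# Query input whose label must be predicted from the context alone.
x_query = rng.normal(size=d)

# One candidate "learned algorithm" the model could execute internally:
# least-squares regression fit on the context, then applied to the query.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("in-context prediction:", x_query @ w_hat)
print("ground truth:         ", x_query @ w_true)
```

Least squares stands in here for whichever procedure the model actually implements; as the abstract notes, that internal algorithm need not coincide with a teacher-provided reasoning path.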