Analysis

Tim G. J. Rudner

AI/ML Fellow (non-resident)

Tim G. J. Rudner is an AI/ML Fellow at Georgetown’s Center for Security and Emerging Technology (CSET). He is currently completing his Ph.D. in Computer Science at the University of Oxford, where he conducts research on probabilistic machine learning, reinforcement learning, and AI safety. Previously, Tim worked at Amazon Research, the European Central Bank, and the European Space Agency’s Frontier Development Lab. He holds an M.Sc. in Statistics from the University of Oxford and a B.S. in Applied Mathematics and in Economics from Yale University. Tim is also a Fellow of the German Academic Scholarship Foundation and a Rhodes Scholar.

This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces interpretability as a means to enable assurance in modern machine learning systems.

This paper is the second installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces adversarial examples, a major challenge to robustness in modern machine learning systems.

This paper is the first installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. In it, the authors introduce three categories of AI safety issues: problems of robustness, assurance, and specification. Other papers in this series elaborate on these and further key concepts.