Senior Researcher,

Microsoft Research, New York

I am a machine learning researcher specializing in natural language understanding, interactive learning (e.g., reinforcement learning), and representation learning. My main research agenda is to develop generalizable agents that can interact with the world using actions and natural language, and that can solve a range of tasks using reward, natural language feedback, or other types of feedback.

**News:** Our LASER paper was accepted at
ICLR 2024, trended on GitHub (Python), and
was featured in a
Verge article!

My research agenda has the following main threads.

- *Interactive Learning (Learning Algorithm):* I am interested in developing both *practical* and *efficient* algorithms for training agents. In particular, my recent focus has been on developing algorithms for fine-tuning agents such as LLMs (arXiv 2023). I am interested in algorithms that are provably efficient, or that use insights from theory to solve real-world challenges. My representative work on this agenda includes a line of recent RL algorithms for problems with complex observations that are provably sample-efficient and computationally efficient: the Homer algorithm (ICML 2020), the RichID algorithm (NeurIPS 2020), the FactoRL algorithm (ICLR 2021), and the PPE algorithm (ICLR 2022).
- *Language Feedback (Learning Signal):* Natural language is an expressive medium for training and controlling agents that most humans can use. I am interested in developing agents that can understand and execute natural language instructions, and that can also be trained using such feedback. Representative work on this agenda includes the EMNLP 2017, EMNLP 2018, CoRL 2018, and CVPR 2019 papers on developing agents that follow natural language instructions, our recent Learning from Language Feedback (LLF) benchmark (arXiv 2023), and the ICML 2021 paper that trains these agents using *just* natural language.
- *Representation Learning (Model):* An agent needs to learn the right representation of the world to make decisions. For example, a multi-modal LLM may embed an image in a certain way in order to generate an action or a caption, and this choice of embedding/representation matters greatly. I am interested in developing the theory and practice of representation learning methods for training these embeddings, especially using self-supervised learning. Representative work includes our recent ICLR 2024 (Spotlight) paper on training representations from video data, and the AISTATS 2022 and ICML 2022 papers on understanding the behavior of contrastive learning. I am also interested in understanding representations; a representative work here is our recent ICLR 2024 paper on the LASER method for probing and improving LLM reasoning.
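As a toy illustration of the contrastive objectives studied in this line of work (a minimal InfoNCE-style loss in NumPy; the batch size, dimensions, and temperature below are arbitrary choices for the sketch, not settings from any of the papers above):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Minimal InfoNCE contrastive loss: each anchor should match its own
    positive against all other positives in the batch."""
    # L2-normalize embeddings so dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # diagonal entries are true pairs

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
# Positives as noisy views of the anchors vs. unrelated random vectors:
loss_aligned = info_nce_loss(z, z + 0.01 * rng.standard_normal((8, 16)))
loss_random = info_nce_loss(z, rng.standard_normal((8, 16)))
print(loss_aligned < loss_random)  # aligned views should score a lower loss
```

The objective simply pushes each anchor's similarity to its own positive above its similarity to every other example in the batch.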

Beyond my main agenda, I also have interest in a diverse range of topics including language and vision problems, semantic parsing, statistical learning theory, and computational social science.

**Bio:** I am a Senior Researcher at Microsoft Research, New York.
I received my PhD in computer science from Cornell University (2019) and my
bachelor's in computer science from the Indian Institute of Technology Kanpur (2013).

**Quick Links:**
MSR Reinforcement Learning,
Intrepid Code Base,
CIFF Code Base,
Math for AI,
My Blog,
RL Formulas

Policy Improvement using Language Feedback Models

[arXiv 2024]

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

[arXiv 2023] [Code] [Website]

Learning to Generate Better Than Your LLM

[arXiv 2023] [Preliminary Version accepted at NeurIPS 2023 Workshop]

Towards Principled Representation Learning from Videos for Reinforcement Learning

In Proceedings of the 12^{th} International Conference on Learning Representations (ICLR), 2024.

[ICLR 2024] [ICLR Spotlight] [Code]

The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

[This paper presents a surprising discovery: performing a low-rank approximation of *select* weight matrices of an LLM can boost the LLM's QA performance, at times by 20-30 percentage points.]
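As an illustrative sketch of the underlying operation (not the paper's code; the matrix size and rank below are arbitrary stand-ins), rank reduction of a single weight matrix can be done with a truncated SVD:

```python
import numpy as np

def low_rank_approximation(weight, rank):
    """Best rank-`rank` approximation of a weight matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Keep only the top `rank` singular directions.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))   # stand-in for one LLM weight matrix
w_reduced = low_rank_approximation(w, rank=8)

print(w_reduced.shape)  # (64, 64): same shape, but rank at most 8
```

The key point of the paper is that this reduction is applied *selectively*, to particular weight matrices at particular layers, rather than uniformly across the model.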

In Proceedings of the 12^{th} International Conference on Learning Representations (ICLR), 2024.

[arXiv 2023] [ICLR 2024] [Code] [Website]

Survival Instinct in Offline Reinforcement Learning

In Proceedings of the 37^{th} Conference on Neural Information Processing Systems (NeurIPS), 2023.

[arXiv 2023] [NeurIPS Spotlight] [Preliminary Version accepted at ICML Workshop]

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

In Proceedings of the 40^{th} International Conference on Machine Learning (ICML), 2023.

[ICML 2023 Version] [Preliminary version accepted at NeurIPS 2022 workshop]

Guaranteed Discovery of Controllable Latent States with Multi-Step Inverse Models

In Transactions on Machine Learning Research (TMLR), 2023.

[TMLR 2023 Version] [arXiv 2022] [Website]

Provable Safe Reinforcement Learning with Binary Feedback

In Proceedings of the 26^{th} International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.

[AISTATS 2023 Version] [arXiv 2022] [Code]

Provably Sample-Efficient RL with Side Information about Latent Dynamics

In Proceedings of the 36^{th} Conference on Neural Information Processing Systems (NeurIPS), 2022.

[NeurIPS 2022 version] [arXiv 2022]

Sample-Efficient RL in the Presence of Exogenous Information

In Proceedings of the 35^{th} Conference on Learning Theory (COLT), 2022.

[COLT Version] [arXiv 2022]

Understanding Contrastive Learning Requires Incorporating Inductive Biases

In Proceedings of the 39^{th} International Conference on Machine Learning (ICML), 2022.

[ICML Version] [arXiv 2022]

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

In Proceedings of the 10^{th} International Conference on Learning Representations (ICLR), 2022.

[ICLR 2022] [arXiv 2021] [Code] [Oral Presentation]

Investigating the Role of Negatives in Contrastive Representation Learning

In Proceedings of the 25^{th} International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.

[arXiv 2021] [Code coming soon]

Interactive Learning from Activity Description

In Proceedings of the 38^{th} International Conference on Machine Learning (ICML), 2021.

[Paper] [Version at EML workshop, ICLR 2021] [Code]

Provable Rich Observation Reinforcement Learning with Combinatorial Latent States

In Proceedings of the 9^{th} International Conference on Learning Representations (ICLR), 2021.

[Paper] [Code] [RL Theory Seminar]

Learning the Linear Quadratic Regulator from Nonlinear Observations

In Proceedings of the 34^{th} Conference on Neural Information Processing Systems (NeurIPS), 2020.

[arXiv Version] [NeurIPS Version] [Code]

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

In Proceedings of the 37^{th} International Conference on Machine Learning (ICML), 2020.

[arXiv Version] [ICML Version] [Code]

Early Fusion for Goal Directed Robotic Vision

In International Conference on Intelligent Robots and Systems (IROS), 2019.

[Paper] [RoboCup Best Paper Nomination]

Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[Paper] [Dataset and SDR Code] [Navigation Code]

Mapping Navigation Instructions to Continuous Control Actions with Position Visitation Prediction

In Proceedings of the Conference on Robot Learning (CoRL), 2018.

[Paper] [Code] [Demo Video]

Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.

[Paper] [Code, Data and Simulators]

Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.

[Paper] [Code] [arXiv Preprint]

Neural Shift-Reduce CCG Semantic Parsing

In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.

[Paper] [Supplementary] [Code]

Tell Me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions

In The International Journal of Robotics Research (IJRR), 2015.

[Paper]

(Note: the domain tellmedave DOT com no longer belongs to my coauthors or me, and the link tellmedave DOT cs DOT cornell DOT edu is no longer active.)

Environment-driven lexicon induction for high-level instructions

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2015.

[Paper]
[Supplementary]
[Code]
[Data]
[Simulator]
[Bibtex]

Tell Me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions

In Proceedings of the Robotics: Science and systems (RSS), 2015.

[Paper]

(Note: the domain tellmedave DOT com no longer belongs to my coauthors or me, and the link tellmedave DOT cs DOT cornell DOT edu is no longer active.)

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

[arXiv 2022] (Accepted at the 3rd Offline RL Workshop: Offline RL as a "Launchpad", NeurIPS 2022)

Have you tried Neural Topic Models? Comparative Analysis of Neural and
Non-Neural Topic Models with Application to COVID-19 Twitter Data

Data Science for Social Good (DSSG) workshop at Conference on Knowledge Discovery and Data Mining (KDD) 2021

[arXiv 2021] [Code]

Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

Deep Reinforcement Learning Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2018.

[Paper]

The Third Workshop on Representation Learning for NLP (Rep4NLP)

Workshop at the Annual Meeting of the Association for Computational Linguistics (ACL), 2018.

[Workshop Proceedings]

Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning

Workshop on Prediction and Generative Modeling in Reinforcement Learning (PGMRL) at the International Conference on Machine Learning (ICML), 2018.

[arXiv Preprint]

Combating the Compounding-Error Problem with a Multi-step Model

arXiv, 2019.

[Paper]

Robo Brain: Large-Scale Knowledge Engine for Robots

[Paper]

Academia and Compute-Intensive AI Research [Post]

PAC with Hoeffding-Bernstein [Post]

Growing Bifurcation of AI Scholarship [Post]

Are Synthetic Datasets in AI Useful? [Post]

Are we doing NLP the right way? [Post]

Writing and Proof Reading Research Code [Post]

Mathematical Analysis of Policy Gradient Methods [Post]

Tutorial on Markov Decision Process Theory and Reinforcement Learning. [Slides Part 1] [Slides Part 2] [Post]