I am a machine learning researcher specializing in natural language understanding, interactive learning (e.g., reinforcement learning), and representation learning. My main research agenda is to develop generalizable agents that can interact with the world using actions and natural language, and solve a range of tasks using reward, natural language feedback, or other types of feedback.
A recurring theme in my research is developing interactive learning algorithms or representation learning methods using feedback or data that naturally occurs in the real world, such as video data, user edits, and language feedback. My current areas of research focus are below.
(Algorithms) Learning Algorithms for LLMs and Foundation Models: I am interested in developing practical and efficient algorithms for training agents. In particular, my recent focus has been on developing algorithms for fine-tuning LLMs using better imitation learning algorithms (arXiv 2023) such as our DR-PO algorithm (arXiv 2024). My other past work includes developing RL methods that are provably sample-efficient and computationally efficient: the Homer algorithm (ICML 2020), the RichID algorithm (NeurIPS 2020), the FactoRL algorithm (ICLR 2021), the PPE algorithm (ICLR 2022 Oral), and AC State (TMLR 2023).
(Feedback) Developing Agents that Learn from Language Feedback: Once an agent foundation model has been deployed, it may need post-training or adaptation to a given setup. I have developed approaches that can fine-tune models using language feedback, which is easy and natural for non-expert humans to provide. E.g.,
Our recent paper (arXiv 2024) studies aligning LLMs using user edit feedback that is naturally generated in writing assistant applications.
It is more expensive to label a trajectory for a given language instruction than to label an instruction for a given trajectory. Our ICML 2024 and ICML 2021 papers train agents using these hindsight language instructions.
Humans often use free-form language feedback to guide each other. Our recent LLF-Bench paper (arXiv 2023) introduces a benchmark for evaluating learning in LLM agents using language feedback.
(Models) Foundation Models for Decision Making: I am focusing on developing foundation models that can take actions for different agents, in different domains, and for different tasks, e.g., an agent that can interact with an OS or a bot that can play a game against a human. I am particularly interested in using naturally available data such as videos for building these models. My relevant recent work is our ICLR 2024 (Spotlight) paper on learning the right representations from videos. Other relevant work is my series of papers published at EMNLP 2017, EMNLP 2018, CoRL 2018, and CVPR 2019 on developing instruction-following agents that can solve a variety of tasks specified in natural language in embodied settings.
Beyond my main agenda, I am also interested in a diverse range of topics including language and vision problems, statistical learning theory, and computational social science.
Bio: I am a Staff Research Scientist on the Mosaic Research team at Databricks. I received my PhD in computer science from Cornell University (2019) and my bachelor's in computer science from the Indian Institute of Technology Kanpur (2013). Previously, I was a Senior Researcher at Microsoft Research (2019 - Aug 2024). I most closely associate with the ML (ICLR, ICML, NeurIPS) and NLP (ACL, EMNLP, CoNLL) research communities. I am Senior Area Chair for ACL 2025 and was Area Chair for ICLR 2025 and CoNLL (2021, 2022).
Quick Links: Databricks Mosaic Research, Intrepid Code Base, CIFF Code Base, Math for AI, My Blog, RL Formulas
Dataset Reset Policy Optimization for RLHF
[arXiv 2024] [Code]
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
[arXiv 2023] [Code] [Website]
Learning to Generate Better Than Your LLM
[arXiv 2023] [Preliminary Version accepted at NeurIPS 2023 Workshop]
Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning
In Conference on Neural Information Processing Systems (NeurIPS), 2024.
[arXiv 2024] [Code To Come] [NeurIPS Spotlight]
Aligning LLM Agents by Learning Latent Preference from User Edits
In Conference on Neural Information Processing Systems (NeurIPS), 2024.
[arXiv 2024] [Code]
Policy Improvement using Language Feedback Models
In Conference on Neural Information Processing Systems (NeurIPS), 2024.
[arXiv 2024]
Provable Interactive Learning with Hindsight Instruction Feedback
In Proceedings of the International Conference on Machine Learning (ICML), 2024.
[arXiv Version] [Code]
Towards Principled Representation Learning from Videos for Reinforcement Learning
In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.
[ICLR 2024] [ICLR Spotlight] [Code]
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
[This paper presents a surprising discovery: low-rank approximation of selected weight matrices of an LLM can boost the LLM's QA performance, at times by 20-30 percentage points.]
In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.
[arXiv 2023] [ICLR 2024] [Code] [Website]
Survival Instinct in Offline Reinforcement Learning
In Conference on Neural Information Processing Systems (NeurIPS), 2023.
[arXiv 2023] [NeurIPS Spotlight] [Preliminary Version accepted at ICML Workshop]
Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information
In Proceedings of the International Conference on Machine Learning (ICML), 2023.
[ICML 2023 Version] [Preliminary version accepted at NeurIPS 2022 workshop]
Guaranteed Discovery of Controllable Latent States with Multi-Step Inverse Models
In Transactions on Machine Learning Research (TMLR), 2023.
[TMLR 2023 Version] [arXiv 2022] [Website]
Provable Safe Reinforcement Learning with Binary Feedback
In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
[AISTATS 2023 Version] [arXiv 2022] [Code]
Provably Sample-Efficient RL with Side Information about Latent Dynamics
In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
[NeurIPS 2022 version] [arXiv 2022]
Sample-Efficient RL in the Presence of Exogenous Information
In Proceedings of the 35th Conference on Learning Theory (COLT), 2022.
[COLT Version] [arXiv 2022]
Understanding Contrastive Learning Requires Incorporating Inductive Biases
In Proceedings of the 39th International Conference on Machine Learning (ICML), 2022.
[ICML Version] [arXiv 2022]
Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
In Proceedings of the 10th International Conference on Learning Representations (ICLR), 2022.
[ICLR 2022] [arXiv 2021] [Code] [Oral Presentation]
Investigating the Role of Negatives in Contrastive Representation Learning
In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
[arXiv 2021] [Code to come soon]
Interactive Learning from Activity Description
In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021.
[Paper] [Version at EML workshop, ICLR 2021] [Code]
Provable Rich Observation Reinforcement Learning with Combinatorial Latent States
In Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021.
[Paper] [Code] [RL Theory Seminar]
Learning the Linear Quadratic Regulator from Nonlinear Observations
In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.
[arXiv Version] [NeurIPS Version] [Code]
Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
[arXiv Version] [ICML Version] [Code]
Early Fusion for Goal Directed Robotic Vision
In International Conference on Intelligent Robots and Systems (IROS), 2019.
[Paper] [Robocup Best paper nomination]
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[Paper] [Dataset and SDR Code] [Navigation Code]
Mapping Navigation Instructions to Continuous Control Actions with Position Visitation Prediction
In Proceedings of the Conference on Robot Learning (CoRL), 2018.
[Paper] [Code] [Demo Video]
Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
[Paper] [Code, Data and Simulators]
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
[Paper] [Code] [Arxiv Preprint]
Neural Shift-Reduce CCG Semantic Parsing
In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
[Paper] [Supplementary] [Code]
Tell Me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions
In The International Journal of Robotics Research (IJRR), 2015.
[Paper] [Special Issue]
(Note the domain tellmedave DOT com no longer belongs to my coauthors or me.
Also, the link tellmedave DOT cs DOT cornell DOT edu is no longer active)
Environment-driven lexicon induction for high-level instructions
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2015.
[Paper]
[Supplementary]
[Code]
[Data]
[Simulator]
[Bibtex]
Tell Me Dave: Context-Sensitive Grounding of Natural Language to Manipulation Instructions
In Proceedings of Robotics: Science and Systems (RSS), 2015.
[Paper]
(Note the domain tellmedave DOT com no longer belongs to my coauthors or me.
Also, the link tellmedave DOT cs DOT cornell DOT edu is no longer active)
Towards Data-Driven Offline Simulations for Online Reinforcement Learning
[arXiv 2022] (Accepted at the NeurIPS 2022 workshop "3rd Offline RL Workshop: Offline RL as a 'Launchpad'")
Have you tried Neural Topic Models? Comparative Analysis of Neural and Non-Neural Topic Models with Application to COVID-19 Twitter Data
Data Science for Social Good (DSSG) workshop at Conference on Knowledge Discovery and Data Mining (KDD) 2021
[arXiv 2021] [Code]
Towards a Simple Approach to Multi-step Model-based Reinforcement Learning
Deep Reinforcement Learning Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2018.
[Paper]
The Third Workshop on Representation Learning for NLP (Rep4NLP)
Workshop at the Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[Workshop Proceedings]
Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning
Workshop on Prediction and Generative Modeling in Reinforcement Learning (PGMRL) at the International Conference on Machine Learning (ICML), 2018.
[ArXiv Preprint]
Combating the Compounding-Error Problem with a Multi-step Model
arXiv, 2019.
[Paper]
Robo Brain: Large-Scale Knowledge Engine for Robots
[Paper]
Selected articles in popular media about my research.
Microsoft LASERs away LLM inaccuracies
by Emilia David, in The Verge, Jan 31, 2024.
Two-step training helps robots interpret human language
by Melanie Lefkowitz, in Cornell Chronicle, November 12, 2018.
Tell Me Dave Lets You Train A Robot To Respond To Complex Commands
by John Biggs, in TechCrunch, June 23, 2014.
New robot learns from plain speech, not computer code
by Julia Rosen, in Los Angeles Times, June 26, 2014.
Teaching old robots new tricks: Machines swap knowledge about how to complete a task despite being hundreds of miles apart
by Victoria Woollaston, in DailyMail, Oct 30, 2015.
Robots Can Now Teach Each Other New Tricks
by Will Knight, in MIT Technology Review, Oct 27, 2015.
This robot can make you ice cream
Video from Jason Aldag, in The Washington Post, June 24, 2014.
Robots Are Smart – But Can They Understand Us?
by Randy Rieland, in Smithsonian Magazine, July 8, 2014.
Robot Responds to Natural Language Instructions, Brings You Fancy Ice Cream
by Evan Ackerman, in IEEE Spectrum, June 26, 2014.
Robots learn from (even bad) human language
by Bill Steele, in Cornell Chronicle, June 24, 2014.
Academia and Compute-Intensive AI Research [Post]
PAC with Hoeffding-Bernstein [Post]
Growing Bifurcation of AI Scholarship [Post]
Are Synthetic Datasets in AI Useful? [Post]
Are we doing NLP the right way? [Post]
Writing and Proof Reading Research Code [Post]
Mathematical Analysis of Policy Gradient Methods [Post]
Tutorial on Markov Decision Process Theory and Reinforcement Learning. [Slides Part 1] [Slides Part 2] [Post]