I am a Senior Researcher at Microsoft Research, New York.
I received my PhD in computer science from Cornell University (2019) and my
bachelor's degree in computer science from the Indian Institute of Technology Kanpur (2013).
My main interest is in developing efficient machine learning algorithms with
applications to real-world problems. Here, "efficient" covers provable,
sample-efficient and computationally efficient, interpretable, scalable, and ethical.
My empirical focus is on problems in natural language understanding and allied fields.
I am currently active in reinforcement learning, interactive learning,
representation learning, and language and vision problems. I am also interested
in computational social science, and data and society.
News: We have a new paper on interactive learning at ICML 2021 that uses only language feedback (i.e., no rewards, actions, etc.).
Our algorithm uses language descriptions of trajectories and solves a sequence of supervised learning problems. We evaluate on
grounded language understanding tasks and provide convergence guarantees.
News: Qinghua Liu and I recently gave a talk at the RL theory seminar. You can find it
Provable RL: We have three new provable reinforcement learning algorithms for rich-observation problems.
These algorithms are computationally efficient, and their sample complexity is either independent of the size of the observation space or
depends on it only weakly.
FactoRL at ICLR 2021, which provably solves a subset of rich-observation problems with an exponentially large latent state space.
FactoRL learns the latent factorized model and a state decoding function.
[Paper] [Code coming soon]
RichID at NeurIPS 2020, which provably solves continuous control problems
with latent LQR dynamics and rich observations. RichID learns the latent LQR model and a near-optimal policy.
Homer at ICML 2020, which solves rich-observation problems with a discrete latent state space. Homer provably explores,
recovers the latent dynamics, and optimizes any given reward function.
We are hiring!
- For post-doc and full-time positions in reinforcement learning