I am passionate about building general intelligence. I am specifically interested in Interactive Intelligence, Sequential Decision Making, Reasoning, Planning, Memory, and Alignment of Foundation Models for sequential tasks.
In this work, we study and demonstrate the importance of various design decisions for Recurrent PPO in partially observable
domains with long episodes and in continuing tasks. We also show that simple
strategies like updating hidden states (before collecting new experience) and recomputing hidden states (before each epoch in minibatch gradient descent) can
prevent staleness in updates and significantly improve performance. Finally, we
provide practical insights and recommendations for implementing Recurrent PPO.
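The hidden-state recomputation idea can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the tanh RNN cell, the function names (`rnn_step`, `recompute_hidden_states`), the buffer layout, and the zero reset at episode boundaries are all assumptions made for illustration. The point it shows is that hidden states stored during collection go stale once the parameters change, and that replaying the stored observations through the current parameters refreshes them.

```python
import numpy as np

def rnn_step(params, h, obs):
    """One step of a plain tanh RNN cell (hypothetical stand-in for the policy's recurrent core)."""
    W_h, W_x, b = params
    return np.tanh(h @ W_h + obs @ W_x + b)

def recompute_hidden_states(params, obs_seq, done_seq, h0):
    """Replay stored observations through the *current* parameters.

    Returns the hidden state fed in at each time step, plus the state left
    after the last step. The state is reset to zeros after any step where
    the episode ended (an assumed convention).
    """
    h = h0
    hiddens = []
    for obs, done in zip(obs_seq, done_seq):
        hiddens.append(h)
        h = rnn_step(params, h, obs)
        if done:
            h = np.zeros_like(h0)
    return np.stack(hiddens), h

rng = np.random.default_rng(0)
T, obs_dim, hid_dim = 6, 3, 4
params_old = (
    rng.normal(size=(hid_dim, hid_dim)),
    rng.normal(size=(obs_dim, hid_dim)),
    np.zeros(hid_dim),
)
obs_seq = rng.normal(size=(T, obs_dim))
done_seq = np.array([False, False, True, False, False, False])
h0 = np.zeros(hid_dim)

# Hidden states as they were stored while collecting experience.
stale_hiddens, _ = recompute_hidden_states(params_old, obs_seq, done_seq, h0)

# Pretend one epoch of gradient updates changed the input weights.
params_new = (params_old[0], params_old[1] + 0.5, params_old[2])

# Before the next epoch, refresh the hidden states under the new parameters.
fresh_hiddens, last_hidden = recompute_hidden_states(params_new, obs_seq, done_seq, h0)
```

In a continuing task, `last_hidden` could also serve as the starting state for the next round of experience collection, which is one reading of "updating hidden states" above.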
We present IlliniMet, a system to automatically detect metaphorical words. Our
model combines the strengths of the contextualized representations from the widely used
RoBERTa model and the rich linguistic information from external resources such as
WordNet.
Planning with Model-Free and Model-Based Reinforcement Learning
Kshitij Gupta,
Jeffrey Lai,
Keshav
CS 498, IR
Model-based RL has the strong advantage of being sample-efficient. Once the model and the
cost function are known, we can plan the optimal controls without further sampling from the environment. We explore several model-based
planning methods and evaluate them on a range of Gym environments.
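As a rough illustration of planning with a known model and cost function, here is a minimal random-shooting planner in NumPy. It is only a sketch of one simple planning method, not necessarily one used in the project: the 1-D point-mass `dynamics`, the quadratic `cost`, and all function names and hyperparameters are hypothetical. Candidate action sequences are scored entirely inside the model, so no environment interaction is needed to choose controls.

```python
import numpy as np

def dynamics(state, action, dt=0.1):
    """Assumed known model: a 1-D point mass with state (position, velocity)."""
    pos, vel = state
    vel = vel + action * dt
    pos = pos + vel * dt
    return np.array([pos, vel])

def cost(state, action):
    """Assumed known cost: stay near the origin, move slowly, use little force."""
    pos, vel = state
    return pos**2 + 0.1 * vel**2 + 0.01 * action**2

def rollout_cost(state, actions):
    """Total cost of applying an action sequence, simulated with the model only."""
    total = 0.0
    for a in actions:
        total += cost(state, a)
        state = dynamics(state, a)
    return total

def plan_random_shooting(state, horizon=10, n_candidates=256, rng=None):
    """Sample action sequences uniformly in [-1, 1] and return the cheapest one."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    candidates[0] = 0.0  # always include "do nothing" as a baseline candidate
    costs = np.array([rollout_cost(state, seq) for seq in candidates])
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

rng = np.random.default_rng(0)
start = np.array([1.0, 0.0])
best_actions, best_cost = plan_random_shooting(start, rng=rng)
baseline_cost = rollout_cost(start, np.zeros(10))
```

In a receding-horizon loop, only `best_actions[0]` would be executed before replanning from the next state.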
This paper explores the use of
reversible endothermic gaseous chemical reactions to create a truly portable, reusable,
human-powered instant cooler.