HW 2: Bottom-Up Learning
Introduction
The goal of this project is for you to gain experience with:
- Loading, visualizing, modifying, and writing demonstration and reward data
- Modelling task structure, goals, and/or dynamics from demonstrations
- Understanding the effect of model parameters on robot learning
- Evaluating the generalizability of learned models
You’ll submit the following deliverables via Canvas:
- Your Google Colab file containing your algorithm and evaluation code, and annotated with explanations of how it works
- A series of videos showing how the robot’s learning is affected by (i) number of training datapoints and (ii) variance in training/testing data.
- A report containing both (i) your evaluation results and (ii) your answers to the reflection questions
Collaboration Policy
You are welcome (and encouraged!) to collaborate with others. However, you must submit your own code and fully understand how it works. In your report, you must state your collaborators and acknowledge how they assisted.
Code Re-use Policy
You are welcome to directly use existing packages or libraries as “helper code” within your project. You are also welcome to reference papers and pseudocode, and adapt online implementation examples of the algorithms you are using. However, you must write your own algorithm code, fully understand how it works, and acknowledge any resources you have referenced or adapted.
As a result of this policy, it is very important that you take every opportunity to demonstrate your understanding of whatever algorithm you implement. This should come across clearly in your code comments/annotations and in your report.
Part 1: Getting started with Robosuite
For this assignment, we’ll be using three key libraries:
- Robosuite provides a MuJoCo-based framework for simulating and benchmarking learning algorithms on robot arms. It doesn’t rely on ROS :)
- Robomimic is built on top of Robosuite and provides standard datasets and implementations of learning baselines.
- MimicGen extrapolates a small number of human demonstrations to generate lots of additional “demonstrations” on variations of the task environment. We’ll be using these generated demonstrations to train bottom-up task models.
- Download robomimic_get_started.ipynb and go through the tutorial.
- This tutorial has been tested in Google Colab.
- If running locally on an M1 Mac, you may run into issues with importing MuJoCo. Try setting up your conda environment like this:
```shell
CONDA_SUBDIR=osx-arm64 conda create -n suite python=3.11 numpy -c conda-forge
conda activate suite
pip install cmake
```
- Now let’s take a look at some demonstration data. Download mimicgen_datsets.ipynb and go through the tutorial.
- For this assignment, we’ll be using data from the MimicGen dataset.
- Each dataset is defined by a distribution. In D1 distributions, there is more variance in object locations than in D0. Scroll down on the MimicGen site and take a look at the section titled “Task Reset Distributions” for examples.
- Each dataset contains 1000 demonstrations. You are welcome to use all of these demonstrations if you’d like, but for simplicity, we’ve pre-segmented the dataset into subsets containing 100 demonstrations each. If your algorithm only needs 10 demonstrations, for example, then there’s no point in downloading all 1000 of them.
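Before building your algorithm, it helps to know what the raw demonstration data looks like. The sketch below writes a tiny toy file in the HDF5 layout I understand robomimic/MimicGen to use (`data/demo_<i>/actions`, `data/demo_<i>/obs/<key>`) and reads it back; verify the actual keys against the tutorial notebook before relying on them.

```python
# Toy example of the robomimic-style HDF5 demonstration layout (assumed:
# data/demo_<i>/actions and data/demo_<i>/obs/<key>). We write a small fake
# file first so the reading code is self-contained.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "toy_demos.hdf5")
with h5py.File(path, "w") as f:
    grp = f.create_group("data")
    for i in range(3):
        demo = grp.create_group(f"demo_{i}")
        demo.create_dataset("actions", data=np.zeros((50, 7)))            # (T, action_dim)
        demo.create_dataset("obs/robot0_eef_pos", data=np.zeros((50, 3)))  # end-effector xyz

with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())           # ['demo_0', 'demo_1', 'demo_2']
    actions = f["data/demo_0/actions"][()]     # load the full action array
    eef_pos = f["data/demo_0/obs/robot0_eef_pos"][()]
    print(len(demos), actions.shape, eef_pos.shape)
```

The same reading pattern applies to the real MimicGen files; only the observation keys and demo counts differ.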
Part 2: Implementing Algorithms
Your task is to teach a robot four skills: stack, coffee, mug_cleanup, and kitchen.
For each of these skills, you’ll train and test in two data distributions: a low-variance distribution (D0) and a high-variance distribution (D1). For examples of these distributions on the coffee and mug_cleanup tasks, see the “Task Reset Distributions” section of the MimicGen site.
- Choose one of the following papers to implement for this assignment. Which algorithm do you think will be the most accurate? Data-efficient? Generalizable?
- Akgun & Thomaz. (2016). “Simultaneously learning actions and goals from demonstration.” Autonomous Robots 40.2: 211-227.
- Niekum et al. (2015). “Learning grounded finite-state representations from unstructured demonstrations.” IJRR 34.2: 131-157.
- Konidaris et al. (2012). “Robot learning from demonstration by constructing skill trees.” IJRR 31.3: 360-375
- Chernova & Veloso. “Confidence-based policy learning from demonstration using Gaussian mixture models.” AAMAS 2007.
- Maeda et al. “Active incremental learning of robot movement primitives.” CoRL 2017.
- Schaal. (2006). “Dynamic movement primitives: a framework for motor control in humans and humanoid robotics.” Adaptive Motion of Animals and Machines. 261-280.
- Pastor et al. “Learning and generalization of motor skills by learning from demonstration.” ICRA 2009.
- Li, Song, & Ermon. “InfoGAIL: Interpretable imitation learning from visual demonstrations.” NeurIPS 2017.
- Note: for DMPs, you may find this code useful as a starting point.
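If you go the DMP route, the core computation is small enough to sketch end-to-end. Below is a minimal 1-D discrete DMP in the style of Schaal (2006): fit Gaussian-basis forcing-term weights to one demonstration with locally weighted regression, then roll the spring-damper system out toward the goal. This is an illustrative sketch, not a tuned implementation: tau is fixed to 1, integration is plain Euler, and the gains (alpha=25, beta=alpha/4) are conventional defaults.

```python
import numpy as np

def dmp_fit_rollout(y_demo, dt, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Fit a 1-D discrete DMP to one demonstration and roll it out (tau = 1)."""
    y_demo = np.asarray(y_demo, dtype=float)
    T = len(y_demo)
    y0, g = y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    # Canonical system phase x(t) = exp(-alpha_x * t), decaying 1 -> ~0
    x = np.exp(-alpha_x * dt * np.arange(T))
    # Forcing term the demo implies: f = ydd - alpha*(beta*(g - y) - yd)
    f_target = ydd - alpha * (beta * (g - y_demo) - yd)
    # Gaussian basis functions spaced evenly in time (i.e., log-spaced in x)
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    h = n_basis / c**2                       # narrower bases as x shrinks
    psi = np.exp(-h * (x[:, None] - c[None, :]) ** 2)      # (T, n_basis)
    # Locally weighted regression for each basis weight
    s = x * (g - y0)
    w = np.array([(s * psi[:, i] * f_target).sum()
                  / ((s**2 * psi[:, i]).sum() + 1e-10) for i in range(n_basis)])
    # Roll out from y0 toward g
    y, v, xs = y0, 0.0, 1.0
    traj = np.empty(T)
    for k in range(T):
        psi_t = np.exp(-h * (xs - c) ** 2)
        f = xs * (g - y0) * (psi_t @ w) / (psi_t.sum() + 1e-10)
        v += (alpha * (beta * (g - y) - v) + f) * dt
        y += v * dt
        xs += -alpha_x * xs * dt
        traj[k] = y
    return traj

# Usage: fit to a smooth 0 -> 1 reach and reproduce it
t = np.linspace(0.0, 1.0, 100)
demo = 0.5 * (1.0 - np.cos(np.pi * t))
rollout = dmp_fit_rollout(demo, dt=t[1] - t[0])
print(rollout[-1])   # should land near the goal g = 1.0
```

Note how `n_basis`, `alpha`, and `alpha_x` are exactly the kind of parameters Part 2 asks you to analyze: changing the goal `g` or the basis count changes generalization behavior in ways you can hypothesize about and then test.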
- Whichever algorithm you choose, you’ll need to justify it: why do you believe this algorithm is a good choice for our learning problem?
- Identify at least 3 parameters that influence your chosen algorithm’s performance. As part of your evaluation, you’ll need to assess the impact of these parameters and also tune them to optimize performance.
- This is an important opportunity for you to demonstrate how well you understand the algorithm. Don’t just report on the effect of the parameters (i.e., “Changing X caused Y”). Instead, form hypotheses for why these parameters are important, how they influence learning, and how you should select their values.
- Start writing your code! To make full use of Robosuite/Robomimic’s built-in framework, follow these instructions for implementing a custom algorithm. In doing so, you’ll extend the Algo class and define these key functions.
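To orient yourself before reading those instructions, here is a bare standalone skeleton of the shape such a class typically takes. The method names (`_create_networks`, `process_batch_for_training`, `train_on_batch`, `get_action`) follow my reading of robomimic’s Algo interface — verify signatures against the linked docs. Nothing here imports robomimic; the bodies are stand-ins so the skeleton runs on its own.

```python
# Standalone skeleton of a custom-algorithm class in roughly the shape
# robomimic expects (method names assumed from its Algo interface).

class MyDemoAlgo:
    def __init__(self, algo_config):
        self.algo_config = algo_config      # e.g. {"n_basis": 20, "alpha": 25.0}
        self._create_networks()

    def _create_networks(self):
        # Build your model here (network, DMP weights, GMM components, ...).
        self.model = {"weights": [0.0] * self.algo_config.get("n_basis", 20)}

    def process_batch_for_training(self, batch):
        # Extract and normalize the tensors your training update needs.
        return {"obs": batch["obs"], "actions": batch["actions"]}

    def train_on_batch(self, batch, epoch, validate=False):
        # One training update; return a dict of values to log.
        loss = sum(abs(a) for a in batch["actions"]) / len(batch["actions"])
        return {"loss": loss}

    def get_action(self, obs_dict):
        # Query the trained model for an action at the current observation.
        return [0.0] * 7    # placeholder 7-DoF action

algo = MyDemoAlgo({"n_basis": 10})
info = algo.train_on_batch({"obs": [0.1, 0.2], "actions": [0.5, -0.5]}, epoch=0)
print(info)   # {'loss': 0.5}
```

Keeping your algorithm’s tunable parameters in `algo_config` (as robomimic’s config system encourages) will make the parameter sweeps in Part 3 much easier.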
Part 3: Evaluation
Now we’ll compare these algorithms based on their sample efficiency and the trained model’s performance.
- Download evaluation_pipeline.ipynb as a starting point for your evaluation pipeline. Modify the script to call your algorithm and demonstrate the effect of your 3 selected parameters.
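One common way to structure that modification is a small grid sweep that averages over seeds. In the sketch below, the parameter names (`n_basis`, `alpha`) are hypothetical placeholders for your own three parameters, and `evaluate_once` is a stub returning seeded noise — swap in your real train-and-rollout evaluation there.

```python
# Hedged sketch of a parameter sweep: grid over settings, average over seeds.
import itertools
import random
import statistics

def evaluate_once(setting, seed):
    # Placeholder: train your model with `setting`, run test rollouts, and
    # return the success rate. This stand-in just returns seeded noise.
    return random.Random(seed).random()

param_grid = {                  # hypothetical parameter names -- use your own
    "n_basis": [10, 20, 50],
    "alpha": [10.0, 25.0],
}

def sweep(param_grid, n_seeds=3):
    """Return {setting-values: (mean success rate, stdev across seeds)}."""
    results = {}
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        setting = dict(zip(keys, values))
        scores = [evaluate_once(setting, seed) for seed in range(n_seeds)]
        results[values] = (statistics.mean(scores), statistics.stdev(scores))
    return results

results = sweep(param_grid)
print(len(results), "settings evaluated")
```

Reporting the across-seed spread (not just the mean) is what lets you argue in the report whether a parameter’s effect is real or noise.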
- Record some videos showing examples of the trained model’s output, including:
- Typical performance on the D0 and D1 training data
- Examples of good and poor performance on the D0 and D1 testing data
- Examples of how the 3 parameters influence the robot’s behavior
- Produce graphs showing how the algorithm’s performance changes based on (i) the number of training samples, (ii) the 3 parameters, and (iii) the dataset being used (D0 vs D1).
- Show off your graphs! Post them to the leaderboard topic on our Ed Discussion.
- You are welcome to post anonymously if you prefer :)
Part 4: Report
Write up a report that answers the following questions. Remember: whenever possible, demonstrate your understanding of your chosen algorithm.
- If applicable: who were your collaborators? Describe everyone’s role within the collaboration.
- What algorithm did you implement? Explain why you believed it would be a good fit for this assignment.
- What 3 parameters did you assess? What do they do, and what were your hypotheses for how they would affect the algorithm’s performance?
- What were your hypotheses for how these algorithms would perform according to (i) which dataset (D0 vs D1) was used and (ii) the amount of data used to train it?
- How did you modify the algorithm for this learning problem?
- Present the result graphs and describe them. What trends do you see? What is it about the algorithm that causes it to perform well or poorly?
- How did these results compare to your hypotheses? Did anything surprise you? How would you modify the algorithm to improve its performance?
What to submit
On Canvas, upload your:
- Annotated code (preferably as a .ipynb)
- Report
- Videos