HW 3: Active Learning
Introduction
The goal of this project is for you to gain experience with:
- Implementing Active Learning techniques
- Comparing learning performance in active vs passive settings
- Evaluating the human factors implications of Active Learning techniques
You’ll submit the following deliverables via Canvas:
- Your Google Colab file containing your algorithm and evaluation code, and annotated with explanations of how it works
- A series of videos showing examples of the robot’s active learning queries, and how they differ based on (i) pre-training and (ii) variance in training/testing data.
- A report containing both (i) your evaluation results and (ii) your answers to the reflection questions
Collaboration Policy
You are welcome (and encouraged!) to collaborate with others. However, you must submit your own code and fully understand how it works. In your report, you must state your collaborators and acknowledge how they assisted.
Code Re-use Policy
You are welcome to directly use existing packages or libraries as “helper code” within your project. You are also welcome to reference papers and pseudocode, and adapt online implementation examples of the algorithms you are using. However, you must write your own algorithm code, fully understand how it works, and acknowledge any resources you have referenced or adapted.
As a result of this policy, it is very important that you take every opportunity to demonstrate your understanding of whatever algorithm you implement. This should come across clearly in your code comments/annotations and in your report.
Part 1: Choosing an Active Learning (AL) algorithm
Your task is to adapt your HW 2 learning algorithm to incorporate active learning. You may adapt any active learning method we’ve covered in class.
- We’ll be re-using some infrastructure from the last assignment: just like last time, we’ll be using Robosuite/Robomimic/MimicGen and learning the same four tasks.
- If you prefer, you are welcome to use a different base algorithm than what you used for HW 2. However, you will need to implement a non-active version of that algorithm for your evaluation.
- If you’d like to adapt a different active learning algorithm than what we’ve covered in class, check with Prof. Tesca first.
- Here are some questions to help guide your thinking as you choose which algorithm to implement. Really, think through these now! You’ll need to write up your answers for your report anyway.
- For the algorithm you implemented on the last assignment:
- How much data did it require?
- What did it learn well from the demonstration data? What did it struggle to learn?
- What data do you believe would have helped it learn more efficiently?
- Now consider:
- What kind of feedback would provide the robot with the most helpful information?
- When are the best opportunities for it to request this feedback?
- What human factors might influence how accurate or informative this feedback is?
- Now it’s time to choose an active learning (AL) algorithm. Once you’ve made your decision, write up your answers to the following questions.
- Note: for this assignment, there does need to be a human involved in providing feedback (rather than solely learning from an oracle).
- In your own words, how does the algorithm work?
- Why did you choose this algorithm?
- How will the robot decide (1) when to query the user and (2) what to query about? (A minimal illustrative sketch of one possible query rule follows this list.)
- How do you expect this algorithm to improve the learning performance?
- What are the human factors implications (positive and negative) of using this algorithm?
- How many interactions do you think you’ll need in order to reproduce similar results as your original, non-AL implementation?
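As one concrete illustration of the query-decision questions above, here is a minimal sketch of a disagreement-based (uncertainty-sampling) query rule. It assumes an ensemble of policies trained on the bootstrap demonstrations; the names `policies`, `request_label`, and `query_threshold` are hypothetical placeholders, and your chosen algorithm may decide when and what to query very differently.

```python
import numpy as np

def should_query(policies, state, query_threshold=0.1):
    """Return True if the robot should ask the human for feedback at this state.

    Uses disagreement across an ensemble of policies as an uncertainty proxy:
    if the predicted actions vary widely, the robot queries instead of acting.
    """
    # Each policy proposes an action for the current state.
    actions = np.stack([policy.predict(state) for policy in policies])
    # Disagreement = mean per-dimension standard deviation across the ensemble.
    disagreement = actions.std(axis=0).mean()
    return disagreement > query_threshold

# Hypothetical use inside a rollout loop:
#   if should_query(policies, state):
#       label = request_label(state)    # ask the human for an action/correction
#       dataset.append((state, label))  # aggregate the feedback for retraining
#   else:
#       action = policies[0].predict(state)
```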
Part 2: Implementation and evaluation
- Time to implement your algorithm!
- As you adapt your HW 2 code, make sure you can compare the results from the original, passive learning algorithm with the revised, active version.
- Make sure your algorithm has the ability to “bootstrap” its learning from demonstration data. In other words, you should be able to change how much training data your algorithm has access to before it starts actively querying you for feedback. (A minimal sketch of this pattern follows this list.)
- If your algorithm queries for demonstrations, you may find this RoboSuite interface useful.
- Produce graphs showing how the algorithm’s performance changes based on (i) the number of queries, (ii) the amount of bootstrap data, and (iii) the task and dataset being used (D0 vs D1). (A minimal plotting sketch also follows this list.)
- Record some videos showing how the robot’s queries change as it learns more about the task.
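For the bootstrapping requirement above, one simple pattern is to make the amount of seed data an explicit parameter. This is only a sketch: it assumes your HW 2 code exposes a `train(examples)` function and that you have some `query_human(policy)` helper that returns one new labeled example; both names are hypothetical.

```python
import random

def bootstrap_then_active(all_demos, n_bootstrap, n_queries, train, query_human):
    """Seed the learner with a subset of demonstrations, then query actively.

    all_demos   -- full demonstration dataset (e.g., loaded from MimicGen)
    n_bootstrap -- how many demos the learner sees before any active queries
    n_queries   -- active-learning query budget
    train       -- (hypothetical) fits and returns a policy given a list of examples
    query_human -- (hypothetical) asks the human and returns one new labeled example
    """
    all_demos = list(all_demos)
    random.shuffle(all_demos)
    dataset = all_demos[:n_bootstrap]      # vary this to study the effect of bootstrap data
    policy = train(dataset)
    for _ in range(n_queries):
        new_example = query_human(policy)  # e.g., a demonstration or a correction
        dataset.append(new_example)
        policy = train(dataset)            # retrain (or fine-tune) after each query
    return policy
```

Sweeping `n_bootstrap` and `n_queries` over a small grid gives you exactly the conditions you need for the comparison graphs.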
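For the performance graphs requested above, here is a minimal matplotlib sketch, assuming you record task success rate after each query. The numbers below are placeholders, not expected results.

```python
import matplotlib.pyplot as plt

# Placeholder results: success rate measured after each query,
# for two different bootstrap-data conditions. Replace with your own numbers.
results = {
    "10 bootstrap demos": [0.10, 0.20, 0.30],
    "50 bootstrap demos": [0.30, 0.40, 0.50],
}

for label, success_rates in results.items():
    plt.plot(range(1, len(success_rates) + 1), success_rates, marker="o", label=label)

plt.xlabel("Number of active-learning queries")
plt.ylabel("Task success rate")
plt.title("Task name, D0")  # make one plot per task/dataset (D0 vs D1) combination
plt.legend()
plt.savefig("al_performance.png")
```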
Part 3: Report
Write up a report that answers all of the questions posed on this page, plus the questions below. Remember: whenever possible, demonstrate your understanding of your chosen algorithm.
- If applicable: who were your collaborators? Describe everyone’s role within the collaboration.
- Compare the results to those from the passive version of the learning algorithm (i.e., HW 2). What trends do you see?
- How many interactions were actually needed in order to achieve similar results to the non-AL implementation? How does this compare to your hypothesis?
- Were you able to achieve better results than the non-AL implementation? Why or why not?
- Demonstrate thoroughness here. If the AL version did not perform better, convince me that you were thorough in understanding why, and that you took steps to try to improve the performance. If the AL version did perform better, how many interactions does it take for the performance to converge?
- How does active learning performance change based on the number of interactions? What about the amount of bootstrap data?
- Provide graphs showing the number of interactions versus performance for each task. How do these graphs change depending on the amount of bootstrap data?
- How does performance vary depending on the task and training/testing data distributions?
- How did these results compare to your hypotheses? Did anything surprise you? How would you modify the algorithm to improve its performance?
What to submit
On Canvas, upload your:
- Annotated code (preferably as a .ipynb)
- Report
- Videos