Revert and regression policy

How to handle regressions

File a detailed issue on the Github tracker.
If the regression introduced causes a functionality to be broken or unusable on production, send an email to the release coordinator (see assignments) of the previous release so that they can do a hotfix to fix the bug on the production server. Also, cc the original PR author (or notify them in some other way) so that they’re aware.
If the regression causes problems for developers (e.g. failing tests), send an email to oppia-dev to let developers know. The email should include a link to the issue you opened so that developers can follow along as we fix the regression.
Investigate to find the cause of the regression. Ideally, we want to find the PR that introduced the regression, but failing that, we want to understand what problem in the code is causing the regression.
- If you identify the PR that caused the regression, mention it in the issue thread and @-mention the author of the PR so that they are aware of the problem. The problematic PR may be reverted if it matches the criteria mentioned in the ”When to revert a PR?” section.
- If you can’t find the PR / commit that caused the regression, make sure to mention this in the issue thread. Instead of reverting the PR, you should open a PR to fix the problem.
Consider what mitigation steps we could take to prevent similar regressions in the future. Here are some potential mitigations to consider:
- New tests, for example if a breakage occurred in a bit of code that wasn’t thoroughly tested.
- Adding lint checks, for example if the regression was caused by a coding pattern we should avoid.
- Changes to how we review PRs, for example if the regression was a UI bug that would have been caught had the PR description included screenshots.
  - Consider how much your mitigation will slow down the review process. Some mitigations, like adding screenshots, might not be a problem. Others, like requiring many more reviewers, might be more trouble than they’re worth.
- Emailing oppia-dev, for example to warn developers to avoid the coding pattern that caused the regression.
  - While we want to notify the developer community of how they can avoid regressions, we don’t want to spam them with so many emails that they stop reading oppia-dev. Emailing oppia-dev is often the easiest mitigation, but it is not always the most effective.
You may conclude that no mitigation is worth implementing, for example if the mitigations would introduce too much friction into the PR approval process. However, you should document your thought process in the issue in case someone has an idea for a mitigation you didn’t consider.
Unless you can implement the mitigations immediately, you should open an issue for them. You should assign the issue to whoever is going to be in charge of implementation (often the dev workflow team lead).

How to identify the bad commit

To systematically locate a bad commit, you can use the git-bisect tool. It internally uses a binary search algorithm to find the offending commit from the project history. To use this tool:

Git checkout develop branch where the bug is observed (git checkout develop).
Run git bisect start to start a bisect session.
Run git bisect bad to verify that the current commit contains the bug.
Go to this page to find an older (“good”) commit where the bug did not exist. Copy the commit id of the commit, and do git checkout on it (git checkout ). Start the dev server (by running the start script) and try to figure out whether the error still exists. If so, you will need to find an even older commit than the current one. Kill the server, and repeat this step until you find a “good” commit.
Once you’ve found a “good” commit, run “git bisect good”. The tool will automatically perform a binary search and checkout a commit between the “good” commit and “bad” commit. Start the dev server (by running the start script) and try to reproduce the error. At each step:
- If the bug is reproducible, run “git bisect good”.
- If not, run “git bisect bad”.
Repeat this step until the tool reports the offending commit.

When to revert a PR

If the PR exhibits either of the following two cases:

Introduces a flaky (there is any reported failure of the test on the develop branch) or consistently failing test.
Introduces a regression in a critical functionality.

Then, follow these steps:

Automatically revert the PR if it has no merge conflicts or merge conflicts are trivial to resolve and inform the author that you have reverted their PR. You can find instructions for reverting PRs on Github here.
If the revert PR has non-trivial merge conflicts, do not revert. Create an issue and mark it as a blocking bug for the on-going release and inform the author of the PR about the same by @-mentioning on the PR thread.
If the PR introduced a flaky or failing test and contains other changes too, create a PR to delete or disable the test. Also, create an issue to add back the deleted/disabled test and inform the author of the PR about the same by @-mentioning on the PR thread.
If the functionality is broken on the production server and needs to be fixed, inform the release coordinator for the previous release so that a hotfix can be done.
Add “PR: require post-merge sync to HEAD” label to the revert or fix PR if developers need to update their branches once the fix/revert PR has been merged into develop. Oppiabot will comment on open PRs to merge from develop once the reverted PR has been merged.

How to revert a PR?

In the PR page, you will find a “revert” button at the point where the PR was merged.

Clicking on this button will create a new PR with the reverted changes. Feel free to add a detailed comment in the PR description explaining why the PR needs to be reverted and tag the author of the original PR.

If you see this:

You will need to make the code changes on your local machine and create a PR by hand.

What do I do if I caused a regression?

If the PR that introduced the regression hasn’t been reverted, please follow the steps above to revert. Once your PR is reverted, follow these steps:

Ensure you can repro the issue fully. If you can’t, talk to the original author to better understand how to repro the issue.
Make sure you fully understand why your change caused the observed regression(s)–without this step, no progress can actually be made. Remember: fix issues not symptoms. We don’t just want the observed problem to go away, we want the underlying root cause to be understood & addressed.
Once you understand the root cause, think through potential edge cases that may break other aspects of the app. It’s quite likely that the observed regression was just one of several broken behaviors caused by the underlying issue.
Turn the behaviors from (3) into tests. Verify that these tests fail without a fix, and pass with a fix to the original regression.
Submit a new PR with your original changes + the fixes and tests from (4). Make sure to detail the investigation in this new PR, and reference the original issue that tracked the regression.

Finally, be aware that everyone causes regressions. While we have a lot of checks in place to try and avoid regressions from being checked in, they will still happen. Focus on learning from the situation rather focusing on the fact that a regression happened.