Today the bot exposed to us a rare behavior reporting that it didn't have access to the target repo.
This happened as a result of encountering some issue during git push
β the corresponding code block reports a "permission error" unconditionally without attempting to analyze what actually happened: https://github.com/sanitizers/patchback-github-app/blob/2c263f7/patchback/event_handlers.py#L165-L180.
And it doesn't include any details in the report text (originally β for security reason because it's unsanitized and may leak an access token if implemented poorly).
So I want to document the need to improve the traceability by adding the sanitized command output to the reported error.
Also, we may need to implement some error analysis and if it is recoverable, implement retries with https://pypi.org/p/backoff.
Another thing to talk about is --force-with-lease
. TL;DR we may be better off using just plain --force
.
How is this relevant? It's hard to judge without the actual git push output
but I seem to recall facing a weird race condition with --force-with-lease
back when we were working on https://github.com/ansible-community/collection_migration.
I've attempted to re-analyze why it might've been happening and found a way to simulate it. Basically, it's achieved by
$ git commit --allow-empty -m 'Empty commit'
[branch-name fa821df7] Empty commit
$ for _ in {1..5}; do git push origin HEAD --force-with-lease &; done
[2] 2107671
[3] 2107672
[4] 2107675
[5] 2107677
[6] 2107684
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 850 bytes | 850.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 850 bytes | 850.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 850 bytes | 850.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 850 bytes | 425.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 850 bytes | 425.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To ssh://github.com/user/repo.git
! [remote rejected] branch-name -> branch-name (cannot lock ref 'refs/heads/branch-name': is at fa821df7897ae866da9e72b28c404cdfcc92760e but expected b5f3ae57debb9ef018aedc92dd0a3704a1d06fa7)
error: failed to push some refs to 'ssh://github.com/user/repo.git'
[6] + 2107684 exit 1 git push origin HEAD --force-with-lease
To ssh://github.com/user/repo.git
! [remote rejected] branch-name -> branch-name (cannot lock ref 'refs/heads/branch-name': is at fa821df7897ae866da9e72b28c404cdfcc92760e but expected b5f3ae57debb9ef018aedc92dd0a3704a1d06fa7)
error: failed to push some refs to 'ssh://github.com/user/repo.git'
[2] 2107671 exit 1 git push origin HEAD --force-with-lease
To ssh://github.com/user/repo.git
! [remote rejected] branch-name -> branch-name (cannot lock ref 'refs/heads/branch-name': is at fa821df7897ae866da9e72b28c404cdfcc92760e but expected b5f3ae57debb9ef018aedc92dd0a3704a1d06fa7)
error: failed to push some refs to 'ssh://github.com/user/repo.git'
[3] 2107672 exit 1 git push origin HEAD --force-with-lease
To ssh://github.com/user/repo.git
! [remote rejected] branch-name -> branch-name (cannot lock ref 'refs/heads/branch-name': is at fa821df7897ae866da9e72b28c404cdfcc92760e but expected b5f3ae57debb9ef018aedc92dd0a3704a1d06fa7)
error: failed to push some refs to 'ssh://github.com/user/repo.git'
[4] - 2107675 exit 1 git push origin HEAD --force-with-lease
To ssh://github.com/user/repo.git
b5f3ae57..fa821df7 branch-name -> branch-name
[5] + 2107677 done git push origin HEAD --force-with-lease
So the difference between --force
and --force-with-lease
that is of interest to us is the fact that the latter attempts to lock the remote ref to make sure that the remote branch replaced is the one the local repo knows about and the push won't unintentionally rewrite the branch contents somebody else pushed to since the last git fetch origin
.
I think that maybe it's what's happened in the morning β probably a race condition may have been triggered by adding and removing+readding the same label on the same PR at close moments in time. And so one of the invocations of the event handler pushed (created) a backport branch, reported the success and the second one attempted to do the same but failed because of the inability to acquire the branch lock and then reported the failed status on the same check-run.
Refs:
Pay now to fund the work behind this issue.
Get updates on progress being made.
Maintainer is rewarded once the issue is completed.
You're funding impactful open source efforts
You want to contribute to this effort
You want to get funding like this too