(Sorry for the wall of text. I tried to give as much info as possible; feel free to ask me for clarification if something is unclear.)
Context
It seems like the parallel execution feature of doit doesn't work on macOS. I noticed it on a coworker's laptop; being on Linux, I never saw any issue on my side.
I investigated and came up with a satisfying workaround/fix: force the use of threads on macOS.
Long story short, this is related to the fact that, by default:
- doit uses multiprocessing for parallel execution
- multiprocessing uses fork on Unix-like systems

For more information, you can take a look at the issue I tackled on our side (scality/metalk8s#1354).
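To check which start method multiprocessing picks on a given machine, a small stdlib-only sketch like the one below can help. The helper name `describe_start_methods` is made up for illustration; it only queries the default context and lists the methods available on the current platform.

```python
import multiprocessing
import sys


def describe_start_methods():
    """Return (default start method, start methods available on this platform)."""
    ctx = multiprocessing.get_context()  # no argument: the default context
    return ctx.get_start_method(), multiprocessing.get_all_start_methods()


if __name__ == "__main__":
    default, available = describe_start_methods()
    # Typically 'fork' on Linux with older Pythons, and 'fork' on macOS
    # before Python 3.8 ('spawn' from 3.8 onwards).
    print(f"{sys.platform}: default={default!r}, available={available}")
```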
Here is a minimal dodo.py that reproduces the problem if you execute it on macOS (10.13 or higher):
#!/usr/bin/env python3
# coding: utf-8

import random
import time
import urllib.request

import doit


def task_foo():
    def action():
        # The urllib call is what triggers the crash on macOS;
        # sleeping alone does not.
        urllib.request.urlopen('http://example.com/')
        time.sleep(random.randint(1, 5))

    for i in range(10):
        yield {
            'name': str(i),
            'actions': [action],
        }
To be executed with doit -n 4.
Note that the crash happens only when you're calling into Cocoa libraries (that's why I added a urllib call; simply sleeping doesn't exhibit the behavior).
What can be done
I know it's not really a doit issue per se, and after reading this very informative thread on the Python bug tracker, one can see that the issue will probably disappear with Python 3.8, where multiprocessing will use spawn instead of fork by default on macOS.
But I believe we may want to do something on the doit side: Python 3.8 is not out yet (and people will keep using older Python versions for a while), so it may be worth having a fix/workaround in doit itself.
I think the issue can be solved by calling either multiprocessing.set_start_method('forkserver') or multiprocessing.set_start_method('spawn') before creating any subprocesses in doit. I would go for spawn, since official Python is moving that way.
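A minimal sketch of what the start-method fix could look like, assuming doit creates its worker processes from a context object. Both helpers (`pick_start_method`, `get_worker_context`) are hypothetical, not doit's API. The sketch uses multiprocessing.get_context() rather than the global multiprocessing.set_start_method() suggested above: get_context() does not alter process-wide state and cannot raise the RuntimeError that set_start_method() raises once the start method has already been fixed.

```python
import multiprocessing
import sys


def pick_start_method(platform=sys.platform):
    """Hypothetical helper: choose the start method for worker processes.

    On macOS ('darwin') use 'spawn' so workers are not forked from a
    process that may already have loaded Cocoa libraries. Elsewhere return
    None, meaning "keep the platform default".
    """
    if platform == "darwin":
        return "spawn"
    return None


def get_worker_context(platform=sys.platform):
    """Hypothetical helper: return a multiprocessing context for workers.

    get_context(None) returns the default context, so non-macOS
    platforms keep their current behaviour.
    """
    return multiprocessing.get_context(pick_start_method(platform))
```

Worker pools or processes would then be created from the returned context (for example `get_worker_context().Process(...)`), so only doit's own workers are affected.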
If it doesn't work, is too much work, or is outside the scope of doit, it's possible to mention the issue in the documentation about parallel execution instead, so that people are not surprised and know how to deal with it.
Or maybe simply disable the use of multiprocessing on macOS and use only threads.
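From the user side, the threads-only workaround could be applied from a dodo.py. The sketch below assumes that doit reads the parallel backend from a `par_type` key in `DOIT_CONFIG` (the option behind `doit -P/--parallel-type`, with `process` as the default and `thread` as the alternative). Please check this against your doit version. The `doit_config_for` helper is hypothetical.

```python
import sys


def doit_config_for(platform=sys.platform):
    """Hypothetical helper building DOIT_CONFIG for a dodo.py.

    Assumes the 'par_type' key selects doit's parallel backend
    ('process' by default, 'thread' as the alternative).
    """
    config = {}
    if platform == "darwin":
        # Avoid fork() on macOS: run parallel tasks in threads instead.
        config["par_type"] = "thread"
    return config


# doit picks up a module-level DOIT_CONFIG dict from dodo.py.
DOIT_CONFIG = doit_config_for()
```

Under the same assumption, the one-off command-line equivalent would be doit -n 4 -P thread.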
In any case, if you want to keep supporting multiprocessing on macOS, you may want to look at this now, because the current code will not work with 3.8 (which will use spawn by default). Adding a call to multiprocessing.set_start_method('spawn') in the provided dodo.py triggers a pickle error, which is not totally unexpected, as spawn has more constraints than fork (see here).
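With spawn, everything sent to a worker has to be picklable. One likely culprit for the pickle error (a guess, not confirmed against doit's internals) is that `action` in the dodo.py above is defined inside `task_foo`. pickle serialises functions by qualified name, so it cannot serialise such a local function. The sketch below checks this. `is_picklable`, `make_nested_action` and `fetch_and_sleep` are illustrative names only.

```python
import pickle
import time
import urllib.request


def is_picklable(obj):
    """Return True if `obj` can be serialised with pickle, as spawn requires."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Local objects fail with AttributeError or PicklingError
        # depending on the Python version.
        return False
    return True


def make_nested_action():
    """Mirrors the dodo.py above: `action` is local to the task function."""
    def action():
        urllib.request.urlopen('http://example.com/')
        time.sleep(1)
    return action


# Hypothetical fix: define the action at module level so a spawned worker
# can find it by name when it re-imports dodo.py.
def fetch_and_sleep():
    urllib.request.urlopen('http://example.com/')
    time.sleep(1)
```

For example, `is_picklable(make_nested_action())` is False, while a module-level function such as `time.sleep` pickles fine.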
Environment