This bug occurs after certain async cancellations, and causes the HTTP pool to get stuck in a degraded state, unable to issue requests. It looks like a resurfacing of this issue.
httpcore version: 1.0.5
python version: 3.12.4
Code to reproduce:
import random
import asyncio
import httpcore
from collections import defaultdict
# Global dictionary to store status code counts
status_counts = defaultdict(int)
async def make_request(pool):
try:
response = await pool.request(
method="GET",
url=("https://localhost:8000/"),
)
status_counts[response.status] += 1
except BaseException:
status_counts[0] += 1
async def spawn_and_cancel_tasks(pool):
while True:
tasks = [asyncio.create_task(make_request(pool)) for _ in range(100)]
await asyncio.sleep(random.random()) # Allow some time for the tasks to start
for task in tasks:
task.cancel()
await asyncio.gather(*tasks, return_exceptions=True)
async def monitor_stats(pool):
while True:
status_str = ", ".join(f"{code}: {count}" for code, count in sorted(status_counts.items()))
print(f"Status codes: {status_str} | Pool {pool}")
status_counts.clear()
await asyncio.sleep(1)
async def main():
async with httpcore.AsyncConnectionPool() as pool:
stats_task = asyncio.create_task(monitor_stats(pool))
spawn_task = asyncio.create_task(spawn_and_cancel_tasks(pool))
try:
await asyncio.gather(stats_task, spawn_task)
except KeyboardInterrupt:
print("Stopping...")
if __name__ == "__main__":
asyncio.run(main())
Output:
Status codes: | Pool <AsyncConnectionPool [Requests: 0 active, 0 queued | Connections: 0 active, 0 idle]>
Status codes: 200: 200 | Pool <AsyncConnectionPool [Requests: 0 active, 0 queued | Connections: 0 active, 10 idle]>
Status codes: 0: 202, 200: 139 | Pool <AsyncConnectionPool [Requests: 62 active, 0 queued | Connections: 8 active, 2 idle]>
Status codes: 0: 142, 200: 117 | Pool <AsyncConnectionPool [Requests: 5 active, 0 queued | Connections: 5 active, 0 idle]>
Status codes: 0: 35, 200: 165 | Pool <AsyncConnectionPool [Requests: 5 active, 0 queued | Connections: 5 active, 5 idle]>
Status codes: 200: 100 | Pool <AsyncConnectionPool [Requests: 5 active, 0 queued | Connections: 5 active, 5 idle]>
Status codes: 0: 155, 200: 45 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 300 | Pool <AsyncConnectionPool [Requests: 5 active, 0 queued | Connections: 10 active, 0 idle]>
Status codes: | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 200 | Pool <AsyncConnectionPool [Requests: 5 active, 0 queued | Connections: 10 active, 0 idle]> <- stops responding here
Status codes: 0: 100 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 200 | Pool <AsyncConnectionPool [Requests: 5 active, 0 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 100 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 100 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 200 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 200 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
Status codes: 0: 300 | Pool <AsyncConnectionPool [Requests: 5 active, 100 queued | Connections: 10 active, 0 idle]>
You can see from the above output the point where the pool stops issuing new requests. This is reproducible using the default python http server:
echo 'a' > index.html
python -m http.server
The bug applies to both asyncio and trio. However in trio it's much narrower. With asyncio it triggers almost immediately, while trio takes about 30 seconds.
Code to reproduce in trio:
import random
import trio
import httpcore
from collections import defaultdict
# Global dictionary to store status code counts
status_counts = defaultdict(int)
async def make_request(pool):
try:
response = await pool.request(
method="GET",
url="http://localhost:8000/",
)
status_counts[response.status] += 1
except BaseException:
status_counts[0] += 1
async def spawn_and_cancel_tasks(pool):
while True:
with trio.CancelScope() as cancel_scope:
async with trio.open_nursery() as nursery:
for _ in range(100):
nursery.start_soon(make_request, pool)
await trio.sleep(random.random()) # Allow some time for the tasks to start
cancel_scope.cancel()
async def monitor_stats(pool):
while True:
status_str = ", ".join(f"{code}: {count}" for code, count in sorted(status_counts.items(), reverse=True))
print(f"Status codes: {status_str} | Pool {pool}")
# Clear the counts for the next iteration
status_counts.clear()
await trio.sleep(1)
async def main():
async with httpcore.AsyncConnectionPool() as pool:
async with trio.open_nursery() as nursery:
nursery.start_soon(monitor_stats, pool)
await spawn_and_cancel_tasks(pool)
if __name__ == "__main__":
try:
trio.run(main)
except KeyboardInterrupt:
print("Stopping...")
Sorry to bring this up again; I know these types of issues suck to troubleshoot. But it's been causing issues for us in production. Please let me know if there's anything else I can do to help.
Pay now to fund the work behind this issue.
Get updates on progress being made.
Maintainer is rewarded once the issue is completed.
You're funding impactful open source efforts
You want to contribute to this effort
You want to get funding like this too