Nested pagination through m2m fields is broken with the optimizer enabled. This appears to be because using .prefetch_related()
with a QuerySet
annotated with a Window
function where partition_by
is an m2m field causes an extra join and duplicate results.
See a minimal reproducible example here:
mre/nested-pagination-m2m
In the unit test above, the expected result of the query is:
{
"tagConn": {
"edges": [
{
"node": {
"name": "Tag 1",
"issues": {
"edges": [
{"node": {"name": "Issue 1"}},
{"node": {"name": "Issue 2"}}
]
}
}
},
{
"node": {
"name": "Tag 2",
"issues": {
"edges": [
{"node": {"name": "Issue 2"}},
{"node": {"name": "Issue 3"}}
]
}
}
}
]
}
}
However, the test fails, and the actual result of the query is:
{
"tagConn": {
"edges": [
{
"node": {
"name": "Tag 1",
"issues": {
"edges": [
{"node": {"name": "Issue 1"}},
{"node": {"name": "Issue 2"}},
{"node": {"name": "Issue 2"}} // Duplicate!
]
}
}
},
{
"node": {
"name": "Tag 2",
"issues": {
"edges": [
{"node": {"name": "Issue 2"}},
{"node": {"name": "Issue 2"}}, // Duplicate!
{"node": {"name": "Issue 3"}}
]
}
}
}
]
}
}
It appears that this is caused by the implementation of apply_window_pagination()
.
As suggested above, when you annotate a .prefetch_related()
QuerySet
with a Window
function and refer back to the other side of the m2m, you add another join - causing duplicate results. When Django introduced the ability to prefetch with sliced querysets (which Strawberry-Django isn't directly using) they had to do a bunch of hacks to combine calls to filter together (see QuerySet._next_is_sticky()
and _filter_prefetch_queryset()
).
To demonstrate the difference directly with the Django ORM see: a6f4e31.
In the unit test above, Strawberry-Django produces the following SQL query for the paginated prefetch:
SELECT *
FROM
(SELECT ("projects_issue_tags"."tag_id") AS "_prefetch_related_val_tag_id",
"projects_issue"."name" AS "col1",
"projects_issue"."id" AS "col2",
ROW_NUMBER() OVER (PARTITION BY "projects_issue_tags"."tag_id"
ORDER BY "projects_issue"."id" ASC) AS "_strawberry_row_number",
COUNT(1) OVER (PARTITION BY "projects_issue_tags"."tag_id") AS "_strawberry_total_count"
FROM "projects_issue"
LEFT OUTER JOIN "projects_issue_tags" ON ("projects_issue"."id" = "projects_issue_tags"."issue_id")
INNER JOIN "projects_issue_tags" T4 ON ("projects_issue"."id" = T4."issue_id")
WHERE T4."tag_id" IN (1,
2)
ORDER BY "projects_issue"."id" ASC) "qualify"
WHERE "_strawberry_row_number" <= 2
ORDER BY "col2" ASC
Whereas Django produces this SQL query for the paginated prefetch:
SELECT "_prefetch_related_val_tag_id",
"col1",
"col2"
FROM
(SELECT *
FROM
(SELECT ("projects_issue_tags"."tag_id") AS "_prefetch_related_val_tag_id",
"projects_issue"."name" AS "col1",
"projects_issue"."id" AS "col2",
ROW_NUMBER() OVER (PARTITION BY "projects_issue_tags"."tag_id"
ORDER BY "projects_issue"."id" ASC) AS "qual0"
FROM "projects_issue"
INNER JOIN "projects_issue_tags" ON ("projects_issue"."id" = "projects_issue_tags"."issue_id")
WHERE "projects_issue_tags"."tag_id" IN (1,
2)
ORDER BY "projects_issue"."id" ASC) "qualify"
WHERE ("qual0" > 0
AND "qual0" <= 2)) "qualify_mask"
ORDER BY "col2" ASC
Note the extra LEFT OUTER JOIN
which causes the issue.
As noted in the apply_window_pagination()
docstring, Django 4.2+ actually supports sliced QuerySet
s in .prefetch_related()
now.
The solution here may be to actually use Django's inbuilt support for .prefetch_related()
QuerySet
slicing. However, this means we can't annotate _strawberry_total_count
onto the nodes anymore. It is likely that we would need to refactor that functionality onto the parent records, maybe using a subquery count and OuterRef
?
Pay now to fund the work behind this issue.
Get updates on progress being made.
Maintainer is rewarded once the issue is completed.
You're funding impactful open source efforts
You want to contribute to this effort
You want to get funding like this too