When generating definitions from a schema, if defaults are specified for dictionaries or arrays, datamodel-code-generator currently creates models using the default keyword for those fields. In my opinion, this should really be the default_factory keyword.
NOTE: Fortunately, when using pydantic, the use of default as opposed to default_factory does not result in the typical issue of shared instances. However, I am not entirely sure whether this is intentional or merely a happy coincidence. If we were to switch to factories, the output of datamodel-code-generator would be a touch safer.
Suppose I have a definition
openapi: 3.0.3
info:
  version: x.y.z
  title: My schema
servers:
  - url: "https://company.org"
paths: {}
components:
  schemas:
    Book:
      description: |-
        A class about books
      type: object
      required:
        - title
      properties:
        title:
          type: string
        isbn:
          description: |-
            The book's ISBN
          type: string
          default: "0000"
        authors:
          description: |-
            The book author or authors
          type: array
          items:
            type: string
          default: []
        editors:
          description: |-
            The editors of the book
          type: array
          items:
            type: string
          default: ["nobody"]
      additionalProperties: false
then, when using datamodel-code-generator, I expect the defaults of lists to be treated as factories. Instead, when one calls, e.g.
python3 -m datamodel_code_generator \
--input-file-type openapi \
--output-model-type pydantic_v2.BaseModel \
--encoding "UTF-8" \
--disable-timestamp \
--use-schema-description \
--use-standard-collections \
--collapse-root-models \
--use-default-kwarg \
--field-constraints \
--capitalise-enum-members \
--enum-field-as-literal one \
--set-default-enum-member \
--use-subclass-enum \
--allow-population-by-field-name \
--snake-case-field \
--strict-nullable \
--use-double-quotes \
--target-python-version 3.11 \
--input myschema.yaml \
--output src/mymodels.py
one gets
# generated by datamodel-codegen:
#   filename:  myschema.yaml

from __future__ import annotations

from pydantic import BaseModel, ConfigDict, Field


class Book(BaseModel):
    """
    A class about books
    """

    model_config = ConfigDict(
        extra="forbid",
        populate_by_name=True,
    )
    title: str
    isbn: str = Field(default="0000", description="The book's ISBN")
    authors: list[str] = Field(default=[], description="The book author or authors")
    editors: list[str] = Field(default=["nobody"], description="The editors of the book")
The request here is that this be changed to:
authors: list[str] = Field(default_factory=list, description="The book author or authors")
editors: list[str] = Field(default_factory=lambda: ["nobody"], description="The editors of the book")
Without factories, two instances of a plain Python class can end up sharing the same mutable attribute. E.g.
class Book:
    """
    A class about books
    """

    title: str
    isbn: str = "0000"
    # Class-level mutable defaults: one list object shared by all instances.
    authors: list[str] = []
    editors: list[str] = ["nobody"]


book1 = Book()
book1.title = "The Hobbit"
book1.isbn = "9780007440849"
book1.authors.append("Tolkien, JRR")
print(book1.authors)  # ['Tolkien, JRR']

book2 = Book()
book2.title = "Harry Potter and the Philosopher's Stone"
book2.isbn = "9781408855652"
book2.authors.append("Rowling, JK")
print(book1.authors)  # ['Tolkien, JRR', 'Rowling, JK']
print(book2.authors)  # ['Tolkien, JRR', 'Rowling, JK']
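For contrast, here is a minimal sketch of the same class with factories. It uses the standard library's dataclasses rather than pydantic, purely as an illustration of what a factory buys you; it is not what datamodel-code-generator emits.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """
    A class about books
    """

    title: str
    isbn: str = "0000"
    # Writing `authors: list[str] = []` here would raise ValueError at class
    # creation time; dataclasses insist on a factory for list defaults.
    authors: list[str] = field(default_factory=list)
    editors: list[str] = field(default_factory=lambda: ["nobody"])


book1 = Book(title="The Hobbit", isbn="9780007440849")
book1.authors.append("Tolkien, JRR")

book2 = Book(title="Harry Potter and the Philosopher's Stone", isbn="9781408855652")
book2.authors.append("Rowling, JK")

print(book1.authors)  # ['Tolkien, JRR']
print(book2.authors)  # ['Rowling, JK']
```

Each instance gets its own list because the factory is called once per instance.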
Note again that I am not claiming that this happens with the pydantic models. In fact, the sharing shown in the plain-class example does not happen with them. But, as stated at the start, it is not clear to me whether this is by design (of pydantic's BaseModel/Field) or simply a happy coincidence.
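A quick way to check this claim is sketched below. It assumes pydantic v2 is installed and uses a trimmed-down version of the generated Book model (descriptions and model_config omitted), with one field in the currently generated form and one in the requested form.

```python
from pydantic import BaseModel, Field


class Book(BaseModel):
    title: str
    # Currently generated form: mutable default passed via `default=`.
    authors: list[str] = Field(default=[])
    # Requested form: a fresh list is built by the factory for each instance.
    editors: list[str] = Field(default_factory=lambda: ["nobody"])


book1 = Book(title="The Hobbit")
book2 = Book(title="Harry Potter and the Philosopher's Stone")

# Mutate the defaults on one instance only.
book1.authors.append("Tolkien, JRR")
book1.editors.append("somebody")

print(book2.authors)
print(book2.editors)
print(book1.authors is book2.authors)
```

If the two instances do not share state, book2 keeps its original defaults and the final line prints False for both field styles.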
python>=3.11
datamodel-code-generator>=0.25.9
pydantic>=2.8.2