Dynamic table prefix with SQLAlchemy and Alembic

fief-dev

deep_dive.png

note

Our post series Internals' deep dive guide you through the most interesting and challenging aspects of the Fief codebase 🏰

When developing a web service like Fief, it's almost always necessary to read and write from and to a database. It's also highly probable this database will be an SQL database, like PostgreSQL.

In Python, the go-to library to work with SQL databases is SQLAlchemy. This library is a pure jewel, allowing you to either work at low-level with SQL queries (the Core part) or at higher-level with the ORM capabilities.

Of course, when designing a SQL schema for a software, it's very rare we can define it once and for all for all the lifetime of the app. When we want to implement (or remove!) features, we'll likely need to change or add tables to our database. Hence, we need to migrate the existing schema — with all the data it contains — to the new one, without breaking anything. It would be quite dangerous and error-prone to do it by hand with SQL queries. That's why SQLAlchemy team also maintains Alembic, a library dedicated to create and run migrations scripts.

Standard setup 🧭

In a standard setup, here is how you would define a data model with SQLAlchemy ORM:

class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str] = mapped_column(String(320), nullable=False)

where Base is our declarative mapping class. We'll come back to it later! Notice also how we can tell SQLAlchemy the actual name of our table in the database, using __tablename__.

Now, with Alembic, we can create a new migration script in order to create this table in our database. Thanks to the autogeneration feature, it's even able to explore our mappings and generate the corresponding SQL commands, so we don't have to write everything by hand (even though we should always have a look at what it did 😉).

This script will look like this:

revision = "27c6a30d7c24"
down_revision = None

import sqlalchemy as sa
from alembic import op


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        "users",
        sa.Column("id", sa.Integer()),
        sa.Column("email", sa.String(length=320), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    ### end Alembic commands ###


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table("users")
    ### end Alembic commands ###

We see that Alembic hard-coded the table name, users, in each command. It means that we won't be able to easily modify it at runtime. Besides, what if we want to install several applications on the same database? There'll probably be table names conflicts.

Static table prefix

To solve this, we can tell SQLAlchemy to automatically add a prefix before all our table names. Since every models have to inherit from Base, it's very easy to overload the __tablename__ class attribute:

class Base(DeclarativeBase):
    def __init_subclass__(cls) -> None:
        cls.__tablename__ = f"myapp_{cls.__tablename__}"
        super().__init_subclass__()

What we do here is to overload the __init_subclass__ method to modify the given __tablename__ attribute with a prefix.

Now, SQLAlchemy will look for the table named myapp_users when performing a query. Of course, it reflects in Alembic migration:

revision = "27c6a30d7c24"
down_revision = None

import sqlalchemy as sa
from alembic import op


def upgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        "myapp_users",
        sa.Column("id", sa.Integer()),
        sa.Column("email", sa.String(length=320), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    ### end Alembic commands ###


def downgrade():
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table("myapp_users")
    ### end Alembic commands ###

However, the table name is still hard-coded. For most applications, this approach works quite well: you'll likely install your app in only one place on a single database where you control all the table names present.

For an app like Fief, which purpose is to be installed by anyone on any server, this is not good enough: we need a way to let users choose their table prefix during setup.

The challenge 🏁

To solve this, we'll need to tackle several problems:

  1. Set a dynamic table prefix on our declarative mapper.
  2. Pass the dynamic prefix value as a variable during migration
  3. Alembic should not hard-code the table prefix in the migration scripts (obviously, we are lazy and we don't want to edit the scripts by hand)

Setting a dynamic prefix on Base

The first step is to set the prefix with a dynamic value in the Base declarative mapper class. Here is how it's implemented in Fief:

import os

from sqlalchemy.orm import DeclarativeBase

from fief.settings import settings

TABLE_PREFIX_PLACEHOLDER = "__FIEF__"
GENERATE_MIGRATION = os.environ.get("GENERATE_MIGRATION") == "1"
TABLE_PREFIX = (
    TABLE_PREFIX_PLACEHOLDER if GENERATE_MIGRATION else settings.database_table_prefix
)


def get_prefixed_tablename(name: str) -> str:
    return f"{TABLE_PREFIX}{name}"


class Base(DeclarativeBase):
    def __init_subclass__(cls) -> None:
        cls.__tablename__ = get_prefixed_tablename(cls.__tablename__)
        super().__init_subclass__()

The approach on Base is the same we reviewed for a static prefix. What changes however is how we determine this prefix.

The TABLE_PREFIX constant is the interesting part. You see that we assign its value following two situations:

  • If the environment variable GENERATE_MIGRATION is set to 1, we set a placeholder, __FIEF__.
  • If not, we take the value from our settings class. This will be the dynamic prefix set by the user.

So, why we need this placeholder trick? It's important to understand that our models mapping will be explored in two different situations:

  • When the application is running; in this case we want our "real" table prefix set by the user.
  • When Alembic generates migrations scripts; in this case, having a placeholder will help us to modify the script with a dynamic variable afterwards.

Isn't get_prefixed_tablename function useless? We could directly assign the formatted string inside __init_subclass__.

Indeed but, actually, in Fief, we sometimes define imperative tables which don't inherit from Base. Hence, we need a common logic to determine the table name.

Generating migrations with dynamic prefix

Now that we have this placeholder prefix in place, we can generate Alembic migrations. The first challenge is to have a way to retrieve the table prefix value inside the migration script.

Fortunately, Alembic provides a very flexible way to pass custom options and values inside MigrationContext. This object stores basically everything Alembic needs to know to perform a migration. And it's easily accessible inside a migration script through op.get_context(). Then, we can use the opts dictionary to lookup for custom values.

So, we'll consider we have a table_prefix key set on this dictionary holding our dynamic prefix value. We'll see later how we can actually set it when Alembic runs.

Since we'll need this value in each migration script, it's a good idea to put it in the migration template, script.py.mako. It looks like this:

def upgrade():
    table_prefix = op.get_context().opts["table_prefix"]
    ${upgrades if upgrades else "pass"}


def downgrade():
    table_prefix = op.get_context().opts["table_prefix"]
    ${downgrades if downgrades else "pass"}

Quite simply, we assign the variable table_prefix right before the operations auto-generated by Alembic.

So, let's see the output of such migration:

env GENERATE_MIGRATION=1 alembic -c fief/alembic.ini revision --autogenerate -m "My first migration"

Of course, we don't forget to set our custom GENERATE_MIGRATION environment variable 😉

The result will look like this:

revision = "27c6a30d7c24"
down_revision = None

import sqlalchemy as sa
from alembic import op


def upgrade():
    table_prefix = op.get_context().opts["table_prefix"]
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        "__FIEF__users",
        sa.Column("id", sa.Integer()),
        sa.Column("email", sa.String(length=320), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    ### end Alembic commands ###


def downgrade():
    table_prefix = op.get_context().opts["table_prefix"]
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table("__FIEF__users")
    ### end Alembic commands ###

So, we do have our table_prefix variable (which is unused). Besides, the table names are now prefixed with the placeholder prefix. Indeed, from Alembic point-of-view, all we did was to set the prefix to the static string __FIEF__.

We could stop here! With a text editor and a bunch of RegEx, we could manually transform every occurence of __FIEF__table by f"{table_prefix}_table" to inject the dynamic value. And actually, that's what we did for a while 😅 Until we mastered a whole new power: codemods.

Transform Alembic migrations using a codemod

Codemods are automated scripts able to parse Python source code, apply operations on the syntax tree and output the updated source code. That's exactly what we want to do: for each string literal that contains __FIEF__, we want to change it to a formatted string with the table_prefix variable.

The codemod framework is part of the LibCST library, which purpose is to parse Python source code into Concrete Syntax Tree. Basically, we're able to work with objects and structures representing a Python source code... Inside another Python script.

This framework aims at building dedicated CLI tools to refactor entire codebase, like the excellent bump-pydantic project by Kludex. Of course, it also works for simple use cases like ours. And it's actually quite straightforward to do! Here is the full code to perform our transformation:

import libcst as cst
from libcst.codemod import VisitorBasedCodemodCommand

from fief.models.base import TABLE_PREFIX_PLACEHOLDER


class ConvertTablePrefixStrings(VisitorBasedCodemodCommand):
    DESCRIPTION: str = (
        "Converts strings containing table prefix placeholder "
        "to a format-string with dynamic table prefix."
    )

    def leave_SimpleString(
        self, original_node: cst.SimpleString, updated_node: cst.SimpleString
    ) -> cst.SimpleString | cst.FormattedString:
        value = updated_node.evaluated_value

        if not isinstance(value, str):
            return updated_node

        if TABLE_PREFIX_PLACEHOLDER in value:
            before, after = value.split(TABLE_PREFIX_PLACEHOLDER)
            before = before.replace('"', '\\"')
            after = after.replace('"', '\\"')
            return cst.FormattedString(
                [
                    cst.FormattedStringText(before),
                    cst.FormattedStringExpression(cst.Name("table_prefix")),
                    cst.FormattedStringText(after),
                ]
            )

        return updated_node

All we need is to inherit from the VisitorBasedCodemodCommand base class. Then, we just have to implement a method for each node we're interested in. Here, we want to look at SimpleString, which corresponds to simple, non-formatted string, like "__FIEF__table".

Then, we check if this string contains our placeholder. If it does, we replace it by a FormattedString node containing the text before, the variable table_prefix (using a Name node) and the text after.

And that's it! LibCST comes with a CLI tool we can use to directly summon codemods on one or several files. Let's try it:

python -m libcst.tool codemod -x fief.alembic.table_prefix_codemod.ConvertTablePrefixStrings my_migration.py

Now, our migration script looks like this:

revision = "27c6a30d7c24"
down_revision = None

import sqlalchemy as sa
from alembic import op


def upgrade():
    table_prefix = op.get_context().opts["table_prefix"]
    ### commands auto generated by Alembic - please adjust! ###
    op.create_table(
        f"{table_prefix}_users",
        sa.Column("id", sa.Integer()),
        sa.Column("email", sa.String(length=320), nullable=False),
        sa.PrimaryKeyConstraint("id"),
    )
    ### end Alembic commands ###


def downgrade():
    table_prefix = op.get_context().opts["table_prefix"]
    ### commands auto generated by Alembic - please adjust! ###
    op.drop_table(f"{table_prefix}_users")
    ### end Alembic commands ###

Magic isn't it? 🪄 We transformed Python code using Python code!

But wait, there's more. As you probably understood by now, we're super lazy. We don't want to manually run that command each time we generate a new migration. Could it be done automatically? Of course!

Alembic has a post_write_hooks mechanism allowing you to trigger commands after a migration has been generated. We just need to tell it what to do in alembic.ini:

[post_write_hooks]
hooks = codemod,format,lint

codemod.type = exec
codemod.executable = python
codemod.options = -m libcst.tool codemod -x fief.alembic.table_prefix_codemod.ConvertTablePrefixStrings REVISION_SCRIPT_FILENAME

format.type = exec
format.executable = ruff
format.options = format REVISION_SCRIPT_FILENAME

lint.type = exec
lint.executable = ruff
lint.options = check --fix REVISION_SCRIPT_FILENAME

The REVISION_SCRIPT_FILENAME variable automatically points to the current file we're working on. Notice how we also take this chance to run our linter, Ruff, on it.

Great! Now, in a single command, we can automatically generate migrations with commands having a dynamic table prefix.

Set the table_prefix configuration option

Finally, we need to make this table_prefix option available to Alembic migration scripts. Fortunately, Alembic is very flexible. The env.py script contains all the logic it executes when a migration is run. The key is to customize the context.configure call. Here is what it looks like in Fief:

context.configure(
    connection=connection,
    target_metadata=target_metadata,
    version_table=version_table,
    table_prefix=TABLE_PREFIX,
)

We simply import our TABLE_PREFIX constant and assign it to the table_prefix key. This way, it'll be available when calling op.get_context().opts["table_prefix"].

The special case of version_table

You might have noticed the version_table argument we also added to .configure(). In Alembic, the version table is a very simple table storing the latest migration ID we applied; so it knows where to start from on the next run.

By default, it's called alembic_version. But of course, we want it also to have our dynamic prefix. That's why we customize it like this in env.py:

version_table = f"{TABLE_PREFIX}{config.get_main_option('version_table_name')}"

We pull the actual name from a config option we set in alembic.ini, but we could have simply done version_table = f"{TABLE_PREFIX}alembic_version"

We did it, lads 👏

Great! With this setup, we have a fully automated way of generating and applying database migrations with a dynamic table prefix.

As we've seen, the basics are quite simple: it's just a matter of setting the right variable at the right place. The real challenge comes from the transformation of the generated migration, so the table_prefix variable is injected. Fortunately, LibCST makes it very easy for us to define powerful transformation scripts. It really feels magical to write code that changes code!

The downside of this approach is that, during development, we actually need to apply the database migration with the placeholder. It means that we have tables named __FIEF__users in the database. It's unfortunately necessary because Alembic first checks for the actual schema to determine the migration commands it has to generate. But that's not too bad and, who knows, maybe we'll find a way to circumvent that too 😄 (if you have suggestions about this, please tell me!)