abelcastro.dev

The Django n+1 issue and solutions for it

2023-02-09

DjangoChatGPT

Django ORM - call queryset update() in json field when the key to update is not present

2022-05-31

PostgresDjango

Sync model fields using Postgres triggers

2022-03-22

PostgresDjango
1
...
6
7
8
...
10

Abel Castro 2026 - checkout the source code of this page on GitHub - Privacy Policy

Django is a popular and powerful web framework for Python that has become the go-to choice for many web developers. However, one common performance problem that can occur when using Django is the "n+1" issue. This problem can arise when querying a large amount of data, leading to increased loading times and decreased performance. In this blog post, we'll explore what the n+1 issue is, why it occurs, and how to solve it.

What is the n+1 issue in Django?

The n+1 issue occurs when querying a large amount of data from a database. Consider the following example: you have a model that represents a blog post, and you want to retrieve a list of all blog posts along with the author of each post. To do this, you might run the following code:

posts = BlogPost.objects.all()
for post in posts:
    author = post.author

While this code will retrieve all the information you need, it can be slow and inefficient when dealing with large amounts of data. The reason for this is that it will execute a separate SQL query for each post to retrieve the author information. This can lead to an excessive number of SQL queries and slow down your application.

Why does the n+1 issue occur in Django?

The n+1 issue occurs because Django does not retrieve all the data for a relationship in a single query. Instead, it retrieves data for each relationship as it is needed. In the example above, this means that Django will execute a separate SQL query for each post to retrieve the author information. This can lead to a large number of SQL queries and slow down your application.

How to solve the n+1 issue in Django?

There are several ways to solve the n+1 issue in Django, and the best solution depends on your specific use case. Here are some common solutions:

  1. Use the select_related method

The select_related method is a convenient way to retrieve related data in a single query. You can use it to retrieve the author information for all posts in a single query like this:

posts = BlogPost.objects.all().select_related('author')

for post in posts:
author = post.author
  1. Use the prefetch_related method

The prefetch_related method is similar to select_related, but it is more flexible and can be used for more complex relationships. You can use it to retrieve the author information for all posts in a single query like this:

posts = BlogPost.objects.all().prefetch_related('author')

for post in posts:
author = post.author

Conclusion

The n+1 issue is a common performance problem that can occur when using Django. By understanding what the n+1 issue is, why it occurs, and how to solve it, you can ensure that your Django applications run smoothly and efficiently.

Did you liked this article?

This content was generated by ChatGPT at 9th February 2023 with the following questions:

  • Write a blog post about django n+1 issues and solutions for it
  • Generate a markdown file with that content
  • Write a meta description for this post

The original post had a solution number 3 that looked like this:

Use the in operator: The in operator can be used to retrieve multiple records in a single query. You can use it to retrieve the author information for all posts in a single query like this:

posts = BlogPost.objects.all()
author_ids = [post.author_id for post in posts]
authors = Author.objects.filter(id__in=author_ids)

I told ChatGPT that this is not a solution for the issue and it answered this:

"Thank you for pointing that out. You're correct, using the in operator as I described in Solution 3 is not a direct solution for the n+1 issue. It is more of a workaround that can be used to reduce the number of SQL queries, but it doesn't address the root cause of the problem."

I learnt today that this is not possible with the Django ORM to call update in a queryset json field when the key that you want to update is not present. For example:

queryset.update(json_data__key="something")

This will return FieldDoesNotExist: ModelClass has no field named 'json_data__key' because some objects does not have the key “key”.

I found in this stackoverflow post a great solution for this.

from django.db.models.expressions import Func

class JsonSetValue(Func):
    function = "jsonb_set"
    template = "%(function)s(%(expressions)s, '{\"%(keyname)s\"}','\"%(new_value)s\"', %(create_missing)s)"
    arity = 1

    def __init__(
        self,
        expression: str,
        keyname: str,
        new_value: str,
        create_missing: bool = False,
        **extra,
    ):
        super().__init__(
            expression,
            keyname=keyname,
            new_value=new_value,
            create_missing="true" if create_missing else "false",
            **extra,
        )

With help of JsonSetValue you will be able to do the following:

queryset.update(
    json_data=JsonSetValue(
        "json_data",
        keyname="key",
        new_value="something",
        create_missing=True,
    )
)

The goal:

Sync some model fields between two different models.

Example scenario:

class PlayerData(models.Model):
	goals = models.IntegerField()
	assists = models.IntegerField()
	tackles = models.IntegerField()
	# 10 fields more with player data 


class PlayerMostRelevantData(models.Model):
	player_data_id = models.ForeignKey(PlayerData, on_delete=models.CASCADE)
	goals = models.IntegerField()
	assists = models.IntegerField()

We want to keep the data of PlayerData and PlayerMostRelevantData on sync by every create, update and object deletion.

Possible solutions:

Of course for this task we could use django signals but a more reliable solution would be to make the data synchronisation at database level using Postgres triggers.

Following SQL-Snippet explain what we want to achieve. If you run the statements you will see that inserting a new row in the table called "original" will create a new row with the same content in the table "original_copy".

drop table if exists original;
create table original(id int, name text);
drop table if exists original_copy;
create table original_copy(original_id int PRIMARY KEY, name text);


create or replace function trigger_on_example()
returns trigger language plpgsql as $$
begin
    insert into original_copy (original_id, name)
    values(new.id, new.name)
    ON CONFLICT (original_id)
    DO
        UPDATE SET name = new.name;
	return new;
end
$$;

create trigger trigger_on_example
before insert or update on original
for each row execute procedure trigger_on_example();

insert into original (id, name)
values (1, 'John');

select *
from original;

select *
from original_copy;

The solution

For our example with the PlayerData and PlayerMostRelevantData we can write a django migration like this that creates the Postgres triggers for us in the database.


from django.db import connection, migrations


def create_triggers(apps, schema_editor):
    with connection.cursor() as cursor:
        # player data create or update trigger
        cursor.execute(
            """
            create or replace function trigger_on_player_data_create_or_update()
            returns trigger language plpgsql as $$
            begin
               insert into players_playerdata(player_data_id, goals, assists, tackles)
               values(new.id, new.goals, new.assists, new.tackles)
               ON CONFLICT (player_data_id)
               DO
                   UPDATE SET player_data_id = new.id, goals = new.goals,
                   assists = new.assists;
               return new;
            end
            $$;
            create trigger trigger_on_player_data_create_or_update
            before insert or update on players_playerdata
            for each row execute procedure trigger_on_player_data_create_or_update();
            """
        )

        # player data delete trigger
        cursor.execute(
            """
            create or replace function trigger_on_player_data_delete()
            returns trigger language plpgsql as $$
            begin
               delete from players_playermostrelevantdata where player_data_id=old.id;
               return NULL;
            end
            $$;
            create trigger trigger_on_player_data_delete
            before delete on players_playerdata
            for each row execute procedure trigger_on_player_data_delete();
            """
        )


class Migration(migrations.Migration):

    dependencies = [
        ("players", "0001_initial"),
    ]

    operations = [migrations.RunPython(create_triggers, reverse_code=migrations.RunPython.noop)]


Sources and inspiration:

  • https://www.postgresqltutorial.com/postgresql-upsert/
  • https://stackoverflow.com/a/40901037
  • https://objectpartners.com/2019/01/24/creating-raw-sql-migrations-in-django/