Django ORM - call queryset update() in json field when the key to update is not present
2022-05-31
Django is a popular and powerful web framework for Python that has become the go-to choice for many web developers. However, one common performance problem that can occur when using Django is the "n+1" issue. This problem can arise when querying a large amount of data, leading to increased loading times and decreased performance. In this blog post, we'll explore what the n+1 issue is, why it occurs, and how to solve it.
The n+1 issue occurs when querying a large amount of data from a database. Consider the following example: you have a model that represents a blog post, and you want to retrieve a list of all blog posts along with the author of each post. To do this, you might run the following code:
posts = BlogPost.objects.all()
for post in posts:
    author = post.author
While this code will retrieve all the information you need, it can be slow and inefficient when dealing with large amounts of data. The reason for this is that it will execute a separate SQL query for each post to retrieve the author information. This can lead to an excessive number of SQL queries and slow down your application.
The n+1 issue occurs because Django does not retrieve all the data for a relationship in a single query. Instead, it retrieves data for each relationship as it is needed. In the example above, this means that Django will execute a separate SQL query for each post to retrieve the author information. This can lead to a large number of SQL queries and slow down your application.
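To make the query count concrete, here is a toy simulation in plain Python (the FakeDB class is invented for illustration, it is not real Django): each lazy attribute access costs one query, so N posts cost 1 + N queries, while a batched fetch costs a constant 2.

```python
# Toy simulation of the n+1 pattern: every access that needs related
# data costs one query, so 10 posts cost 1 + 10 = 11 queries in total.
class FakeDB:
    def __init__(self):
        self.query_count = 0

    def fetch_posts(self):
        self.query_count += 1  # one query for the post list
        return [{"id": i, "author_id": i} for i in range(10)]

    def fetch_author(self, author_id):
        self.query_count += 1  # one extra query per post
        return {"id": author_id}

    def fetch_authors(self, author_ids):
        self.query_count += 1  # one batched query for all authors
        return {i: {"id": i} for i in author_ids}


# Naive loop: one query per post, like accessing post.author lazily.
db = FakeDB()
posts = db.fetch_posts()
for post in posts:
    author = db.fetch_author(post["author_id"])
naive_queries = db.query_count  # 1 + 10 = 11

# Batched: fetch all authors at once, like select_related/prefetch_related.
db = FakeDB()
posts = db.fetch_posts()
authors = db.fetch_authors([p["author_id"] for p in posts])
batched_queries = db.query_count  # 2
```

The simulation only counts calls, but it mirrors why the naive loop scales linearly with the number of rows while the batched version stays constant.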
There are several ways to solve the n+1 issue in Django, and the best solution depends on your specific use case. Here are some common solutions:
select_related
The select_related method is a convenient way to retrieve related data in a single query. You can use it to retrieve the author information for all posts in a single query like this:
posts = BlogPost.objects.all().select_related('author')
for post in posts:
    author = post.author
prefetch_related
The prefetch_related method is similar to select_related, but it is more flexible and can be used for more complex relationships such as many-to-many fields. Instead of a SQL JOIN, it fetches the related objects in one additional query and joins them in Python. You can use it to retrieve the author information for all posts in two queries in total like this:
posts = BlogPost.objects.all().prefetch_related('author')
for post in posts:
    author = post.author
The n+1 issue is a common performance problem that can occur when using Django. By understanding what the n+1 issue is, why it occurs, and how to solve it, you can ensure that your Django applications run smoothly and efficiently.
This content was generated by ChatGPT on 9 February 2023.
The original ChatGPT answer had a solution number 3 that looked like this:
Use the in operator: The in operator can be used to retrieve multiple records in a single query. You can use it to retrieve the author information for all posts in a single query like this:
posts = BlogPost.objects.all()
author_ids = [post.author_id for post in posts]
authors = Author.objects.filter(id__in=author_ids)
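That snippet also stops short of joining the authors back to their posts, which has to happen in Python. With plain objects (the Post and Author classes below are invented stand-ins for the real models) the missing step looks like this:

```python
from dataclasses import dataclass


@dataclass
class Author:
    id: int
    name: str


@dataclass
class Post:
    id: int
    author_id: int


posts = [Post(1, 10), Post(2, 20), Post(3, 10)]
authors = [Author(10, "Ada"), Author(20, "Linus")]  # result of the id__in query

# Build an id -> author map once, then resolve each post in memory.
authors_by_id = {author.id: author for author in authors}
post_authors = [authors_by_id[post.author_id] for post in posts]
```

This keeps the query count at two, but you now maintain the join yourself, which is exactly the bookkeeping prefetch_related does for you.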
I told ChatGPT that this is not a solution for the issue and it answered this:
"Thank you for pointing that out. You're correct, using the in operator as I described in Solution 3 is not a direct solution for the n+1 issue. It is more of a workaround that can be used to reduce the number of SQL queries, but it doesn't address the root cause of the problem."
I learnt today that it is not possible with the Django ORM to call update() on a queryset's JSON field when the key that you want to update is not present. For example:
queryset.update(json_data__key="something")
This raises FieldDoesNotExist: ModelClass has no field named 'json_data__key', because update() only accepts actual model field names, not key lookups into a JSON field, and some objects do not have the key "key".
I found a great solution for this in this Stack Overflow post.
from django.db.models.expressions import Func


class JsonSetValue(Func):
    function = "jsonb_set"
    template = "%(function)s(%(expressions)s, '{\"%(keyname)s\"}', '\"%(new_value)s\"', %(create_missing)s)"
    arity = 1

    def __init__(
        self,
        expression: str,
        keyname: str,
        new_value: str,
        create_missing: bool = False,
        **extra,
    ):
        super().__init__(
            expression,
            keyname=keyname,
            new_value=new_value,
            create_missing="true" if create_missing else "false",
            **extra,
        )
With the help of JsonSetValue you will be able to do the following:
queryset.update(
    json_data=JsonSetValue(
        "json_data",
        keyname="key",
        new_value="something",
        create_missing=True,
    )
)
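To make the create_missing flag concrete, here is a small pure-Python emulation of how jsonb_set treats a single missing top-level key (jsonb_set_emulated is an invented helper for illustration, not part of Django or Postgres):

```python
def jsonb_set_emulated(data: dict, keyname: str, new_value, create_missing: bool = False) -> dict:
    """Mimic jsonb_set for a single top-level key.

    If the key is missing and create_missing is false, the document is
    returned unchanged - which is why updates on rows without the key
    are silently skipped unless you pass create_missing=True.
    """
    result = dict(data)
    if keyname in result or create_missing:
        result[keyname] = new_value
    return result


untouched = jsonb_set_emulated({"other": 1}, "key", "something")        # key absent, flag off
created = jsonb_set_emulated({"other": 1}, "key", "something", True)    # key absent, flag on
replaced = jsonb_set_emulated({"key": "old"}, "key", "something")       # key present
```

This mirrors the documented behaviour of Postgres jsonb_set for a one-element path: without create_missing the target is returned unchanged when the key does not exist.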
The goal:
Sync some model fields between two different models.
Example scenario:
class PlayerData(models.Model):
    goals = models.IntegerField()
    assists = models.IntegerField()
    tackles = models.IntegerField()
    # 10 more fields with player data


class PlayerMostRelevantData(models.Model):
    # Column name: player_data_id. The column must be unique for the
    # ON CONFLICT (player_data_id) upsert below; a OneToOneField would
    # enforce that at the model level.
    player_data = models.ForeignKey(PlayerData, on_delete=models.CASCADE, unique=True)
    goals = models.IntegerField()
    assists = models.IntegerField()
We want to keep the data of PlayerData and PlayerMostRelevantData in sync on every create, update and deletion.
Possible solutions:
Of course we could use Django signals for this task, but a more reliable solution is to synchronise the data at the database level using Postgres triggers.
The following SQL snippet explains what we want to achieve. If you run the statements you will see that inserting a new row into the table called "original" creates a new row with the same content in the table "original_copy".
drop table if exists original;
create table original(id int, name text);

drop table if exists original_copy;
create table original_copy(original_id int PRIMARY KEY, name text);

create or replace function trigger_on_example()
returns trigger language plpgsql as $$
begin
    insert into original_copy (original_id, name)
    values (new.id, new.name)
    on conflict (original_id)
    do update set name = new.name;
    return new;
end
$$;

create trigger trigger_on_example
before insert or update on original
for each row execute procedure trigger_on_example();

insert into original (id, name)
values (1, 'John');

select * from original;
select * from original_copy;
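The upsert inside the trigger behaves like assigning into a dict keyed on the primary key. This small Python sketch (with invented names) mirrors the insert ... on conflict do update logic:

```python
def trigger_on_insert_or_update(original_copy: dict, new_id: int, new_name: str) -> None:
    # insert ... on conflict (original_id) do update set name = new.name
    # is an upsert: create the row if the key is new, overwrite otherwise.
    original_copy[new_id] = {"original_id": new_id, "name": new_name}


copy_table = {}
trigger_on_insert_or_update(copy_table, 1, "John")    # plain insert path
trigger_on_insert_or_update(copy_table, 1, "Johnny")  # conflict path: update in place
```

Running the "trigger" twice for the same id leaves exactly one row, updated to the latest name, which is precisely what the ON CONFLICT clause guarantees.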
The solution
For our example with PlayerData and PlayerMostRelevantData we can write a Django migration like this that creates the Postgres triggers for us in the database.
from django.db import connection, migrations


def create_triggers(apps, schema_editor):
    with connection.cursor() as cursor:
        # player data create or update trigger; it must fire AFTER the
        # insert so the new playerdata row already exists when the copy
        # row's foreign key is checked
        cursor.execute(
            """
            create or replace function trigger_on_player_data_create_or_update()
            returns trigger language plpgsql as $$
            begin
                insert into players_playermostrelevantdata (player_data_id, goals, assists)
                values (new.id, new.goals, new.assists)
                on conflict (player_data_id)
                do update set goals = new.goals, assists = new.assists;
                return new;
            end
            $$;

            create trigger trigger_on_player_data_create_or_update
            after insert or update on players_playerdata
            for each row execute procedure trigger_on_player_data_create_or_update();
            """
        )
        # player data delete trigger; a BEFORE trigger must return old,
        # because returning NULL would cancel the delete itself
        cursor.execute(
            """
            create or replace function trigger_on_player_data_delete()
            returns trigger language plpgsql as $$
            begin
                delete from players_playermostrelevantdata where player_data_id = old.id;
                return old;
            end
            $$;

            create trigger trigger_on_player_data_delete
            before delete on players_playerdata
            for each row execute procedure trigger_on_player_data_delete();
            """
        )
class Migration(migrations.Migration):
    dependencies = [
        ("players", "0001_initial"),
    ]

    operations = [
        migrations.RunPython(create_triggers, reverse_code=migrations.RunPython.noop)
    ]
Sources and inspiration: