How to do Bulk Update with PyMySQL and notes [Python]

Conclusion

INSERT ... ON DUPLICATE KEY UPDATE syntax

You can use the execute many method in queries that insert data at once, such as VALUES.

conn = pymysql.connect(
    mysql_endpoint,
    user=username,
    passwd=password,
    db=dbname
)

def bulk_insert_and_update_users():
    prepared_statement = [
        [
            1, # id
            'Qiita Taro', # name
            20 # age
        ],
        [
            2,
            'Qiita Hanako',
            18
        ]
    ]

    with conn.cursor() as cursor:
        sql = (
            'INSERT INTO users '
            '(id, name, age) '
            'VALUES (%s, %s, %s) '
            'ON DUPLICATE KEY UPDATE '
            'name = VALUES(name), '
            'age = VALUES(age)'
        )
        cursor.executemany(sql, prepared_statement)

    conn.commit()

Doing this will generate a query like this:

INSERT INTO users (id, name, age)
VALUES 
  (`1`, `Qiita Taro`, `20`),
  (`2`, `Qiita Hanako`, `18`)
ON DUPLICATE KEY UPDATE
  name = VALUES(name),
  age = VALUES(age);

It's simple and easy to understand.

ELT & FIELD

If you don't do INSERT or if you are worried about the problem of auto_increment due to INSERT ... ON DUPLICATE KEY UPDATE, you should use this. (Reference) In that case, there are some points to be aware of, so let's take a look.

conn = pymysql.connect(
    mysql_endpoint,
    user=username,
    passwd=password,
    db=dbname
)

def bulk_update_users():
    records = [
        {
            'user_id': 1,
            'user_name': 'Qiita Taro',
            'user_age': 20
        },
        {
            'user_id': 2,
            'user_name': 'Qiita Hanako',
            'user_age': 18
        }
    ]

    id_list = []
    name_list = []
    age_list = []
​
    for record in records:
        id_list.append(str(record['user_id']))
        name_list.append(record['user_name'])
        age_list.append(record['user_age'])
​
    id_strings = ','.join(['%s'] * len(id_list))
    name_strings = ','.join(['%s'] * len(name_list))
    age_strings = ','.join(['%s'] * len(age_list))
​
    sql = (
        'UPDATE users SET '
        'name = '
        'ELT(FIELD(id, %(user_ids)s), %(user_names)s), '
        'age = '
        'ELT(FIELD(id, %(user_ids)s), %(user_ages)s) '
        'WHERE id IN (%(user_ids)s);'
        '' % dict(user_ids=id_strings,
                  user_names=name_strings,
                  user_ages=age_strings)
    )
​
    prepared_statement = tuple(id_list) \
        + tuple(name_list) \
        + tuple(id_list) \
        + tuple(age_list) \
        + tuple(id_list)
​
    with conn.cursor() as cursor:
        cursor.execute(sql, prepared_statement)

    conn.commit()

This is the characteristic part.

id_list = []
name_list = []
age_list = []
​
for record in records:
    id_list.append(str(record['user_id']))
    name_list.append(record['user_name'])
    age_list.append(record['user_age'])
​
id_strings = ','.join(['%s'] * len(id_list))
name_strings = ','.join(['%s'] * len(name_list))
age_strings = ','.join(['%s'] * len(age_list))
​
sql = (
    'UPDATE users SET '
    'name = '
    'ELT(FIELD(id, %(user_ids)s), %(user_names)s), '
    'age = '
    'ELT(FIELD(id, %(user_ids)s), %(user_ages)s) '
    'WHERE id IN (%(user_ids)s);'
    '' % dict(user_ids=id_strings,
              user_names=name_strings,
              user_ages=age_strings)
)

When you do this, the variable sql will contain a string like this:

UPDATE users SET 
  name = ELT(FIELD(id, %s, %s), %s, %s), 
  age = ELT(FIELD(id, %s, %s), %s, %s)
WHERE id IN (%s, %s);

This time, I wrote it for the sake of clarity, but in reality, the numbers of % s are all the same, so there is no problem even if you generate a character string from the number of elements of records and reuse it.

And all the parameters are combined in the order of insertion with the% operator.

prepared_statement = tuple(id_list) \
    + tuple(name_list) \
    + tuple(id_list) \
    + tuple(age_list) \
    + tuple(id_list)

# (1, 2, `Qiita Taro`, `Qiita Hanako`, 1, 2, 20, 18, 1, 2)

By passing this to the argument of conn.execute, the following query will be executed.

UPDATE users SET 
  name = ELT(FIELD(id, `1`, `2`), `Qiita Taro`, `Qiita Hanako`), 
  age = ELT(FIELD(id, `1`, `2`), `20`, `18`)
WHERE id IN (`1`, `2`);

Why do you do such a troublesome thing?

If you just want to make a query, it's easy to concatenate strings. However, this is vulnerable to SQL injection.

For example, if the user's name is set to "'; DROP TABLE users;'" , there is no protection.

Bad example


id_list = []
name_list = []
age_list = []
​
for record in records:
    id_list += [f"`{str(record['user_id'])}`"]
    name_list += [f"`{record['user_name']}`"]
    age_list += [f"`{record['user_age']}`"]

id_list_s = ",".join(id_list)
name_list_s = ",".join(name_list)
age_list_s = ",".join(age_list)

sql = (
    'UPDATE users SET '
    'name = '
    f'ELT(FIELD(id, {id_list_s}), {name_list_s}), '
    'age = '
    f'ELT(FIELD(id,{id_list_s}), {age_list_s}) '
    f'WHERE id IN ({id_list_s});'
)

It is possible to replace ' with \' by replace etc., but since it escapes on the pymysql side when execute is executed, it is better to insert the parameter obediently with a prepared statement. It will be good. (Reference)

bonus

To check the SQL statement actually executed by pymysql, it is saved in self.executed after execute, so you can check it by accessing this property.

with conn.cursor() as cursor:
    sql = "SELECT * FROM users WHERE id = %s"
    cursor.execute(sql, (1))
    print(cursor._executed)

# SELECT * FROM users WHERE id = 1

Reference material

PyMySQL 10.5.5 MySQLCursor.executemany() Method The story of a counter stop when using ON DUPLICATE KEY UPDATE in MySQL imploding a list for use in a python MySQLDB IN clause Python prepared statement. SELECT IN problem

Recommended Posts

How to do Bulk Update with PyMySQL and notes [Python]
How to do portmanteau test with python
How to update Python Tkinter to 8.6
How to run tests in bulk with Python unittest
Python: How to use async with
To do tail recursion with Python2
How to get started with Python
What to do with PYTHON release?
How to use FTP with Python
How to calculate date with python
How to update FC2 blog etc. using XMLRPC with python
How to log in to AtCoder with Python and submit automatically
How to import CSV and TSV files into SQLite with Python
How to make a surveillance camera (Security Camera) with Opencv and Python
How to deal with errors when installing Python and pip with choco
How to use Python with Jw_cad (Part 2 Command explanation and operation)
How to build Python and Jupyter execution environment with VS Code
How to do arithmetic with Django template
How to package and distribute Python scripts
How to install and use pandas_datareader [Python]
How to do R chartr () in Python
How to update Google Sheets from Python
Fractal to make and play with Python
How to work with BigQuery in Python
How to display python Japanese with lolipop
python: How to use locals () and globals ()
How to enter Japanese with Python curses
[Python] How to calculate MAE and RMSE
How to use Python zip and enumerate
[Python] How to deal with module errors
How to use is and == in Python
How to install python3 with docker centos
What to do if ipython and python start up with different versions
[Introduction to Python] How to judge authenticity with if statement (True and None)
How to get the date and time difference in seconds with python
How to query BigQuery with Kubeflow Pipelines and save the result and notes
How to upload with Heroku, Flask, Python, Git (4)
How to read a CSV file with Python 2/3
Scraping tabelog with python and outputting to CSV
How to enjoy programming with Minecraft (Ruby, Python)
[REAPER] How to play with Reascript in Python
How to generate permutations in Python and C ++
Strategy on how to monetize with Python Java
Convert PDFs to images in bulk with Python
[Python] How to draw multiple graphs with Matplotlib
[Python] How to read excel file with pandas
How to crop an image with Python + OpenCV
[Python] How to read data from CIFAR-10 and CIFAR-100
How to convert SVG to PDF and PNG [Python]
How to specify attributes with Mock of python
How to measure execution time with Python Part 1
How to use tkinter with python in pyenv
[Python] How to use hash function and tuple.
[TF] How to specify variables to update with Optimizer
[Python] How to handle Japanese characters with openCV
[Python] How to compare datetime with timezone added
How to plot autocorrelation and partial autocorrelation in python
How to measure execution time with Python Part 2
How to get started with the 2020 Python project (windows wsl and mac standardization)
How to use Service Account OAuth and API with Google API Client for python
Python learning notes for machine learning with Chainer Chapters 11 and 12 Introduction to Pandas Matplotlib