How to format a list of dictionaries (or instances) well in Python

When developing a web application, it is common to implement a process that formats the result of executing a SQL query so that it matches the response of the API.

The data returned by a SQL query is usually a list of instances defined by the driver used for the DB connection, and I suspect many people find formatting that data surprisingly tedious. (The processing itself is mostly simple, which is exactly why it feels that way.)

In this article, I look at how to implement this kind of data formatting efficiently.


For example, suppose the following data is obtained from the DB, and consider formatting it.


data_list = [
  {
    'user_id': 1,
    'group_name': 'GroupA',
    'user_name': 'user1',
    'email': '[email protected]'
  },
  {
    'user_id': 2,
    'group_name': 'GroupB',
    'user_name': 'user2',
    'email': '[email protected]'
  },
  {
    'user_id': 3,
    'group_name': 'GroupB',
    'user_name': 'user3',
    'email': '[email protected]'
  },
  {
    'user_id': 4,
    'group_name': 'GroupA',
    'user_name': 'user4',
    'email': '[email protected]'
  },
  {
    'user_id': 5,
    'group_name': 'GroupA',
    'user_name': 'user5',
    'email': '[email protected]'
  }
]

The expected result is the same data grouped by group_name, in the format shown below.

{
  "GroupA": [
    {
      "user_id": 1,
      "user_name": "user1",
      "email": "[email protected]"
    },
    {
      "user_id": 4,
      "user_name": "user4",
      "email": "[email protected]"
    },
    {
      "user_id": 5,
      "user_name": "user5",
      "email": "[email protected]"
    }
  ],
  "GroupB": [
    {
      "user_id": 2,
      "user_name": "user2",
      "email": "[email protected]"
    },
    {
      "user_id": 3,
      "user_name": "user3",
      "email": "[email protected]"
    }
  ]
}

I examined the following two patterns for implementing the formatting process.

Pattern 1

I think the simplest approach is to add the records one by one in a for statement, as shown below.

# The argument is the data obtained from the DB
def sample1(data_list):

    result_dict = {}

    for data in data_list:

        group_name = data.get('group_name')

        # Handle a group_name that has not been registered yet
        if group_name not in result_dict:
            result_dict[group_name] = []

        # Build a dictionary without group_name and add it to the list
        result_dict[group_name].append({key: value for key, value in data.items() if key != 'group_name'})

    return result_dict

Pattern 2

This formatting process can also be implemented by using reduce.

from functools import reduce

# The argument is the data obtained from the DB
def sample2(data_list):

    def accumulate(total, target):

        group_name = target.get('group_name')

        # Handle a group_name that has not been registered yet
        if group_name not in total:
            total[group_name] = []

        # Build a dictionary without group_name and add it to the list
        total[group_name].append({key: value for key, value in target.items() if key != 'group_name'})

        # The returned dictionary becomes total on the next call
        return total

    return reduce(accumulate, data_list, {})

To briefly explain this implementation: reduce takes a function as its first argument, the data as its second argument, and an optional initial value as its third argument. Here, the data formatting function (accumulate), the data obtained from the DB (data_list), and an empty dictionary as the initial value are passed. When accumulate is called for the first time, total receives the empty dictionary and target receives the first element of data_list. From the second call onward, total receives the return value of the previous call.
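
The call order described above can be checked with a small trace. This is only an illustrative sketch: the `calls` list and the doubling logic are made up for the demonstration and are not part of the formatting process in this article.

```python
from functools import reduce

# Snapshots of (total, target) taken at each call (illustration only)
calls = []

def accumulate(total, target):
    # Copy total before modifying it so the snapshot shows the value passed in
    calls.append((list(total), target))
    total.append(target * 2)
    # The return value becomes total on the next call
    return total

# Function, data, and an empty list as the initial value
result = reduce(accumulate, [1, 2, 3], [])

for total_before, target in calls:
    print(f"total={total_before}, target={target}")
# total=[], target=1
# total=[2], target=2
# total=[2, 4], target=3

print(result)  # [2, 4, 6]
```

The first call receives the initial value, and each later call receives whatever the previous call returned.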


The advantage of pattern 1 is that it can implement any formatting process. The disadvantage, I think, is that it has to be written again every time a formatting process like this one is needed (low reusability).

Pattern 2, on the other hand, may hurt readability when the processing is complicated, but it has the advantage that the formatting process can be shared by dynamically changing the column name that the data formatting function refers to.
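
As a rough sketch of what changing the column name dynamically could look like, the grouping column can be passed in as an argument. The function name `group_by`, the parameter `key_name`, and the sample `rows` are assumptions made for this example. They do not appear in the code above.

```python
from functools import reduce

def group_by(data_list, key_name):
    """Group dictionaries by key_name, dropping that key from each item."""

    def accumulate(total, target):
        group_key = target.get(key_name)

        # Create the list the first time this group value appears
        if group_key not in total:
            total[group_key] = []

        # Build a dictionary without the grouping column and add it to the list
        total[group_key].append(
            {key: value for key, value in target.items() if key != key_name}
        )
        return total

    return reduce(accumulate, data_list, {})

# Hypothetical sample rows (no email column, to keep the example short)
rows = [
    {'user_id': 1, 'group_name': 'GroupA', 'user_name': 'user1'},
    {'user_id': 2, 'group_name': 'GroupB', 'user_name': 'user2'},
    {'user_id': 3, 'group_name': 'GroupA', 'user_name': 'user3'},
]

by_group = group_by(rows, 'group_name')
by_user = group_by(rows, 'user_name')
```

The same function then serves any query result that needs grouping by one column. A missing column ends up under the key None, because get is used, just as in the samples above.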

I was also concerned that using reduce might cause a speed problem, so just in case, I measured the time each pattern takes to format 10000000 records. * Measured in Jupyter Notebook

%timeit sample1(data_list)
11.6 s ± 211 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sample2(data_list)
12.3 s ± 290 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The results above show that the implementation using reduce is slightly slower, but the difference is under 1 second (about 0.7 seconds) for 10000000 records, so I think speed is not much of a concern.
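
Outside a notebook, a similar comparison could be sketched with the standard timeit module. Everything here is an assumption for illustration: the names `with_for` and `with_reduce`, the synthetic data, and the much smaller size of 10000 records. It is not the measurement reported above, and the absolute numbers will differ by machine.

```python
import timeit
from functools import reduce

def with_for(data_list):
    # Same approach as pattern 1
    result = {}
    for data in data_list:
        group_name = data.get('group_name')
        if group_name not in result:
            result[group_name] = []
        result[group_name].append(
            {key: value for key, value in data.items() if key != 'group_name'}
        )
    return result

def with_reduce(data_list):
    # Same approach as pattern 2
    def accumulate(total, target):
        group_name = target.get('group_name')
        if group_name not in total:
            total[group_name] = []
        total[group_name].append(
            {key: value for key, value in target.items() if key != 'group_name'}
        )
        return total
    return reduce(accumulate, data_list, {})

# Synthetic records split across two groups (hypothetical data)
data = [
    {'user_id': i, 'group_name': f'Group{i % 2}', 'user_name': f'user{i}'}
    for i in range(10_000)
]

# Take the best of 3 repeats, each running the function 10 times
for_time = min(timeit.repeat(lambda: with_for(data), number=10, repeat=3))
reduce_time = min(timeit.repeat(lambda: with_reduce(data), number=10, repeat=3))

print(f"for:    {for_time:.4f} s per 10 runs")
print(f"reduce: {reduce_time:.4f} s per 10 runs")
```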

In conclusion, I think a realistic policy is to use reduce wherever possible so that the processing can be shared, and to fall back to the for statement when the processing is too complicated.
