How to work with BigQuery in Python

You can read more about how to work with BigQuery in Python at here. Managing Datasets if you want to work with datasets Managing Tables if you want to manipulate tables There is a description in.

Creating a table

Creating a table


from google.cloud import bigquery

#PJ name must be specified when using with Colab
client = bigquery.Client()
client = bigquery.Client(project=project_id) # "your-project"

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]

# table_id = "your-project.your_dataset.your_table_name"
table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.

print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)

Schema specification method when creating a table

Details of the statement can be found at here. , For details on how to specify the schema, see BigQuery official website. Change field_type and mode as needed.

statement


SchemaField(name, field_type, mode='', description=None, fields=(), policy_tags=None)

An example of schema specification


#String(There is a space)
SchemaField(name, 'STRING', mode='NULLABLE')
#integer(There is a space)
SchemaField(name, 'INT64', mode='NULLABLE')
#Floating point(There is a space)
SchemaField(name, 'FLOAT64', mode='NULLABLE')
#date(Mandatory)
SchemaField(name, 'DATE', mode='REQUIRED')
#Date and time(Mandatory)
SchemaField(name, 'DATETIME', mode='REQUIRED')

How to execute a query

Execute query


from google.cloud import bigquery

#PJ name must be specified when using with Colab
client = bigquery.Client()
client = bigquery.Client(project=project_id) # "your-project"

#Describe the query you want to execute
query = '''
select * from `tableID`
where ...
'''

client.query(query)

How to insert

Please note that you can insert up to 10,000 lines at a time. If you want to enter more than 10,000 lines, you can use the method described below.

from google.cloud import bigquery

#PJ name must be specified when using with Colab
client = bigquery.Client()
client = bigquery.Client(project=project_id) # "your-project"

# table_id = "your-project.your_dataset.your_table_name"
table = client.get_table(table_id)  # Make an API request.

#Two-dimensional array of lists
#In the example, it is a tuple, but there is no problem with the list(Below is the case with two columns)
#However, the number of schemas and the tuples inside(or list)Note that an error will occur if the number of elements in
rows_to_insert = [("string", num), ("string", num)]

errors = client.insert_rows(table, rows_to_insert)  # Make an API request.

#When using pandas dataflame
errors = client.insert_rows(table, df.values.tolist())

if errors == []:
    print("New rows have been added.")

Assuming that dataflame is included in df, it can be changed to the two-dimensional array of the list below.

df.values.tolist()

Also, if you want to insert data larger than 10,000 rows, divide it as follows.

rows = len(df)
count = 0
while rows > 0:
  client.insert_rows(table, df[(count * 10000): ((count+1) * 10000)].values.tolist())
  rows = rows - 10000
  count = count + 1

There may be a better way to write it, but it is possible to add everything even if it is larger than 10000 above.

Recommended Posts

How to work with BigQuery in Python
To work with timestamp stations in Python
Connect to BigQuery with Python
[REAPER] How to play with Reascript in Python
How to use tkinter with python in pyenv
How to develop in Python
How to convert / restore a string with [] in python
How to do hash calculation with salt in Python
Explain in detail how to make sounds with python
I want to work with a robot in python.
[Work efficiency] How to change file names in Python
[Python] How to do PCA in Python
Python: How to use async with
How to collect images in Python
How to get started with Python
How to use Mysql in python
How to wrap C in Python
How to use ChemSpider in Python
How to use FTP with Python
How to use PubChem in Python
How to calculate date with python
How to handle Japanese in Python
How to extract any appointment in Google Calendar with Python
How to log in to AtCoder with Python and submit automatically
[Introduction to Python] How to use class in Python?
Try logging in to qiita with Python
How to access environment variables in Python
How to dynamically define variables in Python
How to do R chartr () in Python
[Itertools.permutations] How to put permutations in Python
How to get a stacktrace in python
How to display multiplication table in python
How to extract polygon area in Python
How to do portmanteau test with python
How to check opencv version in python
How to display python Japanese with lolipop
How to switch python versions in cloud9
How to adjust image contrast in Python
How to use __slots__ in Python class
How to dynamically zero pad in Python
How to enter Japanese with Python curses
How to use regular expressions in Python
[Python] How to deal with module errors
How to display Hello world in python
How to use is and == in Python
How to write Ruby to_s in Python
How to install python3 with docker centos
How to deal with python installation error in pyenv (BUILD FAILED)
How to create a heatmap with an arbitrary domain in Python
How to use python put in pyenv on macOS with PyCall
How to display legend marks in one with Python 2D plot
How to calculate "xx time" in one shot with Python timedelta
How to install Python
How to install python
How tuples work | Python
How to upload with Heroku, Flask, Python, Git (4)
How to deal with memory leaks in matplotlib.pyplot
How to use the C library in Python
How to read a CSV file with Python 2/3
How to receive command line arguments in Python
How to enjoy programming with Minecraft (Ruby, Python)