[PYTHON] Use BigQuery from your local Jupyter Notebook

Introduction

You often want to use BigQuery data from your local Jupyter Notebook for less resource-intensive analytics. Therefore, I will introduce a method for writing a BigQuery query in a local Jupyter Notebook and storing it in a DataFrame as it is. ・ MacOS Mojave ・ Python 3.7.3

Create a virtual environment for Python 3

Create an appropriate directory (here / Users / {username} / BigQueryLocal), create a virtual environment called EVN3 with virtualenv, and enable ENV3.

$ mkdir BigQueryLocal
$ cd BigQueryLocal
$ virtualenv -p python3.7 ENV3
$ source ENV3/bin/activate

gcloud authentication

First, click the URL below. https://cloud.google.com/docs/authentication/getting-started?hl=ja

When you click the above URL, the following screen will appear. Click Go to [Create Service Account Key] page. スクリーンショット 2019-11-19 20.45.05.png

Go to Create Service Account Key, set the service account to ʻApp Engine default service account, set the key type to JSON`, and press Create to download the JSON file.

スクリーンショット 2019-11-19 20.42.12.png

Place the downloaded JSON account key {xxxxxxxxxx} .json directly under / Users / {username} / BigQueryLocal and execute the following.

$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"

Create a Project directory

It's not necessary, but for better visibility, create and move a directory called TestProject.

$ mkdir TestProject
$ cd TestProject

Install the required packages

Install the following minimum required packages with pip.

$ pip install google-cloud-bigquery
$ pip install jupyter
$ pip install pandas

Launch Jupyter Notebook

Start jupyter notebook with the following command.

$ jupyter notebook

After starting, create an appropriate notebook, import the following modules and authenticate the client.

[1]
from google.cloud import bigquery
client = bigquery.Client()

Here, the data of bigquery-public-data.samples.natality prepared by default is used. By executing the following, BigQuery data can be handled by DataFrame.

[2]
sql = """
SELECT
    plurality,
    COUNT(1) AS count,
    year
FROM
    `bigquery-public-data.samples.natality`
WHERE
    NOT IS_NAN(plurality) AND plurality > 1
GROUP BY
    plurality, year
ORDER BY
    count DESC
"""
df = client.query(sql).to_dataframe()
df.head()

Actually, the output is as follows, and it can be confirmed that it is output as a DataFrame. jupyter_sample.png

From the next time onward, you only have to do the following:

$ cd BigQueryLocal
$ source ENV3/bin/activate
$ export GOOGLE_APPLICATION_CREDENTIALS="/Users/{username}/BigQueryLocal/{xxxxxxxxxx}.json"
$ cd TestProject
$ jupyter notebook

reference

See below for environment construction using virtualenv. https://cloud.google.com/python/setup?hl=ja

See below for how to use BigQuery with Jupyter Notebook. https://cloud.google.com/bigquery/docs/visualize-jupyter?hl=ja

Recommended Posts

Use BigQuery from your local Jupyter Notebook
Use BigQuery from python.
Use jupyter notebook by connecting from another host
Use pip with Jupyter Notebook
Use Cython with Jupyter Notebook
How to use Jupyter Notebook
Use markdown with jupyter notebook (with shortcut)
Easy to use Jupyter notebook (Python3.5)
How to use jupyter notebook without polluting your environment with Docker
Use nb extensions with Anaconda's Jupyter notebook
Use apache Spark with jupyter notebook (IPython notebook)
Use Jupyter Lab and Jupyter Notebook with EC2
How to use jupyter notebook with ABCI
Jupyter Notebook Basics of how to use
Use vim keybindings on Docker-launched Jupyter Notebook
Use Django from a local Python script
How to use Jupyter notebook [Super Basic]
Jupyter Notebook memo
Introducing Jupyter Notebook
Powerful Jupyter Notebook
Use Python in your environment from Win Automation
Jupyter notebook password
Jupyter Notebook memo
Jupyter Notebook Settings-How to use (EC2 Amazon Linux 2)
Try Apache Spark on Jupyter Notebook (on local Docker
Run Tensorflow from Jupyter Notebook on Bash on Ubuntu on Windows
Import specific cells from other notebooks with Jupyter notebook
How to instantly launch Jupyter Notebook from the terminal
How to use jupyter lab in Windows 10 local environment
Use Django features from batch scripts outside your Django project
Make Jupyter Notebook your own: Change background and icons
Use thingsspeak from python
Get started Jupyter Notebook
Use fluentd from python
3 Jupyter notebook (Python) tricks
Use MySQL from Python
Run BigQuery from Lambda
Use nim with Jupyter
[Cloud103] # 3 Jupyter Notebook again
Use mecab-ipadic-neologd from python
Specify the cost limit when querying BigQuery casually from Jupyter
I want to use a virtual environment with jupyter notebook!