[PYTHON] Use notebook-type applications to develop customized visualizations (Part 1)

This article is a contribution to Data Visualization Advent Calendar 2015.

Introduction

There are many tools to choose from for data visualization. In addition to applications such as Excel and Tableau, toolkits such as D3 for creating custom visualizations are available as open source software that individuals can use. There is no single correct answer when selecting a tool; basically, you should choose one that is easy to use given your skill set and the type of data. However, since this is a place for programmers, we will focus on visualization done by writing code. In this article, I introduce tools that are useful for programmatic visualization, such as Jupyter Notebook and Beaker Notebook. __In particular, I focus on how to use such environments when you write your own JavaScript code to create a custom visualization.__

Visualization workflow

Visualization work takes many forms, but for relatively small data sets it generally involves the following steps.

  1. Data collection
  2. Processing into a machine-readable state
  3. Analysis
  4. Visualization
  5. Examination of results
  6. Further analysis and visualization as needed (return to #4)

You will usually work in a loop like this. The following reviews these steps from the perspective of tools.

| Task | Tools |
|---|---|
| Data collection | Analog methods such as paper and digital cameras, experimental equipment, programs such as crawlers (for data on the web) |
| Processing into a machine-readable state (cleansing) | Data processing scripts in Python, R, Perl, Node.js, awk, etc. |
| Analysis | Statistical analysis packages for Python/R |
| Visualization | Custom drawing code in JavaScript, visualization libraries for Python/R |

Of course, it is possible to do everything in one programming language, but I think it is often necessary to do the cleansing and analysis in one language and the actual drawing in another, especially when creating a custom visualization. When you need multiple languages and tools like this, you can get by with just a text editor, a terminal, and a browser for checking the results. For exploratory visualization work, however, you repeat each step many times, and the problem is that this setup makes it hard to keep a complete picture of the work.

A notebook-type application is very useful in this case. The concept originated in software for professionals such as Mathematica, as something like a digital version of the [lab notebook](https://ja.wikipedia.org/wiki/%E5%AE%9F%E9%A8%93%E3%83%8E%E3%83%BC%E3%83%88), but being able to mix code with human-readable documents, visualization results, and so on is also very convenient for data analysts. Nowadays, it is widely used not only in science but also in the field of data analysis.

Typical notebook-type applications

Jupyter Notebook

I think this is the best-known open source notebook. It started as IPython Notebook, an application for Python, but some time ago the project changed direction and split it into a notebook application and kernels that execute the actual code. It now supports over 40 programming languages, including Python, R, and Julia.

It has a very high affinity with Python, the language it originally supported, and visualizations from well-known libraries such as matplotlib work without any special setup. But what if you want to develop your own visualization in JavaScript, for example with D3.js?

Create a custom visualization module in Jupyter

(Screenshot: a network diagram rendered with Cytoscape.js in a Jupyter notebook cell)

This screenshot shows a network diagram rendered in a notebook by an embedded visualization module that uses [Cytoscape.js](http://js.cytoscape.org/). In this way, it is possible to embed a third-party visualization library in a cell of your notebook. However, the method is not very sophisticated...

How to embed arbitrary visualizations?

Here, let's look at the case of a [Python package](https://pypi.python.org/pypi/py2cytoscape) that I put together, based on @domitory's prototype, so that it can be installed with pip.

1. Prepare an HTML file containing styles, etc.

First, prepare HTML that can be embedded. Strictly speaking, this is not complete HTML but a jinja2 template that can be rendered from within Jupyter Notebook. In this case, the actual visualization is inserted into the following tag.

<div id="{{uuid}}"></div>
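
The role of the `{{uuid}}` placeholder can be sketched in Python as follows. This is only an illustration: to run without dependencies it uses a small regex-based stand-in (the hypothetical `render` helper) instead of jinja2 itself, and the `widget_width` / `widget_height` style attributes in the template string are assumptions, not the package's actual template. With jinja2 installed, `jinja2.Template(TEMPLATE).render(...)` would play the same role.

```python
import re
import uuid


def render(template, **context):
    """Replace {{ name }} placeholders with values from context.

    Illustrative stand-in for jinja2's Template.render; it is not part of
    Jupyter or py2cytoscape.
    """
    def substitute(match):
        return str(context[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical template: only the id placeholder comes from the article.
TEMPLATE = (
    '<div id="{{uuid}}" '
    'style="width: {{widget_width}}px; height: {{widget_height}}px"></div>'
)

# A fresh id per rendering keeps several widgets in one notebook from colliding.
widget_id = "cy" + str(uuid.uuid4())
html = render(TEMPLATE, uuid=widget_id, widget_width=600, widget_height=400)
print(html)
```

Generating a new id for every rendering is what allows the JavaScript side to find exactly the `div` that belongs to the current cell.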

2. Use RequireJS to load external JavaScript

This is a general problem with JavaScript at present: ES5 has no mechanism for handling external modules cleanly, so IPython Notebook uses RequireJS to support embedding external JavaScript.

if (window['cytoscape'] === undefined) {

    // Location of the external JS library to load
    var paths = {
        cytoscape: 'http://cytoscape.github.io/cytoscape.js/api/cytoscape.js-latest/cytoscape.min'
    };

    require.config({
        paths: paths
    });

    require(['cytoscape'], function (cytoscape) {
        console.log('Loading Cytoscape.js Module...');
        window['cytoscape'] = cytoscape;

        var event = document.createEvent("HTMLEvents");
        event.initEvent("load_cytoscape", true, false);
        window.dispatchEvent(event);
    });
}
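
If you need the same loading pattern for more than one library, the JavaScript can be generated from Python. The following is a minimal sketch under that assumption; `build_loader_js` is a hypothetical helper, not something provided by Jupyter or py2cytoscape, and it omits the custom event dispatched in the code above.

```python
import json


def build_loader_js(module_name, module_url):
    """Return JavaScript that loads one library through RequireJS, once."""
    # json.dumps produces quoted, escaped string literals usable in JS source.
    name = json.dumps(module_name)
    paths = json.dumps({module_name: module_url})
    return (
        "if (window[%s] === undefined) {\n"
        "    require.config({paths: %s});\n"
        "    require([%s], function (lib) {\n"
        "        window[%s] = lib;\n"
        "    });\n"
        "}\n" % (name, paths, name, name)
    )


js = build_loader_js(
    "cytoscape",
    "http://cytoscape.github.io/cytoscape.js/api/cytoscape.js-latest/cytoscape.min",
)
print(js)
```

In a notebook, passing the returned string to `IPython.display.Javascript` and displaying it should run the loader in the browser, though that part is not exercised here.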

3. Write Python code to pass the data to it

Finally, write the code that passes the data from the Python side to the prepared JS and HTML template. On the Python side, the data has to be converted into a form that the JavaScript code can interpret (JSON here), and then the template is rendered with it.

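import json
import uuid

from IPython.display import HTML, display

# `template` is the jinja2 template prepared in step 1; nodes, edges, style,
# and the other variables are assumed to be defined earlier.
cyjs_widget = template.render(
  nodes=json.dumps(nodes),
  edges=json.dumps(edges),
  background=background,
  uuid="cy" + str(uuid.uuid4()),
  widget_width=str(width),
  widget_height=str(height),
  layout=layout_algorithm,
  style_json=json.dumps(style)
)

# Show the rendered HTML in the output area of the current cell
display(HTML(cyjs_widget))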
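
To make the "form that the JavaScript code can interpret" concrete, here is a small sketch of preparing `nodes` and `edges` before they are passed to the template. It assumes the Cytoscape.js element format, in which each element carries a `data` object with an `id` (plus `source` and `target` for edges); the helper name `to_cytoscape_elements` and the sample edge list are made up for illustration.

```python
import json


def to_cytoscape_elements(edge_list):
    """Convert (source, target) pairs into Cytoscape.js-style nodes and edges."""
    # Collect every node id that appears at either end of an edge.
    node_ids = sorted({node for pair in edge_list for node in pair})
    nodes = [{"data": {"id": node_id}} for node_id in node_ids]
    edges = [
        {"data": {"id": "%s-%s" % (source, target),
                  "source": source,
                  "target": target}}
        for source, target in edge_list
    ]
    return nodes, edges


nodes, edges = to_cytoscape_elements([("a", "b"), ("b", "c")])

# json.dumps turns the Python lists into text that can be embedded in the template.
nodes_json = json.dumps(nodes)
edges_json = json.dumps(edges)
print(nodes_json)
print(edges_json)
```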

As this shows, the current Jupyter Notebook was not originally designed for mixing multiple languages or for building custom visualizations on the spot. I think it is better suited to using an existing visualization module in a cell than to loading external JS libraries and building a visualization by trial and error in the cells.

The Jupyter project is currently growing with large grants from various sponsors, so the extension mechanisms in this area are likely to improve in the future.

Beaker Notebook

(Screenshot: Beaker Notebook)

A notebook for polyglot data analysis

Jupyter / IPython Notebook is a very powerful tool, but at present it offers no way to mix multiple languages in one notebook or to exchange data between languages. Also, because there is no mechanism for easily running JS against arbitrary HTML, the work described above is required whenever you want to use your own visualization module rather than one of the ready-made ones (matplotlib, Bokeh, etc.). Beaker is a notebook-type application with mechanisms that solve these problems.

Difference from Jupyter

The biggest difference from Jupyter is that __Jupyter ties each notebook to a single kernel and manages things as one language per notebook, whereas Beaker manages this cell by cell.__ Therefore, you can do the following on the same notebook:

Specifically, there is a standard mechanism for exchanging data between cells through a shared object called beaker. For example, a value assigned in Python:

beaker.mydata = "My sample data"

It can be accessed from R:

beaker::get('mydata')

It is just as easy to use from JavaScript:

var myJsData = beaker.mydata + " updated by JS";

With this, using only standard features, you can, for example, read a CSV file with pandas in Python, convert it to a dictionary, pass it as-is to a JavaScript cell through the beaker object, and use it for drawing.
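
The Python half of that hand-off might look like the sketch below. Several things here are assumptions made so that it runs anywhere: the standard `csv` module is swapped in for pandas (with pandas, `pd.read_csv(...).to_dict(orient="records")` gives a similar list of dictionaries), the `beaker` object is replaced by a stand-in, and the CSV content and the `edges` attribute name are invented for illustration.

```python
import csv
import io
import json
from types import SimpleNamespace

# Outside Beaker there is no `beaker` object, so this stand-in lets the sketch run.
# Inside a Beaker Python cell, drop this line and use the real object instead.
beaker = SimpleNamespace()

# Hypothetical sample data; in practice this would come from a CSV file.
CSV_TEXT = "source,target,weight\na,b,1.5\nb,c,2.0\n"

# csv.DictReader yields one mapping per row, keyed by the header line.
rows = [dict(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]
for row in rows:
    row["weight"] = float(row["weight"])  # csv reads every field as a string

# Plain lists and dicts keep the data JSON-like, which is easy to consume from JS.
beaker.edges = rows
print(json.dumps(beaker.edges))
```

A JavaScript cell would then read the same data as `beaker.edges`, following the pattern shown above for `beaker.mydata`.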

The following is an example of preparing data in Python and drawing with JavaScript code using Cytoscape.js in the embedded HTML cell:

(Screenshot: data prepared in Python and drawn with Cytoscape.js in an embedded HTML cell in Beaker)

For this reason, this application is recommended when __you want to use Python for data processing and R for statistical computation, but draw the data mainly with D3.js__, because all of these steps can be done in one notebook.

Summary

This time I introduced custom module embedding in Jupyter and gave an overview of Beaker Notebook. In the second part, I will look at actual work with Beaker.
