[PYTHON] [Introduction to WordCloud] It's easy to use even with Jetson-nano ♬

I had come across WordCloud before, and when I finally tried it, it turned out to be easy, so I am writing it up as an article. It produces images like these. 4-yosino_名.png 5-yosino_名.png

What I did

・ Environment
・ The simplest example
・ Try a little
・ Think about usage

・ Environment

This time I used a Jetson Nano, so the base environment is the one described in the linked article, that is, an Ubuntu environment. Normally, WordCloud can be installed with:

$ pip3 install wordcloud

However, this failed with errors on my machine, so I installed it with sudo instead:

$ sudo pip3 install wordcloud

For Japanese text, you also need to install MeCab (or a similar tool) to split the text into words and tag parts of speech. In addition, I installed Japanese fonts as described in Reference ② below. First, download Noto Sans CJK JP from the link, then run:

$ unzip NotoSansCJKjp-hinted.zip
$ mkdir -p ~/.fonts
$ cp *otf ~/.fonts
$ fc-cache -f -v # optional

【reference】
① amueller/word_cloud
② [Note] Create a Japanese word cloud

・ The simplest example

Judging from the reference code below, WordCloud places words inside a given area, sizing each word according to its frequency and choosing its orientation at random.
【reference】
・ word_cloud/wordcloud/wordcloud.py
The simplest usage code is as follows.

from MeCab import Tagger
import matplotlib.pyplot as plt
from wordcloud import WordCloud

t = Tagger()

text = "On the 25th, Meijo University (Nagoya City) awarded the title of \"Special Honorary Professor\" to Akira Yoshino (72), a professor at the same university who won the Nobel Prize in Chemistry for the development of lithium-ion batteries and an honorary fellow of Asahi Kasei. Mr. Yoshino has been a professor at the Graduate School of Science and Engineering since 2017, and is in charge of lectures once a week. According to Meijo University, the special honorary professor is a title to honor faculty members who have won the Nobel Prize. It was founded in 2014 when Isamu Akasaki, a tenured professor, and Hiroshi Amano, a former professor, won the Nobel Prize in Physics for the development of blue light emitting diodes (LEDs)."

splitted = " ".join([x.split("\t")[0] for x in t.parse(text).splitlines()[:-1]])
print("1",splitted)
wc = WordCloud(font_path="/home/muauan/.fonts/NotoSansCJKjp-Regular.otf")
wc.generate(splitted)
plt.axis("off")
plt.imshow(wc)
plt.pause(1)
plt.savefig('./output_images/yosino0_{}.png'.format(text[0])) 
plt.close()

If you change the t = Tagger() line of this code (and the line that builds splitted) and run it, you get the images in the Word Cloud column of the second table below. The item numbers in the two tables correspond, from the top.

Item number, dictionary setting, and code for word separation & part-of-speech deletion:

Item 1
t = Tagger()
splitted = " ".join([x.split("\t")[0] for x in t.parse(text).splitlines()[:-1]])

Item 2
t = Tagger(" -d " + args.dictionary)
splitted = " ".join([x.split("\t")[0] for x in t.parse(text).splitlines()[:-1]])

Item 3
t = Tagger(" -d " + args.dictionary)
splitted = " ".join([x.split("\t")[0] for x in t.parse(text).splitlines()[:-1] if x.split("\t")[1].split(",")[0] not in ["助詞", "助動詞", "副詞", "連体詞", "動詞"]])
(MeCab's IPA-style dictionaries report parts of speech in Japanese: 助詞 = particle, 助動詞 = auxiliary verb, 副詞 = adverb, 連体詞 = adnominal, 動詞 = verb.)
Item number, dictionary, result of word separation & part-of-speech deletion, and Word Cloud:

Item 1
Dictionary: default dictionary
Result: On the 25th, Meijo University (Nagoya City) awarded the title of "Special Honorary Professor" to Asahi Kasei Honorary Fellow Akira Yoshino (72), a professor at the same university who received the Nobel Prize in Chemistry for the development of lithium-ion batteries. Mr. Yoshino has been a professor at the Graduate School of Science and Engineering since 2017, and is in charge of lectures once a week. According to Meijo University, the special honorary professor is a title to honor the faculty members who received the Nobel Prize. It was founded in 2014 when Isamu Akasaki, a tenured professor, and Hiroshi Amano, a former professor, received the Nobel Prize in Physics for the development of blue light emitting diodes (LEDs).
Word Cloud: yosino1_Name.png

Item 2
Dictionary: neologd
Result: On the 25th, Meijo University (Nagoya City) awarded the title of "Special Honorary Professor" to Asahi Kasei Honorary Fellow Akira Yoshino (72), a professor at the same university who received the Nobel Prize in Chemistry for the development of lithium-ion batteries. Mr. Yoshino has been a professor at the Graduate School of Science and Engineering since 2017, and is in charge of lectures once a week. According to Meijo University, the special honorary professor is a title to honor the faculty members who received the Nobel Prize. It was founded in 2014 when Isamu Akasaki, a tenured professor, and Hiroshi Amano, a former professor, received the Nobel Prize in Physics for the development of blue light emitting diodes (LEDs).
Word Cloud: yosino2_Name.png

Item 3
Dictionary: neologd + deletion of particles, auxiliary verbs, etc.
Result: Meijo University (Nagoya City) Received the Nobel Prize in Chemistry for Lithium Ion Battery Development on the 25th. Professor of the same university Asahi Kasei Honorary Fellow Akira Yoshino (72) Awarded the title of "Special Honorary Professor". Mr. Yoshino, Professor, Graduate School of Science and Engineering, 2017, lecture once a week. Meijo University, Special Honorary Professor Nobel Prize Winner Title for faculty member. Tenured professor Isamu Akasaki Former professor Hiroshi Amano, Blue light emitting diode (LED) development Nobel Prize in Physics was awarded.
Word Cloud: yosino3_Name.png
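
The list comprehensions in the table pack several steps into one line. As a rough sketch of what they do, the following runs the same split-and-filter logic on a hand-written sample in the tab-separated format that MeCab prints with IPA-style dictionaries: the surface form, a tab, then comma-separated features whose first field is the part of speech, followed by a final EOS line. The sample string and the helper name surfaces are illustrative assumptions, not output captured from the setup used in this article.

```python
# Hand-written sample imitating MeCab's default output format (an assumption
# for illustration; real output depends on the installed dictionary).
SAMPLE_PARSE = (
    "吉野\t名詞,固有名詞,人名,姓,*,*,吉野,ヨシノ,ヨシノ\n"
    "さん\t名詞,接尾,人名,*,*,*,さん,サン,サン\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "教授\t名詞,一般,*,*,*,*,教授,キョウジュ,キョージュ\n"
    "です\t助動詞,*,*,*,特殊・デス,基本形,です,デス,デス\n"
    "EOS\n"
)

def surfaces(parsed, exclude_pos=()):
    """Return surface forms joined by spaces, skipping excluded parts of speech."""
    words = []
    for row in parsed.splitlines()[:-1]:      # [:-1] drops the trailing EOS line
        surface, features = row.split("\t")   # surface form, then the feature string
        pos = features.split(",")[0]          # first feature is the part of speech
        if pos not in exclude_pos:
            words.append(surface)
    return " ".join(words)

all_words = surfaces(SAMPLE_PARSE)
content_words = surfaces(SAMPLE_PARSE, exclude_pos=("助詞", "助動詞"))
print(all_words)
print(content_words)
```

With a real Tagger, the parsed argument would be the string returned by t.parse(text).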

・ Try a little

Code

For now, I have put the code that generates a WordCloud from text typed on the keyboard at the following location. WordCloud/wc_input_original.py

Introducing stop words

Above, unneeded words were removed using the parts of speech assigned by MeCab. Even so, this is not enough to leave only the words that characterize the text, and I wanted to reduce the words further so that the result has more impact. So I introduce stop words to delete specific words and strings. The functions that do this are shown below.
【reference】
③ Review tendency of highly rated ramen shops in TF-IDF (natural language processing, TF-IDF, Mecab, wordcloud, morphological analysis, word-separation)
④ slothlib --Revision 77: /CSharp/Version1/SlothLib/NLP/Filter/StopWord/word Japanese.txt (http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/)
Put the above Japanese.txt in the working directory and append any strings you want to delete to that file; the exclude_stopword() function below then removes them.

# Read the stop-word file (one word per line) given on the command line
stop_words = []
if args.stop_words:
    with open(args.stop_words, "r", encoding="utf-8") as f:
        for line in f:
            stop_words.append(line.strip())
    print(stop_words)

# Convert a list of tokens to a space-separated string
def join_list_str(tokens):
    return ' '.join(tokens)

# Remove empty tokens and stop words from a space-separated string
def exclude_stopword(text):
    changed_text = [token for token in text.lower().split(" ")
                    if token != "" and token not in stop_words]
    # The comprehension yields a list, so convert it back to a space-separated string
    changed_text = join_list_str(changed_text)
    return changed_text
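
As a small self-contained illustration of the same idea, the sketch below filters a space-separated string against a collection of stop words. Unlike the script above, it takes the stop words as an argument and stores them in a set for faster lookups. The function name remove_stop_words and the sample words are made up for this example and are not taken from Japanese.txt.

```python
def remove_stop_words(text, stop_words):
    """Drop empty tokens and stop words from a space-separated string."""
    stop_set = set(stop_words)  # set membership tests are O(1) on average
    kept = [tok for tok in text.lower().split(" ") if tok and tok not in stop_set]
    return " ".join(kept)

# Hypothetical sample data; note the double space, which yields an empty token
sample = "Meijo University  professor LED the title"
result = remove_stop_words(sample, ["the", "title"])
print(result)
```

Because the text is lower-cased before the comparison, stop words written with capital letters would never match; keeping the stop-word file in lower case avoids that surprise.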

Customizing WordCloud

The function that generates the WordCloud is shown below. It is based on Reference ⑤ below together with References ③ and ①.
【reference】
⑤ Create Wordcloud with masked image
In short:
・ First, define the Japanese font
・ The argument sk is used to distinguish the output file names
・ The argument imgpath is the file path of the mask image when the mask function is used
・ When a mask is used, the first if branch is executed
・ When no mask is used, the else branch is executed. There, the WordCloud arguments are almost all left at their defaults and can be redefined (each item is explained in the bonus section)

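import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

fpath="/home/muauan/.fonts/NotoSansCJKjp-Regular.otf"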

def get_wordcrowd_color_mask(sk, text, imgpath ):
    plt.figure(figsize=(6,6), dpi=200)
    if imgpath != "":
        img_color = np.array(Image.open( imgpath ))
        image_colors = ImageColorGenerator(img_color)
        wc = WordCloud(width=400,
                   height=300,
                   font_path=fpath,
                   mask=img_color,
                   collocations=False, #Don't duplicate words
                  ).generate( text )
        plt.imshow(wc.recolor(color_func=image_colors), #Use the color of the original image
               interpolation="bilinear")
    else:
        #wc = WordCloud(font_path=fpath, regexp="[\w']+").generate( text )
        wc = WordCloud(font_path=fpath, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling='auto', regexp=r"\w[\w']+" , collocations=True,
                 colormap=None, normalize_plurals=True, contour_width=0,
                 contour_color='black', repeat=False,
                 include_numbers=False, min_word_length=0).generate(text)
        plt.imshow(wc)
    # show
    plt.axis("off")
    plt.tight_layout()
    plt.pause(1)
    plt.savefig('./output_images/{}-yosino_{}.png'.format(sk,text[0])) 
    plt.close()
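
One practical point: plt.savefig() does not create missing directories, so the function above fails if ./output_images does not exist yet. A minimal sketch of a helper that builds the same file name and creates the directory first is shown here. The helper name output_path and its out_dir parameter are assumptions added for illustration and are not part of the original script.

```python
import os

def output_path(sk, text, out_dir="./output_images"):
    """Build '<out_dir>/<sk>-yosino_<first character of text>.png', creating out_dir if needed."""
    os.makedirs(out_dir, exist_ok=True)  # no error if the directory already exists
    return os.path.join(out_dir, "{}-yosino_{}.png".format(sk, text[0]))
```

Inside get_wordcrowd_color_mask, the save call could then read plt.savefig(output_path(sk, text)).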

Generation result

The images are generated with the following code, which uses the functions above. The text to be turned into a WordCloud is pasted at the prompt and read into the variable line. To see the effect of the stop words and the part-of-speech filter, I processed the text in three steps. Only the last step is shown here; the others are in the bonus section, so the effect is clear when you compare them.

# Parts of speech to drop. MeCab's IPA-style dictionaries report them in Japanese:
# particle, auxiliary verb, adverb, adnominal, conjunction, verb, symbol
exclude_pos = ["助詞", "助動詞", "副詞", "連体詞", "接続詞", "動詞", "記号"]

while True:
    line = input("> ")
    if not line:
        break
    splitted = " ".join([x.split("\t")[0] for x in t.parse(line).splitlines()[:-1] if x.split("\t")[1].split(",")[0] not in exclude_pos])
    splitted = exclude_stopword(splitted)
    print("2",splitted)
    get_wordcrowd_color_mask(4,splitted, '')
    get_wordcrowd_color_mask(5,splitted, './mask_images/alice_color.png')

4-yosino_名.png 5-yosino_名.png

Summary

・ WordCloud was able to show the gist of the text
・ The quality of that gist changes depending on the filtering by stop words and parts of speech
・ With a mask, the words can be drawn only inside a given region
・ Even a Jetson Nano can generate the image in a short time

・ Next, I want to think about effective use cases and services, such as real-time output

Bonus (output example)

$ python3 wc_input_original.py -d /usr/lib/aarch64-linux-gnu/mecab/dic/mecab-ipadic-neologd -s japanese.txt
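
The -d and -s options in this command line correspond to args.dictionary and args.stop_words in the code earlier in the article. The argument parsing of wc_input_original.py is not shown here, so the following is only a guess at how it could be wired up with argparse. The long option names, the help strings, and the build_parser helper are assumptions.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Generate a word cloud from typed text")
    # -d: MeCab dictionary directory (for example a mecab-ipadic-neologd install path)
    parser.add_argument("-d", "--dictionary", default=None, help="path to a MeCab dictionary")
    # -s: text file with one stop word per line
    parser.add_argument("-s", "--stop_words", default=None, help="path to a stop-word file")
    return parser

# Parse a sample argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["-d", "/path/to/dic", "-s", "japanese.txt"])
# Tagger option string in the form used in the article: " -d " + args.dictionary
tagger_options = " -d " + args.dictionary if args.dictionary else ""
print(tagger_options, args.stop_words)
```

Leaving both defaults at None lets the script fall back to the default dictionary and an empty stop-word list when the options are omitted.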

Japanese.txt


['over there', 'Per', 'there', 'Over there', 'after', 'hole', 'holeた', 'that', 'How many', 'When', 'Now', 'Disagreeable', 'various', 'home', 'Roughly', 'You', 'I', 'O', 'Gai', 'Draw', 'Shape', 'Wonder', 'Kayano', 'From', 'Gara', 'Came', 'Habit', 'here', 'here', 'thing', 'Every', 'Here', 'Messed up', 'this', 'thisら', 'Around', 'Various', 'Relief', 'Mr.', 'How', 'Try', 'Suka', 'One by one', 'Shin', 'all', 'All', 'so', 'There', 'there', 'Over there', 'Sleeve', 'It', 'Itぞれ', 'Itなり', 'たくMr.', 'Etc.', 'Every time', 'For', 'No good', 'Cha', 'Chaん', 'Ten', 'とOり', 'When', 'Where', 'Whereか', 'By the way', 'Which', 'Somewhere', 'Which', 'which one', 'Inside', 'Insideば', 'Without', 'What', 'Such', 'What', 'Whatか', 'To', 'of', 'Begin', 'Should be', 'Haruka', 'People', 'Peopleつ', 'Clothes', 'Yellowtail', 'Betsu', 'Strange', 'Pen', 'How', 'Other', 'Masa', 'Better', 'Decent', 'As it is', 'want to see', 'Three', 'みなMr.', 'Everyone', 'Originally', 'もof', 'gate', 'Guy', 'Yo', 'Outside', 'reason', 'I', 'Yes', 'Up', 'During ~', 'under', 'Character', 'Year', 'Month', 'Day', 'Time', 'Minutes', 'Seconds', 'week', 'fire', 'water', 'wood', 'Money', 'soil', 'Country', 'Tokyo', 'road', 'Fu', 'Prefecture', 'city', 'Ward', 'town', 'village', 'each', 'No.', 'One', 'what', 'Target', 'Every time', 'Sentence', 'Person', 'sex', 'body', 'Man', 'other', 'now', 'Department', 'Division', 'Person in charge', 'Outside', 'Kind', 'Tatsu', 'Qi', 'Room', 'mouth', 'Who', 'for', 'Kingdom', 'Meeting', 'neck', 'Man', 'woman', 'Another', 'Talk', 'I', 'Shop', 'shop', 'House', 'Place', 'etc', 'You see', 'When', 'View', 'Step', 'Abbreviation', 'Example', 'system', 'Theory', 'form', 'while', 'Ground', 'Member', 'line', 'point', 'book', 'Goods', 'Power', 'Law', 'Feeling', 'Written', 'Former', 'hand', 'number', 'he', 'hewoman', 'Child', 'Inside', 'easy', 'Joy', 'Angry', 'Sorrow', 'ring', 'Around', 'To', 'Border', 'me', 'guy', 'High', 'school', 'Woman', 'Shin', 'Ki', 'magazine', 'Re', 'line', 'Column', 'Thing', 'Shi', 
'Stand', 'Collection', 'Mr', 'Place', 'History', 'vessel', 'Name', 'Emotion', 'Communicating', 'every', 'formula', 'Book', 'Times', 'Animal', 'Pieces', 'seat', 'bundle', 'age', 'Eye', 'Connoisseur', 'surface', 'Circle', 'ball', 'Sheet', 'Before', 'rear', 'left', 'right', 'Next', 'Ahead', 'spring', 'summer', 'autumn', 'winter', 'one', 'two', 'three', 'four', 'Five', 'Six', 'Seven', 'Eight', 'Nine', 'Ten', 'hundred', 'thousand', 'Ten thousand', 'Billion', 'Trillion', 'under記', 'Up記', 'Timewhile', 'nowTimes', 'BeforeTimes', 'Place合', 'oneつ', 'Year生', '自Minutes', 'ヶPlace', 'ヵPlace', 'カPlace', '箇Place', 'ヶMonth', 'ヵMonth', 'カMonth', '箇Month', 'NameBefore', 'For real', 'Certainly', 'Timepoint', '全Department', '関Person in charge', 'near', 'OneLaw', 'we', 'the difference', 'Many', 'Treatment', 'new', 'そofrear', 'middle', 'After all', 'Mr々', '以Before', '以rear', 'Or later', 'Less than', '以Up', '以under', 'how many', 'everyDay', '自body', 'Over there', 'whatMan', 'handStep', 'the same', 'Feelingじ']

input.


>On the 25th, Meijo University (Nagoya City) awarded the title of "Special Honorary Professor" to Akira Yoshino (72), a professor at the same university who won the Nobel Prize in Chemistry for the development of lithium-ion batteries and an honorary fellow of Asahi Kasei. Mr. Yoshino has been a professor at the Graduate School of Science and Engineering since 2017, and is in charge of lectures once a week. According to Meijo University, the special honorary professor is a title to honor faculty members who have won the Nobel Prize. It was founded in 2014 when Isamu Akasaki, a tenured professor, and Hiroshi Amano, a former professor, won the Nobel Prize in Physics for the development of blue light emitting diodes (LEDs).
    splitted = " ".join([x.split("\t")[0] for x in t.parse(line).splitlines()[:-1] if x.split("\t")[1].split(",")[0] not in [""]])
    print("0",splitted)
    get_wordcrowd_color_mask(0,splitted, '')
    get_wordcrowd_color_mask(1,splitted, './mask_images/alice_color.png') 

Output 0.


0 On the 25th, Meijo University (Nagoya City) awarded the title of "Special Honorary Professor" to Asahi Kasei Honorary Fellow Akira Yoshino (72), a professor at the same university who received the Nobel Prize in Chemistry for the development of lithium-ion batteries. Mr. Yoshino has been a professor at the Graduate School of Science and Engineering since 2017, and is in charge of lectures once a week. According to Meijo University, the special honorary professor is a title to honor the faculty members who received the Nobel Prize. It was founded in 2014 when Isamu Akasaki, a tenured professor, and Hiroshi Amano, a former professor, received the Nobel Prize in Physics for the development of blue light emitting diodes (LEDs).

0-yosino_名.png 1-yosino_名.png

    # MeCab's IPA-style dictionaries report parts of speech in Japanese:
    # particle, auxiliary verb, adverb, adnominal, conjunction, verb, symbol
    splitted = " ".join([x.split("\t")[0] for x in t.parse(line).splitlines()[:-1] if x.split("\t")[1].split(",")[0] not in ["助詞", "助動詞", "副詞", "連体詞", "接続詞", "動詞", "記号"]])
    print("1",splitted)
    get_wordcrowd_color_mask(2,splitted, '')
    get_wordcrowd_color_mask(3,splitted, './mask_images/alice_color.png') 

2-yosino_名.png 3-yosino_名.png

Output 1.


1 Meijo University Nagoya City 25th Lithium Ion Battery Development Nobel Chemistry Award Professor of the same university Professor Asahi Kasei Honorary Fellow Akira Yoshino 72 Special Honorary Professor Akira Yoshino 2017 Graduate School of Science and Engineering Professor Weekly Lecture Meijo University Special Honorary Professor Nobel Award for faculty title 14 years Lifelong professor Isamu Akasaki Former professor Hiroshi Amano Blue light emitting diode LED development Nobel Physics Award Established

Output 2.


2 Meijo University Nagoya City 25th Lithium Ion Battery Development Nobel Chemistry Award Professor of the same university Professor Asahi Kasei Honorary Fellow Akira Yoshino 72 Special Honorary Professor Akira Yoshino 2017 Graduate School of Science and Engineering Professor 1st Lecture Meijo University Special Honorary Professor Nobel Award Title 14 years Lifelong Professor Isamu Akasaki Professor Hiroshi Amano Blue light emitting diode led development Nobel Physics Award Awarded founding

Bonus (parameter)

WordCloud argument list (for reference; taken from the docstring in the code below)
・ word_cloud/wordcloud/wordcloud.py

Parameters
font_path : string Font path to the font that will be used (OTF or TTF). Defaults to DroidSansMono path on a Linux machine. If you are on another OS or don't have this font, you need to adjust this path.
width : int (default=400) Width of the canvas.
height : int (default=200) Height of the canvas.
prefer_horizontal : float (default=0.90) The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn't fit. (There is currently no built-in way to get only vertical words.)
mask : nd-array or None (default=None) If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered "masked out" while other entries will be free to draw on. [This changed in the most recent version!]
contour_width: float (default=0) If mask is not None and contour_width > 0, draw the mask contour.
contour_color: color value (default="black") Mask contour color.
scale : float (default=1) Scaling between computation and drawing. For large word-cloud images, using scale instead of larger canvas size is significantly faster, but might lead to a coarser fit for the words.
min_font_size : int (default=4) Smallest font size to use. Will stop when there is no more room in this size.
font_step : int (default=1) Step size for the font. font_step > 1 might speed up computation but give a worse fit.
max_words : number (default=200) The maximum number of words.
stopwords : set of strings or None The words that will be eliminated. If None, the built-in STOPWORDS list will be used. Ignored if using generate_from_frequencies.
background_color : color value (default="black") Background color for the word cloud image.
max_font_size : int or None (default=None) Maximum font size for the largest word. If None, height of the image is used.
mode : string (default="RGB") Transparent background will be generated when mode is "RGBA" and background_color is None.
relative_scaling : float (default='auto') Importance of relative word frequencies for font-size. With relative_scaling=0, only word-ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, relative_scaling around .5 often looks good. If 'auto' it will be set to 0.5 unless repeat is true, in which case it will be set to 0. ..versionchanged: 2.0 Default is now 'auto'.
color_func : callable, default=None Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites "colormap". See colormap for specifying a matplotlib colormap instead. To create a word cloud with a single color, use color_func=lambda *args, **kwargs: "white". The single color can also be specified using RGB code. For example color_func=lambda *args, **kwargs: (255,0,0) sets color to red.
regexp : string or None (optional) Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used. Ignored if using generate_from_frequencies.
collocations : bool, default=True Whether to include collocations (bigrams) of two words. Ignored if using generate_from_frequencies. .. versionadded: 2.0
colormap : string or matplotlib colormap, default="viridis" Matplotlib colormap to randomly draw colors from for each word. Ignored if "color_func" is specified. .. versionadded: 2.0
normalize_plurals : bool, default=True Whether to remove trailing 's' from words. If True and a word appears with and without a trailing 's', the one with trailing 's' is removed and its counts are added to the version without trailing 's' -- unless the word ends with 'ss'. Ignored if using generate_from_frequencies.
repeat : bool, default=False Whether to repeat words and phrases until max_words or min_font_size is reached.
include_numbers : bool, default=False Whether to include numbers as phrases or not.
min_word_length : int, default=0 Minimum number of letters a word must have to be included.
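
To make the relative_scaling description concrete, here is a small sketch that interpolates between rank-only sizing and frequency-proportional sizing. It follows the wording of the docstring above rather than the library's source, so treat the formula and the function name next_font_size as assumptions.

```python
def next_font_size(prev_size, freq, prev_freq, relative_scaling=0.5):
    """Font size for the next word, given the size of the previous, more frequent word."""
    ratio = freq / prev_freq
    # relative_scaling=0 ignores the ratio; relative_scaling=1 scales in proportion to it
    return prev_size * (relative_scaling * ratio + (1 - relative_scaling))

# A word half as frequent as the previous one, under three settings
half_as_frequent = [next_font_size(100, 5, 10, rs) for rs in (0.0, 0.5, 1.0)]
print(half_as_frequent)
```

With relative_scaling=1 the half-as-frequent word gets half the size, which matches the statement that a word twice as frequent has twice the size.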
