Every song Kenshi Yonezu writes becomes a hit, and the lyrics he spins out seem to have the power to captivate people. This time, I decided to have deep learning learn that charm.
This article covers the **implementation**; see the previous article for the preprocessing code. The general flow of the implementation is as follows.
- Framework: PyTorch
- Model: seq2seq with Attention
- Morphological analysis module: Janome
- Environment: Google Colaboratory
For how seq2seq and Attention work, see the previous article.
A schematic diagram of this model is shown below (reference paper).
Here, SOS is "_".
After uploading the required self-made modules to Google Colab, copy and run the main.py described later.

**Required self-made modules**

Please refer to GitHub for the code of these self-made modules.
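If you are not cloning the repository, one simple way (an example, not the only way) to get the modules into the Colab runtime is the standard upload dialog; the module file names in the comment below are assumptions based on the imports used later.

```python
# Upload the self-made modules (e.g. modules.py, datasets.py, dataloaders.py,
# device.py, utils.py - names assumed here) into the Colab runtime, then run main.py.
from google.colab import files

uploaded = files.upload()       # opens a file-picker dialog in the notebook
print(list(uploaded.keys()))    # confirm which files were uploaded
```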
As shown below, the model predicts the "next passage" from "one passage" of the songs Kenshi Yonezu has released so far.
|Input text|Output text|
|---|---|
|I'm really happy to see you|_All of them are sad as a matter of course|
|All of them are sad as a matter of course|_I have painfully happy memories now|
|I have painfully happy memories now|_Raise and walk the farewell that will come someday|
|Raise and walk the farewell that will come someday|_It's enough to take someone's place and live|
This was created by scraping from Lyrics Net.
Here is a supplementary note about the previous article.
Last time, the goal was to create the "input text" and "output text", which were still Japanese sentences; in fact, yonedu_dataset.prepare() converts them into word IDs (numbers) so that the deep learning model can read them.
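As a rough, purely hypothetical illustration of what that ID conversion amounts to (not the actual prepare() code): every token produced by the morphological analysis is looked up in a word2id dictionary built from the whole lyrics corpus.

```python
# Toy example of the word -> ID conversion (hypothetical vocabulary and line).
word2id = {" ": 0, "_": 1, "会え": 2, "て": 3, "本当に": 4, "嬉しい": 5}

tokens = ["_", "本当に", "会え", "て", "嬉しい"]   # tokens from morphological analysis
ids = [word2id[w] for w in tokens]
print(ids)  # -> [1, 4, 2, 3, 5]
```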
The following **hyperparameters** are specified, from the top:

- Number of nodes in the encoder embedding layer
- Number of nodes in the hidden layer of the encoder's LSTM
- Batch size
- Vocabulary size of all the lyrics Mr. Yonezu has written so far (Janome is used for morphological analysis)
- Word ID representing a "blank" (padding)
The **model** is seq2seq, so it has two parts, an encoder and a decoder (a minimal sketch of both modules follows this list):

- Encoder: embedding layer + LSTM hidden layer
- Decoder with Attention: embedding layer + LSTM hidden layer + attention mechanism + softmax layer
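The actual Encoder and AttentionDecoder classes live in the self-made modules on GitHub, so the code below is only a rough sketch of what they could look like, assuming batch-first LSTMs and simple dot-product attention; the real implementation may differ in detail.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, padding_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        emb = self.embedding(x)     # (batch, src_len, embedding_dim)
        hs, h = self.lstm(emb)      # hs: all hidden states, h: (h_n, c_n) final states
        return hs, h

class AttentionDecoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, batch_size, padding_idx):
        super().__init__()
        # batch_size is kept only to match the call in main.py; this sketch does not need it
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)  # [decoder state ; context] -> vocab scores
        self.softmax = nn.Softmax(dim=1)                  # normalizes over encoder positions

    def forward(self, x, hs, h):
        emb = self.embedding(x)                                     # (batch, tgt_len, embedding_dim)
        output, state = self.lstm(emb, h)                           # (batch, tgt_len, hidden_dim)
        # dot-product attention scores between encoder and decoder hidden states
        score = torch.bmm(hs, output.transpose(1, 2))               # (batch, src_len, tgt_len)
        attention_weight = self.softmax(score)
        # context vectors: attention-weighted sum of the encoder states
        context = torch.bmm(attention_weight.transpose(1, 2), hs)   # (batch, tgt_len, hidden_dim)
        output = self.out(torch.cat([output, context], dim=2))      # (batch, tgt_len, vocab_size)
        return output, state, attention_weight
```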
The **loss function** is the cross-entropy error, and the **optimization method** is Adam for both the encoder and the decoder.
Also, if saved model parameters exist, they are loaded.
from datasets import LyricDataset
import torch
import torch.nn as nn
import torch.optim as optim
from modules import *
from device import device
from utils import *
from dataloaders import SeqDataLoader
import math
import os
import pandas as pd

# ==========================================
# Data preparation
# ==========================================
# Path to Kenshi Yonezu_lyrics.txt
file_path = "lyric/Kenshi Yonezu_lyrics.txt"
edited_file_path = "lyric/Kenshi Yonezu_lyrics_edit.txt"
yonedu_dataset = LyricDataset(file_path, edited_file_path)
yonedu_dataset.prepare()
# check
print(yonedu_dataset[0])
# Split into train and test sets at 8:2
train_rate = 0.8
data_num = len(yonedu_dataset)
train_set = yonedu_dataset[:math.floor(data_num * train_rate)]
test_set = yonedu_dataset[math.floor(data_num * train_rate):]
# Up to here is the same as last time

# ================================================
# Hyperparameter settings / model / loss function / optimization method
# ================================================
# Hyperparameters
embedding_dim = 200
hidden_dim = 128
BATCH_NUM = 100
EPOCH_NUM = 100
vocab_size = len(yonedu_dataset.word2id)     # vocabulary size
padding_idx = yonedu_dataset.word2id[" "]    # ID of the "blank" (padding) token
# model
encoder = Encoder(vocab_size, embedding_dim, hidden_dim, padding_idx).to(device)
attn_decoder = AttentionDecoder(vocab_size, embedding_dim, hidden_dim, BATCH_NUM, padding_idx).to(device)
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimization method
encoder_optimizer = optim.Adam(encoder.parameters(), lr=0.001)
attn_decoder_optimizer = optim.Adam(attn_decoder.parameters(), lr=0.001)
# Load parameters if you have a trained model
encoder_weights_path = "yonedsu_lyric_encoder.pth"
decoder_weights_path = "yonedsu_lyric_decoder.pth"
if os.path.exists(encoder_weights_path):
    encoder.load_state_dict(torch.load(encoder_weights_path))
if os.path.exists(decoder_weights_path):
    attn_decoder.load_state_dict(torch.load(decoder_weights_path))
Next is the training code. Any seq2seq with Attention will look roughly like this, so I will add just one point: using **my own data loader**, each epoch I draw mini-batches of 100 samples, backpropagate the total loss over each mini-batch to obtain the gradients, and update the parameters.
For the **self-made data loader**, I referred to the source code of Yasuki Saito's "Deep Learning from Scratch 3".
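The real SeqDataLoader follows that book's design; as a rough idea of what it has to do, a hypothetical minimal loader only needs to slice the (input ID, output ID) pairs into batches and return them as tensors on the device, something like this:

```python
import random
import torch
from device import device   # the self-made module that selects "cuda" or "cpu"

class SeqDataLoader:
    """Hypothetical minimal version of the self-made data loader (see GitHub for the real one)."""

    def __init__(self, dataset, batch_size, shuffle=False, reverse=False):
        self.dataset = list(dataset)   # list of (input_ids, output_ids) pairs
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.reverse = reverse         # True if the input sequences were stored reversed

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(indices)
        # yield one mini-batch of ID tensors at a time
        for i in range(0, len(indices), self.batch_size):
            batch = [self.dataset[j] for j in indices[i:i + self.batch_size]]
            x = torch.tensor([b[0] for b in batch], dtype=torch.long, device=device)
            y = torch.tensor([b[1] for b in batch], dtype=torch.long, device=device)
            yield x, y
```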
# ================================================
# Training
# ================================================
all_losses = []
print("training ...")
for epoch in range(1, EPOCH_NUM + 1):
    epoch_loss = 0
    # Split the data into mini-batches
    dataloader = SeqDataLoader(train_set, batch_size=BATCH_NUM, shuffle=False)

    for train_x, train_y in dataloader:
        # Initialize the gradients
        encoder_optimizer.zero_grad()
        attn_decoder_optimizer.zero_grad()

        # Encoder forward pass
        hs, h = encoder(train_x)

        # Input to the attention decoder (teacher forcing: the target sequence minus its last token)
        source = train_y[:, :-1]

        # Correct-answer data for the attention decoder (the target sequence minus its first token, "_")
        target = train_y[:, 1:]

        loss = 0
        decoder_output, _, attention_weight = attn_decoder(source, hs, h)
        # Accumulate the cross-entropy loss over every decoding step
        for j in range(decoder_output.size()[1]):
            loss += criterion(decoder_output[:, j, :], target[:, j])
        epoch_loss += loss.item()

        # Backpropagation
        loss.backward()

        # Parameter update
        encoder_optimizer.step()
        attn_decoder_optimizer.step()

    # Show the loss
    print("Epoch %d: %.2f" % (epoch, epoch_loss))
    all_losses.append(epoch_loss)
    if epoch_loss < 0.1:
        break

print("Done")

import matplotlib.pyplot as plt
plt.plot(all_losses)
plt.savefig("attn_loss.png")

# Save the models
torch.save(encoder.state_dict(), encoder_weights_path)
torch.save(attn_decoder.state_dict(), decoder_weights_path)
Here is the test code. What it does is create the table shown in **[Results]**. There are two points to note:

- Gradients are not computed, since the test stage only makes predictions
- "_", which signals the start of string generation, is fed to the decoder first (the same condition as during training)
# =======================================
# Test
# =======================================
# Word -> ID conversion dictionary
word2id = yonedu_dataset.word2id
# ID -> word conversion dictionary
id2word = get_id2word(word2id)
# Number of elements in one correct-answer sequence
output_len = len(yonedu_dataset[0][1])
# Evaluation data
test_dataloader = SeqDataLoader(test_set, batch_size=BATCH_NUM, shuffle=False)
# Data frame to display the results
df = pd.DataFrame(None, columns=["input", "answer", "predict", "judge"])

# Iterate over the data loader and fill in the result data frame
for test_x, test_y in test_dataloader:
    with torch.no_grad():
        hs, encoder_state = encoder(test_x)

        # "_", which signals the start of string generation, is fed to the decoder first,
        # so create a batch of "_" tensors
        start_char_batch = [[word2id["_"]] for _ in range(BATCH_NUM)]
        decoder_input_tensor = torch.tensor(start_char_batch, device=device)

        decoder_hidden = encoder_state
        batch_tmp = torch.zeros(BATCH_NUM, 1, dtype=torch.long, device=device)
        for _ in range(output_len - 1):
            decoder_output, decoder_hidden, _ = attn_decoder(decoder_input_tensor, hs, decoder_hidden)
            # The predicted word is collected and becomes the next decoder input as it is
            decoder_input_tensor = get_max_index(decoder_output.squeeze(), BATCH_NUM)
            batch_tmp = torch.cat([batch_tmp, decoder_input_tensor], dim=1)

        predicts = batch_tmp[:, 1:]  # predicted batch (drop the dummy first column)
        if test_dataloader.reverse:
            test_x = [list(line)[::-1] for line in test_x]  # undo the input reversal
        df = predict2df(test_x, test_y, predicts, df)

df.to_csv("predict_yonedsu_lyric.csv", index=False)
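The loop above also relies on the self-made utilities get_id2word and get_max_index; I do not reproduce the originals here, but hypothetical minimal versions could look like this:

```python
import torch

def get_id2word(word2id):
    # Invert the word -> ID dictionary so predicted IDs can be mapped back to words.
    return {v: k for k, v in word2id.items()}

def get_max_index(decoder_output, batch_num):
    # decoder_output: (batch_num, vocab_size) scores for a single decoding step.
    # Take the most probable word ID per sample and reshape to (batch_num, 1) so it
    # can be fed straight back into the decoder as the next input token.
    indices = decoder_output.argmax(dim=1)
    return indices.view(batch_num, 1).to(decoder_output.device)
```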
Every prediction was judged incorrect. However, the goal this time was "**capturing the characteristics of Kenshi Yonezu's lyrics**". Here is an excerpt from the table.

- **input**: input text
- **output**: correct output text
- **predict**: text predicted by the model
- **judge**: whether output and predict match
|input|output|predict|judge|
|---|---|---|---|
|I didn't care if it was a mistake or a correct answer|In the light mist that fell in a blink of an eye|I'm sad because I want to be loved, so maybe you're the only one|X|
|I felt that everything had changed since that day|A deep spring corner that is blown away by the wind|The warm place is still beautiful|X|
|Let's find out one by one|Like a kid getting up|Withered blue, even that color|X|
|No matter what you are doing today|I will look for you|I was looking for a city that wouldn't change|X|
- The predicted sentences are not incoherent (grammatical elements such as "still" are used accurately)
- The context is not wildly off from the input
- However, whether they capture the characteristics of Mr. Yonezu's word choice is, honestly, debatable
Since no overfitting was observed this time, the main cause of the insufficient learning is likely the small amount of data. Then again, it is only we humans who have decided this is "insufficient learning"; perhaps the AI has its own ideas...