Every song Kenshi Yonezu writes becomes a hit, and the lyrics he spins out seem to have the power to captivate people. This time, I decided to have deep learning learn that charm.
This article covers the **implementation**; see the previous article for the preprocessing code. The general flow of the implementation is as follows.
- Framework: PyTorch
- Model: seq2seq with Attention
- Morphological analysis module: Janome
- Environment: Google Colaboratory
For how seq2seq and Attention work, see the previous article.
A schematic diagram of this model is shown below (reference paper).
Here, SOS is "_".
After uploading the required self-made modules to Google Colab, copy and run the main.py described later.

**Required self-made modules**

Please refer to GitHub for the code of these self-made modules.
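If you are not cloning the repository, one simple way (an example, not the only way) to get the modules into the Colab runtime is the standard upload dialog; the module file names in the comment below are assumptions based on the imports used later.

```python
# Upload the self-made modules (e.g. modules.py, datasets.py, dataloaders.py,
# device.py, utils.py - names assumed here) into the Colab runtime, then run main.py.
from google.colab import files

uploaded = files.upload()       # opens a file-picker dialog in the notebook
print(list(uploaded.keys()))    # confirm which files were uploaded
```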
As shown below, the model predicts the "next passage" from "one passage" of the songs Kenshi Yonezu has released so far.
|Input text|Output text|
|---|---|
|I'm really happy to see you|_All of them are sad as a matter of course|
|All of them are sad as a matter of course|_I have painfully happy memories now|
|I have painfully happy memories now|_Raise and walk the farewell that will come someday|
|Raise and walk the farewell that will come someday|_It's enough to take someone's place and live|
This was created by scraping from Lyrics Net.
Here is a supplementary note about the previous article.
Last time, the goal was to create the "input text" and "output text", which were still Japanese sentences; in fact, yonedu_dataset.prepare() converts them into word IDs (numbers) so that the deep learning model can read them.
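As a rough, purely hypothetical illustration of what that ID conversion amounts to (not the actual prepare() code): every token produced by the morphological analysis is looked up in a word2id dictionary built from the whole lyrics corpus.

```python
# Toy example of the word -> ID conversion (hypothetical vocabulary and line).
word2id = {" ": 0, "_": 1, "会え": 2, "て": 3, "本当に": 4, "嬉しい": 5}

tokens = ["_", "本当に", "会え", "て", "嬉しい"]   # tokens from morphological analysis
ids = [word2id[w] for w in tokens]
print(ids)  # -> [1, 4, 2, 3, 5]
```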
The following **hyperparameters** are specified, from the top:

- Number of nodes in the encoder embedding layer
- Number of nodes in the hidden layer of the encoder's LSTM
- Batch size
- Vocabulary size of all the lyrics Mr. Yonezu has written so far (Janome is used for morphological analysis)
- Word ID representing a "blank" (padding)
The **model** is seq2seq, so it has two parts, an encoder and a decoder (a minimal sketch of both modules follows this list):

- Encoder: embedding layer + LSTM hidden layer
- Decoder with Attention: embedding layer + LSTM hidden layer + attention mechanism + softmax layer
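The actual Encoder and AttentionDecoder classes live in the self-made modules on GitHub, so the code below is only a rough sketch of what they could look like, assuming batch-first LSTMs and simple dot-product attention; the real implementation may differ in detail.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, padding_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        emb = self.embedding(x)     # (batch, src_len, embedding_dim)
        hs, h = self.lstm(emb)      # hs: all hidden states, h: (h_n, c_n) final states
        return hs, h

class AttentionDecoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, batch_size, padding_idx):
        super().__init__()
        # batch_size is kept only to match the call in main.py; this sketch does not need it
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=padding_idx)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)  # [decoder state ; context] -> vocab scores
        self.softmax = nn.Softmax(dim=1)                  # normalizes over encoder positions

    def forward(self, x, hs, h):
        emb = self.embedding(x)                                     # (batch, tgt_len, embedding_dim)
        output, state = self.lstm(emb, h)                           # (batch, tgt_len, hidden_dim)
        # dot-product attention scores between encoder and decoder hidden states
        score = torch.bmm(hs, output.transpose(1, 2))               # (batch, src_len, tgt_len)
        attention_weight = self.softmax(score)
        # context vectors: attention-weighted sum of the encoder states
        context = torch.bmm(attention_weight.transpose(1, 2), hs)   # (batch, tgt_len, hidden_dim)
        output = self.out(torch.cat([output, context], dim=2))      # (batch, tgt_len, vocab_size)
        return output, state, attention_weight
```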
The **loss function** is the cross-entropy error, and the **optimization method** is Adam for both the encoder and the decoder.
Also, if saved model parameters exist, they are loaded.
from datasets import LyricDataset
import torch
import torch.nn as nn
import torch.optim as optim
from modules import *
from device import device
from utils import *
from dataloaders import SeqDataLoader
import math
import os
import pandas as pd

# ==========================================
# Data preparation
# ==========================================
# Path to Kenshi Yonezu_lyrics.txt
file_path = "lyric/Kenshi Yonezu_lyrics.txt"
edited_file_path = "lyric/Kenshi Yonezu_lyrics_edit.txt"
yonedu_dataset = LyricDataset(file_path, edited_file_path)
yonedu_dataset.prepare()
# check
print(yonedu_dataset[0])
# Split into train and test sets at 8:2
train_rate = 0.8
data_num = len(yonedu_dataset)
train_set = yonedu_dataset[:math.floor(data_num * train_rate)]
test_set = yonedu_dataset[math.floor(data_num * train_rate):]
# Up to here is the same as last time

# ================================================
# Hyperparameter settings / model / loss function / optimization method
# ================================================
# Hyperparameters
embedding_dim = 200
hidden_dim = 128
BATCH_NUM = 100
EPOCH_NUM = 100
vocab_size = len(yonedu_dataset.word2id)     # vocabulary size
padding_idx = yonedu_dataset.word2id[" "]    # ID of the "blank" (padding) token
# model
encoder = Encoder(vocab_size, embedding_dim, hidden_dim, padding_idx).to(device)
attn_decoder = AttentionDecoder(vocab_size, embedding_dim, hidden_dim, BATCH_NUM, padding_idx).to(device)
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimization method
encoder_optimizer = optim.Adam(encoder.parameters(), lr=0.001)
attn_decoder_optimizer = optim.Adam(attn_decoder.parameters(), lr=0.001)
# Load parameters if you have a trained model
encoder_weights_path = "yonedsu_lyric_encoder.pth"
decoder_weights_path = "yonedsu_lyric_decoder.pth"
if os.path.exists(encoder_weights_path):
    encoder.load_state_dict(torch.load(encoder_weights_path))
if os.path.exists(decoder_weights_path):
    attn_decoder.load_state_dict(torch.load(decoder_weights_path))
Next is the training code. Any seq2seq with Attention will look roughly like this, so I will add just one point: using **my own data loader**, each epoch I draw mini-batches of 100 samples, backpropagate the total loss over each mini-batch to obtain the gradients, and update the parameters.
For the **self-made data loader**, I referred to the source code of Yasuki Saito's "Deep Learning from Scratch 3".
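The real SeqDataLoader follows that book's design; as a rough idea of what it has to do, a hypothetical minimal loader only needs to slice the (input ID, output ID) pairs into batches and return them as tensors on the device, something like this:

```python
import random
import torch
from device import device   # the self-made module that selects "cuda" or "cpu"

class SeqDataLoader:
    """Hypothetical minimal version of the self-made data loader (see GitHub for the real one)."""

    def __init__(self, dataset, batch_size, shuffle=False, reverse=False):
        self.dataset = list(dataset)   # list of (input_ids, output_ids) pairs
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.reverse = reverse         # True if the input sequences were stored reversed

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(indices)
        # yield one mini-batch of ID tensors at a time
        for i in range(0, len(indices), self.batch_size):
            batch = [self.dataset[j] for j in indices[i:i + self.batch_size]]
            x = torch.tensor([b[0] for b in batch], dtype=torch.long, device=device)
            y = torch.tensor([b[1] for b in batch], dtype=torch.long, device=device)
            yield x, y
```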
# ================================================
# Training
# ================================================
all_losses = []
print("training ...")
for epoch in range(1, EPOCH_NUM + 1):
    epoch_loss = 0
    # Split the data into mini-batches
    dataloader = SeqDataLoader(train_set, batch_size=BATCH_NUM, shuffle=False)

    for train_x, train_y in dataloader:
        # Initialize the gradients
        encoder_optimizer.zero_grad()
        attn_decoder_optimizer.zero_grad()

        # Encoder forward pass
        hs, h = encoder(train_x)

        # Input to the attention decoder (teacher forcing: the target sequence minus its last token)
        source = train_y[:, :-1]

        # Correct-answer data for the attention decoder (the target sequence minus its first token, "_")
        target = train_y[:, 1:]

        loss = 0
        decoder_output, _, attention_weight = attn_decoder(source, hs, h)
        # Accumulate the cross-entropy loss over every decoding step
        for j in range(decoder_output.size()[1]):
            loss += criterion(decoder_output[:, j, :], target[:, j])
        epoch_loss += loss.item()

        # Backpropagation
        loss.backward()

        # Parameter update
        encoder_optimizer.step()
        attn_decoder_optimizer.step()

    # Show the loss
    print("Epoch %d: %.2f" % (epoch, epoch_loss))
    all_losses.append(epoch_loss)
    if epoch_loss < 0.1:
        break

print("Done")

import matplotlib.pyplot as plt
plt.plot(all_losses)
plt.savefig("attn_loss.png")

# Save the models
torch.save(encoder.state_dict(), encoder_weights_path)
torch.save(attn_decoder.state_dict(), decoder_weights_path)
Here is the test code. What it does is create the table shown in **[Results]**. There are two points to note:

- Gradients are not computed, since the test stage only makes predictions
- "_", which signals the start of string generation, is fed to the decoder first (the same condition as during training)
# =======================================
# Test
# =======================================
# Word -> ID conversion dictionary
word2id = yonedu_dataset.word2id
# ID -> word conversion dictionary
id2word = get_id2word(word2id)
# Number of elements in one correct-answer sequence
output_len = len(yonedu_dataset[0][1])
# Evaluation data
test_dataloader = SeqDataLoader(test_set, batch_size=BATCH_NUM, shuffle=False)
# Data frame to display the results
df = pd.DataFrame(None, columns=["input", "answer", "predict", "judge"])

# Iterate over the data loader and fill in the result data frame
for test_x, test_y in test_dataloader:
    with torch.no_grad():
        hs, encoder_state = encoder(test_x)

        # "_", which signals the start of string generation, is fed to the decoder first,
        # so create a batch of "_" tensors
        start_char_batch = [[word2id["_"]] for _ in range(BATCH_NUM)]
        decoder_input_tensor = torch.tensor(start_char_batch, device=device)

        decoder_hidden = encoder_state
        batch_tmp = torch.zeros(BATCH_NUM, 1, dtype=torch.long, device=device)
        for _ in range(output_len - 1):
            decoder_output, decoder_hidden, _ = attn_decoder(decoder_input_tensor, hs, decoder_hidden)
            # The predicted word is collected and becomes the next decoder input as it is
            decoder_input_tensor = get_max_index(decoder_output.squeeze(), BATCH_NUM)
            batch_tmp = torch.cat([batch_tmp, decoder_input_tensor], dim=1)

        predicts = batch_tmp[:, 1:]  # predicted batch (drop the dummy first column)
        if test_dataloader.reverse:
            test_x = [list(line)[::-1] for line in test_x]  # undo the input reversal
        df = predict2df(test_x, test_y, predicts, df)

df.to_csv("predict_yonedsu_lyric.csv", index=False)
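The loop above also relies on the self-made utilities get_id2word and get_max_index; I do not reproduce the originals here, but hypothetical minimal versions could look like this:

```python
import torch

def get_id2word(word2id):
    # Invert the word -> ID dictionary so predicted IDs can be mapped back to words.
    return {v: k for k, v in word2id.items()}

def get_max_index(decoder_output, batch_num):
    # decoder_output: (batch_num, vocab_size) scores for a single decoding step.
    # Take the most probable word ID per sample and reshape to (batch_num, 1) so it
    # can be fed straight back into the decoder as the next input token.
    indices = decoder_output.argmax(dim=1)
    return indices.view(batch_num, 1).to(decoder_output.device)
```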
Every prediction was judged incorrect. However, the goal this time was "**capturing the characteristics of Kenshi Yonezu's lyrics**". Here is an excerpt from the table.

- **input**: input text
- **output**: correct output text
- **predict**: text predicted by the model
- **judge**: whether output and predict match
|input|output|predict|judge|
|---|---|---|---|
|I didn't care if it was a mistake or a correct answer|In the light mist that fell in a blink of an eye|I'm sad because I want to be loved, so maybe you're the only one|X|
|I felt that everything had changed since that day|A deep spring corner that is blown away by the wind|The warm place is still beautiful|X|
|Let's find out one by one|Like a kid getting up|Withered blue, even that color|X|
|No matter what you are doing today|I will look for you|I was looking for a city that wouldn't change|X|
- The predicted sentences are not incoherent (grammatical elements such as "still" are used accurately)
- The context is not wildly off from the input
- However, whether they capture the characteristics of Mr. Yonezu's word choice is, honestly, debatable
Since no overfitting was observed this time, the main cause of the insufficient learning is likely the small amount of data. Then again, it is only we humans who have decided this is "insufficient learning"; perhaps the AI has its own ideas...