SPHRED for Dialog modeling | Hieu Tran-Chi Nguyen

Motivation

This is a research project for my graduation thesis at International University, Vietnam National University - HCMC. More information can be found here.

In this project, I sought to expand my technical foundations beyond the standard curriculum by studying and reconstructing SPHRED, a model that combines a Conditional Variational Autoencoder (CVAE) with a Hierarchical Recurrent Encoder-Decoder (HRED) for controllable dialog generation. Its architecture is shown below:

The SPHRED architecture comprises a Hierarchical Recurrent Encoder-Decoder (HRED, consisting of an encoder RNN, a context RNN, and a decoder RNN) for dialog generation and a conditional variational autoencoder for controllability.
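
To make the data flow in the figure concrete, here is a minimal toy sketch in NumPy. It is not the project code: the plain tanh RNN cell, the random weights, the sizes `EMB`, `HID`, and `LATENT`, and the helper names (`make_rnn`, `run_rnn`, `encode_dialog`, `sample_latent`) are all assumptions made for illustration. It only shows how an utterance-level encoder feeds a dialog-level context RNN, and how a latent variable sampled from a context-conditioned prior could be combined with the context before decoding.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID, LATENT = 8, 16, 4  # hypothetical toy sizes


def make_rnn(in_dim, hid_dim):
    """Random weights for a plain tanh RNN cell (a stand-in for a GRU/LSTM)."""
    return (rng.normal(0, 0.1, (in_dim, hid_dim)),
            rng.normal(0, 0.1, (hid_dim, hid_dim)))


def run_rnn(params, inputs):
    """Run the cell over a sequence and return the final hidden state."""
    w_in, w_hid = params
    h = np.zeros(w_hid.shape[0])
    for x in inputs:
        h = np.tanh(x @ w_in + h @ w_hid)
    return h


encoder = make_rnn(EMB, HID)   # reads the tokens within one utterance
context = make_rnn(HID, HID)   # reads utterance vectors across turns
prior_net = rng.normal(0, 0.1, (HID, 2 * LATENT))  # context -> (mu, log_var)


def encode_dialog(dialog):
    """dialog: list of utterances, each an array of shape (n_tokens, EMB)."""
    utterance_vectors = [run_rnn(encoder, utt) for utt in dialog]
    return run_rnn(context, utterance_vectors)


def sample_latent(ctx):
    """Sample z from the context-conditioned prior (reparameterization trick)."""
    mu, log_var = np.split(ctx @ prior_net, 2)
    return mu + np.exp(0.5 * log_var) * rng.normal(size=LATENT)


# Two toy utterances with 5 and 3 (random) token embeddings.
dialog = [rng.normal(size=(5, EMB)), rng.normal(size=(3, EMB))]
ctx = encode_dialog(dialog)
z = sample_latent(ctx)
decoder_init = np.concatenate([ctx, z])  # would condition the decoder RNN
```

During training, a CVAE would typically sample `z` from the approximate posterior (which also sees the target response) rather than from the prior. The sketch shows only the inference-time path.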

Investigating changes in VAE distributions during training

While replicating the original results, I used Principal Component Analysis (PCA) to visualize the approximate posterior and latent prior distributions during training and gain a deeper understanding of variational inference. This revealed an intriguing pattern: as the model began generating repetitive tokens (text degeneration), the two distributions diverged.

The prior and approximate posterior distributions diverge (left) as the model generates repeated tokens (right).
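
The kind of projection behind these plots can be sketched as follows. This is a minimal illustration, not the exact procedure used in the project: the samples are synthetic stand-ins, `latent_dim` and `pca_project` are hypothetical names, and fitting PCA on the pooled samples (so that both clouds share the same axes) is an assumption.

```python
import numpy as np


def pca_project(posterior_samples, prior_samples, n_components=2):
    """Project both sample sets onto principal axes fitted on their union."""
    pooled = np.vstack([posterior_samples, prior_samples])
    mean = pooled.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    axes = vt[:n_components].T
    return (posterior_samples - mean) @ axes, (prior_samples - mean) @ axes


rng = np.random.default_rng(0)
latent_dim = 16  # hypothetical latent size
# Synthetic stand-ins for z ~ q(z | context, response) and z ~ p(z | context).
posterior = rng.normal(loc=0.5, scale=1.0, size=(200, latent_dim))
prior = rng.normal(loc=0.0, scale=1.0, size=(200, latent_dim))

posterior_2d, prior_2d = pca_project(posterior, prior)
# Distance between the projected centroids: a crude indicator of overlap.
centroid_gap = np.linalg.norm(posterior_2d.mean(axis=0) - prior_2d.mean(axis=0))
```

Scatter-plotting `posterior_2d` and `prior_2d` at successive checkpoints gives a picture like the left panels: overlapping clouds when the two distributions agree, separated clouds when they diverge.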

However, before text degeneration set in, the model was able to generate sound predictions, and the two distributions came to overlap. This agrees with the training objective, which includes minimizing the KL divergence between the prior distribution and the approximate posterior distribution.

The prior and approximate posterior distributions overlap (left) while the model generates sound sequences (right).
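
The KL term mentioned above has a closed form when both distributions are diagonal Gaussians, which is the usual parameterization in CVAEs and is assumed here. The sketch below uses the conventional direction KL(q || p), with q the approximate posterior and p the prior. The function name `gaussian_kl` and the toy inputs are illustrative only.

```python
import numpy as np


def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) for diagonal Gaussians, summed over latent dimensions."""
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    per_dim = 0.5 * (log_var_p - log_var_q
                     + (var_q + (mu_q - mu_p) ** 2) / var_p
                     - 1.0)
    return per_dim.sum(axis=-1)


mu = np.zeros(4)
log_var = np.zeros(4)
kl_same = gaussian_kl(mu, log_var, mu, log_var)           # identical distributions
kl_shifted = gaussian_kl(mu + 1.0, log_var, mu, log_var)  # posterior mean shifted by 1
```

The value is zero when the two distributions coincide and grows as the posterior mean or variance moves away from the prior, which matches the overlap and divergence seen in the PCA plots.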

Presentation

Watch my presentation and a demo of this project here:

A presentation on the model architecture, optimization, and results (left) and a demo of the model's dialog generation (right).