Statistics Colloquium by Elijah Tamarchenko ’23

Wed, March 15th, 2023
1:10 pm - 1:50 pm

Using Supervised Topic Models to Predict Movie Ratings Based on Labelled Text Reviews by Elijah Tamarchenko ’23, Statistics Colloquium, Wednesday, March 15, 1:10 – 1:50 pm, North Science Building 114, Wachenheim.

Abstract: How do we use large corpora of text data to predict a corresponding response variable? In this talk, we explore the supervised Latent Dirichlet Allocation (sLDA) model. The model learns latent topics in the text documents and then uses the resulting topic proportions as covariates in a Generalized Linear Model (GLM) to predict the response. We will discuss the theory behind this model, focusing on variational inference and variational expectation maximization for estimation. Finally, we will apply this model to a dataset of movie reviews and movie ratings in order to predict a movie rating based on the critic’s reviews.
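
For readers unfamiliar with sLDA, the model described in the abstract can be sketched as follows. This follows the standard presentation of sLDA in the literature; the notation (document index d, word index n, parameters α, β, η, δ, and variational parameters γ, φ) is an assumption for illustration and is not taken from the talk itself.

```latex
% Generative sketch for document d with N_d words and K topics.
% z_{d,n} is a one-hot topic indicator; \beta_k is the word distribution of topic k.
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), \qquad n = 1, \dots, N_d \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Categorical}(\beta_{z_{d,n}}) \\
\bar{z}_d &= \frac{1}{N_d} \sum_{n=1}^{N_d} z_{d,n} \\
y_d \mid \bar{z}_d &\sim \mathrm{GLM}\!\left(\eta^\top \bar{z}_d,\ \delta\right)
\end{align*}

% Variational inference replaces the intractable posterior over (\theta_d, z_d)
% with a mean-field family and maximizes a lower bound on the log evidence.
\begin{align*}
q(\theta_d, z_d) &= q(\theta_d \mid \gamma_d) \prod_{n=1}^{N_d} q(z_{d,n} \mid \phi_{d,n}) \\
\log p(w_d, y_d \mid \alpha, \beta, \eta, \delta)
  &\ge \mathbb{E}_q\!\left[\log p(\theta_d, z_d, w_d, y_d \mid \alpha, \beta, \eta, \delta)\right]
     - \mathbb{E}_q\!\left[\log q(\theta_d, z_d)\right]
\end{align*}
```

In this sketch, variational expectation maximization would alternate between updating the per-document parameters (γ, φ) to tighten the bound and updating the model parameters (α, β, η, δ) to maximize it. With a Gaussian response, the last line of the generative model reduces to a normal distribution with mean η⊤z̄ and variance δ.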

