CPEN 499 Undergraduate Thesis - GRPO for Vision Language Model Reasoning

CPEN 499 (Undergraduate Thesis) is a course I completed at UBC over two terms. It is a research project in a specific area of interest, supervised by a faculty member, that culminates in a written thesis on the findings.

Through this course I conducted research with Professor Renjie Liao and the Deep Structured Learning Lab at UBC on the topic of vision language model reasoning. I also worked directly with PhD and postdoctoral researchers Muchen Li and Tanzila Rahman. The research focused on using reinforcement learning to improve the performance of vision language models. This is a rapidly growing area of research with the potential to significantly improve machine learning models in a variety of applications, including computer vision, natural language processing, and robotics.

The specific focus of my research was GRPO (Group Relative Policy Optimization), a reinforcement learning algorithm, applied to vision language models. I narrowed the problem to the task of mathematical reasoning. This is a challenging task because the model must understand how the objects in an image relate to one another and to the accompanying text.
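To give a feel for the core idea behind GRPO, here is a minimal sketch of its group-relative advantage step: several answers are sampled for the same question, each is scored, and every score is normalized against the mean and standard deviation of its own group. The function name, the binary correct/incorrect reward, and the use of the population standard deviation are assumptions for illustration only; this is not the training code or reward design from my thesis.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds one score per sampled answer to the same question.
    `eps` avoids division by zero when every answer gets the same score.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std: an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math question:
# 1.0 = final answer correct, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that score above their group's average get a positive advantage and are reinforced, while below-average answers get a negative one. If every answer in a group receives the same reward, all advantages are zero and that question contributes no learning signal.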

I conducted a series of experiments to evaluate, both qualitatively and quantitatively, how GRPO training changes the performance of a vision language model. The results were promising: GRPO can improve the performance of vision language models on mathematical reasoning tasks by upwards of 20%, although there are limitations on some problem types.

My thesis can be found here:

Download my report

The research paper is available on UBC’s Archive and can be found here: View on UBC cIRcle Archive

Tayyib Chohan
