LLMTutorBench: A Benchmark for University-level TCS AI Tutoring Systems (SIGCSE TS 2026 - Lightning Talks)

Who

Anant Gupta, Hieu Nguyen, Carine G Webber, Justin Stevens, Abrahim Ladha, Sanika Ainchwar, Vijay Ganesh

Track

SIGCSE TS 2026 Lightning Talks

When

Thu 19 Feb 2026 16:40 - 16:50 at Meeting Room 241-242 - Lightning Talks #1

Abstract

Large Language Models (LLMs) are transforming Intelligent Tutoring Systems (ITS) through more natural explanations, multi-turn dialogue, and more adaptive support for students. Yet their effectiveness depends on rigorous benchmarking to ensure reliability, fairness, and pedagogical soundness. Such benchmarking relies on detailed student data, especially data that accurately reflect the actual distribution of wrong answers and misconceptions. A robust dataset of domain-specific wrong answers and misconceptions is therefore critical for the ITS research community: it enables training and testing of LLM-based ITS designed to correct student misconceptions and guide students appropriately. Unfortunately, in advanced areas such as Theoretical Computer Science (TCS), such data are scarce, costly to collect, and limited by privacy concerns.

To address this problem, we propose a synthetic data generation technique grounded in real-world data. Our method works as follows: we curate a set of human-generated (question, answer, misconception) tuples and use them to seed an LLM. We then prompt the LLM to generate a synthetic corpus of incorrect answers that resemble the kinds of mistakes students make while solving undergraduate-level math and algorithmic problems, with a similar distribution of mistakes. Once the technique has been validated on one math topic, it can be readily transferred to others. Our goal is to lay the groundwork for scalable benchmarks that enable rigorous evaluation and broader adoption of LLM-based tutoring systems in the most conceptually demanding area of computer science education, namely theoretical computer science.
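
The seeding step described above can be pictured with a minimal Python sketch. It is not the authors' implementation: the `SeedExample` type, the prompt wording, the helper names, and the use of total variation distance to compare mistake distributions are all assumptions made for illustration, and the seed tuple shown is invented rather than taken from the dataset. No LLM is called; the sketch only builds a few-shot prompt and compares misconception label frequencies.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedExample:
    """One human-curated (question, answer, misconception) tuple (hypothetical type)."""
    question: str
    wrong_answer: str
    misconception: str


def build_prompt(seeds, new_question):
    """Format seed tuples as few-shot examples, ending with a new question.

    The prompt wording is an assumption, not the wording used by the authors.
    """
    parts = [
        "Each example shows a question, an incorrect student answer, "
        "and the misconception behind it."
    ]
    for i, seed in enumerate(seeds, start=1):
        parts.append(
            f"Example {i}\n"
            f"Question: {seed.question}\n"
            f"Incorrect answer: {seed.wrong_answer}\n"
            f"Misconception: {seed.misconception}"
        )
    parts.append(f"Question: {new_question}\nIncorrect answer:")
    return "\n\n".join(parts)


def misconception_distribution(labels):
    """Return the relative frequency of each misconception label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Illustrative seed, invented for this sketch.
seeds = [
    SeedExample(
        question="Is the language {a^n b^n : n >= 0} regular?",
        wrong_answer="Yes, because the regular expression a*b* matches it.",
        misconception="Treats a regular superset as if it were the language itself.",
    )
]
prompt = build_prompt(seeds, "Is the language {ww : w in {a,b}*} context-free?")
```

Under these assumptions, a generated corpus could be checked against the seed data by labeling each synthetic answer with a misconception and comparing `misconception_distribution` for the two sets using `total_variation`; a small distance would suggest the synthetic mistakes are distributed similarly to the real ones.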

Anant Gupta

Georgia Institute of Technology

United States

Hieu Nguyen

Georgia Institute of Technology

United States

Carine G Webber

Georgia Institute of Technology

United States

Justin Stevens

Washington University in St. Louis

United States

Abrahim Ladha

Georgia Institute of Technology

United States

Sanika Ainchwar

Georgia Institute of Technology

United States

Vijay Ganesh

Georgia Institute of Technology

United States

Session Program

Thu 19 Feb
Displayed time zone: Central Time (US & Canada)

15:40 - 17:00	Lightning Talks #1 (Lightning Talks) at Meeting Room 241-242

15:40 10m Talk		Beyond the Sprint: Teaching Project Management as a Human Skill Lightning Talks Kristen Walcott, Emily Hill TechJoy
15:50 10m Talk		Cohorts for Community: Structuring Undergraduate Staff Support Lightning Talks Kelly Ding Harvard University, David J. Malan Harvard University
16:00 10m Talk		Compiling Course Insights: A Dashboard for Holistic Views in CS Education (Global) Lightning Talks Matt Chen Monash University
16:10 10m Talk		Enlightning Learning Experiences (Global) Lightning Talks Michel Zam University of Wisconsin–Milwaukee (UWM); Paris Dauphine University – PSL; KarmicSoft, Jacek Urbanski Sodexo — Data Hub, Tara Bogart KarmicSoft
16:20 10m Talk		Evolving Decisions, Evolving Identities: Scaffolded Tabletop Exercises as a Course Innovation in Cybersecurity Lightning Talks Lily Pharris UT Martin
16:30 10m Talk		From Fear to Practice: Integrating Quantum Computing into CS Courses Lightning Talks Olivera Grujic pc
16:40 10m Talk		LLMTutorBench: A Benchmark for University-level TCS AI Tutoring Systems Lightning Talks Anant Gupta Georgia Institute of Technology, Hieu Nguyen Georgia Institute of Technology, Carine G Webber Georgia Institute of Technology, Justin Stevens Washington University in St. Louis, Abrahim Ladha Georgia Institute of Technology, Sanika Ainchwar Georgia Institute of Technology, Vijay Ganesh Georgia Institute of Technology
16:50 10m Talk		Proactive Listening to Student Voices: Automating Reddit Summaries for Education Leaders (Global) Lightning Talks Matt Chen Monash University