Fair and consistent assessment of student learning is critical in educational settings, particularly when evaluating the impact of instructional innovations. Although widely used for efficiency, output-based auto-grading often falls short in capturing partial understanding—limiting its effectiveness for measuring learning gains. This paper presents an empirical evaluation of a rubric-based, question-focused, double-grading protocol for written-response (WR) coding questions in pre- and post-tests from a large introductory programming course. This work provides both methodological insights and practical guidance for scaling reliable grading of open-ended coding questions.

To balance efficiency and accuracy, each grader scored a specific question across all submissions, with two graders assigned per item. Adjudication was triggered when score differences exceeded a 20% threshold. Intraclass Correlation Coefficient (ICC) analysis identified two questions with low inter-rater reliability. After rubric clarification and regrading, reliability improved substantially, with ICC values ranging from 0.892 to 0.967 (all data) and 0.831 to 0.875 (excluding zero scores).
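The double-grading protocol described above can be sketched in code: an ICC(2,1) estimate (two-way random effects, absolute agreement, single rater) as a reliability diagnostic, plus a flag for submissions whose two scores diverge beyond the threshold. This is a minimal illustration, not the authors' implementation; the interpretation of the 20% threshold as a fraction of the question's maximum points, and all function names, are assumptions.

```python
import numpy as np

def icc2_1(scores: np.ndarray) -> float:
    """ICC(2,1) per Shrout & Fleiss: two-way random effects,
    absolute agreement, single rater.
    scores: (n_submissions, k_graders) matrix of awarded points."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-submission means
    col_means = scores.mean(axis=0)   # per-grader means
    # Mean squares from the two-way ANOVA decomposition
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    ss_err = np.sum(
        (scores - row_means[:, None] - col_means[None, :] + grand) ** 2
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

def needs_adjudication(s1: float, s2: float, max_points: float,
                       threshold: float = 0.20) -> bool:
    """Flag a submission for a third look when the two graders'
    scores differ by more than `threshold` of the maximum points
    (assumed reading of the paper's 20% trigger)."""
    return abs(s1 - s2) > threshold * max_points
```

In this scheme, a question with a low ICC across all its submissions signals a rubric problem (prompting clarification and regrading), while `needs_adjudication` catches individual disagreements even when overall reliability is acceptable.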

We describe the iterative development of the assessment process and show how this structured approach—combined with ICC analysis as a diagnostic tool and targeted adjudication—achieves strong inter-grader reliability. The framework is scalable and robust for WR coding question evaluation in CS1 settings and is adaptable to a range of instructional contexts. These findings support instructors and researchers seeking consistent, practical methods for assessing open-ended student work in programming courses.

Thu 19 Feb

Displayed time zone: Central Time (US & Canada)

13:40 - 15:00
Improving Learning at Scale: Practice, Assessment, and Support in Large Computing Courses
Papers at Meeting Room 102
Chair(s): Preeti Raman Toronto Metropolitan University
13:40
20m
Talk
Developing Problem-Solving Competency in Data Science: Exploring A Case-Based Approach
Papers
Lujie Karen Chen University of Maryland, Baltimore County, Maryam M. Alomair University of Maryland - Baltimore County, Muhammad Ali Yousuf University of Maryland, Baltimore County, Shimei Pan UMBC
14:00
20m
Talk
Encouraging Learning Through Repetition: Effects of Multiple Practice Opportunities in a Large Intro Programming Course
Papers
Jordan Elise Tate pc, Supriya Naidu University of Colorado at Boulder
14:20
20m
Talk
Improving the Reliability of Grading Written-Response Coding Questions in a Large CS1 Course
Papers
Wei Jin Georgia Gwinnett College, Xin Xu Georgia Gwinnett College, Hyesung Park Georgia Gwinnett College, Evelyn Brannock Georgia Gwinnett College, Tacksoo Im Georgia Gwinnett College
14:40
20m
Talk
When Support Isn’t Enough: Understanding and Redesigning Student Support Systems in Large Computing Courses
Papers
Teresa Luo University of California, Berkeley, Chenkun Sheng University of California, Berkeley, Lisa Yan UC Berkeley