Manually grading free-response questions remains a persistent challenge in education. While such questions offer valuable opportunities for student learning and critical thinking, evaluating them often requires substantial time and effort from instructors or teaching assistants. Beyond the grading workload, scoring of open-ended responses is prone to inconsistency and can be affected by unclear expectations, both of which undermine the effectiveness and fairness of the assessment process.

To address these challenges, we employed an AI-based grading system integrated into PrairieLearn that automatically evaluates student submissions to free-response questions against a predefined set of rubric items. Beyond streamlining grading, this approach enables direct comparison between AI-generated rubric applications and human judgments, revealing where the two align and where they diverge. These discrepancies proved valuable, allowing us to iteratively revise and clarify the rubric items. Our experience using the AI grading system across several computing courses suggests that even experienced educators have difficulty articulating rubrics that are both specific and interpretable. We further argue that the iterative development and evaluation of rubrics deserve more attention.
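To make the comparison between AI and human rubric applications concrete, the sketch below measures per-item agreement and flags rubric items whose agreement falls below a threshold as candidates for revision. It is an illustrative assumption, not the PrairieLearn implementation: it assumes each rubric item is applied as a yes/no decision per submission, uses Cohen's kappa as the agreement measure, and the names `cohens_kappa`, `flag_items_for_revision`, the item labels, and the 0.6 threshold are all hypothetical.

```python
def cohens_kappa(ai: list[bool], human: list[bool]) -> float:
    """Cohen's kappa for two raters making binary decisions on the same submissions."""
    if not ai or len(ai) != len(human):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(ai)
    # Observed agreement: fraction of submissions where both raters made the same call.
    p_observed = sum(a == h for a, h in zip(ai, human)) / n
    # Expected chance agreement from each rater's marginal rate of applying the item.
    p_ai = sum(ai) / n
    p_human = sum(human) / n
    p_expected = p_ai * p_human + (1 - p_ai) * (1 - p_human)
    if p_expected == 1:
        # Both raters gave one constant label to every submission, so they agree perfectly.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def flag_items_for_revision(
    applications: dict[str, tuple[list[bool], list[bool]]],
    threshold: float = 0.6,  # hypothetical cutoff; choose per course
) -> list[str]:
    """Return rubric items whose AI-human kappa falls below the threshold."""
    return sorted(
        item
        for item, (ai, human) in applications.items()
        if cohens_kappa(ai, human) < threshold
    )


# Hypothetical data: per-submission applications of two rubric items.
applications = {
    "correct_base_case": ([True, True, False, True], [True, True, False, True]),
    "explains_complexity": ([True, False, True, True], [False, False, True, False]),
}
print(flag_items_for_revision(applications))  # ['explains_complexity']
```

Kappa is used here rather than raw percent agreement because an item that nearly every submission earns (or misses) would otherwise look well aligned by chance alone; a flagged item is only a prompt to reread its wording, not proof that the rubric is at fault.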