Personalized Coding Problem Generation Using Open-source Small Language Models
LLMs and their use cases within computer science education have been the subject of much discussion. However, relying on cloud-based services for proprietary models like GPT-4 introduces barriers such as cost and data-privacy compliance. This work presents an end-to-end local microservice system that generates programming problems using open-source small language models capable of running on consumer devices.
Alongside a baseline that prompts a single model directly, we evaluate two additional generation pipelines: a GPT-4 benchmark and a multi-agent refinement loop inspired by the CHASE paradigm, in which five models cooperate in a feedback loop to progressively deepen a problem until a target difficulty is reached or exceeded.
We generated 150 problems in total across the three methods, which were blindly scored by a computer science educator on metrics including clarity, difficulty, and overall quality. The results show that CHASE achieved better topic adherence but was 18x slower than the baseline generation. Chaining small models may not correct the deficiencies of a single model; rather, those deficiencies appear to compound. However, the single-model end-to-end method using open-source models was reasonably fast and outperformed GPT-4 on clarity metrics. This work demonstrates the feasibility of generating meaningful coding problems with local models, though chained-pipeline approaches may require either more robust handling of user preferences and problem settings or simply larger models.
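The multi-agent refinement loop described in the abstract can be sketched as follows. This is a minimal illustration only: the function names, the difficulty scale, and the toy heuristics are assumptions standing in for the paper's actual model calls, not its implementation.

```python
# Hypothetical sketch of a CHASE-style refinement loop: a judge scores a
# problem's difficulty, and a refiner deepens it until a target is met.
# All names and heuristics here are illustrative assumptions.

TARGET_DIFFICULTY = 4  # assumed 1-5 difficulty scale
MAX_ROUNDS = 5         # one refinement round per model in the chain

def estimate_difficulty(problem: str) -> int:
    """Stand-in for a judge model; here, a toy heuristic that counts
    the number of stated constraints in the problem text."""
    return min(5, 1 + problem.count("constraint"))

def deepen(problem: str) -> str:
    """Stand-in for a refiner model that adds depth to the problem."""
    return problem + " Additional constraint: handle edge cases efficiently."

def refine_until_target(problem: str) -> str:
    """Iterate until the target difficulty is reached or rounds run out."""
    for _ in range(MAX_ROUNDS):
        if estimate_difficulty(problem) >= TARGET_DIFFICULTY:
            break
        problem = deepen(problem)
    return problem

seed = "Write a function that merges two sorted lists."
final = refine_until_target(seed)
```

In a real pipeline, `estimate_difficulty` and `deepen` would each wrap a call to a local small language model; the loop structure, with a round cap to bound latency, is the part the abstract's 18x slowdown figure reflects.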
