🏆 TestGenEval Leaderboard 🏆

TestGenEval is a benchmark for measuring unit test generation and test completion capabilities!

📝 Notes

1. All samples are generated from scratch using our codebase; the raw generations are also available there.
2. The pass@1 metric is reported with T=0.2, and the pass@5, coverage, and mutation score metrics are reported with T=0.8, using 5 samples each. Models are ranked by pass@1.
3. Prompts can be found here, and generations not following the format are considered incorrect.
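For pass@k over n samples, leaderboards of this kind typically use the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch of that standard formula (not necessarily the exact scoring script used here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn (without replacement) from n total generations,
    c of which pass, is correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failing samples: any draw of k must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 5 samples, 2 of which pass all tests.
print(pass_at_k(5, 2, 1))  # pass@1 = 1 - C(3,1)/C(5,1) = 0.4
print(pass_at_k(5, 2, 5))  # pass@5 = 1.0 (k = n, and at least one passes)
```

With 5 samples per task (as in note 2), pass@5 simply checks whether any sample passed, while pass@1 averages the per-sample pass rate.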

🤗 Acknowledgement and More Leaderboards

We greatly thank the authors of the EvalPlus leaderboard for allowing us to borrow their leaderboard code! We also recommend the following leaderboards for measuring code LM ability on various coding tasks: EvalPlus Leaderboard, Chatbot Arena Leaderboard, BigCode Models Leaderboard, InfiCoder-Eval, and TabbyML Leaderboard.