🏆 TestGenEval Leaderboard 🏆

TestGenEval is a benchmark for measuring unit test generation and test completion capabilities!

📝 Notes

1. All samples are generated from scratch using our codebase; the raw generations are also available there.
2. The pass@1 metric is reported with T=0.2, and the pass@5, coverage, and mutation score metrics are reported with T=0.8, using 5 samples each. Models are ranked by pass@1.
3. Prompts can be found here, and generations not following the format are considered incorrect.
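For pass@k over n samples, leaderboards of this kind typically use the unbiased estimator from the Codex paper (Chen et al., 2021); a minimal sketch of that standard formula (not necessarily the exact scoring script used here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn (without replacement) from n total generations,
    c of which pass, is correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failing samples: any draw of k must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 5 samples, 2 of which pass all tests.
print(pass_at_k(5, 2, 1))  # pass@1 = 1 - C(3,1)/C(5,1) = 0.4
print(pass_at_k(5, 2, 5))  # pass@5 = 1.0 (k = n, and at least one passes)
```

With 5 samples per task (as in note 2), pass@5 simply checks whether any sample passed, while pass@1 averages the per-sample pass rate.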

🤗 Acknowledgement and More Leaderboards

We greatly thank the authors of the EvalPlus leaderboard for allowing us to borrow their leaderboard code! We also recommend the following leaderboards for measuring code LM ability on various coding tasks: EvalPlus Leaderboard, Chatbot Arena Leaderboard, BigCode Models Leaderboard, InfiCoder-Eval, and TabbyML Leaderboard.