Code Generation
When To Use Code Generation Eval Template
This Eval checks the correctness and readability of the code from a code generation process. The template variables are:
query: the coding question being asked
code: the code that was returned in response to the question
Code Generation Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
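If you are working in Phoenix, the template also ships as a constant in the phoenix.evals package. As a minimal sketch (assuming the arize-phoenix package with the evals extra is installed), you can print the current prompt text locally:

```python
# A minimal sketch, assuming arize-phoenix[evals] is installed.
from phoenix.evals import CODE_READABILITY_PROMPT_TEMPLATE

# Prints the full prompt text, including the {query} and {code} variables.
print(CODE_READABILITY_PROMPT_TEMPLATE)
```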
How To Run the Code Generation Eval
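Below is a minimal sketch of running the eval with llm_classify from phoenix.evals. The dataframe contents and the sample query/code pair are illustrative, and exact parameter names (for example, the OpenAIModel arguments) may vary by Phoenix version:

```python
import pandas as pd
from phoenix.evals import (
    CODE_READABILITY_PROMPT_RAILS_MAP,
    CODE_READABILITY_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# A dataframe with one column per template variable: "query" and "code".
# The row below is illustrative sample data.
df = pd.DataFrame(
    {
        "query": ["Write a function that returns the nth Fibonacci number."],
        "code": [
            "def fib(n):\n"
            "    a, b = 0, 1\n"
            "    for _ in range(n):\n"
            "        a, b = b, a + b\n"
            "    return a"
        ],
    }
)

model = OpenAIModel(model="gpt-4", temperature=0.0)

# The rails hold the LLM's output to the labels defined by the template.
rails = list(CODE_READABILITY_PROMPT_RAILS_MAP.values())

readability_classifications = llm_classify(
    dataframe=df,
    template=CODE_READABILITY_PROMPT_TEMPLATE,
    model=model,
    rails=rails,
)
print(readability_classifications)
```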
The above shows how to use the code readability template.
Benchmark Results
This benchmark was obtained using the notebook below. It was run using OpenAI's HumanEval dataset as the ground-truth dataset. Each example in the dataset was evaluated using the CODE_READABILITY_PROMPT_TEMPLATE above, and the resulting labels were compared against the ground-truth labels in the benchmark dataset to generate the confusion matrices summarized below.
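That comparison step amounts to standard classification metrics. Here is a minimal sketch using scikit-learn; the label values and the sample lists are illustrative stand-ins, not the benchmark data:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative stand-ins: in the benchmark, true_labels come from the
# ground-truth dataset and pred_labels from llm_classify's output column.
true_labels = ["readable", "readable", "unreadable", "unreadable"]
pred_labels = ["readable", "unreadable", "unreadable", "unreadable"]

labels = ["readable", "unreadable"]
print(confusion_matrix(true_labels, pred_labels, labels=labels))
print(classification_report(true_labels, pred_labels, labels=labels))
```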
| Code Eval | GPT-4 | GPT-4 Turbo | Gemini Pro | GPT-3.5 | GPT-3.5 Instruct |
|-----------|-------|-------------|------------|---------|------------------|
| Precision | 1.0   | 0.93        | 0.79       | 0.78    | 0.77             |
| Recall    | 0.71  | 0.78        | 0.81       | 0.93    | 0.94             |
| F1        | 0.83  | 0.85        | 0.80       | 0.85    | 0.85             |