Sample generation
Given a question from the GSM8K dataset and an LLM (configured in some way, e.g., sampling temperature), generate n = 10 answers to that question.
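A minimal Python sketch of the sampling step follows. It hides the model behind a small interface so the model can be swapped. The `LLM` protocol, its `generate(prompt, temperature)` method, the prompt wording, the default temperature of 0.7, and `ToyModel` are all hypothetical placeholders, not an existing API. A real client would need a thin wrapper to match the interface.

```python
import random
from typing import List, Protocol


class LLM(Protocol):
    """Hypothetical interface: any model that maps a prompt to one completion."""

    def generate(self, prompt: str, temperature: float) -> str:
        ...


def build_prompt(question: str) -> str:
    # Assumed prompt format: reasoning followed by a final "Answer: <number>" line.
    return f"{question}\nThink step by step, then end with 'Answer: <number>'."


def sample_answers(question: str, model: LLM, n: int = 10, temperature: float = 0.7) -> List[str]:
    """Draw n independent completions for one question from any LLM-like object."""
    prompt = build_prompt(question)
    return [model.generate(prompt, temperature) for _ in range(n)]


class ToyModel:
    """Stand-in for a real LLM so the interface can be exercised offline."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def generate(self, prompt: str, temperature: float) -> str:
        # Ignores the prompt and temperature; returns a random final answer in a fixed format.
        return f"Some reasoning.\nAnswer: {self.rng.randint(70, 74)}"


if __name__ == "__main__":
    completions = sample_answers("How many clips did Natalia sell?", ToyModel())
    print(len(completions))  # 10
```

Because `LLM` is a structural `Protocol`, any object with a matching `generate` method can be passed in. This keeps the sampling loop independent of which model is used.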
Each answer must be the number the model gives as its final answer. The model should be interchangeable, and so should the GSM8K file.
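The following Python sketch shows one way to reduce each completion to a numerical answer while keeping the dataset file swappable. It assumes the model's final answer is the last number in its completion. It also assumes the file follows the JSONL layout of the public GSM8K release: one object per line with `question` and `answer` fields, where the reference answer ends with `#### <number>`. The helper names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Integer or decimal, optionally negative, allowing thousands separators ("1,234.5").
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_number(completion: str) -> Optional[float]:
    """Return the last number in a completion, or None if it contains none.

    Assumption: the model's final answer is the last number it writes.
    """
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gold_answer(answer_field: str) -> float:
    """GSM8K reference solutions end with a line of the form '#### <number>'."""
    return float(answer_field.split("####")[-1].strip().replace(",", ""))


def load_gsm8k(path: str) -> Iterator[Tuple[str, float]]:
    """Yield (question, gold number) pairs from a GSM8K-style JSONL file.

    The path is a parameter so another split or file can be swapped in.
    """
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                row = json.loads(line)
                yield row["question"], gold_answer(row["answer"])
```

For example, `extract_number("Some reasoning.\nAnswer: 72")` returns `72.0`. Comparing it against `gold_answer` for each of the n samples gives per-question correctness. Returning `None` instead of raising lets unparseable completions be counted separately.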