Fixing LitQAEvaluation bugs: incorrect reward indices, not using LLM's native knowledge #708
Conversation
  "Extract the single letter answer from the following question and answer"
- "\n\n{qa_prompt}"
- "\n\n{qa_answer}"
+ "\n\nQuestion: {qa_prompt}"
+ "\n\nAnswer: {qa_answer}"
+ "\n\nSingle Letter Answer:"
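As a sanity check, here is a minimal sketch of how the revised template could be assembled and rendered, assuming the `{qa_prompt}`/`{qa_answer}` fields shown in the diff (the example question and answer text is invented):

```python
# Sketch of the revised evaluation prompt template; the field names
# qa_prompt/qa_answer come from the diff above, the example inputs are made up.
EVAL_PROMPT_TEMPLATE = (
    "Extract the single letter answer from the following question and answer"
    "\n\nQuestion: {qa_prompt}"
    "\n\nAnswer: {qa_answer}"
    "\n\nSingle Letter Answer:"
)

prompt = EVAL_PROMPT_TEMPLATE.format(
    qa_prompt="Which gene is implicated?\nA) BRCA1\nB) TP53",
    qa_answer="The evidence points to TP53, so the answer is B.",
)
print(prompt)
```

Ending the prompt with "Single Letter Answer:" nudges the model to complete with just the letter, which makes the evaluation output easy to parse.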
I'd suggest something like:
"Given the following question and a proposed answer to the question, return the single-letter choice that matches the proposed answer."
"\n\nQuestion: ..."
"\n\nProposed Answer: ..."
If I didn't know the context of this method, as a human, I'd find the original prompt unclear.
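For illustration, the suggested rewording might render like this once the template fields from the original prompt are substituted (a hypothetical sketch, not the merged wording):

```python
# Hypothetical rendering of the reviewer's suggested wording; the
# qa_prompt/qa_answer field names are borrowed from the original template.
SUGGESTED_TEMPLATE = (
    "Given the following question and a proposed answer to the question,"
    " return the single-letter choice that matches the proposed answer."
    "\n\nQuestion: {qa_prompt}"
    "\n\nProposed Answer: {qa_answer}"
)

example = SUGGESTED_TEMPLATE.format(
    qa_prompt="What is 2 + 2?\nA) 3\nB) 4",
    qa_answer="Four, i.e. option B.",
)
```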
Yeah, it's a nice suggestion, but making this change breaks two of our test cases. Let's save this for another PR.
Incorporated into #724
35b33db to 65c97e3
This PR:
- Fixes the EVAL_PROMPT_TEMPLATE
- Fixes the discounted_returns logic and the incorrect reward indices in discounted_returns and the TaskDataset
- Makes distractors be a Sequence to avoid in-place edits, further robust-ification after Making sure we copy distractors #694

Closes #693
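The reward-indexing and distractor-copying fixes can be sketched generically as follows. This is not the PR's actual code; the function names and signatures are illustrative, and the recursion shown is the standard backward pass G_t = r_t + gamma * G_{t+1}:

```python
from typing import Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 1.0) -> list[float]:
    # Backward recursion: G_t = r_t + gamma * G_{t+1}.
    # Building the returns in reverse and then flipping them keeps each
    # return aligned with the reward at the same index, avoiding the kind
    # of off-by-one indexing bug this PR addresses (sketch only).
    returns: list[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def make_choices(distractors: Sequence[str], answer: str) -> list[str]:
    # Copy into a new list rather than mutating the caller's sequence,
    # mirroring the "copy distractors" robustification (follow-up to #694).
    choices = list(distractors)
    choices.append(answer)
    return choices
```

Accepting distractors as a Sequence (rather than a concrete list) also lets callers pass immutable containers like tuples, which makes accidental in-place edits impossible by construction.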