
Fixing LitQAEvaluation bugs: incorrect reward indices, not using LLM's native knowledge #708

Merged: jamesbraza merged 11 commits into main from fixing-answer-incorrect on Nov 19, 2024

Conversation

jamesbraza (Collaborator)

This PR:

  • Fixes both issues identified in #693 (Evaluation bug when answer is not in option list):
    1. The evaluation was relying on the LLM's innate knowledge, which this PR fixes by updating EVAL_PROMPT_TEMPLATE
    2. Answers beyond the specified options were handled incorrectly, which this PR fixes by declaring them "incorrect"
  • Fixes bad discounted_returns logic and incorrect reward indices (see the first sketch after this list)
    • Previously, incorrect answers were given a 0.1 reward, and unsure answers were given a -1.0 reward
    • Adds test coverage of this, for both discounted_returns and the TaskDataset
  • Makes the input distractors a Sequence to avoid in-place edits, further robustification after #694 (Making sure we copy distractors); see the second sketch after this list
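As context for the reward-index fix, here is a minimal sketch of how an evaluation enum, its reward mapping, and discounted returns fit together. The enum name, reward values, and function below are illustrative assumptions, not the actual paperqa implementation; the reported bug amounted to the reward list being indexed in an order that handed incorrect answers the 0.1 reward and unsure answers the -1.0 reward.

```python
from enum import IntEnum


class Evaluation(IntEnum):  # hypothetical stand-in for LitQAEvaluation
    CORRECT = 0
    INCORRECT = 1
    UNSURE = 2


# The reward list must line up with the enum values above:
# index 0 -> correct, index 1 -> incorrect, index 2 -> unsure.
# Swapping the last two entries reproduces the reported bug.
REWARDS = [1.0, -1.0, 0.1]


def discounted_returns(rewards: list[float], discount: float = 1.0) -> list[float]:
    """Compute G_t = r_t + discount * G_{t+1} for each timestep."""
    returns: list[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


assert discounted_returns([0.0, 0.0, 1.0], discount=0.5) == [0.25, 0.5, 1.0]
```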
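And for the distractors change: accepting a Sequence and copying it into a fresh list means the caller's input is never shuffled or extended in place. A minimal sketch of the pattern, with hypothetical names:

```python
import random
from collections.abc import Sequence


def make_options(ideal: str, distractors: Sequence[str], seed: int | None = None) -> list[str]:
    # Build a new list so the caller's distractors are never edited
    # in place when we add the ideal answer and shuffle.
    options = [*distractors, ideal]
    random.Random(seed).shuffle(options)
    return options
```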

Closes #693

@jamesbraza added the bug (Something isn't working) label on Nov 19, 2024
@jamesbraza self-assigned this on Nov 19, 2024
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files) label on Nov 19, 2024
paperqa/litqa.py (outdated)

Comment on lines 101 to 104:

```diff
  "Extract the single letter answer from the following question and answer"
- "\n\n{qa_prompt}"
- "\n\n{qa_answer}"
+ "\n\nQuestion: {qa_prompt}"
+ "\n\nAnswer: {qa_answer}"
+ "\n\nSingle Letter Answer:"
```
Contributor:

I'd suggest something like:

"Given the following question and a proposed answer to the question, return the single-letter choice that matches the proposed answer."
"\n\nQuestion: ..."
"\n\nProposed Answer: ..."

If I didn't know the context of this method, as a human, I'd find the original prompt unclear.

jamesbraza (Collaborator, Author):

Yeah, it's a nice suggestion, but making this change breaks two of our test cases. Let's save this for another PR.

jamesbraza (Collaborator, Author):

Incorporated into #724

@jamesbraza force-pushed the fixing-answer-incorrect branch from 35b33db to 65c97e3 on November 19, 2024 at 23:10
@dosubot (bot) added the lgtm (This PR has been approved by a maintainer) label on Nov 19, 2024
@jamesbraza merged commit 11f2727 into main on Nov 19, 2024 (5 checks passed)
@jamesbraza deleted the fixing-answer-incorrect branch on November 19, 2024 at 23:37
Labels

  • bug: Something isn't working
  • lgtm: This PR has been approved by a maintainer
  • size:L: This PR changes 100-499 lines, ignoring generated files

Development

Successfully merging this pull request may close these issues: Evaluation bug when answer is not in option list (#693)

3 participants