
feat(issue summary): Score possible cause confidence and novelty #1788

Open · wants to merge 7 commits into `main`
Conversation

@kddubey (Contributor) commented Jan 23, 2025:

Adds `scores` to the issue summary response.

Adds ~0.3 sec of latency.

How has this been tested?

  1. Changed unit test to use VCR

  2. Local run which generated a summarization (trace ID: 1ebe8c50-e9ea-4b26-85e5-9fa41a06f8eb) with output:

output:

```json
[
    {
        "trace": "",
        "scores": {
            "possible_cause_novelty": 0.2809091323364492,
            "possible_cause_confidence": 0.487353380050403
        },
        "group_id": 16,
        "headline": "Validation Error in GroupingRequest due to Empty Stacktrace",
        "whats_wrong": "**ValidationError** raised: stacktrace must be provided and cannot be empty. Data input failed validation.",
        "possible_cause": "**Empty stacktrace** field in the input data likely caused the validation failure."
    },
    {
        "title": "Validation Error in GroupingRequest due to Empty Stacktrace",
        "scores": {
            "possible_cause_novelty": 0.2809091323364492,
            "possible_cause_confidence": 0.487353380050403
        },
        "whats_wrong": "**ValidationError** raised: stacktrace must be provided and cannot be empty. Data input failed validation.",
        "possible_cause": "**Empty stacktrace** field in the input data likely caused the validation failure.",
        "session_related_issues": ""
    }
]
```

@kddubey kddubey requested a review from a team as a code owner January 23, 2025 04:54
@jennmueng (Member) commented:

Looks like in the failing test `test_summarize_issue_event_details`, the LLM call isn't being mocked and it's trying to make an actual OpenAI call.

@kddubey (Contributor, author) commented Jan 23, 2025:

I forgot to VCR the embeddings call in `test_summarize_issue_event_details`. Did that, and CI passes now.

@kddubey kddubey requested review from jennmueng and roaga January 23, 2025 18:30
@jennmueng (Member) left a comment:

q on latency, otherwise lgtm

Comment on lines +138 to +140:

```python
issue_summary_with_scores = IssueSummaryWithScores(
    **issue_summary.model_dump(), scores=score_issue_summary(issue_summary)
)
```
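For context, a minimal stdlib-only sketch of this wrapping pattern (class and field names mirror the snippet and the JSON output above, but this is a hypothetical reconstruction: the real code uses Pydantic models and `model_dump`, and `score_issue_summary` here is a stand-in that returns fixed values, not the PR's actual scorer):

```python
from dataclasses import asdict, dataclass


@dataclass
class SummarizeIssueScores:
    possible_cause_confidence: float
    possible_cause_novelty: float


@dataclass
class IssueSummary:
    title: str
    whats_wrong: str
    possible_cause: str


@dataclass
class IssueSummaryWithScores(IssueSummary):
    # Extends the plain summary with the new "scores" object.
    scores: SummarizeIssueScores


def score_issue_summary(summary: IssueSummary) -> SummarizeIssueScores:
    # Stand-in scorer: the real implementation (embeddings-based,
    # per the thread) is not shown in this conversation.
    return SummarizeIssueScores(
        possible_cause_confidence=0.49, possible_cause_novelty=0.28
    )


summary = IssueSummary(
    title="Validation Error in GroupingRequest due to Empty Stacktrace",
    whats_wrong="stacktrace must be provided and cannot be empty",
    possible_cause="Empty stacktrace field in the input data",
)
# asdict plays the role Pydantic's model_dump plays in the snippet.
with_scores = IssueSummaryWithScores(
    **asdict(summary), scores=score_issue_summary(summary)
)
```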

You may have mentioned this already and I missed it, but remind me: how much latency does this add?

@roaga (Member) left a comment:

Code looks fine to me, just confused on the confidence score calculation. Also not sure how we feel about the latency.


```python
# Round for some tolerance during equality comparison.
# TODO: is decryption and decompression of the VCR causing tiny changes?
```

what's this TODO?

@kddubey (Contributor, author) replied:

In the local test I get exact equality. In CI (specifically, this failed run), we get inequality at the 16th decimal place:

```text
>       assert raw_result == expected_raw_result
E       AssertionError: assert IssueSummaryW...319887938298)) == IssueSummaryW...319887938298))
E         Full diff:
E         - IssueSummaryWithScores(title='Critical Issue: red-timothy-sandwich Failure', whats_wrong='**red-timothy-sandwich** encountered **exceptions** during execution, impacting functionality. Check **event logs** for details.', session_related_issues='Related issues: **cyan-vincent-banana** and **green-fred-tennis** may indicate broader session instability.', possible_cause='Potential **resource contention** or **dependency failure** affecting multiple components.', scores=SummarizeIssueScores(possible_cause_confidence=0.4749528387220516, possible_cause_novelty=0.6286319887938298))
E         ?                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        ^
E         + IssueSummaryWithScores(title='Critical Issue: red-timothy-sandwich Failure', whats_wrong='**red-timothy-sandwich** encountered **exceptions** during execution, impacting functionality. Check **event logs** for details.', session_related_issues='Related issues: **cyan-vincent-banana** and **green-fred-tennis** may indicate broader session instability.', possible_cause='Potential **resource contention** or **dependency failure** affecting multiple components.', scores=SummarizeIssueScores(possible_cause_confidence=0.47495283872205174, possible_cause_novelty=0.6286319887938298))
E         ?
```

If the VCR weren't used and we actually hit the OpenAI API in the test, we'd see inequality at around the 4th decimal place (they add noise of around 5e-4, so embeddings are not deterministic). So this is probably a weird decompression or decryption error. Left it as a TODO since it's not too important.
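The rounding workaround described above can be sketched as follows (the two values are taken from the CI diff; `math.isclose` is shown as an alternative to rounding and is not necessarily what the PR uses):

```python
import math

recorded = 0.4749528387220516     # score replayed from the VCR cassette
recomputed = 0.47495283872205174  # same score as recomputed in CI

# Exact equality fails: the floats differ at the 16th decimal place...
assert recorded != recomputed
# ...but rounding (or a relative-tolerance check) absorbs that noise.
assert round(recorded, 12) == round(recomputed, 12)
assert math.isclose(recorded, recomputed, rel_tol=1e-9)
```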

@roaga (Member) left a comment:

lgtm, idk if we want to discuss latency before merging, but it's probably fine

@kddubey (Contributor, author) commented Jan 24, 2025:

The last commit gets rid of a mostly useless embedding. Added latency is now 0.3 sec p50, 0.43 sec p90 (see "Measure additional latency" in the notebook).
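For reference, p50/p90 figures like these can be computed with a small stdlib-only sketch (the timed function here is a placeholder sleep standing in for the real scoring call; the notebook mentioned above is not reproduced here):

```python
import statistics
import time


def timed_call() -> None:
    # Placeholder for the call being measured (e.g. the scoring step).
    time.sleep(0.001)


# Collect wall-clock samples with a monotonic high-resolution timer.
samples = []
for _ in range(20):
    start = time.perf_counter()
    timed_call()
    samples.append(time.perf_counter() - start)

deciles = statistics.quantiles(samples, n=10)  # 9 cut points
p50 = statistics.median(samples)
p90 = deciles[8]  # 9th cut point ~ 90th percentile
print(f"p50={p50:.4f}s p90={p90:.4f}s")
```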
