feat(issue summary): Score possible cause confidence and novelty #1788

kddubey · 2025-01-23T04:54:32Z

Adds "scores" to the issue summary response.

Adds about 0.3 sec of latency.

How has this been tested?

Changed the unit test to use VCR
Ran locally, which generated a summarization (trace ID: 1ebe8c50-e9ea-4b26-85e5-9fa41a06f8eb) with this output:

output

[
    {
        "trace": "",
        "scores": {
            "possible_cause_novelty": 0.2809091323364492,
            "possible_cause_confidence": 0.487353380050403
        },
        "group_id": 16,
        "headline": "Validation Error in GroupingRequest due to Empty Stacktrace",
        "whats_wrong": "**ValidationError** raised: stacktrace must be provided and cannot be empty. Data input failed validation.",
        "possible_cause": "**Empty stacktrace** field in the input data likely caused the validation failure."
    },
    {
        "title": "Validation Error in GroupingRequest due to Empty Stacktrace",
        "scores": {
            "possible_cause_novelty": 0.2809091323364492,
            "possible_cause_confidence": 0.487353380050403
        },
        "whats_wrong": "**ValidationError** raised: stacktrace must be provided and cannot be empty. Data input failed validation.",
        "possible_cause": "**Empty stacktrace** field in the input data likely caused the validation failure.",
        "session_related_issues": ""
    }
]

tests/automation/summarize/test_issue.py

jennmueng · 2025-01-23T16:16:13Z

Looks like in the failing test test_summarize_issue_event_details, the LLM call isn't being mocked and it's trying to make an actual OpenAI call.

kddubey · 2025-01-23T18:30:12Z

I forgot to VCR the embeddings call in test_summarize_issue_event_details. Did that, and CI passes now.

jennmueng

q on latency, otherwise lgtm

jennmueng · 2025-01-23T18:54:14Z

src/seer/automation/summarize/issue.py

+    issue_summary_with_scores = IssueSummaryWithScores(
+        **issue_summary.model_dump(), scores=score_issue_summary(issue_summary)
+    )


You may have mentioned this already and I missed it, but remind me: how much latency does this add?
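The diff above builds the scored summary by dumping the plain summary's fields and passing them, plus `scores`, to a subclass. A minimal, dependency-free sketch of that pattern follows. Assumptions: the real code uses Pydantic models (`model_dump()`), and here stdlib `dataclasses` stand in so the example runs on its own; `score_issue_summary` is a hypothetical placeholder that returns fixed numbers, not the PR's scoring logic; `add_scores` is a hypothetical wrapper name. Field names are taken from the test output in this thread.

```python
from dataclasses import asdict, dataclass


@dataclass
class SummarizeIssueScores:
    # Field names match the test output in this PR.
    possible_cause_confidence: float
    possible_cause_novelty: float


@dataclass
class IssueSummary:
    title: str
    whats_wrong: str
    session_related_issues: str
    possible_cause: str


@dataclass
class IssueSummaryWithScores(IssueSummary):
    # Dataclass inheritance appends this field after the inherited ones.
    scores: SummarizeIssueScores


def score_issue_summary(issue_summary: IssueSummary) -> SummarizeIssueScores:
    # Hypothetical placeholder: the PR derives these from model/embedding calls.
    return SummarizeIssueScores(possible_cause_confidence=0.5, possible_cause_novelty=0.5)


def add_scores(issue_summary: IssueSummary) -> IssueSummaryWithScores:
    # Mirrors IssueSummaryWithScores(**issue_summary.model_dump(), scores=...),
    # with dataclasses.asdict standing in for Pydantic's model_dump().
    return IssueSummaryWithScores(
        **asdict(issue_summary), scores=score_issue_summary(issue_summary)
    )


summary = IssueSummary(
    title="Validation Error in GroupingRequest due to Empty Stacktrace",
    whats_wrong="**ValidationError** raised.",
    session_related_issues="",
    possible_cause="**Empty stacktrace** field.",
)
# asdict recurses into the nested scores dataclass.
print(asdict(add_scores(summary))["scores"])
# {'possible_cause_confidence': 0.5, 'possible_cause_novelty': 0.5}
```

One upside of this shape is that the base summary type stays unchanged, so code that only needs the plain summary is unaffected by the new field.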

roaga

Code looks fine to me, just confused on the confidence score calculation. Also idk how we feel about the latency

roaga · 2025-01-23T21:37:02Z

tests/automation/summarize/test_issue.py


+        # Round for some tolerance during equality comparison.
+        # TODO: is decryption and decompression of the VCR causing tiny changes?


what's this TODO?

kddubey

In the local test I get exact equality. In CI (specifically, this failed run), we get inequality at the 16th decimal place:

> assert raw_result == expected_raw_result
E AssertionError: assert IssueSummaryW...319887938298)) == IssueSummaryW...319887938298))
E Full diff:
E - IssueSummaryWithScores(title='Critical Issue: red-timothy-sandwich Failure', whats_wrong='**red-timothy-sandwich** encountered **exceptions** during execution, impacting functionality. Check **event logs** for details.', session_related_issues='Related issues: **cyan-vincent-banana** and **green-fred-tennis** may indicate broader session instability.', possible_cause='Potential **resource contention** or **dependency failure** affecting multiple components.', scores=SummarizeIssueScores(possible_cause_confidence=0.4749528387220516, possible_cause_novelty=0.6286319887938298))
E ? ^
E + IssueSummaryWithScores(title='Critical Issue: red-timothy-sandwich Failure', whats_wrong='**red-timothy-sandwich** encountered **exceptions** during execution, impacting functionality. Check **event logs** for details.', session_related_issues='Related issues: **cyan-vincent-banana** and **green-fred-tennis** may indicate broader session instability.', possible_cause='Potential **resource contention** or **dependency failure** affecting multiple components.', scores=SummarizeIssueScores(possible_cause_confidence=0.47495283872205174, possible_cause_novelty=0.6286319887938298))
E ?

If the VCR weren't used and the test actually hit the OpenAI API, we'd see inequality at around the 4th decimal place: they add noise of around 5e-4, so embeddings are not deterministic. So this is probably a weird decompression or decryption error. I left it as a TODO since it's not too important.
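The mismatch can be reproduced from the two confidence values in the CI diff above: they are distinct floats that agree to 15 decimal places, so exact `==` fails, while rounding to 10 places (the approach named in the "round to 10 places" commit) or a relative tolerance succeeds. A small sketch, assuming the scores are plain Python floats; `scores_match` is a hypothetical helper, not the test's actual code, and the last pair of values is illustrative only.

```python
import math

# Confidence values copied from the failing CI assertion.
local_value = 0.4749528387220516
ci_value = 0.47495283872205174


def scores_match(a: float, b: float, ndigits: int = 10) -> bool:
    # Hypothetical helper: compare after rounding, as the test comment describes.
    return round(a, ndigits) == round(b, ndigits)


print(local_value == ci_value)              # False: differs at the 16th decimal place
print(scores_match(local_value, ci_value))  # True: both round to 0.4749528387
print(math.isclose(local_value, ci_value))  # True: default rel_tol is 1e-09

# Illustrative values ~5e-4 apart, like un-mocked embedding noise:
# rounding to 10 places does not hide differences that large.
print(scores_match(0.4749, 0.4754))         # False
```

Rounding can still fail when two nearly equal values straddle a rounding boundary; `math.isclose` (or `pytest.approx`) avoids that edge case, which is one reason to prefer an explicit tolerance.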

src/seer/automation/summarize/issue.py

roaga

lgtm, idk if we want to discuss latency before merging, but it's probably fine

kddubey · 2025-01-24T00:12:16Z

The last commit gets rid of a mostly useless embedding. Added latency is now 0.3 sec p50, 0.43 sec p90 (see "Measure additional latency" in the notebook).
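Figures like "0.3 sec p50, 0.43 sec p90" are percentiles over repeated timings. The notebook's code isn't shown here, so the following is only a minimal stdlib sketch, assuming the extra work can be wrapped in a zero-argument callable. `measure_latency` and `latency_percentiles` are hypothetical names, and the 10 ms sleep is a stand-in for the scoring step.

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], runs: int) -> list[float]:
    # Wall-clock seconds per call; perf_counter is monotonic and high resolution.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def latency_percentiles(samples: list[float]) -> tuple[float, float]:
    # n=100 returns 99 cut points; index 49 is p50 and index 89 is p90.
    # "inclusive" treats the min and max samples as the 0th and 100th percentiles.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[49], cuts[89]


# Example: time a stand-in for the scoring step (a 10 ms sleep).
samples = measure_latency(lambda: time.sleep(0.01), runs=20)
p50, p90 = latency_percentiles(samples)
print(f"p50={p50:.3f}s p90={p90:.3f}s")
```

To report *added* latency, time only the new scoring step, or time the endpoint with and without it and compare the two distributions rather than single runs.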

feat(issue summary): Score possible cause confidence and novelty

a3e7346

kddubey requested a review from a team as a code owner January 23, 2025 04:54

kddubey added 3 commits January 22, 2025 20:58

re-add Step just in case

2593b4a

fix mypy

b19817a

round to 10 places

1ab880c

jennmueng reviewed Jan 23, 2025

tests/automation/summarize/test_issue.py

kddubey added 2 commits January 23, 2025 09:19

VCR test_summarize_issue_event_details

5eaa522

retry pray no seg fault?

217bfc2

kddubey requested review from jennmueng and roaga January 23, 2025 18:30

jennmueng approved these changes Jan 23, 2025

roaga reviewed Jan 23, 2025

roaga approved these changes Jan 23, 2025

re-use prefixed possible cause for novelty score

3a40f96
