[V1][Core][1/n] Logging and Metrics #11962
Conversation
Signed-off-by: [email protected] <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Overall LGTM
# 3) Put the RequestOutputs into the per-request queues.
self._process_request_outputs(request_outputs)

# 4) Abort any requests that finished due to stop strings.
await self.engine_core.abort_requests_async(reqs_to_abort)

# 5) Log any stats.
await self._log_stats(scheduler_stats=outputs.scheduler_stats)
Are we going to improve the metrics system later to remove it from the critical path, or is its overhead acceptable?
This is not in the EngineCore process, so it is overlapped with GPU execution (and therefore is not in the critical path). If this becomes a bottleneck for latency, we can offload to a 3rd process, but I don’t expect this to be needed.
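For readers following the thread, here is a minimal sketch of the shape of that frontend-side handler loop; the names used (`get_output_async`, the `SchedulerStats` fields, `stat_loggers`) are illustrative assumptions, not the exact vLLM API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedulerStats:
    # Hypothetical fields for illustration only.
    num_running_reqs: int = 0
    num_waiting_reqs: int = 0


async def output_handler(engine_core, detokenizer, stat_loggers: List) -> None:
    # Runs in the AsyncLLM (frontend) process, not inside EngineCore, so the
    # work below overlaps with GPU execution happening in the core process.
    while True:
        # 1) Pull EngineCoreOutputs from the EngineCore process.
        outputs = await engine_core.get_output_async()

        # 2) Detokenize and build RequestOutputs.
        request_outputs, reqs_to_abort = detokenizer.step(outputs.outputs)

        # 3) Put the RequestOutputs into the per-request queues (elided here).

        # 4) Abort any requests that finished due to stop strings.
        await engine_core.abort_requests_async(reqs_to_abort)

        # 5) Log any stats -- off the critical path of model execution.
        for stat_logger in stat_loggers:
            stat_logger.log(scheduler_stats=outputs.scheduler_stats)
```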
vllm/v1/engine/core.py
# Break out the loops so we can log_stats via step().
if self.log_stats:
    break
Move after the logger.debug?
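For context, a rough sketch of the busy-wait pattern under discussion, shown with the break placed after the logger.debug as suggested; the surrounding structure and names (`input_queue`, `POLLING_TIMEOUT_S`) are assumptions, not the actual core.py code:

```python
import logging
import queue

logger = logging.getLogger(__name__)
POLLING_TIMEOUT_S = 2.5  # assumed constant for this sketch


def wait_for_new_requests(input_queue: queue.Queue, log_stats: bool) -> None:
    # Idle wait of the core busy loop: poll the input queue until a request
    # arrives, but cut the wait short when stats logging is enabled so that
    # step() still runs and refreshes the stats.
    while True:
        try:
            input_queue.get(timeout=POLLING_TIMEOUT_S)
            break  # got work; leave the wait loop and take a step
        except queue.Empty:
            logger.debug("EngineCore busy loop waiting.")
            # Break out of the loops so we can log_stats via step().
            if log_stats:
                break
```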
vllm/v1/engine/llm_engine.py
@@ -42,6 +42,7 @@ def __init__(
     use_cached_outputs: bool = False,
     multiprocess_mode: bool = False,
 ) -> None:
+    assert log_stats is False
Add note?
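One way to address the note (illustrative only; the exact wording and placement are assumptions): attach a short message to the assertion explaining that stats logging is only wired up for AsyncLLM in this PR.

```python
def _validate_log_stats(log_stats: bool) -> None:
    # Stats logging is handled by AsyncLLM in this PR, so the synchronous
    # LLMEngine keeps it disabled for now.
    assert log_stats is False, (
        "log_stats is not yet supported by the V1 LLMEngine.")
```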
vllm/v1/engine/async_llm.py
-    self.stat_loggers = stat_loggers
+    self.stat_loggers: List[StatLoggerBase] = [
+        LoggingStatLogger(),
+        # PrometheusStatLogger(),
Add TODO?
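For readers following along, a minimal sketch of what the pluggable logger list implies, with the TODO spelled out; the interface shown is an assumption based on the names in the diff, not the exact vLLM classes:

```python
from abc import ABC, abstractmethod
from typing import List


class StatLoggerBase(ABC):
    """Assumed interface for stat loggers in this sketch."""

    @abstractmethod
    def log(self, scheduler_stats) -> None:
        ...


class LoggingStatLogger(StatLoggerBase):
    def log(self, scheduler_stats) -> None:
        # A real implementation would use vLLM's logger and rate-limit output.
        print(f"Scheduler stats: {scheduler_stats}")


# TODO: enable PrometheusStatLogger once Prometheus metrics are wired up.
stat_loggers: List[StatLoggerBase] = [
    LoggingStatLogger(),
    # PrometheusStatLogger(),
]
```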
SUMMARY
- Enable logging and metrics for V1 without adding work to EngineCore or doing full loops over Request data.
- SchedulerStats: EngineCore creates these in each step() and sends them with EngineCoreOutput.
- RequestStats: EngineCore marks the state of each request in update_from_output and adds the metadata to EngineCoreOutput while it is being created (no additional loop). AsyncLLM then handles computing the Stats and logging.
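As a rough illustration of that data flow (class and field names beyond those in the summary are assumptions), the stats ride along with the outputs EngineCore already sends each step:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulerStats:
    # Illustrative per-step counters; cheap to snapshot inside EngineCore.
    num_running_reqs: int = 0
    num_waiting_reqs: int = 0


@dataclass
class EngineCoreOutput:
    request_id: str
    new_token_ids: List[int]
    finished: bool = False


@dataclass
class EngineCoreOutputs:
    # What EngineCore sends to the frontend after each step().
    outputs: List[EngineCoreOutput] = field(default_factory=list)
    scheduler_stats: SchedulerStats = field(default_factory=SchedulerStats)


def step(scheduler) -> EngineCoreOutputs:
    # One engine step: update_from_output marks per-request state and builds
    # the per-request outputs (no extra loop); the scheduler-level stats are
    # snapshotted once and attached. AsyncLLM computes and logs Stats later.
    outputs = scheduler.update_from_output()
    return EngineCoreOutputs(
        outputs=outputs,
        scheduler_stats=scheduler.make_stats(),
    )
```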