[V1][Core][1/n] Logging and Metrics #11962
Conversation
Signed-off-by: [email protected] <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Overall LGTM
# 3) Put the RequestOutputs into the per-request queues.
self._process_request_outputs(request_outputs)

# 4) Abort any requests that finished due to stop strings.
await self.engine_core.abort_requests_async(reqs_to_abort)

# 5) Log any stats.
await self._log_stats(scheduler_stats=outputs.scheduler_stats)
Are we going to improve the metrics system later to remove it from the critical path, or is its overhead acceptable?
This is not in the EngineCore process, so it is overlapped with GPU execution (and therefore is not in the critical path). If this becomes a bottleneck for latency, we can offload to a 3rd process, but I don’t expect this to be needed.
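For readers following the thread, here is a minimal sketch of the shape of that frontend-side handler loop; the names used (`get_output_async`, the `SchedulerStats` fields, `stat_loggers`) are illustrative assumptions, not the exact vLLM API:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedulerStats:
    # Hypothetical fields for illustration only.
    num_running_reqs: int = 0
    num_waiting_reqs: int = 0


async def output_handler(engine_core, detokenizer, stat_loggers: List) -> None:
    # Runs in the AsyncLLM (frontend) process, not inside EngineCore, so the
    # work below overlaps with GPU execution happening in the core process.
    while True:
        # 1) Pull EngineCoreOutputs from the EngineCore process.
        outputs = await engine_core.get_output_async()

        # 2) Detokenize and build RequestOutputs.
        request_outputs, reqs_to_abort = detokenizer.step(outputs.outputs)

        # 3) Put the RequestOutputs into the per-request queues (elided here).

        # 4) Abort any requests that finished due to stop strings.
        await engine_core.abort_requests_async(reqs_to_abort)

        # 5) Log any stats -- off the critical path of model execution.
        for stat_logger in stat_loggers:
            stat_logger.log(scheduler_stats=outputs.scheduler_stats)
```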
vllm/v1/engine/core.py
# Break out the loops so we can log_stats via step().
if self.log_stats:
    break
Move after the logger.debug?
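For context, a rough sketch of the busy-wait pattern under discussion, shown with the break placed after the logger.debug as suggested; the surrounding structure and names (`input_queue`, `POLLING_TIMEOUT_S`) are assumptions, not the actual core.py code:

```python
import logging
import queue

logger = logging.getLogger(__name__)
POLLING_TIMEOUT_S = 2.5  # assumed constant for this sketch


def wait_for_new_requests(input_queue: queue.Queue, log_stats: bool) -> None:
    # Idle wait of the core busy loop: poll the input queue until a request
    # arrives, but cut the wait short when stats logging is enabled so that
    # step() still runs and refreshes the stats.
    while True:
        try:
            input_queue.get(timeout=POLLING_TIMEOUT_S)
            break  # got work; leave the wait loop and take a step
        except queue.Empty:
            logger.debug("EngineCore busy loop waiting.")
            # Break out of the loops so we can log_stats via step().
            if log_stats:
                break
```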
vllm/v1/engine/llm_engine.py
@@ -42,6 +42,7 @@ def __init__(
     use_cached_outputs: bool = False,
     multiprocess_mode: bool = False,
 ) -> None:
+    assert log_stats is False
Add note?
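One way to address the note (illustrative only; the exact wording and placement are assumptions): attach a short message to the assertion explaining that stats logging is only wired up for AsyncLLM in this PR.

```python
def _validate_log_stats(log_stats: bool) -> None:
    # Stats logging is handled by AsyncLLM in this PR, so the synchronous
    # LLMEngine keeps it disabled for now.
    assert log_stats is False, (
        "log_stats is not yet supported by the V1 LLMEngine.")
```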
vllm/v1/engine/async_llm.py
-    self.stat_loggers = stat_loggers
+    self.stat_loggers: List[StatLoggerBase] = [
+        LoggingStatLogger(),
+        # PrometheusStatLogger(),
Add TODO?
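For readers following along, a minimal sketch of what the pluggable logger list implies, with the TODO spelled out; the interface shown is an assumption based on the names in the diff, not the exact vLLM classes:

```python
from abc import ABC, abstractmethod
from typing import List


class StatLoggerBase(ABC):
    """Assumed interface for stat loggers in this sketch."""

    @abstractmethod
    def log(self, scheduler_stats) -> None:
        ...


class LoggingStatLogger(StatLoggerBase):
    def log(self, scheduler_stats) -> None:
        # A real implementation would use vLLM's logger and rate-limit output.
        print(f"Scheduler stats: {scheduler_stats}")


# TODO: enable PrometheusStatLogger once Prometheus metrics are wired up.
stat_loggers: List[StatLoggerBase] = [
    LoggingStatLogger(),
    # PrometheusStatLogger(),
]
```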
SUMMARY
- Enable logging and metrics for V1 without adding work to EngineCore or doing full loops over Request data.
- SchedulerStats: EngineCore creates these in each step() and sends them with EngineCoreOutput.
- RequestStats: EngineCore marks the state of each request in update_from_output and adds the metadata to EngineCoreOutput while it is being created (no additional loop). AsyncLLM then handles computing the Stats and logging.
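As a rough illustration of that data flow (class and field names beyond those in the summary are assumptions), the stats ride along with the outputs EngineCore already sends each step:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulerStats:
    # Illustrative per-step counters; cheap to snapshot inside EngineCore.
    num_running_reqs: int = 0
    num_waiting_reqs: int = 0


@dataclass
class EngineCoreOutput:
    request_id: str
    new_token_ids: List[int]
    finished: bool = False


@dataclass
class EngineCoreOutputs:
    # What EngineCore sends to the frontend after each step().
    outputs: List[EngineCoreOutput] = field(default_factory=list)
    scheduler_stats: SchedulerStats = field(default_factory=SchedulerStats)


def step(scheduler) -> EngineCoreOutputs:
    # One engine step: update_from_output marks per-request state and builds
    # the per-request outputs (no extra loop); the scheduler-level stats are
    # snapshotted once and attached. AsyncLLM computes and logs Stats later.
    outputs = scheduler.update_from_output()
    return EngineCoreOutputs(
        outputs=outputs,
        scheduler_stats=scheduler.make_stats(),
    )
```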