ChatCompletionClient to support request caching #4752
Comments
Here's a basic idea I have based on what we had in
We can modify the
As for the actual caching: since we use pydantic models for the messages, we can encode the incoming prompt info as JSON and hash it for the cache key. WDYT @ekzhu / @jackgerrits? |
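A minimal sketch of that keying idea, assuming pydantic v2 models for the messages; the `UserMessage` stand-in and the SHA-256 choice are illustrative assumptions, not the project's actual implementation:

```python
import hashlib
import json

from pydantic import BaseModel


class UserMessage(BaseModel):
    """Illustrative stand-in for the real pydantic message types."""
    content: str
    source: str


def cache_key(messages: list[BaseModel]) -> str:
    # Serialize each message to a dict, dump the whole prompt deterministically
    # (sorted keys), and hash it to get a stable cache key.
    payload = json.dumps([m.model_dump() for m in messages], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = cache_key([UserMessage(content="Hello, how are you?", source="user")])
```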
For the abstract interface we can keep it super simple so existing libraries like |
On a related note, for cases where the user requires all responses to be pulled from the cache, such as for quick regression tests, it could be useful to have the cached client throw an error (rather than calling the model_client) for any prompt that is not found in the cache. This functionality could be enabled by passing None as the model_client parameter. I've implemented a client wrapper that provides this caching and checking (plus numeric result checking) for my own regression tests, but my client wrapper isn't a complete ChatCompletionClient replacement. |
@rickyloynd-microsoft Can you share a pointer/branch to your code, if possible?
Since the original client is passed during init (for other methods like model_info/etc), this can probably be implemented as a kwarg on the |
It will be in a PR soon. |
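A rough sketch of the strict cache-only mode described a few comments up, where passing `None` as the model client turns any cache miss into an error; the class, the error type, and the simplified string-key `create` signature are hypothetical, not the wrapper mentioned above:

```python
class CacheMissError(Exception):
    """Raised in cache-only mode when a prompt has no stored response."""


class RegressionCacheClient:
    """Hypothetical wrapper: serves cached responses, and when constructed
    with model_client=None it refuses to call any model at all."""

    def __init__(self, store: dict[str, str], model_client=None) -> None:
        self._store = store
        self._model_client = model_client  # None enables strict cache-only mode

    async def create(self, prompt_key: str) -> str:
        if prompt_key in self._store:
            return self._store[prompt_key]
        if self._model_client is None:
            # Cache-only mode: a miss is an error rather than a model call.
            raise CacheMissError(f"No cached response for prompt key {prompt_key!r}")
        response = await self._model_client.create(prompt_key)
        self._store[prompt_key] = response
        return response
```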
I see the api we have documented here using context:

```python
cached_client = ChatCompletionCache(OpenAIChatCompletionClient(model="gpt-4o"), store=DiskCacheStore())


async def main() -> None:
    with tempfile.TemporaryDirectory() as tmpdirname:
        openai_model_client = OpenAIChatCompletionClient(model="gpt-4o")
        cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache(tmpdirname))
        cache_client = ChatCompletionCache(openai_model_client, cache_store)

        response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
        print(response)  # Should print response from OpenAI

        response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
        print(response)  # Should print cached response
```

Does the current impl support something like:

```python
async def main() -> None:
    cache_client = OpenAIChatCompletionClient(model="gpt-4o", cache=True)

    response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
    print(response)  # Should print response from OpenAI

    response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
    print(response)  # Should print cached response
```
|
Nice simplification @victordibia ! |
@victordibia I suppose that can be done. What would be a good default cache? in memory? |
See my other reply: #5141 (comment)

I believe the code snippet you showed breaks the intentional generality of the new cache client. We should avoid making it possible to cache in multiple ways. How does this snippet look?

```python
cached_model_client = ChatCompletionCache(OpenAIChatCompletionClient(model="gpt-4o"))
```

We just need to define a default cache (I agree in memory is good) and it's doable. IMO it makes sense and is a better design, as it is entirely separated from a specific client and works for all clients at once. |
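A minimal sketch of what an in-memory default store could look like; the `CacheStore` protocol and its `get`/`set` method names are assumptions for illustration, not the library's actual interface:

```python
from typing import Generic, Optional, Protocol, TypeVar

T = TypeVar("T")


class CacheStore(Protocol[T]):
    """Assumed minimal store interface: get or set a value by string key."""

    def get(self, key: str) -> Optional[T]: ...
    def set(self, key: str, value: T) -> None: ...


class InMemoryStore(Generic[T]):
    """Dict-backed store used when the caller does not pass a store."""

    def __init__(self) -> None:
        self._data: dict[str, T] = {}

    def get(self, key: str) -> Optional[T]:
        return self._data.get(key)

    def set(self, key: str, value: T) -> None:
        self._data[key] = value
```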
Thanks for the clarification. It makes sense to have something that is general and separate from the ChatCompletionClient implementation itself. The API you have seems like a good compromise, assuming the store defaults to an in-memory cache. This is good as it can be passed directly to things like AssistantAgent with zero change too. |
Let's make |
Support client-side caching for any `ChatCompletionClient` type. Simplest way to do it is to create a `ChatCompletionCache` type that implements the `ChatCompletionClient` protocol but wraps an existing client. Example how this may work:
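A minimal sketch of how such a wrapper could look, with a simplified `create` signature, a plain dict as the default store, and the JSON-hash keying idea from the discussion above; this is illustrative only, not the actual implementation:

```python
import hashlib
import json


class ChatCompletionCache:
    """Wraps a ChatCompletionClient-like object and caches create() results."""

    def __init__(self, client, store=None) -> None:
        self._client = client
        self._store = store if store is not None else {}  # default: in-memory dict

    def _key(self, messages) -> str:
        # Hash the serialized messages to get a stable cache key.
        payload = json.dumps([m.model_dump() for m in messages], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def create(self, messages):
        key = self._key(messages)
        if key in self._store:
            return self._store[key]  # cache hit: skip the underlying model call
        result = await self._client.create(messages)
        self._store[key] = result
        return result

    def __getattr__(self, name):
        # Delegate everything else (model_info, capabilities, ...) to the wrapped client.
        return getattr(self._client, name)
```

With a real store interface, the dict would be swapped for something like the `CacheStore` sketched earlier and queried through `get`/`set` instead of dict indexing.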