ChatCompletionClient to support request caching #4752
Comments
Here's a basic idea I have based on what we had in
We can modify the
As for the actual caching: since we use pydantic models for the messages, we can encode the incoming prompt info as JSON and hash it for the cache key. WDYT @ekzhu / @jackgerrits? |
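A minimal sketch of that keying idea, assuming pydantic v2 models for the messages; the `UserMessage` stand-in and the SHA-256 choice are illustrative assumptions, not the project's actual implementation:

```python
import hashlib
import json

from pydantic import BaseModel


class UserMessage(BaseModel):
    """Illustrative stand-in for the real pydantic message types."""
    content: str
    source: str


def cache_key(messages: list[BaseModel]) -> str:
    # Serialize each message to a dict, dump the whole prompt deterministically
    # (sorted keys), and hash it to get a stable cache key.
    payload = json.dumps([m.model_dump() for m in messages], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = cache_key([UserMessage(content="Hello, how are you?", source="user")])
```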
For the abstract interface we can keep it super simple so existing libraries like |
On a related note, for cases where the user requires all responses to be pulled from the cache, such as for quick regression tests, it could be useful to have the cached client throw an error (rather than calling the model_client) for any prompt that is not found in the cache. This functionality could be enabled by passing None as the model_client parameter. I've implemented a client wrapper that provides this caching and checking (plus numeric result checking) for my own regression tests, but my client wrapper isn't a complete ChatCompletionClient replacement. |
@rickyloynd-microsoft Can you share a pointer/branch to your code, if possible?
Since the original client is passed during init (for other methods like model_info/etc), this can probably be implemented as a kwarg on the |
It will be in a PR soon. |
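A rough sketch of the strict cache-only mode described a few comments up, where passing `None` as the model client turns any cache miss into an error; the class, the error type, and the simplified string-key `create` signature are hypothetical, not the wrapper mentioned above:

```python
class CacheMissError(Exception):
    """Raised in cache-only mode when a prompt has no stored response."""


class RegressionCacheClient:
    """Hypothetical wrapper: serves cached responses, and when constructed
    with model_client=None it refuses to call any model at all."""

    def __init__(self, store: dict[str, str], model_client=None) -> None:
        self._store = store
        self._model_client = model_client  # None enables strict cache-only mode

    async def create(self, prompt_key: str) -> str:
        if prompt_key in self._store:
            return self._store[prompt_key]
        if self._model_client is None:
            # Cache-only mode: a miss is an error rather than a model call.
            raise CacheMissError(f"No cached response for prompt key {prompt_key!r}")
        response = await self._model_client.create(prompt_key)
        self._store[prompt_key] = response
        return response
```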
I see the api we have documented here using context:

```python
cached_client = ChatCompletionCache(OpenAIChatCompletionClient(model="gpt-4o"), store=DiskCacheStore())


async def main() -> None:
    with tempfile.TemporaryDirectory() as tmpdirname:
        openai_model_client = OpenAIChatCompletionClient(model="gpt-4o")
        cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache(tmpdirname))
        cache_client = ChatCompletionCache(openai_model_client, cache_store)

        response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
        print(response)  # Should print response from OpenAI

        response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
        print(response)  # Should print cached response
```

Does the current impl support something like:

```python
async def main() -> None:
    cache_client = OpenAIChatCompletionClient(model="gpt-4o", cache=True)

    response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
    print(response)  # Should print response from OpenAI

    response = await cache_client.create([UserMessage(content="Hello, how are you?", source="user")])
    print(response)  # Should print cached response
```
|
Nice simplification @victordibia ! |
@victordibia I suppose that can be done. What would be a good default cache? in memory? |
See my other reply: #5141 (comment)

I believe the code snippet you showed breaks the intentional generality of the new cache client. We should avoid making it possible to cache in multiple ways. How does this snippet look?

```python
cached_model_client = ChatCompletionCache(OpenAIChatCompletionClient(model="gpt-4o"))
```

We just need to define a default cache (I agree in memory is good) and it's doable. IMO it makes sense and is a better design, as it is entirely separated from a specific client and works for all clients at once. |
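A minimal sketch of what an in-memory default store could look like; the `CacheStore` protocol and its `get`/`set` method names are assumptions for illustration, not the library's actual interface:

```python
from typing import Generic, Optional, Protocol, TypeVar

T = TypeVar("T")


class CacheStore(Protocol[T]):
    """Assumed minimal store interface: get or set a value by string key."""

    def get(self, key: str) -> Optional[T]: ...
    def set(self, key: str, value: T) -> None: ...


class InMemoryStore(Generic[T]):
    """Dict-backed store used when the caller does not pass a store."""

    def __init__(self) -> None:
        self._data: dict[str, T] = {}

    def get(self, key: str) -> Optional[T]:
        return self._data.get(key)

    def set(self, key: str, value: T) -> None:
        self._data[key] = value
```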
Thanks for the clarification. It makes sense to have something that is general and separate from the ChatCompletionClient implementation itself. The API you have seems like a good compromise, assuming the store defaults to an in-memory cache. This is good as it can be passed directly to things like AssistantAgent with zero change too. |
Let's make |
Support client-side caching for any `ChatCompletionClient` type. Simplest way to do it is to create a `ChatCompletionCache` type that implements the `ChatCompletionClient` protocol but wraps an existing client. Example how this may work:
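A minimal sketch of how such a wrapper could look, with a simplified `create` signature, a plain dict as the default store, and the JSON-hash keying idea from the discussion above; this is illustrative only, not the actual implementation:

```python
import hashlib
import json


class ChatCompletionCache:
    """Wraps a ChatCompletionClient-like object and caches create() results."""

    def __init__(self, client, store=None) -> None:
        self._client = client
        self._store = store if store is not None else {}  # default: in-memory dict

    def _key(self, messages) -> str:
        # Hash the serialized messages to get a stable cache key.
        payload = json.dumps([m.model_dump() for m in messages], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def create(self, messages):
        key = self._key(messages)
        if key in self._store:
            return self._store[key]  # cache hit: skip the underlying model call
        result = await self._client.create(messages)
        self._store[key] = result
        return result

    def __getattr__(self, name):
        # Delegate everything else (model_info, capabilities, ...) to the wrapped client.
        return getattr(self._client, name)
```

With a real store interface, the dict would be swapped for something like the `CacheStore` sketched earlier and queried through `get`/`set` instead of dict indexing.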