
response_token_limit doesn't seem to actually limit #650

Open
kuatroka opened this issue Jan 9, 2025 · 2 comments

Comments

@kuatroka

kuatroka commented Jan 9, 2025

When using UsageLimits(response_tokens_limit=100), I get an error that stops the rest of the code, and the output length does not respect the specified limit. Thanks.

error

   raise UsageLimitExceeded(
pydantic_ai.exceptions.UsageLimitExceeded: Exceeded the response_tokens_limit of 100 (response_tokens=11210)

code

import os

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.usage import UsageLimits

model4 = OpenAIModel(
    'mistralai/ministral-8b',
    base_url='https://openrouter.ai/api/v1',
    api_key=os.getenv("OPENROUTER_API_KEY"),
)
# agent = Agent(model)

# Define a very simple agent including the model to use, you can also set the model when running the agent.
agent4 = Agent(
    model=model4,
    # Register a static system prompt using a keyword argument to the agent.
    # For more complex dynamically-generated system prompts, see the example below.
    system_prompt='You are a parsing and data extracting AI assistant.',

)

# Run the agent synchronously, conducting a conversation with the LLM.
# Here the exchange should be very short: PydanticAI will send the system prompt and the user query to the LLM,
# the model will return a text response. See below for a more complex run.
result4 = await agent4.run(f"""
Parse and extract data from

    {txt2}

    into a list of JSON objects.

    - Analyse and understand the file in full.
    - Only after understanding full text, then extract the data as indicated in this schema:
    ##
    {schema}
    ##

    ### Instructions:
    - Don't add any additional text with explanations.
    - Output exclusively a list of JSON objects.
    - Process the entire file.
    - Don't add, apply, or display any calculation logic or calculations. Only extract existing data.
    - Start the final output with "[" and end with "]"
    - don't add the code highlighting to the output
""",
model_settings={'temperature': 0.3},                           
usage_limits=UsageLimits(response_tokens_limit=100)
)
# print("request_tokens: ", result4.usage.request_tokens)
# print("response_tokens: ", result4.usage.response_tokens)
# print("total_tokens: ", result4.usage.total_tokens)
print(result4._usage)
print(result4.data)
@sachq

sachq commented Jan 11, 2025

The title of the issue is a bit unclear, but based on your description, I assume you're facing an exception that's preventing the rest of the code from running. To fix this, you should catch the UsageLimitExceeded exception in a try-except block, like this:

from pydantic_ai.exceptions import UsageLimitExceeded

try:
    result = await agent.run(prompt, usage_limits=UsageLimits(response_tokens_limit=100))
except UsageLimitExceeded as e:
    print(e)  # prints the exception details when the usage limit is exceeded

This approach lets you handle the UsageLimitExceeded exception gracefully, so you can log the error or manage it in some other way without interrupting the rest of the execution.

@kuatroka
Author

Thanks, but I actually want the code to stop when the set limit is reached.
What happens now is that the limit is reached, the model still generates past the set token limit, and only afterwards is the overrun caught by the UsageLimits mechanism; only the code after that statement errors out and does not run further.

My actual goal is to limit the output tokens: if I want to test something, I don't want the entire output of maybe 16K tokens to be generated and printed, but only a thousand, for example. I thought UsageLimits was for that.
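If the goal is to cap how much the model generates (rather than to check usage after the fact), one option worth trying is a provider-side cap via `max_tokens` in `model_settings`. This is a sketch under the assumption that the OpenAI-compatible endpoint honours `max_tokens`; `prompt` stands in for the actual prompt string.

```python
# Assumption: pydantic-ai's model_settings accepts an OpenAI-style
# max_tokens, which caps generation at the provider instead of
# raising after the full response has already been produced.
model_settings = {
    'temperature': 0.3,
    'max_tokens': 1000,  # provider-side hard cap on generated tokens
}

# Then pass it when running the agent (sketch, not run here):
# result4 = await agent4.run(prompt, model_settings=model_settings)
```

With a provider-side cap, the response is truncated at the limit rather than completed in full and then rejected, which matches the behaviour you describe wanting.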
