
response_token_limit doesn't seem to actually limit #650

Open
kuatroka opened this issue Jan 9, 2025 · 2 comments

Comments

@kuatroka

kuatroka commented Jan 9, 2025

When using UsageLimits(response_tokens_limit=100), I get an error that stops the rest of the code, and the output length does not respect the specified limit. Thanks.

error

   raise UsageLimitExceeded(
pydantic_ai.exceptions.UsageLimitExceeded: Exceeded the response_tokens_limit of 100 (response_tokens=11210)

code

import os

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.usage import UsageLimits

model4 = OpenAIModel(
    'mistralai/ministral-8b',
    base_url='https://openrouter.ai/api/v1',
    api_key=os.getenv("OPENROUTER_API_KEY"),
)
# agent = Agent(model)

# Define a very simple agent including the model to use, you can also set the model when running the agent.
agent4 = Agent(
    model=model4,
    # Register a static system prompt using a keyword argument to the agent.
    # For more complex dynamically-generated system prompts, see the example below.
    system_prompt='You are a parsing and data extracting AI assistant.',

)

# Run the agent synchronously, conducting a conversation with the LLM.
# Here the exchange should be very short: PydanticAI will send the system prompt and the user query to the LLM,
# the model will return a text response. See below for a more complex run.
result4 = await agent4.run(f"""
Parse and extract data from

    {txt2}

    into a list of JSON objects.

    - Analyse and understand the file in full.
    - Only after understanding full text, then extract the data as indicated in this schema:
    ##
    {schema}
    ##

    ### Instructions:
    - Don't add any additional text with explanations.
    - Output exclusively a list of JSON objects.
    - Process the entire file.
    - Don't add, apply, or display any calculation logic or calculations. Only extract existing data.
    - Start the final output with "[" and end with "]"
    - don't add the code highlighting to the output
""",
model_settings={'temperature': 0.3},                           
usage_limits=UsageLimits(response_tokens_limit=100)
)
# print("request_tokens: ", result4.usage.request_tokens)
# print("response_tokens: ", result4.usage.response_tokens)
# print("total_tokens: ", result4.usage.total_tokens)
print(result4._usage)
print(result4.data)
@sachq

sachq commented Jan 11, 2025

The title of the issue is a bit unclear, but based on your description, I assume you're facing an exception that's preventing the rest of the code from running. To fix this, you should catch the UsageLimitExceeded exception in a try-except block, like this:

from pydantic_ai.exceptions import UsageLimitExceeded

try:
    result = await agent.run(prompt, usage_limits=UsageLimits(response_tokens_limit=100))
except UsageLimitExceeded as e:
    print(e)  # prints the exception details when the usage limit is exceeded

This approach lets you handle the UsageLimitExceeded exception gracefully, so you can log the error or manage it in some other way without interrupting the rest of the execution.

@kuatroka
Author

Thanks, but I actually want the code to stop when the set limit is reached.
What happens now is that the limit is reached, the model still generates past the set token limit, and only afterwards is the overrun caught by the UsageLimits mechanism; only the code after that statement errors out and does not run further.

My actual goal is to limit the output tokens: if I want to test something, I don't want the entire output of maybe 16K tokens to be generated and printed, but only a thousand, for example. I thought UsageLimits was for that.
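If the goal is to cap how much the model generates (rather than to check usage after the fact), one option worth trying is a provider-side cap via `max_tokens` in `model_settings`. This is a sketch under the assumption that the OpenAI-compatible endpoint honours `max_tokens`; `prompt` stands in for the actual prompt string.

```python
# Assumption: pydantic-ai's model_settings accepts an OpenAI-style
# max_tokens, which caps generation at the provider instead of
# raising after the full response has already been produced.
model_settings = {
    'temperature': 0.3,
    'max_tokens': 1000,  # provider-side hard cap on generated tokens
}

# Then pass it when running the agent (sketch, not run here):
# result4 = await agent4.run(prompt, model_settings=model_settings)
```

With a provider-side cap, the response is truncated at the limit rather than completed in full and then rejected, which matches the behaviour you describe wanting.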
