Added DeepSeek R1 + DeepSeek V3 benchmark #2998

Open: wants to merge 1 commit into base: main

serialx commented Jan 26, 2025

I'd like to share the benchmark result for DeepSeek R1 as architect with DeepSeek V3 as editor: a 59.1% pass rate. That is near the performance of o1 at a fraction of the cost, $6.33 in total, which is half the cost of R1+Sonnet.

- dirname: 2025-01-25-13-53-23--deepseek-r1-v3
  test_cases: 225
  model: deepseek/deepseek-reasoner
  edit_format: architect
  commit_hash: b276d48-dirty
  editor_model: deepseek/deepseek-chat
  editor_edit_format: editor-diff
  pass_rate_1: 30.7
  pass_rate_2: 59.1
  pass_num_1: 69
  pass_num_2: 133
  percent_cases_well_formed: 100.0
  error_outputs: 13
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 388
  lazy_comments: 1
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 4
  total_tests: 225
  command: aider --model deepseek/deepseek-reasoner
  date: 2025-01-25
  versions: 0.72.3.dev
  seconds_per_case: 949.4
  total_cost: 6.3330

Note: The commit hash is marked dirty only because I added the DEEPSEEK_API_KEY environment variable to the Docker launch parameters. Otherwise it's a clean copy of the code.
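
As a quick sanity check, the reported pass rates can be recomputed from the pass counts in the results above. This is a minimal sketch that assumes each rate is simply the pass count divided by `test_cases`, rounded to one decimal place. The helper `pass_rate` and the per-case cost are illustrative and not part of the benchmark output.

```python
# Figures copied from the benchmark results above.
test_cases = 225
pass_num_1 = 69
pass_num_2 = 133
total_cost = 6.3330  # USD


def pass_rate(passed: int, total: int) -> float:
    """Percentage of passed cases, rounded to one decimal (assumed convention)."""
    return round(100 * passed / total, 1)


rate_1 = pass_rate(pass_num_1, test_cases)
rate_2 = pass_rate(pass_num_2, test_cases)

# Derived figure, not reported in the results: average cost per test case.
cost_per_case = total_cost / test_cases

print(f"pass_rate_1: {rate_1}")
print(f"pass_rate_2: {rate_2}")
print(f"cost per case: ${cost_per_case:.4f}")
```

Under that assumption the recomputed values match the reported `pass_rate_1` of 30.7 and `pass_rate_2` of 59.1, and the run works out to roughly $0.03 per test case.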

CLAassistant commented Jan 26, 2025

All committers have signed the CLA.
