The OpenCompass team is thrilled to announce the release of OpenCompass v0.4.0!
Highlights
This version introduces several new features and improvements that enhance the user experience and expand the capabilities of OpenCompass. Notable changes include support for Longbenchv2, compatibility with InternLM3 models, and a refactor of the OpenAI model class. We are also consolidating all AMOTIC configuration files (previously located in ./configs/datasets, ./configs/models, and ./configs/summarizers) into the opencompass package.
New Features
- Support for Longbenchv2 has been added to provide more comprehensive model evaluation. (#1801)
- The Bradley-Terry Subjective Evaluation method has been extended to the Arena Hard dataset. (#1802)
- Predicted Win Probabilities have been added to CompassArenaBradleyTerrySummarizer for better insights into model performance. (#1815)
- Support for InternLM3 models is now available, expanding our model library. (#1829)
- MMLU-CF Benchmark support has been introduced to further enrich our suite of benchmarks. (#1775)
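For readers unfamiliar with the predicted win probabilities mentioned above, the following is a minimal sketch of how a win probability follows from the standard Bradley-Terry model. It is not the CompassArenaBradleyTerrySummarizer implementation: the function name `bt_win_probability`, the example ratings, and the use of natural-log strengths are all assumptions made for illustration.

```python
import math


def bt_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under Bradley-Terry.

    Ratings are assumed to be log-strengths, so
    P(A beats B) = exp(a) / (exp(a) + exp(b)) = 1 / (1 + exp(b - a)).
    """
    return 1.0 / (1.0 + math.exp(rating_b - rating_a))


# Hypothetical ratings, for illustration only (not real results).
ratings = {"model_x": 0.8, "model_y": 0.3}

p_xy = bt_win_probability(ratings["model_x"], ratings["model_y"])
p_yx = bt_win_probability(ratings["model_y"], ratings["model_x"])

# The two probabilities are complementary, and equal ratings give 0.5.
print(f"P(model_x beats model_y) = {p_xy:.3f}")
print(f"P(model_y beats model_x) = {p_yx:.3f}")
```

Only the difference between two ratings matters in this formulation, so adding the same constant to every rating leaves the predicted probabilities unchanged.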
Documentation
- Documentation on adding new datasets (new_dataset.md) has been updated for clarity. (#1827)
- The installation guide (installation.md) has been revised to reflect the latest setup process. (#1830)
Bug Fixes
- A path conflict within the CI pipeline has been resolved. (#1814)
- The logic for max_out_len in OpenAI models has been corrected. (#1839)
Enhancements and Refactors
- Code refactoring has been performed to improve project structure and maintainability. (#1831)
- The threshold for CI checks has been updated for more stringent quality control. (#1812)
- LiveMathBench has been updated to ensure it remains a cutting-edge benchmarking tool. (#1809)
Welcome New Contributors
@fistyee contributed support for the MMLU-CF Benchmark. (#1775)
@Myhs-phz improved documentation on adding new datasets. (#1827)
@thejishnunair updated the installation guide. (#1830)
Thank you for being part of the OpenCompass community! Your support and contributions make each release possible. We look forward to your feedback on this release.