Large Language Models Benchmarks

IEEE Spectrum on MSN

Why are large language models so terrible at video games?

AI models code simple games, but struggle to play them ...

Morning Overview on MSN

Google’s TurboQuant claims 6x lower memory use for large AI models

Google researchers have proposed TurboQuant, a method for compressing the key-value caches that large language models rely on ...

ZDNet

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

Geeky Gadgets

AI Benchmarks Are Broken : The Leaderboard Illusion

What if the tools we trust to measure progress are actually holding us back? In the rapidly evolving world of large language models (LLMs), AI benchmarks and leaderboards have become the gold standard ...

TurboQuant: Google aims to curb the memory hunger of large LLMs

Google's TurboQuant reduces the KV cache of large language models to 3 bits. Accuracy is said to remain, speed to multiply.

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

Live Science

AI benchmarking platform is helping top companies rig their model performances, study claims

LMArena, a popular benchmark for large language models, has been accused of giving preferential treatment to AIs made by big tech firms, potentially enabling them to game their results. When you ...

12d

OpenAI, Mistral AI release new hardware-efficient language models

OpenAI Group PBC and Mistral AI SAS today introduced new artificial intelligence models optimized for cost-sensitive use ...

InfoQ

Google Researchers Propose Bayesian Teaching Method for Large Language Models

Google Research has proposed a training method that teaches large language models to approximate Bayesian reasoning by ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results