As large language models (LLMs) continue to improve at coding, the benchmarks used to evaluate their performance are steadily becoming less useful. That's because, though many LLMs have similar high ...
Over the past month or so, I've been putting ChatGPT through its paces, giving it a wide range of challenges to solve -- from writing a Star Trek script to coding a WordPress plugin. The shocking (and ...