this post was submitted on 13 Jan 2024

Singularity


Everything pertaining to the technological singularity and related topics, e.g. AI, human enhancement, etc.

founded 2 years ago
This is an automated archive.

The original was posted on /r/singularity by /u/REOreddit on 2024-01-13 11:14:10+00:00.


AI Explained made a video a few months ago pointing out that the MMLU benchmark contains a non-negligible number of errors. When scores jumped from 35 to 70, those errors could be ignored, but when one AI scores 89.7 and another 90.2, those mistakes render the difference meaningless.
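To see why a half-point gap can be meaningless, here is a minimal sketch of the argument. The 2% label-error rate below is an assumption for illustration, not a figure from the video: if a fraction of the benchmark's reference answers are wrong, a model's "true" score can differ from its measured score by up to that fraction, and the uncertainty intervals of the two models overlap completely.

```python
# Illustrative sketch: how label errors in a benchmark can swamp a
# small score gap. The 2% error rate is a hypothetical assumption.

def score_uncertainty(measured_score: float, error_rate: float) -> tuple[float, float]:
    """Worst-case range of the 'true' score (on a 0-100 scale) if a
    fraction `error_rate` of benchmark questions have wrong reference
    answers. In the extreme, every mislabeled question was scored
    'wrong' when the model was actually right, or vice versa."""
    lo = max(0.0, measured_score - error_rate * 100)
    hi = min(100.0, measured_score + error_rate * 100)
    return lo, hi

for measured in (89.7, 90.2):
    lo, hi = score_uncertainty(measured, 0.02)
    print(f"{measured}: true score somewhere in [{lo:.1f}, {hi:.1f}]")
# With a 2% error rate, the two intervals ([87.7, 91.7] and
# [88.2, 92.2]) overlap, so the 0.5-point gap tells us nothing.
```

At 35 vs 70 the same intervals would be far apart, which is why the errors only started to matter once scores converged near the top of the scale.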

Why don't OpenAI, Google, Microsoft, Anthropic, Meta, etc., join forces to create a new benchmark? Or are they already doing that behind the scenes? Maybe there's a reason not to that I'm missing.
