this post was submitted on 13 Jan 2024

Singularity


Everything pertaining to the technological singularity and related topics, e.g. AI, human enhancement, etc.

founded 2 years ago
This is an automated archive.

The original was posted on /r/singularity by /u/REOreddit on 2024-01-13 11:14:10+00:00.


AI Explained made a video a few months ago pointing out that the MMLU benchmark contains a non-negligible number of errors. When scores jumped from 35 to 70, those errors could be ignored, but when one AI scores 89.7 and another 90.2, those mistakes render the difference meaningless.
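To see why a half-point gap can be meaningless, here is a minimal sketch of the argument. The 2% label-error rate below is an assumption for illustration, not a figure from the video: if a fraction of the benchmark's reference answers are wrong, a model's "true" score can differ from its measured score by up to that fraction, and the uncertainty intervals of the two models overlap completely.

```python
# Illustrative sketch: how label errors in a benchmark can swamp a
# small score gap. The 2% error rate is a hypothetical assumption.

def score_uncertainty(measured_score: float, error_rate: float) -> tuple[float, float]:
    """Worst-case range of the 'true' score (on a 0-100 scale) if a
    fraction `error_rate` of benchmark questions have wrong reference
    answers. In the extreme, every mislabeled question was scored
    'wrong' when the model was actually right, or vice versa."""
    lo = max(0.0, measured_score - error_rate * 100)
    hi = min(100.0, measured_score + error_rate * 100)
    return lo, hi

for measured in (89.7, 90.2):
    lo, hi = score_uncertainty(measured, 0.02)
    print(f"{measured}: true score somewhere in [{lo:.1f}, {hi:.1f}]")
# With a 2% error rate, the two intervals ([87.7, 91.7] and
# [88.2, 92.2]) overlap, so the 0.5-point gap tells us nothing.
```

At 35 vs 70 the same intervals would be far apart, which is why the errors only started to matter once scores converged near the top of the scale.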

Why don't OpenAI, Google, Microsoft, Anthropic, Meta, etc., join forces to create a new benchmark? Or are they already doing that behind the scenes? Maybe there's a reason not to that I'm missing.
