AI Researchers 6x Model Performance to Match Humans in Abstract Reasoning Benchmark (arxiv.org)

submitted 1 month ago by AtmosphericRiversCuomo@hexbear.net to c/technology@hexbear.net

Test-time training (TTT) significantly enhances language models' abstract reasoning, improving accuracy by up to 6x on the Abstraction and Reasoning Corpus (ARC). Key factors for successful TTT include initial fine-tuning, auxiliary tasks, and per-instance training. Applying TTT to an 8B-parameter model boosts accuracy to 53% on ARC's public validation set, nearly 25% better than previous public neural approaches. Ensembling TTT with recent program generation methods achieves 61.9% accuracy, matching the average human score. This suggests that, in addition to explicit symbolic search, test-time training on few-shot examples significantly improves abstract reasoning in neural language models.
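For anyone wondering what "per-instance training" on few-shot examples could look like in practice, here is a rough sketch. The summary above does not say how the training data is built, so everything here is an assumption for illustration: the leave-one-out construction, the grid augmentations, and all function names (`leave_one_out_tasks`, `build_ttt_dataset`, etc.) are hypothetical. The actual fine-tuning step (updating a copy of the model on this data before predicting) is left out.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, output grid)


def identity(grid: Grid) -> Grid:
    """Return a copy of the grid unchanged."""
    return [row[:] for row in grid]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


# Assumed augmentation set; the same transform is applied to inputs and outputs.
AUGMENTATIONS: List[Callable[[Grid], Grid]] = [identity, rotate90, flip_horizontal]


def leave_one_out_tasks(demos: List[Example]) -> List[Tuple[List[Example], Example]]:
    """Turn n demonstrations into n synthetic tasks: n-1 as context, 1 as target."""
    tasks = []
    for i, held_out in enumerate(demos):
        context = demos[:i] + demos[i + 1:]
        tasks.append((context, held_out))
    return tasks


def build_ttt_dataset(demos: List[Example]) -> List[Tuple[List[Example], Example]]:
    """Build a small per-task training set from one task's few-shot demonstrations."""
    dataset = []
    for augment in AUGMENTATIONS:
        augmented = [(augment(x), augment(y)) for x, y in demos]
        dataset.extend(leave_one_out_tasks(augmented))
    return dataset


# Toy task with three demonstrations (output = input flipped left to right).
demos = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[0, 0], [2, 0]], [[0, 0], [0, 2]]),
    ([[3, 0], [3, 0]], [[0, 3], [0, 3]]),
]
ttt_data = build_ttt_dataset(demos)
print(len(ttt_data))  # 3 augmentations x 3 held-out choices = 9
```

The point of the sketch is only that the handful of demonstrations shipped with each test task can be recycled into training examples for that one task, without needing the hidden test answer.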

[–] SocialistDovahkiin@hexbear.net 6 points 1 month ago (1 children)

Achieving similar accuracy after training on large datasets that probably contain the answers to many parts of these benchmarks doesn't seem too impressive.

[–] AtmosphericRiversCuomo@hexbear.net 5 points 1 month ago

The training datasets don't have the answers because the benchmark is diverse enough. That's why other models struggled to perform as well as humans until they applied the approach outlined in the paper. This is the benchmark: https://liusida.github.io/ARC/