this post was submitted on 10 Jun 2023
Programmer Humor
Yes. My understanding (minimally informed, from a single class) is that it sort of depends on the problem too. For example, in looking at all the data on proteins, the neural network might notice that a pattern in protein folding is applicable to the tweaked problem. Of course, there is no guarantee that such a generally applicable rule exists. And even if it does, the net might not discover it before overtraining occurs.
It sounds like your memory from that class is pretty good. You're right that it depends on what we're trying to solve, but the problem in this case is protein folding, so if a neural network spots a pattern, that's exactly what we want. Figuring out the generalisable "rules" (i.e. why proteins fold a certain way) isn't what we're trying to do with these tools (yet); we're just on the pattern-finding side. That's why the developments from AlphaFold are so incredible. It's just limited.
It's articles like this that do my head in: https://www.scientificamerican.com/article/one-of-the-biggest-problems-in-biology-has-finally-been-solved/
It feels like my job for the next few years is going to be "professional killjoy". I get people's excitement, but we can't properly use these tools if we don't acknowledge their limitations. If we did acknowledge them, the tools would actually become more powerful, because we could develop new and different tools, or go and gather experimental data to validate some of the generated structures (or to round out the training data).
I don't know if this would count as overtraining, because so far it has performed amazingly on structures that are similar to the training data but not in the training data. The problem is that we don't have much training data for the tricky parts. That's fine, it just means it won't help us learn much about those areas, but headlines like "AlphaFold predicts the structures of all human proteins" are so misleading.
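If it helps to see the distinction, here is a toy sketch in Python. It has nothing to do with how AlphaFold actually works: the `target` function, the nearest-neighbour "model" and the ranges are all made up for illustration. The point is only that a model can do well on unseen examples that resemble its training data and still do badly where it had no training data at all, which is a coverage problem rather than overtraining.

```python
import math

def target(x):
    # Hypothetical "true" relationship the model is trying to learn.
    return math.sin(3 * x)

def nearest_neighbour_predict(train, x):
    # Predict by copying the answer of the closest training example.
    nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y

def mean_abs_error(train, xs):
    # Average absolute gap between the prediction and the true value.
    errors = [abs(nearest_neighbour_predict(train, x) - target(x)) for x in xs]
    return sum(errors) / len(errors)

# Training data only covers the "well-studied" region 0..1.
train = [(i / 50, target(i / 50)) for i in range(51)]

# Held-out points that are similar to the training data but not in it.
similar_unseen = [(i + 0.5) / 50 for i in range(50)]

# Points from a region with no training data at all (the "tricky parts").
no_coverage = [2 + i / 50 for i in range(51)]

err_similar = mean_abs_error(train, similar_unseen)
err_no_coverage = mean_abs_error(train, no_coverage)

print(f"similar but unseen:   {err_similar:.3f}")
print(f"no training coverage: {err_no_coverage:.3f}")
```

The first error comes out small and the second much larger, even though nothing about the model changed between the two measurements. Only the coverage of the training data did.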