seamossfet 1 days ago [-]
The problem with models like this is that they're built on very little training data we can trace back to verifiable protein data. The Protein Data Bank, and other sources of training data for stuff like this, have a lot of broken structures in them and "creative liberties" taken to infer a structure from instrument data. It's a very complex process that leaves a lot open to interpretation.
On top of that, we don't have a clear understanding of how certain positions (conformations) of a structure affect underlying biological mechanisms.
Yes, these models can predict surprisingly accurate structures and sequences. Do we know if these outputs are biologically useful? Not quite.
This technology is amazing, don't get me wrong, but the average person might see this and wonder why we can't go full futurism and solve every pathology with models like these.
We've come a long way, but there's still a very very long way to go.
stardust2 1 days ago [-]
How do we get more verifiable protein data? So even if we had better data, we don't yet understand how the structure impacts the biology?
nradclif 19 hours ago [-]
"Complete results, architectural decisions, and runnable code below."
Yeah. Things like "Complete results, architectural decisions, and runnable code below." are literally how AI outputs stuff, so I'd expect the post was AI-written too. :(
Nice work! Here is an article you may find helpful if you have not already come across it.[0] You may also want to consider benchmarking against some non-ML methods.[1]
What makes this dataset or problem worth solving compared to other health datasets? Would the results on this task be broadly useful to health?
CyberDildonics 1 days ago [-]
What other "datasets" are you talking about? How do you "solve a dataset" ?
xyz100 18 hours ago [-]
You solve a dataset when you learn what there is to learn about the phenomenon of interest. The limit of such phenomenon is “cure all disease”, and clearly this is not solving that.
CyberDildonics 10 hours ago [-]
What are you talking about? "the phenomenon of interest"? There is nothing you wrote in either comment that makes sense.
What is a "dataset" that has been "solved" and what did the program do that 'solved' it?
xyz100 56 minutes ago [-]
MNIST (the number classification task) has been “solved” a billion times and it is hard to imagine any subsequent advances there as scores using a variety of methods have hit the saturation point of accuracy. Any further improvements are likely overfitting to noise. Therefore, we know that it is easy to detect handwritten numbers. However, we may not know how to detect other things as well, like reading an MRI. Those datasets/tasks are clearly different and require different techniques. Training an LLM is likewise different.
CyberDildonics 1 minutes ago [-]
> has been “solved” a billion times
If it was really solved, wouldn't it just need to happen once?
You think classifying handwriting of 10 numbers is the same as this that took 55 hours of GPU time for someone to go through?
rubicon33 1 days ago [-]
Can someone explain what one might use this model for? As a developer with a casual interest in biology it would be fun to play with but honestly not sure what I would do
colechristensen 1 days ago [-]
You can get your feet wet with genetic engineering for surprisingly little money.
Lab strains of things tend to be extremely sensitive and not human adapted. You shouldn't study and modify human-infecting organisms in your basement anyway. While you shouldn't ignore protective equipment and proper procedure... paranoia about infecting yourself with a lab leak isn't warranted.
_zoltan_ 4 hours ago [-]
I'd love to experiment with this stuff, just literally have no idea how it would be safe to start.
jazzpush2 17 hours ago [-]
A Codon-based model is cool. I know NVIDIA is building quite a large one.
Interesting work - looks like AI for science is having its day right now.
khalic 1 days ago [-]
> In Progress: CodonJEPA
JEPA is going to break the whole industry :D
digdugdirk 1 days ago [-]
Can you explain this? I haven't heard of JEPA, and from a quick search it seems to be vision/robotics based?
khalic 1 days ago [-]
It’s a self-supervised learning architecture, and it’s pretty much universal. The loss function runs on embeddings, and there are some other smart architectural choices all over. Worth diving into for a few hours; Yann LeCun gives some interesting talks about it.
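To make "the loss function runs on embeddings" concrete, here is a rough toy sketch in plain Python. It is not the CodonJEPA code or any published implementation: the linear encoders, the masking scheme, the sizes, and every name in it (`linear`, `embedding_loss`, `ema_update`) are made up for illustration. The only idea it tries to show is that the predictor is scored against the target encoder's embedding of the hidden part, not against the raw input, and that the target encoder is commonly kept as a slow moving average of the context encoder.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def embedding_loss(predicted, target):
    """Mean squared error between two embeddings (not between raw inputs)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def ema_update(target_w, online_w, decay=0.99):
    """Move the target encoder's weights slowly toward the context encoder's."""
    return [[decay * t + (1 - decay) * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_w, online_w)]

random.seed(0)
dim_in, dim_emb = 6, 3  # hypothetical toy sizes

# Hypothetical toy "encoders": a single random linear map each.
context_encoder = [[random.uniform(-1, 1) for _ in range(dim_in)]
                   for _ in range(dim_emb)]
target_encoder = [row[:] for row in context_encoder]  # starts as a copy
predictor = [[1.0 if i == j else 0.0 for j in range(dim_emb)]
             for i in range(dim_emb)]                 # identity to start

sequence = [random.uniform(-1, 1) for _ in range(dim_in)]
context_view = sequence[:3] + [0.0] * 3  # visible part, rest masked out
target_view = [0.0] * 3 + sequence[3:]   # the part the model must predict

# Predict the target's embedding from the context, then compare embeddings.
predicted = linear(predictor, linear(context_encoder, context_view))
target = linear(target_encoder, target_view)
loss = embedding_loss(predicted, target)

# After a gradient step on the context encoder (omitted here), the target
# encoder would be refreshed with an exponential moving average.
target_encoder = ema_update(target_encoder, context_encoder)
print(f"embedding-space loss: {loss:.4f}")
```

In a real setup the encoders would be deep networks and the loss would be backpropagated through the predictor and context encoder only; the sketch skips training entirely.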
This is a weird post, there doesn't seem to be any "below" here. Another comment linked the article: https://huggingface.co/blog/OpenMed/training-mrna-models-25-...
0. https://pubmed.ncbi.nlm.nih.gov/35318324/
1. https://www.nature.com/articles/s41586-023-06127-z
This guy shows a lot of how it's done: https://www.youtube.com/@thethoughtemporium
Basically you can design/edit/inject custom genes into things and see real results spending on the scale of $100-$1000.
The (public!) school had a grant from one of Seattle's biotech boom companies.
At GTC they showed an SAE they built on a smaller version of it, allowing you to see what their model learned: https://research.nvidia.com/labs/dbr/blog/sae/
I am a structural biologist working in pharmaceutical design and this type of thing could be wildly useful (if it works).
Who says we don't?