from Hacker News

AlphaFold Protein Structure Database

by matejmecka on 7/22/21, 3:15 PM with 58 comments

by stephanheijl on 7/22/21, 4:10 PM
I'm impressed and grateful that DeepMind released this resource; it will save a lot of compute for labs that would otherwise try to replicate an entire exome for themselves. While some structures look great, there are still some misses here. Important structures like BRCA1 (a well-studied breast cancer associated protein) are just structures for the BRCT and RING domains surrounded by a low-confidence string of amino acids, likely shaped to be globular: https://alphafold.ebi.ac.uk/entry/P38398
Maybe I was wrong for expecting the impossible here, but I was excited to see this specific structure and it appears that there is still work to do. Nevertheless, kudos to DeepMind on their amazing achievement and contributions to the field!
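For anyone who wants to check how much of an entry is low confidence, here is a minimal sketch. It assumes the downloaded model is in PDB format and that the per-residue confidence score (pLDDT) is stored in the B-factor column, which is how AlphaFold DB files are written. The function name and the cutoff of 50 are illustrative choices, not part of any official tooling.

```python
def low_confidence_residues(pdb_text, threshold=50.0):
    """Return residue numbers whose C-alpha B-factor (assumed pLDDT) is below threshold."""
    low = []
    for line in pdb_text.splitlines():
        # Only coordinate records carry per-atom values.
        if not line.startswith("ATOM"):
            continue
        # One value per residue is enough, so look at the C-alpha atom only.
        if line[12:16].strip() != "CA":
            continue
        res_num = int(line[22:26])    # PDB columns 23-26: residue sequence number
        plddt = float(line[60:66])    # PDB columns 61-66: B-factor field
        if plddt < threshold:
            low.append(res_num)
    return low
```

Run on an entry like BRCA1, a long list of residues outside the BRCT and RING domains would be consistent with the picture described above.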
by ramraj07 on 7/22/21, 6:38 PM
As an ex-biomedical researcher I was trying to think what protein I should enter and see, and couldn't come up with a protein I know of that didn't already have a structure (at least a crude one). That is, we roughly know what most known important proteins look like. This is an amazing tool, and will be indispensable in labs (I'd expect any lab to use this site at least once a year). But it's not as transformative as some might think.
by moyix on 7/22/21, 3:25 PM
Anyone else getting a 403 Forbidden?
If so it might be better to link to the paper instead: https://www.nature.com/articles/s41586-021-03828-1
by jkh1 on 7/22/21, 3:31 PM
Didn't see this post so posted it also. Also relevant: https://www.embl.org/news/science/alphafold-potential-impact...
by sdbrown on 7/22/21, 3:56 PM
This is a fabulous convenience! The reach of this ready-to-go data will be much larger (in some directions) than the model and CASP results themselves.
by lumost on 7/22/21, 6:58 PM
I used to do some RNA molecular dynamics simulations in college which were both computationally expensive and difficult to replicate. Having the ability to reasonably predict protein structure is an incredible scientific achievement - however I am curious if anyone here who is better informed has takes on the following.
1. How likely is it that AlphaFold learned to accurately predict protein structure only in the narrow domain of proteins that have been experimentally synthesized and whose structure has been measured? In other words, will AlphaFold's results generalize to proteins which cannot yet be synthesized in the laboratory?
2. If AlphaFold's accuracy holds, what type of commercial applications does this open up?
by pelorat on 7/22/21, 8:08 PM
There's a lot of news about AlphaFold lately but what about RoseTTAFold? Wasn't it more accurate and much faster?
by _RPL5_ on 7/22/21, 10:00 PM
This is awesome! When they announced CASP results a few months ago, I was wondering if AlphaFold would be accessible as an API, where you can submit a protein id or a sequence and get back a 3D structure. This database is basically that, except it's free & open to the public. Major props!
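As a rough illustration of the "submit a protein id" part, here is a small sketch that turns a UniProt accession into a database entry page URL. It assumes the URL pattern seen in the BRCA1 link earlier in the thread (https://alphafold.ebi.ac.uk/entry/P38398); the function name is hypothetical and this is not an official client.

```python
def alphafold_entry_url(uniprot_accession):
    """Build the entry page URL for a UniProt accession (URL pattern is an assumption)."""
    accession = uniprot_accession.strip().upper()
    # Only letters and digits are accepted here; anything else is rejected early.
    if not accession.isalnum():
        raise ValueError(f"unexpected accession: {uniprot_accession!r}")
    return f"https://alphafold.ebi.ac.uk/entry/{accession}"


print(alphafold_entry_url("P38398"))
```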
by nharada on 7/22/21, 4:03 PM
From the abstract[1]:
> After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally-determined structure. Here we dramatically expand structural coverage by applying the state-of-the-art machine learning method, AlphaFold2, at scale to almost the entire human proteome (98.5% of human proteins).
[1] https://www.nature.com/articles/s41586-021-03828-1
by spacecity1971 on 7/22/21, 9:03 PM
Quick question, please excuse my ignorance, but is there a way to extrapolate sequence from structure? In other words, can we design proteins and calculate the sequences required to make them?
by Ovah on 7/22/21, 5:52 PM
Interesting that they're porting it to other organisms. Different organisms have variations in ribosomes, post-translational modifications and even tRNA repertoire. So it's not a guarantee that two identical DNA sequences will give identical proteins in two different organisms.
by culopatin on 7/22/21, 7:16 PM
I happen to be working on a database for folds as well. But RNA folds not protein folds. I’m not a bio guy but my gf is and if I understand correctly this is not the same. I hope they are different because it would suck to be me lol.
This is my first big boy project and I’m driving solo so it takes me a while to make any progress. But at least now I have this db and genbank to model after
by dnautics on 7/22/21, 4:17 PM
yikes, this doesn't even do some basic stuff like trim off pre-protein segments for secreted proteins... Without this, you could get some very incorrect structures.
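A minimal sketch of the kind of trimming meant here, assuming the cleavage position is already known (for example from a sequence annotation or a signal peptide predictor). The function name and the toy sequence are hypothetical, for illustration only.

```python
def trim_signal_peptide(sequence, cleavage_after):
    """Drop the first `cleavage_after` residues and return the mature sequence."""
    if not 0 <= cleavage_after < len(sequence):
        raise ValueError("cleavage position must fall inside the sequence")
    return sequence[cleavage_after:]


# Toy example: pretend the first 4 residues are the pre-protein segment.
mature = trim_signal_peptide("MKTLLAAQPE", 4)
print(mature)
```

The mature sequence, not the full precursor, would then be the input for structure prediction.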
by visarga on 7/22/21, 4:22 PM
Citation factory, that's what it is.
by ricksunny on 7/22/21, 3:44 PM
I’m sorry but why don’t they just release the ability for a user to enter a known real-world sequence’s accession number from GenBank / GISAID, and generate the protein structure from that? Why do they have to abstract the user from the process by only exposing a completed database of the protein structures the AlphaFold researchers decided would be worth producing?