by matejmecka on 7/15/21, 6:22 PM with 165 comments
by thesausageking on 7/15/21, 7:30 PM
https://www.nature.com/articles/s41586-021-03819-2_reference...
by dekhn on 7/15/21, 8:16 PM
One of the things I say about CASP has to be updated. It used to be: "2 years after Baker wins CASP, the other advanced teams have duplicated his methods and accuracy, and 4 years after that, everything Baker did is open source and trivially reproducible."
Now it's Baker catching up to DeepMind, and it took about a year.
by devindotcom on 7/15/21, 8:45 PM
More info here and here:
https://www.bakerlab.org/index.php/2021/07/15/accurate-prote...
https://techcrunch.com/2021/07/15/researchers-match-deepmind...
by nextos on 7/15/21, 7:12 PM
For example, multi-protein complexes are not well predicted yet, and these are really important in many biological processes and in drug design:
https://occamstypewriter.org/scurry/2020/12/02/no-deepmind-h...
A disturbing thing is that the architecture is much less novel than I originally thought it would be, which suggests that one of the major difficulties was having the resources to try different things on a massive set of multiple sequence alignments. This is something an industrial lab like DeepMind excels at, whereas universities tend to suck at anything that requires a directed effort of more than a handful of people.
by dekhn on 7/15/21, 7:10 PM
The underlying sequence datasets include PDB structures and sequences, and how those map to large collections of sequences with no known structure (no surprise). Each of those datasets represents decades of work by thousands of scientists, along with the programmers and admins who kept the databases running for decades on very little grant money (funding long-term databases is something the NIH hated to do until recently).
by tdfirth on 7/16/21, 6:46 AM
My experience working with code written by researchers is that it frequently contains a large number of bugs, which brings the whole project into question. I've also found that encouraging them to write tests greatly improves the situation. When they get the hang of testing, they often come to enjoy it, because it gives them a way to work on the code without running the entire pipeline (which is a very slow feedback loop). It also gives them confidence that a change hasn't led to a subtle bug somewhere.
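To make that concrete, here's a minimal, entirely hypothetical sketch of the kind of unit test I mean: if one step of the pipeline is a small feature-encoding function, a couple of pytest-style tests exercise just that step in seconds rather than re-running the whole pipeline. The function and names below are illustrative, not taken from AlphaFold or RoseTTAFold:

    import numpy as np

    def one_hot_encode(sequence, alphabet="ACDEFGHIKLMNPQRSTVWY"):
        # Encode an amino-acid sequence as a (len(sequence), 20) one-hot matrix.
        index = {aa: i for i, aa in enumerate(alphabet)}
        encoded = np.zeros((len(sequence), len(alphabet)), dtype=np.float32)
        for pos, aa in enumerate(sequence):
            encoded[pos, index[aa]] = 1.0
        return encoded

    def test_one_hot_shape_and_row_sums():
        # Every row should be a valid one-hot vector: exactly one 1, the rest 0.
        enc = one_hot_encode("MKT")
        assert enc.shape == (3, 20)
        assert np.all(enc.sum(axis=1) == 1.0)

    def test_one_hot_known_position():
        # 'A' is column 0 in the alphabet above, so a lone 'A' fires column 0.
        enc = one_hot_encode("A")
        assert enc[0, 0] == 1.0

Running pytest on a file like this takes well under a second, which is the kind of fast feedback loop the full pipeline can't give you.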
Again, I'm not criticising. I am aware that there are many ways to produce high-quality software, and Google/DeepMind have a good reputation for their standards around code review, testing, etc. I am, however, interested to understand how the team that wrote this thinks about and ensures accuracy.
In general, I hope that testing and code review become a central part of the peer review process for this kind of work. Without it, I don't think we can trust results. We wouldn't accept mathematical proofs that contained errors, so why would we accept programs that are full of bugs?
edit: grammar
by duckerude on 7/15/21, 8:12 PM
Does CC BY-NC actually do this? As far as I can tell it only really talks about sharing/reproducing, not using.
Or is the only thing prohibiting other commercial use the words "available for non-commercial use only"?
by COGlory on 7/15/21, 7:24 PM
by fossuser on 7/15/21, 7:12 PM
Could you give an overview of how people can leverage this (or how you might)?
From reading around about it, it sounds like there's often a need to find a certain type of molecule to activate or inhibit another based on shape, and the ability to programmatically solve for this makes the searching way easier.
Is this too oversimplified/wrong? How will this be used in practice?
[Edit]: Thanks for the answers!
by Cas9 on 7/15/21, 7:32 PM
by qeternity on 7/15/21, 7:07 PM
by stupidcar on 7/15/21, 7:21 PM
by pjfin123 on 7/15/21, 6:51 PM
by mensetmanusman on 7/15/21, 10:23 PM
by jfengel on 7/15/21, 7:49 PM
by culopatin on 7/15/21, 7:48 PM
by swalsh on 7/15/21, 7:04 PM
by hermitsings on 7/16/21, 6:30 AM
by Cas9 on 7/15/21, 7:38 PM