from Hacker News

AlphaFold

by matejmecka on 7/15/21, 6:22 PM with 165 comments

  • by thesausageking on 7/15/21, 7:30 PM

  • by dekhn on 7/15/21, 8:16 PM

    I missed an important detail: """an academic team has developed its own protein-prediction tool inspired by AlphaFold 2, which is already gaining popularity with scientists. That system, called RoseTTaFold, performs nearly as well as AlphaFold 2, and is described in a Science paper also published on 15 July"""

    One of the things I say about CASP has to be updated. It used to be: "2 years after Baker wins CASP, the other advanced teams have duplicated his methods and accuracy, and 4 years after that, everything Baker did is open source and trivially reproducible."

    Now it's Baker catching up to DeepMind, and it took about a year.

    https://doi.org/10.1126/science.abj8754

  • by devindotcom on 7/15/21, 8:45 PM

    Also announced today was RoseTTAFold from UW's Baker Lab, which claims nearly the same accuracy at much higher efficiency. There's a public server and a paper in Science.

    More info here and here:

    https://www.bakerlab.org/index.php/2021/07/15/accurate-prote...

    https://techcrunch.com/2021/07/15/researchers-match-deepmind...

  • by nextos on 7/15/21, 7:12 PM

    AlphaFold 2 is very, very cool, but we need a little dose of reality: it's still some way from really "solving" protein folding as it was marketed.

    For example, multi-protein complexes are not yet well predicted, and these are really important in many biological processes and in drug design:

    https://occamstypewriter.org/scurry/2020/12/02/no-deepmind-h...

    A disturbing thing is that the architecture is much less novel than I originally thought it would be, which suggests that one of the major difficulties was simply having the resources to try different things on a massive set of multiple alignments. That is something an industrial lab like DeepMind excels at, whereas universities tend to suck at anything that requires a directed effort of more than a handful of people.

  • by dekhn on 7/15/21, 7:10 PM

    Fantastic, they released the dataset and code to train the model. Science will be able to proceed. edit: not the code to train the model, just the code to run inference.

    The underlying sequence datasets include PDB structures and sequences, and how those map to large collections of sequences with no known structure (no surprise). Each of those datasets represents decades of work by thousands of scientists, along with the programmers and admins who kept the databases running on very little grant money (funding long-term databases is something the NIH hated to do until recently).
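
    As a toy illustration of that split (using Biopython, which is just my choice for the example, not anything AlphaFold itself uses here): solved structures come as PDB coordinate files, while the overwhelming majority of sequences exist only as FASTA records with no coordinates at all. '4HHB' and 'seqs.fasta' below are placeholder examples.

      # Toy illustration: a solved PDB structure vs. plain sequences with no
      # known structure. Requires Biopython; '4HHB' (hemoglobin) and
      # 'seqs.fasta' are placeholders, the latter standing in for a big
      # sequence collection like UniRef.
      from Bio.PDB import PDBList, PDBParser
      from Bio import SeqIO

      # Download and parse one experimentally solved structure.
      path = PDBList().retrieve_pdb_file("4HHB", pdir=".", file_format="pdb")
      structure = PDBParser(QUIET=True).get_structure("4HHB", path)
      print("chains with solved coordinates:", [c.id for c in structure.get_chains()])

      # Most known protein sequences look like this instead: an ID and a
      # string of amino acids, with no experimentally determined structure.
      for record in SeqIO.parse("seqs.fasta", "fasta"):
          print(record.id, len(record.seq), "residues, structure unknown")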

  • by tdfirth on 7/16/21, 6:46 AM

    This isn't a criticism - I'm just curious to hear people's thoughts on this. When I look at this code, one of my initial reactions is that it does not seem to be very thoroughly tested. Sure, certain modules have tests (e.g. `model.quat_affine`), but it's not clear how complete they are. Meanwhile, other modules, for example `model.folding`, have no tests at all, despite containing large amounts of complex logic. Code that works with arrays like this is very easy to get wrong, and the bugs are difficult to spot.

    My experience working with code written by researchers is that it frequently contains a large number of bugs, which calls the whole project into question. I've also found that encouraging researchers to write tests greatly improves the situation. When they get the hang of testing they often come to enjoy it, because it gives them a way to work on the code without running the entire pipeline (a very slow feedback loop), and it gives them confidence that a change hasn't led to a subtle bug somewhere.
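
    To make that concrete, here is the kind of property-style test I have in mind. This is a toy numpy quaternion-to-rotation-matrix function I wrote for illustration, not DeepMind's actual `model.quat_affine` code, but it shows how invariants make array math cheap to check without running any pipeline:

      import numpy as np

      def quat_to_rot(q):
          # Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix.
          w, x, y, z = q
          return np.array([
              [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
              [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
              [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
          ])

      def test_rotation_is_orthonormal():
          rng = np.random.default_rng(0)
          q = rng.normal(size=4)
          q /= np.linalg.norm(q)  # normalize to a unit quaternion
          R = quat_to_rot(q)
          assert np.allclose(R @ R.T, np.eye(3))    # rows/cols are orthonormal
          assert np.isclose(np.linalg.det(R), 1.0)  # proper rotation, no reflection

      def test_identity_quaternion_gives_identity_matrix():
          assert np.allclose(quat_to_rot([1.0, 0.0, 0.0, 0.0]), np.eye(3))

    Tests like these run in milliseconds under pytest, which is exactly the fast feedback loop I mean.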

    Again, I'm not criticising. I am aware that there are many ways to produce high-quality software, and Google/DeepMind have a good reputation for their standards around code review, testing, etc. I am, however, interested to understand how the team that wrote this thinks about and ensures accuracy.

    In general, I hope that testing and code review become a central part of the peer review process for this kind of work. Without it, I don't think we can trust results. We wouldn't accept mathematical proofs that contained errors, so why would we accept programs that are full of bugs?

    edit: grammar

  • by duckerude on 7/15/21, 8:12 PM

    > The AlphaFold parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

    Does CC BY-NC actually do this? As far as I can tell it only really talks about sharing/reproducing, not using.

    Or is the only thing prohibiting commercial use the words "available for non-commercial use only"?

  • by COGlory on 7/15/21, 7:24 PM

    I am a structural biologist, and this is one of the handful of topics here that overlaps with my field. I'm very excited to play with this, although it might eventually put me out of a job.

  • by fossuser on 7/15/21, 7:12 PM

    Does anyone on HN work in bio or drug discovery?

    Could you give an overview of how people can leverage this (or how you might)?

    From reading around, it sounds like there's often a need to find a certain type of molecule that activates or inhibits another based on shape, and the ability to solve for this programmatically makes the search much easier.

    Is this too oversimplified or just wrong? How will this be used in practice?
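
    To make the question concrete, here's the naive loop I have in my head. This is pure illustration: `predict_structure` and `dock_score` are made-up stand-ins for a structure predictor and a docking/scoring tool, not real APIs:

      import random

      rng = random.Random(0)

      def predict_structure(sequence):
          # Stand-in for a structure predictor such as AlphaFold 2.
          return {"sequence": sequence}  # pretend this is a 3D model

      def dock_score(structure, ligand):
          # Stand-in for a docking/scoring tool; random number as a placeholder.
          return rng.random()

      def screen(target_sequence, ligand_library, top_n=10):
          structure = predict_structure(target_sequence)
          # Rank every candidate molecule by how well it is predicted to fit
          # the target's binding pocket (lower score = tighter binding here).
          ranked = sorted(ligand_library, key=lambda lig: dock_score(structure, lig))
          return ranked[:top_n]  # best candidates go on to wet-lab testing

      print(screen("MKTAYIAKQR", [f"ligand_{i}" for i in range(100)], top_n=3))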

    [Edit]: Thanks for the answers!

  • by Cas9 on 7/15/21, 7:32 PM

    Honest question: since AlphaFold doesn't really _solve_ the protein folding problem (it's NP-complete, after all) but only _approximates_ solutions very well, what are the real impacts of this? Couldn't even a good approximation of a protein cause unexpected problems? How do we know that an approximate structure will behave the same as the correct one?

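    For context, I understand "approximates very well" does get quantified: predictions are scored against experimentally solved structures with metrics such as RMSD (CASP itself uses the related GDT score). A minimal sketch, assuming the two coordinate sets are already optimally superimposed:

      import numpy as np

      def rmsd(predicted, reference):
          # Root-mean-square deviation between two (N, 3) coordinate arrays,
          # assuming they have already been optimally superimposed.
          diff = predicted - reference
          return np.sqrt((diff ** 2).sum() / len(predicted))

      # Toy check: shifting every atom by 0.1 angstrom in x gives RMSD 0.1.
      ref = np.zeros((5, 3))
      pred = ref + np.array([0.1, 0.0, 0.0])
      print(rmsd(pred, ref))  # ~0.1
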
  • by qeternity on 7/15/21, 7:07 PM

    Ok, so biochemists: which bit of the secret sauce are they leaving out?

  • by stupidcar on 7/15/21, 7:21 PM

    The model parameters are only available for non-commercial use. That's a shame, as I presume a lot of medical startups would benefit from having this kind of protein-folding tech available.

  • by pjfin123 on 7/15/21, 6:51 PM

    I'm assuming you can't run this on any consumer computer?

  • by mensetmanusman on 7/15/21, 10:23 PM

    Distributing this 2 TB file seems like a good use for torrents…

  • by jfengel on 7/15/21, 7:49 PM

    So... is it possible to clone this and turn it into a Folding@Home client? How does it do?

  • by culopatin on 7/15/21, 7:48 PM

    Does anyone know if this can be made to work with RNA folding?

  • by swalsh on 7/15/21, 7:04 PM

    edit: I was wrong. Please ignore.

  • by hermitsings on 7/16/21, 6:30 AM

    fodl