from Hacker News

Machines are better referees than humans but we’ll be sued if we use them

by inglesp on 2/19/14, 8:26 AM with 61 comments

by Blahah on 2/19/14, 9:38 AM
Peter Murray-Rust (author of this blog post) is a really great man. He's been a tireless advocate for dismantling privilege and setting knowledge free for several decades. I'm proud to say he's becoming a sort of mentor to me. Last week I spent a couple of days with his research group and saw this software in action; it's really impressive.
They can take an ancient paper with very low-quality diagrams of complex chemical structures, parse the image into an open markup language, and reconstruct the chemical formula and the correct image. Chemical symbols are just one of many plugins for their core software, which interprets unstructured, information-rich data like raster diagrams. They also have plugins for phylogenetic trees, plots, species names, gene names and reagents. You can easily develop plugins for whatever you want, and they're recruiting open source contributors (see https://solvers.io/projects/QADhJNcCkcKXfiCQ6, https://solvers.io/projects/4K3cvLEoHQqhhzBan).
As a side effect of how their software works, it can detect tiny suggestive imperfections in images that reveal scientific fraud. I was shown a demo in which a trace from a mass spec (like this http://en.wikipedia.org/wiki/File:ObwiedniaPeptydu.gif) was analysed. As well as reading the data from the plot, it revealed a peak that had been covered up with a square: the author had deliberately obscured an inconvenient peak in their data. Scientific fraud. It's terrifying that they find this in most chemistry papers they analyse.
Peter's group can analyse thousands or hundreds of thousands of papers an hour, automatically detecting errors and fraud while simultaneously freeing the data, which are facts and therefore not copyrightable. This is one of the best things to have happened to science in many years, except that publishers deliberately prevent it. Their work also made me realise it would be possible to continue Aaron Swartz's work on a much bigger scale (http://blahah.net/2014/02/11/knowledge-sets-us-free/).
Academic publishers who are suppressing this are literally the enemies of humanity.
by yoha on 2/19/14, 9:34 AM
Google cache: https://webcache.googleusercontent.com/search?q=cache:b2trH5...
by atmosx on 2/19/14, 3:01 PM
When I asked my journalist friend why referees in football (soccer) games don't use high-tech aids, he thought about it for five minutes and then told me: "If they use technology it will be really hard to set up games. If you take from a league the ability to set up games and promote specific teams/individuals, then I don't know how the game will be shaped."
Of course it's universal. It's not like everything is a set-up, but it happens more often than most would imagine, especially since betting came into play.
So there you have it.
by JackFr on 2/19/14, 2:52 PM
This should be supported (both financially and ideologically) by the National Library of Medicine at the National Institutes of Health. The NIH doles out about $30 billion in research grants every year. If they spent a tiny fraction of a percent of that to dramatically improve the quality of the rest, and made such automatic checking standard practice, it would be a tremendous bang for the buck.
Oh yeah -- and they're big enough to fight academic publishers.
by tomp on 2/19/14, 11:31 AM
Can they release the software to the world? Maybe, if we all make an effort to analyse whatever papers we can access, we will together make enough noise that it will be impossible to ignore, and also impossible to silence (cf. The Pirate Bay). This could be one of the most important advancements of science in the past few years.
by Shivetya on 2/19/14, 11:06 AM
At first I thought the article would be about sports, which in itself would make for an interesting discussion about using machines to judge rules adherence, not that I would want to take that human element out of sports.
However, this is more about validating what is published. Of any group, you would hope that scientists and their like would jump on technology like this to make their work as accurate as possible. The same goes for publishers: why wouldn't they want to brag about using the most advanced interrogation methods on the papers they publish?
I guess they are people too, hypersensitive to having faults found.
by _greim_ on 2/20/14, 12:04 AM
So as a non-scientist, let me see if I understand.
There are lots of uncaught errors floating around out there in scientific papers, and many of them can now be found with this software. But exposing the errors so that they can be corrected is tricky because: A) you have to have legal access to a paper in order to scan it, and B) even if you do have access, under the current rules only the publishers have the right to expose the errors, and they're not interested because they want to avoid the embarrassment.
Am I understanding it?
by Udo on 2/19/14, 12:33 PM
I see a very exciting possibility for the future of academic papers in certain disciplines where we could have a machine validation step performed automatically, not only on submission but as a tool for the author to check their work. Like a git commit hook that forces a test suite to run. Of course, this would require some formalism to tag data, diagrams, and formulae but it's probably in our best interest in the long run to make the body of our research more machine-accessible anyway.
by sov on 2/19/14, 9:43 PM
For those curious, the five-membered ring in cyclopiazonic acid should have an NH rather than a CH2.
by bloaf on 2/20/14, 1:41 AM
When people talk about the future, they always seem to think that it will be the scientific jobs that get roboticized last. I think it will be the opposite, it won't be long before systems like this one will be able to analyze the scientific literature, identify shortcomings, and tell us what experiments to do next. Science will become less about creativity and problem solving, and more about following directions; eventually becoming completely automated.
http://www.aejournal.net/content/2/1/1
by nder on 2/19/14, 5:41 PM
Any chance you could farm out the software to a lab in a country with MUCH MUCH looser copyright laws, and a court system that would be problematic for outside lawsuits?
by dflock on 2/19/14, 8:02 PM
This blog post is down, try here: http://blogs.ch.cam.ac.uk.nyud.net/pmr/2014/02/18/machines-a...
by ylem on 2/20/14, 4:27 AM
I suppose one way around this would be for the NSF to require any grant awardees to deposit their structures in a publicly accessible database... But I'm a bit surprised: is there nothing like arxiv.org for chemistry? Why not?
by nl on 2/19/14, 8:57 PM
There is of course a way around the problems cited in the article.
If the referees ran the software on the preprint, it would find the same problem.
I agree this isn't as good, but it would be a step forward.
by bloaf on 2/20/14, 1:47 AM
I think the dream would be to couple a literature-analyzer like this with a specialized search engine like Wolfram Alpha.