by dpandya on 10/31/17, 2:11 AM
It seems that the primary contribution of this technique is that it uses specific assumptions supported by neuroscience research in order to allow for composability of learning and better generalization. By introducing these specific assumptions (e.g. contours define objects), they are able to reduce the complexity that the model has to learn and thereby reduce the amount of data that it needs.
Obviously, the question then becomes: what happens when you have visual situations that violate or come close to violating the assumptions made?
I'm not familiar enough with the specifics of RCNs to be able to answer this; maybe someone else can. Very interesting paper and approach regardless.
by bufo on 10/31/17, 5:51 AM
Again: no one in the deep learning world cares about CAPTCHA compared to other, more challenging benchmarks. I wouldn't be surprised if many optimizations could be made with ANY kind of effort put into it. Still waiting for Vicarious to go beyond MNIST and text CAPTCHAs.
by flor1s on 10/31/17, 1:47 AM
I only skimmed over the article, but I think the title on HN does not reflect the claims the authors are making.
The title of the paper is: A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs
The title of the article is: Common Sense, Cortex, and CAPTCHA
Neither is anywhere near the sensationalist title on HN: "RCN is much more data efficient than traditional Deep Neural Networks"
by sherbondy on 10/31/17, 3:31 AM
As far as I can tell, the code on GitHub (https://github.com/vicariousinc/science_rcn) only works for the MNIST dataset.
Unclear how to run on the CAPTCHA examples referenced in the paper, even though they did make the datasets for those examples available.
Bummer: a big part of what the paper touts about the RCN model is its ability to segment sequences of characters (even of indeterminate length!). Sad that I cannot easily verify this for myself!
by BucketSort on 10/31/17, 1:06 AM
I'd love to read this, but the faint text on white background... good god. I went through the code looking to change the background so I could read it and found this:
body {
  text-rendering: optimizeLegibility;
}
Ok
by nightcracker on 10/31/17, 12:43 AM
Featuring some of the worst typography I've seen on the internet. There was clearly an attempt, but just leaving the font face at the default would've been more readable.
by cs702 on 10/31/17, 2:28 AM
This paper looks really interesting to me, although after quickly reading the introduction it's evident that I'm going to have to invest quite a bit of time and effort in the paper to grasp its key ideas. I come from more of an encoding-decoding, deep/machine-learning background, as opposed to a probabilistic graphical modeling or PGM background, and my knowledge of neuroscience is minimal.
To date, my experience with "deep PGM models" (for lack of a better term) is limited to some tinkering with (a) variational autoencoders using ELBO maximization as the training objective, and to a much lesser extent (b) "bi-directional" GANs using a Jensen-Shannon divergence between two joint distributions as the training loss.
Has anyone here with a similar background to mine had a chance to read this paper? Any thoughts?
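For reference, the ELBO objective mentioned in (a) above looks roughly like this; a minimal PyTorch sketch of the standard VAE loss, assuming a generic Gaussian encoder and Bernoulli decoder, not anything from the paper's code:

    # Negative ELBO for a vanilla VAE: reconstruction term + KL to a standard normal prior.
    # `x_recon`, `mu`, `logvar` are hypothetical encoder/decoder outputs.
    import torch
    import torch.nn.functional as F

    def negative_elbo(x, x_recon, mu, logvar):
        # Expected reconstruction log-likelihood under the decoder (Bernoulli assumption)
        recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
        # KL( N(mu, sigma^2) || N(0, I) ) in closed form
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl

Maximizing the ELBO is just minimizing this quantity, which is what the "training objective" in (a) boils down to.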
by real-hacker on 11/7/17, 12:11 PM
It looks like RCN sits between traditional machine learning (with manual feature selection) and 'modern' neural networks (CNNs). The traditional methods are too rigid to capture the essential information, while CNNs are sometimes too flexible to avoid overfitting. Unlike CNNs, RCNs have a predetermined structure. Humans are not born as a blank slate; we have a neural structure encoded in our genes, so we don't need millions of training samples to recognize objects. So maybe RCN is onto something.
I am curious how RCN performs on real-life images like ImageNet, and how it holds up against adversarial examples. If it can easily recognize adversarial examples, that would be very interesting...
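For context, the adversarial examples mentioned above are usually generated by gradient methods like FGSM against a differentiable classifier. A rough sketch, assuming a generic PyTorch model and labels (nothing to do with the RCN code); whether such perturbations transfer to RCN is exactly the open question:

    # FGSM: perturb the input in the direction of the loss gradient's sign.
    import torch
    import torch.nn.functional as F

    def fgsm_example(model, x, label, eps=0.1):
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), label)
        loss.backward()
        # Step of size eps that increases the classifier's loss, clamped to valid pixel range
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()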
by dx034 on 10/31/17, 10:31 AM
> In 2013, we announced an early success of RCN: its ability to break text-based CAPTCHAs like those illustrated below (left column). With one model, we achieve an accuracy rate of 66.6% on reCAPTCHAs, 64.4% on BotDetect, 57.4% on Yahoo, and 57.1% on PayPal, all significantly above the 1% rate at which CAPTCHAs are considered ineffective (see [4] for more details). When we optimize a single model for a specific style, we can achieve up to 90% accuracy.
66% on reCAPTCHA and up to 90% when optimised is much higher than what I can achieve with my actual brain. Maybe I should consider using a neural network to answer those; it happens quite frequently that I need 2-3 rounds to get through reCAPTCHA.
by nnx on 10/31/17, 1:47 AM
Is RCN mainly a CNN alternative, most useful for image-related tasks, or could it also work well as an alternative to other types of neural networks?
ps: thank god for Reader mode in Safari
by visarga on 10/31/17, 5:16 AM
This is a paper that departs from the 'normal' AI routine and takes a very different approach. Is there another paper formally describing the RCN network? What goes inside the RCN cell? I find it more like a teaser than a revelation at this point.
by _0w8t on 10/31/17, 9:34 AM
I do not see any discussion in the paper of the computational efficiency of RCN detection. The only hint about performance that I found is at the end of the supplementary material, where the authors state:
> Use of appearance during the forward pass: Surface appearance is now only used after the backward pass. This means that appearance information (including textures) is not being used during the forward pass to improve detection (whereas CNNs do). Propagating appearance bottom-up is a requisite for high performance on appearance-rich images.
I presume from this that, in its current form, RCN requires much more computation per detection than a CNN, but I could be wrong.
by stochastic_monk on 10/31/17, 4:49 AM
If I'm not mistaken, a Deep Belief Net or Deep Boltzmann Machine would also be a generative model with enormously greater data efficiency. Comparing against CNNs is a red herring: the advantage of requiring less data to develop a model is more a generative/discriminative issue than it is an "RCN vs everyone else" issue.
What I don't quite understand is why Deep Belief Nets seem to not be getting press these days. For example, see this paper from 2010: http://proceedings.mlr.press/v9/salakhutdinov10a.html.
by singularity2001 on 10/31/17, 10:07 AM
The GitHub 'reference implementation' is only for MNIST, not for real CAPTCHAs.
by jostmey on 10/31/17, 3:38 AM
I'll need to see this approach work well across many datasets, not just CAPTCHAs and MNIST, before I am convinced.
by m3kw9 on 10/31/17, 3:22 AM
How hard is it to get it to run using CoreML?