by romain_g on 10/16/13, 11:55 AM with 104 comments
by nailer on 10/16/13, 2:18 PM
It cost, IIRC, a tenth of a cent per image URL. Rather than being based on skin tone, it was built on algorithms that specifically identify labia, anuses, penises, etc. REST API: send a URL, get back a yes/no/maybe. You decided what to do with the maybes.
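For illustration, a rough sketch of what calling that kind of yes/no/maybe endpoint might look like (the URL, parameter names, and response field here are invented, not pifilter's actual API):

```python
import requests

def check_image(image_url, api_key):
    """Ask a pifilter-style service whether an image URL is adult content."""
    resp = requests.get(
        "https://api.example-filter.com/v1/classify",  # placeholder endpoint
        params={"url": image_url, "key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    # "yes" / "no" / "maybe" -- the caller decides what to do with the maybes
    return resp.json()["verdict"]
```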
My experience:
- Before launch, I tested it with 4chan's /b/ as a feed, and was able to produce a mostly clean version of /b/, with the exception of cartoon imagery.
- It caught most of the stuff people tried to post to the site. Small-breasted women (breasts being considered 'adult' in the US) were the only thing that would get through, and that wasn't a huge concern. Completely unmaintained pubic hair (as revealing as a black bikini) would also get through.
- Since people didn't know what I was testing with, they didn't work around it (so nobody tried posting drawings or cartoons), but I imagine, e.g., a photo of a prolapse might not trigger the anus detection, as the shape would be too different.
- pifilter erred on the side of false negatives, but there was one notable false positive: a pastrami sandwich.
by Theodores on 10/16/13, 1:41 PM
Notionally the oscilloscope was there to show that the luminance and chroma in the signal were okay (i.e. that it could be broadcast over the airwaves, PAL/NTSC, and look as intended at the other end); however, porn, and anything likely to be porn, had a distinctive pattern on the oscilloscope screen. If porn was suspected, the source material would obviously be patched through to a monitor 'just in case'.
Note that the oscilloscope was analog and that the image would be changing 25/30 times a second. Also, back then there were not so many false positives on broadcast TV, e.g. the kind of pop videos that today's audience deems artful rather than porn.
If I had to solve the problem programmatically I would find a retired broadcast engineer and start from there, with what can be learned from a 'scope.
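For what it's worth, a crude digital stand-in for the 'scope idea is to look at where an image's chroma sits. A minimal sketch with Pillow, using commonly cited YCbCr skin-tone ranges (the thresholds and the 30% cutoff are illustrative, not calibrated):

```python
from PIL import Image

def skin_chroma_fraction(path, cb_range=(77, 127), cr_range=(133, 173)):
    """Fraction of pixels whose Cb/Cr chroma falls inside a typical skin-tone window."""
    img = Image.open(path).convert("YCbCr")
    pixels = list(img.getdata())
    skin = sum(
        1 for _, cb, cr in pixels
        if cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
    )
    return skin / len(pixels)

# Flag frames/images dominated by skin-tone chroma for human review:
# suspicious = skin_chroma_fraction("frame.jpg") > 0.3
```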
by adorable on 10/16/13, 2:10 PM
I found out that no single technique works great. If you want an efficient algorithm, you probably have to blend different ideas and compute a "nudity score" for each image. That's at least what I do.
I'd be happy to discuss how it works. Here are a few techniques used:
- color recognition (as discussed in other comments)
- haar-wavelets to detect specific shapes (that's what Facebook and others use to detect faces for example)
- texture recognition (skin and wood may have the same colors but not the same texture)
- shape/contour recognition (machine learning of course)
- matching with a growing database of NSFW images
The algorithm is open for testing here: http://sightengine.com. It works OK right now, but once version 2 is out it should really be great.
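As a sketch of the blending itself (the detector functions, weights, and 0.6 cutoff below are made-up placeholders, not sightengine's internals):

```python
def nudity_score(image, detectors):
    """Blend per-technique scores (each 0..1) into a single weighted score.

    `detectors` maps a technique name to (score_fn, weight); the detector
    functions and weights shown below are placeholders.
    """
    total = sum(weight for _, weight in detectors.values())
    return sum(fn(image) * weight for fn, weight in detectors.values()) / total

# Example wiring (all score functions hypothetical):
# detectors = {
#     "skin_color":  (skin_color_score, 0.3),
#     "haar_shapes": (haar_shape_score, 0.2),
#     "texture":     (texture_score,    0.2),
#     "contours":    (contour_score,    0.2),
#     "known_nsfw":  (hash_match_score, 0.1),
# }
# flagged = nudity_score(img, detectors) > 0.6
```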
by asolove on 10/16/13, 12:56 PM
Source: I helped implement a Mechanical Turk (MT) job to filter adult content for a large hosting company.
by ma2rten on 10/16/13, 3:03 PM
I used the so-called Bag of Visual Words approach, which at that time was the state of the art in image recognition (now it's neural networks). You can read about it on Wikipedia. The only major change from the standard approach (SIFT + k-means + histograms + SVM + chi2 kernel) was that I used a version of SIFT that uses color features. In addition to this I used a second machine-learning classifier based on the context of the picture: who posted it? Is it a new user? What are the words in the title? How many views does the picture have?...
In combination the two classifiers worked nearly flawlessly.
Shortly after that, Chatroulette was having its porn problem and it was in the media that the founder was working on a porn filter. I sent an email to offer my help, but didn't get a reaction.
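A rough sketch of that bag-of-visual-words pipeline with OpenCV and scikit-learn, using plain SIFT rather than the colour variant mentioned above (vocabulary size and classifier settings are illustrative):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

def bovw_histograms(image_paths, n_words=500):
    """SIFT descriptors -> k-means vocabulary -> one visual-word histogram per image."""
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    vocab = KMeans(n_clusters=n_words).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = vocab.predict(desc) if len(desc) else np.array([], int)
        hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
        hists.append(hist / max(hist.sum(), 1))   # normalise to frequencies
    return np.array(hists)

# Chi-squared-kernel SVM on the histograms (labels: 1 = porn, 0 = clean):
# X = bovw_histograms(train_paths)
# clf = SVC(kernel=chi2_kernel).fit(X, train_labels)
```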
by VLM on 10/16/13, 1:23 PM
puritanweirdos.example.com with no skin showing between toes and top of turtleneck (edited to add no pokies either)
and
normalpeople.example.com with 99% of the human race
The best solution to a problem involving computers is sometimes computer-related, but sometimes it's social. The puritans are never going to get along with the normal people anyway, so it's not like sharding them is going to hurt.
Another way to hack the system is not to hire or accept holier-than-thou puritans: personality doesn't mesh with the team, doesn't fit the culture, etc. You have to draw the line somewhere, and weirdos at either end should get cut, so no CP or animals at one extreme, and no holy rollers at the other.
The final social hack: it's kind of like dealing with bullies via appeasement. They're blocking reasonable stuff today; tomorrow they want to block all women not wearing burqas, or depictions of women damaging their ovaries by driving. Appeasing bullies never really works in the long run, so why bother starting? "If you claim not to like it, or at least enjoy telling everyone else repeatedly how you claim not to like it, stop looking at it so much, case closed."
by _mulder_ on 10/16/13, 1:53 PM
Develop a bot to trawl NSFW sites and hash each image (combined with the 'skin detecting' algorithms detailed previously). Then compare the user-uploaded image's hash with those in the NSFW database.
This technique relies on the assumption that NSFW images that are spammed onto social media sites will use images that already exist on NSFW sites (or are very similar to). Then it simply becomes a case of pattern recognition, much like SoundHound for audio, or Google Image search.
It wouldn't reliably detect 'original' NSFW material, but given enough cock shots as source material, it could probably find a common pattern over time.
edit: I've just noticed rfusca in the OP suggests a similar method
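A minimal sketch of the hashing side, assuming a perceptual hash (dHash here) rather than an exact hash, so near-duplicates and re-encodes still match (the 64-bit size and the 5-bit distance cutoff are illustrative):

```python
from PIL import Image

def dhash(path, size=8):
    """Difference hash: shrink, greyscale, compare adjacent pixels -> 64-bit fingerprint."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

# An upload is suspicious if it's within a few bits of anything in the crawled NSFW set:
# flagged = any(hamming(dhash(upload), h) <= 5 for h in nsfw_hashes)
```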
by mixmax on 10/16/13, 12:53 PM
Detecting smurf-porn(1) (yes that's a thing...) is even harder since all the actors are blue.
http://pinporngifs.blogspot.dk/2012/09/smurfs-porn.html?zx=7... - obviously very NSFW, but quite funny.
by eksith on 10/16/13, 12:45 PM
Edit: No shortage of stock image reviewer jobs https://google.com/search?hl=en&q=%22image%20reviewer%22
I'm trying to find an interview with one of these people describing what it's like on the other end. It wasn't a pleasant story. These folks are employed by the likes of Facebook, Photobucket, etc. Most are outsourced, obviously, and they all have very high turnover.
by VLM on 10/16/13, 1:29 PM
If you're trying for "must not offend any human being on the planet" then you've got an AI problem that exceeds even my own human intelligence. Especially when it extends past pr0n and into stuff like satire: is that just some dude's weird self-portrait, or a satire of the prophet, and are you qualified to figure it out?
by betterunix on 10/16/13, 1:36 PM
The classic problem with filtering pornography is separating it from information about human bodies. I suspect that doing this with images will be even harder than doing it with text.
by quarterto on 10/16/13, 12:37 PM
by nathanb on 10/16/13, 2:43 PM
We as humans can readily classify images into three vague categories: clean, questionable, and pornographic. The problem of classification is not only one of determining which bucket an image falls into but also one of determining where the boundaries between buckets are. Is a topless woman pornographic? A topless man? A painting of a topless woman created centuries ago by a well-recognized artist? A painting of a topless woman done yesterday by a relatively unknown artist? An infant being bathed? A woman breastfeeding her baby? Reasonable people may disagree on which bucket these examples fall in.
So what if I create three filter sets: restrictive, moderate, and permissive, and then categorize 1,000 sample images into one of those three buckets for each filter set (restrictive could be equal to moderate, but filter questionable images as well as pornographic ones)?
Assuming that the learning algorithm was programmed to look at a sufficiently large number of image attributes, this approach should easily be capable of creating the most robust (and learning!) filter to date.
Has anyone done this?
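As a sketch of how the labelled buckets could drive the filter sets (the feature extraction and the RandomForest classifier are placeholders, not a recommendation):

```python
from sklearn.ensemble import RandomForestClassifier

# Human-assigned labels per sample image: 0 = clean, 1 = questionable, 2 = pornographic
BLOCKED = {
    "moderate":    {2},      # block only pornographic
    "restrictive": {1, 2},   # also block questionable, as described above
}

def train_filter(features, labels, level="moderate"):
    """Fit one 3-class model on the labelled sample images, return a per-level filter.

    `features` is an (n_images, n_attributes) array of whatever image attributes
    the learning algorithm looks at; RandomForest is just a placeholder classifier.
    """
    model = RandomForestClassifier(n_estimators=200).fit(features, labels)
    return lambda feat: model.predict([feat])[0] in BLOCKED[level]
```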
by Houshalter on 10/16/13, 8:47 PM
>There are already a few image based search engines as well as face recognition stuff available so I am assuming it wouldn't be rocket science and it could be done.
Just do a reverse image search for the image, see if it comes up on any porn sites or is associated with porn words.
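As a sketch, with the actual reverse-image-search call left as a placeholder (the domain and term lists are illustrative seeds, not real data):

```python
from urllib.parse import urlparse

NSFW_DOMAINS = {"example-tube.com", "example-porn.net"}   # seed list, illustrative only
NSFW_TERMS = {"porn", "xxx", "nsfw", "nude"}

def looks_like_porn(image_path, reverse_image_search):
    """`reverse_image_search` stands in for whatever search API you use;
    assume it returns (page_url, page_title) pairs for visually similar hits."""
    for url, title in reverse_image_search(image_path):
        domain = urlparse(url).netloc.lower()
        text = (url + " " + title).lower()
        if domain in NSFW_DOMAINS or any(term in text for term in NSFW_TERMS):
            return True
    return False
```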
by lectrick on 10/16/13, 7:38 PM
http://en.wikipedia.org/wiki/I_know_it_when_I_see_it
Basically, it's impossible to completely accurately identify pornography without a human actor in the mix, due to the subjectivity... and especially considering that not all nudity is pornographic.
by primaryobjects on 10/16/13, 2:07 PM
Take a look at the scores for classifying dogs vs. cats with 97% accuracy: http://www.kaggle.com/c/dogs-vs-cats/leaderboard. You could use a technique of digitizing the image pixels and feeding them to a learning algorithm, similar to http://www.primaryobjects.com/CMS/Article154.aspx.
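A minimal sketch of the "digitize the pixels and feed a learner" idea with Pillow and scikit-learn (the 32x32 size and logistic regression are arbitrary choices):

```python
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def pixel_features(path, size=(32, 32)):
    """Digitize an image into a flat, normalised pixel vector."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

# labels: 1 = NSFW, 0 = clean, assigned by hand for the training set
# X = np.array([pixel_features(p) for p in train_paths])
# clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
# is_nsfw = clf.predict([pixel_features("upload.jpg")])[0] == 1
```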
by denzil_correa on 10/16/13, 9:05 PM
[0] Shih, J. L., Lee, C. H., & Yang, C. S. (2007). An adult image identification system employing image retrieval technique. Pattern Recognition Letters, 28(16), 2367-2374.
http://sjl.csie.chu.edu.tw/sjl/albums/userpics/10001/An_adul...
by jmngomes on 10/16/13, 1:19 PM
by racbart on 10/16/13, 1:43 PM
Nudity != porn and certainly half-nudity != porn.
I'd rather go for pattern recognition. There's a lot of image recognition software these days that can distinguish the Eiffel Tower from the Statue of Liberty, and it might be useful to detect certain body parts and certain body configurations (for those shots that don't contain any private body parts but do show two bodies in an unambiguous configuration).
by hugofirth on 10/16/13, 1:23 PM
If you assume that porn tends to cluster, rather than exist in isolation, then crawling the other images on the source pages and applying computer vision techniques should allow you to block pages that score above a threshold number of positive results (thus accounting for inaccuracy and false positives).
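Something like this, as a sketch (the single-image scorer is whatever classifier you already have, and the cutoffs are illustrative):

```python
def page_is_nsfw(image_urls, score_image, threshold=0.4, min_images=3):
    """Score every image found on the source page (score_image returns 0..1) and
    block the whole page if the fraction of positives clears a threshold."""
    if len(image_urls) < min_images:
        return False                      # too little evidence to judge the page
    positives = sum(1 for url in image_urls if score_image(url) > 0.5)
    return positives / len(image_urls) >= threshold
```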
by ismaelc on 10/16/13, 2:47 PM
by unoti on 10/16/13, 6:42 PM
by beat on 10/18/13, 4:46 AM
Depending on the site, I'd go with a trust-based solution. New users get their images approved by a human censor (pr0n == spambot in most cases). Established users can add images without approval.
If you're going to try software, try something that errs on the side of caution, and send everything to a human for final decision-making, just like spam filters.
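A sketch of that routing logic (the user attribute, the score function, and the 0.1 cutoff are made up for illustration):

```python
def route_upload(user, image, nsfw_score):
    """Trust-based routing: new accounts and anything the filter isn't sure about
    go to a human reviewer; only trusted users with clearly clean images skip
    the queue."""
    if not user.is_established:
        return "human_review"             # new accounts always get checked
    if nsfw_score(image) > 0.1:           # err on the side of caution
        return "human_review"
    return "publish"
```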
by npatten on 10/16/13, 7:00 PM
hilarious!
by hcarvalhoalves on 10/16/13, 5:28 PM
Maybe a good approach is an image lookup: try to find the image on the web and see if it appears on a porn site or in a pornographic context.
by jcfiala on 10/16/13, 3:01 PM
by nate510 on 10/16/13, 10:13 PM
Um, so to speak.
by singlow on 10/16/13, 3:32 PM
by wehadfun on 10/16/13, 1:27 PM
by djent on 10/16/13, 3:44 PM
by dschiptsov on 10/16/13, 2:00 PM
by bedhead on 10/16/13, 2:40 PM
by level09 on 10/16/13, 3:41 PM
by digitalsushi on 10/16/13, 3:39 PM
by bicknergseng on 10/16/13, 4:58 PM
by bachback on 10/16/13, 12:55 PM