by jhunter1016 on 1/26/25, 1:35 PM with 42 comments
However, this simple script does not catch some of the more advanced phishing sites, and it doesn't catch other types of content we can't support on our platform, like porn.
Curious if anyone has tips and tricks for content moderation? We're still going to be manually reviewing sites because we haven't reached a scale that makes that impossible. But automation is nice.
by nbadg on 1/28/25, 8:35 AM
I'm aware of e.g. NSFWJS, which is a TensorFlow.js model [1]. Is there anything else that, say, can also do violence/gore detection?
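For the image side, a minimal sketch of wiring NSFWJS into a Node upload worker, assuming @tensorflow/tfjs-node is available for image decoding; the threshold and the choice of flagged classes are illustrative, not a recommendation:

    // Sketch: flag uploaded images with NSFWJS in a Node worker.
    // Assumes `npm install nsfwjs @tensorflow/tfjs-node`; threshold is illustrative.
    import { readFile } from "node:fs/promises";
    import * as tf from "@tensorflow/tfjs-node";
    import * as nsfwjs from "nsfwjs";

    const FLAG_THRESHOLD = 0.7;          // tune against your own review queue
    const modelPromise = nsfwjs.load();  // load the default model once, reuse per image

    async function shouldFlag(path: string): Promise<boolean> {
      const model = await modelPromise;
      const image = tf.node.decodeImage(await readFile(path), 3) as tf.Tensor3D;
      try {
        // Each prediction is { className, probability }; classes include "Porn", "Hentai", "Sexy".
        const predictions = await model.classify(image);
        return predictions.some(
          (p) =>
            (p.className === "Porn" || p.className === "Hentai") &&
            p.probability >= FLAG_THRESHOLD
        );
      } finally {
        image.dispose(); // free tensor memory
      }
    }

Note that NSFWJS only covers the porn/hentai/sexy axis; violence and gore detection would need a different model.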
by Freak_NL on 1/28/25, 7:58 AM
Wouldn't a solid set of processes for handling content complaints, plus knowing who your customers are in case the hosting country's law enforcement has a case, suffice?
Or are you having some free tier where users can anonymously upload stuff?
In the latter case — a free place to stash megabytes — you'll need to detect password-protected archives in addition to unencrypted content. Get ready for a perpetual game of whack-a-mole though.
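For what it's worth, a crude check for encrypted ZIPs is to scan local file headers and look at bit 0 of the general-purpose bit flag, which marks an encrypted entry. A rough TypeScript sketch, ignoring RAR/7z and self-extracting archives:

    // Sketch: detect password-protected entries in a ZIP by scanning for
    // local file header signatures (PK\x03\x04) and checking bit 0 of the
    // general-purpose bit flag. Heuristic only; does not cover RAR/7z.
    import { readFile } from "node:fs/promises";

    async function zipHasEncryptedEntries(path: string): Promise<boolean> {
      const buf = await readFile(path);
      const LOCAL_HEADER = 0x04034b50; // "PK\x03\x04" read as little-endian u32

      for (let i = 0; i + 8 <= buf.length; i++) {
        if (buf.readUInt32LE(i) === LOCAL_HEADER) {
          const flags = buf.readUInt16LE(i + 6); // general-purpose bit flag
          if (flags & 0x0001) return true;       // bit 0 set: entry is encrypted
        }
      }
      return false;
    }

That only tells you something is locked, of course, not what's inside.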
by brudgers on 1/26/25, 7:19 PM
That makes approximately all business ideas for hosting user-generated content non-viable. The conflict is dynamic and you are the Maginot Line... except that any breach of laws creates a potential attack by state enforcement agencies, too.
To put it another way, ASCII files and a teletype were enough to see pictures of naked ladies. Good luck.
by Terretta on 1/27/25, 2:48 PM
by mooreds on 1/28/25, 12:37 PM
We have many happy clients that moderate UGC with Cleanspeak, including gaming and customer service applications. You can read more about the approach in the docs[1] or the blog[2]. Here's a blog post[3] that talks about root word list extrapolation, which is one of the approaches.
Cleanspeak is not the cheapest option and there's some API integration work required, but if you are looking for performant, scalable, flexible moderation, it's worth an eval.
1: https://cleanspeak.com/docs/3.x/tech/
2: https://webflow.cleanspeak.com/blog
3: https://cleanspeak.com/blog/root-word-list-advantage-content...
by AyyEye on 1/28/25, 8:57 AM
by dsr_ on 1/28/25, 12:37 PM
by BrunoBernardino on 1/28/25, 8:14 AM
[1]: https://news.ycombinator.com/item?id=42780265
[2]: https://oxcheck.com/safe-api
by jamesponddotco on 1/28/25, 6:02 PM
[1]: https://git.sr.ht/~jamesponddotco/bonk
[2]: Because I host in Germany.
by bauerpl on 1/28/25, 12:25 PM
by scarface_74 on 1/27/25, 3:27 AM
by raywu on 1/28/25, 9:57 PM
Moderation is a vast topic. There are different services that focus on different areas, such as text, images, CSAM, etc. Traditionally you treat each problem area differently.
Within each area, you, as an operator, need to define the level of sensitivity for each category of offense (your policies).
Some policies seem more clear-cut (e.g. image: porn) while others are more difficult to define precisely (e.g. text: bullying or child grooming).
In my experience, text moderation is more complex and presents a lot of risks.
There are different approaches for text moderation.
Keyword-based matching services like Cleanspeak, TwoHat, etc. are useful as a baseline but limited, because assessing a keyword requires context. With this approach a word can be miscategorized, producing false positives or false negatives, which may impact your operations at scale, or the UX if your platform requires a more real-time experience.
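To make that limitation concrete (this is a toy, not any vendor's implementation): a naive word-boundary match flags legitimate uses and misses obfuscated spellings.

    // Sketch: naive word-boundary keyword matching, to show why context matters.
    // The block list and examples are illustrative only.
    const BLOCKLIST = new Set(["kill", "porn"]);

    function naiveFlag(text: string): string[] {
      return text
        .toLowerCase()
        .split(/\W+/)
        .filter((word) => BLOCKLIST.has(word));
    }

    naiveFlag("kill the process and restart the server"); // ["kill"]: false positive, legitimate usage
    naiveFlag("k1ll yourself");                            // []: false negative, obfuscated spelling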
LLMs are theoretically well suited to taking context into account for text moderation; however, they are also pricier and may require further fine-tuning, or self-hosting for cost savings.
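One way that can look in practice (the endpoint, model name, label set, and prompt here are all assumptions to swap for your own stack or a self-hosted model):

    // Sketch: LLM-based text moderation via a hosted chat API.
    // Model name, labels, and prompt are placeholders; adapt to your policies.
    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    async function classify(message: string): Promise<{ label: string; reason: string }> {
      const resp = await client.chat.completions.create({
        model: "gpt-4o-mini", // placeholder; substitute whatever hosted or self-hosted model you use
        response_format: { type: "json_object" },
        messages: [
          {
            role: "system",
            content:
              'You are a content moderator. Classify the user message as one of: ' +
              '"ok", "bullying", "grooming", "sexual", "other_violation". ' +
              'Respond with JSON: {"label": "...", "reason": "..."}',
          },
          { role: "user", content: message },
        ],
      });
      return JSON.parse(resp.choices[0].message.content ?? '{"label":"ok","reason":""}');
    }

Batching and caching help with the cost concern; for anything real-time you would likely still want a cheap keyword pass in front of it.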
CSAM as a problem area presents the highest risk, though it may be more clear-cut. There are dedicated image services and regulatory bodies that focus on this area (including for automating reports to local law enforcement).
Finally, the EU (via the DSA) also requires social media companies to self-report on moderation actions, and (via the GDPR) to provide pathways for users to own and delete their data.
Edit: FIXED typos; ADDED a note on CSAM and DSA & GDPR