from Hacker News

We in-housed our data labelling

by EricButton on 2/27/25, 6:53 PM with 50 comments

  • by mlsu on 3/3/25, 6:54 AM

    Where I'm from, "in house" means employees. I see "contractors" and "negative earnings" in the same article.

    They do say that reviewers have to have some kind of aviation experience. I'd be more curious to read an article about how they source that talent.

  • by yorwba on 3/3/25, 11:03 AM

    > Failing a test will cost a user 600 points, or roughly the equivalent of 15 minutes of work on the platform. A correctly tuned penalty system removes the need for setting reviewer accuracy minimums; poor performers will simply not earn enough money to continue on the platform.

    This still sets a reviewer accuracy minimum, but it is determined implicitly by the arbitrary test penalty instead of being consciously chosen based on application requirements. I don't see how that's an improvement. If you absolutely want to have negative earnings, it would make more sense to choose a reviewer accuracy minimum to aim for and then determine the penalty that achieves that target, rather than the other way around.
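
    To make that concrete with a back-of-the-envelope sketch (only the 600-point penalty comes from the article; the earnings and test rates here are made-up assumptions):

      # Only PENALTY (600 points) is from the article; everything else
      # is an illustrative assumption.
      PENALTY = 600        # points lost per failed test
      EARN_PER_TASK = 10   # assumed points earned per task
      TEST_RATE = 1 / 20   # assumed fraction of tasks that are hidden tests

      # Break-even: EARN_PER_TASK = PENALTY * TEST_RATE * (1 - accuracy),
      # so the penalty implicitly sets an accuracy floor:
      implied_floor = 1 - EARN_PER_TASK / (PENALTY * TEST_RATE)
      print(f"implied accuracy floor: {implied_floor:.0%}")  # ~67%

      # The other way around: pick the target accuracy first, then
      # derive the penalty that enforces it.
      TARGET_ACCURACY = 0.95
      required_penalty = EARN_PER_TASK / (TEST_RATE * (1 - TARGET_ACCURACY))
      print(f"penalty for a {TARGET_ACCURACY:.0%} floor: {required_penalty:.0f} points")  # 4000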

    Moreover, a reviewer earning nothing in expectation under this scheme (they work for 15 minutes, then fail a test, and have all their earnings wiped out) could team up with a second reviewer with the same problem, submitting an answer only when both agree. As long as their errors aren't 100% correlated, they would end up with positive expected earnings to split between them.
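
    A quick simulation of the pairing argument (all numbers are illustrative; every task is treated as a test for simplicity, and the error rate is chosen so a solo reviewer breaks even at zero, as in the scenario above):

      import random

      EARN_PER_TASK = 10  # assumed points earned per task
      PENALTY = 600       # points lost per failed test (from the article)
      P_ERROR = 1 / 60    # assumed error rate: solo expectation is exactly zero
      N_TASKS = 1_000_000

      def solo(rng):
          # Expected value per task: 10 - 600 * (1/60) = 0
          total = 0
          for _ in range(N_TASKS):
              total += EARN_PER_TASK
              if rng.random() < P_ERROR:
                  total -= PENALTY
          return total

      def paired(rng):
          # Two reviewers answer independently and submit only on agreement.
          # Worst case assumed: when both err, they err identically and fail.
          total = 0
          for _ in range(N_TASKS):
              a_wrong = rng.random() < P_ERROR
              b_wrong = rng.random() < P_ERROR
              if a_wrong != b_wrong:
                  continue  # disagreement: skip the task, no pay, no penalty
              total += EARN_PER_TASK
              if a_wrong:  # both wrong: the agreed submission fails the test
                  total -= PENALTY
          return total

      rng = random.Random(0)
      print(f"solo:   {solo(rng) / N_TASKS:+.2f} points/task")    # ~0.00
      print(f"paired: {paired(rng) / N_TASKS:+.2f} points/task")  # ~+9.50, split two ways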

    This clearly indicates that the incentive scheme as designed doesn't capture the full economic value of even lower-quality data when processed appropriately. Of course you can't expect random reviewers to spontaneously work together in this way, so it's up to the data consumer to combine the work of multiple reviewers as appropriate.

    Trying to get reliable results from humans by exclusively hiring the most reliable ones can only get you so far; you can do much better by designing systems to use redundancy to correct errors when they inevitably do appear. Ironically, this is a case where treating humans as fallible cogs in a big machine would be more respectful.

  • by nsedlet on 3/3/25, 7:52 PM

    Data labeling has been moving to onshore / higher paid work. There's still a lot offshore, but for LLMs in particular and various specialized models, there's a massive trend toward hiring highly educated, highly paid specialists in the US.

    But as other commenters have warned: beware of labor laws, especially in CA/NY/MA.

    I've had a front-row seat to this... our company hires and employs contract W-2 and 1099 workers for the tech industry. Two years ago we started to get a ton of demand from data labeling companies, and more recently from foundation model companies doing DIY data labeling. Companies are converting 1099 workforces to W-2 to avoid misclassification, or they're trying to button up their use of 1099s to avoid being offside.

  • by gpvos on 3/3/25, 12:38 PM

    > All labellers are either licensed pilots or controllers (or VATSIM pilots/controllers).

    I would think such people can make better money by actually working as a pilot or controller?

  • by turtlebits on 3/3/25, 4:42 PM

    "and assess financial penalties for failed tests"

    That's an immediate nope for me. I don't care if I can file a dispute; unless I can resolve it then and there, I'm not going to be at the whim of some faceless escalation system or an uninformed CS agent.

  • by v9v on 3/3/25, 9:31 AM

    > Still, expert reviewers will occasionally disagree in their labelling. To ensure quality, an audio clip [box characters], at which point [...]

    Have they censored their own article?

  • by llm_trw on 3/3/25, 4:30 AM

    Data is king. Even when a new, better model comes along, a high-quality dataset is still just as valuable.

    Paying top performers above-market rates to do nothing but data labelling is a moat that just keeps getting deeper.

  • by stevage on 3/3/25, 10:29 PM

    So they are building a system which has all the hallmarks of an extremely addictive game. But that's ok because they pay the players a small amount of money?

    They didn't even address the wellbeing of the players: managing addiction, overwork, and so on.

  • by neilv on 3/3/25, 6:00 AM

    I think this didn't age well for HN, and it prompts some serious questions about our techbro startup culture.

    > Obvious but necessary: to incentivize productive work, we tie compensation to the number of characters transcribed, and assess financial penalties for failed tests (more on tests below). Penalties are priced such that subpar performance will result in little to no earnings for the labeller.

    So, these aren't employees? The writeup talks about not trusting gig workers, but it sounds like they have gig workers, and a particularly questionable kind.

    Not like independent contractors with the usual freedoms, but rather under a punishing set of Kafkaesque rules, as if someone was thinking only of computer programs, oops. "Gamified", with huge negative-point penalties and everything. To be under threat of not getting paid at all.

    I see that this article is dated the 16th, so it predates last week's HN outrage over the founders who demoed a system for monitoring factory worker performance and were ripped a new one online for dehumanizing employees.

    And that despite the factory system being less invasive, less dehumanizing, and less potentially labor-law-violating than what's described in this article: moment-to-moment whip-cracking of gig workers, and even not paying them at all.

    I'm not even sure you'd get away with calling them "independent contractors" under these conditions, once workers save copies of this blog post to show to labor lawyers and state regulators.

    (Incidentally, I wasn't aware that a company working in aviation gets skilled workers this way. The usual way I've seen is to hire someone, with all the respect, rights, and benefits that entails. Or to hire a consultant who is decidedly not treated like a gig worker in a techno-dystopian sweatshop.)

    I don't want Internet mob justice here, but I want to ask who is advising these startups regarding how they think of their place in the world, relative to other humans?

    I can understand getting as far as VC pitches while fixated on other aspects of the business problem, and still passing the VCs' "does this person have a good enough chance at a big exit" gut-feel test. But are there no ongoing checks and advising, so that people don't miss everything else?

  • by blitzar on 3/3/25, 9:43 AM

    Pivot to sweatshop.