by troelsSteegin on 6/12/25, 1:54 PM with 78 comments
by tbrownaw on 6/14/25, 10:32 PM
There's a huge problem with people trying to use umbrella usage to predict flooding. Some people are trying to develop a computer model that uses rainfall instead, but watchdog groups have raised concerns that rainfall may be used as a proxy for umbrella usage.
(It seems rather strange to expect a statistical model trained for accuracy to infer a shadow variable and route its predictions through it, making them less accurate, simply because that variable is easy for humans to observe directly and use as a lossy shortcut, or to promote goals that aren't part of the labels being trained on.)
> These are two sets of unavoidable tradeoffs: focusing on one fairness definition can lead to worse outcomes on others. Similarly, focusing on one group can lead to worse performance for other groups. In evaluating its model, the city made a choice to focus on false positives and on reducing ethnicity/nationality based disparities. Precisely because the reweighting procedure made some gains in this direction, the model did worse on other dimensions.
Nice to see an investigation that's serious enough to acknowledge this.
by thatguymike on 6/14/25, 10:50 PM
by GardenLetter27 on 6/15/25, 10:51 AM
What's the problem with this? It isn't racism, it's literally just Bayes' Law.
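A toy illustration of the base-rate point, with entirely hypothetical numbers: even when a model's error rates are identical for every group, groups with different underlying fraud rates end up flagged at different rates and with different precision, purely as a consequence of Bayes' rule.

    # Toy Bayes' rule illustration -- all numbers are hypothetical.
    # P(flag | fraud) and P(flag | no fraud) are identical for both groups;
    # only the base rate P(fraud) differs between them.
    sensitivity = 0.80           # P(flag | fraud), same for every group
    false_positive_rate = 0.05   # P(flag | no fraud), same for every group

    for group, base_rate in {"A": 0.02, "B": 0.10}.items():
        p_flag = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
        p_fraud_given_flag = sensitivity * base_rate / p_flag
        print(f"group {group}: flagged {p_flag:.1%} of applicants, "
              f"precision {p_fraud_given_flag:.1%}")

With these made-up rates, group B is flagged roughly twice as often as group A even though the classifier treats individuals from both groups identically.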
by 3abiton on 6/14/25, 10:26 PM
Very well written, but that last part is concerning and points to one thing: did they hire interns? How come they do not have systems? It casts a big doubt on the whole experiment.
by bananaquant on 6/15/25, 7:45 AM
by BonoboIO on 6/14/25, 9:06 PM
Without figures for true positives, recall, or financial recoveries, its effectiveness remains completely in the dark.
In short: great for moral grandstanding in the comments section, but zero evidence that taxpayer money or investigative time was ever saved.
by tomp on 6/14/25, 9:26 PM
The model is considered fair if its performance is equal across these groups.
One can immediately see why this is problematic by considering an equivalent example in less controversial (i.e. less emotionally charged) situations.
Should basketball performance be equal across racial, or sex groups? How about marathon performance?
It’s not unusual that relevant features are correlated with protected features. In the specific example above, being an immigrant is likely correlated with not knowing the local language, therefore being underemployed and hence more likely to apply for benefits.
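For concreteness, a minimal sketch (with hypothetical data and group labels) of the kind of per-group check this fairness definition implies: compute the false positive rate separately for each group and compare.

    from collections import defaultdict

    def false_positive_rate_by_group(y_true, y_pred, groups):
        """FPR = FP / (FP + TN), computed separately per group."""
        fp = defaultdict(int)
        tn = defaultdict(int)
        for truth, pred, g in zip(y_true, y_pred, groups):
            if truth == 0:           # only actual negatives matter for FPR
                if pred == 1:
                    fp[g] += 1
                else:
                    tn[g] += 1
        return {g: fp[g] / (fp[g] + tn[g]) for g in set(groups) if fp[g] + tn[g]}

    # Hypothetical data: 1 = flagged / fraudulent, 0 = not.
    y_true = [0, 0, 1, 0, 1, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    groups = ["native", "native", "native", "native",
              "migrant", "migrant", "migrant", "migrant"]
    print(false_positive_rate_by_group(y_true, y_pred, groups))

Under the definition above, a gap between the two numbers is what gets labelled "unfair", regardless of whether the underlying relevant features differ between the groups.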
by wongarsu on 6/14/25, 10:01 PM
The issue is that we don't know how many Danish commit fraud, and we don't know how many Arabs commit fraud, because we don't trust the old process to be unbiased. So how are we supposed to judge if the new model is unbiased? This seems fundamentally impossible without improving our ground truth in some way.
The project presented here instead tries to do some mental gymnastics to define a version of "fair" that doesn't require that better ground truth. They were able to evaluate their results on the false-positive rate by investigating the flagged cases, but they were completely in the dark about the false-negative rate.
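Roughly why that asymmetry exists, as a sketch with hypothetical record fields: the "false positive" share only needs outcomes for flagged cases, which the city investigated, while the false-negative side needs outcomes for cases that were never investigated at all.

    # Sketch of the asymmetry, using hypothetical record fields:
    # flagged (bool), investigated (bool), fraud_confirmed (True/False/None).
    def flagged_false_positive_share(cases):
        # "False positive rate" in the loose sense used above: the share of
        # flagged, investigated cases that turned out not to be fraud.
        # Needs outcomes only for flagged cases, which were investigated.
        flagged = [c for c in cases if c["flagged"] and c["investigated"]]
        if not flagged:
            return None
        return sum(1 for c in flagged if c["fraud_confirmed"] is False) / len(flagged)

    def missed_fraud_share(cases):
        # The false-negative side: the share of unflagged cases that were
        # actually fraud. Those cases were never investigated, so
        # fraud_confirmed is None and the rate cannot be estimated from
        # the pilot's own data.
        unflagged = [c for c in cases if not c["flagged"]]
        if not unflagged or any(c["fraud_confirmed"] is None for c in unflagged):
            return None
        return sum(1 for c in unflagged if c["fraud_confirmed"]) / len(unflagged)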
In the end, the new model was just as biased, but in the other direction, and performance was simply worse:
> In addition to the reappearance of biases, the model’s performance in the pilot also deteriorated. Crucially, the model was meant to lead to fewer investigations and more rejections. What happened instead was mostly an increase in investigations, while the likelihood to find investigation worthy applications barely changed in comparison to the analogue process. In late November 2023, the city announced that it would shelve the pilot.
by zeroCalories on 6/14/25, 10:04 PM
by dannykwells on 6/16/25, 4:18 AM
by Jimmc414 on 6/14/25, 11:56 PM
Training on past human decisions inevitably bakes in existing biases.
by ncruces on 6/14/25, 11:53 PM
by octo888 on 6/15/25, 2:19 PM
by LorenPechtel on 6/15/25, 3:42 AM
Not all misdeeds are equally likely to be detected. What matters is minimizing the false positives and false negatives. But it sounds like they don't even have a ground truth to compare against, making the whole thing an exercise in bureaucracy.
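A tiny simulation of that first point, with made-up rates: if two groups offend at the same true rate but one is audited more often, the "detected fraud" labels a model would be trained on already reflect the auditing pattern rather than the behaviour.

    import random

    random.seed(0)
    TRUE_FRAUD_RATE = 0.05               # same underlying behaviour in both groups
    CHECK_RATE = {"A": 0.10, "B": 0.40}  # hypothetical: group B is audited more

    detected = {"A": 0, "B": 0}
    population = {"A": 10_000, "B": 10_000}
    for group, n in population.items():
        for _ in range(n):
            is_fraud = random.random() < TRUE_FRAUD_RATE
            is_checked = random.random() < CHECK_RATE[group]
            if is_fraud and is_checked:
                detected[group] += 1

    # Detected counts differ roughly 4x even though true behaviour is
    # identical; a model trained on "detected" labels learns the audit policy.
    print(detected)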
by londons_explore on 6/15/25, 12:04 AM
Fraud detection models will never be fair. Their job is to find fraud. They will never be perfect, and the mistaken cases will cause a perfectly honest citizen to be disadvantaged in some way.
It does not matter if that group is predominantly 'people with skin colour X' or 'people born on a Tuesday'.
What matters is that the disadvantage those people face is so small as to be irrelevant.
I propose a good starting point would be for each person investigated to be paid money to compensate them for the effort involved - whether or not they committed fraud.
by djoldman on 6/14/25, 8:49 PM
It's generally straightforward to develop a "fair" model if we don't care much about the performance metric:
If we want the output to match a population distribution, we just force it by taking the top predicted for each class and then filling up the class buckets.
For example, if we have 75% squares and 25% circles, but circles are predicted at a 10-1 rate, who cares, just take the top 3 squares predicted and the top 1 circle predicted until we fill the quota.
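A minimal sketch of that quota-filling trick (data and names hypothetical): rank candidates by predicted score within each class, then take from each class until its bucket matches the target population share.

    def fill_quotas(scored_items, population_share, total):
        """scored_items: list of (item, group, score).
        population_share: e.g. {"square": 0.75, "circle": 0.25}.
        Selects the top-scored items per group until each group's bucket
        matches its share of the requested total."""
        selected = []
        for group, share in population_share.items():
            quota = round(total * share)
            in_group = sorted(
                (x for x in scored_items if x[1] == group),
                key=lambda x: x[2],
                reverse=True,
            )
            selected.extend(in_group[:quota])
        return selected

    # Hypothetical scores: circles score higher overall, but the quotas force
    # a 3:1 square-to-circle split regardless of the raw ranking.
    items = [("s1", "square", 0.2), ("s2", "square", 0.3), ("s3", "square", 0.1),
             ("s4", "square", 0.4), ("c1", "circle", 0.9), ("c2", "circle", 0.8)]
    print(fill_quotas(items, {"square": 0.75, "circle": 0.25}, total=4))

The output distribution matches the population by construction; whether the selected cases are the ones most worth investigating is exactly the performance question being waved away.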
by talkingtab on 6/14/25, 10:35 PM
One has to wonder whether the study is a more valid predictor of the implementers' biases than of the subjects'.