by biscuit1v9 on 5/17/24, 5:07 AM with 16 comments
by tripletao on 5/17/24, 7:23 AM
> The researchers defined 50 percent as success on the Turing test, since participants then couldn't distinguish between human and machine better than chance.
54% of GPT-4 conversations were judged to be human, so the "decoder" article says the Turing test has been passed--indeed, it seems more human than human. But the paper says:
> humans’ pass rate was significantly higher than GPT-4’s (z = 2.42, p = 0.017)
The seeming discrepancy arises because they've run a nonstandard test, in which the meaning of that 50% threshold is very hard to interpret (and definitely not what the "decoder" author claims). The canonical version of Turing's test is passed by a machine that can
> play the imitation game so well, that an average interrogator will not have more than a 70 percent chance of making the right identification after five minutes of questioning
The canonical experiment is thus to give the interrogator two conversations, one with a human and one with a non-human, and ask them to judge which is which. The probability that they judge correctly maps directly to Turing's criterion. If the two conversations were truly indistinguishable, then the interrogator would judge correctly with p = 50%; but it would take infinitely many trials to tell exactly 50% apart from a rate just above it, so Turing (arbitrarily, but reasonably) increased the threshold to 70%.
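To put rough numbers on why a threshold of exactly 50% is impractical, here's a back-of-envelope sketch. It's my own illustration, not anything from Turing or the paper: `trials_needed` is a made-up helper, and it assumes independent trials and a simple normal approximation with a z = 1.96 cutoff.

```python
import math

def trials_needed(true_accuracy, z=1.96):
    """Rough number of trials before an interrogator accuracy of
    `true_accuracy` sits z standard errors above the 50% chance level.

    Hypothetical helper; uses a normal approximation, not an exact test.
    """
    gap = true_accuracy - 0.5
    if gap <= 0:
        raise ValueError("true_accuracy must be above 0.5")
    # Under the chance hypothesis, the standard error of the observed
    # accuracy after n trials is 0.5 / sqrt(n). Solve gap = z * SE for n.
    return math.ceil((z * 0.5 / gap) ** 2)

for acc in (0.70, 0.60, 0.55, 0.51):
    print(f"accuracy {acc:.2f}: about {trials_needed(acc)} trials")
```

The count scales like 1 / (accuracy - 0.5)^2, so it grows without bound as the true accuracy approaches 50%. A threshold well above chance, like 70%, keeps the experiment feasible.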
That doesn't seem to be the experiment that this paper actually conducted. They don't say it explicitly, but it seems like each interrogator had a single conversation, in which the other party was a human with p = 1/4. The interrogator wasn't told anything about that prior, leading them to systematically overestimate P(human). If every interrogator had simply always guessed "non-human", then they'd collectively have been right more often.
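The arithmetic behind that last claim can be sketched as follows. The 1/4 prior is my reading of the paper and the 54% figure is the GPT-4 number quoted above; everything else is an assumption for illustration. In particular, `judge_accuracy` is a hypothetical helper, and I'm assuming every non-human witness was judged human at GPT-4's 54% rate, which the paper doesn't say.

```python
def judge_accuracy(p_human, human_pass_rate, ai_pass_rate):
    """Fraction of verdicts that are correct.

    p_human:         prior probability that the witness is human
    human_pass_rate: P(judged human | witness is human)
    ai_pass_rate:    P(judged human | witness is not human)
    """
    correct_on_humans = p_human * human_pass_rate
    correct_on_ais = (1 - p_human) * (1 - ai_pass_rate)
    return correct_on_humans + correct_on_ais

P_HUMAN = 0.25  # inferred prior: one human witness in four

# Strategy 1: always answer "non-human", so nobody is ever judged human.
always_non_human = judge_accuracy(P_HUMAN, 0.0, 0.0)

# Strategy 2: judge as the interrogators did. Assume (generously) that
# every real human is recognized, and (as an assumption) that all
# non-human witnesses are judged human 54% of the time.
best_case_observed = judge_accuracy(P_HUMAN, 1.0, 0.54)

print(f"always 'non-human':  {always_non_human:.3f}")
print(f"best-case observed:  {best_case_observed:.3f}")
```

Under those assumptions, even interrogators who identified every real human would be beaten by the constant "non-human" guess, because three quarters of the witnesses aren't human.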
Even if the interrogators had been given that prior, very few would have the mathematical background to make use of it. GPT-4 is impressive, but this test is strictly worse than Turing's, whose result has clear and intuitive meaning.
by anonzzzies on 5/17/24, 6:13 AM
by somenameforme on 5/17/24, 6:34 AM
Modern takes generalize the identity to absurdity (the identity being human or not), generally feature idiots (or people acting as such) as interrogators, and use participants who are actively trying to act like a computer to trick the interrogator. In this article, for example, the human is B and was asked, "What could you say to convince me that you're a human?" His response was "You just have to believe!" Why not skip the pretext and just have the human respond 01001001 01000010 01101111 01110100 01001100 01101111 01101100 to every question? And if all this nonsense wasn't enough, they bumped it up to 3 computers and 1 human pretending to be a computer. This isn't the Turing Test - it's complete LARPing!
[1] - https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf