from Hacker News

New Top Score on AI2 Reasoning Challenge (ARC) Is 53.84%

by alexwg on 1/5/19, 4:19 PM with 6 comments

by vxl on 1/5/19, 5:37 PM
The challenge appears to be to create an AI that can answer the most multiple choice questions correctly.
Here are some of the questions from the dataset:
• Which of these is inherited by a person from his or her parents? (A) short hair (B) long arms (C) pierced ears (D) scar on the leg
• Which object occupies the greatest amount of space? (A) a galaxy (B) a black hole (C) a neutron star (D) a solar system
• What is a similarity between sound waves and light waves? (A) Both carry energy. (B) Both travel in vacuums. (C) Both are caused by vibrations. (D) Both are traveling at the same speed.
There's an entry called "Guess All" that scored 25%, as you might expect for four-option questions.
They provide a list of 14 million science-related sentences (presumably for training), but there's no requirement to rely solely on them to solve the challenge. The list has been scraped from web search results, so it looks quite noisy.
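The 25% baseline mentioned above can be sketched as follows. This is a minimal illustration that assumes every question has four options and that guesses are uniformly random; how the "Guess All" entry is actually implemented is not stated here, and the function names are hypothetical.

```python
import random


def expected_guess_accuracy(n_choices: int) -> float:
    """Expected accuracy of a uniform random guess among n_choices options."""
    return 1.0 / n_choices


def simulate_guess_accuracy(n_questions: int, n_choices: int = 4, seed: int = 0) -> float:
    """Simulate random guessing on synthetic questions (not the real ARC data)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_questions):
        answer = rng.randrange(n_choices)  # position of the correct option
        guess = rng.randrange(n_choices)   # uniformly random guess
        if guess == answer:
            correct += 1
    return correct / n_questions


if __name__ == "__main__":
    print(expected_guess_accuracy(4))
    print(simulate_guess_accuracy(100_000))
```

The simulated value should land close to the analytic 1/4, which is why a 25% score is the natural floor for comparing the other entries.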
by stakhanov on 1/5/19, 6:08 PM
A bit reminiscent of the "Recognizing Textual Entailment Challenge (RTE)", which was run under the "Text Analysis Conference" umbrella and hosted by NIST until it was discontinued a few years back. An interesting insight from a qualitative analysis of how submitted answers deviated from the gold-standard answers is that the deviations can be explained surprisingly well by: random choice minus publication bias. See here:
http://richard.bergmair.eu/pub/thesis.pdf [page 43]
That's what the number "53.84%" sounds like to me.
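One rough way to check how far a score such as 53.84% sits from pure chance is a normal-approximation z-score. This is a sketch under stated assumptions: questions are treated as independent, the function name is hypothetical, and the question count of 1000 used below is a made-up placeholder, since the thread does not give the test-set size.

```python
import math


def z_score_vs_chance(accuracy: float, n_questions: int, n_choices: int) -> float:
    """Z-score of an observed accuracy against uniform random guessing.

    Uses the normal approximation to the binomial distribution.
    """
    p0 = 1.0 / n_choices                                  # chance-level accuracy
    std_err = math.sqrt(p0 * (1.0 - p0) / n_questions)    # std. error under chance
    return (accuracy - p0) / std_err


if __name__ == "__main__":
    n = 1000  # hypothetical question count, NOT taken from the thread
    print(z_score_vs_chance(0.5384, n, n_choices=4))  # four-option baseline
    print(z_score_vs_chance(0.5384, n, n_choices=2))  # two-option baseline
```

The further the z-score is from zero, the less plausible pure guessing becomes. The result depends heavily on both the assumed question count and the number of answer options, so the same 53.84% reads very differently against a 25% baseline than against a 50% one.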