Amazing. I only had a chance to read the README.md, but my question is this: what happens if you ask it a question that it could not possibly answer? For example, what if it were given a picture of the man playing tennis and you asked it what the score was? Is it capable of distinguishing between questions that cannot be answered (given a particular input) and those that can?
I have seen sites using CAPTCHAs that ask such visual questions, on the assumption that only a human can answer them. This project really makes me doubt the effectiveness of such techniques.