from Hacker News

Introducing the Open Images Dataset

by hurrycane on 9/30/16, 5:38 PM with 36 comments

  • by imh on 10/1/16, 12:30 AM

    Lawyers are funny:

    >Today, we introduce Open Images, a dataset consisting of ~9 million URLs ... having a Creative Commons Attribution license*.

    Then the footnote below:

    >* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

    I think this might be the most blatant instance I've ever seen of, "We have to write this even though it's essentially impossible for you to actually follow our directions."

  • by transcranial on 9/30/16, 8:17 PM

    Interesting that the base data consists of URLs. I guess it makes sense given copyright issues. Anybody know what the ballpark expected half-life of such URLs is?
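One rough way to put a number on that half-life is to re-check the URLs after some time and assume link rot is exponential. That is a modelling assumption, not something the announcement states. A minimal sketch follows; `url_half_life_days` is a hypothetical helper name, and the inputs are whatever you measure yourself:

```python
import math


def url_half_life_days(elapsed_days: float, surviving_fraction: float) -> float:
    """Estimate the half-life of a set of URLs from a single re-check.

    Assumes exponential decay: surviving_fraction = 0.5 ** (elapsed / half_life).
    Solving for half_life gives elapsed * ln(2) / -ln(surviving_fraction).
    """
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    if not 0.0 < surviving_fraction < 1.0:
        raise ValueError("surviving_fraction must be strictly between 0 and 1")
    return elapsed_days * math.log(2) / -math.log(surviving_fraction)
```

If real link rot is not exponential (for example, a burst of removals right after release), a single re-check will give a misleading figure, so several re-checks over time would be safer.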
  • by diyseguy on 10/1/16, 12:49 AM

    Any guesses on how large the resulting dataset would be if you actually downloaded all the images? I imagine the urls will get removed in a hurry as everybody starts automating it.
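A back-of-the-envelope answer could come from sampling: fetch the sizes of a small random subset of images (for example from HTTP Content-Length headers) and scale the mean up to the ~9 million URLs mentioned in the announcement. The sketch below only does the arithmetic; `estimate_total_bytes` is a hypothetical helper, and the sample sizes have to be measured, since no figures are given in the thread:

```python
from typing import Sequence


def estimate_total_bytes(sample_sizes_bytes: Sequence[int], total_images: int) -> float:
    """Extrapolate total download size from a sample of per-image sizes.

    Assumes the sample is representative of the whole URL list.
    """
    if not sample_sizes_bytes:
        raise ValueError("need at least one sampled image size")
    if total_images < 0:
        raise ValueError("total_images must be non-negative")
    mean_size = sum(sample_sizes_bytes) / len(sample_sizes_bytes)
    return mean_size * total_images
```

Calling `estimate_total_bytes(measured_sizes, 9_000_000)` with your own measurements gives a figure in bytes. It is only as good as the sample, and dead URLs will make the real download smaller.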
  • by devindotcom on 9/30/16, 8:05 PM

    First video, now images - wonder if speech and others are on the way?

    It's nice that they're doing this, helps advance the art I think. But it also puts a lot of smaller operations in unis sort of under the Google system in that they're best compared to Google's ML work and others using these datasets. It's a small way of stacking the deck to make Google and DeepMind more embedded in the community.

    That said, its utility for others surely outweighs the strategic advantage gained here, so I for one welcome these libraries. A lot of work goes into them. Hopefully others will release theirs as well.

  • by zappo2938 on 10/1/16, 5:55 AM

    I'm glad I'm getting a return on all the effort clicking street signs and store fronts on reCaptcha.
  • by pilooch on 10/1/16, 6:53 PM

    I've put an efficient downloader here for the interested crowd: https://github.com/beniz/openimages_downloader (it's a fork of the script I used to grab ImageNet).
  • by dharma1 on 9/30/16, 10:58 PM

    Is there a link to the trained model somewhere?
  • by rocky1138 on 9/30/16, 8:24 PM

    Are there any other libraries that are similar?
  • by Omnipresent on 10/1/16, 1:01 AM

    Looking forward to someone trying a TensorFlow CNN on this.