from Hacker News

Open Flamingo – open framework to train multimodal LLMs

by mpaepper on 3/28/23, 8:47 PM with 25 comments

  • by ftxbro on 3/28/23, 9:00 PM

    In the demo, I uploaded the Obama prank photo http://karpathy.github.io/2012/10/22/state-of-computer-visio... and asked "Why is this picture funny?" It responded: "Question: Why is this picture funny? Answer: President Obama is taller than the average person."

  • by yeldarb on 3/28/23, 9:54 PM

    I always like to try these zero-shot models on things outside of the "normal" COCO classes. Here are some chess board queries:

    Counting: https://imgur.com/KTuQ1Bv

    Parse the chess board: https://imgur.com/2zYFK1P

    (Result): https://imgur.com/Ei4MAl7

    Few-Shot Object Detection (Pascal VOC): https://imgur.com/gZkDMn8

    Few-Shot Object Detection (simplified): https://imgur.com/Hk8QGMd

    Not quite there yet. I've been more impressed with the other new zero-shot multimodal models like Grounding DINO and Azure Dense Captioning. Really looking forward to putting multimodal GPT-4 through its paces as well.

  • by vagabund on 3/28/23, 9:23 PM

    Even at this scale, the model is able to answer questions fairly impressively, but I created an image with some distinct shapes in different positions and it didn't go well [0]. I think that however they're doing the image encoding, it doesn't capture positional information, which to my mind limits a lot of use cases.

    [0] https://i.postimg.cc/GtrGs8mw/Screenshot-2023-03-28-at-5-19-...

  • by mpaepper on 3/28/23, 8:59 PM

    This is awesome work, and they also provide their 9B OpenFlamingo model, which is based on LLaMA:

    https://huggingface.co/openflamingo/OpenFlamingo-9B

  • by dfrankle on 3/28/23, 11:22 PM

    What are the key features of Open Flamingo, and how does it compare to other frameworks for training multimodal LLMs?

  • by juxtaposicion on 3/29/23, 1:10 AM

    What’re the techniques that’ll get this to run on a single GPU?

  • by duxup on 3/28/23, 8:58 PM

    That title is pretty impressive/big on mobile!