from Hacker News

Gemini Robotics On-Device brings AI to local robotic devices

by meetpateltech on 6/24/25, 2:05 PM with 90 comments

  • by jagger27 on 6/24/25, 7:31 PM

    These are going to be war machines, make absolutely no mistake about it. On-device autonomy is the perfect way to escape centralized authority and accountability. There’s no human behind the drone to charge with war crimes. It’s what they’ve always dreamed of.

    Who’s going to stop them? Who’s going to say no? The military contracts are too big to say no to, and they might not have a choice.

    The elimination of toil will mean the elimination of humans altogether. That’s where we’re headed. There will be no profitable life left for you, and you will be liquidated by “AI-Powered Automation for Every Decision”[0]. Every. Decision. It’s so transparent. The optimists in this thread are baffling.

    0: https://www.palantir.com/

  • by baron816 on 6/24/25, 5:22 PM

    I’m optimistic about humanoid robotics, but I’m curious about the reliability issue. Biological limbs and hands are quite miraculous when you consider that they are able to constantly interact with the world, which entails some natural wear and tear, but then constantly heal themselves.
  • by Toritori12 on 6/24/25, 3:44 PM

    Does anyone know how easy it is to join the "trusted tester program", and whether they offer modules that you can easily plug in to run the SDK?
  • by martythemaniak on 6/24/25, 4:59 PM

    I've spent the last few months looking into VLAs and I'm convinced that they're gonna be a big deal, i.e. they very well might be the "ChatGPT moment for robotics" that everyone's been anticipating. Multimodal LLMs already have a ton of built-in understanding of images and text, so VLAs are just regular MMLLMs that are fine-tuned to output a specific sequence of instructions that can be fed to a robot.

    OpenVLA, which came out last year, is a Llama2 fine-tune with extra image encoding that outputs a 7-tuple of integers. The integers are rotation and translation inputs for a robot arm. If you give a vision Llama2 a picture of an apple and a bowl and say "put the apple in the bowl", it already understands apples and bowls, and knows the end state should be the apple in the bowl, etc. What's missing is the series of tuples that will correctly manipulate the arm to do that, and the way they did it is through a large number of short instruction videos (a rough sketch of that action decoding is below).

    The neat part is that although everyone is focusing on robot arms manipulating objects at the moment, there's no reason this method can't be applied to any task. Want a smart lawnmower? It already understands "lawn", "mow", "don't destroy the toy in the path", etc.; it just needs a fine-tune on how to correctly operate a lawnmower. Sam Altman made some comments recently about having self-driving technology, and I'm certain it's a ChatGPT-based VLA. After all, if you give ChatGPT a picture of a street, it knows what's a car, a pedestrian, etc. It doesn't know how to output the correct turn/go/stop commands, and it does need a great deal of diverse data, but there's no reason why it can't do it. https://www.reddit.com/r/SelfDrivingCars/comments/1le7iq4/sa...

    Anyway, super exciting stuff. If I had time, I'd rig a snowblower with a remote control setup, record a bunch of runs and get a VLA to clean my driveway while I sleep.
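
    A minimal sketch of what that action decoding could look like (this is not OpenVLA's actual code; the bin count and per-axis ranges here are assumptions):

        # Toy decoder for an OpenVLA-style action: seven integer bins ->
        # [dx, dy, dz, droll, dpitch, dyaw, gripper]. Ranges are illustrative.
        NUM_BINS = 256
        ACTION_RANGES = [(-0.05, 0.05)] * 3 + [(-0.25, 0.25)] * 3 + [(0.0, 1.0)]

        def decode_action(tokens):
            """Map 7 bin indices (0..NUM_BINS-1) to continuous arm commands."""
            assert len(tokens) == 7
            action = []
            for tok, (lo, hi) in zip(tokens, ACTION_RANGES):
                frac = tok / (NUM_BINS - 1)           # normalize bin index to [0, 1]
                action.append(lo + frac * (hi - lo))  # rescale to that axis's range
            return action

        # Mid-range bins decode to near-zero deltas and a half-open gripper.
        print(decode_action([128, 128, 128, 128, 128, 128, 128]))

    In a real VLA the seven bins come out of the language model's token stream, and the decoded deltas are fed to the arm controller step by step; the short instruction videos supply the (image, instruction) -> action-tuple pairs for fine-tuning.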

  • by suyash on 6/24/25, 3:36 PM

    What sort of hardware does the SDK run on? Can it run on a modern Raspberry Pi?
  • by moelf on 6/24/25, 9:12 PM

    The MuJoCo link actually points to https://github.com/google-deepmind/aloha_sim
  • by TZubiri on 6/25/25, 3:04 AM

    Nice. I work with some students younger than 13, so most cloud services and LLMs are quite tricky to work with; local-only models like Vertex are nice for this use case. I will try this as a replacement for ChatGPT for computer vision in robotics, like Lego Mindstorms.
  • by zzzeek on 6/24/25, 6:33 PM

    THANK YOU.

    Please make robots. LLMs should be put to work for *manual* tasks, not art/creative/intellectual tasks. The goal is to improve humanity, not put us to work putting screws inside of iPhones.

    (five years later)

    what do you mean you are using a robot for your drummer

  • by polskibus on 6/24/25, 8:52 PM

    What is the model architecture? I'm assuming it's quite different from LLMs, but I'm curious to know more. Can anyone provide links that describe architectures for VLAs?
  • by Workaccount2 on 6/24/25, 7:19 PM

    I continue to be impressed by how Google stealth-releases fairly groundbreaking products, and then (usually) just kind of forgets about them.

    Rather than advertising blitz and flashy press events, they just do blog posts that tech heads circulate, forget about, and then wonder 3-4 years later "whatever happened to that?"

    This looks awesome. I look forward to someone else building a start-up on this and turning it into a great product.

  • by antonkar on 6/25/25, 12:40 AM

    The only way to prevent robots from being jailbroken and set to rob banks is to move GPUs to private SOTA secure GPU clouds
  • by sajithdilshan on 6/24/25, 3:23 PM

    I wonder what kind of guardrails (like the Three Laws of Robotics) there are to prevent the robots from going crazy while executing the prompts.
  • by san1927 on 6/25/25, 8:32 AM

    Meanwhile, I will drink a coffee while it loads a reply from the API.
  • by MidoriGlow on 6/25/25, 7:43 AM

    Elon Musk said in last week’s Starship Update: the very first Mars missions are planned to be flown by Optimus humanoid robots to scout and build basic infrastructure before humans arrive (full transcript + audio: https://transpocket.com/share/oUKhep6cUl3s/). If Gemini Robotics On-Device can truly adapt to new tasks with ~50–100 demos, pairing that with mass-produced Optimus bodies and Starship’s lift capacity could be powerful—offline autonomy, zero-latency control, and the ability to ship dozens of robots per launch.
  • by suninsight on 6/24/25, 3:21 PM

    This will not end well.