from Hacker News

Mind the Trust Gap: Fast, Private Local-to-Cloud LLM Chat

by wolecki on 5/13/25, 3:31 PM with 2 comments

  • by wolecki on 5/13/25, 3:31 PM

    A fast Trusted Execution Environment (TEE) protocol built on the H100's confidential-computing mode: prompts are decrypted and processed inside the GPU enclave. The key result is speed: on models with ≥10B parameters, the latency overhead is under 1%. As with confidential computing on CPUs, this opens a channel to cloud GenAI models that even the provider cannot intercept. I wonder whether something like this could boost trust in all the AI neoclouds out there.
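
    A minimal sketch of the flow the comment describes — the client encrypts a prompt so that only the GPU enclave, not the cloud provider in between, can read it. Everything here is illustrative: the "attestation" and the XOR-keystream cipher are stdlib toy stand-ins for real remote attestation and AEAD, not the actual protocol.

    ```python
    # Toy model of a confidential-inference channel (stdlib only).
    # The enclave holds a secret; the client derives a session key from
    # an attested public value and encrypts the prompt. The provider,
    # relaying traffic between client and GPU, sees only ciphertext.
    # NOT real cryptography -- conceptual illustration only.
    import hashlib
    import secrets

    def keystream(key: bytes, length: int) -> bytes:
        """SHA-256 in counter mode as a toy stream cipher."""
        out = b""
        counter = 0
        while len(out) < length:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:length]

    def xor_crypt(key: bytes, data: bytes) -> bytes:
        # XOR is its own inverse, so the same call encrypts and decrypts.
        return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

    # --- inside the GPU enclave (hypothetical) ---
    enclave_secret = secrets.token_bytes(32)                  # never leaves the enclave
    attested_value = hashlib.sha256(enclave_secret).digest()  # toy "attestation report"

    # --- client side ---
    client_nonce = secrets.token_bytes(32)
    session_key = hashlib.sha256(attested_value + client_nonce).digest()
    prompt = b"summarize my medical records"
    ciphertext = xor_crypt(session_key, prompt)               # provider sees only this

    # --- enclave side: re-derive the same key, decrypt, run inference ---
    enclave_key = hashlib.sha256(
        hashlib.sha256(enclave_secret).digest() + client_nonce
    ).digest()
    decrypted = xor_crypt(enclave_key, ciphertext)
    assert decrypted == prompt
    ```

    The claimed <1% overhead would come from the fact that, after this handshake, decryption and inference both happen on the GPU, so the bulk of the work is the same as in non-confidential mode.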
  • by danbiderman on 5/13/25, 3:43 PM

    author here!