from Hacker News

Share AI Safety Ideas: Both Crazy and Not

by antonkar on 3/11/25, 2:08 PM with 2 comments

AI safety is one of the most critical issues of our time, and sometimes the most innovative ideas come from unorthodox or even "crazy" thinking. I’d love to hear bold, unconventional, half-baked or well-developed ideas for improving AI safety. You can also share ideas you heard from others.

Let’s throw out all the ideas—big and small—and see where we can take them together.

Feel free to share as many as you want! No idea is too wild, and this could be a great opportunity for collaborative development. We might just find the next breakthrough by exploring ideas we’ve been hesitant to share.

A quick request: Let’s keep this space constructive—downvote only if there’s clear trolling or spam, and be supportive of half-baked ideas. The goal is to unlock creativity, not judge premature thoughts.

Looking forward to hearing your thoughts and ideas!

  • by antonkar on 3/11/25, 2:17 PM

    This is the product of 3 years of modeling ethical systems of the ultimate future. I call it Static Place AI, and I think it's a Xerox PARC moment right now. Please steelman:

    I think we’ll need a GUI for the models to democratize interpretability and let even gamers explore them. Basically, train another model that takes the LLM, converts it into 3D shapes, and puts those shapes in a 3D world that is understandable to humans.

    **

    Simpler example: represent an LLM as a green field with objects, where humans are the only agents:

    You stand near a monkey and see a chewing mouth nearby, so you go there (your prompt is now “monkey chews”). Close by, you see an arrow pointing at a banana; farther away, an arrow points at an apple; very far away, at the horizon, an arrow points at a tire (monkeys rarely chew tires).

    So things close by are more likely tokens and things far away are less likely, and you see all of them at once (maybe you’re on top of a hill so you can see farther). This way we can make a form of Static Place AI, where humans are the only agents.

    And behind the "chewing monkey", you can see bananas, apples, and tires (popular, average, and unpopular next tokens): the possible future tokens. This gives you a kind of "Multiversal Typewriter", where you see hundreds or thousands of objects with subtitles under them, both next tokens and future tokens. This way you can write stories, posts, and possibly code that is truly yours; you're the only agent, but augmented by this non-agentic Static Place AI.
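
    Taking the spatial metaphor literally, one simple rule would be to place each candidate next token at a distance proportional to its surprisal (-log p), so likely tokens stand nearby and rare ones sit at the horizon. A minimal sketch with made-up probabilities for the "monkey chews" example (the numbers and the `token_distance` helper are illustrative assumptions, not from any real model):

    ```python
    import math

    # Toy next-token distribution after the prompt "monkey chews".
    # These probabilities are invented for illustration only.
    next_tokens = {"banana": 0.6, "apple": 0.3, "tire": 0.02}

    def token_distance(p, scale=10.0):
        """Map a token probability to a viewing distance:
        likely tokens appear close, unlikely ones far away."""
        return scale * -math.log(p)

    placements = {tok: token_distance(p) for tok, p in next_tokens.items()}
    for tok, dist in sorted(placements.items(), key=lambda kv: kv[1]):
        print(f"{tok}: {dist:.1f} units away")
    ```

    With these numbers the banana lands closest, the apple a bit farther, and the tire far out near the "horizon", matching the field description above.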

    **

    The problem is to map higher-dimensional space onto 4D visualizations (as jgord said).

    I think at least some lossy “compression” into a GUI is possible. The person who makes the GUI for LLMs will be the next Jobs/Gates/Musk and a Nobel Prize winner (I think it will solve alignment by putting millions of eyes on the internals of LLMs), because computers became popular only after OSes with GUIs appeared. I recently shared how another one of those “multiversal apps” could look (I call it a "Spacetime Machine"): https://news.ycombinator.com/item?id=43319726
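
    As a toy example of that lossy compression, high-dimensional model internals (say, token embeddings) can be projected down to 3D world coordinates with PCA. This sketch uses random vectors as a stand-in for real embeddings, so the shapes of the data are assumptions; only the projection step is the point:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in for LLM token embeddings: 50 vectors in a 768-dim space.
    embeddings = rng.normal(size=(50, 768))

    # Lossy "compression" to 3D via PCA (SVD on the centered data).
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords_3d = centered @ vt[:3].T  # each row: an (x, y, z) position in the 3D world

    print(coords_3d.shape)
    ```

    PCA keeps only the three directions of largest variance, so most structure is thrown away; that loss is exactly the trade-off the GUI idea would have to manage.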

    We'll also need an analog of a browser in this Static Place AI OS.

    We need game developers and gamers to get millions of eyeballs on the internals of multimodal LLMs (to democratize their interpretability).

    I've been thinking about and modeling the ultimate future of humanity (possibly billions of years from now) for 3+ years; you can find more in my profile. I also have a binary ethics based on Feynman's quantum path integrals and CBT: everything rests on a binary tree of freedoms (choices) and unfreedoms, and each agent is just a sum of freedoms (choices) and unfreedoms. I call it the "physicalization of ethics". I shared some primitive code that can create a very crude model of all the agents evolving from the Big Bang all the way to a final dystopia or utopia; check the link in my profile to see a screenshot.