from Hacker News

Notes on Vision Pro

by firloop on 6/6/23, 9:55 PM with 688 comments

by jb1991 on 6/7/23, 6:43 AM
I use quite a few Apple products every day of the last 10 years, and I’m very impressed with certainly all the technology and reviews and explanations of how this device works and what the user experience will be. However, this is the first Apple device that really makes me pause. All of the marketing material, the WWDC videos, all of it feels very uncomfortable to see all these people isolating themselves in a room with big goggles on. I find it really hard to comprehend this is the direction that technology is taking our lives. Imagine walking into a house, and a lot of the people are just sitting around by themselves in corners with goggles on. The whole thing just feels very strange and post apocalyptic to me.
And these 3D spacial moment recordings, imagine children growing up in a house where when something nice happens, the parent rushes to put on goggles and stare at them through them, their little virtual eyes displayed on the outside, it’s frankly creepy to me.
by dvt on 6/6/23, 11:18 PM
This is a bit of a rant, but I bought a Quest Pro last year, and I feel it was a total waste of money. It's actually embarrassing how much of an unpolished piece of garbage it feels like. Sure, it's fun as a PCVR headset, but as a productivity tool or as a multimedia consuming device, or as a social tool, it completely misses the mark.
And I'd say that 80% of its failure can be attributed to bad software. It's buggy, I constantly have to re-calibrate my work/play area, the hand tracking is janky, and even though it supports (incredible) eye tracking, almost nothing takes actual advantage of the hardware.
It's insane that a billion-dollar company like Meta actually felt proud to release such a steaming pile of trash. Do they seriously expect VR enthusiasts to build an entire operating system for them? Much like the reports we're getting about the Vision Pro, the eye tracking in the Quest Pro feels like magic, but nothing uses it—basically it's irrelevant to navigating the operating system and barely any games use foveated rendering. It's infuriating.
If I was Zuck, I'd fire all my product managers. I say good for Apple. I'll probably be selling my Quest Pro and buying the Vision Pro.
by JohnBooty on 6/6/23, 10:45 PM
```
    But it does put an enormous amount of pressure 
    on the eye tracking. As far as I can tell so 
    far, the role of precise 2D control has been 
    shifted to the eyes.
```
I've got one good eye and one bad eye. The bad eye is legally blind, has an off-center iris and is kind of lazy w.r.t. tracking.
I'm extremely curious to know how Vision Pro deals with this. One certainly hopes there's some kind of "single eye" mode; certainly seems possible with relatively small effort and the % of the population who'd benefit seems fairly significant.
Eye tracking most certainly sounds like the way to go, relative to hand-waving.
The Minority Report movie probably set the industry back by a decade or two. Waving your hands around to control stuff seems logical but is quickly exhausting.
by smugma on 6/7/23, 1:24 AM
Reading through his notes, I couldn’t help but think “I don’t know this guy, but it seems like he has spent years thinking deeply about interfaces. Also, he sounds ex-Apple.” It made me smile when I got towards the end and my suspicions were confirmed.
Andy’s thoughts were infinitely deeper than anything else I’ve read about the device+interface so far. In particular, I liked his observations that so far it’s mostly a 2D plane, with exceptions like the Breathe app and a few demos, which might just be good for demos.
Seeing the heart in 3D reminded a lot of what we saw 10 years ago around what book publishers were trying to do with the iPad. Cool demos but limited real world use.
Here was one particularly thoughtful section:
“unlike the physical version, a virtual infospace could contend with much more material than could actually fit in my field of view, because the computational medium affords dynamic filtering, searching, and navigation interactions (see Softspace for one attempt). And you could swap between persistent room-scale infospaces for different projects. I suspect that visionOS’s windowing system is not at all up to this task.“
by lukevp on 6/7/23, 1:04 AM
From years of observing first-hand how hard it is to develop software, I’m shocked that Apple continually makes products that fit their aesthetic (eg. The crown and headband evoking the Apple Watch) and have coherent and mostly-baked software on launch like this. Is there a book about how they’re able to operate at scale and deliver such consistent things? Do they just have an insanely good product team? How do they document requirements? It’s fascinating to me, because I think it’s their real competitive advantage - they get to the finish line with product that would make sense to a single, invested, super smart person, instead of a pile of inconsistencies and incoherencies that contradict each other as you move through the application. The only other product I feel is delivered at Apple’s level is CloudFlare, and I think that’s why they’ve grown so fast.
by BoppreH on 6/6/23, 11:27 PM
I echo the awe at the hardware and the disappointment at the paradigm. I wish the keynote had three extra demos:
- Something whimsical to inspire developers. Give me files represented as physical blocks that I can pile, or a task manager that shows processes as plants around me. Close apps by scrunching their screen like paper. A globe with my geotagged pictures.
- A game, any game. Beat Saber was the killer app for me to get my Vive, and Valve already spent tens of millions of dollars to create a triple-A VR game. Neither seem compatible with the Vision Pro input methods. Apple could at least play their strengths, like an AR hide-and-seek, or a horror game with eye tracking.
- Content creation or professional work. The last thing we need is another passive device to watch 3D TikTok. Show someone customizing their "home screen" environment to look like a fantasy potion shop; or a mechanic looking at a 20x magnified broken part; or an SRE watching their kubernetes cluster as a floating 3D graph; or a VFX artist in the scene scrubbing forward and backward to adjust an effect.
It feels like inventing the first smartphone, camera apps and all, but it doesn't make calls and the only input is tapping on icons.
by jillesvangurp on 6/7/23, 6:32 AM
I think Apple is doing smart things here with their software:
- leverage existing content means they won't have an empty room problem or a big "now what?!" moment for users after the novelty wears off. Without Steam, most existing VR platforms would be completely pointless. So far VR is for games and most of those are published on Steam. That's because most VR hardware vendors suck at software and end up outsourcing that to game studios. Meta included. Despite their ambitions, their goggles are mainly devices for running games not developed by Meta.
- a focus on the living room experience with a high end movie theater experience running, again, off existing content. Genius move because people already spend lots of money on home theaters. Some people buy 10K$ plasma screens even. This market is very real. Also, involving Disney with their huge back catalog of completely unmonetized 3D movies ... so obvious.
- Extend the highly successful IOS ecosystem to the new experience. They have tens of millions of apps already. And they'll work fine in AR. Why not do that?
The strategy is about content. It's the right strategy. The first generation hardware is of course amazing too and it will bootstrap a new generation of application developers that will be using Apple SDKs and tools to target all this with new content. But to bootstrap the ecosystem they need an audience.
Here too the strategy is genius: SDKs are based on things developers already use. They just co-announced a push into gaming for Mac along with some convenient porting kits for developers. That was just a footnote buried deep into the mac related announcements. But of course this means more content coming to AR as well. All that lovely content currently available via Steam.
Top to bottom the strategy is about compelling content. I think it's going to work.
This is not the final answer to AR but the opening salvo in a decades long push to completely own this space. Step 0 is to get millions of these things in the market with enough content to get people hooked and keep them consuming content. The rest will come later. It's appropriately radical, pragmatic, and conservative at the same time.
by gnicholas on 6/6/23, 10:17 PM
> Given how ambitious the hardware package is, the software paradigm is surprisingly conservative.
I actually see this as a bit less surprising. After all, if you change the hardware in a big way, and you change the software in a big way, users will have a harder time adjusting to the new platform. Instead, they're making a big leap on the hardware side, keeping legacy apps and concepts, and then will presumably iterate to more 'native' experiences that were previously impossible/unimaginable/unintuitive.
by cobertos on 6/7/23, 5:14 AM
I find it interesting that Microsoft had _so much of this_ 9 years go. I developed with the HoloLens 1 around 2016 and I recall:
* Fingers together gesture clicking.
* Voice activated menu navigation. It was glitchy/I never used it though. In 2023 these sorts of systems are much better though.
* No controllers. It was all gesture based. Opening the start menu required your hand upturned, fingers together then outstretch. Kind of like an "open" gesture.
* "Pointing" based on head looking vector, which was annoying.
* Spatial anchors and being able to remember past spaces and how you used them. There was a whole set of SDK APIs for the spatial stuff built into Win 10
The Vision Pro iterates on some of these. Eye tracking for pointing and more cameras for tracking hand pose for clicking, to reduce the annoyance/strain. IMO all mandatory for long term usage (I frequently got motion sick with the HoloLens and the 3 months of debugging/developing on it were a challenge).
Maybe if Microsoft had switched to more commonplace display technology and continued to iterate on their product instead of let it languish, they could have had a solid competitor to this, if not maybe be first to the market.
by zmmmmm on 6/6/23, 10:44 PM
Interesting notes.
I'm disappointed even though it's entirely predictable that VisionOS is built on the iOS/iPadOS foundation rather than OSX. I guess we'll see how "walled in" it is but it's hard to see any reason Apple isn't going to be just as constraining and controlling about what happens on this OS as they are on iOS, if not more so. Which ultimately means I'll be very reluctant to ever adopt it in any meaningful way as my primary computing device.
by skydhash on 6/7/23, 1:51 AM
> What if programs live in places, live in physical objects in your space? For instance, you might place all kinds of computational objects in your kitchen: timers above your stove; knife work reference overlays above your cutting board; a representation of your fridge’s contents; a catalog of recipes organized by season; etc.
I'd prefer an OS that spans multiple devices. Something how computers behave in “The Expanse” where someone can flick stuff from their mobile device to another system for a better workspace. If I'm typing this on my workstation, but want to switch environment and bring something light like a tablet or just my phone, I could send the tab and it would retain its state. The closest to something like this is a server-client architecture like Logitech Media Server. A paradigm like this would be more useful than immersion.
This is also why I like Vim. Buffer is a separate concept from File, adding an extra flexibility – in terms of layout – to how I edit to complete my current task. I want a similar flexibility with applications, at least lightweight – state wise – ones.
CloudKit and Handoff are close, but very brittle. Apps like Bear, Reeder, Anybox, Things 3, make switching devices seamless. Another good example is handoff between the HomePod and the iPhone.
by armchairhacker on 6/6/23, 11:37 PM
I think the big usecase of the Vision Pro is just a larger screen and immersion. That's what it looks like Apple highlights in the demos.
Imagine developing with the ultimate 10x setup: several files, documentation, and debuggers open at once, with a zen nature background. Or making music, or painting, or even writing a research paper.
Imagine playing a video game but the environment wraps around you. Imagine watching movies in a virtual theater with a simulated 100-inch TV (one of Apple's demos). Even reading and browsing the web can be improved with an ambient environment and extra space for more stuff.
Is it worth $3500? For end-users probably not, but if it genuinely makes professionals and hobbyists substantially more productive, it will be worth $3500 or even way more. How much money would you spend to write, code, create faster?
Of course, this assumes the VR actually performs, and whether a bigger screen and immersion actually makes people work faster. As of now it seems VR is still a gimmick which impresses people at first, but doesn't provide much outside of niche experiences; which, if this holds for the Vision Pro, makes it very much not worth the price.
by david_van_loon on 6/6/23, 11:32 PM
His comments on anchoring virtual tools in physical space reminds me of the work Microsoft was doing with Hololens and Windows Mixed Reality. If I recall correctly, there were APIs for anchoring things in space.
by layer8 on 6/7/23, 12:00 AM
Regarding the last part, I was wondering if an AR UI would maybe start a trend back to a more 3D/skeuomorphic design (in the sense of NextStep/Windows 95), which then would also have a comeback on the desktop. That's mostly just wishful thinking on my part, though. ;)
by gnicholas on 6/6/23, 10:21 PM
PSA: this page only scrolls if your mouse is on the left side.
by mcintyre1994 on 6/7/23, 5:33 AM
Related to his comment about the windows, I’m really curious how window management works in this. They’ve shown you can have multiple windows kinda in the space at the same time. But how do you rearrange them/move them around in space? How do you close them? Is there some way to put 2 apps next to each other in the same window, and keep them together when you move stuff around? I’m mostly curious because interacting with this stuff is still a mess on iPad, and the hand gestures they demoed for Vision Pro are a lot more limited than what we’ve seen with multitouch.
by fio_ini on 6/7/23, 5:49 PM
I cant stop thinking about how cool it will be to finally remove all these monitors and wires from my desk. Operating inside of a virtual space has an amazing appeal to me to de clutter desk and office space area. Scale screens bigger and smaller and not be confined to: tight windows, bringing in more physical screens, setting them all up, switching between work spaces with hotkeys. Even the virtual workspace on a single physical screen uses extra mental RAM capacity to always be remembering which space the tabs are and where. This evolution frees up my neck, my posture, minus the bit of weight on my head, I can be sitting or standing. This next development from this company I'm connected to seems to go to such a more natural way of thinking about and interacting with our computing space. Besides all the great human side of it's value added, this headset is tangible for me because it's connected to an ecosystem that I'm part of. If I were to get a Quest for example, now what? I'm connected to Facebook? Great... The price for Apple Vision is pretty high but damn it's a pretty amazing piece of technology, and blows most other major headsets out of the water, there's over 20 million pixels for each eye for God's sake. The amount of virtual real estate we're talking about is like walls of 4k screens all wired together throughout your house a hundred different HDMI/DVI/DisplayPort Adapters to GPUs, what a freakin nightmare that would be. Easily all costing well over $3.5k.
by whywhywhywhy on 6/7/23, 12:13 PM
Hit me this morning they never showed anyone working with files.
Can’t believe after the failed attempts to kill files on iPhone and iPad they’ve learned nothing and still think we can pretend a professional work tool can exist without a proper file system.
Not saying it should have desktop icons strewn across your living room, just that the “files siloed in apps transferring via share buttons” iPad model is objectively productivity hell and puts mental load menu fumbling where MacOS is instinctive drag and drop.
by xivusr on 6/6/23, 10:15 PM
Great notes and I really liked his idea of large persistent info spaces and sharing those with others.
by ftxbro on 6/7/23, 8:48 AM
> "Something in the Apple omertà makes me uncomfortable naming my collaborators as I normally would, even as I discuss the project itself."
I don't understand this. It's things they made them sign? I saw another article saying they make them sign a promise to not show they are wearing the headset, allegedly because they might look like huge dorks and cause brand risk.
by elif on 6/7/23, 9:32 AM
I'm about 90% sold but I'd love to have someone's experience using it as an emacs terminal for coding.
As long as it doesn't get sweaty or nauseating after 30 minutes, I would still be content with it as a glorified monitor, not needing new UI paradigms or gesture controls.
Distraction-free huge monitor computing anywhere in the world is a huge advantage over laptops imo.
by coryfklein on 6/7/23, 3:49 PM
Side note unrelated to Apple Vision:
Why would anyone want text fully aligned to the left of the browser window?! At least it's distinctive, I don't think I've ever seen that on any other website in the last 15 years.
by 8jef on 6/7/23, 2:08 PM
To me, Apple Vision vision's, so-called spacial computing, is exactly what I was always looking for in a portable computing device: beefy processor, pixeless experience, app and file navigation using natural eyes and hands movement. This has the potential of replacing all desktop, laptop and tablet I may own.
Now I just have to wait for a Linux compatible version.
Because there's no way I'm gonna use any Apple software as if it was twenty five years ago, before iTunes insanities and whatever happened since. NFW.
by cudgy on 6/7/23, 10:29 PM
“… you won’t be confined to “a tiny black rectangle”. You’ll use all the apps you already use. You don’t have to wait for developers to adapt them. This is not a someday-maybe tech demo of a future paradigm; it’s (mostly) today’s paradigm, transliterated to new display and input technology.“
This is why I think this device will be different than its predecessors. There are practical reasons to buy it … assuming you can use it without getting sick or strained.
by robwwilliams on 6/7/23, 3:14 AM
Vision Pro opens opportunities for a new web interface of the type that Gary Flake and team were building with Silverlight and Pivot at Live Labs.
Gary has a great TED talk (Mar 2010) on this radical but intuitive spatial sorting interface. Foveation and hand movements are perfect user links.
https://www.youtube.com/watch?v=LT_x9s67yWA
Combined with a Vision Pro—OMG.
by sebkomianos on 6/7/23, 12:47 AM
Slightly offtopic: What a cool personal website!
by dboreham on 6/7/23, 12:03 PM
VR has been "just about to become mainstream" my entire career. I'm quite old now. Perhaps it'll break through before I retire.
That said, the same thing was true about video phones, to the point that everyone assumed humans do not need to see each other on a phone call. Then we had a pandemic and things changed.
by notnotjake on 6/7/23, 4:30 AM
I was hoping to see more paradigm shifting user interface ideas. I think those exist inside Apple Park, but are waiting for their time. It's telling that this was announced at the developers conference. The focus now is on the platform, and I think mind-blowing user experiences will come later. (At least I hope)
by wouldbecouldbe on 6/7/23, 12:20 PM
Yeah projecting they eyes is such a stretch, seems to suggest they think we really will hangout with eachother while having the goggles on. Maybe they are right in the end, but weird. Same with the cameras displaying the world instead of using the eyes. Interesting choice, but also a big bet.
by tromp on 6/7/23, 10:14 AM
> the amazing power of the iPhone map is that I can fly to Tokyo with no plans and have a great time, no stress.
Figuring out how you're going to get mobile data access in a foreign country without being charged through the nose does feel somewhat stressful...
by pflenker on 6/7/23, 7:04 AM
The vision pro showcases one use case which is compelling enough to make me consider buying it (in a couple of years, when it’s more settled): As a replacement for a monitor for work. Seriously, having a huge monitor everywhere is a game changer for me.
by brandall10 on 6/7/23, 4:57 AM
Anyone surprised they didn't incorporate an EEG sensor and use that in conjunction with eye tracking? They could acquire someone like Neurosky.
I thought at this stage, it would be fairly foolproof to use that at the very least to simulate something like a click, no?
by amelius on 6/7/23, 12:54 PM
> The hardware seems faintly unbelievable—a computer as powerful as Apple’s current mid-tier laptops (M2)
So I take it that you can basically leave your laptop home and take your Vision visor with you and work from a virtual environment?
by uxp100 on 6/7/23, 12:46 PM
Interesting to read that the iOS 7 depth/parallax effect was tried all over the UI. It makes sense to indicate interactivity in a better way, but probably the right call to ditch that.
by AlexanderTheGr8 on 6/7/23, 2:37 AM
Sorry for hijacking this thread for a question about personal curiosity but does anyone know of an affordable way of measuring pupil distance?
I want to measure my pupil distance while browsing social media. The reasoning is my reading in "Thinking Fast and Slow" that pupils dilate when we see something we find interesting / something we like / when we are thinking and vice-versa. I want to put it to the test and it seems like finally the consumer tech is close to making it possible.
Does quest support pupil tracking? I did some cursory research but couldn't find any reference for it. Industry pupil tracking headsets are way too expensive; My last hope is that someone will jailbreak vision pro...
by nullandvoid on 6/7/23, 6:14 AM
The primary input is eye tracking to me is really exciting, feels like the future.
I wonder what happens in games though. Will this not essentially be aimbot, looking at the target and then shoot?
by dav_Oz on 6/7/23, 2:05 PM
I dig his vision of huge, persistent infospaces after all the "method of loci" [0] is a very effective and well described mnemonic device at least since antiquity.
The basic observation: It takes very little effort and exposure to remember places (spatial information) in contrast to taking in linear heavy processed information from writing and numbers (e.g. "facts").
Combine these two and you have a powerful way navigating through the information space itself.
If you look at it, we are confined to rectangular boxes, fancy interactive books enhanced by moving pictures, audio and some very limited tactile feedback but nevertheless bound by the superstructure of a static tiny detail in our overall spatial awareness.
Of course we are now so used to the endless iterations of rectangular boxes that we are extraordinarily proficient at extracting a lot of richness/information from that: in scrolling, clicking, typing, imagining ... I'm reminded by the scene of "The Matrix" where Cipher let Neo glimpse the Matrix code on his screens: "You get used to it, I don't even see the code. All I see is blond, brunette, redhead."[1]
IIRC once a 50s western movie was shown to a rural "test" public from the Eurasian steppe some of who saw a moving picture for the very first time. The reactions were mostly enthusiastic during slow "panorama shots" but the audience reacted very agitated in "close-ups" or "medium shots" where e.g. in the heat of the action the horse's legs were "cut off".
We take it as given that we learn the language of movies and find them at times 'realistic' and 'immersive' when in fact contrasted to 'reality' itself qbit by qbit they are basically - with some rounding error - as highly processed as books and thereby rely heavily on our imagination filling the gaps.
While we are on the subject of of naïve visions, another area which isn't appreciated enough imho is the immense dexterity of our hands as with "vision processing" the devices themselves are glued on our dexterity of "exact" and "repetitive" moves, there is too little wiggle room for organic experimenting, refining and expression like one would naturally do for a musical instrument.
[0]https://en.m.wikipedia.org/wiki/Method_of_loci
[1]https://m.youtube.com/watch?v=MvEXkd3O2ow
by sw104 on 6/7/23, 11:34 AM
Has anything been said about how the eye-display proximity might affect vision?
Maybe this is a better time than ever to start a new opticians.
by gampleman on 6/7/23, 8:34 AM
It's a strange time to be releasing this product. Not from a technological perspective (clearly the hardware Apple managed to put together is extremely impressive and they seem to believe that it finally got good enough to have real appeal albeit at a rather hefty price), but from a political perspective.
It seems like the last few years have generally had a technological backlash with worries about both invasive spying and surveillance capitalism on the one hand and mental health issues (especially in teens) sprouting on the other.
So it seems strange to react to all that by offering a product that glues a computer straight on your eyeballs. Now of course Meta has already been going down that road, but let's just say that company doesn't have much to loose in the reputation department.
To be fair, Apple seems to have made a few interesting design choices. They seem to have gone out of their way to create an illusion of two way transparency to their headset making casual human communication at least theoretically possible (although how much conversation you can have with someone actually wearing these due to distraction is an open question - if it's like trying to talk to someone on their phone that it will be at best half their attention).
by stavros on 6/6/23, 11:17 PM
Does anyone know how these notes are generated? Is it a custom site, or something off the shelf?
by russellbeattie on 6/6/23, 10:41 PM
> But it does put an enormous amount of pressure on the eye tracking. As far as I can tell so far, the role of precise 2D control has been shifted to the eyes.
I've been researching eye tracking for my own project for the past year. I have a Tobii eye tracker which is probably the best eye tracking device for consumers currently (or the only one really). It's much more accurate than trying to repurpose a webcam.
The problem with eye tracking in general is what's called the "midas touch" problem. Everything you look at is potentially a target. If you were to simply connect your mouse pointer to your gaze, for example, any sort of hover effect on a web page would be activated simply by glancing at it. [1]
Additionally, our eyes are constantly making small movements call saccades [2]. If you track eye movement perfectly, the target will wobble all over the screen like mad. The ways to alleviate this are by expanding the target visually so that the small movements are contained within a "bubble" or by delaying the targeting slightly so the movements can be smoothed out. But this naturally causes inaccuracy and latency. [3] Even then, you can easily get a headache from the effort of trying to fixate your eyes on a small target (trust me). Though Apple is making an effort to predict eye movements to give the user the impression of lower latency and improve accuracy, it's an imperfect solution. Simply put, gazing as an interface will always suffer from latency and unnatural physical effort. Until computers can read our minds, that isn't going to change.
Apple decided to incorporate desktop and mobile apps into the device, so it seems this was really their only choice, as they need the equivalent of a pointer or finger to activate on-screen elements. They could do this with hand tracking, but then there's the issue of accuracy as well as clicking, tapping, dragging or swiping - plus the effort of holding your arms up for extended periods.
I think it's odd that they decided that voice should not be part of the UI. My preference would be hand tracking a virtual mouse/trackpad (smaller and more familiar movements) plus a simple, "tap" or "swipe" spoken aloud, with the current system for "quiet" operation. But Apple is Apple, and they insist on one way to do things. They have a video of using Safari and it doesn't look particularly efficient/practical to me [4].
But who knows - I haven't tried it yet, maybe Apple's engineers nailed it. I have my doubts.
1. https://uxdesign.cc/the-midas-touch-effect-the-most-unknown-...
2. https://en.m.wikipedia.org/wiki/Saccade
3. https://help.tobii.com/hc/en-us/articles/210245345-How-to-se...
4. https://developer.apple.com/videos/play/wwdc2023/10279/
by nerdbert on 6/7/23, 9:08 PM
I'd like to connect this thing to a drone and go flying.
by andsoitis on 6/7/23, 3:15 AM
Punch Cards --> Text Terminals --> GUIs/WIMP --> ?
? =! Vision Pro
by TheRealPomax on 6/6/23, 11:35 PM
How about we wait until someone actually has one to review?
by ganesh7 on 6/7/23, 10:41 AM
Every day another submission about these goggles nobody seems to like. I'm just annoyed by all that unwarranted attention. Just because it's Apple.