by unixpickle on 3/15/25, 3:33 PM with 22 comments
The backend is written in Swift, and is hosted on a single Mac Mini. It performs nearest neighbors on the GPU over ~3M product images.
No vector DB, just pure matrix multiplications. Since we aren't just doing approximate nearest neighbors but rather sorting all results by distance, it's possible to show different "variety" levels by changing the stride over the sorted search results.
Nearest neighbors are computed in a latent vector space. The model which produces the vectors is also something I trained in pure Swift.
The underlying data is about 2TB scraped from https://www.shopltk.com/.
All the code is at https://github.com/unixpickle/LTKlassifier
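The exact-search-plus-stride idea described above can be sketched in a few lines. This is an illustrative NumPy stand-in, not the author's Swift/GPU implementation; the function and parameter names are hypothetical. The key point is that because the search is exact (one matrix multiplication, then a full sort by similarity), you can take every s-th entry of the ranked list to trade closeness for variety:

```python
import numpy as np

def variety_search(query: np.ndarray, embeddings: np.ndarray,
                   k: int = 16, variety: int = 1) -> np.ndarray:
    """Exact nearest-neighbor search as one matrix-vector product,
    then a stride over the fully sorted ranking to control variety."""
    # Cosine similarity via a single matrix-vector product
    # (assumes the rows of `embeddings` and `query` are L2-normalized).
    sims = embeddings @ query          # shape: (N,)
    order = np.argsort(-sims)          # all N indices, best match first
    # variety=1 -> the k closest items; variety=s -> every s-th item,
    # spreading the picks further down the ranked list.
    return order[::variety][:k]

# Tiny demo with random unit vectors.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
q = emb[0]                             # query that is itself in the index
close = variety_search(q, emb, k=8, variety=1)
varied = variety_search(q, emb, k=8, variety=50)
assert close[0] == 0                   # top hit is the query itself
```

An approximate-NN index couldn't offer this knob as cheaply, since it never materializes the full distance-sorted ranking.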
by araes on 3/16/25, 2:09 AM
Picture leads to books leads to toys leads to ... a television that emits heat like a fireplace. I actually kind of want a television that emits heat or cold in response to what's on screen (the product was just a TV fireplace that made heat).
Notably, as others have pointed out and the author has already addressed, the model definitely starts giving some rather "strange" results when you travel far from the central training theme. Click on a kid's hoverboard and it gives me vacuum cleaners and leaf blowers; pick a dinosaur toy and there isn't a single dinosaur in the results.
The results are still interesting, and the main idea of a "variety" slider controlling nearness of similarity is a cool feature for image browsing. It would be nice if some browser image searches offered a "variety" or "nearness" slider when you're just looking at similar images.
by IgorPartola on 3/15/25, 4:17 PM
Also, this is what I imagine Stitch Fix uses for their stylists. I wish there were a polished stylist service that didn’t also have me buying clothes from them. I don’t need a $60 white T-shirt or a $120 basic jean jacket, but I do want styles that look good specifically on me.
by crusty on 3/15/25, 8:28 PM
It feels like there's a bit of gamification in clicking just one more time: "I like this, but if I click once more, maybe I'll like something in the next set even more." And repeat forever - like a great (window) shopping tool that doesn't result in much buying. But I'm not much of a shopper/consumer, so maybe my impression isn't representative.
by itake on 3/15/25, 10:46 PM
I’d think the model should focus on the pattern and cut, not which way the clothes are lying for the marketing photo.
by abeppu on 3/15/25, 9:39 PM
I think they died over legal issues with rights to product images.
But in the vibewall demo, I wonder if the embedding is capturing the right similarity concept for this application. E.g. in the results most similar to this men's polo, I see a bunch of henleys, a women's quarter-zip pull-over, a women's full-zip fleece, a men's tank, a women's top with a plunging neckline, even a baby wrap! These are appropriate to be worn by different people in different social contexts and in different seasons. The main visual similarity seems to be that they include human upper bodies on white backgrounds? https://vibewall.shop/?id=c43bc222-e68b-11ef-8208-0242ac1100...
by ResearchAtPlay on 3/15/25, 4:31 PM
Would you mind sharing how you trained the model to produce the vectors? Are you using a vision transformer under the hood with contrastive training against price, product category, etc.?
EDIT: I see that the training script is included in the repo and you are using a CNN. Inspiring work!
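For readers unfamiliar with the term, "contrastive training" usually means pulling an image's embedding toward the embedding of its matching target (label, caption, price bucket, etc.) and away from the other items in the batch. The sketch below is a generic InfoNCE-style loss in NumPy for illustration only; it is not taken from the LTKlassifier repo, and the function name and temperature value are assumptions:

```python
import numpy as np

def info_nce_loss(img_emb: np.ndarray, label_emb: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Generic InfoNCE-style contrastive loss: row i of `img_emb`
    should be most similar to row i of `label_emb` within the batch."""
    # Normalize rows so dot products are cosine similarities.
    a = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    b = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature        # (B, B) similarity matrix
    # Softmax cross-entropy where the diagonal entries are the positives.
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

With correctly paired rows the loss is low; mismatching the pairs (e.g. rolling the labels by one row) drives it up, which is what gives the gradient its signal during training.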
by binarymax on 3/15/25, 8:59 PM
by lavela on 3/15/25, 7:14 PM
If anyone reimplements this for men's fashion, let me know! I think this tool is great for anyone who isn't well versed in fashion, and I guess it's safe to say that applies to men more often than to women.
by thinkingemote on 3/15/25, 4:07 PM
"hat" gives a range of poses
by fredophile on 3/16/25, 2:58 AM
by whiplash451 on 3/15/25, 5:17 PM
by lgvld on 3/17/25, 3:29 PM
by 6stringmerc on 3/15/25, 4:23 PM