from Hacker News

Designing a website without 404s

by lilouartz on 6/11/24, 2:18 AM with 102 comments

  • by lm411 on 6/14/24, 3:46 AM

    I understand why the author thinks this is a good idea, but 404 exists for a reason. Literally "Not Found", not "I will guess at what you are looking for".

    The "implications for SEO" will almost certainly not be "positive" for quite a few reasons. As a general rule in my experience and learning, Google's black box algorithm doesn't like anything like this, and I expect you will be penalized for it.

    There are many good comments already, and my suggestions would merely repeat them, so I'm just adding my voice that this is likely a bad idea. Far, far better to simply have a useful 404 page.

    Edit just to add: if you do something like this, make sure you have a valid sitemap and are using canonical tags.

  • by chrismorgan on 6/13/24, 6:54 PM

    WordPress does this sort of thing, or at least can (I don’t know, I’ve only ever been involved with a couple of WordPress sites). It’s obnoxiously awful. You end up with people accidentally relying on non-canonical URLs that subsequently change in meaning, or using links that seem to work but actually give the wrong content and you don’t notice. Both of these are very often much worse than a 404. There’s also an issue where articles can become inaccessible from their correct URLs, due to something else taking precedence in some way; I wouldn’t say that’s fundamental to the technique, but I do think that the philosophy encourages system designs where this can happen. (Then again… it’s WordPress, which is just all-round awful, technically; maybe I shouldn’t blame the philosophy.)

    So no, please don’t ever do this.

    But what you can do is provide a useful 404 page. Say “there’s no page at this URL, but maybe you meant _____?” Although there’s a strong custom of 404 pages being useless static fluff, you are actually allowed to make useful 404 pages. Just leave it as a 404, not a 3xx.

    (Also, care about your URLs and make sure that any URL that ever worked continues to work, if the content or an analogue still exists at all. Distressingly few people even attempt this seriously, when making major changes to a site.)

  • by conductr on 6/13/24, 6:29 PM

    > the implementation is so simple that I am surprised that more websites do not implement it

    I might be wrong, but for me it's because IRL this isn't an issue. Users shouldn't be finding/using random URLs to navigate the site. Where is this broken-URL traffic coming from anyway? Are you trying to solve for people who randomly edit the URL and expect it to work? Most people don't care about those users getting a 404, because they should expect a 404. They're not real users; they're just playing around.

    However, if you purposely changed the URL format after a lot of people have the old format bookmarked or indexed on the web, then do a 301 redirect to the new URL.

    I’m not sure of the SEO implications of the described solution, however it seems like only risk and no upside.

  • by lilouartz on 6/14/24, 1:38 AM

    Hey everyone! Thank you for your feedback.

    Whether it is positive or negative, I do appreciate it as it helps me to learn and improve the product. I really didn't expect this to get any attention, let alone dozens of comments!

    To clarify: This was originally designed to help me auto-migrate the URL schema. I am learning as I develop this website, and SEO has been one of those vague topics with few hard rules, so I wanted to leave space for experimentation. As I rolled it out, I became intrigued by how it functions and wanted to share my experiment with you to get feedback.

    Based on the feedback, I plan to change the logic such that:

    - I will track which URLs are associated with which products
    - If a user hits a 404, I will check if there was previously a product associated with that URL and redirect accordingly
    - If it is a new 404, I will display a 404 page which lists products with similar names
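
    That planned logic can be sketched minimally like this, using in-memory stand-ins for the database tables (all names here are hypothetical, not from the actual site):

```python
import difflib

# Hypothetical in-memory stores; a real site would keep these in database tables.
PRODUCTS = {"spore-probiotic": "Spore Probiotic", "vitamin-d3": "Vitamin D3"}
URL_HISTORY = {"probiotic-spore": "spore-probiotic"}  # retired slug -> current slug

def resolve(slug):
    """Apply the three rules above: serve, redirect, or suggest."""
    if slug in PRODUCTS:
        return 200, slug                   # canonical URL: serve the page
    if slug in URL_HISTORY:
        return 301, URL_HISTORY[slug]      # URL once pointed at a product: redirect
    # brand-new 404: suggest similarly named products instead of redirecting
    suggestions = difflib.get_close_matches(slug, list(PRODUCTS), n=5, cutoff=0.5)
    return 404, suggestions
```

    The key property, per the thread's feedback: only the middle rule redirects, and only because the URL once genuinely pointed at that product; guesses stay on a 404 page.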

    I appreciate everyone hopping in to share their perspective!

  • by bad_username on 6/13/24, 6:39 PM

    > I am surprised that more websites do not implement it

    Maybe because it is not a good idea. Masking errors is harmful. The information "what you're looking for is not there" is very important, because it lets users identify that something is wrong. Smart redirection can be outright dangerous: what if I am buying a medication, and the smart website silently replaces the correct drug with something similar but wrong? Not to mention the pollution of search engine indexes with all the permutations of the same thing they may discover. Lastly, although the U in URL actually stands for uniform, the web is designed around stable, unique locators, and this rule shouldn't be broken without a very good reason.

    Show the user (and the crawlers) a 404, suggest your corrected URL in the content of the 404, and let the user know that it's a guess, so they can make an informed choice about the situation.

  • by jamies888888 on 6/14/24, 3:57 PM

    If you're going to do this then make sure you use 302 redirects (temporary) and not 301 (permanent). Otherwise browsers (and Google) will cache the redirect, and if your fuzzy-matched URL one day becomes a real URL, people might not be able to access it.

    Someone could seriously mess up your site by simply publishing their own page with many invalid links to your site, basically a dictionary attack. If Google were to crawl those links, it would cache all the redirects, and you'd have a hard time rectifying that if you later wanted to publish pages at those URLs.

    Also to reiterate other suggestions - your idea is not great for many reasons already stated (even with 302s). As suggested, just simply have a 404 page with a "Did you mean [x]?" instead. Use your same logic to present [x] in that example, rather than redirect to it.

  • by david422 on 6/13/24, 6:37 PM

    This just seems like it's co-opting the URL into a "search". Just have a normal search page where users can type in what they want if they want fuzzy matching. Or have the 404 page contain a search. I don't think using the URL as a "fuzzy search" is the right application.

  • by Levitating on 6/14/24, 8:32 AM

    This doesn't prevent linkrot, but it can definitely cause it. If the author ever so slightly changes his algorithm, URLs that previously worked (but were incorrect) could stop working.

    I don't see the problem that this is a solution for but I can see a couple of problems that this solution causes.

  • by cantSpellSober on 6/13/24, 7:46 PM

    Oops, I fat-fingered a URL for serotonin and ended up with a sleeping pill instead:

    https://pillser.com/supplements/merotonin

    I am surprised that more websites do not implement this!

  • by brianpan on 6/13/24, 6:48 PM

    Except that URLs are an API surface and congratulations, now you are supporting an infinite API forever.

    Vendors change product names, hyperlinks break! Fix bugs or change behavior, hyperlinks break! Do nothing, believe it or not, hyperlinks break!

  • by bena on 6/13/24, 7:27 PM

    I half like the idea.

    I do think that it's ok for several URLs to point to the same content. In his example all three are fine. I also tried the product code (6066) without any of the text and it worked fine as well.

    I've also noticed Lego's site does a version of this as well. https://www.lego.com/en-us/product/10334 will take you to the product page for the Retro Radio. However, I think Lego's site is just keying in on the product id as "retro-radio" doesn't work, but "ret-rad-10334" does.

    But there are limits.

    I put in the URL "https://pillser.com/supplements/go-fuck-yourself" to see what would happen. Now, I chose an offensive phrase to increase my chances of not coming close to any real product. I believe that URL should 404, but it took me to the page for the supplement "On Your Game" instead. If I had tried a real name and got taken to something with only the barest resemblance to the name I tried, I wouldn't be thinking "This must be the closest match". I'd think the site did something messed up or I typed something wrong or something malicious had happened.

  • by dewey on 6/13/24, 7:26 PM

    That seems like a solution for a problem that shouldn't exist in the first place.

    > a product is renamed, or the logic used to generate the URL changes.

    In that case you should store both URLs and have a redirect_to_id or something similar to give search engines and users a proper 301. I don't see a use case for this fuzzy matching, which just makes things inexplicit and unpredictable.

  • by yiiii on 6/14/24, 8:52 AM

    Well, nice proof of concept, but this is probably not what you intended? https://pillser.com/supplements/fuck Maybe better to create a 404 page that shows some "Did you mean..." content?

  • by victorbjorklund on 6/13/24, 6:46 PM

    Bad idea. Better would be to tweak it and instead serve a 404 with a recommendation: "Maybe you meant X".

  • by zamadatix on 6/13/24, 6:33 PM

    I think this is a good idea for a site like this, which is focused on searching for things: if you're going there to look up "revive", you're already expecting a search result, not necessarily a precise "revive" out of the gate. But adding this to the rest of the site, or to other non-search-focused sites, showing something the user didn't ask for just because it's similar, could be a much worse experience than signaling "not found!". Especially given how rarely a plain incorrect URL is the problem vs. an expired resource.

  • by spaceywilly on 6/13/24, 6:50 PM

    I have to question how many people in 2024 are navigating the internet by going to the address bar and typing “foo.com/thing-im-looking-for”

  • by bombcar on 6/13/24, 6:24 PM

    This is both a good and bad idea.

    It is good because it will help most people and work for them.

    It's bad because sometimes it will make people think they've found what they were looking for when what they had doesn't exist at all - but it gave them something that sounds similar.

    I would at least have a "redirected from" banner at the top of the page when it triggers.

  • by vizualbod on 6/15/24, 6:44 AM

    Don’t listen to people telling you this is a bad idea. 301 a similar 404 to the correct page and be done with it. Migrations are all too common. Google webmaster guidelines have ruled and restricted creativity for long enough, so damn Google. If getting completely deindexed in Google would floor your business, you got it all wrong anyway. Focusing on Google too much will get you exactly in a position where you don’t want to be. It’s their job to correctly index all sorts of server configurations and assign link value so their algorithms get it right. You are doing it well. Let’s just stop Google-pleasing and focus on the user and good marketing.

  • by bestest on 6/14/24, 6:09 AM

    I only redirect URLs with ids. E.g. the canonical is "/:id-some-slug", so if the URL matches the id but not the slug, I redirect it to the canonical.

    For everything else you can just have a nice 404 with suggestions of links that probably are a match.
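
    A quick sketch of that id-anchored scheme (hypothetical table and function names; assumes canonical paths of the form "/<id>-<slug>"):

```python
import re

# Hypothetical product table: id -> canonical slug
SLUGS = {6066: "spore-probiotic"}

def route(path):
    """Canonical form is '/<id>-<slug>'. Redirect when the id matches but
    the slug does not; everything else is a plain 404, no guessing."""
    m = re.fullmatch(r"/(\d+)-(.+)", path)
    if not m:
        return 404, None
    pid, slug = int(m.group(1)), m.group(2)
    canonical = SLUGS.get(pid)
    if canonical is None:
        return 404, None                      # unknown id: not found
    if slug != canonical:
        return 301, f"/{pid}-{canonical}"     # right id, wrong slug: go canonical
    return 200, path
```

    Because the id is the only key, a stale or mistyped slug redirects deterministically, while anything without a valid id still gets an honest 404.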

  • by dev2point0 on 6/14/24, 4:50 PM

    I did something very similar to this, but I have since changed the way my website works. I pulled everything from an RSS feed after page load, then came up with a way to show an n number of results the user might have wanted to navigate to. I think my approach was cool, but ultimately it's better to give a 404 error than to redirect someone. Here's the post if anyone cares: https://decode.sh/redirecting-users-to-the-correct-page-afte...

  • by temporarely on 6/13/24, 7:15 PM

    I think this confuses the distinction between a database and a website. What the OP describes is a ~NLP interface to a database of resources, not a "web of resources".

    The web by definition is a lazily-materialized query response graph.

  • by everythingabili on 6/13/24, 6:56 PM

    This really isn't crowing but we created a dynamic site that used HyperCard as a CGI and we did that in 1984. Not kidding. Mosaic'd up to the hilt.

    Still, it's a good idea.

    You can further this idea (especially when the slug returns nothing) by having the page also list "Best Bets", i.e. what people most often come to your site for (regardless of any search query; perhaps based on their referrer, the day of the week, etc.)

    Additionally, put the slug (minus the dashes) into a search box so it can be amended (but tell them that you didn't find anything and they need to try something else).

  • by hughesjj on 6/13/24, 6:39 PM

    I'd definitely cache + throttle that if possible. Cache the 404 response, and if using a shared DB, make a custom user/resource group/workload queue and put this service at a lower priority.

    Not sure how to best do that in Postgres though; the closest I can find is reserved connections per user. Idk, maybe there's an extension, or it's easy to do in the webserver.

    https://www.postgresql.org/docs/current/runtime-config-conne...
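
    A cheap way to get the caching half of that suggestion, as a sketch (hypothetical names; functools.lru_cache stands in for a real response cache sitting in front of the similarity query):

```python
import difflib
import functools

DB_HITS = {"count": 0}  # instrumentation, just to show the cache working
CATALOG = ["spore-probiotic", "vitamin-d3"]

@functools.lru_cache(maxsize=4096)
def fuzzy_lookup(slug):
    """Stand-in for the expensive similarity query. The cache means a
    repeatedly requested bad slug only hits the 'database' once."""
    DB_HITS["count"] += 1
    return tuple(difflib.get_close_matches(slug, CATALOG, n=3, cutoff=0.5))
```

    A crawler (or attacker) hammering the same junk URLs then mostly hits the cache, not the database; throttling and workload priority would still have to live in the DB or webserver layer.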

  • by evilc00kie on 6/14/24, 9:14 AM

    I'm a huge fan of "do what I say" and try to avoid "do what I mean". Sometimes you want magic, but most of the time I just want static, plain logic that I can reason about.

    Slightly OT, but it came to mind when thinking about designing website-related stuff: https://www.w3.org/Provider/Style/URI

  • by initramfs on 6/14/24, 6:27 AM

    I dislike when websites have an "access denied" page when accidentally clicking on an inaccessible or non-existent page. It's like a door hitting you on the way out of a store.

    Here is a partially fixed issue https://hatonthecat.github.io/Hurl/404.html

  • by a_imho on 6/14/24, 10:25 AM

    On a not-too-outdated, fairly vanilla and permissive Ubuntu/FF+uBlock setup, I can see the page load, then everything clears. Not even reader mode fixes it. Probably unrelated, but the very basics not working makes me want to dismiss design suggestions right off the bat.

  • by potocnik on 6/14/24, 9:03 AM

    Having a small website and querying the database each time some spider comes to a wrong URL can be done. But with a large website with millions of different URLs, it's impossible to query the DB with some similarity function on every miss.

  • by amadeuspagel on 6/14/24, 5:15 AM

    If spore-probiotic is enough to identify the page, why use an id in the slug at all?

  • by rmbyrro on 6/13/24, 7:51 PM

    if you're reading this, please don't replicate the idea

    this is not going to end well...

  • by llmblockchain on 6/14/24, 3:21 PM

    I think a more honest solution would be to 404 and display a page with potential matches (like the top 5-10 similarly named URLs).

    "Hey that page doesn't exist, but there are some similar pages..."
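
    A minimal sketch of such a 404 body (the function name is hypothetical, and difflib's string matching stands in for whatever similarity ranking the site already uses):

```python
import difflib

def not_found_page(slug, catalog):
    """Render an honest 404 body: say the page doesn't exist, then list
    up to five similarly named slugs so the visitor can recover."""
    matches = difflib.get_close_matches(slug, catalog, n=5, cutoff=0.4)
    lines = [f"No page at /supplements/{slug}."]
    if matches:
        lines.append("Did you mean one of these?")
        lines.extend(f"  /supplements/{m}" for m in matches)
    return "\n".join(lines)
```

    The response keeps its 404 status; only the body carries the guesses, so neither browsers nor crawlers ever treat a guess as the real resource.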

  • by dsr_ on 6/13/24, 7:25 PM

    It definitely has a 404 response, though:

    https://pillser.com/engineering/failure%20mode

  • by kragen on 6/14/24, 6:07 AM

    this is a terrible thing to do. not only does it mean a link to the site will unpredictably change its referent, it's also impossible to archive in a way you can restore

  • by wavemode on 6/13/24, 7:03 PM

    I'm struggling to understand the real-world benefit to this. Do most people even manually type URLs (especially long ones which would lend themselves to mistyping)?

  • by riiii on 6/14/24, 11:21 AM

    No. Do not do this.

    You show me the links you think I want, on the 404 page.

  • by jslakro on 6/14/24, 3:41 PM

    Before reading the post, I thought the strategy would be similar to those wikis that let you create a new page when a link isn't found.

  • by mike-the-brain on 6/13/24, 7:09 PM

    pillser.com/supplements/vitamin-1973-omg-i-can-type-anything-here-and-it-still-works-i-dont-think-this-is-good-idea

    clickable:

    https://pillser.com/supplements/vitamin-1973-omg-i-can-type-...

  • by account42 on 6/14/24, 10:02 AM

    Apparently the author also likes websites without content, because it just shows a white screen.

  • by ceving on 6/14/24, 7:40 AM

    If you solve the problem in the database, you may bypass restrictions implemented in the server.

  • by theanonymousone on 6/14/24, 9:17 AM

    Am I the only one who thinks this is a recipe for disaster?

  • by 4ndrewl on 6/14/24, 6:19 AM

    These are 303s then?

  • by btbuildem on 6/13/24, 7:49 PM

    > https://pillser.com/engineering/yeah%20right

    404 Not Found Nothing to see here.

    Pillser