from Hacker News

A look at Cloudflare's AI-coded OAuth library

by itsadok on 6/8/25, 8:50 AM with 164 comments

  • by afro88 on 6/8/25, 11:15 AM

    > What this interaction shows is how much knowledge you need to bring when you interact with an LLM. The “one big flaw” Claude produced in the middle would probably not have been spotted by someone less experienced with crypto code than this engineer obviously is. And likewise, many people would probably not have questioned the weird choice to move to PBKDF2 as a response

    For me this is the key takeaway. You gain proper efficiency using LLMs when you are a competent reviewer and, for lack of a better word, leader. If you don't know the subject matter as well as the LLM, you had better be doing something non-critical, or have the time to not trust it and verify everything.

  • by sarchertech on 6/8/25, 12:33 PM

    I just finished writing a Kafka consumer to migrate data with heavy AI help. This was basically a best-case scenario for AI. It’s throwaway greenfield code in a language I know pretty well (Go) but haven’t used daily in a decade.

    For complicated reasons the whole database is coming through on 1 topic, so I’m doing some fairly complicated parallelization to squeeze out enough performance.

    I’d say overall the AI was close to a 2x speed up. It mostly saved me time when I forgot the go syntax for something vs looking it up.

    However, there were at least 4 subtle bugs (and many more unsubtle ones) that I think anyone who wasn’t very familiar with Kafka or multithreaded programming would have pushed to prod. As it is, they took me a while to uncover.

    On larger longer lived codebases, I’ve seen something closer to a 10-20% improvement.

    All of this is using the latest models.

    Overall this is at best the kind of productivity boost we got from moving to memory-managed languages. Definitely not something that is going to replace engineers with PMs vibe coding anytime soon (based on the rate of change I’ve seen over the last 3 years).

    My real worry is that this is going to make mid-level technical tornadoes, who in my experience are the most damaging kind of programmer, 10x as productive, because they won’t know how to spot subtle bugs or care about stopping them.

    I don’t see how senior and staff engineers are going to be able to keep up with the inevitable flood of reviews.

    I also worry about the junior-to-senior pipeline in a world where it’s even easier to get something up that mostly works—we already have this problem today with copy-paste programmers, and we’ve just made copy-paste programming even easier.

    I think the market will eventually sort this all out, but I worry that it could take decades.

  • by aiono on 6/8/25, 11:25 AM

    I agree with the last paragraph about doing this yourself. Humans have a tendency to take shortcuts while thinking. If you see something resembling what you expect for the end product, you will be much less critical of it. Looks and aesthetics matter a lot in finding problems in a piece of code you are reading. You can verify this by injecting bugs into your code changes and seeing whether reviewers find them.

    On the other hand, when you have to write something yourself, you drop down into a slow, deliberate state of thinking where you pay far more attention to details. This means you will catch bugs you wouldn't otherwise think of. That's why people recommend writing toy versions of the tools you use: writing them yourself teaches a lot better than just reading materials about them. This is related to how our cognition works.

  • by throwawaybob420 on 6/8/25, 3:52 PM

    I’ve never seen such “walking off the cliff” behavior as from people who wholeheartedly champion LLMs and the like.

    Leaning on and heavily relying on a black box that hallucinates gibberish to “learn”, perform your work, and review your work.

    All the while it literally consumes ungodly amounts of energy and is used as a pretext to get rid of people.

    Really cool stuff! I’m sure it’s 10x’ing your life!

  • by ape4 on 6/8/25, 12:07 PM

    The article says there aren't too many useless comments but the code has:

        // Get the Origin header from the request
        const origin = request.headers.get('Origin');

  • by HocusLocus on 6/8/25, 10:29 AM

    I suggest they freeze a branch of it, then spawn some AIs to introduce and attempt to hide vulnerabilities, and others to spot and fix them. Every commit is a move; try to model the human evolution of chess.

  • by kcatskcolbdi on 6/8/25, 10:29 AM

    Really interesting breakdown. What jumped out to me wasn’t just the bugs (CORS wide open, incorrect Basic auth, weak token randomness), but how much the human devs seemed to lean on Claude’s output even when it was clearly off base. That “implicit grant for public clients” bit is wild; it’s deprecated in OAuth 2.1, and Claude just tossed it in like it was fine, and then it stuck.
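
    (For context on the implicit grant point: it returned tokens directly in the redirect URL fragment, and OAuth 2.1 drops it in favor of the authorization code flow with PKCE. As a hedged sketch of what the client side of the replacement looks like, using the Web Crypto API — illustrative only, not the library's actual code:)

        // Illustrative sketch (not from the library): OAuth 2.1 replaces the
        // implicit grant with the authorization code flow plus PKCE. The client
        // generates a random verifier and sends its SHA-256 hash as the challenge.
        function base64url(bytes: Uint8Array): string {
          return btoa(String.fromCharCode(...bytes))
            .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
        }

        async function makePkcePair(): Promise<{ verifier: string; challenge: string }> {
          const verifier = base64url(crypto.getRandomValues(new Uint8Array(32)));
          const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(verifier));
          return { verifier, challenge: base64url(new Uint8Array(digest)) };
        }
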
  • by belter on 6/8/25, 10:35 AM

    "...A more serious bug is that the code that generates token IDs is not sound: it generates biased output. This is a classic bug when people naively try to generate random strings, and the LLM spat it out in the very first commit as far as I can see. I don’t think it’s exploitable: it reduces the entropy of the tokens, but not far enough to be brute-forceable. But it somewhat gives the lie to the idea that experienced security professionals reviewed every line of AI-generated code...."

    In the Github repo Cloudflare says:

    "...Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards..."

    My conclusion is that as a development team, they learned little since 2017: https://news.ycombinator.com/item?id=13718752
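
    (For readers unfamiliar with the bug class Madden is describing above: the classic mistake is mapping random bytes onto an alphabet with a modulo, which over-represents part of the alphabet whenever 256 is not a multiple of its size. A minimal sketch of the biased pattern and one unbiased alternative — illustrative, not the library's actual code:)

        const alphabet =
          'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

        // Biased: 256 % 62 === 8, so the first 8 characters of the alphabet
        // come up slightly more often than the rest.
        function biasedToken(len: number): string {
          const bytes = crypto.getRandomValues(new Uint8Array(len));
          return [...bytes].map((b) => alphabet[b % alphabet.length]).join('');
        }

        // Unbiased via rejection sampling: discard bytes that fall outside the
        // largest multiple of the alphabet size (248 for a 62-char alphabet).
        function unbiasedToken(len: number): string {
          const limit = 256 - (256 % alphabet.length);
          let out = '';
          while (out.length < len) {
            for (const b of crypto.getRandomValues(new Uint8Array(len))) {
              if (b < limit && out.length < len) out += alphabet[b % alphabet.length];
            }
          }
          return out;
        }
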

  • by keybored on 6/8/25, 12:21 PM

    Oh another one,[1] cautious somewhat-skeptic edition.

    [1] https://news.ycombinator.com/item?id=44205697

  • by kentonv on 6/8/25, 2:34 PM

    Hi, I'm the author of the library. (Or at least, the author of the prompts that generated it.)

    > I’m also an expert in OAuth

    I'll admit I think Neil is significantly more of an expert than me, so I'm delighted he took a pass at reviewing the code! :)

    I'd like to respond to a couple of the points though.

    > The first thing that stuck out for me was what I like to call “YOLO CORS”, and is not that unusual to see: setting CORS headers that effectively disable the same origin policy almost entirely for all origins:

    I am aware that "YOLO CORS" is a common novice mistake, but that is not what is happening here. These CORS settings were carefully considered.

    We disable CORS protection specifically on the OAuth API endpoints (token exchange, client registration) and on the API endpoints that are protected by OAuth bearer tokens.

    This is valid because none of these endpoints are authorized by browser credentials (e.g. cookies). The purpose of CORS is to make sure that a malicious website cannot exercise your credentials against some other website by sending a request to it and expecting the browser to add your cookies to that request. These endpoints, however, do not use browser credentials for authentication.

    Or to put it another way: the endpoints which have open CORS headers are either control endpoints which are intentionally open to the world, or API endpoints which are protected by an OAuth bearer token. Bearer tokens must be added explicitly by the client; the browser never adds one automatically. So, in order to receive a bearer token, the client must have been explicitly authorized by the user to access the service. CORS isn't protecting anything in this case; it's just getting in the way.

    (Another purpose of CORS is to protect confidentiality of resources which are not available on the public internet. For example, you might have web servers on your local network which lack any authorization, or you might unwisely use a server which authorizes you based on IP address. Again, this is not a concern here since the endpoints in question don't provide anything interesting unless the user has explicitly authorized the client.)

    Aside: Long ago I was actually involved in an argument with the CORS spec authors, arguing that the whole spec should be thrown away and replaced with something that explicitly recognizes bearer tokens as the right way to do any cross-origin communications. It is almost never safe to open CORS on endpoints that use browser credentials for auth, but it is almost always safe to open it on endpoints that use bearer tokens. If we'd just recognized and embraced that all along I think it would have saved a lot of confusion and frustration. Oh well.
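
    To make that concrete, here is a minimal sketch (illustrative only, not the library's actual code) of the kind of endpoint where open CORS headers are harmless, in Workers module syntax:

        export default {
          async fetch(request: Request): Promise<Response> {
            const cors = {
              'Access-Control-Allow-Origin': '*',
              'Access-Control-Allow-Headers': 'Authorization',
            };
            if (request.method === 'OPTIONS') {
              return new Response(null, { status: 204, headers: cors });
            }
            // The endpoint never consults browser-managed credentials (cookies),
            // so a cross-origin page gains nothing from the open headers: without
            // a bearer token the user explicitly granted it, it just gets a 401.
            const auth = request.headers.get('Authorization') ?? '';
            if (!auth.startsWith('Bearer ')) {
              return new Response('Unauthorized', { status: 401, headers: cors });
            }
            // ... validate the token and serve the API request ...
            return new Response('ok', { headers: cors });
          },
        };
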

    > A more serious bug is that the code that generates token IDs is not sound: it generates biased output.

    I disagree that this is a "serious" bug. The tokens clearly have enough entropy in them to be secure (and the author admits this). Yes, they could pack more entropy per byte. I noticed this when reviewing the code, but at the time decided:

    1. It's secure as-is, just not maximally efficient.
    2. We can change the algorithm freely in the future. There is no backwards-compatibility concern.

    So, I punted.

    Though if I'd known this code was going to get 100x more review than anything I've ever written before, I probably would have fixed it... :)

    > according to the commit history, there were 21 commits directly to main on the first day from one developer, no sign of any code review at all

    Please note that the timestamps at the beginning of the commit history as shown on GitHub are misleading because of a history rewrite that I performed later on to remove some files that didn't really belong in the repo. GitHub appears to show the date of the rebase whereas `git log` shows the date of actual authorship (where these commits are spread over several days starting Feb 27).

    > I had a brief look at the encryption implementation for the token store. I mostly like the design! It’s quite smart.

    Thank you! I'm quite proud of this design. (Of course, the AI would never have come up with it itself, but it was pretty decent at filling in the details based on my explicit instructions.)

  • by djoldman on 6/8/25, 10:28 AM

    > At ForgeRock, we had hundreds of security bugs in our OAuth implementation, and that was despite having 100s of thousands of automated tests run on every commit, threat modelling, top-flight SAST/DAST, and extremely careful security review by experts.

    Wow. Anecdotally it's my understanding that OAuth is ... tricky ... but wow.

    Some would say it's a dumpster fire. I've never read the spec or implemented it.

  • by roxolotl on 6/8/25, 12:51 PM

    > Many of these same mistakes can be found in popular Stack Overflow answers, which is probably where Claude learnt them from too.

    This is what keeps me up at night. Not that security holes will inevitably be introduced, or that the models will make mistakes, but that the knowledge and information we have as a society is basically going to get frozen in time to what was popular on the internet before LLMs.

  • by dweekly on 6/8/25, 12:23 PM

    An approach I don't see discussed here is having different agents using different models critique architecture and test coverage, and author tests to vet the other model's work, including reviewing commits. Certainly no replacement for a human in the loop, but it will catch a lot of goofy "you said to only check in when all the tests pass, so I disabled testing because I couldn't figure out how to fix the tests".

  • by max2he on 6/8/25, 2:59 PM

    Interesting to have people submit their prompts to git. Do you think it'll generally become an accepted thing, or was this just a showcase of how they prompt?

  • by epolanski on 6/8/25, 12:31 PM

    Part of me thinks this "written by LLM" angle has been a way to get attention on the codebase and plenty of free reviews from domain-expert skeptics, among the other goals (pushing AI efficiency to investors, experimenting, etc.).

  • by ChrisArchitect on 6/8/25, 4:46 PM

    Related:

    I read all of Cloudflare's Claude-generated commits

    https://news.ycombinator.com/item?id=44205697

  • by OutOfHere on 6/8/25, 2:04 PM

    This is why I have multiple LLMs review and critique my specification document, iteratively and repeatedly, before I have my preferred LLM code it for me. I address all important points of feedback in the specification document. Doing this iteratively and repeatedly until no interesting points remain is crucial. This really fixes 80% of the expertise issues.

    Moreover, after developing the code, I have multiple LLMs critique the code, file by file, or even method by method.

    When I say multiple, I mean a non-reasoning one, a reasoning large one, and a next-gen reasoning small one, preferably by multiple vendors.
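
    (As a sketch of the loop being described — every name here is hypothetical, since no particular vendor API is implied:)

        // Hypothetical sketch of the iterate-until-quiet review loop described
        // above; `Critic` stands in for a call to some vendor's model API.
        type Critic = (spec: string) => Promise<string[]>; // points of feedback

        async function refineSpec(
          spec: string,
          critics: Critic[], // e.g. non-reasoning, large reasoning, small next-gen
          revise: (spec: string, feedback: string[]) => Promise<string>,
          maxRounds = 10,
        ): Promise<string> {
          for (let round = 0; round < maxRounds; round++) {
            const feedback = (await Promise.all(critics.map((c) => c(spec)))).flat();
            if (feedback.length === 0) break; // no interesting points left
            spec = await revise(spec, feedback); // address all important points
          }
          return spec;
        }
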

  • by menzoic on 6/8/25, 1:18 PM

    LLMs are like power tools. You still need to understand the architecture, do the right measurements, and apply the right screw to the right spot.

  • by bazhand on 6/9/25, 6:20 AM

    As a non-developer, I found this incredibly useful for understanding the prompt structure and applying it to my own Claude Code.

  • by m3kw9 on 6/8/25, 5:21 PM

    For the foreseeable future, software expertise is a safe job to have.

  • by user9999999999 on 6/8/25, 3:19 PM

    why on earth would you code OAuth with AI at this stage?

  • by sdan on 6/8/25, 10:33 AM

    > Another hint that this is not written by people familiar with OAuth is that they have implemented Basic auth support incorrectly.

    so tl;dr: most of the issues the author has with the person who made the library are about the design, not the implementation?
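
    (For context on the Basic auth point: RFC 6749 §2.3.1 requires the client_id and client_secret to be form-URL-encoded before they are joined with ':' and Base64-encoded — the detail implementations commonly miss. A hedged sketch of parsing, not the library's code:)

        // RFC 6749 §2.3.1: unlike plain HTTP Basic auth, client_id and
        // client_secret are form-URL-encoded *before* being joined with ':'
        // and Base64-encoded, so they must be decoded after splitting.
        function parseClientBasicAuth(
          header: string,
        ): { clientId: string; clientSecret: string } | null {
          const match = /^Basic\s+([A-Za-z0-9+/]+=*)$/.exec(header);
          if (!match) return null;
          const decoded = atob(match[1]);
          const sep = decoded.indexOf(':');
          if (sep < 0) return null;
          // Note: full form-URL decoding would also map '+' to a space;
          // decodeURIComponent is used here for brevity.
          return {
            clientId: decodeURIComponent(decoded.slice(0, sep)),
            clientSecret: decodeURIComponent(decoded.slice(sep + 1)),
          };
        }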

  • by CuriouslyC on 6/8/25, 10:07 AM

    Mostly a good writeup, but toward the end I think there's some serious, disingenuous shifting of the goalposts of what "vibe coded" means:

    'Yes, this does come across as a bit “vibe-coded”, despite what the README says, but so does a lot of code I see written by humans. LLM or not, we have to give a shit.'

    If what most people do is "vibe coding" in general, the current definition of vibe coding is essentially meaningless. Instead, the author is drawing a distinction between "interim workable" and "stainless/battle-tested", which is another dimension of code entirely. Describing that as vibe coding makes me view the author's intent with suspicion.

  • by jstummbillig on 6/8/25, 11:37 AM

    Note that this has very little to do with AI-assisted coding; the authors of the library explicitly approved/vetted the code. So this comes down to different coders having different thoughts about what constitutes good and bad code, with some flaunting of credentials to support POVs, and nothing about that is particularly new.

  • by SiempreViernes on 6/8/25, 10:22 AM

    A very good piece that clearly illustrates one of the dangers of LLMs: responsibility for code quality is blindly offloaded onto the automated system.

    > There are some tests, and they are OK, but they are woefully inadequate for what I would expect of a critical auth service. Testing every MUST and MUST NOT in the spec is a bare minimum, not to mention as many abuse cases as you can think of, but none of that is here from what I can see: just basic functionality tests.

    and

    > There are some odd choices in the code, and things that lead me to believe that the people involved are not actually familiar with the OAuth specs at all. For example, this commit adds support for public clients, but does so by implementing the deprecated “implicit” grant (removed in OAuth 2.1).

    As Madden concludes "LLM or not, we have to give a shit."