from Hacker News

Show HN: Repogather – copy relevant files to clipboard for LLM coding workflows

by grbsh on 9/12/24, 2:03 PM with 33 comments

Hey HN, I wanted to share a simple command-line tool I made that has sped up and simplified my LLM-assisted coding workflow. Whenever possible, I’ve been trying to use Claude as a first pass when implementing new features or changes. But I found that, depending on the type of change I was making, I was spending a lot of effort finding and deciding which source files to include in the prompt. Having to copy/paste each file individually is also a mild annoyance.

First, I implemented `repogather --all`, which unintelligently copies all source files in your repository to the clipboard (delimited by their relative file paths). To my surprise, for less complex repositories, this alone is often completely workable for Claude, and much better than pasting in just the few files you are looking to update. But I never would have done it if I had to copy/paste everything individually. 200k tokens (Claude’s context window) is quite a lot!
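The `--all` mode can be approximated in a few lines of Python. This is a minimal sketch, not repogather’s actual code: the `### <path>` delimiter, the `SOURCE_EXTS` set, and the `gather_all` name are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical set of extensions treated as "source"; repogather's real rules may differ.
SOURCE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".md"}


def gather_all(root: str) -> str:
    """Concatenate every source file under root, each preceded by a '### <relative path>' line."""
    root_path = Path(root)
    chunks = []
    # rglob does not descend into symlinked directories by default
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        # Skip version-control internals, non-source files, and directories
        if ".git" in rel.parts or path.suffix not in SOURCE_EXTS or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        chunks.append(f"### {rel.as_posix()}\n{text}")
    return "\n".join(chunks)
```

To reach the clipboard, print `gather_all(".")` and pipe it through `pbcopy` on macOS (or `xclip -selection clipboard` / `wl-copy` on Linux).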

But as soon as the repository grows to a certain level of complexity (even if it is under the input token limit), I’ve found that Claude can get confused by unrelated parts and concepts across the code. It performs much better if you make an attempt to exclude logic that is irrelevant to your current change. So I implemented `repogather "<query here>"`, e.g. `repogather "only files related to authentication"`. This uses gpt-4o-mini with structured outputs to provide a relevance score for each source file (with automatic exclusions for .gitignore patterns, tests, and configuration, plus manual exclusions via `--exclude <pattern>`).
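The query mode’s filtering can be pictured as two stages: drop excluded paths, then keep files whose relevance score clears a threshold. The sketch below is an illustration under stated assumptions, not repogather’s code: the default patterns, the `{"files": [{"path", "score"}]}` reply shape, the 0 to 100 scale, and the cutoff of 50 are all hypothetical, .gitignore parsing is omitted, and the LLM call itself is left out (its JSON reply is passed in as a string).

```python
import fnmatch
import json

# Hypothetical default exclusions for tests and config files.
# Crude on purpose: "*test*" would also match e.g. "latest.py".
DEFAULT_EXCLUDES = ["*test*", "*.json", "*.yaml", "*.yml", "*.toml", "*.ini"]


def is_excluded(path: str, extra_patterns: list[str]) -> bool:
    """True if the relative path matches any default or user-supplied glob pattern."""
    return any(fnmatch.fnmatch(path, p) for p in DEFAULT_EXCLUDES + extra_patterns)


def select_relevant(paths: list[str], llm_json: str, extra_patterns: list[str],
                    threshold: int = 50) -> list[str]:
    """Filter candidate paths, then keep those the model scored at or above threshold.

    llm_json is assumed to be a structured-output reply shaped like
    {"files": [{"path": "src/auth.py", "score": 92}, ...]}.
    """
    candidates = {p for p in paths if not is_excluded(p, extra_patterns)}
    scores = {item["path"]: item["score"] for item in json.loads(llm_json)["files"]}
    # Iterating over candidates (not scores) ignores any path the model invented.
    return [p for p in sorted(candidates) if scores.get(p, 0) >= threshold]
```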

gpt-4o-mini is so cheap and fast that, for my ~8-dev startup’s repo, it takes under 5 seconds and costs 3-4 cents (with appropriate exclusions). Plus, you get to watch the output stream while you wait, which always feels fun.
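As a back-of-the-envelope check on the 3-4 cent figure: cost is roughly tokens times per-token price. The sketch assumes that whole file contents are sent as input, gpt-4o-mini’s launch list price of $0.15 per million input tokens and $0.60 per million output tokens (verify current pricing), and the common rule of thumb of about 4 characters per token. These are approximations, not measurements of repogather.

```python
# Assumed list prices in USD per 1M tokens (gpt-4o-mini at launch); verify before relying on them.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60
CHARS_PER_TOKEN = 4  # rough heuristic for English text and code


def estimate_cost(total_chars: int, output_tokens: int) -> float:
    """Approximate USD cost of sending total_chars of source and receiving output_tokens."""
    input_tokens = total_chars / CHARS_PER_TOKEN
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

For example, about 800,000 characters of source (roughly 200k tokens) plus 5,000 output tokens gives `estimate_cost(800_000, 5_000)` ≈ $0.033, in line with the figure above.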

The retrieval isn’t always perfect the first time, but it is fast, which lets you see which files it returned and iterate quickly on your command. I’ve found this much more satisfying than the embedding-search-based solutions I’ve used, which seem to fail in pretty opaque ways.

https://github.com/gr-b/repogather

Let me know if it is useful to you! Always love to talk about how to better integrate LLMs into coding workflows.

  • by faangguyindia on 9/12/24, 3:51 PM

    On old codebases, I usually only use an LLM to edit one function at a time.

    On greenfield projects, I ask Claude Sonnet to write out all the functions and their signatures, with return values etc.

    Then I have a script which sends these signatures to Google Flash, which writes all the functions for me.

    All this happens in parallel.

    I've found that if you limit the scope, Google Flash writes the best code, and it's ultra fast and cheap.
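    The fan-out step described in this comment (one request per function signature, run concurrently) might look like the sketch below. `implement_function` is a hypothetical stand-in for whatever API call the commenter's script makes to the fast model, and a thread pool is just one way to get the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def implement_function(signature: str) -> str:
    """Hypothetical placeholder for a call to a fast, cheap model.

    A real version would send the signature (plus any docstring) as a narrowly
    scoped prompt and return the generated function.
    """
    name = signature.split("(")[0].removeprefix("def ").strip()
    return f"{signature}:\n    raise NotImplementedError('{name}')\n"


def implement_all(signatures: list[str], max_workers: int = 8) -> list[str]:
    """Generate every function concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input ordering even though the calls run in parallel
        return list(pool.map(implement_function, signatures))
```

    Threads suit this because each real call would spend almost all its time waiting on the network; `asyncio` with an async HTTP client would work equally well.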

  • by mrtesthah on 9/13/24, 4:01 AM

    This symbolic link broke it:

    srtp -> .

      File "repogather/file_filter.py", line 170, in process_directory
        if item.is_file():
           ^^^^^^^^^^^^^^
    OSError: [Errno 62] Too many levels of symbolic links: 'submodules/externals/srtp/include/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp/srtp'
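    The traceback above comes from a self-referential symlink (`srtp -> .`) being followed again and again until the OS gives up. One common guard, sketched here as an illustration rather than the fix repogather adopted, is to record the resolved (real) path of every directory already scanned, so a link back to an ancestor is visited only once. `walk_source_files` is a hypothetical name.

```python
import os
from pathlib import Path


def walk_source_files(root: str) -> list[str]:
    """List regular files under root as relative POSIX paths, following symlinks safely."""
    root_path = Path(root)
    seen_dirs: set[str] = set()  # canonical (symlink-resolved) paths already scanned
    results: list[str] = []
    stack = [root_path]
    while stack:
        directory = stack.pop()
        real = os.path.realpath(directory)
        if real in seen_dirs:
            continue  # e.g. srtp -> . resolves to a directory that was already scanned
        seen_dirs.add(real)
        with os.scandir(directory) as entries:
            for entry in entries:
                try:
                    is_dir = entry.is_dir()  # follows symlinks; the realpath check stops loops
                    is_file = not is_dir and entry.is_file()
                except OSError:
                    continue  # unresolvable link such as a -> a; skip it
                if is_dir:
                    stack.append(Path(entry.path))
                elif is_file:
                    results.append(Path(entry.path).relative_to(root_path).as_posix())
    return sorted(results)
```

    A simpler alternative is to never descend into symlinked directories at all (`entry.is_dir(follow_symlinks=False)`), at the cost of skipping files that are only reachable through a link.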
  • by reacharavindh on 9/12/24, 3:14 PM

    Do you literally paste a wall of text (the source code of the whole filtered repo) into the prompt and ask the LLM to give you a diff patch as the answer to your question?

    For example:

    "Here is my whole project. Now implement user authentication with a plain username/password."
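    Mechanically, that workflow amounts to the gathered files followed by the request and an explicit instruction about the answer format. The sketch below is a hypothetical prompt builder: the `### <path>` delimiter and the instruction wording are assumptions, not anything repogather or Claude requires. Asking for a unified diff makes the reply easy to review and to apply with `git apply` or `patch`.

```python
def build_prompt(files: dict[str, str], request: str) -> str:
    """Assemble a whole-repo prompt that asks the model to answer with a unified diff.

    files maps relative paths to file contents; request is the feature or change being asked for.
    """
    parts = [f"### {path}\n{content}" for path, content in sorted(files.items())]
    parts.append(
        "Task: " + request + "\n"
        "Reply with a single unified diff (--- a/<path> / +++ b/<path> headers) "
        "against the files above, and nothing else."
    )
    return "\n\n".join(parts)
```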

  • by reidbarber on 9/12/24, 10:32 PM

    Nice! I built something similar, but in the browser with drag-and-drop at https://files2prompt.com

    It doesn’t have all the fancy LLM integration though.

  • by fellowniusmonk on 9/12/24, 9:50 PM

    This looks very cool for complex queries!

    If your codebase is structured in a very modular way, then this one-liner mostly just works:

    find . -type f -not -path './.git/*' -exec echo {} \; -exec cat {} \; | pbcopy

  • by smcleod on 9/13/24, 8:21 PM

    There are so many of these popping up! Here's mine: https://github.com/sammcj/ingest

  • by jondwillis on 9/13/24, 5:25 AM

    In this thread: nobody using Cursor, embedding documentation, using various RAG techniques…

  • by ukuina on 9/12/24, 3:24 PM

    It's fascinating to see how different frameworks are dealing with the problem of populating context correctly. Aider, for example, asks users to manually add files to context. Claude Dev attempts to grep files based on LLM intent. And Continue.dev uses vector embeddings to find relevant chunks and files.

    I wonder if an increase in usable (not advertised) context tokens may obviate many of these approaches.
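    To make the grep-versus-vectors contrast concrete, here is a toy comparison. It is an illustration only: embedding-based tools use learned embedding models, whereas this sketch substitutes bag-of-words term counts and cosine similarity so it runs without any model. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric words; snake_case identifiers split on underscores."""
    return re.findall(r"[a-z0-9]+", text.lower())


def grep_rank(query: str, files: dict[str, str]) -> list[str]:
    """Grep-style: an unordered yes/no match on any query word."""
    words = set(tokens(query))
    return sorted(p for p, text in files.items() if words & set(tokens(text)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_rank(query: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Vector-style: rank every file by similarity to the query, keep the top_k."""
    q = Counter(tokens(query))
    ranked = sorted(files, key=lambda p: cosine(q, Counter(tokens(files[p]))), reverse=True)
    return ranked[:top_k]
```

    The grep version returns every match with no ordering, while the vector version returns a ranked list cut off at `top_k`; that cutoff is one place where a relevant file can silently drop out.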

  • by faangguyindia on 9/12/24, 3:48 PM

    LLMs for coding are a bit meh once the novelty wears off.

    I've had problems where the LLM doesn't know which library version I am using. It keeps suggesting methods that don't exist, etc.

    It's as if LLMs are unaware of library versions.

    The place where I've found LLMs to be most effective and effortless is the CLI.

    My brother made this, but I use it every day: https://github.com/zerocorebeta/Option-K
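    On the library-version problem raised in this comment, one low-tech mitigation is to tell the model exactly which versions are installed. A sketch, assuming a Python project: `importlib.metadata.version` is standard library (Python 3.8+), while `version_preamble` and the wording of the note are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def version_preamble(packages: list[str]) -> str:
    """Build a prompt preamble listing the installed version of each package."""
    lines = []
    for name in packages:
        try:
            lines.append(f"- {name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"- {name} (not installed)")
    return "Use only APIs available in these versions:\n" + "\n".join(lines)
```

    Prepending this to a prompt does not guarantee the model knows those versions' APIs, but it at least removes the guesswork about which release is in use.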