by lurker137 on 6/4/22, 1:20 PM with 108 comments
So my questions are:
- Why isn't diagram generation automated as part of the build process (UML or otherwise)?
- Why aren't code visualization tools more popular? The options out there seem outdated
- Would you want to use these tools? What would be your ideal tool?
Edit: looks like this is a duplicate question https://news.ycombinator.com/item?id=31569646
I can't delete it so feel free to discuss more
by mattm on 6/4/22, 5:32 PM
After trying various diagramming tools and dragging around boxes and lines, I settled on PlantUML which makes diagrams much much easier to create and modify. It cuts out a lot of the pain of diagramming with the mouse which means there is less resistance to creating diagrams and I do it more.
To your question, "Why isn't diagram generation automated as part of the build process" - one thing I've found that would be difficult to solve is the level of detail you need in the diagram. For instance, in a very complex system with many decision branches, a diagram with every branch would not be helpful. There are cases where I want a high-level component overview but don't want to clutter up the diagram with lots of details. And yet there may be cases where I do want some more detail, but maybe only in a certain section of the code. I think this judgement of detail tradeoffs would be the hardest problem to solve for diagram generation tools. You want enough detail to be useful but not so much that it is overkill.
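One way to picture the level-of-detail problem is a function that collapses a dependency graph to a chosen depth. This is only a minimal sketch: the module names in `EDGES` are hypothetical, and it assumes dotted names reflect a meaningful hierarchy, which real codebases often do not guarantee.

```python
def collapse(edges, depth):
    """Collapse dotted names to `depth` components and drop self-loops."""
    def trim(name):
        return ".".join(name.split(".")[:depth])

    collapsed = set()
    for src, dst in edges:
        a, b = trim(src), trim(dst)
        if a != b:  # an edge inside one collapsed box is hidden detail
            collapsed.add((a, b))
    return sorted(collapsed)


# Hypothetical dependency edges for an imaginary shop application.
EDGES = [
    ("shop.api.orders", "shop.core.pricing"),
    ("shop.api.orders", "shop.core.stock"),
    ("shop.api.users", "shop.core.auth"),
    ("shop.core.pricing", "shop.core.stock"),
]

if __name__ == "__main__":
    for level in (2, 3):
        print(level, collapse(EDGES, level))
```

At depth 2 the four edges fold into a single `shop.api` to `shop.core` arrow; at depth 3 all four survive. Choosing a different depth for different parts of the tree is the judgement call that stays manual.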
by HighlandSpring on 6/4/22, 2:37 PM
The closest I found that solves this problem is https://c4model.com/ but you still need the code to turn your code into these markups. Can this be well inferred from code alone without framework specific interpreters? I doubt it.
And then you still need a frontend to zoom and navigate the ridiculous amount of hierarchy found within any modern software architecture, e.g. microservices.
It also doesn't help that microservices patterns prescribe that you don't share repositories or code. So now you also need to pattern match untyped references across these codebases.
This is a lot of convention and tooling that I'm not sure exists.
Edit: and this is before even getting into version control and reconciling the target->as-is iterative loop.
by CipherThrowaway on 6/4/22, 5:57 PM
Generation of legible diagrams could be accomplished on a domain or framework basis where code is subject to local patterns and can be structured "for" generation. We see this with things like OpenAPI schema generation.
Ultimately I think diagramming isn't prioritized because diagrams themselves aren't that valuable. They're just a medium for the actually valuable thing: high level representations.
by cheunste on 6/4/22, 4:12 PM
I vaguely recall Visual Studio has this option where you can generate some sort of class diagram. It looked like shit the last time I used it (~2019) especially as your classes get more and more functions built into it. I also can't imagine how shitty it looks for codebases that have a significant coupling problem.
Furthermore, creating a UML diagram is a documentation process rather than something that should be automatically built in. I put it on the same level as writing a document in a word doc or something that's done as the project gets closer to being finished. Some places can live with it, a lot of places (actual software companies) probably do not as they move unreasonably fast (Agile) which does not even allow time for documentation or they just purposely neglect documentation.
> Why aren't code visualization tools more popular? The options out there seem outdated
Because they look like shit. I tried mermaid with markdown, I was not happy with the results, I tried plantUML back in 2019, I hated how it ended up looking, I hated how I have to install java for it, and I gave up on it pretty quickly.
The only code visualization tool I ever use is either draw.io or MS Visio. At least there's a plugin for that for VS Code.
> Would you want to use these tools? What would be your ideal tool?
Markdown with vim option. It also must have an option to force a top-down flow approach and not freaking forcing it to be a left-right layout
by flohofwoe on 6/4/22, 7:52 PM
In practice it's the same problem as "noodle graph" visual programming. It works well in some niches (e.g. creating shaders in graphics programming, or sometimes describing AI tasks in game programming), but it completely breaks down outside those niches.
by Weidenwalker on 6/4/22, 2:59 PM
At the moment, codeatlas is just the static gallery, but we're only a few weekends away from releasing a Github action that deploys this diagram on github pages for your own repos - if you're interested, feel free to watch this repo: https://github.com/codeatlasHQ/codebase-visualizer-action
OP, how close is this to what you had in mind in your question?
EDIT: fixed broken links :o
by porcoda on 6/4/22, 3:21 PM
People do want it (contrary to the common HN refrain of “well, I don’t want it so clearly nobody wants it”). We’ve had customers where I work specifically ask for these kinds of tools. They’re just harder than they seem to write, not only for the parsing reason I mention above. For many codebases you see a giant ball of spaghetti if you look at the full graph, or the layout algorithm gives you something gigantic and hard to browse. That’s a deficiency in graph visualization tools: again, a hard problem with little good tooling out there.
I’d love to see more work in this area since there do exist people who see value in it, contrary to the skeptics.
by rgoulter on 6/4/22, 2:34 PM
With a manually constructed diagram, I have leverage to handwave irrelevant details away.
Perhaps to compare with documentation: it's easy to automatically describe things like types, and maybe callgraphs, but there's value in having prose which explains details about the interface which the program's type doesn't reveal. With diagrams to visualise a system, the significance (or incidental nature) of the relationships may be hard to pick automatically.
by sidlls on 6/4/22, 4:29 PM
I wouldn't use these tools anyway, to be honest. They have some limited utility when constrained to small components/parts of an application (e.g., self-contained libraries), but for understanding systems as a whole there is too much to have effective reverse-engineering into a visualization (in my opinion).
by charlieflowers on 6/4/22, 4:37 PM
But, of course, it turns out, someone still needs to understand and be able to debug all the nuances that make complex logic systems complex, especially when they're cobbled together from many underlying systems.
The real goal should be to take good programmers and magnify what they can do. But since the industry bought so hard into the naive vision, the industry is behind where it should be on a smarter vision.
by dahart on 6/4/22, 3:13 PM
Diagrams are sometimes unnecessary overhead early in a project. Sometimes I’ve used them and seen other people use them for initial design planning, especially if management needs to be involved or approve the plans & schedule. But by a year later, the design has grown and changed, and everyone on board is so familiar with the code, but also so pressed for time and feature delivery, that making diagrams doesn’t make sense: nobody involved at this point needs them. Two years later, when the code is getting complicated and slowing down, and you’re onboarding some new people, that’s when it might help to sketch the flow of code.
FWIW, sometimes a good profiling tool will show you and let you explore call stacks, call graphs, execution charts, etc. I often reach for a profiler when I’m new to a codebase. Flame charts are a fave of mine. You can find flame charts in Chrome’s debug tools, or in compiled language profilers like vtune or valgrind. Here’s a decent article on how to use them https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
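As a rough illustration of what a profiler-style view can tell you about an unfamiliar codebase, here is a sketch that records caller to callee edges at run time using Python's standard `sys.setprofile` hook. The `parse`, `count_words`, and `main` functions are hypothetical stand-ins for real code, and a real profiler records far more (timings, stacks, native frames).

```python
import sys
from collections import Counter


def record_calls(func):
    """Run func() and count caller -> callee edges between Python functions."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires for Python-level calls; C functions report "c_call".
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return edges


# Hypothetical workload, only here to give the recorder something to watch.
def parse(text):
    return text.split()


def count_words(text):
    return len(parse(text))


def main():
    return count_words("a b c") + count_words("d e")


if __name__ == "__main__":
    for (caller, callee), n in sorted(record_calls(main).items()):
        print(f"{caller} -> {callee} x{n}")
```

Because the edges come from an actual run, they only show paths that were exercised, which is often exactly the slice a newcomer wants to see first.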
Another issue is that well designed code bases diagram themselves by their module structure, while diagrams for poorly designed code bases may not help understand them at all. When code has too many side effects, or things are poorly or misleadingly named, when class boundaries aren’t well defined or the code has a lot of spaghetti, diagrams might not really help.
IMO two things worth doing are: 1) get a mentor in any new codebase any time you can, and 2) start building your own arsenal of code diagramming tools, rather than wondering why or waiting for others to do it. Demonstrate the value of diagramming code to people around you and see if you can get it to catch on.
by rramadass on 6/4/22, 2:11 PM
To answer your question, people do use various tools to extract Class Hierarchies, Call Graphs, Cross-Reference listings etc. The other HN thread that you have linked to contains some details. Lots of people do use them. You can easily add Doxygen/CFlow etc. to your make files to generate the diagrams during every build. The key thing is not to try to comprehend the entire system as a whole (all but impossible for large systems) but to localize your study to one module at a time. Once you have the different pieces mapped out, you can combine them by hand.
by mtkd on 6/4/22, 6:20 PM
In most complex systems the part where the magic happens is likely impossible for a tool to identify so would get lost in the noise of the cruft around it -- even for monoliths using frameworks and especially for anything distributed across microservices, and it's usually that aspect that is of most interest
by phailhaus on 6/4/22, 9:13 PM
This is very hard. And since it's hard, it's not automated. And since it's not automated, it goes out of date very quickly. I think that's the fundamental issue: keeping around evergreen documentation is a lot of overhead. There is no connection between the code and the diagrams, so it's too easy to change the code and not realize that the diagram needs to be updated too.
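One hedged sketch of what a connection between code and diagram could look like: a check that compares the arrows in a committed diagram against edges extracted from the code and reports the difference. Everything here is an assumption for illustration: the `A -> B` line format, the `COMMITTED` diagram text, and the `FROM_CODE` edge list (which a real setup would have to derive from the codebase somehow).

```python
import re

EDGE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)")


def diagram_edges(text):
    """Extract (source, target) pairs from lines of the form 'A -> B'."""
    found = set()
    for line in text.splitlines():
        match = EDGE.match(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return found


def drift(diagram_text, code_edges):
    """Return (missing_from_diagram, stale_in_diagram) as sorted lists."""
    drawn = diagram_edges(diagram_text)
    actual = set(code_edges)
    return sorted(actual - drawn), sorted(drawn - actual)


# Hypothetical committed diagram and hypothetical edges extracted from code.
COMMITTED = """
@startuml
Api -> Orders
Orders -> Billing
@enduml
"""
FROM_CODE = [("Api", "Orders"), ("Orders", "Inventory")]

if __name__ == "__main__":
    missing, stale = drift(COMMITTED, FROM_CODE)
    print("missing from diagram:", missing)
    print("stale in diagram:", stale)
    raise SystemExit(1 if missing or stale else 0)
```

Run in CI, a non-zero exit would at least flag that the diagram and the code have diverged, even if a human still has to decide how the diagram should change.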
Another thing is that it's really the most useful for new members. If you've been working on the infra for a while, you already know the structure and you don't need the diagram. So teams tend to just avoid the diagrams altogether.
by photochemsyn on 6/4/22, 2:27 PM
https://docs.microsoft.com/en-us/visualstudio/modeling/map-d...
In many cases it might really be faster and easier to just diagram things out with a pad of paper and a pencil compared to setting up a tool like this and getting all the parts working correctly without any bugs.
That said, a virtual reality 3D tool for visualizing code base dependencies, internal structure, what parts call what other parts, internal exception handling etc. would be pretty cool. Maybe it's an area where AI machine learning could do something.
by Kapura on 6/4/22, 4:28 PM
It's another thing that can break, another element that needs to be maintained. In my experience there are very few pieces of code that will be able to run indefinitely without ever being updated, fixed, or re-examined at some point. The cost of adding more processes is not one-time, and it can be difficult to figure out what the time bounds are.
- Why aren't code visualization tools more popular? The options out there seem outdated
People who are interested in the structure of code are typically engineers, capable of writing and reading the codebase of interest. A UML diagram may be a way to understand an element of the system, but things such as in-line comments in the codebase itself are often more instructive on structure and function.
- Would you want to use these tools? What would be your ideal tool?
When I was in high school, if I didn't want to read, say Crime & Punishment, I could buy the Cliff's Notes version, and get a chapter-by-chapter summary of major characters, events, and literary techniques. In many ways, it contained all of the information of the book without the substance.
But importantly, it took significantly less time to read and fully process than the book, while being written in the same language. In code, it is already extremely easy to look thru header files, or collapse every function in your IDE to get a high-level overview of what data and methods exist. You can then dive in immediately to anything you would like to understand better ("what does 'UpdateSignificanceValue' really mean") and there's no mental overhead in translating from an encoded diagram into whatever your mental model is. This is why I do not personally see value in code visualization -- outside of notes I take that are relevant to any specific problem I am working on.
by mariojv on 6/4/22, 2:36 PM
As someone who is not a very visual person at all, I found it really nice to use to make my design docs more comprehensible to visual learners. I've gotten good feedback about designs every time I've used the tool.
by forinti on 6/4/22, 2:11 PM
I feel that people have just embraced Agile blindly and simply forgot about basic modelling.
by mynegation on 6/4/22, 4:33 PM
The main problem with low-level code visualization was that it did not add much to the well-formatted code representation in most cases. As for the high-level architecture extraction tool, which is closer to the question in the article, many links on the diagram do not just involve header inclusion, module import, method calls, etc., which are relatively easy to extract (though not without their own challenges with virtual and indirect calls). Users wanted to see inter-process communication (sockets, queues, pipes, HTTP connections), and extracting those is an uphill battle, though we introduced some of it (lots of custom, platform-specific code). Between this and knowing which connections are important and which are less so, automatically extracted diagrams were of limited value.
by prbs23 on 6/5/22, 2:25 AM
Fundamentally I think that the useful kind of software or system diagrams are always abstractions of the actual code. Figuring out the correct abstraction for the intended purpose requires either experience or a lot of trial and error. It may be possible for very specific applications, but I kind of doubt there is an algorithm to generate the content for a useful system diagram from the raw code.
Then there is the problem of rendering and laying out the diagram automatically. We have Graphviz and Mermaid, and probably others I haven't heard of, and while these do an okay job, I've never found their layout algorithms to be particularly great.
Overall, I don't think anything is going to be as useful as a manually drawn diagram, made with a specific intent in mind.
by groffee on 6/4/22, 1:30 PM
by dangoor on 6/4/22, 5:28 PM
This may help with adoption.
by AndyPatterson on 6/4/22, 2:08 PM
For instance, I frequently build small paper diagrams of different code paths through a component and nearly always find leaky abstractions, mixed layers of abstractions, weird cyclical dependencies, etc. etc. and there really is no clear way to diagram this. Instead, you sort of have to make judgements and assumptions to make the diagram concise and understandable; the sort of decisions that machines just aren't that good at.
On the other side, when code is simple and easy to follow then the pay off of building a diagram just isn't there.
by zamalek on 6/4/22, 5:14 PM
In my experience these tools generally exist to facilitate bikeshedding. The academic nature of UML makes it pretty useless in the real-world.
Something that could be useful is having a tool that uses knowledge about code to help you build a mindmap (but does not just puke the whole thing out). Huge bonus points for allowing the user to create late-bound relations and conceptual boundaries. Finally, one of these tools should be able to compare its output with the source, and indicate what has changed (through deletion/addition, or via VCS diff).
by chrismorgan on 6/4/22, 1:59 PM
But more seriously, it depends on how complex the system is and how it’s modelled. The case I was working with then transferred excellently to such diagrams (shallow and deep inheritance, and other forms of composition and linkage, with every box a link) and key-value property sheets about the types and the likes, but I don’t think I’ve encountered another system where anything even vaguely like that would work particularly well.
by giaour on 6/4/22, 2:00 PM
A visualization can be helpful as an artifact for non-technical colleagues, but I always end up hand-rolling those diagrams to highlight a specific aspect of the system and hide irrelevant features.
by trixie_ on 6/4/22, 9:08 PM
by bullen on 6/4/22, 7:49 PM
It has been used for database schemas, game story creation, cutting up sprites among other things...
Lately I made my own node database so I don't need this tool any longer, but I'm sure it will prove useful eventually again!
by AtlasBarfed on 6/4/22, 9:19 PM
Maybe you'd need a three dimensional model (really it's likely n-dimensional/hyperdimensional), 2D might not be enough.
Programming models get so convoluted with regards to state and interactions, both in-memory/in-process state and the stored state in databases/files.
Jurassic Park's 3D filesystem was a pie in the sky idea, what, 30 years ago? Holy crap it was 29 years ago or so. We've had REVOLUTIONS in 3D processing and games, and never even scratched the surface of basic 3D visualizations of code or data or filesystems or machine networks or the like.
And then even if you represent a diagram, it's useless without time visualization/traces, as kind of referred to by the RR debugger post. So for active code, you'd need simulation or actual run data to show what it does visually to be effective.
Really what's being dealt with here is probably related to theory of computation, and various results like the undecidability of the halting problem. The halting problem shows that even for very basic languages that are minimally Turing complete, the complexity shoots VERY QUICKLY to massive degrees of infinity/uncomputability.
So some catch-all visualizer for even general classes of Turing complete languages is probably impossible.
Maybe something like "this is a java spring app with well regimented separation of data/domain classes and service classes"...
Even then once you get to database persistence ... wow.
And the amount of data you'd need to store for test runs.
Spring + TDD enforces a certain simplicity to a codebase, so perhaps you could make effective classes of visualization and tracing/replay visualization for that.
But it is telling these tools don't really exist, and attempts like UML were largely abandoned.
by tonnydourado on 6/4/22, 7:57 PM
Compare that to something like a call graph, or a module dependency diagram. The latter will be more complete, but will convey *much less* information than the former.
This varies with technology: some will be friendlier than others to this kind of tool, and I think the more dynamic, the worse. But even in a very static and consistent language, I would not bet on any tool being better than the brain's parser for a long time.
by rsstack on 6/4/22, 4:18 PM
by renox on 6/4/22, 4:04 PM
I was so happy not being the one doing this useless task..
by irrational on 6/4/22, 8:08 PM
by ben30 on 6/4/22, 2:06 PM
I find creation of a sequence diagram with class instances as columns and method names as arrows can help visualise things.
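A small sketch of that idea, assuming you already have a list of (caller, callee, method) triples from reading the code or from a trace. The names in `CALLS` are hypothetical, and the output is PlantUML sequence-diagram text that still needs a PlantUML renderer to become a picture.

```python
def to_sequence_diagram(calls):
    """Render (caller, callee, method) triples as PlantUML sequence text."""
    lines = ["@startuml"]
    seen = []
    for caller, callee, _ in calls:
        for name in (caller, callee):
            if name not in seen:
                seen.append(name)
    # Declare participants up front so columns appear in first-use order.
    lines += [f"participant {name}" for name in seen]
    lines += [f"{caller} -> {callee}: {method}()" for caller, callee, method in calls]
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical instances and method names, purely for illustration.
CALLS = [
    ("checkout", "cart", "total"),
    ("checkout", "payments", "charge"),
    ("payments", "ledger", "record"),
]

if __name__ == "__main__":
    print(to_sequence_diagram(CALLS))
```

Each instance becomes a column and each method call an arrow, so trimming the `CALLS` list is how you would control how much of the flow the diagram shows.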
by akomtu on 6/5/22, 1:12 AM
In the future mainstream languages will probably have annotations to describe "role" of various things, specifically to enable diagram generators.
by icedchai on 6/4/22, 7:48 PM
Also I have rarely seen diagrams generated from code, the main exception being database ERDs ("reverse engineering.") Usually, those diagrams are also a mess.
Also, I almost forgot to mention: with "Agile", there usually is no design process. We'll just "fix it in the next sprint."
by prakashqwerty on 6/4/22, 3:06 PM
by everythingabili on 6/4/22, 4:23 PM
https://www.google.com/search?q=Prograph+CPX&rlz=1C5CHFA_enG...
You'd have high level classes, and low-level nitty gritty. You could edit your code as it was running (and then continue).
People prefer text (weirdly).
by dr_kiszonka on 6/5/22, 12:19 AM
I keep bookmarking threads like this one, but I haven't found anything useful for me. The closest one was SourceTrail, which is unfortunately not developed anymore.
With so much hiring and onboarding going on, I am surprised there isn't a market (or an offering) for such tools.
by mtoddsmith on 6/4/22, 5:20 PM
NDepend Dependency Graph https://www.youtube.com/watch?v=23fBxM2v22k
by ataylor284_ on 6/4/22, 9:50 PM
That said, diagrams can either be rare, focused, and useful; or common, unfocused, and distracting. Automated processes tend to generate the latter.
by vidanay on 6/4/22, 1:44 PM
by enos_feedler on 6/4/22, 4:55 PM
by lifeisstillgood on 6/4/22, 4:22 PM
Edit: another way of thinking about time is mutability so perhaps functional languages are more amenable to graphing.
by waynesonfire on 6/4/22, 3:30 PM
by DantesKite on 6/4/22, 2:48 PM
by abathur on 6/4/22, 5:32 PM
I've had a related thought/desire percolating... roughly: I wonder what interesting levers we could build if it was normalized (for both toolchains and projects) to create and publish the plaintext relationship graphs in a common easily-reused format.
I'll self-reply to elaborate a bit.
by abrookewood on 6/5/22, 7:25 AM
by la64710 on 6/4/22, 8:14 PM
http://logan.tw/posts/2015/03/10/trace-source-code-with-vim-...
by pshc on 6/4/22, 6:45 PM
Honestly, freehand diagrams are best. You’ll exercise your own understanding of the code base as you draw.
by rmah on 6/4/22, 4:22 PM
by justsomeuser on 6/4/22, 1:58 PM
One reason it is slower is that it is difficult to create a map-like diagram where you zoom in to get greater detail.
by high_byte on 6/4/22, 9:15 PM
dot graphs are popular with many tools but often barely or not interactive at all.
by billconan on 6/4/22, 3:30 PM
I want to see the big picture, what they can generate are direct translations of the code down to the line level.
by ooedemis on 6/4/22, 3:23 PM
by DriftRegion on 6/5/22, 6:31 AM
Sticking to the spirit of the question, I will categorize the tools mentioned in the comments. They solve very different problems, so the "ideal tool" absolutely depends on the task at hand. I think this rough categorization can help to select the right tool for the job.
The first class of tools are drawing tools. These tools aim to aid design and communication. plantUML http://www.plantuml.com/ is best suited to manual use. I use plantUML when sketching out designs. It's great for thinking about and communicating state machines, system architecture, and server-client or multiprocess interactions. The problems solved by plantUML can also be solved by general GUI diagramming software like Visio or LibreOffice Draw. I'll not dawdle on the pros and cons.
A second class of tools are static analysis tools. These are more of a microscope than they are a sketch pad. They operate on existing code and the user input comes as filters or options to get the desired level of detail. The examples here all produce graphviz .dot files:
bazel query for dependency graphs - https://bazel.build/docs/query-how-to
Doxygen C call graphs and struct inheritance graphs - https://www.doxygen.nl/manual/diagrams.html
radare2 generates call graphs from an .elf file - https://reverseengineering.stackexchange.com/a/9120
A DBC visualizer (CAN bus protocol specification file) - https://github.com/driftregion/dbcview (mine)
These tools are great not only for probing the depths of an unfamiliar codebase but also because they give insights to people already familiar with that codebase. For example: statically generated callgraphs showed that some debugging hooks had been left in. They've also shown duplicate code paths.
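To give a feel for how small the core of a class 2 tool can be, here is a sketch that uses Python's standard `ast` module to pull plain-name calls out of source text and print them as Graphviz DOT. It is not how any of the tools listed above work internally; it deliberately ignores method calls, imports, and dynamic dispatch, and the `SAMPLE` source is hypothetical.

```python
import ast


def call_edges(source):
    """Return sorted (caller, callee) pairs for plain-name calls in `source`."""
    edges = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Calls inside nested functions are also attributed to the outer one.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((node.name, inner.func.id))
    return sorted(edges)


def to_dot(edges):
    """Format edges as a Graphviz digraph."""
    lines = ["digraph calls {"]
    lines += [f'    "{a}" -> "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical source to analyse.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

if __name__ == "__main__":
    print(to_dot(call_edges(SAMPLE)))
```

Note that `parse` produces no edges here because `text.split()` is a method call, which is a small example of the detail a static tool either has to resolve or silently drop.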
A third class of tools mentioned in the comments are dynamic analysis tools: profilers, tracers, debuggers. These are the oscilloscopes and signal analyzers of software engineering. These seem to be outside OP's query.
> Why isn't diagram generation automated as part of the build process (UML or otherwise)?
The output of class 1 and 2 tools is used in builds. Doxygen supports plantUML (class 1) and callgraph generation (class 2). Class 1 is obviously manual, and class 2 is also manual for reasons that others have mentioned here: namely adjusting to get the right level of detail. The text output of class 3 tools is often used in CI as a pass-fail indicator.
> Why aren't code visualization tools more popular? The options out there seem outdated.
My anecdotal experience is that these tools are specialized and therefore have a small audience. They solve problems related to design and analysis of software which is a small part of "real world" (sorry) software development.
edit:formatting