from Hacker News

Ask HN: What is the best software to visualize a graph with a billion nodes?

by throwaway425933 on 8/1/24, 6:35 PM with 109 comments

Currently I am using GraphViz, but I am not happy with the quality of the output (it writes a PostScript file).

I want to be able to zoom in and zoom out. The graph has up to 100B nodes and is a directed cyclic graph.

  • by bane on 8/2/24, 4:14 PM

    Visualizing large graphs is a natural desire for people with lots of connected data. But after a fairly small size, there's almost no utility in visualizing graphs. It's much more useful to compute various measures on the graph, and then query the graph using some combination of node/edge values and these computed values. You might subset out the nodes and edges of particular interest if you really want to see them -- or don't visualize at all and just inspect the graph nodes and edges very locally with some kind of tabular data viewer.
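
    A minimal sketch of that "compute measures, then subset" workflow, using networkx purely for illustration (at anything like 100B nodes you would compute the same measures in a distributed or GPU graph framework; the graph and thresholds here are placeholders):

      import networkx as nx

      G = nx.fast_gnp_random_graph(10_000, 0.001, directed=True)  # stand-in for the real graph

      # Compute measures once and store them as node attributes.
      nx.set_node_attributes(G, nx.pagerank(G), "pagerank")
      nx.set_node_attributes(G, dict(G.in_degree()), "in_degree")

      # Query on node values plus computed values to pull out the part of interest...
      interesting = [n for n, d in G.nodes(data=True)
                     if d["pagerank"] > 1e-4 and d["in_degree"] >= 5]

      # ...then inspect or draw only the induced subgraph.
      H = G.subgraph(interesting)
      print(H.number_of_nodes(), H.number_of_edges())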

    It used to be thought that visualizing super large graphs would reveal some kind of macro-scale structural insight, but it turns out that the visual structure ends up becoming dominated by the graph layout algorithm and the need to squash often inherently high-dimensional structures into 2 or 3 dimensions. You end up basically seeing patterns in the artifacts of the algorithm instead of any real structure.

    There's a similar, but unrelated desire to overlay sequenced transaction data (like transportation logs) on a geographical map as a kind of visualization, which also almost never reveals any interesting insights. The better technique is almost always a different abstraction like a sequence diagram with the lanes being aggregated locations.

    There's a bunch of these kinds of pitfalls in visualization that people who work in the space inevitably end up grinding against for a while before realizing it's pointless or there's a better abstraction.

    (source: I used to run an infoviz startup for a few years that dealt with this exact topic)

  • by viraptor on 8/2/24, 12:17 PM

    It really feels like an under-defined task. Do you actually need to see those nodes? At that scale, you never want to render 100B of them. Instead you would need some kind of density aggregation when zoomed out, and LoD-style k-d tree partitioning when zoomed in. That's almost the territory of rendering engines like Unreal's Nanite. You can create your own renderer for data like this, but game engines are likely your closest inspiration. Then again, unless you already have x/y coordinates ready (based on graphviz I'm assuming you don't), even laying out the points will be a very heavy task (the usual iterative force-directed layout would likely take days).
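
    A minimal sketch of the density-aggregation idea, assuming you already have 2D positions for every node (numpy here is an illustrative choice): precompute a pyramid of count grids so that each zoom level renders a fixed number of cells instead of individual nodes.

      import numpy as np

      # x, y: node positions from a precomputed layout (placeholder random data here)
      rng = np.random.default_rng(0)
      x, y = rng.random(1_000_000), rng.random(1_000_000)

      def density_grid(x, y, zoom):
          """Aggregate points into a 2**zoom x 2**zoom grid of counts."""
          bins = 2 ** zoom
          counts, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
          return counts  # render each cell's count as brightness/size when zoomed out

      pyramid = {z: density_grid(x, y, z) for z in range(9)}  # LoD pyramid, zoom 0..8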

    But if you were my coworker, I'd really press on why you want the visualisation and whether you can get your answers some other way - and whether you can create aggregates of your data that reduce it to thousands of groups instead. Your data is a minimum of ~800GB even if the graph is a single line (a position plus a 64-bit value encoding each edge, no labels), so you're not doing anything real-time with it anyway.

  • by david_p on 8/2/24, 6:14 PM

    As many people already commented, no one actually visualizes graphs of that size at once.

    Context: I’m the CTO of a graph visualization company, I’ve been doing this for 10+ years.

    Here are my recommendations:

    - if you can generate a projection of your graph into millions of nodes, you might be able to get somewhere with Three.js, which is a JS library to generate WebGL graphics. The library is close enough to the metal to allow you to build something large and fast.

    - if you can get the data below 1M nodes, your best shot is Ogma (spoiler: my company made it). It scales well thanks to WebGL and allows for complex interactions. It can run a graph layout on the GPU in your browser. See https://doc.linkurious.com/ogma/latest/examples/layout-force...

    - If you want to keep your billions of nodes but are OK with not seeing the whole graph at once, my company builds Linkurious. It is an advanced exploration interface for a graph stored in Neo4j (or Amazon Neptune). We believe that local exploration up to 10k nodes on screen is enough, as long as you can run graph queries and full-text search queries against the whole graph with little friction. See https://doc.linkurious.com/user-manual/latest/running-querie...

  • by CuriouslyC on 8/2/24, 12:25 PM

    You don't. Generate a hierarchical clustering of the data, then collapse nodes into groups to get under a data-set-size threshold at any given view distance. That gives you full interaction and the ability to show mouseover info on groups, while still being able to zoom in and interact with individual nodes if you want.
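
    A minimal sketch of that collapse step (scipy's hierarchical clustering on node positions is just one illustrative way to get the groups; at billions of nodes you would need a scalable clustering or community-detection method instead):

      import numpy as np
      import networkx as nx
      from scipy.cluster.hierarchy import linkage, fcluster

      G = nx.random_geometric_graph(2_000, 0.05)      # stand-in graph with 'pos' attributes
      pos = np.array([G.nodes[n]["pos"] for n in G])

      # Cut the hierarchy so that at most 50 groups remain at this view distance.
      labels = fcluster(linkage(pos, method="ward"), t=50, criterion="maxclust")

      # Collapse: one node per cluster, edge weight = number of cross-cluster edges.
      coarse = nx.Graph()
      for n, lab in zip(G, labels):
          if lab not in coarse:
              coarse.add_node(lab, size=0)
          coarse.nodes[lab]["size"] += 1
      for u, v in G.edges():
          cu, cv = labels[u], labels[v]
          if cu != cv:
              if coarse.has_edge(cu, cv):
                  coarse[cu][cv]["weight"] += 1
              else:
                  coarse.add_edge(cu, cv, weight=1)
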
  • by shoo on 8/2/24, 8:42 AM

    what decision / downstream process is going to consume the 1B node graph render? is producing a render really necessary for that decision, or is rendering the graph waste?

    is there a way you can subsample or simplify or approximate the graph that'd be good enough?

    in some domains, certain problems that are defined on graphs can be simplified by pre-processing the graph, to reduce the problem to a simpler problem. e.g. maybe trees can be contracted to points, or chains can be replaced with a single edge, or so on. these tricks are sometimes necessary to get scalable solution approaches in industrial applications of optimisation / OR methods to solve problems defined on graphs. a solution recovered on the simplified graph can be "trivially" extended back to the full original graph, given enough post-processing logic. if such graph simplifications make sense for your domain, can you preprocess and simplify your input graph until you hit a fixed point, then visualise the simplified result? (maybe it contracts to 1 node!)
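
    A minimal sketch of one such simplification - splicing out chain nodes (one predecessor, one successor) and repeating until a fixed point - using networkx for illustration; whether this preserves what matters is entirely domain-specific:

      import networkx as nx

      def contract_chains(G):
          """Repeatedly splice out nodes with exactly one predecessor and one successor."""
          G = G.copy()
          changed = True
          while changed:                            # iterate to a fixed point
              changed = False
              for n in list(G.nodes):
                  preds, succs = list(G.predecessors(n)), list(G.successors(n))
                  if len(preds) == 1 and len(succs) == 1 and preds[0] != n != succs[0]:
                      G.add_edge(preds[0], succs[0])
                      G.remove_node(n)
                      changed = True
          return G

      G = nx.path_graph(1_000, create_using=nx.DiGraph)
      print(contract_chains(G).number_of_nodes())   # the 1000-node chain collapses to 2 nodes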

  • by surrTurr on 8/2/24, 3:02 PM

    Cytoscape JS[1] with canvas rendering. It probably won't be able to do a billion nodes, but the last time I compared graph rendering libraries it was the best one in terms of performance/customizability. If you need even more performance, there's VivaGraphJS[2], which uses WebGL to render.

    If you want other resources, I also have a list of graph-related libraries (visualizations etc.) on GitHub[3].

    [1]: https://js.cytoscape.org/ [2]: https://github.com/anvaka/VivaGraphJS [3]: https://github.com/stars/AlexW00/lists/graph-stuff

  • by simpaticoder on 8/2/24, 1:13 PM

    Hilbert curves (or similar) are often used for graphing billions of nodes[1]. However, this will not by default show the relationships between nodes in a graph. Depending on your data, you may be able to write a function that maps from your edge list to a node index that hints at proximity.
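
    A minimal sketch of mapping a node index to a 2D position along a Hilbert curve (this is the classic iterative distance-to-coordinates conversion; assigning indices so that related nodes end up near each other is the hard, domain-specific part):

      def hilbert_d2xy(order, d):
          """Map distance d along a Hilbert curve with side 2**order to (x, y)."""
          n = 1 << order
          x = y = 0
          t = d
          s = 1
          while s < n:
              rx = 1 & (t // 2)
              ry = 1 & (t ^ rx)
              if ry == 0:                  # rotate the quadrant
                  if rx == 1:
                      x, y = s - 1 - x, s - 1 - y
                  x, y = y, x
              x += s * rx
              y += s * ry
              t //= 4
              s *= 2
          return x, y

      # e.g. lay ~1B (2**30) node indices out on a 32768 x 32768 grid
      print(hilbert_d2xy(15, 123_456_789))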

    Note that visualizations are limited by human perception to ~10000 elements, more usefully 1000 elements. You might try a force directed graph, perhaps a hierarchical variant wherein nodes can contain sub-graphs. Unless you have obvious root nodes, this variant would be interesting in that the user could start from an arbitrary set of nodes, giving different insights depending on their starting point.

    1 - An excerpt from "Harder Drive", a rather silly implementation of a Unix block device using ping latency to any host that will let him. He visualizes the full IPv4 address space as a Hilbert curve at this timestamp: https://youtu.be/JcJSW7Rprio?si=0AlyMgaZjH7dmh5y&t=363

  • by Xcelerate on 8/2/24, 2:51 PM

    I might be prematurely classifying your question as an instance of the XY problem, but I worked at a company that tried to create something similar — a graph visualization system that could handle 100B nodes as part of our core product and... well... I would caution you not to do so if your purpose is something along those lines.

    There's almost never a use case where a customer wants to see a gigantic graph. Or researchers. Or family members for that matter. People's brains just don't seem to mesh with giant graphs. Tiny graphs, sure. Sub-graphs that display relevant information, sure. The whole thing? Nah. Unless it's for an art project, in which case giant graphs can be pretty cool looking.

  • by IanCal on 8/2/24, 12:15 PM

    Datashader is good for rendering large amounts of data, I'd start with that

    https://datashader.org/
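
    A minimal sketch of the datashader approach, assuming node positions already sit in a DataFrame with x/y columns (the column names and sizes here are illustrative):

      import numpy as np
      import pandas as pd
      import datashader as ds
      import datashader.transfer_functions as tf

      # placeholder positions; in practice, load your precomputed layout
      rng = np.random.default_rng(0)
      df = pd.DataFrame({"x": rng.normal(size=10_000_000),
                         "y": rng.normal(size=10_000_000)})

      canvas = ds.Canvas(plot_width=1200, plot_height=1200)
      agg = canvas.points(df, "x", "y")     # per-pixel counts
      img = tf.shade(agg, how="log")        # log-scaled density image
      img.to_pil().save("graph_density.png")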

  • by oersted on 8/2/24, 11:54 AM

    It is somewhat old-school, but Gephi is by far the best graph visualization tool I've used that stays robust and usable at such scales (at least ~10M, but possibly a lot more).
  • by _flux on 8/2/24, 1:11 PM

    I'm also looking for a graph viewing tool, but my wishlist is different (not all of them are hard requirements):

    - Deal with 100k node graphs, preferably larger

    - Interactive filtering tools, e.g. filtering by node or edge data, transitive closures, highlighting paths matching a condition. Preferably filtering would result in minimal re-layout of the graph.

    - Does not need very sophisticated layout algorithms if hiding or unranking nodes interactively is easy. E.g. centering on a node could lay out the other nodes using the selected node as the root.

    - Ability to feed live data externally, add/remove nodes and edges programmatically

    - Clusters (nodes would indicate which clusters they belong to)

    I'm actually thinking of writing that tool some day, but it would of course be nicer if it already existed ;). I'm thinking of applications like studying TLA+ state traces, visualizing messaging graphs or debug data in real time, or visualizing the dynamic state of a network.

    Also if you have tips on applicable Rust crates to help creating that, those are appreciated!

  • by zdimension on 8/3/24, 10:42 PM

    I had this question a few years back while working on a social network graph project and trying to render a multi-million-node graph. I tried Ogma and it worked quite well, but it became too slow when approaching a million nodes. I ended up writing my own renderer in C++ and then Rust. Code here: https://github.com/zdimension/graphrust

    Tested it up to 5M nodes, renders above 60fps on my laptop's iGPU and on my Pixel 7 Pro. Turns out, drawing lots of points using shaders is fast.

    Though, like everybody else here said, you probably don't want to draw that many nodes. Create a lower-LoD version of the graph and render that instead.

  • by simonsarris on 8/2/24, 1:51 PM

    As someone who's made graphing libraries for over a decade: Are you sure you want to visualize 1 billion nodes? What's the essential thing you're trying to see?

    Visualizations are great at helping humans parse data, but usually they work best at human scales. A billion nodes is at best looking at clouds, rather than nodes, which can be represented otherwise.

  • by michaelt on 8/2/24, 1:08 PM

    You can visualise a graph with 9 billion nodes on https://www.openstreetmap.org :)

    You could copy their design if you know how you want to project your nodes into 2D: essentially, divide the visualisation into a very large number of tiles, generated at 18 different zoom levels, and have the 'slippy map' viewer load the tiles corresponding to the chosen field of view.
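
    A minimal sketch of the tile scheme in normalized coordinates, assuming your layout has been scaled into the unit square (OSM's lat/lon projection is just one special case of this):

      def tiles_for_view(x0, y0, x1, y1, zoom):
          """Tile indices (zoom/x/y) covering the view rectangle at a given zoom level."""
          n = 2 ** zoom
          tx0, tx1 = int(x0 * n), min(int(x1 * n), n - 1)
          ty0, ty1 = int(y0 * n), min(int(y1 * n), n - 1)
          return [(zoom, tx, ty) for tx in range(tx0, tx1 + 1)
                                 for ty in range(ty0, ty1 + 1)]

      # the viewer requests only the pre-rendered tiles inside its field of view
      print(tiles_for_view(0.250, 0.250, 0.253, 0.252, zoom=10))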

    Then run a PostGIS database alongside it, letting you query for all the nodes in a given rectangle - for example, if you want to find the ID number of a given node.

  • by sebstefan on 8/2/24, 12:45 PM

    Most graphs of social networks done over at /r/dataisbeautiful seem to use Gephi.org and Kumu
  • by InGoldAndGreen on 8/6/24, 5:52 PM

    Oh god I ran into this issue! Fewer nodes, but still.

    I created an HTML page that used vis-network to create a force-directed node graph. I'd then just open it up and wait for it to settle.

    The initial code is here, you should be able to dump it into an LLM to explain: https://github.com/HebeHH/skyrim-alchemy/blob/master/HTMLGra...

    I later used d3 to do pretty much the same thing, but with a much larger graph (still only 100,000 nodes). That was pretty fragile though, so I added an `export to svg` button so you could load the graph, wait for it to settle, and then download the full thing. This kept good quality for zooming in and out.

    However, my node graphs were both incredibly messy, with many, many connections going everywhere. That meant I couldn't find a library that could work out how to lay them out properly the first time, and I needed the force-directed nature to spread them out. For your case of 1 billion nodes, force-directed may not be the way to go.

  • by williamdclt on 8/2/24, 12:25 PM

    Repeating what others said here: I doubt anyone actually needs to see 1B (or 100B) nodes to make whatever decision they need to make. They probably need to see the X nodes that matter?

    If you're fully "zoomed out", is seeing 1B individual nodes the most useful representation? Wouldn't some form of clustering be more useful? Same at intermediate levels.

    D3 has all sorts of graphing tooling and is very powerful. It likely wouldn't handle 1B nodes (even if it did, your browser couldn't), but it has the primitives to build graphs.

  • by snickerd00dle on 8/2/24, 5:31 PM

    At that size what you're actually looking for is a game engine with a particle system.
  • by rcarmo on 8/2/24, 12:34 PM

    I'd love to see a good solution for this. And it's not just the nodes, it's also the connections between them: https://taoofmac.com/static/graph
  • by ARothfusz on 8/2/24, 12:34 PM

    You could try a hypertree https://en.wikipedia.org/wiki/Hyperbolic_tree but that's usually for acyclic data.
  • by marcpicaud on 8/2/24, 12:23 PM

    Sigma.js is pretty good at rendering a ton of nodes and edges. I haven't tried it with a billion nodes though. https://www.sigmajs.org/
  • by mro_name on 8/2/24, 12:02 PM

    Out of curiosity - what wisdom do you intend to draw from visualising the relations of single gut bacteria? Or is it grains of sand in the sea? How many of them will you zoom into? Maybe clustering would make things feasible.
  • by macinjosh on 8/2/24, 12:24 PM

    I haven’t tried that many, but GraphPU was able to render 100s of millions for me in real time.

    https://github.com/latentcat/graphpu

  • by varjag on 8/2/24, 12:27 PM

    I remember Tulip could handle pretty huge ones, though no idea if it can manage billions.

    https://tulip.labri.fr/site/

  • by egberts1 on 8/3/24, 12:26 PM

    I was working on a node graph for the nftables Bison parser.

    A blog post that covers the failures of SVG viewers on large SVGs with 10,000+ nodes:

    https://egbert.net/blog/articles/comparison-svg-viewers-larg...

    More on https://egbert.net/blog/tags/graphviz.html

  • by throwaway425933 on 8/2/24, 8:58 PM

    Thanks to everybody who replied. I will scale down my ambitions for now: how can I visualize a one-billion-node graph? Let's say I want to visualize the transistors in a modern AI chip (around 1B nodes). My original use case was to set a color on various components on the chip and visualize them. For example, all flops would have one color, all buffers another color, and then I wanted to visualize their distribution on the semiconductor die.
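
    For that use case (one color per cell type, looking at the distribution over the die), datashader's categorical aggregation is a plausible fit. A minimal sketch, assuming you can export per-cell x/y placement plus a type column; the column names and color key are illustrative:

      import numpy as np
      import pandas as pd
      import datashader as ds
      import datashader.transfer_functions as tf

      # placeholder placement data; in practice export x/y and cell type from the EDA tool
      rng = np.random.default_rng(0)
      df = pd.DataFrame({
          "x": rng.random(5_000_000),
          "y": rng.random(5_000_000),
          "kind": pd.Categorical(rng.choice(["flop", "buffer", "other"], size=5_000_000)),
      })

      canvas = ds.Canvas(plot_width=2000, plot_height=2000)
      agg = canvas.points(df, "x", "y", ds.count_cat("kind"))  # per-pixel counts per type
      img = tf.shade(agg, color_key={"flop": "red", "buffer": "blue", "other": "gray"})
      img.to_pil().save("die_distribution.png")
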
  • by withinboredom on 8/3/24, 8:10 AM

    A couple of years ago, I had a similar issue. I don't have the code any more, but I wrote a converter from my output to a 3D model in a weekend and then threw it into Unreal. Then I put on my VR goggles and walked around the graph. It was much easier to deal with in three dimensions instead of two.

    From there I could write better visualizations. I got laid off before the project was completed, though.

  • by zamalek on 8/2/24, 5:39 PM

    Try collapsing cycles into single nodes. In my experience, cycles are extremely low-entropy. Those cycle nodes can then be explored on separate diagrams/pages. Explore more dimensions that allow you to collapse nodes. You effectively want to turn your graph into a data cube.
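
    A minimal sketch of the cycle-collapsing step via strongly connected components (networkx's condensation turns every set of mutually reachable nodes into a single node, leaving a DAG; the stand-in graph is illustrative):

      import networkx as nx

      G = nx.gnp_random_graph(10_000, 0.0005, directed=True)   # stand-in directed graph

      C = nx.condensation(G)   # one node per strongly connected component; the result is a DAG
      print(G.number_of_nodes(), "->", C.number_of_nodes())

      # each condensed node keeps its member set, so a cycle can get its own diagram/page
      members = C.nodes[0]["members"]
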
  • by FL33TW00D on 8/2/24, 12:13 PM

    It would be a great thing for open source if someone improved the performance of dot/graphviz!
  • by insomniacity on 8/2/24, 11:57 AM

    Tacking on a related question - what software should one use to interactively create/update/see a small graph?

    Thinking specifically about a graph of knowledge, so will be an iterative process.

    Just looking for anything more than a text editor really!

  • by IshKebab on 8/2/24, 1:02 PM

    100B is going to require something custom, and tbh I'd be surprised if you can get any useful information from that. But try Gephi. It can at least go into the millions of nodes. Not sure about billions.
  • by atemerev on 8/2/24, 1:25 PM

    For billions of nodes, there are two options: Graphistry (which may top out below that; 100M is OK) and Pajek, which is weird but can handle billions of nodes.

    Neo4j, cytoscape, etc will not work.

  • by FrustratedMonky on 8/2/24, 11:49 AM

    Forget a billion.

    I'm finding even 10's of thousands can be difficult.

    Just generally, is there a list of visualization products that is broken down by how many nodes they can handle?

  • by bee_rider on 8/2/24, 5:25 PM

    Maybe you could turn it into a sparse matrix, hit it with a couple different reorderings, do some matvecs, and see if that gives you any insight into it?
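
    A minimal sketch of that idea with scipy (reverse Cuthill-McKee is one standard reordering; a spy plot of the permuted adjacency matrix can expose band or block structure that a node-link drawing hides; the random edge list is a placeholder):

      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.csgraph import reverse_cuthill_mckee
      import matplotlib.pyplot as plt

      # adjacency matrix built from an edge list
      rng = np.random.default_rng(0)
      n, m = 100_000, 500_000
      rows, cols = rng.integers(0, n, m), rng.integers(0, n, m)
      A = sp.csr_matrix((np.ones(m), (rows, cols)), shape=(n, n))

      perm = reverse_cuthill_mckee(A, symmetric_mode=False)   # bandwidth-reducing ordering
      B = A[perm][:, perm]

      plt.spy(B, markersize=0.01)
      plt.savefig("adjacency_spy.png", dpi=300)
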
  • by technologia on 8/2/24, 12:52 PM

    There was a product I personally liked called Graphistry, though it isn't free per se; its founder, @lmeyerov, is brilliant in this space.
  • by bjourne on 8/2/24, 11:21 PM

    The amount of information in a graph that big is on the order of 10^21. You can't meaningfully "visualize" it.
  • by gkorland on 8/2/24, 7:17 AM

    Do you just hold this number of nodes in the database, or do you also need to visualize them all in one view?
  • by hhh on 8/2/24, 1:22 PM

    Graphistry
  • by Flam on 8/2/24, 12:15 PM

    Deck.gl’s PointCloudLayer
  • by tinsane on 8/2/24, 3:22 PM

    Dude is casually asking about software to visualize a graph of a size comparable to the whole internet...
  • by TZubiri on 8/2/24, 1:04 PM

    ArangoDB or Neo4j
  • by potatoicecoffee on 8/2/24, 1:08 PM

    Unformatted csv and you scroll down through it real fast
  • by bpanon on 8/3/24, 8:45 PM

    Statistics
  • by rockysharma on 8/2/24, 5:46 AM

    try dGraph or Aerospike
  • by thomassmith65 on 8/2/24, 3:21 PM

    Your Google Takeout? This sort of thing is why I left /s