by throwaway425933 on 8/1/24, 6:35 PM with 109 comments
I want to be able to zoom in and zoom out. The graph has up to 100B nodes and is a directed cyclic graph.
by bane on 8/2/24, 4:14 PM
It used to be thought that visualizing super large graphs would reveal some kind of macro-scale structural insight, but it turns out that the visual structure ends up becoming dominated by the graph layout algorithm and the need to squash often inherently high-dimensional structures into 2 or 3 dimensions. You end up basically seeing patterns in the artifacts of the algorithm instead of any real structure.
There's a similar but unrelated desire to overlay sequenced transaction data (like transportation logs) on a geographical map as a kind of visualization, which also almost never reveals any interesting insights. The better technique is almost always a different abstraction, like a sequence diagram with the lanes being aggregated locations.
There's a bunch of these kinds of pitfalls in visualization that people who work in the space inevitably end up grinding against for a while before realizing it's pointless or there's a better abstraction.
(source: I used to run an infoviz startup for a few years that dealt with this exact topic)
by viraptor on 8/2/24, 12:17 PM
But if you were my coworker, I'd really press on why you want the visualisation and whether you can get your answers some other way. And whether you can create aggregates of your data that reduce it to thousands of groups instead. Your data is a minimum of ~800GB even if the graph is a single line (position plus a 64-bit value encoding each edge, no labels: ~100B edges × 8 bytes), so you're not doing anything real-time with it anyway.
by david_p on 8/2/24, 6:14 PM
Context: I’m the CTO of a GraphViz company, I’ve been doing this for 10+ years.
Here are my recommendations:
- if you can generate a projection of your graph down to millions of nodes, you might be able to get somewhere with Three.js, a JS library for generating WebGL graphics. The library is close enough to the metal to let you build something large and fast (see the sketch after this list).
- if you can get the data below 1M nodes, your best shot is Ogma (spoiler: my company made it). It scales well thanks to WebGL and allows for complex interactions. It can run a graph layout on the GPU in your browser. See https://doc.linkurious.com/ogma/latest/examples/layout-force...
- If you want to keep your billions of nodes but are OK with not seeing the whole graph at once, my company builds Linkurious. It is an advanced exploration interface for a graph stored in Neo4j (or Amazon Neptune). We believe that local exploration up to 10k nodes on screen is enough, as long as you can run graph queries and full-text search queries against the whole graph with little friction. See https://doc.linkurious.com/user-manual/latest/running-querie...
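A minimal sketch of that Three.js approach, assuming the projection step has already reduced the graph to a flat array of positions (the node count and random coordinates below are placeholders for real projected data):

    import * as THREE from 'three';

    // One million projected nodes; positions would come from your projection step.
    const NODE_COUNT = 1_000_000;
    const positions = new Float32Array(NODE_COUNT * 3);
    for (let i = 0; i < NODE_COUNT; i++) {
      positions[3 * i] = Math.random() * 100 - 50;     // x
      positions[3 * i + 1] = Math.random() * 100 - 50; // y
      positions[3 * i + 2] = 0;                        // flat 2D projection
    }

    const geometry = new THREE.BufferGeometry();
    geometry.setAttribute('position', new THREE.BufferAttribute(positions, 3));

    // A single draw call renders every node as a point sprite.
    const material = new THREE.PointsMaterial({ size: 0.05, color: 0x3388ff });
    const scene = new THREE.Scene();
    scene.add(new THREE.Points(geometry, material));

    const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 1000);
    camera.position.z = 100;

    const renderer = new THREE.WebGLRenderer({ antialias: true });
    renderer.setSize(innerWidth, innerHeight);
    document.body.appendChild(renderer.domElement);
    renderer.setAnimationLoop(() => renderer.render(scene, camera));

Zooming in and out then amounts to moving the camera, which stays cheap because the geometry lives on the GPU.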
by shoo on 8/2/24, 8:42 AM
is there a way you can subsample or simplify or approximate the graph that'd be good enough?
in some domains, certain problems that are defined on graphs can be simplified by pre-processing the graph, to reduce the problem to a simpler problem. e.g. maybe trees can be contracted to points, or chains can be replaced with a single edge, or so on. these tricks are sometimes necessary to get scalable solution approaches in industrial applications of optimisation / OR methods to solve problems defined on graphs. a solution recovered on the simplified graph can be "trivially" extended back to the full original graph, given enough post-processing logic. if such graph simplifications make sense for your domain, can you preprocess and simplify your input graph until you hit a fixed point, then visualise the simplified result? (maybe it contracts to 1 node!)
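As a concrete illustration of one such trick, here is a hedged sketch (the representation and names are made up for the example) that contracts chains of one-in/one-out nodes into single edges, repeating until a fixed point:

    // Hypothetical representation: node id -> set of successor ids.
    // Assumes every node appears as a key (sinks map to an empty set).
    type Graph = Map<number, Set<number>>;

    function contractChains(succ: Graph): Graph {
      // Build a predecessor index once up front.
      const pred: Graph = new Map();
      for (const u of succ.keys()) pred.set(u, new Set());
      for (const [u, outs] of succ) {
        for (const v of outs) pred.get(v)!.add(u);
      }
      let changed = true;
      while (changed) { // iterate until a fixed point
        changed = false;
        for (const [v, outs] of succ) {
          const ins = pred.get(v)!;
          if (ins.size !== 1 || outs.size !== 1) continue;
          const u = [...ins][0];
          const w = [...outs][0];
          if (u === v || w === v) continue; // leave self-loops alone
          // Replace u -> v -> w with u -> w, then drop v.
          // (A real implementation would record v here so a solution on the
          // simplified graph can be expanded back onto the original.)
          succ.get(u)!.delete(v);
          succ.get(u)!.add(w);
          pred.get(w)!.delete(v);
          pred.get(w)!.add(u);
          succ.delete(v);
          pred.delete(v);
          changed = true;
        }
      }
      return succ;
    }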
by surrTurr on 8/2/24, 3:02 PM
If you want other resources, I also keep a list of graph-related libraries (visualizations etc.) on GitHub[3].
[1]: https://js.cytoscape.org/
[2]: https://github.com/anvaka/VivaGraphJS
[3]: https://github.com/stars/AlexW00/lists/graph-stuff
by simpaticoder on 8/2/24, 1:13 PM
Note that visualizations are limited by human perception to ~10,000 elements, more usefully ~1,000. You might try a force-directed graph, perhaps a hierarchical variant wherein nodes can contain sub-graphs. Unless you have obvious root nodes, this variant would be interesting in that the user could start from an arbitrary set of nodes, getting different insights depending on their starting point.
1 - An excerpt from "Harder Drive", a rather silly implementation of a Unix block device that stores its data in pings kept in flight to any host that will answer. He visualizes the full IPv4 address space as a Hilbert curve at this offset: https://youtu.be/JcJSW7Rprio?si=0AlyMgaZjH7dmh5y&t=363
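For reference, the standard Hilbert-curve index-to-coordinates conversion that makes layouts like that work (a TypeScript port of the well-known algorithm; mapping a 32-bit IPv4 address at order 16 is just one possible use):

    // Convert a 1-D index d along a Hilbert curve of the given order
    // (grid side length 2^order) into 2-D (x, y) coordinates.
    function hilbertD2XY(order: number, d: number): [number, number] {
      const n = 1 << order;
      let x = 0, y = 0, t = d;
      for (let s = 1; s < n; s *= 2) {
        const rx = Math.floor(t / 2) & 1; // bit 1 of t
        const ry = (t & 1) ^ rx;          // bit 0 of t, xor rx
        if (ry === 0) { // rotate the quadrant to keep the curve continuous
          if (rx === 1) {
            x = s - 1 - x;
            y = s - 1 - y;
          }
          [x, y] = [y, x];
        }
        x += s * rx;
        y += s * ry;
        t = Math.floor(t / 4);
      }
      return [x, y];
    }

    // e.g. place a 32-bit IPv4 address on a 65536 x 65536 grid:
    // const [px, py] = hilbertD2XY(16, addressAsUint32);

Nearby indices land at nearby pixels, which is why adjacent address blocks show up as contiguous patches.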
by Xcelerate on 8/2/24, 2:51 PM
There's almost never a use case where a customer wants to see a gigantic graph. Or researchers. Or family members for that matter. People's brains just don't seem to mesh with giant graphs. Tiny graphs, sure. Sub-graphs that display relevant information, sure. The whole thing? Nah. Unless it's for an art project, in which case giant graphs can be pretty cool looking.
by _flux on 8/2/24, 1:11 PM
- Deal with 100k node graphs, preferably larger
- Interactive filtering tools, e.g. filtering by node or edge data, transitive closures, highlighting paths matching a condition. Preferably, filtering would require only minimal re-layout of the graph.
- Does not need a very sophisticated layout algorithm, if hiding or unranking nodes interactively is easy. E.g. centering on a node could lay out the other nodes using the selected node as the root.
- Ability to feed live data externally, add/remove nodes and edges programmatically
- Clusters (nodes would declare which clusters they belong to)
I'm actually thinking of writing that tool some day, but it would of course be nicer if it already existed ;). I'm thinking applications like studying TLA+ state traces, visualizing messaging graphs or debug data in real time, visualizing the dynamic state of a network.
Also, if you have tips on applicable Rust crates to help create that, those are appreciated!
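To make the wishlist concrete, here is one possible shape for the live-update part of such a tool's API (every name here is hypothetical; TypeScript is used only to match the other sketches in this thread):

    // Hypothetical event-driven surface for feeding the tool live data.
    interface NodeAttrs { label?: string; cluster?: string; [k: string]: unknown; }

    type GraphEvent =
      | { kind: 'addNode'; id: string; attrs?: NodeAttrs }
      | { kind: 'removeNode'; id: string }
      | { kind: 'addEdge'; from: string; to: string }
      | { kind: 'removeEdge'; from: string; to: string };

    interface GraphView {
      apply(events: GraphEvent[]): void;                 // batch live updates
      filter(keep: (attrs: NodeAttrs) => boolean): void; // hide non-matching nodes
      centerOn(id: string): void; // re-root the layout at the selected node
    }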
by zdimension on 8/3/24, 10:42 PM
Tested it up to 5M nodes, renders above 60fps on my laptop's iGPU and on my Pixel 7 Pro. Turns out, drawing lots of points using shaders is fast.
Though, as everybody else here said, you probably don't want to draw that many nodes. Create a lower-LoD version of the graph and render that instead.
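One simple way to build such a lower-LoD version (a sketch of grid-based aggregation in general, not of how the renderer above does it): bin nodes into coarse cells and draw one point per occupied cell, weighted by how many nodes it absorbed.

    interface Point { x: number; y: number; }

    // Collapse nodes into grid cells; each output point is a cell centroid
    // carrying the number of original nodes it represents.
    function gridLod(nodes: Point[], cellSize: number) {
      const cells = new Map<string, { sx: number; sy: number; count: number }>();
      for (const { x, y } of nodes) {
        const key = `${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;
        const cell = cells.get(key) ?? { sx: 0, sy: 0, count: 0 };
        cell.sx += x; cell.sy += y; cell.count += 1;
        cells.set(key, cell);
      }
      return [...cells.values()].map(c => ({
        x: c.sx / c.count, y: c.sy / c.count, weight: c.count,
      }));
    }

    // Use a larger cellSize when zoomed out, a smaller one when zoomed in.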
by simonsarris on 8/2/24, 1:51 PM
Visualizations are great at helping humans parse data, but they usually work best at human scales. With a billion nodes you are at best looking at clouds rather than nodes, and clouds can be represented in other ways.
by michaelt on 8/2/24, 1:08 PM
You could copy the design of slippy-map services like OpenStreetMap, if you know how you want to project your nodes into 2D: divide the visualisation into a very large number of tiles, generated at 18 different zoom levels, and have the 'slippy map' viewer load the tiles corresponding to the chosen field of view.
Then run a PostGIS database alongside it, letting you query for all the nodes in a given rectangle, for example to find the ID number of a particular node.
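For reference, the standard slippy-map tile arithmetic (the usual OpenStreetMap formula; treating projected node coordinates as lon/lat here is purely illustrative):

    // Map a longitude/latitude to tile coordinates at a given zoom level.
    function lonLatToTile(lon: number, lat: number, zoom: number) {
      const n = 2 ** zoom; // the world is an n x n grid of tiles at this zoom
      const x = Math.floor(((lon + 180) / 360) * n);
      const latRad = (lat * Math.PI) / 180;
      const y = Math.floor(
        ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
      );
      return { x, y };
    }

    // The viewer requests every tile intersecting the viewport; the PostGIS
    // side answers point-in-rectangle queries for hit-testing single nodes.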
by InGoldAndGreen on 8/6/24, 5:52 PM
I created an HTML page that used vis-network to create a force-directed nodegraph. I'd then just open it up and wait for it to settle.
The initial code is here; you should be able to dump it into an LLM to have it explained: https://github.com/HebeHH/skyrim-alchemy/blob/master/HTMLGra...
I later used d3 to do pretty much the same thing, but with a much larger graph (still only 100,000 nodes). That was pretty fragile though, so I added an `export to svg` button so you could load the graph, wait for it to settle, and then download the full thing. This kept good quality for zooming in and out.
However, my nodegraphs were both incredibly messy, with many, many connections going everywhere. That meant I couldn't find a library that could work out how to lay them out properly the first time, and I needed the force-directed behaviour to spread them out. For your case of 1 billion nodes, force-directed may not be the way to go.
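For anyone wanting to try the d3 route, a minimal force-directed sketch against d3-force's documented API (the three-node dataset is a stand-in; real data would be loaded from elsewhere):

    import * as d3 from 'd3';

    const nodes: any[] = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
    const links: any[] = [{ source: 'a', target: 'b' }, { source: 'b', target: 'c' }];

    const svg = d3.select('body').append('svg').attr('width', 800).attr('height', 600);
    const link = svg.selectAll('line').data(links).join('line').attr('stroke', '#999');
    const node = svg.selectAll('circle').data(nodes).join('circle')
      .attr('r', 5).attr('fill', '#38f');

    // Run the simulation, redrawing positions on every tick until it settles.
    d3.forceSimulation(nodes)
      .force('link', d3.forceLink(links).id((d: any) => d.id))
      .force('charge', d3.forceManyBody().strength(-50))
      .force('center', d3.forceCenter(400, 300))
      .on('tick', () => {
        link.attr('x1', (d: any) => d.source.x).attr('y1', (d: any) => d.source.y)
            .attr('x2', (d: any) => d.target.x).attr('y2', (d: any) => d.target.y);
        node.attr('cx', (d: any) => d.x).attr('cy', (d: any) => d.y);
      });

Once it settles, serializing the SVG element gives the same kind of zoomable export described above.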
by williamdclt on 8/2/24, 12:25 PM
If you're fully "zoomed out", is seeing 1B individual nodes the most useful representation? Wouldn't some form of clustering be more useful? Same at intermediate levels.
D3 has all sorts of graphing tooling and is very powerful. It likely wouldn't handle 1B nodes (and even if it could, your browser couldn't), but it has the primitives to build graph visualisations.
by egberts1 on 8/3/24, 12:26 PM
A blog post that covers the failure modes of various SVG viewers on large graphs of 10,000+ nodes.
https://egbert.net/blog/articles/comparison-svg-viewers-larg...
by withinboredom on 8/3/24, 8:10 AM
From there I could write better visualizations. I got laid off before the project was completed, though.
by insomniacity on 8/2/24, 11:57 AM
I'm thinking specifically about a graph of knowledge, so it will be an iterative process.
Just looking for anything more than a text editor really!
by atemerev on 8/2/24, 1:25 PM
Neo4j, Cytoscape, etc. will not work.
by FrustratedMonky on 8/2/24, 11:49 AM
I'm finding that even tens of thousands can be difficult.
Just generally, is there a list of visualization products broken down by how many nodes each can handle?