from Hacker News

Sparse Matrices (2019)

by tremguy on 1/1/21, 6:11 AM with 19 comments

  • by Const-me on 1/1/21, 10:40 AM

    Another useful representation for them is B+ tree https://en.wikipedia.org/wiki/B%2B_tree

    Can be a single tree for the matrix where keys are tuples of two integers. Or one tree per row, where keys are single integers, column index.

    Unlike CSR, inserting elements is Log(n). RAM overhead is larger than CSR but still reasonable, much smaller than hash maps or red-black trees would be. Similar to CSR, reading rows is mostly sequential RAM reads i.e. pretty fast.

  • by layoutIfNeeded on 1/1/21, 2:38 PM

    “If the ratio of Number of Non-Zero (NNZ) elements to the size is less than 0.5, the matrix is sparse. While this is the mathematical definition, I will be using the term sparse for matrices with only NNZ elements and dense for matrices with all elements.”

    No, this is not the mathematical definition. An n-by-n matrix is usually considered sparse if the number of nonzero elements is O(n). Which means that the ratio of nonzero elements goes to zero as the matrix grows.

  • by mihevc on 1/1/21, 9:33 AM

    Nice post! For completeness: Apache Arrow is also adding sparse tensors in c++ [1] and wrapping them with python [2] (although the documentation for Python might be a bit lacking at the moment).

    [1] https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=...

    [2] https://github.com/apache/arrow/blob/master/python/pyarrow/t...

  • by eggie5 on 1/1/21, 12:08 PM

    I've gotten the question in ML eng interviews 3 times: implement sparse vector and the respective dot product op. I always go for the handy DOK method.
  • by golovashkin on 1/1/21, 1:52 PM

    for a decent introduction to sparse matrix computations, see Tim Davis lectures

    https://www.youtube.com/watch?v=1dGRTOwBkQs

  • by mihevc on 1/1/21, 7:44 PM

    Yet another format is CSF (Compressed Sparse Fiber): http://glaros.dtc.umn.edu/gkhome/node/1177

    It's a generalized CSR/CSC (to ndim > 2) format that uses a tree data structure where each path from root to leaf encodes a nonzero element.

  • by mindv0rtex on 1/1/21, 2:44 PM

    Not mentioned in the article, the ELL format is very efficient for banded matrices.