lmeyerov 14 days ago

Extra fun: We find most enterprise/gov graph analytics work only requires 1-2 attributes to go along with the graph index, and those attributes are often already numeric (time, $, ...) or can be dictionary-encoded as discussed here (categorical, ID, ...). The result is that even 'tough' billion-scale graphs are fine on 1 GPU.

Ex: Most of our users are happy doing 100K-100M edges quickly on a single node (think a quick parquet read), so this goes surprisingly far. For $0.5/hr, you can get 16GB of GPU RAM on AWS, fit 100M+ edges comfortably, and have room to compute all sorts of things. With a more typical GPU, 1B edges is easy.

Early days, but that's been the basic thinking behind our new GFQL system: slice out just the columns you want, then do all the in-GPU traversals you want. In our V1, we keep things dataframe-native, including the in-GPU data representation, and we are now working on the first extensions for switching on the fly to more graph-native indexing as needed.

Ex: https://github.com/graphistry/pygraphistry/blob/master/demos...
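
To make the dictionary-encoding point concrete, a minimal standalone sketch in plain Rust (illustrative only, CPU-side, nothing GFQL-specific): swap a categorical string column for u32 codes plus a small lookup table.

    use std::collections::HashMap;

    // Dictionary-encode a categorical column: store each distinct value
    // once and replace the column itself with compact u32 codes.
    fn dictionary_encode(values: &[&str]) -> (Vec<u32>, Vec<String>) {
        let mut dict: HashMap<String, u32> = HashMap::new();
        let mut table: Vec<String> = Vec::new();
        let mut codes = Vec::with_capacity(values.len());
        for &v in values {
            let next = table.len() as u32;
            let code = *dict.entry(v.to_string()).or_insert_with(|| {
                table.push(v.to_string());
                next
            });
            codes.push(code);
        }
        (codes, table)
    }

A 100M-row categorical column then costs ~400MB as u32 codes (less if u8/u16 suffice) instead of several GB of strings, which is how billion-edge graphs stay within a single GPU's memory.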

fl0ki 14 days ago

I've been doing this for years in projects internal to my employers. It's nice to see it catching on, because it'll be easier to explain to people if I can point to a blog post saying the same thing.

Other tips worth trying:

* You'll do even better by optimizing your ID type. Even if it has to be a string, it can at least be a string with SSO (small-string optimization), a long-lived string reference, or an interned string.

* Try the ahash crate instead of the default cryptographically secure hash function in Rust (a minimal swap is sketched after this list).

* Consider HashSet<ByRef<Arc<T>>> instead of having to take keys to a different map to resolve the actual object.
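
For the ahash swap above, a minimal sketch (assumes the ahash crate in Cargo.toml; AHashMap/AHashSet are the crate's drop-in replacements for the std types):

    use ahash::{AHashMap, AHashSet};

    fn main() {
        // Same API as std::collections::{HashMap, HashSet}, but hashing
        // with aHash instead of the DoS-resistant SipHash default.
        let mut uid_of_did: AHashMap<String, u64> = AHashMap::new();
        let mut follows: AHashSet<u64> = AHashSet::new();

        uid_of_did.insert("did:plc:example".to_string(), 42);
        follows.insert(42);
        assert!(follows.contains(&42));
    }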

  • o11c 14 days ago

    * You can use a Vec-like structure for the `uid_to_did` direction (be sure it doesn't copy on resize, and you may also want to prevent reference invalidation, so a 2-level vector is often a reasonable choice). Depending on design decisions, the elements might want to be `Option<String>` instead of just `String`.

    * You can get away with uint32 indexes. The interned indexes should be local, so this shouldn't be a problem even if you think you'll have more than 4 billion global users. (Note: all social media combined only has about that many users currently.)

    * For interned strings specifically, it's embarrassingly simple to use a bump allocator for the string data itself (minimal sketch below).
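
    Putting those together, a minimal interner sketch (u32 ids, one contiguous String arena standing in for the bump allocator; duplicating the key bytes in the lookup map is a simplification):

        use std::collections::HashMap;

        #[derive(Default)]
        struct Interner {
            lookup: HashMap<String, u32>, // did -> uid (copies bytes for simplicity)
            spans: Vec<(u32, u32)>,       // uid -> (offset, len) into `arena`
            arena: String,                // all interned bytes, bump-appended
        }

        impl Interner {
            fn intern(&mut self, s: &str) -> u32 {
                if let Some(&uid) = self.lookup.get(s) {
                    return uid;
                }
                let uid = self.spans.len() as u32; // u32 ids: fine below 4B entries
                self.spans.push((self.arena.len() as u32, s.len() as u32));
                self.arena.push_str(s);
                self.lookup.insert(s.to_owned(), uid);
                uid
            }

            // Offsets rather than pointers, so arena growth invalidates nothing.
            fn resolve(&self, uid: u32) -> &str {
                let (off, len) = self.spans[uid as usize];
                &self.arena[off as usize..off as usize + len as usize]
            }
        }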

cryptonector 13 days ago

> If we convert each DID into a uint64 (a process referred to as “interning”), we can significantly compress the size of our graph and make it faster since our hashing functions will have fewer bytes they need to work with.

I've done something like this for a production authorization system where one of the key steps is checking whether the intersection of two [ordered] sets of 64-bit integers is empty. The check first compares the sizes of the two sets: if one is much larger than the other, it iterates over the smaller set, doing a binary search in the larger one for each item. If neither set is much larger than the other, it walks a cursor through both sets looking for equal items (and if at some point the remainder of one set is much smaller than the remainder of the other, it switches to the first algorithm). In most cases this deals with pairs of sets where one has thousands of elements and the other tens to hundreds, and it's blazingly fast. I could switch to 32-bit integers to make it go even faster.
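
Roughly, in Rust (a from-memory sketch: the 16x cutoff is illustrative rather than the tuned production value, and the mid-stream switch between strategies is elided):

    use std::cmp::Ordering;

    // Non-empty-intersection test for two sorted u64 slices.
    fn intersects(a: &[u64], b: &[u64]) -> bool {
        // Make `small` the smaller of the two sets.
        let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
        if large.len() > 16 * small.len() {
            // Lopsided sizes: binary-search each small element in the large set.
            small.iter().any(|x| large.binary_search(x).is_ok())
        } else {
            // Comparable sizes: walk a cursor through both sorted lists.
            let (mut i, mut j) = (0, 0);
            while i < small.len() && j < large.len() {
                match small[i].cmp(&large[j]) {
                    Ordering::Less => i += 1,
                    Ordering::Greater => j += 1,
                    Ordering::Equal => return true,
                }
            }
            false
        }
    }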

082349872349872 14 days ago

> the feature ... that shows you how many of your friends also follow this user.

Propaganda in the 1920s targeted social clique by social clique, but as far as I know (punched cards having been around for only a few decades) it was largely a manual, wetware-driven process. With our progress over the last century, we can now run “all your friends are saying” campaigns completely automatically.

kohlerm 14 days ago

https://eclipse.dev/mat/ can handle very large graphs of objects using a similar approach. It also implements a form of paging, so you do not have to load the complete graph into memory when running some of the graph algorithms.

gpderetta 14 days ago

Depending on the relative sizes of the sets and the number of times the intersections are computed, it might be more efficient to store the sets as ordered lists and do a linear merge pass to compute the intersection.

adultSwim 14 days ago

The online graph algorithm in the post reminded me of Frank McSherry's Differential Dataflow.

bionhoward 14 days ago

Rather than three KVs with the same keys, what's the memory usage if you use a Polars DataFrame or an Arrow RecordBatch where each u64 key is written exactly once? How does that impact lookup times?
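
i.e. something like this struct-of-arrays layout (hypothetical, and guessing at which three maps the post uses), where the key is stored once and the values become row-aligned columns:

    use std::collections::HashMap;

    struct Users {
        row_of_uid: HashMap<u64, usize>, // the u64 key, written exactly once
        dids: Vec<String>,               // column: row -> DID
        follows: Vec<Vec<u64>>,          // column: row -> follows
        followers: Vec<Vec<u64>>,        // column: row -> followers
    }

    impl Users {
        fn follows_of(&self, uid: u64) -> Option<&[u64]> {
            self.row_of_uid.get(&uid).map(|&r| self.follows[r].as_slice())
        }
    }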

victorbjorklund 14 days ago

Really nice writeup. Surprised it was so efficient.

  • philipkglass 14 days ago

    It concludes "There’s further optimization left to be made by locking individual HashSets instead of the entire follows or following set, but we can leave that for a later day."

    The other optimization that I would try is replacing those u64 keys with u32. There are only a few million accounts at present so there's a lot of growth room before 32 bits become inadequate. RAM is cheap, but making values smaller allows you to fit more of them into CPU cache, which often gives worthwhile speedups.

    • unrealhoang 14 days ago

      After converting the keys to u32, they could use a more specialized data structure like croaring to get an order-of-magnitude speedup.
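
      E.g. (sketched with the pure-Rust roaring crate; croaring's binding is analogous), intersections become chunked bitmap ANDs instead of per-key hash probes:

          use roaring::RoaringBitmap;

          fn main() {
              // Hypothetical follow sets keyed by u32 uids.
              let alice: RoaringBitmap = (0u32..1_000_000).collect();
              let mut bob = RoaringBitmap::new();
              bob.insert(42);
              bob.insert(500_000);
              let mutuals = &alice & &bob; // bitmap AND
              assert_eq!(mutuals.len(), 2);
          }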

rand0mstring 13 days ago

this makes no sense. it's the wrong tool for the job. you want to be using a sparse adjacency matrix to store your data, and then operate on that. it will be orders of magnitude faster than this.
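
for reference, the usual sparse layout here is CSR (compressed sparse row): two flat arrays, so a neighbor lookup is a slice index rather than a hash probe. minimal sketch:

    // Neighbors of node n live at col_indices[row_offsets[n]..row_offsets[n + 1]].
    struct Csr {
        row_offsets: Vec<usize>, // len = num_nodes + 1
        col_indices: Vec<u32>,   // all adjacency lists, concatenated
    }

    impl Csr {
        fn neighbors(&self, node: u32) -> &[u32] {
            let n = node as usize;
            &self.col_indices[self.row_offsets[n]..self.row_offsets[n + 1]]
        }
    }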