Sharding, replication and performance at scale

Most databases tackle poor performance at scale by scaling out with replication and/or sharding. While these techniques are useful, and they are planned for agdb, they should be avoided as much as possible. Replication and sharding dramatically increase complexity and carry their own performance costs, so they are only worth it when there is no other choice.

agdb is designed to perform well regardless of the data set size. Direct access operations are O(1) with no limit on concurrency. Write operations are amortized O(1) but exclusive: only one write operation can run on the database at any given time, blocking all other reads and writes for its duration. Searching a (sub)graph is still O(n), since reading 1,000 connected nodes takes 1,000 O(1) operations, the same as reading 1,000 rows in a table. However, as long as the data does not indiscriminately connect everything to everything, the data set can be as large as the hardware can hold without performance issues. The key is querying only a subset of the graph (a subgraph), because the cost of a query then depends on the size of that subgraph rather than on all the data stored in the database.
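To illustrate why traversal cost depends on the queried subgraph rather than on the total data set, here is a minimal Rust sketch. It is not agdb's actual API; the `Graph` type and `reachable` function are hypothetical stand-ins for a graph search, showing that the work done is proportional to the nodes and edges actually reached.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// A toy adjacency-list graph; node ids map to their outgoing edges.
struct Graph {
    edges: HashMap<u64, Vec<u64>>,
}

impl Graph {
    /// Breadth-first search from `start`. The cost is proportional to the
    /// number of nodes and edges actually reached (the subgraph), not to
    /// the total number of nodes stored in the graph.
    fn reachable(&self, start: u64) -> HashSet<u64> {
        let mut visited = HashSet::from([start]);
        let mut queue = VecDeque::from([start]);
        while let Some(node) = queue.pop_front() {
            for &next in self.edges.get(&node).into_iter().flatten() {
                if visited.insert(next) {
                    queue.push_back(next);
                }
            }
        }
        visited
    }
}

fn main() {
    // 1,000,000 nodes in total, but only a chain of 1,000 of them is connected.
    let mut edges: HashMap<u64, Vec<u64>> = (0..1_000_000).map(|n| (n, Vec::new())).collect();
    for n in 0..999 {
        edges.entry(n).or_default().push(n + 1);
    }
    let graph = Graph { edges };

    // Searching from node 0 touches only the 1,000-node subgraph,
    // regardless of the other ~999,000 nodes in the data set.
    println!("reached {} nodes", graph.reachable(0).len());
}
```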

The point is that scaling out carries a significant cost regardless of the technology or clever tricks involved. Replication and sharding should be considered only once the database starts exceeding the limits of a single machine, because adding data replication/backup means a large performance hit. Caching can mitigate it to some extent, but it can never be as performant as a local database. The "at scale" features are definitely coming, but you should avoid using them as much as possible even when they are available.
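As a rough illustration of the caching idea mentioned above, here is a small read-through cache sketch in Rust. It is not part of agdb; the `ReadThroughCache` type and the `fetch_remote` parameter are hypothetical, standing in for a slower remote or replicated lookup whose results are kept locally.

```rust
use std::collections::HashMap;

/// A toy read-through cache in front of a slower remote lookup.
struct ReadThroughCache {
    local: HashMap<u64, String>,
}

impl ReadThroughCache {
    fn new() -> Self {
        Self { local: HashMap::new() }
    }

    /// Return the cached value if present; otherwise pay the remote cost once
    /// and remember the result. Reads become local after the first fetch, but
    /// writes still have to reach the remote database and refresh the cache.
    fn get(&mut self, key: u64, fetch_remote: impl Fn(u64) -> String) -> &String {
        self.local.entry(key).or_insert_with(|| fetch_remote(key))
    }
}

fn main() {
    let mut cache = ReadThroughCache::new();
    let remote = |key: u64| format!("value for {key} (fetched remotely)");

    println!("{}", cache.get(1, remote)); // first read pays the remote cost
    println!("{}", cache.get(1, remote)); // second read is served locally
}
```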

For real world performance numbers, see the dedicated documentation.