CausalMesh: A Formally Verified Causally Consistent Distributed Cache with Support for Client Migration
Abstract
Cloud applications often insert a caching lay\-er in front of a database in order to reduce I/O latency and improve throughput. One complication occurs when a client fetches some data from one cache node, then migrates to another (e.g., due to failures, load balancing, or client mobility), where it fetches the remaining data. If the data in the cache nodes is inconsistent, the client could observe states that undermine the application's correctness. One example of a situation where this is common is stateful serverless workflows, which consist of multiple serverless functions that access state in a remote database. In serverless, functions in the same workflow may be scheduled to different nodes with different caches, resulting in the migration pattern described above -- the same client (the workflow) reads some data from one cache and other data from another. To address this issue, this paper presents CausalMesh, a novel approach to causally consistent distributed caching in environments where computations may migrate between machines. CausalMesh is the first cache system to support coordination-free, abort-free read/write operations and read transactions when clients migrate across multiple servers. CausalMesh also supports read-write transactional causal consistency in the presence of client migration, but at the cost of abort-freedom. Our experimental evaluation shows that CausalMesh has lower latency and higher throughput than existing proposals. Finally, we have formally verified the correctness of \sys's protocol in Dafny.