Loading…
Friday, May 7 • 11:50 - 12:25
Better Scalability and More Isolation? The Cortex “Shuffle Sharding” Story - Tom Wilkie, Grafana Labs

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.
Cortex is a horizontally-scalable, highly-available and multi-tenant Prometheus-compatible time series database. For many years it has been possible to scale Cortex clusters to hundreds of replicas. The relatively simple Dynamo-style replication relies on quorum consistency for reads and writes. As such, a dual-replica failure can lead to an outage for all tenants. To address this we implemented a technique called “Shuffle Sharding” in Cortex. Shuffle Sharding lets you automatically pick a random “replica set” for each tenant, allowing you to isolate tenants and reduce the chance of an outage. In this talk we’ll show you how shuffle sharding achieves better scalability and more isolation, both in theory and in practice. We’ll walk you through the design on both the read and write path of Cortex. Finally we’ll do a live demo of shuffle sharding and how you can “take out” multiple replicas without affecting all tenants.

Speakers
avatar for Tom Wilkie

Tom Wilkie

VP Product, Grafana Labs
Tom is VP Product at Grafana Labs, a member of Prometheus team and one of the original authors of the Cortex and Loki projects. In his spare time he builds 3D printers and make craft beer.



Friday May 7, 2021 11:50 - 12:25 CEST
Observability Theater