Name: Better Scalability and More Isolation? The Cortex “Shuffle Sharding” Story - Tom Wilkie, Grafana Labs
Start: 2021-05-07T11:50:00+0200
End: 2021-05-07T12:25:00+0200

Cortex is a horizontally scalable, highly available, multi-tenant, Prometheus-compatible time series database. For many years it has been possible to scale Cortex clusters to hundreds of replicas. The relatively simple Dynamo-style replication relies on quorum consistency for reads and writes. As such, a dual-replica failure can lead to an outage for all tenants. To address this, we implemented a technique called “Shuffle Sharding” in Cortex. Shuffle Sharding lets you automatically pick a random “replica set” for each tenant, allowing you to isolate tenants and reduce the chance of an outage. In this talk we’ll show you how shuffle sharding achieves better scalability and more isolation, both in theory and in practice. We’ll walk you through the design on both the read and write paths of Cortex. Finally, we’ll do a live demo of shuffle sharding and show how you can “take out” multiple replicas without affecting all tenants.
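
The idea in the abstract can be illustrated with a minimal sketch. This is not Cortex's actual implementation: the helper names (tenant_shard, tenants_at_risk), the replica and tenant names, the cluster size of 50, and the shard size of 4 are all hypothetical, and the selection simply seeds a pseudo-random generator from a hash of the tenant ID. It assumes, as the abstract suggests, that a tenant is at risk only when both failed replicas belong to its replica set.

```python
import hashlib
import random


def tenant_shard(tenant_id, replicas, shard_size):
    """Deterministically pick a pseudo-random subset of replicas for a tenant.

    Hypothetical helper: the PRNG is seeded from a hash of the tenant ID so
    the same tenant always maps to the same replica set.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return frozenset(rng.sample(sorted(replicas), shard_size))


def tenants_at_risk(shards, failed):
    """Return tenants whose replica set contains every failed replica."""
    return [tenant for tenant, shard in shards.items() if failed <= shard]


# Illustrative numbers only (assumptions, not taken from the talk).
replicas = [f"replica-{i}" for i in range(50)]
tenants = [f"tenant-{i}" for i in range(1000)]
failed = {"replica-0", "replica-1"}  # a dual-replica failure

# Without shuffle sharding: every tenant is spread over all replicas.
unsharded = {t: frozenset(replicas) for t in tenants}
# With shuffle sharding: every tenant gets its own small replica set.
sharded = {t: tenant_shard(t, replicas, shard_size=4) for t in tenants}

print("at risk without shuffle sharding:", len(tenants_at_risk(unsharded, failed)))
print("at risk with shuffle sharding:   ", len(tenants_at_risk(sharded, failed)))
```

In this sketch every tenant is exposed to the dual-replica failure when all tenants share all replicas, while with per-tenant replica sets only the tenants whose set happens to include both failed replicas are exposed.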

Speakers

Tom Wilkie

VP Product, Grafana Labs

Tom is VP Product at Grafana Labs, a member of the Prometheus team and one of the original authors of the Cortex and Loki projects. In his spare time he builds 3D printers and makes craft beer.

Friday May 7, 2021 11:50 - 12:25 CEST
Observability Theater

Observability

Content Experience Level: Intermediate (Mid-level experience)
