added by Mo Shafieeha · updated 2y ago
Delivering Large-Scale Platform Reliability - Roblox Blog
- Anticipate Failure: Anything that improves connectivity is worth pursuing. The key is keeping client latency low and ensuring clients are prepared for very high traffic with cache control, sidecars, timeouts, circuit breakers, and retries. In addition to improving health signals from the services, it's critical to understa...
from Delivering Large-Scale Platform Reliability - Roblox Blog by Roblox
Mo Shafieeha added 2y ago
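Two of the client-side safeguards named in the Anticipate Failure highlight, retries and circuit breakers, can be sketched roughly as follows. This is a minimal illustration, not the approach Roblox describes: `CircuitBreaker`, `CircuitOpenError`, `call_with_retries`, and every threshold and delay value are hypothetical names and numbers chosen for the example, and the wrapped function is assumed to enforce its own timeout.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call; a single failure reopens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the consecutive-failure count
        return result


def call_with_retries(fn, breaker, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry fn through the breaker with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # never retry while the breaker is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Failing fast while the breaker is open keeps a struggling dependency from being hit with retry traffic, which is one way a client can stay prepared for very high load.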
- The Power of Right Measurement
- SLO (Service Level Objective) is the reliability objective that our team aims for (e.g. 99.999%).
- SLI (Service Level Indicator) is the reliability actually achieved over a given timeframe (e.g. 99.975% last February).
- SLA (Service Level Agreement) is the reliability we agree to deliver and that our consumers expect at a given timefr
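To make these percentages concrete, here is a small worked calculation. It assumes a time-based reading of the targets (percentage of a window during which the service is up), which is only one possible interpretation; the function names and the 30-day and 28-day windows are illustrative choices, not from the source.

```python
def allowed_downtime_seconds(target_percent, window_days):
    """Seconds of downtime a target permits over a window (time-based assumption)."""
    window_seconds = window_days * 24 * 60 * 60
    return window_seconds * (100.0 - target_percent) / 100.0


def objective_met(sli_percent, slo_percent):
    """True when the measured indicator reaches the objective."""
    return sli_percent >= slo_percent


# A 99.999% objective over 30 days leaves roughly 26 seconds of downtime.
budget_five_nines = allowed_downtime_seconds(99.999, 30)

# A 99.975% result over a 28-day February corresponds to roughly 10 minutes of downtime.
february_downtime = allowed_downtime_seconds(99.975, 28)

# With the example figures above, the indicator falls short of the objective.
february_ok = objective_met(99.975, 99.999)
```

The gap between the two figures shows why each additional "nine" is expensive: the objective tolerates seconds per month, while the example measurement used minutes.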
- Bring Structure to Chaos
- Building a trustworthy platform is hard, but the work is necessary if you want one. These practices are already paying off daily. Adopting quality in our culture is the most crucial and decisive factor in getting there. All parts of our platform are affected by our reliability culture.
- A service's "Success Ratio" (SLI) is the percentage of fulfilled requests. A successful response is when a request is dispatched promptly and adequately, meaning there are no problems with connectivity, service, or unexpected errors. We're measuring the end-to-end experience delivered to customers to ensure SLAs are met. If we don't, we'll get a fa...
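The success ratio described above reduces to simple arithmetic over request counts. The sketch below is an assumption-laden illustration: `RequestCounts`, its field names, and the choice to treat an empty window as 100% are hypothetical, and the request numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RequestCounts:
    total: int   # all requests observed in the window
    failed: int  # requests hit by connectivity problems, service errors, or unexpected errors


def success_ratio(counts):
    """Percentage of requests fulfilled successfully in the window."""
    if counts.total == 0:
        return 100.0  # assumption: no traffic means nothing failed
    return 100.0 * (counts.total - counts.failed) / counts.total


# Example with made-up numbers: 250 failures out of 1,000,000 requests.
window = RequestCounts(total=1_000_000, failed=250)
ratio = success_ratio(window)  # approximately 99.975
```

Counting failures from the client's point of view, rather than only server-side errors, is what makes the figure reflect the end-to-end experience.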
- Why Reliability Matters: Reliability means you won't have any problems with your services, no matter how complex and dependent. In typical reliability cases, availability is the focus, but sometimes terms get mixed up and misused. Distributed systems can only guarantee two out of three things - availability, partition tolerance, and consistency - so co...