added by Mo Shafieeha · updated 2y ago
Delivering Large-Scale Platform Reliability - Roblox Blog
Making systems resilient is fundamentally at odds with optimization, because optimizing a system means taking out any slack. A truly optimized, and thus efficient, system is only possible with near-perfect knowledge about the system, together with the ability to observe and implement a response. For a system to be reliable, on the other hand, there
... See morefrom Against Optimization by Mandy Brown
Keely Adler added
High-performing IT organizations are able to achieve both high throughput, measured in terms of change lead time and deployment frequency, and high stability, measured as the time to restore service after an outage or an event that caused degraded quality of service. High-performing IT organizations also have 50% lower change fail rates than medium
... See morefrom Lean Enterprise: How High Performance Organizations Innovate at Scale by Joanne Molesky
Building and Operating a Pretty Big Storage System Called S3
Andrés and added