GitHub - hatchet-dev/hatchet: A distributed, fault-tolerant task queue
High-performing IT organizations are able to achieve both high throughput, measured in terms of change lead time and deployment frequency, and high stability, measured as the time to restore service after an outage or an event that caused degraded quality of service. High-performing IT organizations also have 50% lower change fail rates than medium
... See moreJoanne Molesky • Lean Enterprise: How High Performance Organizations Innovate at Scale

Sorry, we’re unable to display this type of content.
Jez Humble • The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations

What sets distributed systems engineering apart is the probability of failure and, worse, the probability of partial failure. If a well-formed mutex unlock fails with an error, we can assume the process is unstable and crash it. But the failure of a distributed mutex’s unlock must be built into the lock protocol.