
Site Reliability Engineering Essentials
English | Tutorial | Size: 905.72 MB
Learn the SRE Principles and Practices to Run Production Systems Effectively
– Define and describe the role of Site Reliability Engineers in the real world
– Explain the tenets of Site Reliability Engineering using practical examples
– Learn to cultivate Site Reliability Engineering culture in your organization
Site Reliability Engineering (SRE) is a discipline that deals with running production systems effectively. It integrates software engineering into what is traditionally known as systems administration. Site Reliability Engineers are responsible for the availability, performance, and end-user experience of services. They handle incident management and ensure service-level objectives are met. They also set up monitoring and alerting to proactively catch issues before end users are impacted. With the advent of modern microservices architectures, distributed systems, and cloud architectures, SRE has become a critical function in organizations that run such systems.
In this course, you will learn the fundamentals of SRE through real-world examples, how DevOps and SRE differ, and how SRE implements DevOps principles. You will learn how to set up monitoring and alerting using monitoring tools and Service Level Indicators (SLIs), how to conduct a blameless postmortem, which roles are used in an incident command center, and which tools and templates support effective postmortems. You’ll examine toil (manual and repetitive work) and learn how to reduce it in an SRE organization. Finally, you will learn industry best practices for system design that emphasize reliability.
By the end of the live online course, you’ll understand:
– The differences between SRE and DevOps
– The basics of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs)
– How to embrace automation to reduce mundane, monotonous manual work
– Ways to monitor production applications and create meaningful alerts (pages vs. emails)
– How to review the anatomy of a blameless postmortem report and industry-standard templates for writing postmortems
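To make the "pages vs. emails" distinction above concrete, here is a minimal sketch in Python. It is not taken from the course material: the `Alert` fields and the `route` function are hypothetical names, and the rule itself (page only when end users are affected and a human must act right away) is an assumed, simplified policy.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; the field names are illustrative assumptions."""
    name: str
    user_facing: bool      # are end users affected right now?
    needs_human_now: bool  # does someone have to act within minutes?


def route(alert: Alert) -> str:
    """Return 'page' for urgent, user-impacting alerts, otherwise 'email'."""
    if alert.user_facing and alert.needs_human_now:
        return "page"
    return "email"


if __name__ == "__main__":
    print(route(Alert("checkout error rate high", True, True)))
    print(route(Alert("disk 70% full on batch host", False, False)))
```

Real alerting setups usually express this kind of rule in the monitoring tool's own configuration; the sketch only shows the decision being made.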
And you’ll be able to:
– Develop meaningful SLOs and SLIs and measure them accurately
– Put in place an effective monitoring and alerting solution
– Create postmortems of incidents and conduct blameless reviews
– Implement best practices in systems architecture to improve reliability
– Devise an on-call rotation and set up a process to avoid burnout
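As a rough illustration of what measuring an SLI against an SLO can look like, the Python sketch below computes a request-based availability SLI and the share of error budget still unspent. It is an assumption-laden example rather than course content: the function names, the request counts, and the 99.9% target are all made up for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully (1.0 when there was no traffic)."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget left.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # failure rate the SLO tolerates
    actual_failure = 1.0 - sli          # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests, 500 of them failed.
    sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
    remaining = error_budget_remaining(sli, slo_target=0.999)
    print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

With these invented figures the service fails 0.05% of requests against an allowed 0.1%, so roughly half of the error budget is still available for the measurement window.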