Sr. Manager, Site Reliability Engineering
Information System
Dallas, TX
August 27, 2025
Coppell, TX
What does it mean to be a BrinkerHead? We play like a team, take pride in our culture and seek every opportunity to make people feel special. Life is short. Work happy.
At Brinker, we connect, serve and give to create the best life for our Team Members, Guests and community. Through our cultural beliefs, Brinker empowers its Team Members to positively impact our 4 Key Results: Engaging Team Members, Bringing Back Guests, Growing Sales and Increasing Profits.
Brinker International is an equal opportunity employer; we foster an inclusive environment that promotes respect, diversity of thought and success for all.
Job Summary
We are seeking a highly skilled and motivated Sr. Manager, Site Reliability Engineering to join our team. As Sr. Manager, Site Reliability Engineering, you will play a crucial role in ensuring the reliability, performance, and scalability of our systems and services. You will be responsible for building and leading a team of talented engineers and for driving initiatives to enhance the reliability of our technology systems, streamline operations, and minimize downtime. Your technical expertise, coupled with strong communication skills and strategic thinking, will be instrumental in fostering collaboration across teams and implementing best practices. You will work closely with our development and operations teams to build and maintain robust infrastructure, automate processes, and improve overall system reliability.
Objectives
- Build, or assist in building, software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with a focus on pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
Your Key Job Functions
- Build, lead and mentor a team of Site Reliability Engineers, providing guidance and support, while also implementing best practices and resolving complex technical challenges.
- Collaborate with cross-functional teams to define reliability requirements, establish service level objectives (SLOs), and develop a strategic vision with defined action items that keep the team accountable.
- Monitor system performance, conduct root cause analysis of incidents, implement and document solutions to prevent recurrence, identify bottlenecks, and proactively address issues to ensure high availability and reliability.
- Design, implement, and maintain scalable and reliable infrastructure to support our applications and services.
- Develop and maintain automation tools to streamline deployment, monitoring, and incident response processes.
- Collaborate across the IT department, and especially with development teams, to ensure best practices for software development, testing, and deployment.
- Conduct root cause analysis of incidents and implement corrective actions to prevent recurrence.
- Continuously improve system reliability, performance, and scalability through monitoring, testing, and optimization.
- Gather and analyze metrics from operating systems, logs, and applications to assist in performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Balance feature development speed and reliability with well-defined service-level objectives.
What You Bring to the Team
- Master's degree and/or bachelor's degree in combination with equivalent experience in Computer Science, Engineering, or a related field.
- 5+ years as a Site Reliability Engineer or in a similar role, with a demonstrated track record of successfully managing the reliability and scalability of large-scale systems.
- Strong knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
- Proficiency in scripting and automation languages (e.g., Python, Bash, Ansible).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Demonstrated leadership experience, with a passion for mentoring and developing team members.
- Excellent problem-solving skills and the ability to work under pressure.
- Proven ability to solve complex issues in a timely fashion.
- Proven ability to quickly adapt and flex to a dynamic environment by being a "self-starter".
- Strong communication and collaboration skills.
- Strong project management skills.
- Strong documentation skills.
- Solid understanding of networking, security, and system administration.
- Experience with infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation).
- Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
- Familiarity with database management systems (e.g., MySQL, PostgreSQL).
Why Brinker
We offer a competitive benefits package including medical/dental/vision, life insurance, paid vacation/holidays, 401(k) with company match, and generous dining discounts. Every team member working at the Restaurant Support Center (aka Brinker headquarters) is eligible for annual bonus potential.
Our campus includes an onsite gym plus opportunities to increase your wellbeing with onsite yoga and boot camp programs. Work/Life/Fun balance in a casual and collaborative work environment! Team members enjoy company-wide events and celebrations, as well as regular volunteer opportunities with our community give-back programs.