Sr. Mgr Site Reliability & DevOps Engineering

Overall Job Summary

As a Manager of Site Reliability and DevOps Engineering, you will manage and oversee the engineering teams supporting a large-scale distributed application portfolio across on-prem and cloud environments. The role focuses on increasing efficiency, eliminating downtime, optimizing cost, and managing performance at scale while providing leadership in cloud infrastructure management and continuous integration / continuous delivery. You will provide leadership and oversight across multiple data integration and API applications, establishing and maintaining processes that support frequent code releases into the data integration platforms.
 

Essential Duties and Responsibilities (Min 5%)

  • Manages end-to-end availability and reliability of API and data integration services, systems, platforms, and infrastructure, and ensures they are designed and operated optimally
  • Maintains security and performance of mission-critical applications and services that are part of the integration ecosystem
  • Manages CI/CD pipelines and improves build and deployment processes, reducing deployment times and enabling continuous delivery
  • Maintains source control branching strategies, branch merge processes and configuration management 
  • Partners with Information Security to manage application security, vulnerability remediation, and site compliance
  • Partners with vendors and internal teams to ensure all critical patches are tested and applied in all environments
  • Partners with Cloud and Infrastructure teams to build and maintain environments and to optimize usage and cost through an appropriate scaling strategy
  • Manages the performance strategy, test executions and remediation of critical site findings
  • Establishes application and synthetic monitoring and alerting, and oversees execution of failover capabilities and automated self-healing and recovery
  • Owns day-to-day health, uptime, monitoring and reliability of the website and related services
  • Owns deployment architecture and environment/infrastructure sizing in a cloud-based environment
  • Ensures day-to-day support for multiple environments, ensuring readiness for project development and test activities
  • Drives change and release activities supporting site stability, partnering with shared service and change management teams
  • Partners with and influences managers and architects across the organization to define and execute the performance strategy for all API and data integration systems and interfaces
  • Creates and maintains the framework to manage and execute continuous application development and delivery with multiple parallel project initiatives
  • Employs strong DevOps and SRE principles and practices, and drives continuous improvement of processes through automation
  • Ensures non-production environments are set up with the product configuration and data required by various projects
  • Ensures 24x7 on-call rotation coverage for Site Reliability and DevOps Engineering
     

Required Qualifications

Experience: 

  • 12+ years of experience in enterprise software design, development, and deployments. 
  • 10+ years of experience in performance engineering and application monitoring for an organization with large and complex information systems is preferred.
  • 7+ years of experience in DevOps, build engineering, release management, and automation


Education:  Bachelor’s degree in Computer Science or related field is required. Any suitable combination of education and experience will be considered.
 

Preferred knowledge, skills or abilities

  • Strong experience in developing and implementing comprehensive monitoring solutions that provide full visibility into the different platform and application components, using tools and services such as Prometheus, Grafana, ECK/ELK, Dynatrace, Rigor, Quantum Metric, and similar tools.
  • Extensive experience implementing security best practices into the development and deployment process
  • Ability to evaluate performance trends and expected changes in demand and capacity, and to establish appropriate scalability plans.
  • Ability to evaluate production traffic patterns and tune the performance test workload mix and strategy to keep systems and applications in continuous readiness mode.
  • Experience designing, implementing, and maintaining Kubernetes, AKS, and Azure Cloud platforms through cost-efficient models.
  • Experience with containerization, certificate management, Kafka, ZooKeeper, Vaults, pipeline automation, Fisheye, Crucible, and performance and QA test tool integrations.
  • Strong experience with cloud PaaS/IaaS environments in Azure.
  • Experience with API or Microservices applications in an open-source environment
     

Working Conditions

  • Hybrid / Flexible working conditions
  • Must be able to work some nights and weekends
  • Occasional travel required
  • Repetitive wrist, hand or finger movement

Physical Requirements

  • Sitting
  • Standing (not walking)
  • Walking
  • Kneeling/Stooping/Bending
  • Reaching overhead
  • Lifting up to 20 pounds

Disclaimer

This job description represents an overview of the responsibilities for the above-referenced position. It is not intended to represent a comprehensive list of responsibilities. A team member should perform all duties as assigned by his/her supervisor.
 

ALREADY A TEAM MEMBER?

You must apply or refer a friend through our internal portal

CONNECTION

Our Mission and Values are more than just words on the wall - they’re the one constant in an ever-changing environment and the bedrock on which we build our culture. They're the core of who we are and the foundation of every decision we make. It’s not just what we do that sets us apart, but how we do it.

EMPOWERMENT

We believe in managing your time for business and personal success, which is why we empower our Team Members to lead balanced lives through our benefits and total rewards offerings for full-time and eligible part-time TSC and Petsense Team Members. We care about what you care about!

OPPORTUNITY

A lot of care goes into providing legendary service at Tractor Supply Company, which is why our Team Members are our top priority. Want a career with a clear path for growth? Your Opportunity is Out Here at Tractor Supply and Petsense.


Nearest Major Market: Nashville