Engineer, Site Reliability
Overall Job Summary
As a Site Reliability Engineer, you will help implement modern engineering and DevOps techniques to operate a large-scale distributed application portfolio across on-premises and cloud environments, with the goals of increasing efficiency, eliminating downtime, optimizing cost, and maintaining performance at scale. You will provide hands-on technical expertise to design, deploy, secure, and optimize cloud services and deliver the best possible customer experience.
Essential Duties and Responsibilities (Min 5%)
- Assists with end-to-end availability, security and performance of mission-critical applications and services.
- Coordinates change and release activities with other teams (internal and external), partnering with the Change Management group to ensure a smooth, trouble-free rollout of releases and changes.
- Assists with managing application security, vulnerability remediation, and compliance activities with other teams (internal and external).
- Partners with vendors to ensure all critical patches are tested and applied in both Non-Production and Production environments in time to avoid business and customer impact.
- Coordinates performance test activities with other teams (internal and external) and partners with QA Performance Test Engineers to ensure all changes are tested in both Non-Production and Production environments to avoid business and customer impact.
- Assists with managing and maintaining performance environments, ensuring they are properly set up, configured, and highly available for each project as scheduled.
- Supports the day-to-day health, uptime, monitoring, and reliability of the website and related services.
- Shares a 24x7 on-call production support rotation with the team and responds to service incidents.
Qualifications
High Demand IT Specialized Skills
Platform Knowledge
Preferred knowledge, skills or abilities
- Strong experience with IBM/HCL WebSphere Commerce, IBM Sterling Commerce, Solr, and their related build and deployment processes. Experience with HCL Commerce Version 9 is a plus.
- Strong experience with IBM HTTP Server, IBM WebSphere Application Server (Network Deployment with Deployment Manager, and Liberty), and IBM MQ.
- Hands-on experience developing and implementing comprehensive monitoring solutions that provide full visibility into platform and application components, using tools and services such as Kubernetes, Prometheus, Grafana, ECK/ELK, Dynatrace, Rigor, Quantum Metric, and similar tools.
- Hands-on experience identifying and troubleshooting availability and performance issues across multiple layers of the deployment (infrastructure, operating environment, network, application, and integration systems) and resolving customer issues in production.
- Ability to evaluate performance trends and expected changes in demand and capacity, and to establish appropriate scalability plans.
- Ability to evaluate production traffic patterns and tune the performance test workload mix and strategy to keep systems and applications in a state of continuous readiness.
Working Conditions
Physical Requirements
Disclaimer
This job description represents an overview of the responsibilities for the above-referenced position. It is not intended to be a comprehensive list of responsibilities. A team member should perform all duties as assigned by his or her supervisor.
Nearest Major Market: Nashville