Senior Site Reliability Engineer / DevOps

Toronto, ON, Canada

Role: Senior DevOps & Site Reliability Engineer
Location: Toronto, ON
Interview Mode: Virtual
Work Mode: Hybrid

Key Responsibilities:

Oversee the reliability, availability, and performance of Apigee Hybrid and Google Distributed Cloud environments, ensuring robust SRE practices.
Manage and automate certificate lifecycle processes, including renewals, deployments, and compliance checks.
Plan and execute upgrades and maintenance activities for Apigee Hybrid and distributed cloud infrastructure, minimizing downtime and ensuring seamless transitions.
Implement and maintain monitoring solutions using Dynatrace and Splunk, proactively identifying and resolving issues to ensure system health and performance.
Troubleshoot complex production incidents, perform root cause analysis, and drive incident resolution to restore service quickly and prevent recurrence.
Develop and maintain automation scripts and Ansible playbooks for operational efficiency, including tasks such as Kubernetes context retrieval, proxy configuration, and container management.
Collaborate with cross-functional teams to ensure security, compliance, and best practices are followed across all SRE activities.
Mentor and guide team members in SRE methodologies, fostering a culture of continuous improvement and operational excellence.

Required Skills:

3+ years of experience in Site Reliability Engineering or related roles.
Experience with Apigee Hybrid, Google Distributed Cloud, Azure, GCP, and Kubernetes.
Advanced DevOps and SRE skills: CI/CD, automation, monitoring, infrastructure as code.
Certificate management scripting and automation.
Proficiency with Ansible for configuration management and orchestration.
Experience with APM and monitoring tools such as Dynatrace and Splunk.
Programming experience with Python.

Ansible (Software), Apigee Hybrid, API Management, Azure Kubernetes Service (AKS), CI/CD, Dynatrace APM, Google Anthos, Kubernetes, Public Key Infrastructure, Python (Programming Language), Red Hat Enterprise Linux (RHEL), Site Reliability Engineering, Splunk, Terraform, VMware
