Senior Site Reliability Engineer
City : TORONTO, Ontario, Canada
Category : Technology | Analytics | Research
Industry : Financial/Banking
Employer : RBC
Come Work with Us!
At RBC, our culture is deeply supportive and rich in opportunity and reward. You will help our clients thrive and our communities prosper, empowered by a spirit of shared purpose.
Whether you’re helping clients find new opportunities, developing new technology, or providing expert advice to internal partners, you will be doing work that matters in the world, in an environment built on teamwork, service, responsibility, diversity, and integrity.
What is the opportunity?
The Enterprise DevOps SRE team is undertaking multiple complex enterprise-wide initiatives as part of RBC’s ongoing plan to improve and standardize application releases for Cloud, Distributed, Mainframe, and other platforms. This role is responsible for the development, implementation, administration, and support of Site Reliability Engineering (SRE) solutions for the Enterprise DevOps tools and CI/CD pipeline supported by the Enterprise DevOps SRE team.
What will you do?
- Champion Stability and Reliability across DevOps applications and services
- Develop SRE solutions (monitoring, alerting, self-healing and reliability testing)
- Build automated solutions to remove toil
- Explore and evaluate new technologies and drive innovation by designing and implementing new practices and processes
- Implement and drive proactive monitoring solutions for internally hosted applications
- Own and develop reports for SRE metrics (including incident metrics); gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- Identify and establish SLOs and error budgets for DevOps applications
- Assist in incident management and problem management for applications in scope
- Evaluate and iterate continuously: what went well, what went wrong, and what can be done to improve and prevent recurrence in future
- Spearhead blameless post-mortems for high-impact incidents
- Collaborate on and contribute to cross-functional enterprise initiatives and manage the effective implementation of assigned deliverables
- Work with necessary stakeholders to mature processes and ensure SRE and ITSM processes are effective and understood
- Identify potential issues, conflicts, and risks. Analyze, mitigate, and escalate to management where appropriate
- Provide guidance to other team members on managing end-to-end availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions
What do you need to succeed?
- Strong problem solving and analytical skills to triage issues
- Experience in Configuration Management (config as code) using Ansible or Terraform
- Well-rounded understanding of the Linux operating system, including the command line, firewalls, certificates, PGP encryption, and various file transfer protocols (e.g., SFTP, AS2, Connect:Direct)
- Thorough understanding of SRE principles
- Extensive experience working with APIs (REST and/or SOAP endpoints).
- Ability to quickly pick up new tools, programming languages, libraries, frameworks, and other technical concepts as needed.
- Hands-on experience with a variety of industry-standard SRE tools (Ansible, Dynatrace, Moogsoft, PagerDuty, ServiceNow, Slack, Elastic Stack, CatchPoint)
- Deadline-driven and results-oriented; able to meet consistently high-quality standards while handling a variety of tasks and deadlines simultaneously.
- Excellent written and verbal communication skills, with the ability to work with key partners across the organization: Business, Operations, Application Development, Maintenance, and Infrastructure Teams
- Degree or diploma in Computer Engineering, Computer Science, or a related technical field, or equivalent breadth of experience
- Exposure to Docker, Kubernetes, OpenShift, GitHub, JFrog Artifactory, JFrog Xray, NexusRepo & IQ, DevSecOps, IBM UrbanCode Deploy, Jenkins, MongoDB, Jira, Confluence, Jira Service Desk, Databases, PagerDuty.
- Experience in vendor management, application development, databases, systems engineering, and/or systems analysis
- Understanding of banking/financial services industry.
- Experience working as a member of an Agile development team.
What’s in it for you?
We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Flexible work/life balance options
- Opportunities to do challenging work
Inclusion and Equal Opportunity Employment
At RBC, we embrace diversity and inclusion for innovation and growth. We are committed to building inclusive teams and an equitable workplace for our employees to bring their true selves to work. We are taking actions to tackle issues of inequity and systemic bias to support our diverse talent, clients and communities.
We also strive to provide an accessible candidate experience for our prospective employees with different abilities. Please let us know if you need any accommodations during the recruitment process.
Join our Talent Community
Stay in the know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips, and recruitment events that matter to you.
Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at rbc.com/careers.