REQ-675 Application Reliability Specialist (Open)
City : Toronto
Category : Full time
Industry : Financial Services
Employer : Interac
Application Reliability Specialist
You are a hands-on developer responsible for maintaining and supporting Interac’s highly distributed, high-performance payment system. You will diagnose and resolve infrastructure and application issues to ensure optimal performance and availability of all our IT applications, and provide root cause analysis with recommendations for improvements. You will also focus on Site (Application) Reliability Engineering activities, including proactive monitoring, responding to alerts, and automation. You will be required to lead investigation calls and be available on call. As part of this role, you will work with senior team members to gather monitoring requirements from stakeholders and deliver solutions using the enterprise monitoring toolset, paving the way to the SRE-based next-generation Interac platform.
You’re great at…
Understanding large-scale Java applications, database architectures, application monitoring and fault management.
Understanding WebSphere, Docker, Kubernetes, and cloud architecture.
Troubleshooting applications by leveraging APM tools like AppDynamics/Dynatrace.
Supporting and maintaining Java/JEE applications.
Having an SRE mindset toward ensuring application availability.
Identifying the application monitoring needs or performance issues and instrumenting them appropriately in AppDynamics and Splunk.
Designing and instrumenting AppDynamics monitoring and tuning (health rules, alerts) for various applications.
Understanding networking concepts and network hardware.
Investigating and finding root causes through logs on Linux servers.
Writing queries and pulling reports from databases and Splunk.
Integrating AppDynamics, Splunk, PagerDuty, and ServiceNow.
Identifying areas for automation to meet self-remediation needs.
Creating or updating performance analysis reports, dashboards, and the knowledge base.
Proposing and implementing solutions to improve application availability and reliability.
Working with API and microservices technologies and containers.
Who are you?
You have a University Degree in Computer Science Engineering or an equivalent combination of education and experience.
You are experienced in Core Java Object Oriented programming and understand basic Enterprise Integration Patterns.
Overall, you have 5+ years of software development (Java) or maintenance experience, preferably in payment systems or the banking domain, and can test and devise solutions.
You must have a minimum of 3 years’ experience in application support, focusing on improving application reliability by enhancing application monitoring, designing and instrumenting dashboards, and tuning alerts, preferably with AppDynamics and Splunk.
You must be eligible to work for Interac Corp. in Canada in a full-time capacity.
You have experience in network troubleshooting and Infrastructure tuning.
You have a strong SRE background and focus on improving the reliability and stability of the infrastructure.
You are a strong team player and can communicate efficiently across different teams, both verbally and in writing (technical documentation).
You have experience using a CI/CD platform for deployment.
You have debugging expertise in the Java tool stack and good knowledge of REST APIs.
You have an excellent understanding of ITIL service management processes.
You have experience in supporting high-throughput, low-latency systems.
You have experience in maintaining high-performance, service-oriented architectures.
You are experienced in scripting tools such as PowerShell, Bash, Python, and Ansible.
You must have RDBMS expertise - Oracle or DB2.
You have a solid understanding of different open-source packages, preferably Spring, Apache, and data transformation (JAXB2, JSON, XML).
You have participated in the overall delivery of software components as part of an agile development process.
You have strong communication and interpersonal skills, with an ability to communicate effectively and professionally.
You are willing to learn new technologies and maintain industry knowledge.
You have a demonstrated ability to achieve successful outcomes when handling difficult situations and customers.
You have a demonstrated ability to manage multiple priorities and follow through on projects to completion.
You can stay organized and deal with information from different sources simultaneously.
You are open to new ideas and change initiatives, with an ability to modify the current approach in the face of new demands.
SRE (Site Reliability Engineering) expertise is nice to have.
How we work
We know that exceptional people have great ideas and are passionate about their work. Our culture encourages excellence and actively rewards contributions with:
Connection: You’re surrounded by talented people every day who are driven by their passion for a common goal.
Core Values: They define us. Living them helps us be the best at what we do.
Compensation & Benefits: Pay is driven by individual and corporate performance, and we provide a multitude of benefits and perks.
Education: To ensure you are the best at what you do, we invest in you.