(CAN) GT – Principal Site Reliability Developer (Service Management)
City : Toronto
Category : Regular/Permanent Technology
Industry : Retail
Employer : Walmart Canada
Position Summary... This position is responsible for the overall site operations of Walmart Global Tech’s Platform Services and for the overall health, stability, and visualization of the platforms.
Building the right technology foundation for Infrastructure & platforms is vital to success at the scale of Walmart. Our team builds and maintains the foundational technologies that support the tech organization. Included in this are data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. All of these products and services are supported by scalable and powerful infrastructure, ensuring a secure and seamless employee and customer experience across stores, digital channels, and distribution centers.
What you'll do...
• Provide timely and detailed reports of infrastructure changes, service outages, or degradation of services
• Provide engineering support, assist in technical discussions, proposals, development, and management in the managed services arena, and improve the incident management metrics (MTTD, MTTE, MTTM, MTTR)
• Prepare detailed solution design documents.
• Creates and delivers weekly reports to stakeholders describing network status, trends, recommendations, and opportunities for improvements.
• Ensures that all Service Level Agreements (SLAs) and Service Level Objectives (SLOs) are met.
• Ensures stakeholders receive all Root Cause Analyses (RCAs) and supporting post-mortems for support issues.
• Acts as the support escalation point for the platform services, overseeing resolution and client satisfaction.
• Understands and interprets data points collected in various systems, including but not limited to observability tools, dashboards, Slack channels, xMatters, and ServiceNow.
• Attends and presents during health-check meetings.
• Creates a specific knowledge base and runbooks, and ensures that they are maintained.
• Service Delivery Management Deck: The Service Delivery Deck is produced on a weekly/monthly basis and includes KPIs, key messages, and scorecards regarding application health, incident and change management, and infrastructure, application, and data service quality.
• The Relationship Management Lead is expected to work side by side with their Sr. Engineers.
• The Production Change Quality Adherence Engineer Lead is responsible for assessing change impact and quality across the production environment.
• Champions process improvements to reduce customer impact, improve system availability and resiliency, and monitor capabilities for change success and cost efficiencies.
• Ensures business and technology initiatives are planned and executed to meet our business and IT objectives.
• Specify and develop enhancements of current service delivery platform services for future platforms and integration with an in-vehicle connectivity solution
• Manage customer experience strategies, propose improvements to business requirements and functional specifications, and manage the successful market launch
• Creates platform-specific support processes.
• Proficiency in Microsoft Windows Operating Systems including supporting technologies such as Active Directory and Group Policy
• Strong customer service background and the ability to communicate highly technical topics to both technical and non-technical stakeholders
• Continuous integration and automated testing of developed code
• Configuration, management, and performance tuning of App/Web tier (Apache, JBoss, Tomcat)
• Strong technical background in Go (Golang) and Java
• Encourage peers to achieve their greatest potential
• Must have in-depth knowledge and at least 5 years of experience in the network management and customer support fields
• Experience building Cloud Management, DevOps, and/or ITOM (IT operations management) software
What you'll bring:
• 10 years of experience in site reliability engineering, site and system administration, infrastructure management, or related area.
• SRE certification (for example, IBM Cloud Site Reliability Engineer), ITIL V4, RHCE, Google, Microsoft Azure, RHCSA, MCITP, MCSE, CCNP
• 10+ years’ experience working with configuration and systems management processes including operating system imaging, application packaging, software distribution, patch management, and scripting.
• 10+ years’ experience with Traffic Shaping, Failover, Resiliency, and Disaster Recovery
• 10+ years of experience in service-oriented, automated incident, problem, and change management.
• 10+ years of experience in Event Management, Anomaly detection, Health Status Dashboard, Intelligent Alerting, Investigation, and RCA
• Expertise in Platform Service Deployment/Monitoring/Rollback Practice
• Expertise in Production deployment update (Canary-based), Traffic Shaping, Failover, and Load Balancing
• 5+ years of experience with Linux, containerization, and Kubernetes
• 5+ years of experience in Azure and Google Cloud Infrastructure services
• 5+ years of experience in Data Platform services (Kafka, Cassandra, Azure SQL, MS SQL Server, Mega cache)
• 5+ years of experience in DevOps, CI/CD
• 5+ years of experience in observability (Metrics, Logs, Traces) – Prometheus, Grafana, Splunk
About Walmart Global Tech
Imagine working in an environment where one line of code can make life easier for hundreds of millions of people and put a smile on their face. That’s what we do at Walmart Global Tech. We’re a team of 15,000+ software engineers, data scientists and service professionals within Walmart, the world’s largest retailer, delivering innovations that improve how our customers shop and empower our 2.3 million associates. To others, innovation looks like an app, service, or some code, but Walmart has always been about people. People are why we innovate, and people power our innovations. Being human-led is our true disruption. We train our team in the skillsets of the future and bring in experts like you to help us grow. We have roles for those chasing their first opportunity as well as those looking for the opportunity that will define their career. Here, you can kickstart a great career in tech, gain new skills and experience for virtually every industry, or leverage your expertise to innovate at scale, impact millions and reimagine the future of retail.
Flexible, hybrid work
We use a hybrid way of working that is primarily virtual, while remaining near the locations Global Tech calls home. This approach helps us make quicker decisions, remove location barriers across our global team, be more flexible in our personal lives and spend less time commuting. Of course, being together in person is an important part of our culture and shared success. We use our campuses to collaborate and be together in person, as business needs require and for development and networking opportunities.
Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
Equal Opportunity Employer:
Walmart, Inc. is an Equal Opportunity Employer – By Choice. We believe we are best equipped to help our associates, customers and the communities we serve live better when we really know them. That means understanding, respecting and valuing diversity – unique styles, experiences, identities, ideas and opinions – while being inclusive of all people.
The above information has been designed to indicate the general nature and level of work performed in the role. It is not designed to contain or be interpreted as a comprehensive inventory of all responsibilities and qualifications required of employees assigned to this job. The full Job Description can be made available as part of the hiring process.
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Age – 16 or older
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
Walmart will accommodate the disability-related needs of applicants and associates as required by law.
Primary Location…1940 Argentia Rd, Mississauga, ON L5N 1P9, Canada