Senior Site Reliability Engineer (Distributed Systems)
One Model
Your work days are brighter here.
At Workday, it all began with a conversation over breakfast. When our founders met at a sunny California diner, they came up with an idea to revolutionize the enterprise software market. And when we began to rise, one thing that really set us apart was our culture, a culture driven by our value of putting our people first. Ever since, the happiness, development, and contribution of every Workmate has been central to who we are. Our Workmates believe a healthy, employee-centric, collaborative culture is the essential mix of ingredients for success in business. That’s why we look after our people, communities and the planet while still being profitable. Feel encouraged to shine, however that manifests: you don’t need to hide who you are. You can feel the energy and the passion; it's what makes us unique. Inspired to make a brighter work day for all and transform with us to the next stage of our growth journey? Bring your brightest version of you and have a brighter work day here.
About the Team
The Data Platform and Observability team is based in Pleasanton, CA; Boston, MA; Atlanta, GA; Dublin, Ireland; and Chennai, India. Our focus is on the development of large-scale distributed data systems to support critical Workday products and provide real-time insights across Workday’s platforms, infrastructure and applications. The team provides platforms that process hundreds of terabytes of data and enable core Workday products and use cases like core HCM, Fins, AI/ML SKUs, internal data products and Observability. If you enjoy writing efficient software or tuning and scaling large distributed systems, you will enjoy working with us.
Do you want to tackle exciting challenges at massive scale across private and public clouds for our 10,000+ global customers? Do you want to work with world-class engineers and facilitate the development of the next generation of distributed systems platforms? If so, we should chat.
About the Role
The Messaging, Streaming and Caching team is a full-service Distributed Systems Engineering team. We architect and provide async messaging, streaming, and NoSQL platforms and solutions that power Workday products and SKUs ranging from core HCM, Fins, and Integrations to AI/ML. We develop client libraries and SDKs that make it easy for teams to build Workday products. We develop automation to deploy and run hundreds of clusters, and we also operate and tune those clusters. As a team member you will play a key role in improving our services and encouraging their adoption within Workday's infrastructure, in both our private cloud and public cloud. You will design and build new capabilities from inception to deployment to exploit the full power of the core middleware infrastructure and services, and work hand in hand with our application and service teams!
About You
You are a Site Reliability Engineer with a distributed systems background and significant experience in platform technologies like Kafka/RabbitMQ, Redis, Cassandra, etc. You have independently led product features and deployed large-scale distributed systems clusters.
Basic Qualifications
4-12 years of software engineering experience using one or more of the following: Java/Scala, Golang.
3+ years of development and DevOps experience in designing and operating large-scale deployments of distributed NoSQL & messaging systems.
1+ year of leading a NoSQL technology-related product from conception to deployment and maintenance.
Preferred Qualifications
a consistent track record of technical project leadership and success involving collaborators and interested partners across the enterprise.
expertise in developing distributed system software and deployments that perform well and degrade gracefully under excessive load.
hands-on experience with at least one distributed systems technology like Kafka/RabbitMQ, Redis, or Cassandra.
experience learning complex open source service internals via code inspection.
extensive experience with modern software development tools, including CI/CD, and methodologies like Agile.
expertise with configuration management using Chef and service deployment on Kubernetes via Helm and ArgoCD.
experience with Linux system internals and tuning.
experience with distributed system performance analysis and optimization.
strong written and oral communication skills and the ability to explain esoteric technical details clearly to engineers without a similar background.
Pursuant to applicable Fair Chance law, Workday will consider for employment qualified applicants with arrest and conviction records.
Workday is an Equal Opportunity Employer including individuals with disabilities and protected veterans.
Are you being referred to one of our roles? If so, ask your connection at Workday about our Employee Referral process!