Site Reliability Engineer
ViralMoment
This job is no longer accepting applications
Location: Remote
About ViralMoment:
ViralMoment is an AI social listening platform that analyzes social videos to identify trending topics and provide insights to brands and agencies. Our mission is to help our clients stay ahead of the curve by leveraging cutting-edge AI technology.
About the Role:
We are seeking a Site Reliability Engineer to join our dynamic team at ViralMoment. In this critical role, you will be responsible for optimizing our cloud infrastructure for scaling, and ensuring high reliability, performance, and availability of our AI-driven platform. Reporting directly to the CTO, you will have the opportunity to influence architectural decisions and lead initiatives for a multi-cloud environment.
Key Responsibilities:
- Optimize and manage our cloud infrastructure, focusing on scalability, performance, and reliability
- Develop and enhance observability systems for monitoring and alerting
- Ensure the stability and efficiency of large-scale systems through effective DevOps practices
- Handle multi-cloud environments, primarily AWS, with potential implementations on GCP and Azure
- Collaborate with engineering teams to integrate and optimize backend processes
- Research and implement systematic solutions for large model applications
- Maintain and improve system performance through proactive monitoring and troubleshooting
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related fields from an accredited institution
- At least 5 years of experience in a similar role, focusing on cloud infrastructure and site reliability
- Proficiency in cloud-native technologies and strong understanding of the relevant technology stack
- Expertise in AWS, with additional knowledge of GCP and Azure preferred
- Strong programming skills in Python, with the ability to apply them proficiently in a professional setting
- Familiarity with infrastructure as code, particularly Terraform
- Experience with large-scale cluster management and cloud-native technologies for log collection, monitoring, and alerting
- Prior experience in constructing and maintaining stability systems for large-scale infrastructures
- Experience with infrastructure as code, especially Terraform
- Proven track record in operating and maintaining large-scale systems
What We Offer:
- A pivotal role in a rapidly growing startup at the forefront of AI technology
- Direct impact on the platform's performance and scalability that supports major global brands
- Remote work flexibility with a supportive and dynamic team environment
- Competitive salary and opportunities for advancement and leadership
If you are passionate about optimizing cloud infrastructure and ensuring system reliability, we encourage you to apply. Please submit your resume highlighting your experience with cloud platforms, programming languages, and system reliability.