The Xsolla DevOps team is looking for a passionate Senior Site Reliability Engineer.
Xsolla Technology Stack: Ubuntu, Kubernetes, GitLab, Terraform, Terragrunt, Puppet, Nginx, Google Cloud Platform, Prometheus, Grafana, New Relic, ELK, Zabbix, Artifactory and Harbor.
- Ensure high reliability and availability, and meet SLAs and SLOs as measured by well-defined SLIs.
- Monitor the system for issues and respond to incidents, ensuring quick resolution to maintain high system availability.
- Drive incident resolution and process improvements to minimize downtime and increase operational transparency.
- Ensure all key services are measured and monitored, and that they raise alerts when needed.
- Develop comprehensive monitoring solutions that provide full visibility into the platform's components, using tools and services such as Kubernetes, Prometheus, Grafana, and New Relic.
- Support services before they go live through activities such as capacity planning, monitoring setup, logging, and production readiness reviews.
- Engage in service capacity planning and demand forecasting, performance analysis, and system tuning.
- Collaborate with the development teams to enhance the product's operational stability.
- Build and drive the automation systems that maintain system health.
- Proven experience (3+ years) as a Site Reliability Engineer, or in a similar software engineering role, in a large-scale production environment, and 6+ years overall in IT (in operations or development).
- Proficiency in scripting languages such as Python and Bash. A strong understanding of Go and PHP is a plus.
- Deep knowledge of monitoring systems such as Prometheus, Grafana, New Relic or Datadog.
- Good understanding of continuous integration/continuous delivery processes and platforms (GitLab preferred), and experience with Helm.
- Experience with Docker, Kubernetes, or other container orchestration systems.
- Familiarity with infrastructure automation tools like Terraform.
- Experience with automation, system administration, and system hardening.
- Experience with Linux-based infrastructures, Linux/Unix administration.
- Demonstrated problem-solving skills, particularly debugging and troubleshooting complex software systems. Ability to work under pressure.
- Excellent communication skills with a capacity to articulate and solve complex technical problems.
NICE TO HAVE:
- Prometheus Certified Associate (PCA)
- HashiCorp Certifications
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
Xsolla is a global video game commerce company with a robust and powerful set of tools and services designed specifically for the video game industry. Since its founding in 2005, Xsolla has helped thousands of game developers and publishers of all sizes fund, market, launch, and monetize their games globally and across multiple platforms. As an innovative leader in in-game commerce, Xsolla’s mission is to solve the inherent complexities of global distribution, marketing, and monetization to help our partners reach more geographies, generate more revenue, and create relationships with gamers worldwide. Xsolla is headquartered and incorporated in Los Angeles, California, with offices in Berlin, Seoul, and other cities worldwide. Xsolla supports major gaming companies such as Valve, Twitch, Roblox, Ubisoft, Epic Games, Take-Two, KRAFTON, Nexters, NetEase, Playstudios, Playrix, miHoYo, and more.
To learn more, please visit xsolla.com
Longevity, Opportunity, Vision, Enjoy the game.
Last updated on Oct 26, 2023