Senior SRE Engineer

Fortanix · 30+ days ago
Negotiable
Full-time

As a Senior Site Reliability Engineer at Fortanix, you will be at the forefront of ensuring the reliability, scalability, and performance of our cutting-edge production environments. You’ll design and build operations as code, architecting automated solutions that enhance system stability. Partnering closely with our product engineering teams, you'll have a hands-on role in continuously improving the reliability of our platforms, ensuring our systems are robust and resilient. You'll develop and implement a comprehensive, actionable monitoring framework that detects and prevents issues before they impact our users.

In this role, you'll be a critical part of our production on-call rotation, responding to incidents with agility and executing post-incident reviews to drive continuous improvement. If you’re passionate about automation, enjoy tackling complex reliability challenges, and thrive in a fast-paced, high-impact environment, this role is for you!

Join us to shape the future of secure computing with a focus on building reliable, scalable, and secure production systems.

Key Responsibilities

  • System Architecture & Design
    • Collaborate with software development teams to design scalable, reliable, and secure systems.
    • Architect and build robust infrastructure to handle growth and ensure system uptime.
  • Automation & Infrastructure as Code (IaC)
    • Automate infrastructure deployment and management using tools like Terraform, Ansible, or CloudFormation.
    • Implement continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment.
    • Write automation scripts and code for scaling and self-healing systems.
  • Monitoring & Incident Management
    • Design and implement comprehensive monitoring and alerting solutions to detect anomalies and issues before they impact users.
    • Implement logging and observability tools to gain insight into system health and performance (e.g., Prometheus, Grafana, ELK stack).
    • Manage on-call rotations, ensure timely responses to incidents, and perform root cause analysis and post-mortems.
  • Performance Tuning & Optimization
    • Perform load testing and system benchmarking to identify performance bottlenecks.
    • Optimize application and infrastructure performance, reducing latency and improving response times.
  • Security & Compliance
    • Ensure systems are secure by design, incorporating security best practices (e.g., encryption, firewalls, access controls).
    • Stay up to date with security vulnerabilities and patch systems accordingly.
    • Implement compliance standards (e.g., GDPR, HIPAA) where applicable.
  • Collaboration & Mentoring
    • Work closely with developers to ensure that applications are designed for reliability and scalability.
    • Serve as a mentor to junior engineers, fostering a culture of reliability and best practices.
    • Collaborate across teams (DevOps, Development, QA) to enhance system robustness.
  • Disaster Recovery & High Availability
    • Develop and maintain disaster recovery and business continuity plans.
    • Ensure systems are highly available, designing systems that can withstand failures without service disruptions.
  • Capacity Planning & Scalability
    • Forecast future system demand and plan for capacity increases as needed.
    • Design infrastructure that scales automatically to handle increased loads.
  • Continuous Improvement & Reliability Culture
    • Analyze incidents and failures to identify opportunities for improving system reliability.
    • Drive a culture of reliability across the engineering organization, advocating for best practices and SRE principles.
  • Cloud & Hybrid Infrastructure Management
    • Manage cloud infrastructure (AWS, GCP, Azure) and hybrid environments, ensuring optimal usage of cloud resources.
    • Implement cost optimization strategies for cloud resources while maintaining performance and reliability.

This role requires a deep understanding of both software engineering and infrastructure management, as well as strong collaboration and problem-solving skills.

Requirements

Technical Experience

Demonstrated expertise in modern enterprise Site Reliability Engineering is essential for this role. In addition, experience in the following areas is highly beneficial:

  • Proficiency in Programming/Scripting Languages - Strong coding skills in languages such as Python, Go, or similar. Familiarity with scripting languages like Bash or PowerShell is also important.
  • Problem Solving - Advanced experience with Linux administration and automation. Experience with production debugging and the ability to implement fast workarounds.
  • CI/CD & DevOps - Advanced experience in managing software deployment to the cloud via pipelines (for example, Bitbucket or GitLab). Understanding of DevOps practices for how modern software is deployed, upgraded, and monitored.
  • Containers & Orchestration - Strong hands-on experience with container technologies like Docker and Kubernetes, and related tools like Helm or OpenShift. Experience with both managed (AKS, EKS, GKE) and unmanaged (on-prem) Kubernetes.
  • Monitoring & Observability - Expertise with monitoring, alerting, and logging tools such as Prometheus, Grafana, Datadog, ELK stack, or similar. Understanding of metrics collection and analysis.
  • Networking/Infra - Solid understanding of networking concepts (TCP/IP, DNS, VPN, load balancing, firewalls, etc.) and network performance tuning in cloud environments. Experience with high-level network infrastructure for datacentre and cloud.

Key Requirements

  • Bachelor's/Master's degree in Computer Science, Engineering, or a related field.
  • Engineering: 8+ years of engineering experience with 3+ years of core Site Reliability Engineering experience.
  • Experience with managing and resolving high-severity incidents in production environments. Ability to lead post-mortems and implement improvements.
  • Solid understanding of Cloud technologies.
  • Strong experience with automation practices and principles to reduce manual work and improve efficiency.
  • Experience working in a cross-functional team environment, often collaborating with developers, QA, and security teams.
  • Must be a team player.

Certifications (Optional but Preferred)

  • Cloud Certifications: AWS Certified Solutions Architect, Google Cloud Certified - Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert.
  • DevOps Certifications: Certified Kubernetes Administrator (CKA), HashiCorp Terraform Associate, or similar certifications.

Benefits

  • Top range of market compensation
  • A friendly culture that brings the best out of everybody
  • Mediclaim Insurance – Employees and their eligible dependents including dental coverage
  • Personal Accident Insurance
  • Internet Reimbursement

Last updated on Aug 23, 2024

About the company

Fortanix is a cloud security company that provides data encryption, tokenization, and hardware security module services for enterprises. Their solutions protect sensitive data in cloud environments.
