Senior SRE Engineer

Fortanix · 30+ days ago

Bengaluru, India

Negotiable

Full-time

As a Senior Site Reliability Engineer at Fortanix, you will be at the forefront of ensuring the reliability, scalability, and performance of our cutting-edge production environments. You’ll design and build operations as code, architecting automated solutions that enhance system stability. Partnering closely with our product engineering teams, you'll have a hands-on role in continuously improving the reliability of our platforms, ensuring our systems are robust and resilient. You'll develop and implement a comprehensive, actionable monitoring framework that detects and prevents issues before they impact our users.

In this role, you'll be a critical part of our production on-call rotation, responding to incidents with agility and executing post-incident reviews to drive continuous improvement. If you’re passionate about automation, enjoy tackling complex reliability challenges, and thrive in a fast-paced, high-impact environment, this role is for you!

Join us to shape the future of secure computing with a focus on building reliable, scalable, and secure production systems.

Key Responsibilities

System Architecture & Design

Collaborate with software development teams to design scalable, reliable, and secure systems.
Architect and build robust infrastructure to handle growth and ensure system uptime.

Automation & Infrastructure as Code (IaC)

Automate infrastructure deployment and management using tools like Terraform, Ansible, or CloudFormation.
Implement continuous integration and continuous deployment (CI/CD) pipelines for automated testing and deployment.
Write automation scripts and code for scaling and self-healing systems.

Monitoring & Incident Management

Design and implement comprehensive monitoring and alerting solutions to detect anomalies and issues before they impact users.
Implement logging and observability tools to gain insight into system health and performance (e.g., Prometheus, Grafana, ELK stack).
Manage on-call rotations, ensure timely responses to incidents, and perform root cause analysis and post-mortems.

Performance Tuning & Optimization

Perform load testing and system benchmarking to identify performance bottlenecks.
Optimize application and infrastructure performance, reducing latency and improving response times.

Security & Compliance

Ensure systems are secure by design, incorporating security best practices (e.g., encryption, firewalls, access controls).
Stay up to date on security vulnerabilities and patch systems accordingly.
Implement compliance standards (e.g., GDPR, HIPAA) where applicable.

Collaboration & Mentoring

Work closely with developers to ensure that applications are designed for reliability and scalability.
Serve as a mentor to junior engineers, fostering a culture of reliability and best practices.
Collaborate across teams (DevOps, Development, QA) to enhance system robustness.

Disaster Recovery & High Availability

Develop and maintain disaster recovery and business continuity plans.
Ensure high availability by designing systems that can withstand failures without service disruption.

Capacity Planning & Scalability

Forecast future system demand and plan for capacity increases as needed.
Design infrastructure that scales automatically to handle increased loads.

Continuous Improvement & Reliability Culture

Analyze incidents and failures to identify opportunities for improving system reliability.
Drive a culture of reliability across the engineering organization, advocating for best practices and SRE principles.

Cloud & Hybrid Infrastructure Management

Manage cloud infrastructure (AWS, GCP, Azure) and hybrid environments, ensuring optimal usage of cloud resources.
Implement cost optimization strategies for cloud resources while maintaining performance and reliability.

This role requires a deep understanding of both software engineering and infrastructure management, as well as strong collaboration and problem-solving skills.

Requirements

Technical Experience

Demonstrated expertise in modern enterprise Site Reliability Engineering is essential for this role. In addition, experience in the following areas is highly beneficial:

Proficiency in Programming/Scripting Languages - Strong coding skills in languages such as Python, Go, or similar. Familiarity with scripting languages like Bash or PowerShell is also important.
Problem Solving - Advanced experience with Linux administration and automation. Experience with production debugging and the ability to implement fast workarounds.

CI/CD & DevOps - Advanced experience managing software deployment to the cloud via pipelines (e.g., Bitbucket, GitLab). Understanding of DevOps practices for how modern software is deployed, upgraded, and monitored.
Containers & Orchestration - Strong hands-on experience with container technologies like Docker and Kubernetes, and related tooling such as Helm or OpenShift. Experience with both managed (AKS, EKS, GKE) and unmanaged (on-prem) Kubernetes.
Monitoring & Observability - Expertise with monitoring, alerting, and logging tools such as Prometheus, Grafana, Datadog, ELK stack, or similar. Understanding of metrics collection and analysis.

Networking/Infra - Solid understanding of networking concepts (TCP/IP, DNS, VPN, load balancing, firewalls, etc.) and network performance tuning in cloud environments. Experience with high-level network infrastructure for data centres and the cloud.

Key Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Engineering: 8+ years of engineering experience, including 3+ years of core site reliability engineering experience.
Experience with managing and resolving high-severity incidents in production environments. Ability to lead post-mortems and implement improvements.
Solid understanding of cloud technologies.

Strong experience with automation practices and principles to reduce manual work and improve efficiency.
Experience working in a cross-functional team environment, often collaborating with developers, QA, and security teams.
Must be a team player.

Certifications (Optional but Preferred)

Cloud Certifications: AWS Certified Solutions Architect, Google Cloud Certified - Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert.
DevOps Certifications: Certified Kubernetes Administrator (CKA), HashiCorp Terraform Associate, or similar certifications.

Benefits

Compensation at the top of the market range

A friendly culture that brings the best out of everybody

Mediclaim insurance for employees and their eligible dependents, including dental coverage

Personal Accident Insurance

Internet Reimbursement

Last updated on Aug 23, 2024

About the company

Fortanix is a cloud security company that provides data encryption, tokenization, and hardware security module services for enterprises. Their solutions protect sensitive data in cloud environments.

More jobs at Fortanix

Sales Engineer (remote US - OH, IN, IL or MI) · Remote · 30+ days ago

Customer Success Engineer · Bengaluru, Karnataka · 30+ days ago

SDET Engineer - Armor Platform Team · Bengaluru, Karnataka · 30+ days ago

Design Intern · Bengaluru, Karnataka · 30+ days ago

Operations Service Delivery Lead · Bengaluru, Karnataka · 30+ days ago

More jobs like this

UI/Sr. UI Developer
AUC Ventures · Venture capital firm · Bengaluru, Karnataka · 30+ days ago

iOS Lead Developer
Cityflo · Urban transportation and logistics · Mumbai, Maharashtra · 30+ days ago

DBA
SaleBuild · B2B lead generation and marketing · Pune, Maharashtra · 30+ days ago

PHP Developer / Sr. PHP Developer
Propstack · Real estate data and analytics · Mumbai, Maharashtra · 30+ days ago

Senior Software Engineer - Android
harappa · New Delhi, Delhi · 30+ days ago

JavaScript Developer (Frontend - React)
Vinsol · Web and mobile app development · Remote · 30+ days ago

Senior PhoneGap Developer (Ionic)
Logic Square · IT solutions and services company · Kolkata, West Bengal · 30+ days ago

Experienced developer for SaaS application
Essel Environmental Engineering and Consulting · Environmental consulting services · Remote · 30+ days ago

Senior Software Engineer – Test Automation
7601 · Marunji, Maharashtra · 30+ days ago

IVR Developer
interactcrm · Mumbai, Maharashtra · 30+ days ago
