Role and Responsibilities:
Deploy, maintain, enhance, and monitor highly scalable infrastructure for a data processing platform using Kubernetes
Use AWS Cloud and open-source services to address critical business needs
Ensure the 24/7 availability of the system, with proper alerting and monitoring
Identify and fix bugs and performance issues in the platform
Work with agile teams on setting error budgets, root cause analysis exercises, and blameless post-mortems
Utilize continuous delivery (CI/CD) with GitLab CI, Jenkins, ArgoCD, Artifactory, and Docker
Monitor data pipelines and applications, and handle failure recovery
Set up and monitor application access and connectivity
Advocate for a DevOps culture of automation, self-service, and engineering best practices to enable development teams
Autoscale and monitor the performance of Kubernetes and running applications using Prometheus and Grafana or similar tools
Perform all SRE activities, such as availability and reliability monitoring and reporting
Tune, monitor, and configure tools such as Kafka, Spark, Presto, Airflow, and MQTT
Use infrastructure as code with Terraform
Operate and maintain code repositories with GitLab
Required Qualifications:
Bachelor's degree in Computer Science or Computer Engineering
Minimum 5 years of experience in DevOps engineering or software development.
Strong coding and scripting experience with Bash, Python, Go or similar languages.
Comprehensive experience with AWS including a solid understanding of CI/CD, Amazon S3, EC2, IAM, CloudFormation and Route 53
Experience with user access, authentication, user permission management and security, LDAP, AD, OIDC, Kerberos
Experience with secure infrastructure networking with AWS using different types of Load Balancers, setting up VPCs, subnets, and routing tables
Experience with auto scaling, performance testing and capacity planning.
Experience with tools such as Jenkins, Artifactory, etc. to build automation, CI/CD, and self-service pipelines.
Experience owning infrastructure in production, as well as designing and creating build/deploy & monitoring systems using CloudFormation/Terraform
Experience with RESTful services, the pub/sub communication model, service-oriented architecture, distributed systems, cloud systems (AWS), and the microservices architecture pattern.
Preferred Qualifications:
Master's degree in Computer Science or Computer Engineering
Experience with configuration management tools: Puppet, Chef, Kustomize, or Ansible
Experience with containerization and scheduling, with Docker and Kubernetes.
Strong distributed systems implementation experience
Experience with AWS Direct Connect or setting up and maintaining a hybrid cloud
Experience with optimizing storage classes, lifecycle rules, instance classes, and throughput tuning to optimize for cost without sacrificing performance
Experience in backend services deployment and management
Last updated on Nov 22, 2023