Site Reliability Engineer
Job Summary
Hardware Engineering is seeking a Site Reliability Engineer to support multiple internal applications. From brainstorming through implementation, the Site Reliability Engineer will work with the engineers of several internal tools to build performant, fault-tolerant infrastructure that is maintainable, scalable, and reliable.

Key Qualifications
• 3+ years of experience in a Site Reliability Engineering or Systems Administration role
• Must be capable of independent problem-solving and self-direction, with an eye towards creating business value for our internal customers
• Solid grounding in build/release (CI/CD) pipelines, methodologies, and tools (Jenkins, Spinnaker, Artifactory)
• Proven ability to write programs using a high-level programming language (Ruby, PHP, Python) as well as modern application server frameworks
• Working knowledge of both enterprise and third-party cloud environments (OpenStack, AWS)
• Experience developing, administering, and executing disaster recovery solutions
• Deep understanding of the Linux operating system
• Experience with configuration management systems (Chef, Ansible) and provisioning systems (Terraform)
• Practical knowledge of network technologies, security, and communication protocols, including TCP/IP, DNS, DHCP, load balancing, firewalls, jump and proxy servers, network file systems, intrusion detection, secret stores, CDNs, and caching
• Background using Kubernetes, Docker, AWS, immutable infrastructure, and related technologies in geographically dispersed production environments
• Experience administering and supporting RDBMS (Postgres, MySQL) and NoSQL databases
• Passion for eliminating repetitive manual processes using data and automation (e.g. ELK, AWX)
• Demonstrated experience supporting RESTful, session-based web applications using technologies such as Java, Ruby, Rails, Passenger, PHP, Python, and Nginx
• Experience developing and tuning observability platforms and load generation tools in order to proactively identify and implement changes that ensure optimal application availability and performance
• Capable of taking part in a 24/7 on-call rotation

Description
The core responsibility for this role is to build reliable and easily maintainable infrastructure to be used by our internal tools teams for the development of mission-critical enterprise applications. You will often be the first tier of support in outages and must possess strong debugging and problem-solving skills in order to quickly identify and resolve issues in production environments.

This role will require a strong sense of ownership, customer service, and time management. Communication skills are critical to the success of the person in this position, as you will be required to meet regularly with engineering stakeholders across several teams and thoroughly document decisions and processes related to the deployment of applications and their ongoing management.

Education
BA or BS in Computer Science or equivalent degree
Last updated on Apr 19, 2022