JOB DESCRIPTION
Position Name: Systems Analyst HPC and Linux
Position Summary
This position's primary responsibility is to manage cloud-native infrastructure and underlying services. As a High-Performance Computing (HPC) Systems Administrator on the ACIS team, this role leads efforts to incorporate open-source tools, automation, virtualization, and cloud resources into the compute environment. This role will leverage key infrastructure technologies (e.g., Linux and containerization) to create operational efficiencies that allow researchers and software developers to focus on their core responsibilities. This position is responsible for implementing continuous integration and delivery to limit manual testing and troubleshooting. The HPC Systems Administrator provides technical support and guidance for RENCI projects that involve local and national collaborations.
Required Qualifications
Strong experience with Linux
Strong experience with traditional HPC environments
Experience with high-speed interconnects, such as InfiniBand
Experience with HPC performance management tools, such as SLURM
Experience with system provisioning tools, such as xCAT
Experience with configuration management tools, such as Ansible, Puppet, or Chef
Working knowledge of tools like Git, GitHub, and Docker Hub
Working knowledge of one or more programming tools, such as Bash or Python
Experience with code compilation tools, such as GCC, Clang, etc.
General operational experience, including incident and problem management, configuration management, capacity management, vulnerability management, and troubleshooting IT infrastructure issues
Preferred Qualifications
Working knowledge of public cloud platforms
Working knowledge of network technologies, such as Cisco
Working knowledge of infrastructure technologies, including DNS, DHCP, NFS, etc.
Working knowledge of container technologies (Docker or Singularity)
Experience with enterprise storage systems (NetApp, Isilon)
Working knowledge of security frameworks such as CIS and NIST
Working knowledge of monitoring technologies, such as Nagios
Principal Functions
60%
Enterprise Computing Architecture
Design, develop, and deploy scientific research solutions utilizing commodity and specialty IT components. Evaluate, install, port, and support software on a variety of platforms. Architect and deploy research computational and data infrastructure. Provide architectural guidance to further scientific research and optimize use of RENCI resources.
20%
Enterprise Computing Operations
Provide primary and secondary support and maintenance for a dynamic enterprise IT infrastructure, including compute, storage, virtualization, networking, and identity management systems. Create and maintain a user-friendly computational environment. This person will assist other staff members, graduate research assistants, post-doctoral research associates, and external collaborators in utilizing RENCI computational resources in the most effective manner.
15%
Documentation, Training, and Knowledge Sharing
Create and maintain documentation for current and future infrastructure and services.
Maintain technology skills and capabilities through continuing education, using a variety of online, virtual, and in-person training opportunities.
Provide training to team members and peer employees on RENCI research infrastructure and cybersecurity.
Actively participate in bidirectional knowledge sharing with team members for career development and organizational effectiveness.
5%
Other duties as assigned.
Last updated on Aug 8, 2023