HPC Infrastructure Engineer

thinkahead · 30+ days ago
Negotiable
Full-time
AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.

At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD. 

We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived. 

We embrace all candidates that will contribute to the diversification and enrichment of ideas and perspectives at AHEAD. 

The High-Performance Computing Infrastructure Engineer is primarily responsible for the overall health and maintenance of storage technologies in our Managed Services customers' environments. Our HPC Infrastructure Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.

Roles & Responsibilities

  • Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities 
  • Plan and perform maintenance activities 
  • Assess customer environments for performance and design issues and propose resolutions 
  • Work across technical teams to troubleshoot complex infrastructure issues 
  • Create and maintain detailed documentation 
  • Serve as a subject matter expert and escalation point for storage technologies 
  • Work with vendors to resolve storage issues 
  • Communicate transparently with customers and internal teams
  • Participate in on-call rotation 
  • Complete training and certification as assigned to further skills and knowledge

Skills Required

  • Bachelor’s degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education.
  • 5+ years of expert-level experience managing infrastructure in high-performance computing environments, including configuration, troubleshooting, and best practices.
  • 1+ years of experience with Nvidia DGX preferred. 
  • Experience with high-performance computing (HPC) schedulers (e.g., SLURM, PBS, Torque) required. 
  • Experience configuring, maintaining and troubleshooting Kubernetes. 
  • Experience with storage technology (e.g., Ceph, Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS). 
  • Experience with machine learning or data science workflows in HPC/AI environments 
  • Advanced experience with Linux operating systems.
  • Experience configuring, maintaining and troubleshooting Nvidia/Mellanox (Cumulus OS) switches a plus 
  • Experience with both Ethernet and InfiniBand networking a plus.
  • 1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus 
  • 1+ years working with an enterprise ITSM system; ServiceNow is a bonus
  • Previous experience with automation tools such as Ansible, Puppet, or Chef a plus. 
  • Managed Services or consulting experience is required. 
  • Strong background with customer service 
  • High-level problem-solving and communication skills
  • Strong oral and written communication skills
  • Related network certifications are a bonus. 

Why AHEAD:

Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.

We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.

USA Employment Benefits include: 
- Medical, Dental, and Vision Insurance 
- 401(k) 
- Paid company holidays 
- Paid time off 
- Paid parental and caregiver leave 
- Plus more! See https://www.aheadbenefits.com/ for additional details.

The compensation range indicated in this posting reflects the On-Target Earnings (“OTE”) for this role, which includes a base salary and any applicable target bonus amount. This OTE range may vary based on the candidate’s relevant experience, qualifications, and geographic location.  

Last updated on Feb 20, 2025
