Senior HPC engineer, Research infrastructure

LumaAi · 30+ days ago

Palo Alto, California

$180-220k

Full-time

Help Luma build some of the biggest and fastest AI supercomputing clusters in the world! As a High-Performance Computing engineer, you’ll work at the intersection of hardware and software, designing systems that deliver the maximum possible performance for running large-scale AI models. We work at the very cutting edge of speed and scale, bringing the traditions of High-Performance Computing (HPC) into a modern cloud environment.

For this role, it’s important that you understand how to combine CPUs, GPUs, and network devices into systems that are then deployed at large scale and run at peak efficiency. You understand the lowest levels of the software platforms that sit on top of this hardware, including how best to optimize the Linux kernel and user-space code. You are capable of writing code to automate the monitoring and healing of these systems, commanding a large number of servers with few people.

Responsibilities

In this role, you will work closely with and directly accelerate machine learning researchers, but don't need to be a machine learning expert yourself.
We value people who can quickly obtain a deep technical understanding of new domains and enjoy being self-directed and identifying the most important problems to solve.
You’ll manage HPC training clusters at Luma, from provisioning to performance tuning.
Areas of work will include observability, distributed job tracing, GPU diagnostics, software environment management, and additional tooling, plus work on the actual code to enable necessary features.
We believe that increasing compute is a huge lever to AI progress. You will have a direct impact on our ability to grow to an unprecedented scale and likewise produce unprecedented results.

Experience

8+ years of experience as an infrastructure or DevOps engineer in large, complex distributed systems.
Deep understanding of networking; bonus points for experience in HPC networking.
Experience developing high-quality software in a general-purpose programming language, preferably including Python.
Excellent problem-solving skills and attention to detail.
Experience with GPUs in large-scale clusters is strongly preferred.
Strong knowledge of observability and monitoring in distributed systems.
Tenacious at troubleshooting hardware and network topology failures in distributed systems.
Independently driven and able to own problems and build solutions end to end.
Experience with large-scale data center operations and proficiency in cloud orchestration and system tools.
Please note this role is not meant for recent grads.

Compensation

In addition to cash base pay, you'll also receive a sizable grant of Luma's equity.
The pay range for this position is $180,000-$220,000/yr for the Bay Area. Base pay offered will vary depending on job-related knowledge, skills, candidate location, and experience.

Your application is reviewed by real people.

Last updated on Oct 7, 2024
