
DL Communications Collectives SW Engineer

rivosinc · 30+ days ago

Santa Clara, CA; Austin, TX; Portland, OR; Boulder, CO; Cambridge, UK; or Remote (North America and Europe)

Negotiable

Full-time


We are working on software to improve the Deep Learning ecosystem and help hardware engineers build great Deep Learning parallel systems.

We are looking for a strong candidate with a background in writing systems software for networking devices (and, optionally, experience with the Linux kernel networking stack or network drivers), ideally someone who has implemented network protocols or worked on Open MPI.

This role involves designing and implementing highly optimized communication collectives libraries similar to UCC (Unified Collective Communication) and NCCL (NVIDIA Collective Communications Library). The ideal candidate will work closely with hardware and software teams to ensure efficient data communication and synchronization across multiple AI accelerators in a distributed system, enabling scalable deep learning and high-performance computing applications.

You will learn technical and organizational skills from industry veterans: how to write performant, readable code; how to structure and communicate projects, ideas, and progress; and how to work effectively with the Open Source community.

We are big proponents of Open Source and Free software, and we contribute our improvements back to the great projects we use.

We prefer candidates who work out of one of our offices, but will consider remote candidates as well.

Responsibilities

Build up the communication components of an AI software stack
Port AI software to run on a new hardware platform
Profile and tune communications within AI applications
Design, develop, and optimize communication collectives (e.g., AllReduce, AllGather, Broadcast, ReduceScatter) for large-scale distributed computing and machine learning frameworks.
Implement and optimize communication algorithms (ring, tree, butterfly, etc.) tailored for our architectures and multi-node clusters.
Ensure low-latency, high-bandwidth communication across multi-GPU setups, supporting interconnects such as PCIe and InfiniBand.
Collaborate with hardware engineers and other software teams to optimize performance.
Implement fault tolerance and scalability mechanisms in distributed systems to handle large-scale workloads.
Write unit tests and benchmark tools to validate the performance and correctness of collective operations.
Stay current with advancements in hardware and networking technologies to continuously improve the library's performance.
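
To make the collectives and algorithms named above concrete, here is a small single-process sketch. It is a hypothetical illustration, not Rivos code: each "rank" is a Python list, a "send" is a list copy, and the function names `ring_allreduce` and `butterfly_allreduce` are invented for this example. It shows AllReduce (elementwise sum) built as a ring ReduceScatter followed by a ring AllGather, and, as a stand-in for a butterfly pattern, recursive doubling for power-of-two rank counts. Tree algorithms are not shown.

```python
"""Single-process simulation of AllReduce (elementwise sum) over P ranks.

Hypothetical illustration: each "rank" is a Python list and a "send" is a
list copy. No real communication library is used.
"""
from typing import List

Buffers = List[List[float]]


def ring_allreduce(bufs: Buffers) -> Buffers:
    """Ring AllReduce: a ring ReduceScatter followed by a ring AllGather."""
    p, n = len(bufs), len(bufs[0])
    data = [list(b) for b in bufs]  # leave the caller's buffers untouched
    bounds = [c * n // p for c in range(p + 1)]  # chunk c = [bounds[c], bounds[c+1])

    def idx(c: int) -> range:
        return range(bounds[c], bounds[c + 1])

    # ReduceScatter: in step s, rank r sends chunk (r - s) % p to rank
    # (r + 1) % p, which adds it into its own copy. After p - 1 steps,
    # rank r holds the fully reduced chunk (r + 1) % p.
    for s in range(p - 1):
        sends = [((r - s) % p, [data[r][i] for i in idx((r - s) % p)])
                 for r in range(p)]  # snapshot: all sends in a step are simultaneous
        for r, (c, vals) in enumerate(sends):
            dst = (r + 1) % p
            for i, v in zip(idx(c), vals):
                data[dst][i] += v

    # AllGather: in step s, rank r forwards the reduced chunk (r + 1 - s) % p
    # to rank (r + 1) % p, which overwrites its copy of that chunk.
    for s in range(p - 1):
        sends = [((r + 1 - s) % p, [data[r][i] for i in idx((r + 1 - s) % p)])
                 for r in range(p)]
        for r, (c, vals) in enumerate(sends):
            dst = (r + 1) % p
            for i, v in zip(idx(c), vals):
                data[dst][i] = v
    return data


def butterfly_allreduce(bufs: Buffers) -> Buffers:
    """Recursive-doubling AllReduce: log2(p) pairwise exchange steps."""
    p = len(bufs)
    assert p > 0 and p & (p - 1) == 0, "sketch assumes a power-of-two rank count"
    data = [list(b) for b in bufs]
    dist = 1
    while dist < p:
        # Each rank exchanges its whole buffer with the rank whose index
        # differs in one bit; both keep the elementwise sum.
        snapshot = [list(b) for b in data]
        for r in range(p):
            partner = r ^ dist
            data[r] = [a + b for a, b in zip(snapshot[r], snapshot[partner])]
        dist *= 2
    return data


if __name__ == "__main__":
    ranks = [[float(10 * r + i) for i in range(6)] for r in range(4)]
    print(ring_allreduce(ranks)[0])       # [60.0, 64.0, 68.0, 72.0, 76.0, 80.0]
    print(butterfly_allreduce(ranks)[0])  # [60.0, 64.0, 68.0, 72.0, 76.0, 80.0]
```

In this sketch the ring variant sends roughly 2(P-1)/P of the buffer per rank in total, nearly independent of P, which is why ring schedules are commonly used for large, bandwidth-bound messages. Recursive doubling finishes in log2(P) steps but sends the whole buffer at every step, so it tends to suit small, latency-bound messages.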

Requirements

Strong understanding of GPU architectures and their programming platforms (NVIDIA CUDA, AMD ROCm), and experience in GPU programming (CUDA, HIP, or similar).
Proficiency in designing and implementing parallel and distributed algorithms, particularly communication collectives.
Experience with network interconnects (NVLink, PCIe, InfiniBand, RDMA) and understanding of their performance implications.
Hands-on experience with communication collectives libraries like UCC, NCCL, or MPI.
Strong knowledge of concurrency, synchronization, and memory consistency models in multi-threaded and distributed environments.
Experience with profiling and optimizing low-level performance (memory bandwidth, latency, throughput) on GPU architectures.
Familiarity with deep learning frameworks (TensorFlow, PyTorch, etc.) and their use of communication collectives.
Strong problem-solving skills and ability to work in a fast-paced, collaborative environment.
Network driver experience recommended
Excellent problem-solving, written, and verbal communication skills
Strong organizational skills and high self-motivation.
Ability to work well in a team and be productive under aggressive schedules.

Optional Requirements

Experience with NumPy, PyTorch, TensorFlow or JAX
Experience with Rust
Experience with CUDA, OpenCL, OpenGL, or SYCL
Coursework or experience with Machine Learning algorithms

Education and Experience

Bachelor’s, Master’s, or PhD in Computer Engineering, Software Engineering or Computer Science

Last updated on Sep 18, 2024

About the company

rivosinc

More jobs at rivosinc

Security Infrastructure Engineer - Full Time · Austin, Texas · 30+ days ago
Silicon DFT - Full time · $42k+ · Austin, Texas · 30+ days ago
SOC Physical Design - Full time · $173k+ · Austin, Texas · 30+ days ago
SOC Physical Design Verification Engineer - Full Time · $83k+ · Austin, Texas · 30+ days ago
SOC Static Timing Analysis Engineer - Full Time · $83k+ · Austin, Texas · 30+ days ago
