We are working on software to improve the Deep Learning ecosystem and help hardware engineers build great Deep Learning parallel systems.
We are looking for a strong candidate with a background in writing systems software for networking devices (and optionally Linux kernel networking stack or network drivers). Someone who's implemented network protocols or has worked on OpenMPI.This role involves designing and implementing highly optimized communication collectives libraries similar to UCC (Unified Collective Communication) and NCCL (NVIDIA Collective Communications Library). The ideal candidate will work closely with hardware and software teams to ensure efficient data communication and synchronization across multiple AI accelerators in a distributed system, enabling scalable deep learning and high-performance computing applications. You will be learning technical and organizational skills from industry veterans: how to write performant and readable code; how to structure and communicate projects, ideas, and progress; how to work effectively with the Open Source community.
We are big proponents of Open Source and Free software and contribute back our improvements to all the great projects we use.
We prefer candidates who work out of one of our offices, but will consider remote candidates as well.