
MLOps Engineer (LLM Serving and Infrastructure)

cloudwalk · 30+ days ago
Negotiable
Full-time
We are not just another fintech unicorn. We are a pack of dreamers, makers, and tech enthusiasts building the future of payments. With millions of happy customers and a hunger for innovation, we're now expanding our neural network - literally and metaphorically.

Your Mission:
At CloudWalk, we're at the cutting edge of AI, pioneering the use of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to drive innovation. As an MLOps Engineer, you will play a critical role in operationalizing the visionary work of our LLM Data Scientists. Your expertise will ensure the smooth deployment, efficient management, and scalable performance of LLMs across our extensive infrastructure. Your contributions will turn advanced AI research into scalable, high-performance solutions, with a particular focus on optimizing network communication and parallel processing capabilities.

What You’ll Do:

  • Deploy and Manage LLMs: Employ Kubernetes, Terraform, and cloud services to deploy and scale LLMs efficiently, ensuring their adaptability to high-demand scenarios.
  • Optimize Computing Infrastructure: Focus on enhancing GPU utilization, distributed training, bandwidth efficiency between machines, and VPC connections to maximize system performance.
  • Leverage Cutting-Edge Technologies: Utilize tools such as Hugging Face's Accelerate library and PyTorch's torchrun launcher to facilitate parallel training across multiple machines in a cluster, optimizing our AI models' training and inference processes.
  • Collaborate on Innovation: Partner with our R&D team to transition LLM and RAG technologies from conceptual stages to scalable, production-ready systems.
  • Monitor and Improve System Performance: Implement advanced monitoring and logging practices to ensure system reliability and performance, continuously seeking improvements.
  • Stay Updated on Industry Advances: Actively pursue the latest developments in MLOps, cloud computing, and AI technologies to implement innovative solutions and maintain our infrastructure's leading edge.

Technologies You Will Work With:

  • Kubernetes, Terraform, and cloud computing platforms for scalable AI model deployment.
  • CI/CD pipelines, Git for version control, and Bash scripting for operational efficiency.
  • Hugging Face's Accelerate and PyTorch's torchrun for parallel training and optimization across multiple machines.
  • Network infrastructure, where a comprehensive understanding is essential for optimizing bandwidth and securing VPC connections.

What We Expect From You:

  • Technical Mastery: Solid experience with DevOps, cloud infrastructure, and deploying machine learning models. Expertise in network optimization and parallel computing is crucial.
  • Problem-Solving Mindset: The ability to navigate complex challenges, strategically manage resources, and improve system efficiency.
  • Collaborative Approach: Strong communication skills and the ability to contribute effectively within a dynamic, interdisciplinary team.
  • Lifelong Learner: A commitment to continuous learning, staying abreast of the latest technological advancements, and applying innovative solutions.

Join us at CloudWalk, where we’re not just engineering solutions; we’re building a smarter, AI-driven future for payments, together.

Last updated on Dec 12, 2024
