
Senior Staff Machine Learning Engineer, GenAI Platform

Reddit

Remote - United States
Full Time, Lead

Job Description

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Who We Are:
The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure powering recommendations, content discovery, and user and content quantification. Its work directly impacts other teams such as Growth, Ads, Feeds, and Core Machine Learning.

What You’ll Do:
As a Senior Staff Software Engineer, you will help define and lead the vision for Reddit’s large-scale GenAI Platform, shaping the strategy, architecture, and operating model that enable teams across the company to build, deploy, and scale generative AI products with confidence.

- Contribute to the design, implementation, and maintenance of the LLM Gateway, focusing on features such as unified API endpoints for internally and externally hosted LLMs, rate and token limit management, and intelligent failover mechanisms that improve uptime and reliability.
- Lead and execute the vision, strategy, and roadmap for Reddit’s large-scale GenAI Platform.
- Define the platform architecture and operating model that enable teams to build, deploy, and scale GenAI products reliably.
- Drive the strategy for a unified LLM Gateway supporting internally and externally hosted LLMs through consistent APIs and abstractions.
- Set the direction for core platform capabilities such as rate and token limit management, intelligent failover, and production resilience.
- Shape Reddit’s approach to an enterprise-grade RAG system.
- Establish the strategic direction for agentic AI workflows and tool-use patterns across the platform.
- Own the end-to-end platform strategy from concept through production adoption and long-term evolution.
- Drive MLOps and LLMOps standards across CI/CD, testing, versioning, evaluation, and lifecycle management.
- Define best practices for obs


Required Skills

Rust, R, CI/CD, REST, Machine Learning, LLM
