Senior Machine Learning Systems Engineer

🇺🇸Reddit

Remote - United States
Full Time, Senior

Job Description

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Who We Are:

The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure that powers recommendations, content discovery, and user and content quantification, while directly impacting other teams such as Growth, Ads, Feeds, and Core Machine Learning.

What You’ll Do:

As a Senior ML Infrastructure Engineer, you will lead development of a platform for large-scale ML models at Reddit.

- Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
- Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
- Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
- Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
- Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges

Who You Might Be:

- 5+ years of experience in ML infrastructure, including model training and model deployments
- Hands-on experience with ML optimization, including memory and GPU profiling
- Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g. MLflow or Wand

Required Skills

Go, Rust, Scala, R, GCP, Terraform, REST, Machine Learning