Machine Learning Infrastructure Engineer

Gray Swan

📍Remote (Global) 🕔 Full Time
💰 $133,630 - $231,100 USD 🔄 Rolling Applications

We’re seeking an ML Infra Engineer to build robust, scalable, and high-performance infrastructure for distributed inference and training. You’ll take specialized language models from our ML research team and transform them into fast and reliable services that scale from proof-of-concept to enterprise deployment.

This is a foundational role where you’ll architect the ML inference infrastructure that powers our AI security platform, both in the cloud and on customer premises. Your work will span the entire production lifecycle, from deployment pipelines and performance optimization to cost modeling and 24/7 reliability.

What You’ll Own

  • Build and scale GPU inference with vLLM and similar frameworks for high‑throughput, low-latency LLM serving

  • Optimize for performance and cost, implementing batching and caching strategies, quantization, and hardware-specific optimizations to maximize tokens per dollar

  • Create robust deployment pipelines with automated testing, progressive rollouts, and instant rollbacks

  • Establish observability with comprehensive metrics, distributed tracing, and intelligent alerting that catches issues before customers notice

  • Design for multi-environment deployment supporting both our cloud platform and secure on-premises installations with reproducible, hardened builds

  • Drive operational excellence through clear SLOs, thorough runbooks, and a culture of continuous improvement

  • Shape our ML infrastructure vision as we scale, mentoring teammates and establishing patterns that will serve us for years

What We’re Looking For

  • Essential Experience:

    • Several years building and operating production backend systems, with hands-on experience optimizing distributed inference and training

    • Strong proficiency in Python plus at least one systems language (Go, Rust, C++)

    • Deep expertise with containerization, orchestration, and cloud-native architectures

    • Practical understanding of GPU performance characteristics, memory management, and inference optimization

    • Track record of building observable, secure systems with strong operational practices

    • Ability to work from first principles, whether modeling costs, designing for scale, or debugging performance

  • You’ll Stand Out If You Have:

    • Direct experience with LLM serving frameworks (e.g., vLLM, SGLang) and Transformer model optimization

    • Experience implementing an LLM across the full stack (from high-level model description to low-level optimizations)

    • Experience with low-level GPU optimization for ML workloads, using both CUDA and higher-level libraries like Triton

    • Contributions to open-source ML infrastructure projects or published ML systems research papers

    • Experience with rate limiting/quotas, per‑tenant isolation, metering, attribution, and cost allocation

    • A knack for clear technical communication through writing, talks, or mentorship

What We Offer

  • Meaningful Equity in a fast-growing startup

  • Comprehensive Benefits:

    • Health, dental, and vision coverage

    • 401(k) with 4% company match

    • 28 days combined PTO

    • Learning & development budget

    • Top-tier equipment and home office support

  • Growth Opportunities:

    • Define our ML infrastructure from the ground up

    • Direct impact on product architecture and customer success

    • Clear path to technical leadership as we scale

  • Great Culture:

    • Mission-driven team at the forefront of AI security

    • Collaborative environment that values curiosity and humility

    • Remote-friendly with optional Pittsburgh office

Who Thrives Here

We’re a startup building critical infrastructure for the AI era. We value engineers who:

  • Ship first, perfect later, without compromising on reliability or security

  • Think in systems and trade-offs

  • Learn voraciously and share knowledge generously

  • Take ownership from idea to production to continuous improvement

Ready to Apply?

Send us your resume or portfolio highlighting production systems you’ve built, especially any GPU or ML infrastructure work. We review every application personally and move quickly for the right candidates.

Gray Swan AI is an equal opportunity employer committed to building a diverse and inclusive team. We provide visa sponsorship for exceptional candidates.

Gray Swan

Gray Swan is an AI safety and security company. We develop tools that automatically assess the risks of AI models, and we build secure AI models that provide best-in-class safety and security.
