RTB Model Serving Platform

Factored built an ML serving platform for real-time bidding (RTB) that scaled throughput 20X, cut latency to as low as 10ms, and improved reliability while reducing cost.

Key Takeaways:

Handling the Volume of Demand.

The existing ML ad serving platform was limited to 50,000 requests per second and struggled with high latency. It also needed multi-framework support, robust monitoring, and data quality assurance across both the serving and feature store pipelines.

Using ML for Operational Efficiency & Insights.

  • We used NVIDIA Triton to serve models built with multiple frameworks, such as PyTorch and TensorFlow, for flexible deployment.
  • To enhance reliability, we integrated Prometheus and Grafana for real-time monitoring, and we improved error handling, automated recovery, and rate limiting to prevent overload.
  • We optimized performance through Grafana load testing, Golang MLServing enhancements, and Aerospike cache integration.
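
To make the rate limiting point concrete, here is a minimal token-bucket sketch in Go. It is an illustration only, not the platform's actual code: the `TokenBucket` type, the `NewTokenBucket` constructor, and the rate and burst values are all hypothetical, and the real limiter may use a different algorithm entirely. The idea is that each request consumes one token, tokens refill at a fixed rate, and requests arriving with an empty bucket are rejected before they can overload the model servers.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket is a hypothetical in-process rate limiter.
// It is an illustrative sketch, not the platform's implementation.
type TokenBucket struct {
	mu     sync.Mutex
	rate   float64   // tokens added per second
	burst  float64   // maximum number of tokens the bucket can hold
	tokens float64   // tokens currently available
	last   time.Time // time of the most recent refill
}

// NewTokenBucket returns a bucket that starts full.
func NewTokenBucket(ratePerSec float64, burst int) *TokenBucket {
	return &TokenBucket{
		rate:   ratePerSec,
		burst:  float64(burst),
		tokens: float64(burst),
		last:   time.Now(),
	}
}

// Allow reports whether one request may proceed right now.
// It refills the bucket based on elapsed time, then tries to take a token.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst // never store more than the burst size
	}
	b.last = now

	if b.tokens < 1 {
		return false // bucket empty: shed this request
	}
	b.tokens--
	return true
}

func main() {
	// Assumed values for the demo: 1 request per second, burst of 2.
	limiter := NewTokenBucket(1, 2)
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed: %v\n", i, limiter.Allow())
	}
}
```

In a serving path, a rejected request would typically get a fast "no bid" or an overload response rather than queueing. For production use, the existing `golang.org/x/time/rate` package provides a maintained limiter with similar semantics.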

Scaling Throughput 20X to 1M Requests Per Second.

  • Latency as low as 10ms.
  • Reduced cost and better performance.
  • More reliable and transparent ML operations.