From innovation to impact: How AWS and NVIDIA enable real-world generative AI success
AWS Machine Learning
MARCH 19, 2025
Perplexity uses a highly optimized inference stack built with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server to serve both their search application and pplx-api, their public API service that gives developers access to their proprietary models. The results speak for themselves: their inference stack achieves up to 3.1
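To make the serving setup more concrete, here is a minimal sketch of how a client might query a TensorRT-LLM model deployed behind Triton Inference Server using the real tritonclient Python library. The model name ("ensemble") and tensor names ("text_input", "max_tokens", "text_output") are assumptions based on common TensorRT-LLM backend configurations, not details from the article, and would need to match the actual deployment's model configuration.

```python
# Minimal sketch: querying a TensorRT-LLM model served by Triton Inference Server.
# Assumes a deployment exposing an "ensemble" model with "text_input", "max_tokens",
# and "text_output" tensors, as in typical tensorrtllm_backend examples; names and
# shapes depend on the actual model config.
import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton HTTP endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Prompt as a BYTES tensor with batch dimension.
prompt = np.array([["Summarize the latest AI inference trends."]], dtype=object)
text_input = httpclient.InferInput("text_input", prompt.shape, "BYTES")
text_input.set_data_from_numpy(prompt)

# Cap the number of generated tokens.
max_tokens = np.array([[128]], dtype=np.int32)
max_tokens_input = httpclient.InferInput("max_tokens", max_tokens.shape, "INT32")
max_tokens_input.set_data_from_numpy(max_tokens)

# Request the generated text back from the server.
output = httpclient.InferRequestedOutput("text_output")

result = client.infer(
    model_name="ensemble",
    inputs=[text_input, max_tokens_input],
    outputs=[output],
)
print(result.as_numpy("text_output"))
```

In this kind of setup, TensorRT-LLM handles the optimized GPU execution of the model while Triton provides the request handling, batching, and HTTP/gRPC serving layer in front of it.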