Accelerate NLP inference with ONNX Runtime on AWS Graviton processors
AWS Machine Learning
MAY 15, 2024
ONNX Runtime is the runtime engine used for model inference and training with ONNX. We also demonstrate the resulting speedup through benchmarking. Benchmark setup We used an AWS Graviton3-based c7g.4xl 1014-aws kernel) The ONNX Runtime repo provides inference benchmarking scripts for transformers-based language models.
Let's personalize your content