Using its enterprise software, FloTorch conducted an extensive comparison between Amazon Nova models and OpenAI's GPT-4o models using the Comprehensive Retrieval Augmented Generation (CRAG) benchmark dataset. How do Amazon Nova Micro and Amazon Nova Lite perform against GPT-4o mini on these same metrics?
This approach allows organizations to assess their AI models' effectiveness using predefined metrics, making sure that the technology aligns with their specific needs and objectives. The introduction of an LLM-as-a-judge framework represents a significant step forward in simplifying and streamlining the model evaluation process.
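To make the LLM-as-a-judge idea concrete, here is a minimal sketch of the pattern: a judge model is prompted to grade a candidate answer against a rubric and return a structured score. The judge model ID and the JSON grading schema below are illustrative assumptions, not details from the original post.

```python
# Minimal LLM-as-a-judge sketch. Assumptions: Amazon Bedrock access is
# configured, and the judge model ID below is available in your Region.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
JUDGE_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative choice

def judge_answer(question: str, answer: str) -> dict:
    """Ask a judge model to grade a candidate answer on a 1-5 scale."""
    prompt = (
        "You are grading an AI assistant's answer.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        'Reply with JSON only, like {"score": 3, "reason": "..."}.'
    )
    resp = bedrock.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # Assumes the judge complied and returned pure JSON.
    return json.loads(resp["output"]["message"]["content"][0]["text"])
```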
Pixtral 12B, Mistral's inaugural VLM, delivers robust performance across a range of benchmarks, surpassing other open models and rivaling larger counterparts, according to Mistral's evaluation. Pixtral 12B is trained to understand both natural images and documents, achieving 52.5%
To effectively optimize AI applications for responsiveness, we need to understand the key metrics that define latency and how they impact user experience. These metrics differ between streaming and nonstreaming modes, and understanding them is crucial for building responsive AI applications.
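As a concrete illustration of these two modes, the sketch below measures time to first token (the streaming-mode responsiveness metric) alongside total request latency. The `stream_completion` generator is a hypothetical stand-in for any streaming LLM client.

```python
# Sketch: the two latency views discussed above. Time to first token (TTFT)
# drives perceived responsiveness in streaming mode; total latency is what a
# nonstreaming caller experiences. `stream_completion` is hypothetical.
import time

def measure_latency(stream_completion, prompt: str) -> dict:
    start = time.perf_counter()
    first_token_at = None
    tokens = 0
    for _token in stream_completion(prompt):
        if first_token_at is None:
            first_token_at = time.perf_counter()
        tokens += 1
    end = time.perf_counter()
    return {
        "ttft_s": (first_token_at - start) if first_token_at is not None else None,
        "total_s": end - start,
        "tokens_per_s": tokens / (end - start) if tokens else 0.0,
    }
```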
Current RAG pipelines frequently employ similarity-based metrics such as ROUGE, BLEU, and BERTScore to assess the quality of the generated responses, which is essential for refining and enhancing the model's capabilities. More sophisticated metrics are needed to evaluate factual alignment and accuracy.
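For reference, here is a minimal sketch of how such similarity-based metrics are typically computed, using the rouge-score and NLTK packages; the example sentences are illustrative. As the excerpt notes, scores like these measure surface overlap, not factual alignment.

```python
# Sketch: computing similarity-based metrics (pip install rouge-score nltk).
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu

reference = "The cat sat on the mat."
candidate = "A cat was sitting on the mat."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)  # overlap-based precision/recall/F1
# Bigram BLEU to avoid zero scores on very short sentences.
bleu = sentence_bleu([reference.split()], candidate.split(), weights=(0.5, 0.5))

print(rouge["rougeL"].fmeasure, bleu)
```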
Based on customer feedback for the experimental APIs we released in GraphStorm 0.2, GraphStorm 0.3 introduces refactored graph ML pipeline APIs. In addition, GraphStorm 0.3 adds new APIs to customize GraphStorm pipelines: you now only need 12 lines of code to implement a custom node classification training loop.
Anthropic Claude 3.5 Sonnet currently ranks at the top of S&P AI Benchmarks by Kensho, which assesses large language models (LLMs) for finance and business. Kensho is the AI Innovation Hub for S&P Global. Benchmark results should still be read with care; for example, there could be leakage of benchmark datasets' questions and answers into training data.
In this post, we describe the enhancements to the forecasting capabilities of SageMaker Canvas and guide you on using its user interface (UI) and AutoML APIs for time-series forecasting. While the SageMaker Canvas UI offers a code-free visual interface, the APIs empower developers to interact with these features programmatically.
This integration provides a powerful multilingual model that excels in reasoning benchmarks. The integration offers enterprise-grade features including model evaluation metrics, fine-tuning and customization capabilities, and collaboration tools, all while giving customers full control of their deployment.
According to Mistral, the instruction-tuned version of the model achieves over 81% accuracy on Massive Multitask Language Understanding (MMLU) while generating 150 tokens per second, making it currently the most efficient model in its category.
Where discrete outcomes with labeled data exist, standard ML methods such as precision, recall, or other classic ML metrics can be used. These metrics provide high precision but are limited to specific use cases due to limited ground truth data. If the use case doesn't yield discrete outputs, task-specific metrics are more appropriate.
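A minimal sketch of those classic metrics, computed with scikit-learn on hypothetical labels and predictions:

```python
# Sketch: classic ML metrics against labeled ground truth (scikit-learn).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]  # labeled ground-truth outcomes (illustrative)
y_pred = [1, 0, 1, 0, 0, 1]  # discrete model outputs

print(precision_score(y_true, y_pred),  # 1.0: no false positives here
      recall_score(y_true, y_pred),     # 0.75: one positive was missed
      f1_score(y_true, y_pred))
```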
Gain insights into training strategies, productivity metrics, and real-world use cases to empower your developers to harness the full potential of this game-changing technology. Discover how to create and manage evaluation jobs, use automatic and human reviews, and analyze critical metrics like accuracy, robustness, and toxicity.
As new embedding models are released with incremental quality improvements, organizations must weigh the potential benefits against the associated costs of upgrading, considering factors like computational resources, data reprocessing, integration efforts, and projected performance gains impacting business metrics.
We benchmarked 45 models using the scripts from the TorchBench repo. For the 45 models we benchmarked, there is a 1.35x latency improvement (geomean over the 45 models), and for the 33 models we benchmarked, there is around a 2x performance improvement (geomean over the 33 models).
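For clarity, a geomean speedup like the 1.35x figure is the geometric mean of the per-model improvement ratios; the sketch below shows the computation on made-up values, not the actual TorchBench results.

```python
# Sketch: how a geomean speedup is computed (values are made up).
import math

speedups = [1.20, 1.48, 1.35, 1.41]  # hypothetical per-model improvement ratios
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
print(f"{geomean:.2f}x")  # geometric mean, the figure reported per model set
```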
They enable applications requiring very low latency or local data processing using familiar APIs and tool sets. Through comparative benchmarking tests, we illustrate how deploying FMs in Local Zones closer to end users can significantly reduce latency, a critical factor for real-time applications such as conversational AI assistants.
Although you can integrate the model directly into an application, the approach that works well for production-grade applications is to deploy the model behind an endpoint and then invoke the endpoint via a RESTful API call to obtain the inference. However, you can use any other benchmarking tool.
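A minimal sketch of that endpoint-invocation pattern, assuming a SageMaker real-time endpoint that accepts and returns JSON; the endpoint name and payload shape are hypothetical.

```python
# Sketch: invoking a deployed model behind a SageMaker real-time endpoint.
# "my-endpoint" and the JSON payload shape are hypothetical.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "example payload"}),
)
prediction = json.loads(response["Body"].read())
```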
The solution uses the following services: Amazon API Gateway is a fully managed service that makes it easy for developers to publish, maintain, monitor, and secure APIs at any scale. Purina’s solution is deployed as an API Gateway HTTP endpoint, which routes the requests to obtain pet attributes.
You can also either use the SageMaker Canvas UI, which provides a visual interface for building and deploying models without needing to write any code or have any ML expertise, or use its automated machine learning (AutoML) APIs for programmatic interactions.
With so many SaaS metrics floating around, and even more opinions on when and how to use them, it can be hard to know if you’re measuring what really matters. Leading SaaS expert, Dave Kellogg, and ChurnZero CEO, You Mon Tsang, sat down to answer all the questions you want to know about SaaS metrics like ARR, NRR, GRR, LTV, and CAC (i.e.,
The former question addresses model selection across model architectures, while the latter question concerns benchmarking trained models against a test dataset. This post provides details on how to implement large-scale Amazon SageMaker benchmarking and model selection tasks. Example models include swin-large-patch4-window7-224 (195.4M parameters) and efficientnet-b5 (29.0M parameters).
At Interaction Metrics, we help organizations of all sizes improve how they collect and use feedback. I'll get to the top 17 Qualtrics alternatives in just a minute, but first, a shameless plug for Interaction Metrics. It supports SMS/MMS and offline collection, with exportable reports (XLS, CSV, PDF) and HRIS integration via API.
The ingestion workflow transforms these curated questions into vector embeddings using the Amazon Titan Text Embeddings model API. The system first converts the query into a vector embedding using the Amazon Titan Text Embeddings model API, which is accessed securely through PrivateLink.
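A hedged sketch of that embedding step, calling the Amazon Titan Text Embeddings model through the Bedrock runtime API; the exact model ID may differ by version and Region.

```python
# Sketch: creating a vector embedding with the Amazon Titan Text Embeddings
# model via the Bedrock runtime. The model ID is an assumption (versions vary).
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

vector = embed("What is our refund policy?")
```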
Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. An advanced job is a custom load test job that allows you to perform extensive benchmarks based on your ML application SLA requirements, such as latency, concurrency, and traffic pattern.
You can save time, money, and labor by implementing classification in your workflow, and documents go to downstream applications and APIs based on document type. This helps you avoid throttling limits on API calls due to polling the Get* APIs. The performance of ML models is also monitored for degradation over time.
We first benchmark the performance of our model on a single instance to identify the TPS it can handle within our acceptable latency requirements. From there, we dive into how you can track and understand the metrics and performance of the SageMaker endpoint using Amazon CloudWatch metrics.
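A sketch of pulling those endpoint metrics from CloudWatch; the endpoint and variant names are hypothetical, and ModelLatency is reported in microseconds.

```python
# Sketch: reading endpoint latency from CloudWatch (AWS/SageMaker namespace).
# Endpoint and variant names are hypothetical; ModelLatency is in microseconds.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average", "Maximum"],
)
print(stats["Datapoints"])
```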
Consequently, no other testing solution can provide the range and depth of testing metrics and analytics. And testingRTC offers multiple ways to export these metrics, from direct collection via webhooks to downloading results in CSV format using the REST API. Happy days! You can check framerate information for video here too.
For example, you can immediately start detecting entities such as people, places, commercial items, dates, and quantities via the Amazon Comprehend console , AWS Command Line Interface , or Amazon Comprehend APIs. In this post, we walk you through the benchmarking process and the results we obtained while working on subsampled datasets.
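A minimal sketch of that entity-detection call through the Amazon Comprehend API via boto3; the sample sentence is illustrative.

```python
# Sketch: detecting entities with the Amazon Comprehend API via boto3.
import boto3

comprehend = boto3.client("comprehend")

result = comprehend.detect_entities(
    Text="Amazon was founded by Jeff Bezos in Seattle in 1994.",
    LanguageCode="en",
)
for entity in result["Entities"]:
    # Each entity carries a type (PERSON, LOCATION, DATE, ...) and a confidence score.
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))
```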
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
In addition, all SageMaker real-time endpoints benefit from built-in capabilities to manage and monitor models, such as shadow variants, auto scaling, and native integration with Amazon CloudWatch (for more information, refer to CloudWatch Metrics for Multi-Model Endpoint Deployments).
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies and Amazon via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. A limitation of the approach is its larger computational cost.
Together, these AI-driven tools and technologies aren’t just reshaping how brands perform marketing tasks; they’re setting new benchmarks for what’s possible in customer engagement. From our experience, the artifact server has some limitations, such as limits on artifact size (because artifacts are sent using a REST API).
Queries are sent to the backend using a REST API defined in Amazon API Gateway , a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at any scale, and implemented through an API Gateway private integration.
All the training and evaluation metrics were inspected manually from Amazon Simple Storage Service (Amazon S3). For every epoch in our training, we were already sending our training metrics through stdout in the script. This allows us to compare training metrics like accuracy and precision across multiple runs, as sketched below.
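A hedged sketch of that stdout-metrics pattern: the training script prints metrics in a fixed format each epoch, and the SageMaker estimator scrapes them with regexes. The metric names, formats, and regexes here are illustrative, not the authors' exact ones.

```python
# Sketch of the stdout-metrics pattern. Part 1: the training script prints
# metrics in a fixed, parseable format once per epoch (values are placeholders).
for epoch in range(3):
    acc, prec = 0.90 + 0.01 * epoch, 0.80 + 0.01 * epoch
    print(f"epoch={epoch}; accuracy={acc:.4f}; precision={prec:.4f};")

# Part 2: the SageMaker estimator scrapes those lines into per-run metrics,
# making runs comparable in the console (regexes here are illustrative).
# estimator = PyTorch(
#     entry_point="train.py",
#     metric_definitions=[
#         {"Name": "accuracy", "Regex": r"accuracy=([0-9.]+);"},
#         {"Name": "precision", "Regex": r"precision=([0-9.]+);"},
#     ],
#     ...,
# )
```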
In addition, they use the developer-provided instruction to create an orchestration plan and then carry out the plan by invoking company APIs and accessing knowledge bases using Retrieval Augmented Generation (RAG) to provide an answer to the user’s request. In Part 1, we focus on creating accurate and reliable agents.
This is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API. These metrics will assess how well a machine-generated summary compares to one or more reference summaries.
Use managed services – Depending on your expertise and specific use case, weigh the options between opting for Amazon Bedrock , a serverless, fully managed service that provides access to a diverse range of foundation models through an API, or deploying your models on a fully managed infrastructure by using Amazon SageMaker.
The CUDA API and SDK were first released by NVIDIA in 2007. In terms of resulting speedups, the approximate order is programming hardware, then programming against PBA APIs, then programming in an unmanaged language such as C++, then a managed language such as Python.
The goal of NAS is to find the optimal architecture for a given problem by searching over a large set of candidate architectures using techniques such as gradient-free optimization or by optimizing the desired metrics. The performance of the architecture is typically measured using metrics such as validation loss.
Ultimately, the biggest success metric for the Champion is to be able to show the Executive Sponsor and key Stakeholders that real business value has been gained through the use of customer journey analytics. Success metrics can be defined for the team, the project, and the business, down to specific measures such as churn rate.
We also provide insights on how to achieve optimal results for different dataset sizes and use cases, backed by experimental data and performance metrics. Tools and APIs – For example, when you need to teach Anthropic’s Claude 3 Haiku how to use your APIs well. We focus on the task of answering questions about the table.
Syne Tune allows us to find a better hyperparameter configuration that achieves a relative improvement between 1-4% compared to default hyperparameters on popular GLUE benchmark datasets. Furthermore, we add another callback function to Hugging Face’s Trainer API that reports the validation performance after each epoch back to Syne Tune.
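A minimal sketch of that callback pattern: a Hugging Face TrainerCallback that reports validation performance back to Syne Tune after each evaluation. The metric key eval_accuracy is an assumption about the task; Syne Tune's Reporter delivers each call to the scheduler as one observation.

```python
# Sketch: a Hugging Face TrainerCallback that reports validation metrics back
# to Syne Tune after each evaluation. "eval_accuracy" is an assumed metric key.
from transformers import TrainerCallback
from syne_tune import Reporter

class SyneTuneReportCallback(TrainerCallback):
    def __init__(self):
        self.report = Reporter()

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics and "eval_accuracy" in metrics:
            # Each report is one observation for the Syne Tune scheduler.
            self.report(epoch=int(state.epoch), accuracy=metrics["eval_accuracy"])

# Usage: trainer.add_callback(SyneTuneReportCallback())
```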
Examples of tools you can use to advance sustainability initiatives are: Amazon Bedrock – a fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your sustainability use cases.
This involves benchmarking new models against our current selections across various metrics, running A/B tests, and gradually incorporating high-performing models into our production pipeline. On the API design side, account summary generation requests are handled asynchronously to eliminate client wait times for responses.
Define goals and metrics – The function needs to deliver value to the organization in different ways. Establish regular cadence – The group should come together regularly to review their goals and metrics. This allows the workload to be implemented to achieve the desired goals of the organization.