Using its enterprise software, FloTorch conducted an extensive comparison between Amazon Nova models and OpenAI's GPT-4o models with the Comprehensive Retrieval Augmented Generation (CRAG) benchmark dataset. OpenAI launched GPT-4o in May 2024, and Amazon introduced Amazon Nova models at AWS re:Invent in December 2024.
According to Forrester's Consumer Benchmark Survey, 2024, 54% of US online adults agree that loyalty programs influence what they buy, and 64% agree that programs influence where they make purchases. Are Your CX Metrics Hurting Your Customer Experience? There are ongoing discussions about which CX metric is the best.
According to New Relic's 2024 Observability Forecast, businesses face a median annual downtime of 77 hours from high-impact outages, which can cost millions of dollars per hour. It examines service performance metrics, forecasts of key indicators like error rates, error patterns and anomalies, security alerts, and overall system status and health.
Gain insights into training strategies, productivity metrics, and real-world use cases to empower your developers to harness the full potential of this game-changing technology. Discover how to create and manage evaluation jobs, use automatic and human reviews, and analyze critical metrics like accuracy, robustness, and toxicity.
Anthropic Claude 3.5 Sonnet currently ranks at the top of S&P AI Benchmarks by Kensho, which assesses large language models (LLMs) for finance and business. For example, there could be leakage of benchmark datasets' questions and answers into training data. Kensho is the AI Innovation Hub for S&P Global.
During re:Invent 2024, we launched latency-optimized inference for foundation models (FMs) in Amazon Bedrock. To effectively optimize AI applications for responsiveness, we need to understand the key metrics that define latency and how they impact user experience. These metrics are shown in the following diagram.
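The excerpt above refers to latency metrics without naming them. A minimal sketch of how two measures commonly used for streaming LLM inference, time to first token (TTFT) and output tokens per second (OTPS), might be computed from a token stream — the `measure_latency` helper and the stand-in generator are illustrative assumptions, not an actual Bedrock API:

```python
import time

def measure_latency(stream):
    """Compute time-to-first-token (TTFT) and output tokens/sec (OTPS)
    over the whole request, given any iterator that yields tokens as
    they arrive (hypothetical interface)."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token lands here
        n_tokens += 1
    end = time.perf_counter()
    ttft = first_token_at - start if first_token_at is not None else None
    otps = n_tokens / (end - start)
    return ttft, otps

# Usage with a stand-in generator that simulates a model stream
def fake_stream():
    for tok in ["Hello", ",", " world"]:
        time.sleep(0.01)  # simulated per-token generation delay
        yield tok

ttft, otps = measure_latency(fake_stream())
```

The same pattern applies regardless of provider: latency-optimized inference aims to shrink both TTFT (perceived responsiveness) and the per-token interval (throughput of the stream).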
This post describes how to get started with the software development agent, gives an overview of how the agent works, and discusses its performance on public benchmarks. This is an overview of the system as of May 2024. A single metric never tells the whole story. The success metric for SWE-bench is binary.
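Because the SWE-bench success metric is binary per instance (the agent's patch either resolves the issue's tests or it does not), the headline score is simply the fraction of instances resolved. A small illustration, with made-up per-instance results:

```python
# Hypothetical per-instance results: True if the agent's patch resolved
# the issue's tests, False otherwise (SWE-bench scoring is binary).
results = {
    "django__django-11099": True,
    "sympy__sympy-20590": False,
    "astropy__astropy-12907": True,
}

resolved_rate = sum(results.values()) / len(results)
print(f"{resolved_rate:.1%} of instances resolved")  # prints "66.7% of instances resolved"
```

This is why a single number never tells the whole story: two agents with the same resolved rate can succeed on entirely different subsets of issues.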
Continuous education involves more than glancing at release announcements; it includes testing beta features, benchmarking real-world results, and actively sharing insights. A 2024 survey identified Spring Boot as the most adopted Java framework at 62%, with Quarkus at 15% and Micronaut at 10%.
That way, both teams can use those outcomes as a benchmark of success throughout the customer journey. Strategy #3: Share revenue responsibility and success metrics There’s been a major shift in the B2B space over the past few years. This doesn’t just provide a useful shared goal.
As new embedding models are released with incremental quality improvements, organizations must weigh the potential benefits against the associated costs of upgrading, considering factors like computational resources, data reprocessing, integration efforts, and projected performance gains impacting business metrics.
The 2024 B2B SaaS Benchmarking Survey by SaaS Capital is the most comprehensive and up-to-date source of its kind for SaaS and customer success leaders who want to know where they stand compared to peers and competitors. To find out more about the survey, and see more research and benchmarking data, visit SaaS Capital here.
We also released a comprehensive study of co-training language models (LMs) and graph neural networks (GNNs) for large graphs with rich text features using the Microsoft Academic Graph (MAG) dataset from our KDD 2024 paper. GraphStorm 0.3 addresses this. [Table of dataset statistics: number of nodes, edges, and node/edge types.]
The 2501 version follows previous iterations (Mistral-Small-2409 and Mistral-Small-2402) released in 2024, incorporating improvements in instruction-following and reliability.
This repository is a modified version of the original How to Fine-Tune LLMs in 2024 on Amazon SageMaker. Within the repository, you can use the medusa_1_train.ipynb notebook to run all the steps in this post. We added simplified Medusa training code, adapted from the original Medusa repository.
Laying the groundwork: Collecting ground truth data. The foundation of any successful agent is high-quality ground truth data—the accurate, real-world observations used as reference for benchmarks and evaluating the performance of a model, algorithm, or system. Example interaction: user id 111, Today: 09/03/2024. "Certainly! Your appointment ID is XXXX."
Figure 1: Examples of generative AI for sustainability use cases across the value chain According to KPMG’s 2024 ESG Organization Survey , investment in ESG capabilities is another top priority for executives as organizations face increasing regulatory pressure to disclose information about ESG impacts, risks, and opportunities.
In 2024 alone, 11x more AI models were put into production than in the previous year, showing a clear shift from experimentation to real-world application. Real-time insights from sources like sales metrics, customer interactions, and digital analytics help businesses stay competitive by spotting trends early and seizing opportunities.
What is Mixtral 8x22B? Mixtral 8x22B is Mistral AI's latest open-weights model and sets a new standard for performance and efficiency of available foundation models, as measured by Mistral AI across standard industry benchmarks, making the model available for exploring, testing, and deploying. An example model completion: "Therefore, she sold the car for $18,248.33."
From September 2023 to March 2024, sellers leveraging GenAI Account Summaries saw a 4.9% improvement. This involves benchmarking new models against our current selections across various metrics, running A/B tests, and gradually incorporating high-performing models into our production pipeline.
First call resolution is far more than just a metric; it's a direct reflection of your customer service effectiveness and significantly impacts your business's bottom line. As you measure, and attempt to optimize, your contact center's first call resolution rate, it's crucial to keep benchmarks and industry standards in mind.
In March 2024, AWS announced it will offer the new NVIDIA Blackwell platform, featuring the new GB200 Grace Blackwell chip. Accelerator benchmarking When considering compute services, users benchmark measures such as price-performance, absolute performance, availability, latency, and throughput.
In 2024 alone, hospitality saw a 20% decrease in already troubling customer retention rates, and they continue to struggle to stem the tide. There are many reasons for this. Attrition is the metric that confirms one or all of these scenarios occurred. However, it is important to note that attrition is a metric of past performance.
You can use industry benchmarks to estimate your staffing needs. For instance, if you receive 1,000 calls per day, you’d need to consider factors such as average handle time, first call resolution, and customer satisfaction metrics to determine the appropriate number of agents. A data breach can cost you more than just money.
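A common industry approach to the staffing estimate described above is the Erlang C queueing formula. A simplified sketch, assuming the excerpt's 1,000 calls/day are spread over an 8-hour shift with a hypothetical 5-minute average handle time and a target of no more than 20% of callers waiting:

```python
from math import factorial

def erlang_c(agents: int, traffic: float) -> float:
    """Probability that a caller has to wait (Erlang C), given offered
    traffic in Erlangs and a number of agents."""
    if agents <= traffic:
        return 1.0  # system is overloaded; effectively everyone waits
    top = traffic**agents / factorial(agents) * agents / (agents - traffic)
    bottom = sum(traffic**k / factorial(k) for k in range(agents)) + top
    return top / bottom

# Hypothetical inputs: 1,000 calls over an 8-hour day, 5-minute AHT
calls_per_hour = 1000 / 8
aht_hours = 5 / 60
traffic = calls_per_hour * aht_hours  # offered load in Erlangs (~10.4)

# Smallest agent count keeping the wait probability at or under 20%
agents = int(traffic) + 1
while erlang_c(agents, traffic) > 0.20:
    agents += 1
```

Real workforce planning layers shrinkage, intraday call arrival patterns, and service-level targets (e.g., 80% answered in 20 seconds) on top of this core calculation.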
From lead conversion rates (CVR), click-through rates (CTR), and Net Promoter Scores (NPS), companies use multiple metrics to analyze the effectiveness of their CX strategy. Is your CX strategy up to the task of meeting customers’ expectations going into 2024? One CX software that works well for most companies is Lumoa.
Financial services cybersecurity regulations are constantly evolving, with new requirements expected for 2024 and beyond. Measure Quality and Performance Quality assurance and performance metrics form the backbone of effective call center operations. These metrics should align with your business objectives and industry standards.
More than 80% of business leaders see customer experience as a growing priority in 2024. Despite efforts to collect and analyze feedback, employees frequently struggle to pinpoint what affects these metrics. 78% of customers have backed out of a purchase due to a poor customer experience (CX).
In January 2024, Amazon SageMaker launched a new version (0.26.0) of Large Model Inference (LMI) Deep Learning Containers (DLCs). In this post, we explore the latest features introduced in this release, examine performance benchmarks, and provide a detailed guide on deploying new LLMs with LMI DLCs at high performance.
Gartner also predicted that by 2024, this emotional effort will be the top reason customer service reps leave the service center. Agents who aren’t meeting your KPI benchmarks for how many interactions they handle in a shift might be avoiding interactions or too distracted by emotional overwhelm.
Enable a data science team to manage a family of classic ML models for benchmarking statistics across multiple medical units. Users from several business units were trained and onboarded to the platform, and that number is expected to grow in 2024. Another important metric is the efficiency for data science users.
With SageMaker JumpStart, you can evaluate, compare, and select foundation models (FMs) quickly based on predefined quality and responsibility metrics to perform tasks such as article summarization and image generation. RAG benchmark: Compare the fine-tuned model's performance against a RAG system using a pre-trained model.
At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. This integration provides a powerful multilingual model that excels in reasoning benchmarks.
Specific accounting knowledge that is relevant to the question and that the model is not familiar with, such as updated data for 2024. Lili selected Anthropic's Claude model family for AccountantAI after reviewing industry benchmarks and conducting their own quality assessment. Data relevant to answering the customer's question.
With a focus on responsible innovation and system-level safety, the Llama 3.2 models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features to help you build a new generation of AI experiences. These new models provide enhanced capabilities and broader applicability across various use cases.
These approaches train a single deep learning model across multiple time series in a dataset—for example, sales across a broad e-commerce catalog or observability metrics for thousands of customers. [1] Transactions on Machine Learning Research (2024). [2] Hyndman, R., et al. In NeurIPS Track on Datasets and Benchmarks (2021).
We refer to this approach as assertion-based benchmarking. Here is an example of a scenario and corresponding assertions for assertion-based benchmarking: Goals: User needs the weather conditions expected in Las Vegas for tomorrow, January 5, 2025. Since 2024, Raphael has worked on multi-agent collaboration with LLM-based agents.
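The scenario-plus-assertions idea above can be sketched as code. This is a minimal illustration, not the article's actual harness: the scenario follows the Las Vegas weather example, while `run_scenario`, the assertion lambdas, and the sample response are all hypothetical:

```python
# Assertion-based benchmarking sketch: each scenario pairs a user goal
# with assertions that are checked against the agent's final response.
scenario = {
    "goal": "User needs the weather conditions expected in Las Vegas "
            "for tomorrow, January 5, 2025.",
    "assertions": [
        lambda r: "Las Vegas" in r,          # correct location mentioned
        lambda r: "January 5" in r or "01/05" in r,  # correct date
    ],
}

def run_scenario(agent_response: str, scenario: dict) -> float:
    """Return the fraction of assertions the response satisfies."""
    checks = [check(agent_response) for check in scenario["assertions"]]
    return sum(checks) / len(checks)

score = run_scenario(
    "The forecast for Las Vegas on January 5, 2025 is sunny, 58°F.",
    scenario,
)
```

Scoring each scenario as a fraction of satisfied assertions gives a deterministic, repeatable number that can be tracked across agent versions.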
Running deterministic evaluation of generative AI assistants against use case ground truth data enables the creation of custom benchmarks. These benchmarks are essential for tracking performance drift over time and for statistically comparing multiple assistants in accomplishing the same task.
This post and the subsequent code implementation were inspired by one of the International Conference on Machine Learning (ICML) 2024 best papers on LLM debates, Debating with More Persuasive LLMs Leads to More Truthful Answers. (Refer to the evaluation metrics section for the accuracy definition.) This continues for N (= 3 in this notebook) rounds.
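The N-round protocol the excerpt describes — two debaters each defending an answer, then a judge picking a side — could be sketched roughly as below. The `ask_llm` callable is a hypothetical stand-in for any chat-completion call, not an API from the paper or the notebook:

```python
def debate(question, answer_a, answer_b, ask_llm, n_rounds=3):
    """Run an N-round debate between two LLM debaters, then have a
    judge pick the better-supported answer.  ask_llm(prompt) -> str
    is a hypothetical stand-in for a chat-completion call."""
    transcript = []
    for rnd in range(n_rounds):
        # Each round, both sides argue for their assigned answer,
        # seeing the transcript so far.
        for side, answer in (("A", answer_a), ("B", answer_b)):
            argument = ask_llm(
                f"Question: {question}\nDefend answer {answer!r}.\n"
                f"Debate so far: {transcript}\nRound {rnd + 1} argument:"
            )
            transcript.append((side, argument))
    verdict = ask_llm(
        f"Question: {question}\nTranscript: {transcript}\n"
        "Which answer (A or B) is better supported? Reply with A or B."
    )
    return transcript, verdict
```

The paper's finding is that stronger (more persuasive) debaters make the judge's final answer more accurate, so the judge model can be weaker than the debaters.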
Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. To run this benchmark, we use sub-minute metrics to detect the need for scaling. The following table summarizes our setup.
Today at AWS re:Invent 2024, we are excited to announce a new capability in Amazon SageMaker Inference that significantly reduces the time required to deploy and scale LLMs for inference using LMI: Fast Model Loader. To run this benchmark, we use sub-minute metrics to detect the need for scaling. For the LLaMa 3.1