Using its enterprise software, FloTorch conducted an extensive comparison between Amazon Nova models and OpenAI's GPT-4o models with the Comprehensive Retrieval Augmented Generation (CRAG) benchmark dataset. FloTorch used these queries and their ground truth answers to create a subset benchmark dataset.
Overview of Pixtral 12B: Pixtral 12B, Mistral's inaugural VLM, delivers robust performance across a range of benchmarks, surpassing other open models and rivaling larger counterparts, according to Mistral's evaluation. Performance metrics and benchmarks: Pixtral 12B is trained to understand both natural images and documents, achieving 52.5%
In today’s data-driven business landscape, the ability to efficiently extract and process information from a wide range of documents is crucial for informed decision-making and maintaining a competitive edge. The Anthropic Claude 3 Haiku model then processes the documents and returns the desired information, streamlining the entire workflow.
The release adds new APIs to customize GraphStorm pipelines: you now need only 12 lines of code to implement a custom node classification training loop. For more details about how to run graph multi-task learning with GraphStorm, refer to Multi-task Learning in GraphStorm in our documentation. It also introduces refactored graph ML pipeline APIs.
Let's say the task at hand is to predict the root cause categories (Customer Education, Feature Request, Software Defect, Documentation Improvement, Security Awareness, and Billing Inquiry) for customer support cases. For a multiclass classification problem such as support case root cause categorization, this challenge is compounded manyfold.
Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Solution overview: The solution comprises two main steps: Generate synthetic data using the Amazon Bedrock InvokeModel API.
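As a rough illustration of that first step, the following sketch calls the InvokeModel API through boto3. The model ID, prompt, and Anthropic Messages request shape are assumptions for illustration, not the post's exact code.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical synthetic-data prompt; adjust the model ID and body to your use case.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user",
         "content": "Generate 5 synthetic customer support tickets as a JSON array."}
    ],
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=body,
)
print(json.loads(response["body"].read())["content"][0]["text"])
```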
Amazon Bedrock, a fully managed service offering high-performing foundation models from leading AI companies through a single API, has recently introduced two significant evaluation capabilities: LLM-as-a-judge under Amazon Bedrock Model Evaluation and RAG evaluation for Amazon Bedrock Knowledge Bases.
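A unique evaluation job name can be composed from the candidate and judge model identifiers plus a timestamp; a minimal sketch of that pattern follows, where the variable names and the job-name prefix are assumptions.

```python
from datetime import datetime

# Hypothetical model identifiers (assumptions for illustration).
generator_model = "amazon.titan-text-express-v1"            # model being evaluated
evaluator_model = "anthropic.claude-3-sonnet-20240229-v1:0"  # judge model

# Build a unique, human-readable job name: provider prefixes plus a timestamp.
job_name = (
    f"llm-judge-{generator_model.split('.')[0]}"
    f"-{evaluator_model.split('.')[0]}"
    f"-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
)
print(job_name)  # e.g. llm-judge-amazon-anthropic-2025-01-01-12-00-00
```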
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Consider benchmarking your user experience to find the best latency for your use case, given that most humans can't read faster than about 225 words per minute, so an extremely fast response can hinder the user experience. In such scenarios, you want to optimize for time to first token (TTFT). Users prefer accurate responses over quick but less reliable ones.
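A quick back-of-envelope calculation makes the point concrete, assuming roughly 1.3 tokens per English word (that ratio is an assumption, not from the text):

```python
# How fast does generation need to be before a human reader stops noticing?
words_per_minute = 225      # upper bound on typical human reading speed
tokens_per_word = 1.3       # rough English average (assumption)

min_comfortable_tps = words_per_minute * tokens_per_word / 60
print(f"~{min_comfortable_tps:.1f} output tokens/sec saturates a reader")
# ~4.9 tokens/sec; generating much faster adds no perceived benefit,
# which is why TTFT becomes the metric worth optimizing.
```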
This centralized system consolidates a wide range of data sources, including detailed reports, FAQs, and technical documents. The system integrates structured data, such as tables containing product properties and specifications, with unstructured text documents that provide in-depth product descriptions and usage guidelines.
The healthcare industry generates and collects a significant amount of unstructured textual data, including clinical documentation such as patient information, medical history, and test results, as well as non-clinical documentation like administrative records. Lastly, the Lambda function stores the question list in Amazon S3.
In September of 2023, we announced the launch of Amazon Titan Text Embeddings V1, a multilingual text embeddings model that converts text inputs like single words, phrases, or large documents into high-dimensional numerical vector representations. In this benchmark, 33 different text embedding models were evaluated on the MTEB tasks.
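For reference, invoking Titan Text Embeddings V1 through the Bedrock runtime looks roughly like the following sketch; the input text is a placeholder.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Embed a single piece of text with Titan Text Embeddings V1.
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "Example sentence to embed."}),
)
embedding = json.loads(resp["body"].read())["embedding"]
print(len(embedding))  # dimensionality of the returned vector (1536 for V1)
```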
Jamba-Instruct is built by AI21 Labs, and most notably supports a 256,000-token context window, making it especially useful for processing large documents and complex Retrieval Augmented Generation (RAG) applications. Prompt guidance for Jamba-Instruct can be found in the AI21 model documentation.
You can use the BGE embedding model to retrieve relevant documents and then use the BGE reranker to obtain final results. On Hugging Face, the Massive Text Embedding Benchmark (MTEB) is provided as a leaderboard for diverse text embedding tasks. It currently provides 129 benchmarking datasets across 8 different tasks on 113 languages.
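A minimal retrieve-then-rerank sketch with the BGE models from Hugging Face, using the sentence-transformers library; the query and documents are placeholders.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer

query = "How long does shipping take?"
docs = ["Our returns policy lasts 30 days.",
        "Standard shipping takes 3-5 business days."]

# Retrieve: embed the query and documents with a BGE embedding model,
# then order candidates by cosine similarity (vectors are normalized).
embedder = SentenceTransformer("BAAI/bge-base-en-v1.5")
doc_emb = embedder.encode(docs, normalize_embeddings=True)
query_emb = embedder.encode(query, normalize_embeddings=True)
candidates = [docs[i] for i in (doc_emb @ query_emb).argsort()[::-1]]

# Rerank: score (query, doc) pairs with the BGE reranker cross-encoder.
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, d) for d in candidates])
print(candidates[scores.argmax()])  # best match after reranking
```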
They enable applications requiring very low latency or local data processing using familiar APIs and tool sets. Through comparative benchmarking tests, we illustrate how deploying FMs in Local Zones closer to end users can significantly reduce latency, a critical factor for real-time applications such as conversational AI assistants.
When a customer has a production-ready intelligent document processing (IDP) workload, we often receive requests for a Well-Architected review. To follow along with this post, you should be familiar with the previous posts in this series (Part 1 and Part 2) and the guidelines in Guidance for Intelligent Document Processing on AWS.
We demonstrate how to use the AWS Management Console and Amazon Translate public API to deliver automatic machine batch translation, and analyze the translations between two language pairs: English and Chinese, and English and Spanish. First, we put the source documents, reference documents, and parallel data training set in an S3 bucket.
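A sketch of kicking off such a batch translation job with boto3 follows; the bucket paths, job name, and IAM role ARN are placeholders.

```python
import boto3

translate = boto3.client("translate")

# All names/ARNs below are placeholders (assumptions for illustration).
translate.start_text_translation_job(
    JobName="en-to-zh-batch-demo",
    InputDataConfig={
        "S3Uri": "s3://my-bucket/source-docs/",  # source documents uploaded earlier
        "ContentType": "text/plain",
    },
    OutputDataConfig={"S3Uri": "s3://my-bucket/translated/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/TranslateBatchRole",
    SourceLanguageCode="en",
    TargetLanguageCodes=["zh"],
)
```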
These SageMaker endpoints are consumed in the Amplify React application through Amazon API Gateway and AWS Lambda functions. To protect the application and APIs from inadvertent access, Amazon Cognito is integrated into Amplify React, API Gateway, and Lambda functions. You access the React application from your computer.
You can use this tutorial as a starting point for a variety of chatbot-based solutions for customer service, internal support, and question answering systems based on internal and private documents. This makes the models especially powerful at tasks such as clustering for long documents like legal text or product documentation.
Amazon Comprehend is a natural-language processing (NLP) service you can use to automatically extract entities, key phrases, language, sentiments, and other insights from documents. All you need to do is load your dataset of documents and annotations, and use the Amazon Comprehend console, AWS CLI, or APIs to create the model.
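For example, a synchronous entity-detection call with boto3 looks like this (the sample text is illustrative):

```python
import boto3

comprehend = boto3.client("comprehend")

text = "Amazon Comprehend was announced at re:Invent in Las Vegas."
entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
for e in entities:
    # Each entity carries a type, the matched text, and a confidence score.
    print(e["Type"], e["Text"], round(e["Score"], 3))
```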
In addition, RAG architecture can lead to potential issues like retrieval collapse, where the retrieval component learns to retrieve the same documents regardless of the input. Lack of standardized benchmarks – There are no widely accepted and standardized benchmarks yet for holistically evaluating different capabilities of RAG systems.
SageMaker makes it easy to deploy models into production directly through API calls to the service. It’s a low-level API available for Java, C++, Go, JavaScript, Node.js, PHP, Ruby, and Python.
These include the ability to analyze massive amounts of data, identify patterns, summarize documents, perform translations, correct errors, or answer questions. This involves documenting data lineage, data versioning, automating data processing, and monitoring data management costs.
Model choices – SageMaker JumpStart offers a selection of state-of-the-art ML models that consistently rank among the top in industry-recognized HELM benchmarks. For instance, a financial firm might prefer its Q&A bot to source answers from its latest internal documents, ensuring accuracy and compliance with its business rules.
AI Service Cards are a form of responsible AI documentation that provide customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for our AI services and models.
An intelligent document processing (IDP) project usually combines optical character recognition (OCR) and natural language processing (NLP) to read and understand a document and extract specific terms or words. As of this writing, it includes the following values: TABLES, FORMS, QUERIES, SIGNATURES, and LAYOUT.
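As a sketch, requesting several of those feature types in a single Amazon Textract AnalyzeDocument call might look like the following; the bucket and document name are placeholders, and QUERIES is omitted because it additionally requires a QueriesConfig.

```python
import boto3

textract = boto3.client("textract")

# Bucket and object name are placeholders (assumptions for illustration).
response = textract.analyze_document(
    Document={"S3Object": {"Bucket": "my-bucket", "Name": "invoice.png"}},
    FeatureTypes=["TABLES", "FORMS", "SIGNATURES", "LAYOUT"],
)
print(len(response["Blocks"]), "blocks returned")  # Blocks hold the extracted structure
```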
What is Mixtral 8x22B? Mixtral 8x22B is Mistral AI’s latest open-weights model and sets a new standard for performance and efficiency of available foundation models, as measured by Mistral AI across standard industry benchmarks. The model is available for exploring, testing, and deploying.
In addition, they use the developer-provided instruction to create an orchestration plan and then carry out the plan by invoking company APIs and accessing knowledge bases using Retrieval Augmented Generation (RAG) to provide an answer to the user’s request. In Part 1, we focus on creating accurate and reliable agents.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
When ML models deployed on instances receive API calls from a large number of clients, a random distribution of requests can work very well when there is not a lot of variability in your requests and responses. To learn more about SageMaker routing features, refer to the documentation.
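When requests do vary, SageMaker's least-outstanding-requests routing can help; a minimal sketch of enabling it on an endpoint configuration follows, with the endpoint-config and model names as placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Endpoint-config and model names are placeholders (assumptions for illustration).
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.g5.xlarge",
        "InitialInstanceCount": 2,
        # Route each request to the instance with the fewest in-flight requests
        # instead of picking an instance at random.
        "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
    }],
)
```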
Refer to the appendix for instance details and benchmark data. To access the code and documentation, refer to the GitHub repo. Given a document as input, the model answers simple questions based on its learning and the context from the input document. The following diagram illustrates the high-level flow.
And testingRTC offers multiple ways to export these metrics, from direct collection via webhooks to downloading results in CSV format using the REST API. testingRTC is predominantly a self-service platform, where you write and test any script you want independently of us, with our extensive knowledge base documentation as a guide.
They show the usage of various SageMaker and JumpStart APIs. This notebook demonstrates how to deploy AlexaTM 20B through the JumpStart API and run inference. NTM is most commonly used to discover a user-specified number of topics shared by documents within a text corpus. This is called zero-shot in-context learning.
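A generic JumpStart deployment with the SageMaker Python SDK follows this shape; the model ID below is an illustrative placeholder, not AlexaTM 20B's actual JumpStart identifier.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Look up the exact JumpStart model ID for the model you want to deploy;
# this one is an illustrative placeholder.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy()

response = predictor.predict({"inputs": "Summarize: SageMaker JumpStart ..."})
print(response)

predictor.delete_endpoint()  # clean up to stop incurring charges
```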
This is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API. It’s serverless, so you don’t have to manage any infrastructure.
Manually creating customized communication documents like quotes, invoices, contracts, and reports is an inefficient process prone to human error. If you’re considering implementing a document automation solution for your organization, there are several key capabilities to evaluate during your search. What Is Document Automation?
Finally, we’ll benchmark performance of 13B, 50B, and 100B parameter auto-regressive models and wrap up with future work. For training a different model type, you can follow the API documentation to learn how to apply SMP APIs. You can refer to this document for supported configurations. Benchmarking performance.
To get started, follow Modify a PyTorch Training Script to adapt the SMP APIs in your training script. You can follow the comments in the script and the API documentation to learn more about where SMP APIs are used. Benchmarking performance: We benchmarked sharded data parallelism in the SMP library on both 16 and 32 p4d.24xlarge instances.
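A minimal sketch of the kind of changes that script adaptation involves, assuming the SMP v1 PyTorch API; the toy model is a placeholder and this only runs inside a SageMaker training job configured for SMP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import smdistributed.modelparallel.torch as smp

smp.init()  # initialize the SMP runtime (configured via the SageMaker estimator)

# Wrap your real model; a tiny linear layer stands in here as a placeholder.
model = smp.DistributedModel(nn.Linear(128, 10))
optimizer = smp.DistributedOptimizer(torch.optim.Adam(model.parameters()))

@smp.step  # SMP traces this function and pipelines forward/backward across ranks
def train_step(model, x, y):
    loss = F.cross_entropy(model(x), y)
    model.backward(loss)  # replaces loss.backward() under SMP
    return loss
```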
Although existing large language model (LLM) benchmarks like MT-bench evaluate model capabilities, they lack the ability to validate the application layers. Refer to the agent-evaluation target documentation for details. The principal must have the permissions to call the target agent.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API. Kojima et al. (2022) introduced the idea of zero-shot CoT by using FMs’ untapped zero-shot capabilities.
Snowflake Arctic is a family of enterprise-grade large language models (LLMs) built by Snowflake to cater to the needs of enterprise users, exhibiting exceptional capabilities (as shown in the following benchmarks) in SQL querying, coding, and accurately following instructions. To learn more, refer to the API documentation.
Tools and APIs – For example, when you need to teach Anthropic’s Claude 3 Haiku how to use your APIs well. Based on our hyperparameter tuning experiments across different use cases, the API allows a range of 4–256, with a default of 32.
We first benchmark the performance of our model on a single instance to identify the TPS it can handle within our acceptable latency requirements. For example, if your client is making the InvokeEndpoint API call over the internet, from the client’s perspective, the end-to-end latency would be internet latency + ModelLatency + OverheadLatency.
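That sizing approach reduces to simple arithmetic; the numbers below are illustrative assumptions, not measurements from the post.

```python
import math

# Illustrative figures (assumptions): measured single-instance throughput
# within the latency budget, and the expected peak load.
single_instance_tps = 12.0   # TPS one instance sustains at acceptable latency
peak_expected_tps = 100.0    # expected peak transactions per second

instance_count = math.ceil(peak_expected_tps / single_instance_tps)
print(instance_count)  # 9 instances to sustain the peak load with headroom
```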
In this blog post, we introduce how to use an Amazon EC2 Inf2 instance to cost-effectively deploy multiple industry-leading LLMs on AWS Inferentia2, a purpose-built AWS AI chip, helping customers quickly test models and expose an API interface to facilitate performance benchmarking and downstream application calls.
Now, let’s look at latency and throughput performance benchmarking for model serving with the default JumpStart deployment configuration. For more information on how to consider this information and adjust deployment configurations for your specific use case, see Benchmark and optimize endpoint deployment in Amazon SageMaker JumpStart.