In this post, we walk through how to discover, deploy, and use the Pixtral 12B model for a variety of real-world vision use cases. Pixtral 12B is trained to understand both natural images and documents, achieving 52.5% on the MMMU reasoning benchmark. To begin using Pixtral 12B, choose Deploy.
The latest GraphStorm release adds new APIs to customize GraphStorm pipelines: you now need only 12 lines of code to implement a custom node classification training loop. For more details about how to run graph multi-task learning with GraphStorm, refer to Multi-task Learning in GraphStorm in our documentation. The release also introduces refactored graph ML pipeline APIs.
This post explores these relationships via a comprehensive benchmarking of LLMs available in Amazon SageMaker JumpStart, including Llama 2, Falcon, and Mistral variants. We provide theoretical principles on how accelerator specifications impact LLM benchmarking. Additionally, models are fully sharded on the supported instance.
Amazon Bedrock, a fully managed service offering high-performing foundation models from leading AI companies through a single API, has recently introduced two significant evaluation capabilities: LLM-as-a-judge under Amazon Bedrock Model Evaluation and RAG evaluation for Amazon Bedrock Knowledge Bases.
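A minimal sketch of how such an evaluation job might be given a unique, timestamped name; the dataset name and evaluator model ID below are assumptions for illustration, not the post's exact code:

from datetime import datetime

# Hypothetical inputs; the original post builds the name from its own variables.
dataset_name = "rag-eval-dataset"
evaluator_model = "anthropic.claude-3-5-sonnet-20240620-v1:0"

# Build a unique, timestamped evaluation job name.
job_name = (
    f"{dataset_name.split('/')[0]}"
    f"-{evaluator_model.split('.')[0]}"
    f"-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"
)
print(job_name)  # e.g. rag-eval-dataset-anthropic-2025-01-01-12-00-00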
As businesses increasingly use large language models (LLMs) for these critical tasks and processes, they face a fundamental challenge: how to maintain the quick, responsive performance users expect while delivering the high-quality outputs these sophisticated models promise. In such scenarios, you want to optimize for time to first token (TTFT).
Amazon Bedrock is a fully managed service that offers a choice of high-performing Foundation Models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
In this post, we describe the enhancements to the forecasting capabilities of SageMaker Canvas and guide you on using its user interface (UI) and AutoML APIs for time-series forecasting. While the SageMaker Canvas UI offers a code-free visual interface, the APIs empower developers to interact with these features programmatically.
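As a rough sketch of the programmatic path, a time-series forecasting AutoML job can be created through the AutoML V2 API in boto3; the bucket paths, role, forecast settings, and column names below are placeholder assumptions, not values from the post:

import boto3

sm = boto3.client("sagemaker")

# Hypothetical S3 locations, role, and column names for illustration only.
sm.create_auto_ml_job_v2(
    AutoMLJobName="canvas-ts-forecast-demo",
    AutoMLJobInputDataConfig=[{
        "ChannelType": "training",
        "ContentType": "text/csv;header=present",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/sales.csv",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output/"},
    AutoMLProblemTypeConfig={"TimeSeriesForecastingJobConfig": {
        "ForecastFrequency": "D",   # daily data (assumed)
        "ForecastHorizon": 30,      # predict 30 steps ahead (assumed)
        "TimeSeriesConfig": {
            "TargetAttributeName": "demand",
            "TimestampAttributeName": "timestamp",
            "ItemIdentifierAttributeName": "item_id",
        },
    }},
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
)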
We gave practical tips, based on hands-on experience with customer use cases, on how to improve text-only RAG solutions, from optimizing the retriever to mitigating and detecting hallucinations. We first introduce routers and how they can help manage diverse data sources. One of the models discussed reaches 92% accuracy on the HumanEval code benchmark.
In this post, we walk through how to discover, deploy, and use Mistral-Small-24B-Instruct-2501. At the time of writing this post, you can use the InvokeModel API to invoke the model. It doesn't support the Converse API or other Amazon Bedrock tooling. In this section, we go over how to discover the models in SageMaker Studio.
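A minimal sketch of calling the model with the InvokeModel API through boto3; the model ID below is an assumption, so use the identifier shown for the model in your account:

import boto3, json

bedrock = boto3.client("bedrock-runtime")

# Model ID is a placeholder assumption; check the console for the exact identifier.
response = bedrock.invoke_model(
    modelId="mistral.mistral-small-24b-instruct-2501",
    body=json.dumps({
        "prompt": "<s>[INST] Summarize the benefits of managed inference. [/INST]",
        "max_tokens": 256,
        "temperature": 0.5,
    }),
)
print(json.loads(response["body"].read()))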
Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. In this post, we explore how to use Amazon Bedrock to generate synthetic training data to fine-tune an LLM.
This integration provides a powerful multilingual model that excels in reasoning benchmarks. We'll then dive into NVIDIA's model offerings on SageMaker JumpStart, showcasing how to access and deploy the Nemotron-4 model directly in the JumpStart interface.
In this talk, you’ll understand how to recognize the latest signals in changing data patterns, and adapt data strategies that flex to changes in consumer behavior and innovations in technology like AI. This session provides practical steps to streamline your model selection process, providing high-quality, reliable AI deployments.
We dive deep into how to use XML tags to structure the prompt and guide Amazon Bedrock in generating a balanced labeled dataset with high accuracy. In the following sections, we explain how to take an incremental and measured approach to improve Anthropic's Claude 3.5 Sonnet prediction accuracy through prompt engineering.
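As an illustration of the idea (not the post's actual prompt), a labeling prompt might separate instructions, examples, and the input with XML tags:

# Illustrative prompt skeleton using XML tags to separate sections for Claude.
prompt = """
<instructions>
Classify the customer message into exactly one label: BILLING, TECHNICAL, or OTHER.
Respond with the label only.
</instructions>

<examples>
<example>Message: "My invoice is wrong." -> BILLING</example>
<example>Message: "The app crashes on login." -> TECHNICAL</example>
</examples>

<message>
I was charged twice this month.
</message>
"""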
In this blog post, we show how we optimized torch.compile performance on AWS Graviton3-based EC2 instances, how to use the optimizations to improve inference performance, and the resulting speedups. Across the 45 models we benchmarked, there is a 1.35x latency improvement (geometric mean).
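A minimal sketch of applying torch.compile to a PyTorch model; the toy model below stands in for the benchmarked models:

import torch
import torch.nn as nn

# Small stand-in model; the benchmarked models in the post are much larger.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# torch.compile traces the model and generates optimized kernels for the target CPU.
compiled_model = torch.compile(model)

with torch.no_grad():
    out = compiled_model(torch.randn(8, 512))
print(out.shape)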
They enable applications requiring very low latency or local data processing using familiar APIs and tool sets. This guide demonstrates how to deploy an open source FM from Hugging Face on Amazon Elastic Compute Cloud (Amazon EC2) instances across three locations: a commercial AWS Region and two AWS Local Zones.
You can also either use the SageMaker Canvas UI, which provides a visual interface for building and deploying models without needing to write any code or have any ML expertise, or use its automated machine learning (AutoML) APIs for programmatic interactions.
Recently, we organized a webinar about how to integrate public reviews using Lumoa, where “that cool guy Garen” showed how to bring public reviews into Lumoa within minutes and how to identify key pain points that you can immediately tackle to increase the score.
In this post, we discuss the benefits of the V2 model, how to conduct your own evaluation of the model, and how to migrate to using the new model. A common way to select an embedding model (or any model) is to look at public benchmarks; an accepted benchmark for measuring embedding quality is the MTEB leaderboard.
In this post, we build a secure enterprise application using AWS Amplify that invokes an Amazon SageMaker JumpStart foundation model, Amazon SageMaker endpoints, and Amazon OpenSearch Service to show how to build text-to-text and text-to-image applications with Retrieval Augmented Generation (RAG). You access the React application from your computer.
Similar to the process of PyTorch integration with C++ code, Neuron CustomOps requires a C++ implementation of an operator via a NeuronCore-ported subset of the Torch C++ API. Finally, the custom library is built by calling the load API. For more information, refer to Custom Operators API Reference Guide [Experimental].
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies and Amazon via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. A limitation of the approach is its larger computational cost.
Although you can integrate the model directly into an application, the approach that works well for production-grade applications is to deploy the model behind an endpoint and then invoke the endpoint via a RESTful API call to obtain the inference. However, you can use any other benchmarking tool.
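A minimal sketch of invoking such an endpoint with the SageMaker runtime client; the endpoint name and payload format are placeholder assumptions for whatever model you deployed:

import boto3

runtime = boto3.client("sagemaker-runtime")

# Endpoint name and CSV payload are illustrative; match them to your deployed model.
response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",
    ContentType="text/csv",
    Body="0.1,3.2,1.7,0.0",
)
print(response["Body"].read().decode())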
Learning how to choose the best customer journey analytics platform is just the start. Whether you’re just starting to evaluate an investment in a customer journey analytics platform or you’ve already made the decision and have chosen a vendor, it’s time to think about how to implement customer journey analytics in your organization.
The ingestion workflow transforms these curated questions into vector embeddings using the Amazon Titan Text Embeddings model API. The vector embeddings are persisted in the application's in-memory vector store.
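A minimal sketch of generating one such embedding with the Titan Text Embeddings model through Amazon Bedrock; the v1 model ID and sample question are assumptions:

import boto3, json

bedrock = boto3.client("bedrock-runtime")

# Embed one curated question (model ID assumed to be the Titan Text Embeddings v1 model).
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "How do I reset my password?"}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # dimensionality of the returned vector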
SageMaker makes it easy to deploy models into production directly through API calls to the service. It’s a low-level API available for Java, C++, Go, JavaScript, Node.js, PHP, Ruby, and Python.
On Hugging Face, the Massive Text Embedding Benchmark (MTEB) is provided as a leaderboard for diverse text embedding tasks. It currently provides 129 benchmarking datasets across 8 different tasks on 113 languages. We use a medium instance to demonstrate deploying the model as an API endpoint using an SDK through SageMaker JumpStart.
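A minimal sketch of deploying a JumpStart model as an endpoint with the SageMaker Python SDK; the model ID, instance type, and request payload shape below are assumptions and vary by model:

from sagemaker.jumpstart.model import JumpStartModel

# Model ID and instance type are illustrative; pick them from the JumpStart catalog.
model = JumpStartModel(model_id="huggingface-sentencesimilarity-bge-large-en")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# The request format depends on the chosen model; this shape is an assumption.
print(predictor.predict({"text_inputs": "What is the capital of France?"}))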
This is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API. It’s serverless, so you don’t have to manage any infrastructure.
We demonstrate how to use the AWS Management Console and Amazon Translate public API to deliver automatic machine batch translation, and analyze the translations between two language pairs: English and Chinese, and English and Spanish. In this post, we present a solution that D2L.ai
From there, we dive into how you can track and understand the metrics and performance of the SageMaker endpoint utilizing Amazon CloudWatch metrics. We first benchmark the performance of our model on a single instance to identify the TPS it can handle per our acceptable latency requirements. Deploy a real-time endpoint.
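A minimal sketch of pulling one such endpoint metric from Amazon CloudWatch; the endpoint name is a placeholder:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Average model latency for a (hypothetical) endpoint over the last hour.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
print(stats["Datapoints"])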
In this post, we discuss a credit card fraud detection use case, and learn how to use Inference Recommender to find the optimal inference instance type and ML system configurations that can detect fraudulent credit card transactions in milliseconds. Inference Recommender uses this information to run a performance benchmark load test.
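A minimal sketch of starting an Inference Recommender job with boto3; the job name, role, and model package ARN are placeholders:

import boto3

sm = boto3.client("sagemaker")

# All identifiers below are assumptions; point them at your registered model package.
sm.create_inference_recommendations_job(
    JobName="fraud-model-recommendation",
    JobType="Default",
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    InputConfig={
        "ModelPackageVersionArn": (
            "arn:aws:sagemaker:us-east-1:123456789012:model-package/fraud-model/1"
        ),
    },
)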
During our webinar with G2, we shared how modern Customer Success teams maximize insights from customer reviews to drive recurring revenue, including how to: Know when a customer is most primed to leave a raving review – and how to perfectly time your ask. The easiest and fastest way is just an email alert.
And an ML researcher may ask questions like: “How can I generate my own fair comparison of multiple model architectures against a specified dataset while controlling training hyperparameters and compute specifications, such as GPUs, CPUs, and RAM?” Example models from the comparison include swin-large-patch4-window7-224 (195.4M parameters) and efficientnet-v2-imagenet21k-ft1k-l (118.1M parameters).
The team’s early benchmarking results show 7.3. The baseline model used in this benchmarking is a multi-layer perceptron neural network with seven dense, fully connected layers and over 200 parameters. The following table summarizes the benchmarking results on ml.p3.16xlarge SageMaker training instances.
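As an illustration only, a seven-dense-layer MLP baseline of that general shape might look like the following in PyTorch; the layer widths and input size are assumptions, not the team's actual architecture:

import torch.nn as nn

# Seven fully connected layers in total; widths and input size are illustrative assumptions.
baseline_mlp = nn.Sequential(
    nn.Linear(30, 64), nn.ReLU(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)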
To solve this problem, this post shows you how to predict domain-specific product attributes from product images by fine-tuning a VLM on a fashion dataset using Amazon SageMaker , and then using Amazon Bedrock to generate product descriptions using the predicted attributes as input.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading artificial intelligence (AI) startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.
In this post, we walk through how to discover and deploy the jina-embeddings-v2 model as part of a Retrieval Augmented Generation (RAG)-based question answering system in SageMaker JumpStart. Long input-context length – Jina Embeddings v2 models support 8,192 input tokens.
In terms of resulting speedups, the approximate order is programming hardware, then programming against PBA APIs, then programming in an unmanaged language such as C++, then a managed language such as Python. The CUDA API and SDK were first released by NVIDIA in 2007.
Then, you’ll learn how to easily train a 30B-parameter GPT-2 model on SageMaker with this new feature. Let’s now learn how to train a GPT-2 model with sharded data parallel, with SMP encapsulating the complexity for you. To get started, follow Modify a PyTorch Training Script to adapt the SMP APIs in your training script.
In this blog post, we introduce how to use an Amazon EC2 Inf2 instance to cost-effectively deploy multiple industry-leading LLMs on AWS Inferentia2, a purpose-built AWS AI chip. This helps customers quickly test the models, expose an API interface for performance benchmarking, and support downstream application calls at the same time.
Although existing large language model (LLM) benchmarks like MT-bench evaluate model capabilities, they lack the ability to validate the application layers. Evaluator considerations By default, evaluators use the InvokeModel API with On-Demand mode, which will incur AWS charges based on input tokens processed and output tokens generated.
In the following sections, we discuss how to address common challenges with regard to technical focus areas. You can save time, money, and labor by implementing classifications in your workflow, and documents go to downstream applications and APIs based on document type.
When ML models deployed on instances receive API calls from a large number of clients, a random distribution of requests can work very well when there is not a lot of variability in your requests and responses. We explained how to enable least outstanding requests (LOR) routing and how it can benefit your model deployments.
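A minimal sketch of enabling LOR routing when creating an endpoint configuration; the configuration name, model, and instance type are placeholders:

import boto3

sm = boto3.client("sagemaker")

# Names and instance settings are assumptions; the key piece is RoutingConfig.
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config-lor",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.c5.xlarge",
        "InitialInstanceCount": 2,
        "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
    }],
)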
Then, we’ll revisit how to train foundation models using sharded data parallelism. Finally, we’ll benchmark the performance of 13B, 50B, and 100B parameter auto-regressive models and wrap up with future work. For training a different model type, you can follow the API document to learn how to apply the SMP APIs.
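A minimal sketch of configuring sharded data parallelism in a SageMaker PyTorch estimator; the script name, role, instance settings, and sharding degree are placeholder assumptions:

from sagemaker.pytorch import PyTorch

# All values below are illustrative; tune them to your model and cluster size.
estimator = PyTorch(
    entry_point="train_gpt2.py",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_type="ml.p4d.24xlarge",
    instance_count=4,
    framework_version="1.13",
    py_version="py39",
    distribution={
        "mpi": {"enabled": True, "processes_per_host": 8},
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {"sharded_data_parallel_degree": 32},
            }
        },
    },
)
estimator.fit()  # start training (input channels omitted in this sketch)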
Snowflake Arctic is a family of enterprise-grade large language models (LLMs) built by Snowflake to cater to the needs of enterprise users, exhibiting exceptional capabilities (as shown in the following benchmarks) in SQL querying, coding, and accurately following instructions. To learn more, refer to the API documentation.