Pre-training genomic language models using AWS HealthOmics and Amazon SageMaker

AWS Machine Learning

The ability to effectively analyze and interpret genomic data at scale is key to precision medicine, agricultural optimization, and biotechnological breakthroughs, making genomic language models a possible new foundational technology in these industries. ... You can skip this step if you already have your own genomic data in a sequence store.

Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker

AWS Machine Learning

In the following sections, we go through the steps to prepare your training data, create a training script, and run a SageMaker training job. ... save_to_disk(test_s3_uri) ... Create a training script: SageMaker script mode allows you to run your custom training code in optimized machine learning (ML) framework containers managed by AWS.

Accelerate protein structure prediction with the ESMFold language model on Amazon SageMaker

AWS Machine Learning

This post provides an example Jupyter notebook and related scripts in the following GitHub repository. ... script to load the model, run the prediction, and format the output. This script includes much of the same code we used in our notebook. ... CPU-optimized image on an ml.r5.xlarge instance type.