Announcing Llama 3.1 405B, 70B, and 8B models from Meta in Amazon Bedrock

Today, we are announcing the general availability of Llama 3.1 models in Amazon Bedrock. The Llama 3.1 models are Meta’s most advanced and capable models to date. The Llama 3.1 models are a collection of 8B, 70B, and 405B parameter size models that demonstrate state-of-the-art performance on a wide range of industry benchmarks and offer new capabilities for your generative artificial intelligence (generative AI) applications.

All Llama 3.1 models support a 128K context length (an increase of 120K tokens from Llama 3) that has 16 times the capacity of Llama 3 models and improved reasoning for multilingual dialogue use cases in eight languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

You can now use three new Llama 3.1 models from Meta in Amazon Bedrock to build, experiment, and responsibly scale your generative AI ideas:

Llama 3.1 405B is the world’s largest publicly available large language model (LLM) according to Meta. The model sets a new standard for AI and is ideal for enterprise-level applications and research and development (R&D). It is ideal for tasks like synthetic data generation where the outputs of the model can be used to improve smaller Llama models and model distillations to transfer knowledge to smaller models from the 405B model. This model excels at general knowledge, long-form text generation, multilingual translation, machine translation, coding, math, tool use, enhanced contextual understanding, and advanced reasoning and decision-making. To learn more, visit the AWS Machine Learning Blog about using Llama 3.1 405B to generate synthetic data for model distillation.
Llama 3.1 70B is ideal for content creation, conversational AI, language understanding, R&D, and enterprise applications. The model excels at text summarization and accuracy, text classification, sentiment analysis and nuance reasoning, language modeling, dialogue systems, code generation, and following instructions.
Llama 3.1 8B is best suited for limited computational power and resources. The model excels at text summarization, text classification, sentiment analysis, and language translation requiring low-latency inferencing.

Meta measured the performance of Llama 3.1 on over 150 benchmark datasets that span a wide range of languages and extensive human evaluations. As you can see in the following chart, Llama 3.1 outperforms Llama 3 in every major benchmarking category.

To learn more about Llama 3.1 features and capabilities, visit the Llama 3.1 Model Card from Meta and Llama models in the AWS documentation.

You can take advantage of Llama 3.1’s responsible AI capabilities, combined with the data governance and model evaluation features of Amazon Bedrock to build secure and reliable generative AI applications with confidence.

Guardrails for Amazon Bedrock – By creating multiple guardrails with different configurations tailored to specific use cases, you can use Guardrails to promote safe interactions between users and your generative AI applications by implementing safeguards customized to your use cases and responsible AI policies. With Guardrails for Amazon Bedrock, you can continually monitor and analyze user inputs and model responses that might violate customer-defined policies, detect hallucination in model responses that are not grounded in enterprise data or are irrelevant to the user’s query, and evaluate across different models including custom and third-party models. To get started, visit Create a guardrail in the AWS documentation.
Model evaluation on Amazon Bedrock – You can evaluate, compare, and select the best Llama models for your use case in just a few steps using either automatic evaluation or human evaluation. With model evaluation on Amazon Bedrock, you can choose automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. Alternatively, you can choose human evaluation workflows for subjective or custom metrics such as relevance, style, and alignment to brand voice. Model evaluation provides built-in curated datasets or you can bring in your own datasets. To get started, visit Get started with model evaluation in the AWS documentation.

To learn more about how to keep your data and applications secure and private in AWS, visit the Amazon Bedrock Security and Privacy page.

Getting started with Llama 3.1 models in Amazon Bedrock
If you are new to using Llama models from Meta, go to the Amazon Bedrock console in the US West (Oregon) Region and choose Model access on the bottom left pane. To access the latest Llama 3.1 models from Meta, request access separately for Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, or Llama 3.1 450B Instruct.

To test the Llama 3.1 models in the Amazon Bedrock console, choose Text or Chat under Playgrounds in the left menu pane. Then choose Select model and select Meta as the category and Llama 3.1 8B Instruct, Llama 3.1 70B Instruct, or Llama 3.1 405B Instruct as the model.

In the following example I selected the Llama 3.1 405B Instruct model.

By choosing View API request, you can also access the model using code examples in the AWS Command Line Interface (AWS CLI) and AWS SDKs. You can use model IDs such as meta.llama3-1-8b-instruct-v1, meta.llama3-1-70b-instruct-v1 , or meta.llama3-1-405b-instruct-v1.

Here is a sample of the AWS CLI command:

aws bedrock-runtime invoke-model \
  --model-id meta.llama3-1-405b-instruct-v1:0 \
--body "{\"prompt\":\" [INST]You are a very intelligent bot with exceptional critical thinking[/INST] I went to the market and bought 10 apples. I gave 2 apples to your friend and 2 to the helper. I then went and bought 5 more apples and ate 1. How many apples did I remain with? Let's think step by step.\",\"max_gen_len\":512,\"temperature\":0.5,\"top_p\":0.9}" \
  --cli-binary-format raw-in-base64-out \
  --region us-west-2 \
  invoke-model-output.txt

You can use code examples for Llama models in Amazon Bedrock using AWS SDKs to build your applications using various programming languages. The following Python code examples show how to send a text message to Llama using the Amazon Bedrock Converse API for text generation.

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3 8b Instruct.
model_id = "meta.llama3-1-405b-instruct-v1:0"

# Start a conversation with the user message.
user_message = "Describe the purpose of a 'hello world' program in one line."
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

You can also use all Llama 3.1 models (8B, 70B, and 405B) in Amazon SageMaker JumpStart. You can discover and deploy Llama 3.1 models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. You can operate your models with SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs under your virtual private cloud (VPC) controls, which help provide data security.

The fine-tuning for Llama 3.1 models in Amazon Bedrock and Amazon SageMaker JumpStart will be coming soon. When you build fine-tuned models in SageMaker JumpStart, you will also be able to import your custom models into Amazon Bedrock. To learn more, visit Meta Llama 3.1 models are now available in Amazon SageMaker JumpStart on the AWS Machine Learning Blog.

For customers who want to deploy Llama 3.1 models on AWS through self-managed machine learning workflows for greater flexibility and control of underlying resources, AWS Trainium and AWS Inferentia-powered Amazon Elastic Compute Cloud (Amazon EC2) instances enable high performance, cost-effective deployment of Llama 3.1 models on AWS. To learn more, visit AWS AI chips deliver high performance and low cost for Meta Llama 3.1 models on AWS in the AWS Machine Learning Blog.

Customer voices
To celebrate this launch, Parkin Kent, Business Development Manager at Meta, talks about the power of the Meta and Amazon collaboration, highlighting how Meta and Amazon are working together to push the boundaries of what’s possible with generative AI.

Discover how customer’s businesses are leveraging Llama models in Amazon Bedrock to harness the power of generative AI. Nomura, a global financial services group spanning 30 countries and regions, is democratizing generative AI across its organization using Llama models in Amazon Bedrock.

TaskUs, a leading provider of outsourced digital services and next-generation customer experience to the world’s most innovative companies, is helping clients represent, protect, and grow their brands using Llama models in Amazon Bedrock.

Now available
Llama 3.1 405B, 70B, and 8B models from Meta are generally available today in Amazon Bedrock in the US West (Oregon) Region. Check the full Region list for future updates. To learn more, check out the Llama in Amazon Bedrock product page and the Amazon Bedrock pricing page.

Give Llama 3.1 a try in the Amazon Bedrock console today, and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.

Visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions. Let me know what you build with Llama 3.1 in Amazon Bedrock!

— Channy

July 23, 2024 – Updated post to add new screenshot for model access and customer video featuring TaskUs.
July 25, 2024 – Updated post to indicate that Llama 3.1 405B is now generally available.

Blog Article: Here