Meet your training timelines and budgets with new Amazon SageMaker HyperPod flexible training plans

Today, we’re announcing the general availability of Amazon SageMaker HyperPod flexible training plans to help data scientists train large foundation models (FMs) within their timelines and budgets and save them weeks of effort in managing the training process based on compute availability.

At AWS re:Invent 2023, we introduced SageMaker HyperPod to reduce the time to train FMs by up to 40 percent and scale across thousands of compute resources in parallel with preconfigured distributed training libraries and built-in resiliency. Most generative AI model development tasks need accelerated compute resources in parallel. Our customers struggle to find timely access to compute resources to complete their training within their timeline and budget constraints.

With today’s announcement, you can find the accelerated compute resources you need for training, create optimal training plans, and run training workloads across different blocks of capacity based on compute availability. In a few steps, you can specify your training completion date, budget, and compute resource requirements, create an optimal training plan, and run fully managed training jobs without manual intervention.

SageMaker HyperPod training plans in action
To get started, go to the Amazon SageMaker AI console, choose Training plans in the left navigation pane, and choose Create training plan.

For example, choose your preferred training start date and duration (10 days) and the instance type and count (16 ml.p5.48xlarge) for your SageMaker HyperPod cluster, then choose Find training plan.
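The same search can be described programmatically. The following is a minimal sketch, not an official recipe: `build_offering_search` is a hypothetical helper, the start date and 30-day search window are made-up values, and the key names are an assumption about what the boto3 SageMaker client's `search_training_plan_offerings` call accepts, so check them against the current API reference.

```python
from datetime import datetime, timedelta, timezone


def build_offering_search(instance_type, instance_count, duration_days,
                          start_after, window_days):
    """Build keyword arguments describing a training plan search.

    Hypothetical helper. The key names are assumed to match the boto3
    SageMaker client's search_training_plan_offerings parameters.
    """
    if instance_count < 1 or duration_days < 1:
        raise ValueError("instance_count and duration_days must be positive")
    if window_days < duration_days:
        raise ValueError("search window must be at least as long as the plan")
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "DurationHours": duration_days * 24,
        "StartTimeAfter": start_after,
        "EndTimeBefore": start_after + timedelta(days=window_days),
        # Assumed value that targets a HyperPod cluster rather than a training job.
        "TargetResources": ["hyperpod-cluster"],
    }


# The console example: 16 ml.p5.48xlarge instances for 10 days.
# The start date and the 30-day window are illustrative only.
request = build_offering_search(
    instance_type="ml.p5.48xlarge",
    instance_count=16,
    duration_days=10,
    start_after=datetime(2025, 1, 6, tzinfo=timezone.utc),
    window_days=30,
)
print(request["DurationHours"])
```

With AWS credentials configured, a dictionary like this could presumably be passed as `boto3.client("sagemaker").search_training_plan_offerings(**request)`. Keeping the request construction in a pure function makes it easy to validate the inputs before any call is made.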

SageMaker HyperPod suggests a training plan that is split into two five-day segments. This includes the total upfront price for the plan.
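To see how much capacity a split plan like this reserves, the arithmetic can be sketched as follows. The `Segment` class and `plan_totals` function are hypothetical names used only for illustration. The sketch assumes each segment keeps all 16 instances and that each ml.p5.48xlarge instance provides 8 GPUs. It leaves out pricing, because the upfront price comes from the plan offering itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One contiguous block of reserved capacity (hypothetical model)."""
    days: int
    instance_count: int


def plan_totals(segments, accelerators_per_instance):
    """Sum duration and reserved capacity across all segments of a plan."""
    total_days = sum(s.days for s in segments)
    instance_hours = sum(s.days * 24 * s.instance_count for s in segments)
    return {
        "total_days": total_days,
        "instance_hours": instance_hours,
        "accelerator_hours": instance_hours * accelerators_per_instance,
    }


# Two five-day segments of 16 instances each, as in the suggested plan.
segments = [Segment(days=5, instance_count=16), Segment(days=5, instance_count=16)]
totals = plan_totals(segments, accelerators_per_instance=8)
print(totals)
# {'total_days': 10, 'instance_hours': 3840, 'accelerator_hours': 30720}
```

The two segments together cover the requested 10 days, which is 3,840 instance-hours or 30,720 GPU-hours under these assumptions.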

If you accept this training plan, add your training details in the next step and choose Create your plan.

After you create your training plan, it appears in the list of training plans. You must pay the upfront price within 12 hours of creating a plan. In this example, one plan is in the Active state and has already started, with all of its instances in use. The second plan is Scheduled to start later, but you can already submit jobs that start automatically when the plan begins.

In the Active state, the compute resources are available in SageMaker HyperPod, resume automatically after pauses in availability, and terminate at the end of the plan. In this example, the first segment is currently running and another segment is queued to run after it.

This is similar to managed spot training in SageMaker AI, where SageMaker AI takes care of instance interruptions and continues the training with no manual intervention. To learn more, visit SageMaker HyperPod training plans in the Amazon SageMaker AI Developer Guide.

Now available
Amazon SageMaker HyperPod training plans are now available in the US East (N. Virginia), US East (Ohio), and US West (Oregon) AWS Regions and support ml.p4d.24xlarge, ml.p5.48xlarge, ml.p5e.48xlarge, ml.p5en.48xlarge, and ml.trn2.48xlarge instances. Trn2 and P5en instances are available only in the US East (Ohio) Region. To learn more, visit the SageMaker HyperPod product page and SageMaker AI pricing page.

Give HyperPod training plans a try in the Amazon SageMaker AI console and send feedback to AWS re:Post for SageMaker AI or through your usual AWS Support contacts.

Channy

