AI
Dec 10, 2024

Introducing Amazon Nova: A New Generation of Foundation Models

Amazon Nova Models Expand the Growing Selection of the Broadest and Most Capable Foundation Models in Amazon Bedrock for Enterprise Customers

Today, at AWS re:Invent, Amazon.com Inc. (NASDAQ: AMZN) announced Amazon Nova, a new generation of state-of-the-art foundation models (FMs) that deliver exceptional intelligence, unmatched multimodal capabilities, and industry-leading price performance. These models are now available through Amazon Bedrock, Amazon’s fully managed service for foundation models. The Amazon Nova suite includes four understanding models:

WHY THIS MATTERS? Amazon's Bedrock, an AI foundation model for businesses, gets Nova foundation models to help create and understand text, images, and video.
  • Amazon Nova Micro: A text-to-text model delivering ultra-low latency at minimal cost.
  • Amazon Nova Lite: A multimodal model for processing text, images, and videos with a focus on speed and affordability.
  • Amazon Nova Pro: A highly capable multimodal model combining accuracy, speed, and cost efficiency for a wide range of tasks.
  • Amazon Nova Premier: The most advanced multimodal model for complex reasoning tasks, set for release in early 2025.

Additionally, Amazon introduced two creative models:

  • Amazon Nova Canvas, which generates studio-quality images, and
  • Amazon Nova Reel, designed for professional-grade video generation.

“Inside Amazon, we have about 1,000 generative AI applications in motion, and we’ve had a bird’s-eye view of what application builders are still grappling with,” said Rohit Prasad, SVP of Amazon Artificial General Intelligence. “Our new Amazon Nova models are intended to help with these challenges for internal and external builders, providing compelling intelligence and content generation while also delivering meaningful progress on latency, cost-effectiveness, customization, Retrieval Augmented Generation (RAG), and agentic capabilities.”

State-of-the-Art Intelligence and Performance Benchmarks

The Amazon Nova models represent a significant leap in AI capabilities, with each designed to deliver exceptional performance in its respective category.

Amazon Nova Micro

This text-only model offers unparalleled speed, generating 210 tokens per second—an industry leader in output speed. Nova Micro has been benchmarked extensively and performed equal to or better than Meta’s LLaMa 3.1 8B on 11 tasks and Google’s Gemini 1.5 Flash-8B on 12 tasks. With its low-cost design and ultra-fast responses, Nova Micro is ideal for use cases requiring real-time performance, such as customer service chatbots and dynamic text generation.

Amazon Nova Lite

Nova Lite excels as a multimodal model that balances affordability with functionality. Benchmarks show that it performs equal to or better than OpenAI’s GPT-4o mini on 17 of 19 tasks, Google’s Gemini 1.5 Flash-8B on 17 of 21 tasks, and Anthropic’s Claude Haiku 3.5 on 10 of 12 tasks. In addition to strong text comprehension, Nova Lite leads in understanding videos, charts, and documents, as demonstrated in VATEX, ChartQA, and DocVQA benchmarks. It also excels in agentic workflows, ranking highly on the Berkeley Function Calling Leaderboard and Mind2Web for multimodal agentic capabilities.

Amazon Nova Pro

The Pro model is designed for high-performance multimodal applications, delivering leading results in complex workflows and instruction-following tasks. Nova Pro surpassed OpenAI’s GPT-4o in 17 of 20 benchmarks, Google’s Gemini 1.5 Pro in 16 of 21 benchmarks, and Anthropic’s Claude Sonnet 3.5v2 in 9 of 20 benchmarks. Nova Pro’s strength lies in its ability to perform advanced tasks, such as comprehensive RAG (Retrieval Augmented Generation) workflows and multimodal reasoning, making it ideal for enterprise use cases across finance, healthcare, and media.

Video source: https://www.youtube.com/@amazonwebservices

Unmatched Multimodal and Language Support

Amazon Nova models are engineered to support over 200 languages, ensuring global accessibility and usability. Their context-handling capabilities are among the best in the industry:

  • Amazon Nova Micro supports a context length of 128K tokens.
  • Amazon Nova Lite and Amazon Nova Pro handle up to 300K tokens, allowing for the processing of 30 minutes of video or extensive textual analysis.
  • By early 2025, context lengths will expand to 2M tokens, enabling more comprehensive and large-scale data handling.

Cost Efficiency and Seamless Integration

Amazon Nova models have been designed to deliver unmatched price performance, with costs reduced by up to 75% compared to leading alternatives in their respective categories. These models are fully integrated with Amazon Bedrock, which allows customers to:

  • Experiment with and evaluate Nova models alongside other leading FMs.
  • Fine-tune models using proprietary data for improved accuracy.
  • Leverage distillation techniques to train smaller, efficient models.

Creative Capabilities with Nova Canvas and Nova Reel

Amazon Nova Canvas

This state-of-the-art image generation model allows users to create studio-quality visuals from text or image inputs. Canvas also offers advanced editing features, such as inpainting, background removal, and color adjustments. Built-in controls, including watermarking and content moderation, ensure responsible AI use. In evaluations, Canvas outperformed OpenAI’s DALL-E 3 and Stable Diffusion in both human evaluations and automated metrics.

Amazon Nova Reel

Designed for video generation, Nova Reel simplifies the process of creating professional-grade six-second videos using natural language prompts. It supports controls for pacing, camera motion, and style customization. Nova Reel outperformed Runway’s Gen-3 Alpha in third-party evaluations and will soon support video generation up to two minutes in length.

Future Developments in 2025

Amazon plans to expand the Nova lineup with the following models:

  • Speech-to-Speech Model: Launching in Q1 2025, this model will enable natural language conversations with advanced verbal and non-verbal understanding.
  • Multimodal-to-Multimodal Model: Set for mid-2025, this model will process and generate outputs across text, image, video, and audio, streamlining any-to-any modality workflows.

Early Adoption and Industry Use Cases

Several leading organizations are already leveraging Nova’s capabilities:

  • SAP is integrating Nova models into its AI copilot, Joule, to automate workflows and enhance personalization.
  • Deloitte is using Nova for customized generative AI solutions across industries.
  • Musixmatch incorporates Nova Reel to help emerging artists create music videos based on song context.
  • Shutterstock is utilizing Nova Canvas in its AI Image Generator for high-quality visuals.

Commitment to Responsible AI

Amazon Nova models are developed with built-in safety measures, such as watermarking, content moderation, and AWS AI Service Cards, which provide transparency on use cases, limitations, and best practices for ethical AI deployment.