At AWS re:Invent, Amazon tips a slew of upgrades in everything from storage and databases to new computing chips and various AI tools, mostly aimed at reducing cost and complexity.
Watching sessions from last week's AWS re:Invent conference, what stood out was Amazon's insistence that AI, rather than standing on its own, is fast becoming part of everyday applications, meaning developers need to focus on things like cost and efficiency.
"My view is generative AI inference is going to be a core building block for every single application," said Matt Garman, CEO of Amazon Web Services. "In fact, I think generative AI actually has the potential to transform every single industry, every single company out there, every single workflow out there, every single user experience out there."
To that end, Garman announced a slew of upgrades in everything from storage and databases to new computing chips and various AI tools, mostly aimed at reducing cost and complexity.
Unsurprisingly, generative AI tools received the most attention. Garman pushed Bedrock, the company's AI platform, saying: "Every application is going to use inference in some way to enhance or build or really change an application."
I was impressed by the addition of model distillation features in Bedrock. These let you use prompts and the output of a very large model to train a much smaller model that covers only a specific subject area and is much cheaper to run. Garman said such models can be 500% faster and 75% less expensive.
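To make the idea concrete, here is a minimal sketch of the distillation pattern - not Bedrock's managed distillation feature, whose internals Amazon handles for you. It uses Bedrock's Converse API to have a large "teacher" model label a handful of domain prompts, then writes the pairs to a JSONL file that could serve as fine-tuning data for a smaller "student" model. The model ID, prompts, and file name are illustrative assumptions.

```python
import json
import boto3

# Bedrock runtime client; the region is a placeholder assumption.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The large "teacher" model; any capable Bedrock model ID would do.
TEACHER_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

def teacher_answer(prompt: str) -> str:
    """Ask the large model for a response via the Converse API."""
    response = client.converse(
        modelId=TEACHER_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Domain-specific prompts the smaller model needs to cover.
prompts = [
    "Summarize our return policy for electronics.",
    "What shipping options are available for oversized items?",
]

# Write prompt/completion pairs as a JSONL dataset that a smaller
# "student" model could be fine-tuned on.
with open("distillation_dataset.jsonl", "w") as f:
    for p in prompts:
        record = {"prompt": p, "completion": teacher_answer(p)}
        f.write(json.dumps(record) + "\n")
```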
Other features he mentioned included better guardrails and security, including a preview of a new automated reasoning system that is supposed to prove a system is working as intended, thus preventing hallucinations. Bedrock is also getting improved retrieval-augmented generation (RAG) tools, along with better tools for ingesting and evaluating knowledge bases.
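RAG itself is a simple pattern: retrieve the documents most relevant to a query and feed them to the model as context, so it answers from your data rather than from memory. Below is a bare-bones sketch of that retrieval step, using a toy hash-based embedding as a stand-in for a real embedding model; in Bedrock, the Knowledge Bases feature handles ingestion, embedding, and retrieval for you.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a real system would call an embedding model;
    this just hashes words into a fixed-size unit vector."""
    vec = np.zeros(256)
    for word in text.lower().split():
        vec[hash(word) % 256] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

# A tiny "knowledge base" of documents.
documents = [
    "Returns are accepted within 30 days with a receipt.",
    "Standard shipping takes three to five business days.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

# Augment the prompt with retrieved context before generation.
query = "How long does shipping take?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # `prompt` would then be sent to the generation model
```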
Agents are getting a lot of attention this year - see Microsoft Ignite - and Garman talked about a preview version of new agent services, including multi-agent collaboration and orchestration. We are in "the earliest days of generative AI," he said.
That point was reinforced by Amazon CEO Andy Jassy, who took the stage to talk about customer experiences and to announce new models.
"The most success that we've seen from companies everywhere in the world is in cost avoidance and productivity," Jassy noted. "But you also are starting to see completely reimagined and reinvented customer experiences."
He pointed to internal Amazon applications, including customer service chatbots that know who you are and what you ordered. This has resulted in a 500 basis point improvement in customer satisfaction, along with 25% faster processing times and, he said, 25% lower costs.
Other internal applications Jassy discussed included "Sparrow," a robotic system that picks up and moves items from containers into specific customer totes; and "Rufus," which lets you ask questions on any product detail page. Overall, he said, the company has over 1,000 generative AI applications deployed or in development.
While praising Anthropic and its Claude models, Jassy said there will "never be one tool to rule the world" and that "choice matters when it comes to model selection." In that vein, he introduced a new family of "frontier" models, which Amazon is calling Nova.
These include a text-only Micro version and three multi-modal models - Lite, Pro, and Premier. The first three are available now, with Premier due in the first quarter of 2025. He also mentioned Nova Canvas for image generation and Nova Reel for video generation and said Amazon is working on a speech-to-speech model and an any-to-any multimodal model for 2025. (Below, Amazon Nova Reel transforms a single image input into a brief video with the prompt: dolly forward.)
Jassy said the new models have low latency and are deeply integrated with Bedrock features such as fine-tuning and should be 75% less expensive to run than the other leading models in Bedrock - a big step if all other things are equal.
The push toward integrating AI into general applications was brought home by changes to SageMaker, which started life as a tool mainly for training AI models but has now been turned into a unified platform for data, analytics, and AI. The new SageMaker Unified Studio brings together a number of previously separate "studios," query editors, and visual tools. A new SageMaker Lakehouse is designed to unify your view of storage across multiple data lakes, data warehouses, and third-party data sources for analytics and AI/machine learning. (The previous SageMaker is now rebranded as SageMaker AI.)
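Since the Lakehouse is organized around Apache Iceberg, any Iceberg-aware engine should be able to reach it through the standard Iceberg REST catalog interface. The PySpark configuration below is a generic Iceberg REST-catalog sketch rather than documented SageMaker settings; the catalog name, endpoint URI, warehouse, and table are all placeholders.

```python
from pyspark.sql import SparkSession

# Generic Apache Iceberg REST-catalog configuration; assumes the
# iceberg-spark-runtime package is on the classpath. The catalog name,
# endpoint URI, and warehouse value are placeholder assumptions.
spark = (
    SparkSession.builder
    .appName("lakehouse-demo")
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri", "https://example-endpoint/iceberg")
    .config("spark.sql.catalog.lakehouse.warehouse", "my_warehouse")
    .getOrCreate()
)

# Once configured, Iceberg tables are queried like ordinary SQL tables.
spark.sql("SELECT * FROM lakehouse.sales.orders LIMIT 10").show()
```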
In terms of AI for code generation, AWS announced autonomous Q Developer agents for generating code tests, documentation, and code reviews, and previews of agents to transform .NET applications from Windows to Linux and to migrate VMware instances. On the latter, Garman said the agent could move applications four times faster than manual methods, with savings of up to 40%.
For running models, Garman announced the general availability of instances based on the first Trainium 2 chips, as well as EC2 Trainium 2 UltraServers, which connect four 16-chip servers into a single 64-chip node with 83 petaflops of compute. The chip will be used for both training and inference on AI models; Garman joked that "naming is not always perfect for us." But he said this would offer 30 to 40% better price performance than current-generation, GPU-based instances. (As always, I take vendors' performance and price/performance claims with a grain of salt - your mileage may vary.)
Trainium 3 is coming next year; it will be the first AWS chip built on a 3nm process, with twice the computing power of this year's version and 40% better efficiency.
For more traditional computing applications, Garman talked about how Amazon has been developing its own Graviton CPU chips since 2018. Today, Graviton handles as much compute as all of AWS delivered - x86 and Arm combined - in 2019. Graviton provides 40% better price performance than traditional x86 (Intel and AMD) server chips while using 60% less energy.
"Graviton is widely used by almost every AWS customer," Garman said, with 90% of AWS's top 1,000 customers using it.
He announced a new Graviton 4 chip, which can address more applications. It is 30% faster per core and has three times the number of CPU cores and memory. On database applications it is 40% faster, with gains of up to 45% on large Java applications.
In storage and databases, Garman announced new S3 Tables with built-in support for Apache Iceberg, plus an S3 Metadata service. But I was particularly impressed by Aurora DSQL, which he described as the fastest distributed SQL database - a multi-region, serverless service that is still mostly PostgreSQL-compatible. It is designed for always-available applications and builds on a time-sync service Amazon discussed last year, along with a new transaction engine. He said it's up to four times faster than Google's competing Spanner database.
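Because DSQL speaks the PostgreSQL wire protocol, standard Postgres drivers should mostly work unchanged. Here is a minimal sketch using psycopg2 with placeholder connection details; note that DSQL authenticates with short-lived IAM tokens rather than static passwords, a step this sketch glosses over.

```python
import psycopg2

# Placeholder endpoint and credentials; a real Aurora DSQL cluster
# endpoint would go here. DSQL passes a short-lived IAM auth token
# as the password; generating one is omitted for brevity.
conn = psycopg2.connect(
    host="your-cluster.dsql.us-east-1.on.aws",
    port=5432,
    dbname="postgres",
    user="admin",
    password="<IAM auth token>",
    sslmode="require",
)

# Standard PostgreSQL SQL runs against the distributed engine.
with conn.cursor() as cur:
    cur.execute("SELECT city, count(*) FROM orders GROUP BY city")
    for row in cur.fetchall():
        print(row)
conn.close()
```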