🤖 AI Summary
Google has made major strides with its Gemini AI models, but they can only be used on Google's terms. The company has now announced Gemma 4, a new open-weight model family released under the Apache 2.0 license.
Gemma 4 comes in four sizes optimized for local use. The two larger variants, a 26B Mixture of Experts and a 31B Dense model, run on a single Nvidia H100 GPU; that accelerator costs roughly $20,000, but the models can be quantized to lower precision to fit on cheaper consumer GPUs.
Two mobile-optimized variants, Effective 2B (E2B) and Effective 4B (E4B), are also available, improving throughput while keeping power consumption low; they run on devices such as smartphones and the Raspberry Pi.
By adopting the Apache 2.0 license, Google says it is granting users "complete control over your data, infrastructure, and models."
Hugging Face co-founder and CEO Clement Delangue called the announcement "a huge milestone," and Google hopes Gemma 4 and its surrounding ecosystem, the "Gemmaverse," will see broad use in development.
An anonymous reader quotes a report from Ars Technica: Google's Gemini AI models have improved by leaps and bounds over the past year, but you can only use Gemini on Google's terms. The company's Gemma open-weight models have provided more freedom, but Gemma 3, which launched over a year ago, is getting a bit long in the tooth. Starting today, developers can begin working with Gemma 4, which comes in four sizes optimized for local use. Google has also acknowledged developer frustrations with AI licensing, so it's dumping the custom Gemma license.
Like past versions of its open-weight models, Google has designed Gemma 4 to be usable on local machines. That can mean plenty of things, of course. The two large Gemma variants, 26B Mixture of Experts and 31B Dense, are designed to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU. Granted, that's a $20,000 AI accelerator, but it's still local hardware. If quantized to run at lower precision, these big models will fit on consumer GPUs. Google also claims it has focused on reducing latency to really take advantage of Gemma's local processing. The 26B Mixture of Experts model activates only 3.8 billion of its 26 billion parameters in inference mode, giving it much higher tokens-per-second than similarly sized models. Meanwhile, 31B Dense is more about quality than speed, but Google expects developers to fine-tune it for specific uses.
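The hardware claims above can be checked with some back-of-the-envelope arithmetic: bfloat16 weights take 2 bytes per parameter, and 4-bit quantization takes roughly 0.5 bytes per parameter. The sketch below uses the parameter counts from the article; the byte-per-parameter figures are standard assumptions and ignore activation memory and KV cache, so the real footprint is somewhat larger.

```python
# Rough memory math for the Gemma 4 sizes described in the article.
# bfloat16 = 2 bytes/param; 4-bit quantization ~ 0.5 bytes/param.
# Weights only: activations and KV cache add further overhead.

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just for the model weights, in GB."""
    return params * bytes_per_param / 1e9

# 31B Dense, unquantized bfloat16: fits on a single 80 GB H100.
dense_bf16 = weight_memory_gb(31e9, 2.0)   # ~62 GB

# The same model quantized to 4 bits: within reach of a 24 GB consumer GPU.
dense_int4 = weight_memory_gb(31e9, 0.5)   # ~15.5 GB

# The 26B MoE activates only 3.8B of its 26B parameters per token,
# which is why it can produce more tokens per second than a dense
# model of similar total size.
active_fraction = 3.8e9 / 26e9             # ~14.6%

print(f"31B dense, bf16: {dense_bf16:.1f} GB")
print(f"31B dense, int4: {dense_int4:.1f} GB")
print(f"MoE active fraction: {active_fraction:.1%}")
```

This also explains why the 26B Mixture of Experts model is the speed-oriented option: per-token compute scales with the active parameters, not the total, while the 31B Dense model spends its full parameter budget on every token.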
The other two Gemma 4 models, Effective 2B (E2B) and Effective 4B (E4B), are aimed at mobile devices. These options were designed to maintain low memory usage during inference, running at an effective 2 billion or 4 billion parameters. Google says the Pixel team worked closely with Qualcomm and MediaTek to optimize these models for devices like smartphones, Raspberry Pi, and Jetson Nano. Not only do they use less memory and battery than Gemma 3, but Google also touts "near-zero latency" this time around.
The Apache 2.0 license is far more permissive about commercial use, "granting you complete control over your data, infrastructure, and models," says Google.
Clement Delangue, co-founder and CEO of Hugging Face, called it "a huge milestone" that will help developers use Gemma for more projects and expand what Google calls the "Gemmaverse."
Read more of this story at Slashdot.