Welcome to IT-Branschen – The Channel for IT News, Cybersecurity and Digital Trends

For Companies, Suppliers and Decision Makers in the IT Industry


Discover Gemini: Our Most Powerful and Advanced AI Model Yet


Every technological shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far greater than the transition to mobile or the web before it. AI has the potential to create opportunities—from the mundane to the extraordinary—for people everywhere. It will bring new waves of innovation and economic progress, driving knowledge, learning, creativity, and productivity on a scale never before seen.

That's what excites me: the chance to make AI useful for everyone, everywhere in the world.

Almost eight years into our journey as an AI-first company, the pace of progress is only accelerating: Millions of people are now using generative AI in our products to do things they couldn't even a year ago, from finding answers to more complex questions to using new tools to collaborate and create. At the same time, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing with our AI tools.

This is incredible speed, and yet we are only beginning to scratch the surface of what is possible.

We approach this work boldly and responsibly. That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working in partnership with governments and experts to manage risks as AI becomes more capable. And we continue to invest in the very best tools, foundation models and infrastructure and bring them to our products and to others, guided by our AI principles.

Now we’re taking the next step in our journey with Gemini, our most capable and versatile model yet, with state-of-the-art performance across many leading benchmarks. Our first release, Gemini 1.0, is optimized for different sizes: Ultra, Pro, and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we founded Google DeepMind earlier this year. This new era of models represents one of the biggest scientific and engineering efforts we’ve ever made as a company. I’m truly excited about what’s ahead, and for the possibilities Gemini will unlock for people everywhere.

Introducing Gemini

By Demis Hassabis, CEO and co-founder of Google DeepMind, on behalf of the Gemini team

AI has been the focus of my life's work, as it has been for many of my research colleagues. Ever since I programmed AI for computer games as a teenager, and during my years as a neuroscientist trying to understand how the brain works, I have always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.

This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. We have long wanted to build a new generation of AI models, inspired by how humans understand and interact with the world. AI that feels less like smart software and more like something useful and intuitive—an expert helper or assistant.

Today we are a step closer to this vision as we introduce Gemini, the most capable and general model we've ever built.

Gemini is the result of large-scale collaboration by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, meaning it can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video.

Introducing Gemini: our largest and most capable AI model

Gemini is also our most flexible model yet – efficiently running on everything from data centers to mobile devices. Its cutting-edge capabilities will significantly improve how developers and enterprise customers build and scale with AI.

We have optimized Gemini 1.0, our first version, for three different sizes:

  • Gemini Ultra — our largest and most capable model for highly complex tasks.
  • Gemini Pro — our best model for scaling across a wide range of tasks.
  • Gemini Nano — our most efficient model for on-device tasks.

State-of-the-art performance

We have rigorously tested our Gemini models and evaluated their performance on a variety of tasks. From natural image, audio, and video understanding to mathematical reasoning, Gemini Ultra's performance surpasses current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in research and development of large language models (LLMs).

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as mathematics, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving skills.

Our new benchmark approach to MMLU allows Gemini to use its reasoning abilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.
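As a rough illustration of "thinking more carefully before answering", the sketch below samples several reasoned answers and accepts the majority answer only when agreement is high enough, otherwise falling back to a single first-impression answer. This is in the spirit of a consensus-with-fallback scheme; the function name, the threshold value, and the hard-coded answer lists are assumptions for illustration, not details taken from this article or from Gemini's actual evaluation code.

```python
from collections import Counter


def route_answer(sampled_answers, greedy_answer, threshold=0.6):
    """Pick the majority of the sampled answers if its vote share meets
    the threshold; otherwise return the greedy (first-impression) answer.

    The threshold of 0.6 is a hypothetical value chosen for this sketch.
    """
    if not sampled_answers:
        return greedy_answer
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    if votes / len(sampled_answers) >= threshold:
        return answer
    return greedy_answer


# Strong agreement among samples: the majority answer is used.
confident = route_answer(["B", "B", "B", "C", "B"], greedy_answer="C")

# Weak agreement among samples: fall back to the first impression.
uncertain = route_answer(["A", "B", "C", "D", "B"], greedy_answer="C")

print(confident, uncertain)
```

In a real setup the sampled answers would come from repeated model calls with step-by-step reasoning, and the threshold would be tuned on held-out data rather than fixed by hand.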

Gemini surpasses state-of-the-art performance on a range of benchmarks including text and coding.

Gemini Ultra also achieves a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains that require deliberate reasoning.

On the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models without the help of optical character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini's native multimodality and indicate early signs of Gemini's more complex reasoning capabilities.

See more information in our technical report for Gemini.

Gemini surpasses state-of-the-art performance on a range of multimodal benchmarks.

Next-generation features

Until now, the standard approach to creating multimodal models has involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at performing certain tasks, such as describing images, but struggle with more conceptual and complex reasoning.

We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about any type of input from the ground up, far better than existing multimodal models – and its capabilities are state-of-the-art in almost every domain.

Learn more about Gemini's capabilities and see how it works.

Sophisticated reasoning

Gemini 1.0's sophisticated multimodal reasoning capabilities can help you understand complex written and visual information, making it uniquely adept at uncovering knowledge that can be difficult to discern among large amounts of data.

Its remarkable ability to extract insights from hundreds of thousands of documents by reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.

Gemini unlocks new scientific insights

Understand text, images, audio and more

Gemini 1.0 was trained to recognize and understand text, images, audio, and more all at once, so it can better understand nuanced information and answer questions about complex topics. This makes it particularly good at explaining reasoning in complex subjects like math and physics.

Gemini explains reasoning in mathematics and physics

Advanced coding

Our first version of Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, including Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world.

Gemini Ultra excels in several coding benchmarks, including HumanEval, an important industry standard for evaluating performance on coding tasks, and Natural2Code, our internal held-out dataset, which uses author-generated sources instead of web-based information.

Gemini can also be used as an engine for more advanced coding systems. Two years ago we introduced AlphaCode, the first AI code generation system to reach a competitive level of performance in programming competitions.

Using a specialized version of Gemini, we created a more advanced code generation system, AlphaCode 2, which excels at solving competitive programming problems that go beyond coding to involve complex mathematics and theoretical computer science.

Gemini excels at coding and competitive programming

When evaluated on the same platform as the original AlphaCode, AlphaCode 2 shows massive improvements, solving nearly twice as many problems, and we estimate that it outperforms 85% of competition participants, up from nearly 50% for AlphaCode. When programmers collaborate with AlphaCode 2 by defining certain properties for the code samples to follow, it performs even better.

We’re excited that programmers are increasingly using highly capable AI models as collaboration tools that can help them reason about problems, suggest code designs, and assist with implementation – so they can release apps and design better services, faster.

See more information in our AlphaCode 2 technical report.

More reliable, scalable and efficient

We trained Gemini 1.0 at scale on our AI-optimized infrastructure using Google's custom-designed Tensor Processing Units (TPUs) v4 and v5e. And we designed it to be our most reliable and scalable model to train, and our most efficient to operate.

On TPUs, Gemini runs significantly faster than earlier, smaller and less capable models. These specially designed AI accelerators have been at the heart of Google's AI-powered products that serve billions of users, like Search, YouTube, Gmail, Google Maps, Google Play and Android. They have also enabled companies around the world to train large-scale AI models cost-effectively.

Today we are announcing the most powerful, efficient and scalable TPU system to date, Cloud TPU v5p, designed to train cutting-edge AI models. This next-generation TPU will accelerate Gemini's development and help developers and enterprise customers train large-scale generative AI models faster, enabling new products and capabilities to reach customers sooner.

Built with responsibility and safety at the heart

At Google, we are committed to developing bold and responsible AI in everything we do. Building on Google’s AI principles and robust security policies in our products, we are adding new protections to account for Gemini's multimodal capabilities. At every stage of development, we consider potential risks and work to test and mitigate them.

Gemini has the most comprehensive safety evaluations of all Google AI models to date, including for bias and toxicity. We have conducted new research on potential risk areas such as cyber-offense, persuasion, and autonomy, and have applied Google Research's best-in-class adversarial testing techniques to help identify critical safety issues before Gemini's deployment.

To identify blind spots in our internal evaluation methodology, we work with a diverse group of external experts and partners to stress test our models across a range of issues.

To diagnose content safety issues during Gemini's training phases and ensure its output adheres to our policies, we use benchmarks such as Real Toxicity Prompts, a set of 100,000 prompts with varying degrees of toxicity taken from the web, developed by experts at the Allen Institute for AI. More information about this work will be coming soon.

To limit harm, we built dedicated safety classifiers to identify, label, and filter out content that involves violence or negative stereotypes, for example. Combined with robust filters, this layered approach is designed to make Gemini safer and more inclusive for everyone. Additionally, we continue to address known challenges for models such as factuality, grounding, attribution, and corroboration.

Responsibility and safety will always be central to the development and deployment of our models. This is a long-term commitment that requires collaboration, so we are working with the industry and the wider ecosystem to define best practices and set safety and security benchmarks through organizations such as MLCommons, the Frontier Model Forum and its AI Safety Fund, and our Secure AI Framework (SAIF), which was designed to help mitigate security risks specific to AI systems in the public and private sectors. We will continue to collaborate with researchers, governments, and civil society groups around the world as we develop Gemini.

Making Gemini available to the world

Gemini 1.0 is now rolling out across a range of products and platforms:

Gemini Pro in Google products

We are bringing Gemini to billions of people through Google products.

Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, comprehension, and more. This is the biggest upgrade to Bard since its launch. It will be available in English in more than 170 countries and territories, and we plan to expand to different modalities and support new languages and locations in the near future.

We are also bringing Gemini to Pixel. The Pixel 8 Pro is the first smartphone designed to run Gemini Nano, which powers new features like Summarize in the Recorder app and is rolling out in Smart Reply in Gboard, starting with WhatsApp, Line, and KakaoTalk, with more messaging apps coming next year.

In the coming months, Gemini will be available in more of our products and services such as Search, Ads, Chrome, and Duet AI.

We've already started experimenting with Gemini in Search, where it makes our Search Generative Experience (SGE) faster for users, with a 40% reduction in latency in US English, along with improvements in quality.

Building with Gemini

Starting December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.

Google AI Studio is a free, web-based developer tool for quickly prototyping and launching apps with an API key. When it's time for a fully managed AI platform, Vertex AI allows Gemini customization with full data control and leverages additional Google Cloud capabilities for enterprise security, safety, privacy, and data governance and compliance.
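To make the API-key workflow concrete, here is a minimal sketch that builds (but does not send) an HTTP request for a text prompt using only the Python standard library. The endpoint URL, the `gemini-pro` model identifier, and the JSON body shape are assumptions based on the public REST interface at the time of writing and are not stated in this article; `build_generate_request` is a hypothetical helper name. Check the current Gemini API documentation before relying on any of them.

```python
import json
import urllib.request

# Assumed base URL for the Gemini API; verify against the official docs.
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt, api_key, model="gemini-pro"):
    """Build a POST request asking the model to respond to a text prompt.

    The model name and payload layout are assumptions for this sketch.
    """
    url = f"{API_BASE}/{model}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; a real key would come from Google AI Studio.
request = build_generate_request(
    "Explain TPUs in one sentence.", api_key="YOUR_API_KEY"
)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON response; that step is left out here so the sketch runs without a network connection or a valid key.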

Android developers will also be able to build with Gemini Nano, our most efficient model for on-device tasks, via AICore, a new system feature available in Android 14, starting with Pixel 8 Pro devices. Sign up for an early preview of AICore.

Gemini Ultra coming soon

For Gemini Ultra, we are currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model using fine-tuning and reinforcement learning from human feedback (RLHF) before making it generally available.

As part of this process, we will make Gemini Ultra available to select customers, developers, partners, and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.

Early next year we will also launch Bard Advanced, a new, groundbreaking AI experience that gives you access to our best models and capabilities, starting with Gemini Ultra.

The Gemini Era: Enabling a Future of Innovation

This is a major milestone in the evolution of AI and the start of a new era for us at Google as we continue to rapidly innovate and responsibly evolve the capabilities of our models.

We've made great progress with Gemini so far, and we're working hard to further expand its capabilities for future releases, including advances in planning and memory, and increasing the context window to process even more information to provide better answers.

We are excited about the amazing possibilities of a world responsibly empowered by AI – a future of innovation that will boost creativity, expand knowledge, advance science, and transform how billions of people live and work around the world.
