Brief Summary
This video evaluates the capabilities of Gemini 3 Pro in a large codebase, comparing it with other models such as Sonnet 4.5 and GPT-5 Codex. The tests cover feature implementation, bug fixing, and UI creation, assessing code quality, speed, and cost-effectiveness. Gemini 3 Pro shows promise but still requires human intervention for optimal results.
- Gemini 3 Pro is evaluated for real-world coding tasks.
- Comparison with Sonnet 4.5 and GPT-5 Codex.
- Feature implementation, bug fixing, and UI creation are tested.
- Cost-effectiveness and speed are key metrics.
- Human intervention is still necessary for optimal results.
Introduction & Objective of the Video
The video aims to assess the true capabilities of Gemini 3 Pro within a large codebase, focusing on its utility for developers in enterprise environments rather than simple landing page creation. The presenter contrasts the hype around Gemini's capabilities, often showcased through impressive but basic examples on Twitter, with the practical demands of complex application development. The goal is to determine if Gemini 3 Pro can handle feature implementation, coding tasks, and overall developer responsibilities effectively.
Presentation of the Challenge and Prompts
The initial challenge involves enhancing an email editor within the Codeline application. The presenter wants to add a configuration feature that lets users modify the layout of emails, such as changing background colors or adding card effects. A detailed prompt instructs the AI to add a "mail config" parameter that accepts a JSON input to modify specific email parameters. The AI is tasked with creating an editor in the settings/mail config section, featuring a form on the left and a preview on the right. The prompt also specifies that the AI may modify Prisma files but should avoid certain actions. Three models, Gemini 3 Pro, Sonnet 4.5, and GPT-5 Codex, are given the same prompt with a workflow that includes code analysis, detailed planning, feature coding, and running linters to correct errors.
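To make the task concrete, here is a minimal sketch of what such a JSON mail config and its application could look like. The field names (backgroundColor, borderRadius, cardEffect) are assumptions for illustration, not the actual schema used in the video:

```typescript
// Hypothetical shape of the "mail config" JSON described in the prompt.
// All field names are assumptions, not the schema the models produced.
interface MailConfig {
  backgroundColor?: string;
  cardBackgroundColor?: string;
  borderRadius?: number; // in pixels
  cardEffect?: boolean;
}

const DEFAULT_MAIL_CONFIG: Required<MailConfig> = {
  backgroundColor: "#ffffff",
  cardBackgroundColor: "#f5f5f5",
  borderRadius: 8,
  cardEffect: false,
};

// Merge a user-supplied (possibly partial or malformed) JSON string over
// the defaults, mirroring how a stored Prisma field could be applied at
// email render time.
function resolveMailConfig(raw: string | null): Required<MailConfig> {
  if (!raw) return DEFAULT_MAIL_CONFIG;
  try {
    const parsed = JSON.parse(raw) as MailConfig;
    return { ...DEFAULT_MAIL_CONFIG, ...parsed };
  } catch {
    return DEFAULT_MAIL_CONFIG; // fall back to defaults on invalid JSON
  }
}
```

Merging over defaults keeps the stored config partial: the form only needs to persist the fields the user actually changed.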
Launching the Models and Analysis
The three models, Gemini 3 Pro, Sonnet 4.5, and GPT-5 Codex, begin processing the prompt. Gemini 3 Pro starts by exploring the codebase, focusing on the email rendering logic to understand how to implement the requested changes. It identifies the necessary parameters and creates a to-do list. Sonnet 4.5 progresses similarly, while GPT-5 Codex is notably slower. Gemini 3 Pro attempts to update the Prisma configuration but runs into issues with the environment setup, leading to errors. The presenter intervenes to steer Gemini 3 Pro away from these initial errors, allowing it to focus on the primary feature implementation.
Initial Results and Tweets
Sonnet 4.5 bypasses the Prisma migration and proceeds directly to coding, while Gemini 3 Pro initially struggles with the environment configuration. The presenter shares tweets discussing Gemini's performance, noting that while Gemini excels in general benchmarks, it lags behind Claude in coding tasks. Cost comparisons show that Gemini 3 Pro is more expensive than GPT-5 for both input and output tokens. The presenter also expresses skepticism toward OpenAI, aligning with a tweet criticizing their approach.
First Application of the Changes
Gemini 3 Pro modifies 12 files with 365 additions and 5 deletions, while Sonnet 4.5 modifies more files with 706 additions and 49 deletions. The presenter applies Gemini's changes and manually runs Prisma Migrate to add the new field. Gemini 3 Pro successfully adds the mail config, modifies the email rendering logic, and creates the configuration form. However, it fails to use the existing layout components, which is noted as an error. The presenter observes that Gemini only modified the test action, indicating a potential issue with the broader application of the changes.
UI Test: Preview and Live Bug
The presenter tests the implemented feature in the UI. The email configuration appears in the settings, displaying a preview of the email. The border radius and colors can be changed, but the live update of the preview does not work, and the changes are not saved upon refreshing the page. This indicates a problem with the form's functionality. The presenter decides to report the issue to Gemini to see if it can correct its own errors, which is considered a critical test of its intelligence.
Fixes and Code Conflicts
Gemini provides an update to address the form issue. After the changes are reapplied, the form now saves the modifications across refreshes, although the live preview still doesn't work. The presenter notes that the email itself updates correctly. However, the presenter identifies that Gemini has not updated all the necessary files, meaning the test email works but regular email sending would not. The presenter rates Gemini's performance 7/10, noting that human intervention is still needed to resolve the remaining issues. The presenter then undoes Gemini's changes to apply Sonnet 4.5's code, leading to merge conflicts.
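The failure mode described here, where the test email honors the config but regular sends do not, typically comes from having two separate render paths. A minimal sketch of the usual fix, with all names hypothetical, is to route every send through one shared renderer:

```typescript
// Hypothetical illustration of the divergence described above: if only the
// test-send path applies the mail config, regular sends silently ignore it.
// Routing both entry points through one shared renderer avoids the split.
type MailConfig = { backgroundColor: string; borderRadius: number };

function renderEmailHtml(body: string, config: MailConfig): string {
  const style =
    `background-color:${config.backgroundColor};` +
    `border-radius:${config.borderRadius}px;`;
  return `<div style="${style}">${body}</div>`;
}

// Both entry points delegate to the same renderer, so a change made in the
// settings form affects test sends and real sends identically.
function sendTestEmail(body: string, config: MailConfig): string {
  return renderEmailHtml(body, config);
}

function sendRegularEmail(body: string, config: MailConfig): string {
  return renderEmailHtml(body, config);
}
```

With a single renderer, the kind of partial update Gemini produced would be impossible to miss: either every send path picks up the config, or none does.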
Hallucinations and Critical Errors
After resolving the merge conflicts and applying Sonnet 4.5's code, the presenter encounters a critical error: the mail config page returns a 404. It turns out that Sonnet 4.5's code crashes the page entirely due to a React error related to form watching. Additionally, Sonnet 4.5 incorrectly uses org action, which is a fundamental error. The presenter sends the React error to Sonnet 4.5 for correction. Sonnet 4.5 attempts a fix by adding useCallback, but this doesn't resolve the underlying problems. The presenter expresses frustration, noting that Sonnet 4.5 has hallucinated an export that doesn't exist and failed to analyze the code properly.
Adjustments and Email Rendering
Despite the initial failures, Sonnet 4.5 eventually delivers a functional color editor with live updates. The presenter can now select colors and see the changes in real time. However, the border radius setting still doesn't work in the preview. After saving the changes and sending a test email, the presenter confirms that the border radius works correctly in the actual email. The presenter also experiments with different color combinations to improve the email's aesthetics. The presenter asks Sonnet 4.5 to correct the border radius issue, and after another update, the border radius functionality is restored.
Preparing the Centered-Image Test
With the email configuration feature mostly functional, the presenter moves on to a simpler bug-fixing task: centering an image in the email editor. This test assesses how quickly and cleanly the models can correct a straightforward UI issue. The presenter uses Gemini 3 Pro and Sonnet 4.5 for this task, sidelining Codex due to its poor performance in the previous tests.
Debugging: Centered Image (Test)
The presenter initiates a new test to address an image alignment issue in the email editor. The image is not centered, and the goal is to see which model can fix this bug quickly and effectively. Sonnet 4.5 completes the task first, followed by Gemini. However, after applying Sonnet 4.5's suggested fix, the image remains uncentered, marking it as a failure. Gemini's code is applied next, but it also fails to resolve the issue. The presenter identifies that a global CSS rule is causing a margin issue.
Suggested Colors Test and Speed
The presenter initiates a test to add suggested color options to the color picker in the email editor. Gemini 3 Pro and Sonnet 4.5 are compared on speed and UI implementation. Sonnet 4.5 completes the task in 0:47, while Gemini finishes in 0:50. Both models successfully add the suggested colors, but the presenter prefers Sonnet 4.5's UI.
Creating the About Page
The presenter tests the creative capabilities of Gemini and Sonnet by tasking them with creating an "About" page for the application. The models are instructed to use information about the founder and follow the existing UI style. Gemini and Sonnet both complete the task in approximately one minute and 28 seconds. Gemini's implementation includes a duplicated footer, which is noted as an error, but otherwise reuses existing components effectively. Sonnet's attempt does not produce a functional "About" page and includes nonsensical content along with a duplicated footer.
Costs and Usage Summary
The presenter reviews the costs of using the different models. Claude 4.5 Sonnet cost $4.12, GPT-5.1 cost $2, and Gemini 3 Pro Preview cost $2.44. Gemini is noted to be significantly cheaper than Claude Sonnet. The presenter concludes that Gemini is a compelling alternative to GPT, and potentially to Sonnet, especially considering its cost-effectiveness.
Conclusion and Recommendations
The presenter summarizes the performance of Gemini 3 Pro and Sonnet 4.5 across the various tasks. Gemini shows promise and is cost-effective, but requires human intervention for optimal results. The presenter emphasizes that the choice of tool depends on the specific task and coding environment.