Sunday, May 25, 2025

Browser Automation with Google Gemini Model to run UI Tests

🔧 Prerequisites

  • Python 3.11 or above (browser-use lists Python 3.11+ as its minimum)
  • Visual Studio Code or any Python IDE
  • Internet connection to download models and dependencies

📦 Installation Steps

1. Install browser-use

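pip install browser-use

The example in this post also imports smolagents, selenium, and webdriver_manager, so install those as well (the DuckDuckGo search tool may need an extra search package, depending on your smolagents version):

pip install smolagents selenium webdriver-manager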

GitHub: https://github.com/browser-use/browser-use

2. Install Chromium browser with Playwright

playwright install chromium --with-deps --no-shell

3. Add your API keys to a .env file

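OPENAI_API_KEY=your_openai_key_here

(The OpenAI key is only needed if you switch to an OpenAI model. The Gemma model used in this post is called through Hugging Face inference, which typically needs a Hugging Face access token, for example HF_TOKEN=your_hf_token_here. Gemma models on the Hub are also gated until you accept their license.)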
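Before moving on, it can help to confirm that the packages from the steps above are importable. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of any of these libraries, and the list assumes the import names `browser_use`, `playwright`, `smolagents`, and `selenium`.

```python
import importlib.util

# Assumption: these are the top-level import names of the packages
# this post relies on. Adjust the list to match your own setup.
REQUIRED = ["browser_use", "playwright", "smolagents", "selenium"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only checks that the modules can be located; it does not verify that the Chromium download from step 2 succeeded.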

🤖 Example: AI Agent Running a Selenium Script

This example uses Google's Gemma model (google/gemma-2b-it, the open-weights sibling of Gemini) through smolagents to generate a Selenium script that runs a test in headful mode and interacts with the blog.

from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel

# The agent writes and executes Python; every module it may import
# in its generated code has to be listed explicitly.
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=InferenceClientModel(model_id="google/gemma-2b-it"),
    additional_authorized_imports=[
        "selenium",
        "selenium.webdriver",
        "selenium.webdriver.common.by",
        "selenium.webdriver.common.keys",
        "selenium.webdriver.support.ui",
        "selenium.webdriver.support.expected_conditions",
        "webdriver_manager.chrome",
    ],
)

# The prompt is split across two adjacent string literals so it stays
# one valid Python string.
agent.run(
    "write selenium code to run a test in headful mode for "
    "https://softwaretesting-guideline.blogspot.com/ and click on different posts"
)

The resulting execution demonstrates how the AI agent parses the site's DOM, 
detects all hyperlink elements, and programmatically simulates user interactions 
by clicking on various blog post links.
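To make that behaviour concrete, here is a hand-written sketch of the kind of script the agent might produce. It is not the agent's actual output. It assumes Blogger post permalinks follow the /YYYY/MM/slug.html pattern, and `is_post_link` and `visit_posts` are hypothetical helper names. As a simplification, it opens each post by URL instead of clicking the link element, which avoids stale element references after navigation.

```python
import re
from urllib.parse import urlparse

BLOG_URL = "https://softwaretesting-guideline.blogspot.com/"

# Assumption: Blogger post permalinks look like /2025/05/some-title.html
POST_PATH = re.compile(r"^/\d{4}/\d{2}/[^/]+\.html$")


def is_post_link(href, blog_url=BLOG_URL):
    """Return True if href points at a post on the same blog."""
    if not href:
        return False
    parsed = urlparse(href)
    same_host = parsed.netloc == urlparse(blog_url).netloc
    return same_host and bool(POST_PATH.match(parsed.path))


def visit_posts(max_posts=3):
    """Open the blog in a visible Chrome window and visit a few posts."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # headful unless a headless option is set
    titles = []
    try:
        driver.get(BLOG_URL)
        # Wait until at least one anchor element is present in the DOM.
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, "a"))
        )
        hrefs = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
        # Keep post links only, dropping duplicates but preserving order.
        post_urls = list(dict.fromkeys(h for h in hrefs if is_post_link(h)))
        for url in post_urls[:max_posts]:
            driver.get(url)
            titles.append(driver.title)
    finally:
        driver.quit()
    return titles
```

Calling `visit_posts()` requires Chrome and Selenium to be installed; the link filter can be tried on its own without a browser.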

✅ Benefits

  • Less hand-written code: describe your intent and let the agent generate the script
  • Well suited to test engineers and QA automation work
  • Tools and models are swappable: CodeAgent takes a list of tools and a model object

⚠️ Tips

  • Make sure Playwright is installed correctly and compatible with your OS
  • If using OpenAI or Hugging Face models, store API keys securely, for example in a .env file that is kept out of version control
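One way to follow the API key tip is to read keys from the environment and fail early when one is missing. This is a minimal standard-library sketch; `require_env`, `mask`, and the `DEMO_API_KEY` variable are hypothetical names used for illustration. In a real project a package such as python-dotenv is commonly used to load the .env file into the environment first.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or your shell environment")
    return value


def mask(secret):
    """Show only the first four characters so logs never contain the full key."""
    return secret[:4] + "..." if len(secret) > 8 else "***"


# Demo with a placeholder value so the sketch runs without a real key.
os.environ.setdefault("DEMO_API_KEY", "your_demo_key_here")
print("DEMO_API_KEY loaded:", mask(require_env("DEMO_API_KEY")))
```

Failing at startup with a clear message is usually easier to debug than an authentication error raised halfway through a test run.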

🎯 Conclusion

By combining smolagents, a Google Gemma model, and Selenium, you can build browser-based test automation flows with little hand-written code. AI-powered agents are the future of QA automation!

Happy Testing 🚀
