Sunday, May 25, 2025

Browser Automation with Google Gemini Model to run UI Tests

Browser Automation Using Google Gemini to Run UI Tests

🔧 Prerequisites

  • Python 3.8 or above
  • Visual Studio Code or any Python IDE
  • Internet connection to download models and dependencies

📦 Installation Steps

1. Install browser-use

pip install browser-use

GitHub: https://github.com/browser-use/browser-use

2. Install Chromium browser with Playwright

playwright install chromium --with-deps --no-shell

3. Add your API keys to a .env file

OPENAI_API_KEY=your_openai_key_here

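(The Gemma model in the example below is called through Hugging Face rather than OpenAI. If Hugging Face asks for authentication, add HF_TOKEN=your_hf_token_here as well.)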
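
One way to make the keys in the .env file available to a Python script is to read the file at startup. The sketch below is a minimal, standard-library-only loader. The function name load_env_file is hypothetical and exists only for this illustration; in practice a package such as python-dotenv is a common choice for the same job.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=value lines from a .env file into os.environ.

    Hypothetical helper for illustration: it skips blank lines and
    comments, and never overwrites a variable that is already set.
    Returns a dict of the variables it actually added.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop optional quotes
        if key and key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Calling load_env_file() before creating the agent keeps the keys out of the source code, as long as the .env file itself stays out of version control.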

🤖 Example: AI Agent Running a Selenium Script

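This example uses a Google model to generate a Selenium script that opens the blog in headful mode and clicks through its posts. Note that the model ID in the code, google/gemma-2b-it, is Google's open Gemma model rather than Gemini itself. The script also relies on packages that the steps above do not install: pip install smolagents selenium webdriver-manager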

from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=InferenceClientModel(model_id="google/gemma-2b-it"),
    additional_authorized_imports=[
        "selenium",
        "selenium.webdriver",
        "selenium.webdriver.common.by",
        "selenium.webdriver.common.keys",
        "selenium.webdriver.support.ui",
        "selenium.webdriver.support.expected_conditions",
        "webdriver_manager.chrome"
    ]
)
agent.run(
    "write selenium code to test to run in headful mode for "
    "https://softwaretesting-guideline.blogspot.com/ and click on different posts"
)

The resulting execution demonstrates how the AI agent parses the site's DOM, 
detects all hyperlink elements, and programmatically simulates user interactions 
by clicking on various blog post links.
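
To make the link-detection step concrete, here is a small sketch that pulls hyperlink targets out of an HTML snippet using only Python's standard library. It is an illustration of the idea, not what the agent generates: the agent's Selenium script queries the live DOM instead, and the sample markup and the names LinkCollector and extract_links are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # ignore anchors that have no link target
            self.links.append(href)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Made-up markup that loosely resembles a blog post list
sample = """
<h3 class="post-title"><a href="/2025/05/first-post.html">First</a></h3>
<h3 class="post-title"><a href="/2025/05/second-post.html">Second</a></h3>
<a name="no-href">anchor without link</a>
"""
print(extract_links(sample))
# → ['/2025/05/first-post.html', '/2025/05/second-post.html']
```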

✅ Benefits

  • No manual code writing — just describe your intent
  • Perfect for test engineers and QA automation
  • Supports flexible tools and models

⚠️ Tips

  • Make sure Playwright is installed correctly and compatible with your OS
  • If using OpenAI or Hugging Face models, store API keys securely, for example in a .env file that is kept out of version control

🎯 Conclusion

By combining SmolAgents, Google's Gemma model, and Selenium, you can create browser-based test automation flows with very little hand-written code. AI-powered agents are the future of QA automation!

Happy Testing 🚀

Build an AI Agent with SmolAgents to Control Selenium Browser Automation

AI Agent Controlling Browser Using Selenium with SmolAgents

Artificial Intelligence is rapidly transforming how we automate web tasks. In this tutorial, we will explore how to control a browser using Selenium through a Python AI agent created using SmolAgents. We'll also discuss how to handle import restrictions by allowing additional modules explicitly.

🚀 What Is SmolAgents?

As I described in my previous post, SmolAgents is a lightweight, open-source Python framework created by Hugging Face. It lets you build autonomous agents that can interact with external tools, models, and libraries such as Selenium and DuckDuckGo search.

🧰 Prerequisites

  1. Python Installed (Recommended: 3.11 or 3.12) 
  2. VS Code or any Python IDE
  3. Install SmolAgents:
    python -m pip install "smolagents[openai]"
  4. Install Selenium & ChromeDriver Manager:
    pip install selenium webdriver-manager

⚠️ Issue with Unauthorized Imports

By default, SmolAgents blocks the code it generates from importing external libraries such as selenium. This is for safety reasons. In our use case we can allow these libraries explicitly by passing the additional_authorized_imports parameter when initializing the agent.
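
To see why an import allowlist is useful, consider how generated code could be screened before it runs. The following is a simplified illustration built on Python's ast module; it is not SmolAgents' actual implementation. The function name find_unauthorized_imports and the matching rule (a module passes if it, or one of its parent packages, is listed) are assumptions made for this example.

```python
import ast


def find_unauthorized_imports(source, allowed):
    """Return the sorted module names imported by `source` that are not allowed."""

    def is_allowed(module):
        parts = module.split(".")
        # Assumed rule: "selenium.webdriver.common.by" passes if "selenium",
        # "selenium.webdriver", ... or the full name is in the allowlist.
        return any(".".join(parts[:i]) in allowed for i in range(1, len(parts) + 1))

    blocked = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        blocked.update(name for name in names if not is_allowed(name))
    return sorted(blocked)


generated = "import time\nfrom selenium import webdriver\nimport os\n"
print(find_unauthorized_imports(generated, {"time", "selenium"}))
# → ['os']
```

A check of this kind stops model-written code from reaching modules like os unless they have been allowed on purpose.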

💡 AI Agent Code Using Selenium

In the below code, we explicitly authorize the necessary Selenium imports and ask the agent to write code to open our blog https://softwaretesting-guideline.blogspot.com and click on different posts in headful mode.

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),
    additional_authorized_imports=[
        "selenium",
        "selenium.webdriver",
        "selenium.webdriver.common.keys",
        "webdriver_manager.chrome"
    ]
)

agent.run(
    "write selenium code to test to run in headful mode for "
    "https://softwaretesting-guideline.blogspot.com/ and click on different posts"
)

📝 What This Code Does

  • Creates a CodeAgent with authorized imports
  • Gives the agent a DuckDuckGo search tool for looking up information while it works
  • Requests the AI to generate a Selenium script to interact with your blog

🧪 Sample Output

The agent typically returns code like the following:

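from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://softwaretesting-guideline.blogspot.com/")
time.sleep(5)  # crude wait for the page to finish loading

# Click on all post links
post_selector = "h3.post-title a"
post_count = len(driver.find_elements(By.CSS_SELECTOR, post_selector))
for index in range(post_count):
    # Re-locate the links on every pass: old references go stale after navigating
    link = driver.find_elements(By.CSS_SELECTOR, post_selector)[index]
    print("Opening post:", link.text)
    link.click()
    time.sleep(3)
    driver.back()
    time.sleep(2)

driver.quit()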

📌 Key Points

  • Running in headful mode lets you visually see the browser actions
  • You can modify the logic to open specific blog posts or interact with elements like labels, buttons, or images
  • Use headless mode for automation pipelines by setting Chrome options accordingly
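
The switch between headful and headless mode mentioned above can be sketched as follows. This assumes a recent Selenium 4 release (which can locate a matching driver on its own) and a recent Chrome that understands the --headless=new flag. The helper names chrome_arguments and build_driver and the window size are illustrative choices, not part of any library.

```python
from typing import List


def chrome_arguments(headless: bool) -> List[str]:
    """Return the Chrome command-line flags for the requested mode."""
    args = ["--window-size=1280,900"]  # arbitrary size chosen for the example
    if headless:
        args.append("--headless=new")  # no visible browser window
    return args


def build_driver(headless: bool = False):
    """Create a Chrome driver; headful by default so you can watch the run."""
    # Imported here so chrome_arguments() works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

On a developer machine you would call build_driver() to watch the test, and build_driver(headless=True) in a CI pipeline where no display is available.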

🔗 Resources

Conclusion: Using AI agents to control browsers opens up exciting new possibilities for intelligent test automation and web scraping. With SmolAgents and Selenium, you can offload even complex test tasks to an autonomous Python agent.

Related post: Learn AI Agents with SmolAgents in Python for Web Automation

How to Use AI Agents with SmolAgents in Python for Web Automation and Search Tasks

Build a Python AI Agent to Control Browser Using SmolAgents

In this tutorial, we'll show how to build an AI agent in Python that uses SmolAgents to perform intelligent tasks like searching the web or controlling browsers using automation tools. We'll also walk you through a working example that uses DuckDuckGo to fetch a coupon code.

🧰 Prerequisites

  1. Install Python (3.11 or 3.12):
    Download from python.org and make sure to check the box "Add Python to PATH" before clicking "Install Now".
  2. Install Visual Studio Code (VS Code):
    Download and install from https://code.visualstudio.com/.
  3. Install SmolAgents Library:
    Open your terminal or command prompt and run:
    python -m pip install "smolagents[openai]"
  4. Optional - Install Browser Automation Tools:
    For browser interaction using Playwright:
    pip install browser-use
    And then install Playwright browsers:
    python -m playwright install
    Install an MCP server as well if your browser automation setup requires one.

💡 What Are SmolAgents?

SmolAgents is a lightweight, open-source Python library created by Hugging Face that allows you to build autonomous AI agents with tools such as search engines, code execution, and browser automation.

🧪 Sample Code: Use SmolAgents to Get a Coupon Code

This example sets up a basic agent with a search tool to look for a coupon code on the website akrabtravel.com.

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel()
)
agent.run("Get me the coupon code for flight booking on akrabtravel.com")

🔍 Explanation:

  • CodeAgent: This is the main agent type that coordinates tools and models to perform a task.
  • DuckDuckGoSearchTool: A tool that uses the DuckDuckGo search engine to fetch information from the internet.
  • HfApiModel: This uses Hugging Face's hosted models as the reasoning engine for the agent. Newer SmolAgents releases expose the same class under the name InferenceClientModel.
  • agent.run(): Executes the instruction provided and returns the result.

📌 Output

When run, the agent will intelligently search for available coupon codes related to the instruction provided and return a summary. This process is entirely autonomous and powered by AI.

🚀 Next Steps

  • Try integrating browser automation tools like browser-use or Playwright.
  • Connect with APIs to fetch and interact with real-time content.
  • Build more complex multi-step agents using SmolAgents tools like memory or planner agents.

🔗 Resources

Conclusion: AI agents are changing the way we interact with data, browsers, and the web. SmolAgents provides a minimal, open-source approach for building smart tools with Python. Explore it further and build your own personal AI assistant today!

Also read: Understanding AI Agents: Benefits, Use Cases, and Limitations

Friday, May 23, 2025

Understanding AI Agents: Benefits, Use Cases, and Limitations

What is an AI Agent? Benefits and Drawbacks Explained

In the rapidly evolving world of Artificial Intelligence (AI), one term you might often hear is “AI agent.” But what exactly does it mean? How are these agents revolutionizing industries, and what are the potential downsides? Let’s break it down in this beginner-friendly guide.

🔍 What is an AI Agent?

An AI agent is an autonomous entity that perceives its environment through sensors and acts upon that environment using actuators to achieve specific goals. It uses algorithms—often machine learning or logic-based—to decide what actions to take.

In simpler terms, an AI agent is like a digital assistant that can observe, learn, and take action—without needing constant human input.

📌 Types of AI Agents

  • Simple Reflex Agents: Act solely on the current percept.
  • Model-based Reflex Agents: Maintain an internal state built up from past percepts.
  • Goal-based Agents: Take actions to achieve specific goals.
  • Utility-based Agents: Choose actions that maximize a utility function.
  • Learning Agents: Improve performance over time based on experience.
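
The difference between the first two agent types is easier to see in code. The sketch below uses a toy two-room cleaning world invented for this example; the class names, the percept values ("dirty" / "clean"), and the actions are all illustrative assumptions rather than a standard API.

```python
LOCATIONS = ("A", "B")  # toy world with two rooms


class SimpleReflexAgent:
    """Chooses an action from the current percept only."""

    def act(self, location, status):
        return "clean" if status == "dirty" else "move"


class ModelBasedReflexAgent:
    """Also remembers which rooms it already knows to be clean."""

    def __init__(self):
        self.known_clean = set()

    def act(self, location, status):
        # Either the room is clean already or it will be after this action
        self.known_clean.add(location)
        if status == "dirty":
            return "clean"
        # Internal state lets the agent stop once every room is done
        return "idle" if self.known_clean >= set(LOCATIONS) else "move"


percepts = [("A", "dirty"), ("A", "clean"), ("B", "clean")]

reflex = SimpleReflexAgent()
print([reflex.act(loc, status) for loc, status in percepts])
# → ['clean', 'move', 'move']

model_based = ModelBasedReflexAgent()
print([model_based.act(loc, status) for loc, status in percepts])
# → ['clean', 'move', 'idle']
```

The simple reflex agent keeps moving forever because it has no memory, while the model-based agent uses its internal state to recognize that the job is finished.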

💡 Real-World Examples of AI Agents

  • 🤖 Chatbots – Like ChatGPT, they interact conversationally with users.
  • 🚗 Autonomous Vehicles – Make real-time driving decisions.
  • 📈 Recommendation Engines – Suggest products on Amazon or Netflix.
  • 📧 Email Filters – Automatically classify spam and priority messages.

✅ Benefits of AI Agents

  • Automation: Automate repetitive tasks, saving time and money.
  • Consistency: Perform tasks reliably without fatigue or distraction.
  • Scalability: Handle large volumes of data and interactions.
  • Real-Time Decision Making: React quickly to environmental changes.
  • Learning Ability: Improve performance with more data over time.

⚠️ Drawbacks of AI Agents

  • Bias: AI agents may inherit bias from training data.
  • Black-Box Decisions: Hard to understand how decisions are made.
  • Dependence: Over-reliance can reduce human skills.
  • Security Risks: Vulnerable to adversarial attacks.
  • Cost: Training AI agents can be expensive and complex.

🚀 Final Thoughts

AI agents are playing an increasingly crucial role in automating tasks, making intelligent decisions, and enhancing user experiences across industries. While they offer significant advantages, it’s important to stay aware of their limitations and ethical implications.

As technology evolves, so too will the capabilities—and responsibilities—of these intelligent agents.

Want more blogs like this? Explore our other articles on AI, software testing, and automation at Software Testing Guideline.