AI Agent Controlling Browser Using Selenium with SmolAgents
Artificial Intelligence is rapidly transforming how we automate web tasks. In this tutorial, we will explore how to control a browser using Selenium through a Python AI agent created using SmolAgents. We'll also discuss how to handle import restrictions by allowing additional modules explicitly.
🚀 What Is SmolAgents?
As I described in my previous post, SmolAgents is a lightweight, open-source Python framework created by Hugging Face. It lets you build autonomous agents that can interact with external tools, models, and libraries such as selenium, duckduckgo, and more.
🧰 Prerequisites
- Python Installed (Recommended: 3.11 or 3.12)
- VS Code or any Python IDE
- Install SmolAgents:
python -m pip install "smolagents[openai]"
- Install Selenium & ChromeDriver Manager:
pip install selenium webdriver-manager
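After installing, a quick check like the sketch below can confirm that the packages are visible to the interpreter you plan to use. The helper name installed_versions is hypothetical (it is not part of SmolAgents or Selenium); it relies only on the standard-library importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as they appear in the pip commands above
for name, ver in installed_versions(
    ["smolagents", "selenium", "webdriver-manager"]
).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, the pip command probably ran against a different Python environment than the one your IDE is using.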
⚠️ Issue with Unauthorized Imports
By default, SmolAgents blocks the code an agent generates from importing some external libraries such as selenium. This is a safety measure. In our use case we can safely allow it by passing the additional_authorized_imports parameter when initializing the agent.
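To make the idea of an import allow-list concrete, here is a small illustrative sketch. It is not SmolAgents' actual implementation: the function is_import_allowed, the BASE_ALLOWED set, and the rule that a parent package authorizes its submodules are all assumptions made for this example. The real matching rules (exact names, prefixes, or wildcards) may differ between SmolAgents versions.

```python
def is_import_allowed(module_name, allowed):
    """Return True if the module or one of its parent packages is allow-listed.

    Illustrative rule only (an assumption, not the library's code):
    "selenium" in the allow-list also covers "selenium.webdriver.common.by".
    """
    parts = module_name.split(".")
    prefixes = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return bool(prefixes & set(allowed))


# Hypothetical default set, plus the extra modules we choose to trust
BASE_ALLOWED = {"math", "time", "re"}
extra_imports = {"selenium", "webdriver_manager.chrome"}
allowed = BASE_ALLOWED | extra_imports

for name in [
    "selenium.webdriver.common.by",
    "webdriver_manager.chrome",
    "webdriver_manager.firefox",
    "os",
]:
    status = "allowed" if is_import_allowed(name, allowed) else "blocked"
    print(f"{name}: {status}")
```

The point of the design is that anything not listed stays blocked, so you widen the list only for modules you are comfortable letting generated code use.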
💡 AI Agent Code Using Selenium
In the code below, we explicitly authorize the necessary Selenium imports and ask the agent to write code that opens our blog, https://softwaretesting-guideline.blogspot.com, and clicks on different posts in headful mode.
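# Note: some newer smolagents releases rename HfApiModel to InferenceClientModel
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),
    # Modules the agent's generated code is allowed to import
    additional_authorized_imports=[
        "selenium",
        "selenium.webdriver",
        "selenium.webdriver.common.keys",
        "webdriver_manager.chrome",
    ],
)

# The prompt is split across two adjacent string literals, which Python joins
agent.run(
    "write selenium code to test to run in headful mode for "
    "https://softwaretesting-guideline.blogspot.com/ and click on different posts"
)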
📝 What This Code Does
- Creates a CodeAgent with authorized imports
- Uses a search tool for reasoning
- Requests the AI to generate a Selenium script to interact with your blog
🧪 Sample Output
The agent typically returns code like the following:
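from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://softwaretesting-guideline.blogspot.com/")
time.sleep(5)

# Click on all post links
selector = "h3.post-title a"
post_count = len(driver.find_elements(By.CSS_SELECTOR, selector))
for index in range(post_count):
    # Re-find the links on every pass; old elements go stale after navigating
    link = driver.find_elements(By.CSS_SELECTOR, selector)[index]
    print("Opening post:", link.text)
    link.click()
    time.sleep(3)
    driver.back()
    time.sleep(2)

driver.quit()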
📌 Key Points
- Running in headful mode lets you visually see the browser actions
- You can modify the logic to open specific blog posts or interact with elements like labels, buttons, or images
- Use headless mode for automation pipelines by setting Chrome options accordingly
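As a rough sketch of the last point, the switch between headful and headless can be reduced to a single flag. The helper names chrome_arguments and build_driver and the window size are assumptions made for this example; the --headless=new argument applies to recent Chrome versions, and older ones may need plain --headless instead.

```python
def chrome_arguments(headless):
    """Return the Chrome command-line arguments for the chosen mode."""
    args = ["--window-size=1280,900"]  # arbitrary size chosen for illustration
    if headless:
        args.append("--headless=new")  # assumes a recent Chrome version
    return args


def build_driver(headless=False):
    """Create a Chrome driver; Selenium is imported lazily, only when called."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for argument in chrome_arguments(headless):
        options.add_argument(argument)
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


print("Headful arguments: ", chrome_arguments(headless=False))
print("Headless arguments:", chrome_arguments(headless=True))
```

Calling build_driver(headless=True) in a CI pipeline and build_driver() on your own machine keeps the rest of the test script identical in both modes.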
🔗 Resources
Learn AI Agents with SmolAgents in Python for Web Automation
Conclusion: Using AI agents to control browsers opens up exciting new possibilities for intelligent test automation and web scraping. With SmolAgents and Selenium, you can offload even complex test tasks to an autonomous Python agent.