Thursday, January 23, 2025

WebDriver BiDi offers the best of both worlds in browser automation tools

Anyone testing web applications ought to be aware of a new browser automation protocol called WebDriver BiDi. This new protocol is an evolution of the original WebDriver standard, and it incorporates some of the benefits of various other automation tools, most notably bidirectional communication.

“It’s a brand new protocol, and it’s taking all the best ideas that have been out there for a while and trying to standardize it through the W3C,” said David Burns, head of open source at BrowserStack (a browser testing company that is part of the WebDriver BiDi working group) and chair of the Browser Testing and Tools Working Group at W3C, which is the group responsible for the WebDriver and WebDriver BiDi specifications.

The original WebDriver protocol, or WebDriver Classic, is a “remote control interface that enables introspection and control of user agents,” according to its W3C definition. Essentially, it provides a way to remotely control the behavior of web browsers so that applications can be tested in them. 

However, this protocol only offers one-way communication, meaning that the client sends a request to the server, and the server can reply only to that one request, explained Puja Jagani, team lead at BrowserStack and a key code committer for the WebDriver BiDi project.

“The server cannot initiate communication with the client but can only respond. So if something of interest happens in the browsers it cannot communicate back to the client unless the client asks for it,” explained Jagani.

The BiDi in WebDriver BiDi stands for bidirectional communication, meaning that it actually allows events in the browser to stream back to the controlling software.

According to Jagani, because browsers are event-driven, it’s helpful for the browser to be able to share events back to the client when something interesting happens. 

For instance, with this new protocol, users can subscribe to the events created when the browser sends a network request or receives a response, which enables them to monitor (or modify) all outgoing requests and incoming responses.
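As a rough illustration of what subscribing looks like at the message level, the sketch below builds a `session.subscribe` command for the `network.beforeRequestSent` event and routes incoming messages either to the command that triggered them or to event listeners. It is a minimal sketch, not any tool's real client: the helper names (`make_command`, `route`), the example URL, and the simulated traffic are all assumptions made for illustration, and no browser or WebSocket connection is involved.

```python
import json
from itertools import count

_ids = count(1)  # every command needs a unique id so its reply can be matched

def make_command(method, params):
    """Build a BiDi-style command message (hypothetical helper)."""
    return {"id": next(_ids), "method": method, "params": params}

def route(raw, pending, listeners):
    """Route one incoming JSON message.

    Events are pushed by the browser on its own initiative; anything else
    is treated as the reply to a command the client sent earlier.
    """
    msg = json.loads(raw)
    if msg.get("type") == "event":
        for callback in listeners.get(msg["method"], []):
            callback(msg["params"])
        return "event"
    pending.pop(msg.get("id"), None)  # reply received, command no longer pending
    return "response"

# The client asks to be told about every outgoing network request.
subscribe = make_command("session.subscribe",
                         {"events": ["network.beforeRequestSent"]})
pending = {subscribe["id"]: subscribe}

seen_urls = []
listeners = {
    "network.beforeRequestSent": [lambda p: seen_urls.append(p["request"]["url"])]
}

# Simulated traffic from the browser (assumed payloads, heavily trimmed).
incoming = [
    json.dumps({"type": "success", "id": subscribe["id"], "result": {}}),
    json.dumps({"type": "event", "method": "network.beforeRequestSent",
                "params": {"request": {"request": "req-1",
                                       "url": "https://example.com/api/items"}}}),
]
kinds = [route(raw, pending, listeners) for raw in incoming]
print(kinds, seen_urls)
```

The second simulated message is the part a request-and-response protocol cannot deliver: it arrives without the client having asked for it at that moment.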

An example of this in action involves an application that is pointing to a production database in the cloud. When testing that application, WebDriver BiDi could be used to modify outgoing requests to point to a test database so that the production database isn’t flooded with test data.
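The database scenario could be sketched roughly as follows, assuming the client registers an intercept with `network.addIntercept` and then answers each blocked request with `network.continueRequest` carrying a rewritten URL. The host names, helper functions, and command ids are hypothetical, and the code only builds the messages; sending them to a browser is left out.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts, chosen only for this illustration.
PROD_HOST = "db.prod.example.com"
TEST_HOST = "db.test.example.com"

def add_intercept_command(command_id):
    """Ask the browser to pause requests to the production host before sending."""
    return {
        "id": command_id,
        "method": "network.addIntercept",
        "params": {
            "phases": ["beforeRequestSent"],
            "urlPatterns": [{"type": "pattern", "hostname": PROD_HOST}],
        },
    }

def continue_request_command(command_id, event_params):
    """Resume a paused request, pointing it at the test host if needed.

    `event_params` is the `params` object of a network.beforeRequestSent
    event. This sketch ignores any user:password@ part of the URL.
    """
    request = event_params["request"]
    parts = urlsplit(request["url"])
    if parts.hostname == PROD_HOST:
        netloc = TEST_HOST if parts.port is None else f"{TEST_HOST}:{parts.port}"
        new_url = urlunsplit(parts._replace(netloc=netloc))
    else:
        new_url = request["url"]
    return {
        "id": command_id,
        "method": "network.continueRequest",
        "params": {"request": request["request"], "url": new_url},
    }

# Simulated event for a request that the intercept has paused.
event = {"isBlocked": True,
         "request": {"request": "req-7",
                     "url": "https://db.prod.example.com/v1/orders?limit=5"}}
rewrite = continue_request_command(2, event)
print(rewrite["params"]["url"])
```

Keeping the rewrite in a small pure function like this makes the redirect rule easy to check on its own, separately from whichever client library actually talks to the browser.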

“This is only possible with bidirectional communication. It is not possible without the W3C BiDi protocol,” said Jagani.

CDP vs WebDriver

The Chrome DevTools Protocol (CDP) and WebDriver Classic have often been compared because both are low-level tools, meaning tools that execute remote commands outside of the browser, such as opening multiple tabs or simulating device mode, as Jecelyn Yeen, senior developer relations engineer for Chrome, and Maksim Sadym, software engineer at Google, explained in a blog post.

High-level tools, by contrast, are those that execute commands within the browser. Examples of these include Puppeteer, Cypress, and TestCafe.

CDP does enable bidirectional communication, but its usefulness for testing is limited because it works only in Chromium-based browsers, such as Google Chrome, and not in Firefox or Safari. According to Yeen and Sadym, “WebDriver BiDi aims to combine the best aspects of WebDriver ‘Classic’ and CDP.”

However, BrowserStack’s Burns emphasized that this new protocol isn’t intended to replace CDP, but rather it’s a new testing and automation protocol entirely. “CDP is always going to be there on Chromium browsers,” he said.

It already has browser support 

CDP’s creator, Google, is heavily involved in developing and supporting WebDriver BiDi, as is Mozilla. “We are glad that Mozilla and Google have come and helped us get it to that point where it’s standardized and now everyone can benefit from it,” Burns said. He added that Apple isn’t quite there yet, and it’s not clear at the moment when support for WebDriver BiDi will be available in WebKit-based browsers. 

“Sometimes standards can move at a glacial pace, and part of that is for good reason. It involves creating the collaboration points and getting consensus — and sometimes consensus can be really hard, especially where Google, Mozilla, and Apple, they have their own ideas of what makes something better, and so getting that can be really, really slow to implement,” Burns explained. 

Testing automation tools and testing companies have also started supporting it

In addition to the browsers needing to support it, another piece of the puzzle is getting the test automation tools and testing providers on board. Fortunately, the automation tools Selenium and WebdriverIO, as well as the testing companies BrowserStack, Sauce Labs, and LambdaTest, are all part of the WebDriver BiDi Working Group.

WebdriverIO and Selenium already have some support for the new protocol, and BrowserStack supports it too. Selenium itself is also updating its entire implementation from WebDriver to WebDriver BiDi. Burns explained that retrofitting the classic version of WebDriver to BiDi is the last major piece of the process, and is expected to be complete within the next year. 

“It’s a volunteer-driven project, so this happens when everyone’s bandwidth and time matches, so it gets done in like spurts or chugs of work, right? But I think that’s how it is for open source development in general,” said Jagani, who is also a member of the Selenium Technical Leadership Committee.

She noted that by Selenium 5 (the current version is 4.24), the goal is to have at least the high-level APIs done, which cover a number of use cases, like giving the user the ability to listen to console logs and the ability to do basic authentication for their website, to name a couple.

Once Selenium 5 is out, the next goal will be to start transitioning commands one by one from WebDriver Classic to WebDriver BiDi. “Hopefully, by Selenium 6, we are BiDi only,” she said. She went on to explain that it’s a long process with many external variables. Browsers are still in the process of implementing the protocol, and only once BiDi is in the stable version of a browser can Selenium start implementing it. After that, there’s still a period where users will need to use it and give feedback so that Selenium can ensure its implementation is resilient.

Jagani said that the user experience should remain the same once Selenium is switched over to BiDi, and there won’t be a big breaking change. 

“That’s what Selenium tries to do — even from Selenium 3 to 4 — we try to make sure it’s a seamless integration with minimal breaking changes,” she said. “Selenium is very big on backwards compatibility as much as possible, or at least ensuring that we’re deprecating things as required so you know we are going to be removing it and giving sufficient warnings. That experience for users using WebDriver Classic would remain the same, because eventually it’ll be the same APIs, just using BiDi under the hood.”

To take advantage of the new advanced capabilities that BiDi brings, there will be newer APIs available, which will be similar to the ones users are already familiar with. 
