Software Testing: What Generative AI Can—and Can't—Do

9 Apr 2024

Is ChatGPT coming to take automation engineers' jobs? The short answer is "maybe, but probably not."
The long answer involves understanding exactly what ChatGPT and other generative-AI tools can and can't do well within the context of software testing and test automation. As is the case in so many other domains, ChatGPT is capable of amazing things that may feel like a threat to traditional engineering jobs. But there are also many things it can't do—or, at least, do well. Automation engineers who want to use generative AI to do their job better—instead of being replaced by AI—must understand the difference.

To provide guidance, this article walks through the capabilities and limitations of ChatGPT as a tool for accelerating test automation workflows, highlighting where it shines and where it needs humans with domain-specific expertise to step in.

AI and Test Automation, Pre-ChatGPT

The idea of applying AI to software testing is not new. For years, various software testing and deployment platforms have offered AI-powered "bots" (Robo on Google Firebase is one popular example) that can navigate through applications, automatically decide what to test, and then run the tests.
These bots can help automation engineers to define test cases and execute tests. But they have clear shortcomings, such as trouble working past authentication walls within apps and a tendency to get stuck in infinite loops when performing testing.
For that reason, automation engineers have historically thought of AI as a tool that can reduce some of the toil and tedium required to carry out basic tests—but not as anything approaching a replacement for human testing experts.

How Good Is ChatGPT at Software Test Automation?

Generative-AI tools such as ChatGPT represent a new breed of solutions that offer much greater flexibility than traditional AI tools. They are also easy to work with using natural-language queries, making them more convenient than many conventional solutions are.
Does that mean that ChatGPT can excel where traditional software-testing tools fall short? To find out, let's walk through a real-world scenario that you might face as an automation engineer and see how well ChatGPT can handle it.
Imagine that you want to test the UI for a web application. If it's a browser-based app, you could use ChatGPT to help generate the automation script with a prompt such as the following:

Generate a Selenium script that visits kobiton.com, chooses the "Platform" menu option, and then clicks on "Real Device Testing."

ChatGPT responds with a script such as this one, which I got from it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set up Chrome webdriver and navigate to kobiton.com
driver = webdriver.Chrome()
driver.get("https://kobiton.com/")

# Wait for the "Platform" menu option to be clickable and click on it
platform_menu = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//a[text()='Platform']"))
)
platform_menu.click()

# Wait for the "Real Device Testing" option to be clickable and click on it
real_device_testing = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//a[text()='Real Device Testing']"))
)
real_device_testing.click()

# Close the browser
driver.quit()

This script comes close to testing what the prompt asked for, but there are some issues. The biggest is that "Platform" is not actually a clickable element on kobiton.com; instead, you have to hover over it to make the "Real Device Testing" link available. ChatGPT presumably failed to detect this fact because it parsed kobiton.com in a simplistic way.

A human automation engineer would recognize this issue pretty easily, though, and could modify the Selenium script as needed. So, in this case, generative AI is capable of doing perhaps 80% of the work necessary to solve a test-automation challenge; an automation engineer with domain-specific knowledge would need to do the rest.
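As a rough illustration of that remaining work, here is one way an engineer might adapt the generated script so that it hovers over "Platform" instead of clicking it. This is a minimal sketch, not a verified fix: the function names link_xpath and open_real_device_testing are made up for this example, and the XPath selectors are carried over from the generated script, so they are assumptions about the page's markup that may need adjusting.

```python
def link_xpath(link_text):
    """Return an XPath matching an <a> element by its whitespace-trimmed text."""
    return f"//a[normalize-space()='{link_text}']"


def open_real_device_testing(driver, timeout=10):
    """Hover over "Platform", then click "Real Device Testing".

    Assumes both items are <a> elements, as the generated script did.
    """
    # Imported inside the function so link_xpath() works without Selenium.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)

    # Wait until the menu label is visible, then hover instead of clicking.
    platform_menu = wait.until(
        EC.visibility_of_element_located((By.XPATH, link_xpath("Platform")))
    )
    ActionChains(driver).move_to_element(platform_menu).perform()

    # The submenu link should become clickable once the hover has opened it.
    wait.until(
        EC.element_to_be_clickable(
            (By.XPATH, link_xpath("Real Device Testing"))
        )
    ).click()
```

To try it, create a driver with webdriver.Chrome(), call driver.get("https://kobiton.com/"), pass the driver to open_real_device_testing, and call driver.quit() when finished.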

ChatGPT and Mobile Testing

To take the example further, imagine that instead of testing the UI for a website, we wanted to test the UI for a mobile app that runs on a specific device. This is where things get much harder for generative AI.
Because ChatGPT doesn't have access to a fleet of mobile devices, it can't parse mobile apps to write test cases in the way that it can do for websites. In other words, there's no way to say to ChatGPT, "Load my app on a Galaxy S23 and test the login screen." This is a task that automation engineers would have to handle mostly manually, with help from software-testing platforms that provide access to mobile devices.
That said, there are ways that generative AI could assist engineers in this scenario. It could, for example, review the automation scripts they write and suggest additional test cases they may have overlooked, such as testing for scenarios where users leave passwords blank (rather than only testing for when users enter the wrong password).
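To make the blank-password suggestion concrete, the sketch below lists login test cases as a simple table and runs them against a stand-in function. Everything here is hypothetical: attempt_login, the sample account, and the result strings are placeholders for illustration, and a real suite would drive the app's login screen on a device instead.

```python
# Hypothetical account data used only for this illustration.
VALID_USERS = {"alice": "s3cret"}


def attempt_login(username, password):
    """Stand-in for the app's login check (not a real API)."""
    if not username or not password:
        return "missing_field"
    if VALID_USERS.get(username) != password:
        return "invalid_credentials"
    return "ok"


# Each case: (username, password, expected result).
LOGIN_CASES = [
    ("alice", "s3cret", "ok"),
    ("alice", "wrong", "invalid_credentials"),  # the case engineers usually cover
    ("alice", "", "missing_field"),             # blank password, easy to overlook
    ("", "s3cret", "missing_field"),            # blank username
]


def run_login_cases(cases=LOGIN_CASES):
    """Run every case and return a list of human-readable failures."""
    failures = []
    for username, password, expected in cases:
        actual = attempt_login(username, password)
        if actual != expected:
            failures.append(
                f"login({username!r}, {password!r}): "
                f"expected {expected}, got {actual}"
            )
    return failures
```

Keeping the cases in a table makes it cheap to add whatever extra scenarios a reviewer, human or AI, points out.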
But in this situation, ChatGPT can handle only a small percentage of the work required to meet software-testing requirements. The bulk of the effort would have to come from human test-automation engineers, who have domain-specific expertise and access to testing infrastructure that ChatGPT lacks.

Conclusion

Generative AI is evolving quickly, and it's not impossible to imagine it becoming capable of working past limitations such as the ones we've described above. Someone might find a way to give ChatGPT access to mobile devices, for example, to help it generate automation scripts for mobile apps (and not just for websites).
But even then, it's difficult to conceive of a world where tools such as ChatGPT become capable of handling all aspects of software test automation in an end-to-end fashion. There will almost certainly always be gaps and oversights that humans will need to address. Automation engineers who want to excel in an AI-centric world should use generative AI to automate the simpler, tedious aspects of test automation, while focusing their energies on doing the things that AI can't.
