AI and Automated Testing: Why Human Expertise Remains Essential
Artificial intelligence is revolutionizing the way we work in IT. It's changing our practices, our ways of thinking, and allowing us to move much faster on many tasks. But I'm observing a worrying trend: more and more developers who are not used to writing tests are generating them automatically with AI.
At first glance, it's a dream. No need to write all that repetitive code, no need to rack your brains to find all the edge cases. But through my years in software development, I've learned one thing: speed of production is not synonymous with quality. You can produce a lot of tests very quickly, but does that mean you're producing good tests?
AI knows how to write tests. But writing good tests is another story.
Ironically, with AI, we no longer really have an excuse not to write tests. AI can help us generate them, find edge cases, and speed up repetitive work. But here's the trap: tests have become even more critical than before. Why? Because AI itself introduces risks. It moves fast, it's productive, but it can easily break things by accident, a bit like an enthusiastic junior developer. We therefore need even more robust tests to protect ourselves from these errors. The problem? The tests that AI generates by default are not good.
I recently experienced a situation that perfectly illustrates this problem. A developer at a client was about to write their first tests in the application, a project where testing had only recently been introduced. Enthusiastic, they used artificial intelligence to generate them. The result? Tests that validate the implementation line by line, with mocks everywhere, and that break at the slightest code change.
Tests: A Fundamental Tool, Not a Chore
To understand why this is problematic, let's go back to basics. Why do we write tests? It's not just to please QA or to show the client a nice code coverage number. And it's not because it's trendy in 2026!
Tests are our safety net. In software development, the only constant is change. Requirements change, technologies evolve, and we must constantly adapt our code. Without tests, every modification is a leap into the unknown.
When developing a complex application, it becomes easy to make changes that accidentally break other parts of the application, whether elsewhere in the code or directly in the code we are modifying. Even if we apply advanced architecture patterns to keep the code modular and minimize risk, the architecture itself can become faulty. This is where tests truly shine in their role as protectors.
Test Behaviours, Not Implementation
It is precisely for this reason that tests must focus on the application's behaviours, and not on the implementation details. A test linked to the implementation—which checks that a private method is called, that a certain number of lines execute, or that an internal variable has a certain value—becomes a hindrance. As soon as you refactor the code, even without changing the visible behaviour, these tests break.
A behaviour-oriented test, on the other hand, remains valid as long as the application does what it is supposed to do. It describes what the system does, not how it does it. This is the difference between "the user can log in with a valid password" and "the validatePassword method returns true after calling hashService.compare."
In software development, we talk about white-box tests (which know the internal workings of the system) and black-box tests (which only see the inputs and outputs). The behaviour-oriented approach is the black-box: "If I provide X, do I get Y?" Without worrying about how the system arrives at that result.
The Limits of AI in Test Generation
Now that we understand the distinction between testing behaviour and testing implementation, let's look at what AI actually does when asked to generate tests.
Here is what I have noticed over time: AI tends to write white-box tests. It will:
- Take all the information inside the class or component for granted
- Mock everything that is not in the tested file
- Assume it knows how everything works
The result: tests that check the code line by line, which is exactly what we want to avoid. We want implementation-agnostic tests because they are much less fragile.
Here is a concrete example of what AI typically generates:
// ❌ AI-Generated Test - coupled to implementation
test('login calls authService.validatePassword and returns user', async () => {
  const mockAuthService = {
    validatePassword: jest.fn().mockResolvedValue(true),
    generateToken: jest.fn().mockReturnValue('token-123'),
  };
  const mockUserRepository = {
    findByEmail: jest.fn().mockResolvedValue({ id: 1, email: 'test@test.com' }),
  };

  const result = await login('test@test.com', 'password', mockAuthService, mockUserRepository);

  expect(mockUserRepository.findByEmail).toHaveBeenCalledWith('test@test.com');
  expect(mockAuthService.validatePassword).toHaveBeenCalled();
  expect(mockAuthService.generateToken).toHaveBeenCalledWith({ id: 1 });
  expect(result.token).toBe('token-123');
});
This test checks how the code works: which methods are called, in what order, with what parameters. If tomorrow we refactor login to use a single service instead of two, this test breaks, even if the behaviour remains identical.
Compare this with a behaviour-oriented test:
// ✅ Behavior-Oriented Test
test('given valid credentials, when logging in, should return authentication token', async () => {
  // Given: an existing user with valid credentials
  await createUser({ email: 'user@example.com', password: 'valid-password' });

  // When: the user logs in
  const result = await login('user@example.com', 'valid-password');

  // Then: they receive an authentication token
  expect(result.token).toBeDefined();
  expect(result.success).toBe(true);
});
This test describes what the system does, not how it does it. We can refactor the implementation as much as we want: as long as the user can log in and receive a token, the test remains green.
The Exception: Characterization Tests
When you need to modify a legacy system without tests, it is normal not to know all the functionalities that will be impacted. Characterization tests capture the current behaviour of the system, whatever it may be, to create a safety net before refactoring.
In this specific context, AI can quickly generate test coverage that documents the current state of the code, without the biases a human might have about what "should" happen. This is one of the rare cases where white-box testing is not only acceptable but desirable.
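As a minimal sketch of what this can look like, a characterization test can simply pin down whatever the legacy code currently returns, for example with Jest's snapshot mechanism. The calculateInvoiceTotal function and its input below are hypothetical:

import { calculateInvoiceTotal } from './legacy/invoicing'; // hypothetical legacy module

test('characterization: calculateInvoiceTotal keeps its current output', () => {
  // We do not judge whether this result is "correct"; we only freeze it
  // so that the upcoming refactoring cannot change it unnoticed.
  const invoice = {
    lines: [
      { description: 'Licence', quantity: 2, unitPrice: 49.99 },
      { description: 'Support', quantity: 1, unitPrice: 120 },
    ],
    countryCode: 'CA',
  };

  expect(calculateInvoiceTotal(invoice)).toMatchSnapshot();
});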
Caution: These tests remain temporary. Once the refactoring is complete, they must be replaced by true behaviour tests.
Where AI Truly Shines
Beyond characterization tests, AI excels at tedious tasks where the developer remains in control:
- Refactoring existing tests: Eliminating duplication, grouping setup code into reusable functions, improving readability.
- Data generation: Creating realistic data sets and API mocks based on TypeScript interfaces (see the sketch after this list).
- Edge cases: Suggesting edge cases that we might have forgotten.
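To illustrate the data-generation point, here is the kind of test-data factory AI can produce in seconds from an existing TypeScript interface. The User interface and its default values are hypothetical; the point is that the developer reviews and stays in control of the result:

// Hypothetical interface taken from the codebase.
interface User {
  id: number;
  email: string;
  role: 'admin' | 'member';
  createdAt: Date;
}

// A factory with sensible defaults keeps the "Given" section of tests short:
// each test only spells out the fields that matter for its behaviour.
function buildUser(overrides: Partial<User> = {}): User {
  return {
    id: 1,
    email: 'user@example.com',
    role: 'member',
    createdAt: new Date('2026-01-01'),
    ...overrides,
  };
}

// Usage in a test: only the relevant field is made explicit.
const admin = buildUser({ role: 'admin' });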
But when creating tests for new code, if the AI lacks an example or a guide showing how to write good tests, it will default to producing tests that add no value.
And that's the problem. To provide this guide to the AI, the developer must themselves know what a good test looks like. And beyond that: every generated test must be reviewed. If you can't tell a good test from a bad one, how can you review what the AI produces? You end up validating, without question, code that you don't really understand, which is exactly the opposite of what we're looking for.
Learn to Write Good Tests Before Automating
Let's return to the situation at the client where the developer had started generating tests with AI. For their task, I decided to review the tests with them and show them how to structure tests correctly.
Step 1: Write Only the Test Names
The first step is to write only the test names. Not the implementation. Just the names.
We looked at the service together, the functionality they had implemented, and I asked them to think: "What are the possible behaviours from a user's point of view?"
This question forces you to think in terms of behaviours, not implementation.
For a login page example, it could be:
- "When the user enters a valid password, they access their dashboard"
- "When the user enters an invalid password, an error message is displayed"
- "When the user has too many failed attempts, their account is locked"
These phrases emphasize the desired behaviour. This is the stage where you really need to think seriously about the different possible cases.
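In practice, these names can live in the test file before a single line of implementation exists. With Jest, for example, test.todo lets you write down only the names and have them show up in the test report:

describe('Login', () => {
  // Only the names for now; Jest reports these as "todo" until they are implemented.
  test.todo('given a valid password, when logging in, should show the dashboard');
  test.todo('given an invalid password, when logging in, should display an error message');
  test.todo('given too many failed attempts, when logging in, should lock the account');
});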
Tip: These behaviour cases are often already part of the story's acceptance criteria. QA, when planning their manual tests, have probably already thought through these cases. Involve them! The Product Owner can also help you define them. No one has complete knowledge of the product on their own, so take advantage of everyone's expertise.
Something super important: if the code has already been implemented before defining the tests, do not try to rely mainly on the implementation to define them. There is a risk of relying too much on the "how" instead of the "what." The starting point must always be: "What is the expected behaviour from the user's point of view?"
Step 2: Structure the Skeleton with Comments
Once the test names are written (and ideally validated with the team), we move on to the structure. But not directly to the code!
The second step is to write the test skeleton with comments for each section.
For example, for the invalid password test:
test('given an invalid password, when logging in, should display error message', () => {
  // Given: simulate a user entering a bad password
  // When: submit the login form
  // Then: verify that the error message is displayed
});
The given-when-then format of the name naturally guides us to structure the test:
- Given: What needs to be prepared?
- When: What action triggers the behaviour?
- Then: What is the expected assertion?
This step helps ask the right questions without yet diving into the implementation. We are making a draft of what it should do.
Advice: These different sections must truly reflect the test name. If this is not the case, you are probably too close to the implementation.
Step 3: Implement the Test
Now that we have our structure, we can implement each section. We can add more detailed comments if necessary before writing the code:
test('given an invalid password, when logging in, should display error message', () => {
  // Given: simulate a user entering a bad password
  //   - Display the login form
  //   - Prepare for login refusal

  // When: submit the login form
  //   - Fill in the email and password fields
  //   - Click the login button

  // Then: verify that the error message is displayed
  //   - Search for the error message in the DOM
  //   - Validate that the user is not redirected
});
It is at this moment that we think about the technical details: fixtures, necessary mocks (as few as possible!), DOM selectors, etc.
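To give an idea of the destination, here is a minimal sketch of what the implemented test might look like, assuming a React LoginForm component tested with React Testing Library and jest-dom. The component, its onSubmit prop, the field labels, and the error message are hypothetical; only the authentication boundary is stubbed:

import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { LoginForm } from './LoginForm'; // hypothetical component under test

test('given an invalid password, when logging in, should display error message', async () => {
  // Given: simulate a user entering a bad password
  // The only stub is the authentication boundary, configured to refuse the login.
  const authenticate = jest.fn().mockResolvedValue({ success: false });
  render(<LoginForm onSubmit={authenticate} />);

  // When: submit the login form
  await userEvent.type(screen.getByLabelText(/email/i), 'user@example.com');
  await userEvent.type(screen.getByLabelText(/password/i), 'wrong-password');
  await userEvent.click(screen.getByRole('button', { name: /log in/i }));

  // Then: verify that the error message is displayed and the user is not redirected
  expect(await screen.findByText(/invalid credentials/i)).toBeInTheDocument();
  expect(screen.queryByText(/dashboard/i)).not.toBeInTheDocument();
});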
This three-step approach emphasizes the kind of tests that truly bring value. It is inspired by BDD (Behaviour-Driven Development), which structures tests around expected behaviours rather than implementation details. And if we apply these steps before writing the code, we are doing TDD (Test-Driven Development). The tests then guide the design rather than documenting it after the fact.
To learn more: BDD (Behaviour-Driven Development) and TDD (Test-Driven Development)
Use AI as a Co-Pilot, Not a Pilot
So, should AI be banned? Absolutely not. But we must make the distinction between "AI doing the work for you" and "AI helping you go faster." The developer must remain in control. They must understand what a good test is and have the skills to write it themselves. Otherwise, how can one evaluate if the generated test is good? What should change? Why is this mock problematic?
The developer defines the what, the AI accelerates the how.
Conclusion
AI is a formidable accelerator, but it does not replace human understanding of technological products.
Automating tests with AI is a bit like using a GPS. It's convenient for going fast, but if you don't watch the road, you'll end up in the ditch. The developer remains the pilot. It is you who know the context, the business stakes, and what truly matters to your users.
Before delegating to AI, master the basics: think in behaviours, structure your tests, evaluate what is produced. Once this skill is acquired, AI becomes a formidable ally.
A test is not there to prove the code works. It is there to allow the code to change without fear. And no AI can guarantee that for you.