AI tools can draft browser automation fast, but speed is not the same as quality. AI-written Selenium and Playwright tests often look polished at first glance while still hiding weak locators, fragile waits, missing assertions, and unrealistic data setup. If those drafts enter your suite without review, they can increase flaky failures and lower trust in automation.

This tutorial gives QA engineers and SDETs a practical review checklist for AI-generated UI tests. The goal is simple: keep the useful acceleration, but filter out the patterns that usually break in CI or miss real product risk. You can use the checklist during pull requests, pair reviews, or before accepting any generated test into your framework.

Why AI-written Selenium and Playwright tests need validation

Generated UI tests are usually built from common examples found in public code. That means the output often favors short, generic patterns instead of your application’s real synchronization rules, naming conventions, page objects, fixtures, and assertion strategy. A test might pass once on a developer laptop and still fall short of production quality.

  • Locators may target brittle CSS paths or visible text that changes often.
  • Wait logic may rely on timing guesses rather than observable application state.
  • Assertions may prove only that the page loaded, not that the workflow succeeded.
  • Test data may skip roles, permissions, edge cases, or cleanup needs.
  • The code may ignore team standards for reuse, reporting, and maintainability.

That is why review must focus on behavior and reliability, not only on whether the script compiles.

Checklist category 1: validate the test purpose first

Before reading the code line by line, confirm what the test is supposed to protect. A generated script is low value if it automates a flow that is already covered better elsewhere or if it validates trivial behavior while missing business risk.

  • Can you describe the user risk or requirement in one sentence?
  • Does the scenario belong at UI level, or would API or component coverage be cheaper and more stable?
  • Is the test independent enough to run repeatedly without hidden prerequisites?
  • Does it duplicate another existing Selenium or Playwright test?

If the purpose is unclear, reject or rewrite the test before spending time on syntax cleanup.

Checklist category 2: inspect locators for resilience

Weak locators are one of the most common failures in AI-generated browser tests. Models often choose whatever selector appears easiest in sample HTML, not what is most stable in your product.

  • Prefer stable attributes such as data-testid, data-qa, or accessible roles with meaningful names.
  • Reject deep CSS chains like div > div:nth-child(3) > span.
  • Review text-based selectors if the UI is translated or marketing copy changes often.
  • Ensure the locator is specific enough to avoid matching multiple elements.
  • Confirm the selector strategy matches your framework conventions.

Playwright usually makes role-based and test-id locators easier to express, while Selenium suites often rely on page objects and explicit locator constants. The rule is the same in both: choose selectors that survive normal UI change.

// Better Playwright pattern (TypeScript; page is the test fixture, expect is imported from @playwright/test)
await page.getByTestId('checkout-submit').click();
await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();

// Better Selenium pattern (Java; wait is a WebDriverWait created elsewhere in the test)
By submitButton = By.cssSelector("[data-testid='checkout-submit']");
By confirmationHeading = By.xpath("//h1[normalize-space()='Order confirmed']");

wait.until(ExpectedConditions.elementToBeClickable(submitButton)).click();
wait.until(ExpectedConditions.visibilityOfElementLocated(confirmationHeading));
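
Part of the locator review can be automated before a human reads the draft. The sketch below is a hypothetical helper, not part of Selenium or Playwright: the name reviewSelector and the three patterns are assumptions chosen for illustration, and the rules are rough heuristics that a team would tune to its own conventions.

```typescript
// Hypothetical review helper: flags selector strings that tend to break
// on routine UI changes. The patterns are heuristics, not a complete rule set.
interface SelectorFinding {
  selector: string;
  issues: string[];
}

const BRITTLE_PATTERNS: Array<{ pattern: RegExp; issue: string }> = [
  // Position-based matching breaks when siblings are added or reordered.
  { pattern: /:nth-(child|of-type)\(/, issue: "positional pseudo-class" },
  // Two or more child combinators tie the test to the exact DOM nesting.
  { pattern: /(>[^>]+){2,}/, issue: "deep child-combinator chain" },
  // Absolute XPath depends on the full document structure.
  { pattern: /^\/html\//, issue: "absolute XPath" },
];

function reviewSelector(selector: string): SelectorFinding {
  const issues = BRITTLE_PATTERNS
    .filter(({ pattern }) => pattern.test(selector))
    .map(({ issue }) => issue);
  return { selector, issues };
}
```

Running it on div > div:nth-child(3) > span reports both the positional pseudo-class and the deep chain, while [data-testid='checkout-submit'] comes back with no issues. A clean result only means the selector avoided these specific patterns; a reviewer still has to confirm it matches exactly one element.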

Checklist category 3: review waits and synchronization

AI-generated tests often fail because they guess timing. Look for blind sleeps, immediate assertions after navigation, and actions performed before the UI is actually ready. Reliable tests wait on state, not elapsed time.

  • Prefer explicit waits for visibility, enabled state, a specific network response, or URL changes.
  • Check whether loading spinners, toasts, and async saves are handled deliberately.
  • Make sure retry behavior is not hiding a real product bug.
  • Confirm the test does not depend on animation timing or arbitrary delays.

A useful review question is: what exact signal tells us the page is ready for the next step? If the code cannot answer that, it is probably fragile.
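
Playwright's web-first assertions and Selenium's WebDriverWait already wait on state for most UI signals. For a signal they do not cover, such as a record appearing in a backend, a small polling helper can replace a fixed sleep. The following is a minimal sketch; waitForCondition and its parameters are hypothetical names, not framework APIs.

```typescript
// Hypothetical helper: poll an observable condition instead of sleeping
// for a fixed time. It returns as soon as the condition holds and fails
// with a descriptive message when the timeout is reached.
async function waitForCondition(
  description: string,
  isReady: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await isReady()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for: ${description}`);
    }
    // Pause briefly between probes so the loop does not spin.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The description argument forces the author to name the readiness signal, which is the same question the reviewer is asking. If nobody can fill it in, the test is still guessing at timing.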

Checklist category 4: strengthen assertions

Many generated tests stop at shallow checks such as page title, URL, or one visible element. Those checks are not enough when the real business outcome involves saved data, role-based access, totals, validation messages, or downstream workflow changes.

  • Assert the user-visible outcome of the flow, not just an intermediate click.
  • Validate error messages for negative tests, not only success paths.
  • Check persisted state through UI, API, or database hooks when appropriate.
  • Make assertions specific enough that a partial failure does not still pass.

Good UI automation answers, with evidence, whether the workflow succeeded for the right reason.
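
One way to keep assertions specific is to compare the whole expected outcome field by field instead of checking a single element. The sketch below assumes a hypothetical OrderSummary shape that a checkout test might read back from the UI or an API; the field names are illustrative only.

```typescript
// Hypothetical shape of the outcome a checkout test reads back.
interface OrderSummary {
  status: string;
  itemCount: number;
  totalCents: number;
}

// Returns one message per mismatched field, so a partially wrong result
// cannot pass and the failure says exactly what differed.
function orderMismatches(expected: OrderSummary, actual: OrderSummary): string[] {
  const fields: Array<keyof OrderSummary> = ["status", "itemCount", "totalCents"];
  return fields
    .filter((field) => expected[field] !== actual[field])
    .map((field) => `${field}: expected ${expected[field]}, got ${actual[field]}`);
}
```

A test would then assert that the returned list is empty. A draft that only checks for the confirmation heading would pass with a wrong total, whereas this comparison reports the totalCents mismatch by name.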

Checklist category 5: verify data, environment, and cleanup

Generated tests often assume ideal data. In real environments, accounts, feature flags, permissions, and cleanup matter. A test that leaves data behind or depends on yesterday’s state will eventually become unstable.

  • Is the test data created intentionally and cleaned up after the run?
  • Does the user role match the scenario?
  • Are feature flags, region settings, or environment dependencies explicit?
  • Can the test run in parallel without colliding with other tests?
  • Does the script avoid hard-coded secrets and environment-specific URLs?

This is where many AI drafts need the most human correction because the model does not know your environment rules.
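
Two of these checks, parallel-safe data and explicit environment settings, can be sketched in a few lines. Everything here is an assumption for illustration: the roles, the email format, and the helper names buildTestUser and requireSetting would all follow your own project's rules.

```typescript
type Role = "admin" | "customer";

interface TestUser {
  email: string;
  role: Role;
}

// Builds a user whose email is unique per run and per worker, so tests
// running in parallel do not collide on the same account.
function buildTestUser(role: Role, runId: string, workerIndex: number): TestUser {
  return { email: `qa+${role}-${runId}-w${workerIndex}@example.test`, role };
}

// Reads a required setting and fails fast with a clear message instead of
// silently falling back to a hard-coded, environment-specific value.
function requireSetting(env: Record<string, string | undefined>, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required setting: ${key}`);
  }
  return value;
}
```

In a Node-based suite, requireSetting(process.env, "BASE_URL") would replace a URL pasted into the test, where BASE_URL is a hypothetical variable name. Cleanup is not shown; it depends on how your environment lets tests delete what they created.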

Checklist category 6: review maintainability and framework fit

An AI-generated test can be technically correct and still be a maintenance burden. Review how well it fits your project structure.

  • Move repeated flows into page objects, helpers, or fixtures if your framework uses them.
  • Use your team’s naming, logging, and reporting style.
  • Remove dead code, duplicated setup, and commented experiments.
  • Check whether the test belongs in smoke, regression, or a focused feature pack.
  • Confirm that failure messages will be understandable in CI.

A generated test should become normal team-owned code, not a special snowflake that only works because nobody wants to touch it.
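
For teams that use page objects, moving a generated step into one might look like the sketch below. The class CheckoutPage and the minimal PageLike types are hypothetical; they exist so the example compiles on its own, and a Playwright Page is assumed to be structurally compatible with them.

```typescript
// Minimal structural types so the sketch is self-contained.
// A Playwright Page is assumed to satisfy PageLike.
interface ClickableLike {
  click(): Promise<void>;
}

interface PageLike {
  getByTestId(testId: string): ClickableLike;
}

class CheckoutPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  // One named, reusable step instead of a raw locator repeated across tests.
  async submitOrder(): Promise<void> {
    await this.page.getByTestId("checkout-submit").click();
  }
}
```

With this in place, a generated test calls submitOrder() and the test id lives in one file. When the button changes, the fix happens once rather than in every draft that clicked it.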

Common mistakes reviewers should catch quickly

  • Selectors copied from browser dev tools with positional CSS paths.
  • Assertions that only check the presence of one container element.
  • Test steps that mix multiple business scenarios into one long flow.
  • Hard-coded credentials, dates, order IDs, or environment assumptions.
  • Network interception used in a way that removes too much real behavior from the UI test.
  • Missing negative coverage for validation and permission boundaries.

If you see two or three of these signals together, it is usually faster to rewrite the test from the scenario intent than to keep patching the generated draft.
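
Some of these signals are mechanical enough to scan for before review starts. The sketch below is a hypothetical pre-review script: scanDraft and its three rules are assumptions for illustration, and plain pattern matching like this will produce both false positives and misses.

```typescript
interface DraftFinding {
  line: number; // 1-based line number in the scanned source
  rule: string;
}

// Heuristic rules only; extend or replace them to fit your codebase.
const QUICK_RULES: Array<{ rule: string; pattern: RegExp }> = [
  { rule: "blind sleep", pattern: /\b(waitForTimeout|Thread\.sleep)\s*\(/ },
  { rule: "hard-coded URL", pattern: /https?:\/\/[^\s'"]+/ },
  { rule: "possible hard-coded credential", pattern: /password\s*[:=]\s*['"][^'"]+['"]/i },
];

// Scans test source text line by line and records every rule that matches.
function scanDraft(source: string): DraftFinding[] {
  const findings: DraftFinding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of QUICK_RULES) {
      if (pattern.test(text)) {
        findings.push({ line: index + 1, rule });
      }
    }
  });
  return findings;
}
```

A scan like this does not replace the checklist. It only surfaces the cheapest findings first, so review time goes to purpose, assertions, and data, which a pattern match cannot judge.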

A simple review workflow for teams

One practical approach is to treat AI as a first-draft assistant, not as an autonomous test author.

  • Generate the first draft from a clear scenario description.
  • Review the draft using this checklist before it reaches the main branch.
  • Run the test locally and in CI with realistic data.
  • Track failure patterns so future prompts and review rules improve.
  • Keep only the generated tests that provide meaningful signal.

This process preserves speed without giving up engineering discipline.

Conclusion

AI-written Selenium and Playwright tests can absolutely save time, but only when QA engineers review them like production code. Start with purpose, then validate locators, waits, assertions, data handling, and maintainability. If a generated test survives that checklist, it is much more likely to help your suite instead of quietly weakening it.