TWIL July 25th 2025
After spending most of the week trying to get a workflow set up with Copilot/Claude 4 to migrate a bunch of legacy class components to function components, I ended up giving up and just doing it quicker (and better) myself.
Since we have ~40 class components still in the codebase, my end goal was to set up a semi-automated pipeline we could run them through one by one, allowing the team to rid ourselves of a large chunk of tech debt without the fairly large time investment that would usually require.
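For context, the migrations themselves look roughly like this. This is a minimal hypothetical sketch, not one of our actual components: the class version keeps state on `this.state`, the function version uses hooks.

```tsx
import React from 'react';

// Before: a hypothetical class component with local state.
class CounterClass extends React.Component<{ label: string }, { count: number }> {
  state = { count: 0 };

  increment = () => this.setState(({ count }) => ({ count: count + 1 }));

  render() {
    return (
      <button onClick={this.increment}>
        {this.props.label}: {this.state.count}
      </button>
    );
  }
}

// After: the equivalent function component using hooks.
function CounterFunction({ label }: { label: string }) {
  const [count, setCount] = React.useState(0);
  return (
    <button onClick={() => setCount((c) => c + 1)}>
      {label}: {count}
    </button>
  );
}
```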
I started off by setting up a copilot-instructions.md with some process/code quality expectations for the project, and asking it to migrate a component I had already written tests for. It did a perfectly acceptable job of that, but seeing as writing tests is the hard/time-consuming part, there's not much benefit to offloading the relatively simple refactoring to an LLM.
Out of curiosity, I tried just telling it to write unit tests for the next component, with predictably terrible results. After rejecting the over-mocked, over-detailed mess, I set about writing a reusable prompt in a markdown file which I could point it to in the future and say 'Follow the instructions in {markdownFile} for {component}'. After some experimentation I ended up with basically:
You are going to refactor the provided class component to a function component using the following steps. After each step, stop and ask for confirmation before proceeding unless instructed otherwise.
1. Read the component and summarise the functionality you think should be tested/how you would test it
2. Write the tests and run them to ensure they pass. If there are any failures, ask for confirmation before applying fixes.
3. Run linting tools and ensure they pass. If there are any failures, ask for confirmation before applying fixes.
4. Prompt the developer to manually review the tests and commit them if they're happy with the result.
5. Refactor the provided class component to a function component.
- Ideally you will not need to change the test file. However, it is acceptable to make necessary changes to mocks, e.g. switching from mocking `withRoute` to mocking `useHistory` (see the sketch after these steps)
6. Run linting tools and ensure they pass. If there are any failures, ask for confirmation before applying fixes.
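To illustrate the caveat in step 5, here's a minimal sketch of the kind of mock swap it allows, assuming Jest and react-router v5's `useHistory`; the specifics in any given codebase will differ.

```tsx
// Hypothetical test-file change: before the migration, the factory below mocked
// the routing HOC wrapper; after it, only the hook needs to be stubbed out.
// Jest allows the factory to reference this variable because its name starts with "mock".
const mockHistory = { push: jest.fn(), replace: jest.fn() };

jest.mock('react-router-dom', () => ({
  ...(jest.requireActual('react-router-dom') as object),
  useHistory: () => mockHistory,
}));
```

The rest of the test file stays untouched, which is the whole point of writing the tests before the refactor.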
Despite what I'd consider fairly clear step-by-step instructions and the guidelines provided by copilot-instructions.md, it frequently ran into the same over-mocking and over-detailed testing issues I'd had when just asking it to write tests without further guidance. Some particularly egregious examples:
- Ignoring instructions to not mock stores
  - In `copilot-instructions.md` I explain that we prefer using stub data inserted into the real stores to mocking them (see the sketch after this list)
  - Despite this, every generated test overview resulted in me having to remind it not to mock the stores
- Mocking unnecessarily
  - On a related note, every test it generated mocked at least a few things which didn't need to be mocked at all
  - It seems to think that anything referenced in the component must be mocked
- Ignoring instructions to use stubs from a given directory
  - In `copilot-instructions.md` I explain that stubs for data to add to stores can be found in `jest/stubs`
  - Despite this, it repeatedly generated its own test data, often requiring a few iterations to make it conform to the correct type
- Explicitly testing minor interactions covered implicitly by other tests, or that props are passed to child components
  - In `copilot-instructions.md` I explain that we use BDD-style tests
  - Despite this, it wanted to generate a lot of tests along the lines of 'passes props to child component' or 'correctly interacts with mobx store'
  - These should be tested implicitly by testing the component's behaviour, and a lot of them were for functionality so trivial it's arguably pointless to test (e.g. just forwarding a prop to a child component)
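For reference, the style I was trying to get it to follow looks roughly like this. It's a hedged sketch: the component, store, and stub names are all hypothetical, but the shape is the same — stub data from `jest/stubs` goes into the real MobX store rather than a mock, and the assertions are about behaviour the user would see.

```tsx
import React from 'react';
import { render, screen } from '@testing-library/react';
import { userStore } from '../stores/userStore';       // the real store, not a mock
import { userStub } from '../../jest/stubs/userStub';  // shared stub data
import { UserGreeting } from './UserGreeting';

describe('UserGreeting', () => {
  beforeEach(() => {
    // Seed the real store with stub data instead of mocking the store module.
    userStore.setUser(userStub);
  });

  it('greets the current user by name', () => {
    render(<UserGreeting />);
    // Behavioural assertion; no 'passes props to child component' style tests.
    expect(screen.getByText(`Hello, ${userStub.name}`)).toBeTruthy();
  });
});
```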
Fixing these issues by further prompting or just making manual changes negated most of the time savings from automating the test writing, and felt a lot more frustrating than just understanding the component and writing the tests myself.
Even if the test quality had been up to snuff, I think I'd still prefer writing them myself. Reading through a component and deciding what does and doesn't need to be tested gave me a much better understanding of how the component worked than trying to fact-check the LLM. Some people seem to think we don't need to understand the code as long as LLMs do, but I'm of the opinion those people are going to be doing a lot of very painful debugging in a year or so.
Overall I think using an LLM for this, at least if you want to maintain the testing style used by humans in your project, probably isn't worth it. I spent a few days trying to prompt it in such a way that it didn't require repeated manual interventions negating the expected time savings, and even if I had succeeded, I probably would've understood the components less than if I'd written the tests manually.