こんにちは! I’m Haris, an Engineering Manager at MeetsMore. This post is part of our Advent Calendar 2025 celebration. I’ll be writing in English, but if you’re interested in joining our multilingual Engineering team, please read on!
Proving that a piece of logic does what it is supposed to do is a constant struggle. Often, testing comes in the form of explicit, hard-coded cases with hard-coded expectations, e.g.:
```typescript
expect(add(1, 2)).toBe(3)
expect(add(5, 3)).toBe(8)
// ...infinitely many more cases
```
This approach can work if you have well-documented cases:
```typescript
// See INCIDENT REPORT 253: passing -3 and 3 to our addition function
// blew up our servers and made everyone unhappy.
expect(add(-3, 3)).toBe(0)
```
But expecting your hard-coded example tests to cover all possible inputs (such as every integer) is impossible, and the limitations of manual testing continue to be a problem in real-world code. Let’s look at a real MeetsMore example involving phone numbers.
Phone numbers and encodings
In Japanese software, it is common to receive different encodings for the same character: １ vs. 1 for the digit one. These are referred to as Zenkaku (full-width) and Hankaku (half-width). There are variations for numbers, hyphens, kana marks, and more. Because users switch between software, keyboards, and input methods, it’s common to receive mixed-width text accidentally:
123ー4567-8910
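These look-alike characters really are distinct Unicode code points, which you can verify directly (a quick illustration, not part of the original incident):

```typescript
// The half-width digit '1' and the full-width digit '１' render similarly
// but are different Unicode code points.
const half = '1'.codePointAt(0)! // U+0031 (ASCII)
const full = '１'.codePointAt(0)! // U+FF11 (Halfwidth and Fullwidth Forms block)
console.log(half.toString(16), full.toString(16)) // → "31 ff11"
console.log(full - half === 0xfee0) // → true: full-width forms sit at a fixed offset
```

The fixed 0xFEE0 offset between the ASCII range and the full-width range is what makes digit normalization mechanical.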
For data integrity, let’s say we want our system to normalize all phone numbers to Hankaku. We create a function zenkakuToHankaku that normalizes Zenkaku and mixed strings into Hankaku.
A (simplified) implementation might look like:
```typescript
const zenkakuToHankaku = (input: string): string =>
  input.replace(
    /[\uFF0D\u30FC\u2010\u2013\u2014\u2015\u2212]/g, // ...plus more mapped characters
    (match) => ZENKAKU_TO_HANKAKU[match],
  )
```
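The ZENKAKU_TO_HANKAKU table is elided above. For a self-contained picture, here is a toy version of both the table and the function; the table entries are illustrative, and the production mapping covers far more characters:

```typescript
// Illustrative subset of the mapping table: hyphen look-alikes → ASCII hyphen
const ZENKAKU_TO_HANKAKU: Record<string, string> = {
  '－': '-', // full-width hyphen-minus (U+FF0D)
  'ー': '-', // katakana-hiragana prolonged sound mark (U+30FC)
  '‐': '-', // hyphen (U+2010)
  '–': '-', // en dash (U+2013)
  '—': '-', // em dash (U+2014)
  '―': '-', // horizontal bar (U+2015)
  '−': '-', // minus sign (U+2212)
}

// Full-width digits sit exactly 0xFEE0 above their ASCII counterparts,
// so they can be normalized with an offset instead of table entries
const zenkakuToHankaku = (input: string): string =>
  input
    .replace(/[０-９]/g, (d) => String.fromCharCode(d.charCodeAt(0) - 0xfee0))
    .replace(/[－ー‐–—―−]/g, (m) => ZENKAKU_TO_HANKAKU[m])

console.log(zenkakuToHankaku('１23ー4567-8910')) // → "123-4567-8910"
```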
But how do we know we’ve covered everything? Traditional manual testing might look like this:
```typescript
describe('convert mixed mapped phone numbers to hankaku', () => {
  it('should preserve phone number formatting despite different hyphen mappings', () => {
    const phoneNumbers = [
      '000-0000-0000', // ASCII hyphen
      '000－0000－0000', // Full-width hyphen
      '000‐0000‐0000', // Hyphen
      '000–0000–0000', // En dash
      '000—0000—0000', // Em dash
      '000−0000−0000', // Minus sign
      '000ー0000ー0000', // Katakana-Hiragana prolonged sound mark
    ]
    phoneNumbers.forEach((phoneNumber) => {
      const converted = zenkakuToHankaku(phoneNumber)
      expect(converted).toMatch(/^\d{3}-\d{4}-\d{4}$/)
      expect(converted).toBe('000-0000-0000')
    })
  })
})
```
This only tests one phone number shape (000-0000-0000) with various hyphen variants. But what about different numbers like 123-4567-8910 and all their possible variations? Worst of all, our tests missed cases with mixed hyphens within a single phone number, simply because we didn't think to test for them.
Much like our toy example of adding two integers, the input space for our phone number normalizer is too large for manual testing. We instead need to programmatically explore this input space and build our tests automatically. This is property testing.
Property testing
First, some definitions.
Property testing is a field of theoretical computer science concerned with designing super-fast algorithms for approximate decision-making, where the decision refers to properties or parameters of huge objects. Given a huge object S and a property P, we want to know whether S satisfies P. To decide, we sample small parts q of S, where q ≪ |S|, and check P against them. If S is ε-far from satisfying P, meaning that at least an ε-fraction of S would need to be changed for it to satisfy P, then these samples will likely expose the violation. — Goldreich, Oded. Introduction to Property Testing. 2017. See also https://en.wikipedia.org/wiki/Property_testing
In more practical software engineering terms, we invert the traditional testing approach. Instead of testing specific example inputs, we state general truths (properties P) about a very large input space S, then test those truths against randomly generated examples, q, drawn from S.
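This inversion can be sketched without any library. A minimal hand-rolled version for our toy add function, using plain Math.random rather than fast-check, looks like:

```typescript
// A hand-rolled property test: state general truths about add,
// then check them against randomly sampled inputs instead of fixed examples
const add = (a: number, b: number): number => a + b

// Draw integers from a large input space S
const randomInt = (): number => Math.floor(Math.random() * 2_000_001) - 1_000_000

for (let i = 0; i < 300; i++) {
  const [a, b] = [randomInt(), randomInt()]
  // Property: addition is commutative...
  if (add(a, b) !== add(b, a)) throw new Error(`not commutative for ${a}, ${b}`)
  // ...and 0 is its identity
  if (add(a, 0) !== a) throw new Error(`0 is not an identity for ${a}`)
}
console.log('300 random samples passed')
```

Libraries like fast-check add reproducible seeds and counterexample shrinking on top of this basic loop.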
To make this concrete, let’s apply this concept to our phone number normalizer above.
Refactoring into property-based tests
Using fast-check (https://fast-check.dev), we build generators representing our input space:
```typescript
import fc from 'fast-check'

const hyphens = fc.constantFrom('-', '－', '‐', '–', '—', '−', 'ー')
const halfDigits = fc.constantFrom(...'0123456789'.split(''))
const fullDigits = fc.constantFrom(...'０１２３４５６７８９'.split(''))
```
Then, we compose them:
```typescript
const mixedDigits = fc.oneof(halfDigits, fullDigits)

const mixedPhoneNumberGenerator = fc
  .tuple(
    fc.array(mixedDigits, { minLength: 3, maxLength: 3 }),
    hyphens,
    fc.array(mixedDigits, { minLength: 4, maxLength: 4 }),
    hyphens,
    fc.array(mixedDigits, { minLength: 4, maxLength: 4 }),
  )
  .map(
    ([p1, h1, p2, h2, p3]) =>
      `${p1.join('')}${h1}${p2.join('')}${h2}${p3.join('')}`,
  )
```
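To make the composition more tangible, here is a hand-rolled stand-in for the same generator, using plain Math.random rather than fast-check's arbitraries:

```typescript
// Hand-rolled equivalent of the composed generator: pick digits
// (half- or full-width) and hyphen variants, then assemble a 3-4-4 string
const HYPHENS = ['-', '－', '‐', '–', '—', '−', 'ー']
const DIGITS = [...'0123456789', ...'０１２３４５６７８９']

const pick = (xs: string[]): string => xs[Math.floor(Math.random() * xs.length)]
const digits = (n: number): string =>
  Array.from({ length: n }, () => pick(DIGITS)).join('')

const mixedPhoneNumber = (): string =>
  `${digits(3)}${pick(HYPHENS)}${digits(4)}${pick(HYPHENS)}${digits(4)}`

console.log(mixedPhoneNumber()) // prints one random mixed-width phone number
```

What fast-check adds over this sketch is shrinking, reproducible seeds, and a large library of composable arbitraries.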
This will generate any number of valid phone-number-like inputs we desire within our space of mixed digits and hyphens. It can be hard to visualize this since the inputs are no longer hard-coded, so let’s run our generator:
```typescript
fc.sample(mixedPhoneNumberGenerator, 10).forEach(console.log)
/*
  '868-9686–3108',
  '080–8479–5947',
  '782ー9308‐8292',
  '150—8023−4198',
  '900—8868−6155',
  '553–8334‐3036',
  '332–3950—4409',
  '017—6528ー8975',
  '738ー8791-9636',
  '333‐6651ー2926'
*/
```
Already, you can see that these test cases are much better than our hard-coded ones. We also got mixed-hyphen inputs automatically, as a natural result of sampling from our generators. This is automatic edge case detection.
Using our generators
In real tests, we can drive our generators with a small helper exercise function:
```typescript
const exercise =
  (sampleSize: number) =>
  <T>(generator: fc.Arbitrary<T>) =>
  (name: string, test: (sample: T) => boolean | void) => {
    it(`${name} (${sampleSize} samples)`, () => {
      fc.assert(fc.property(generator, test), { numRuns: sampleSize })
    })
  }

const exerciseMixedPhoneNumbers = exercise(300)(mixedPhoneNumberGenerator)

exerciseMixedPhoneNumbers(
  'should normalize phone numbers despite mixed zenkaku/hankaku mappings',
  (generatedNumber) => {
    const normalized = zenkakuToHankaku(generatedNumber)
    const expected = generatedNumber
      .replace(/[^\d０-９]/g, '')
      .replace(/[０-９]/g, (s) =>
        String.fromCharCode(s.charCodeAt(0) - 65248),
      )
    return (
      normalized ===
      `${expected.slice(0, 3)}-${expected.slice(3, 7)}-${expected.slice(7, 11)}`
    )
  },
)
```
Running this using jest yields:
✓ should normalize phone numbers despite mixed zenkaku/hankaku mappings (300 samples) (10 ms)
We just tested 300 different phone numbers with little extra effort.
Breaking our tests
For illustration, let’s break our implementation purposely.
```typescript
const normalized = zenkakuToHankaku(generatedNumber).replace(/-/g, 'ー') // bad mapping
```
Running now:
```
✕ should normalize phone numbers despite mixed zenkaku/hankaku mappings (300 samples)

Property failed after 1 tests
Counterexample: ["000-0000-0000"]
Shrunk 12 time(s)
```
Not only did our property test catch the error instantly, it provided a counterexample and a seed for deterministic replay, e.g. within CI/CD pipelines. Re-running the same broken test surfaces further counterexamples with other hyphen variants, such as "000－0000－0000".
Our property tests teach us something more fundamental about our code than hard-coded tests that merely confirm what we already know. In this case, they show that any 3-4-4 sequence of digits, however encoded, should come out joined by ASCII hyphens.
General discovery
This general pattern discovery is a powerful benefit of our new testing approach and can lead to strong downstream benefits in software development, e.g.:
- Speeding up the Exploration phase and validating the Expansion phase in Kent Beck’s 3X model https://medium.com/@kentbeck_7670/the-product-development-triathlon-6464e2763c46
- Allowing for more powerful refactoring transformations in Robert C. Martin’s TPP https://en.wikipedia.org/wiki/Transformation_Priority_Premise
- Automating verification in common migration approaches like Martin Fowler’s Strangler Fig Pattern https://martinfowler.com/bliki/StranglerFigApplication.html
Why this matters in the age of AI
Why not just ask an LLM: “Generate 300 phone numbers with mixed Japanese encodings”? Let’s estimate the cost.
Using a token estimator like https://platform.openai.com/tokenizer, 300 generated phone numbers is ~3,900 tokens. Assuming output-only costs from https://platform.openai.com/docs/pricing, our estimated costs are:
GPT-4.1: 3,900 × $12 / 1,000,000 ≈ $0.0468 (~4.7 cents, or ~7.3円)
GPT-4.1-mini: 3,900 × $3.20 / 1,000,000 ≈ $0.01248 (~1.25 cents, or ~1.94円)
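The arithmetic behind these figures is easy to double-check; the token count and per-million prices are the estimates quoted above:

```typescript
// cost = tokens × (price per 1M output tokens) / 1,000,000
const tokens = 3_900

const costUSD = (pricePerMillionTokens: number): number =>
  (tokens * pricePerMillionTokens) / 1_000_000

console.log(costUSD(12).toFixed(4)) // → "0.0468" (GPT-4.1)
console.log(costUSD(3.2).toFixed(5)) // → "0.01248" (GPT-4.1-mini)
```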
This is before adding input prompt overhead, formatting, and tokens from JSON structuring. Because relying on LLM generation is fundamentally expensive and nondeterministic, we would have to hard-code the output into our tests, bringing us right back to the original testing problem.
Testing a million phone numbers
By contrast, our property-based generator costs nearly $0, runs instantly, runs on every CI/CD cycle, is deterministic and reproducible, and can scale to millions of generated inputs. For example:
```typescript
exercise(1000000)(mixedPhoneNumberGenerator)(/* ... */)
```

✓ should normalize phone numbers despite mixed zenkaku/hankaku mappings (1000000 samples) (14611 ms)
We just tested our zenkakuToHankaku function against one million phone numbers in under 15 seconds, at near-zero cost. With an LLM, this would be prohibitively expensive: GPT-4.1 would cost ≈ $156 (~24,230円) and GPT-4.1-mini ≈ $42 (~6,523円).
Back to fundamentals
AI is powerful, but it cannot replace fundamental techniques like property-based testing. When correctness matters, such a testing strategy gives us:
- Generality over specifics
- Robustness over brittleness
- Confidence over hope
Programmatically exploring vast input spaces is not just beneficial; at times it is the only way. And of course, you can always ask an LLM to write the generators for you.
Hiring
If this sounds interesting and you want to help MeetsMore build out its testing story, we are actively hiring engineers right now: https://corp.meetsmore.com/. Thanks for reading.