Balancing address lines for e-commerce in PHP

Delivery addresses are notoriously difficult to deal with. One issue we’ve encountered with our greetings card shop is that long address lines get cut off when printed on the address label, so we have to tweak them manually to make the address fit.

To reduce some of that manual work, I added a little helper that tries to balance the address lines where possible, i.e. take words off long lines and add them to neighbouring lines that have some space to spare.

The PHP function to do the line balancing looks like this:

<?php

class LineBalance
{
    /**
     * @param string $lineA
     * @param string $lineB
     * @param int    $targetMaxLength
     *
     * @return string[]
     */
    public static function balance(
        string $lineA,
        string $lineB,
        int $targetMaxLength = 35
    ): array {
        // Pass 1: while line A is at or over the target, move its last word
        // to the front of line B, provided line B stays under the target.
        while (mb_strlen($lineA) >= $targetMaxLength) {
            $lineAWords = preg_split('/\s+/u', $lineA);
            $lineALastWord = array_pop($lineAWords);
            $prependedLineB = "{$lineALastWord} {$lineB}";
            if (mb_strlen($prependedLineB) >= $targetMaxLength) {
                break; // line B has no room for this word
            }
            $lineA = implode(' ', $lineAWords);
            $lineB = $prependedLineB;
        }
        // Pass 2: the same in reverse, moving line B's first word to the
        // end of line A.
        while (mb_strlen($lineB) >= $targetMaxLength) {
            $lineBWords = preg_split('/\s+/u', $lineB);
            $lineBFirstWord = array_shift($lineBWords);
            $postPendedLineA = "{$lineA} {$lineBFirstWord}";
            if (mb_strlen($postPendedLineA) >= $targetMaxLength) {
                break; // line A has no room for this word
            }
            $lineB = implode(' ', $lineBWords);
            $lineA = $postPendedLineA;
        }
        return [$lineA, $lineB];
    }
}

(It’s implemented as a static method on a class to make it easier to auto-load as a helper function.)

The function takes two lines of text and a target max line length, and tries to balance the two lines to get them both under the target max line length if possible.

It does this by making two passes, one for each line. First it checks if the first line is at or over the target max length (the comparison is `>=`), and if so pops words off the end of that line and prepends them to the second line, so long as that keeps the second line under the limit.

The second pass does the same thing in reverse: it shifts words off the beginning of the second line and onto the end of the first, again so long as this keeps the first line under the target max length.
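To make the first pass easier to follow, here is a rough sketch that runs only that pass and records every intermediate state. `traceFirstPass()` is a hypothetical helper written for this illustration; it is not part of `LineBalance`, and it assumes the same default target of 35 characters.

```php
<?php

/**
 * Hypothetical helper: runs only the first pass and records each move.
 *
 * @return string[][] list of [lineA, lineB] states, starting with the input
 */
function traceFirstPass(string $lineA, string $lineB, int $targetMaxLength = 35): array
{
    $states = [[$lineA, $lineB]];
    while (mb_strlen($lineA) >= $targetMaxLength) {
        $words = preg_split('/\s+/u', $lineA);
        $lastWord = array_pop($words);
        $candidate = "{$lastWord} {$lineB}";
        if (mb_strlen($candidate) >= $targetMaxLength) {
            break; // moving this word would make line B too long
        }
        $lineA = implode(' ', $words);
        $lineB = $candidate;
        $states[] = [$lineA, $lineB];
    }

    return $states;
}

$states = traceFirstPass(
    'a very long line with a lot of words that will go over the limit',
    'a short line'
);

// Print each state with the length of both lines.
foreach ($states as [$a, $b]) {
    printf("%2d | %s\n%2d | %s\n\n", mb_strlen($a), $a, mb_strlen($b), $b);
}
```

With this input, four words ("limit", "the", "over", "go") move down one at a time. The fifth candidate, "will", would bring the second line to exactly 35 characters, which the `>=` check rejects, so the loop stops with the first line still at 46 characters.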

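The word-splitting for both is done with preg_split() on a pattern for any amount of white-space: /\s+/u. This has the side benefit of collapsing runs of white-space in any line that words are taken from, because the remaining words are re-joined with single spaces. Lines that are left alone, or that only receive words, keep their original spacing.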
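As a small illustration of the split-and-rejoin behaviour, the snippet below uses the same pattern on made-up input. The last two calls show an edge case worth knowing about: leading white-space produces an empty first element unless the `PREG_SPLIT_NO_EMPTY` flag is passed. The helper above does not pass that flag, so this part is only a suggestion.

```php
<?php

// Split on runs of white-space, then rejoin with single spaces.
$line = 'BLK 123 MAIN  STREET   EAST SQ  45';
$words = preg_split('/\s+/u', $line);
$collapsed = implode(' ', $words);

var_dump($collapsed); // string(30) "BLK 123 MAIN STREET EAST SQ 45"

// Leading white-space yields an empty string as the first "word" ...
$withEmpty = preg_split('/\s+/u', ' Some Place');

// ... unless empty pieces are filtered out with PREG_SPLIT_NO_EMPTY.
$withoutEmpty = preg_split('/\s+/u', ' Some Place', -1, PREG_SPLIT_NO_EMPTY);

var_dump($withEmpty);    // ['', 'Some', 'Place']
var_dump($withoutEmpty); // ['Some', 'Place']
```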

The result might be best demonstrated with a unit test case:

<?php

use PHPUnit\Framework\TestCase;

class LineBalanceTest extends TestCase
{
    /**
     * @dataProvider balanceProvider
     *
     * @param string $inputLines
     * @param string $expectedBalance
     */
    public function testBalance(string $inputLines, string $expectedBalance): void
    {
        self::assertSame(
            $expectedBalance,
            implode(
                "\n",
                LineBalance::balance(...explode("\n", $inputLines))
            ),
        );
    }

    /**
     * @return array[]
     */
    public static function balanceProvider(): array
    {
        return [
            [
                <<<STR
short line
another short line
STR,
                <<<STR
short line
another short line
STR,
            ],
            [
                <<<STR
a very long line with a lot of words that will go over the limit
a short line
STR,
                <<<STR
a very long line with a lot of words that will
go over the limit a short line
STR,
            ],
            [
                <<<STR
a short line
a very long line with a lot of words that will go over the limit
STR,
                <<<STR
a short line a very long line with
a lot of words that will go over the limit
STR,
            ],
            [
                <<<STR
a very long line with a lot of words that will go over the limit
a very long line with a lot of words that will go over the limit
STR,
                <<<STR
a very long line with a lot of words that will go over the limit
a very long line with a lot of words that will go over the limit
STR,
            ],
        ];
    }
}

With that general helper function, we can attempt to balance lines in delivery addresses a little bit, like this:

<?php

class Address
{
    /* ... */

    public function balance(): self
    {
        $balanceLineTwoCity = LineBalance::balance(
            $this->line_two,
            $this->city,
        );
        $this->line_two = $balanceLineTwoCity[0];
        $this->city = $balanceLineTwoCity[1];

        $balanceLineOneLineTwo = LineBalance::balance(
            $this->line_one,
            $this->line_two,
        );
        $this->line_one = $balanceLineOneLineTwo[0];
        $this->line_two = $balanceLineOneLineTwo[1];

        return $this;
    }

}

This applies the helper function semantically to the address components, which look like this:

line_one
line_two
city
post_code
country

It tries to balance line_two with the city first to potentially free up some space, and then tries to balance line_one with line_two.
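The order of the two calls matters, which a quick experiment can show. The sketch below is only an illustration: `balanceLines()` is a condensed, hypothetical copy of the helper so that the snippet runs on its own, and the address is made up.

```php
<?php

// Condensed copy of the balancing helper so this snippet runs on its own.
function balanceLines(string $lineA, string $lineB, int $max = 35): array
{
    while (mb_strlen($lineA) >= $max) {
        $words = preg_split('/\s+/u', $lineA);
        $candidate = array_pop($words) . ' ' . $lineB;
        if (mb_strlen($candidate) >= $max) {
            break;
        }
        $lineA = implode(' ', $words);
        $lineB = $candidate;
    }
    while (mb_strlen($lineB) >= $max) {
        $words = preg_split('/\s+/u', $lineB);
        $candidate = $lineA . ' ' . array_shift($words);
        if (mb_strlen($candidate) >= $max) {
            break;
        }
        $lineB = implode(' ', $words);
        $lineA = $candidate;
    }

    return [$lineA, $lineB];
}

// Made-up address, not real data.
$lineOne = 'Acme Greeting Card Supplies Warehouse'; // 37 characters
$lineTwo = 'Unit 14 Long Meadow Lane Industrial';   // 35 characters
$city    = 'Leeds';

// Order used in Address::balance(): line_two/city first, then line_one/line_two.
[$twoA, $cityA] = balanceLines($lineTwo, $city);
[$oneA, $twoA]  = balanceLines($lineOne, $twoA);

// Reversed order for comparison: line_one/line_two first, then line_two/city.
[$oneB, $twoB]  = balanceLines($lineOne, $lineTwo);
[$twoB, $cityB] = balanceLines($twoB, $city);

print_r([[$oneA, $twoA, $cityA], [$oneB, $twoB, $cityB]]);
```

With this input, the first ordering ends with all three lines under 35 characters ("Acme Greeting Card Supplies", "Warehouse Unit 14 Long Meadow Lane", "Industrial Leeds"). In the reversed ordering, line_two is still full when line_one is processed, so line_one stays at 37 characters.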

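As an example with some made-up test data, the overall result looks something like this (the balancing step itself only moves `#67-890` from line_two down to the city line):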

FOO FOOBAR
BLK 123 MAIN  STREET   EAST SQ  45 #67-890
Some   Place
123456
United Kingdom

-->

FOO FOOBAR
BLK 123 MAIN STREET EAST SQ 45
#67-890 Some Place
123456
UNITED KINGDOM

The result is not a perfectly formatted address, but it’s a good compromise to balance the lines and prevent parts of the address being cut off on the address label.

