13 Czerwca, 2021

1424 s艂贸w 7 min.

Prepare your table data to CSV/Excel exports

Prepare your table data to CSV/Excel exports

I neglected this blog a little bit 馃槃 but I don’t want to abandon it. I decided to take what Andrzej Krzywda said about blogging to my 鉂わ笍. Andrzej said that the good topic to write a blog post is to describe what you learned in your work. So today I would like to describe how I solved the problem of modifying data in tables. Yeah, I know it may sound trivial, but I created an interesting solution, and get some good feedback about it. So I decided to share it on my blog.

(This is my first blog post in English, but ultimately, all posts will be translated into English and Polish.)

There are many cases when you need to export your application data. Like for example exports to the excel or CSV file. Those formats are very popular and really simplifies the work with data. You can import them to many different programs, and tools, and make some calculations, charts, etc.

Data exporting always requires properly prepare the raw data. Maybe you need to do format decimal places in numbers, maybe do the roundings, hide some columns, translate values, etc. And how to do that? The first solution that comes to my head was just the loop. I ain’t see a better choice here, but specific implementations may be very different.

Ok, so the loop will give you access to each row, and tuple. You can just modify values in tuple, depends on value or key.
Let’s take simple data with average expenses per years, they are made up by me:

system_id price type_of_expense year
1 14.324 4 2018
2 1014.321 1 2019
3 10.3333 1 2020
4 2.0 3 2021

Of course the data are stored in simple array, so they look like this:

$data = [
    ['system_id' => 1, 'price' => 14.324, 'type_of_expense' => 4, 'year' => 2018],
    ['system_id' => 2, 'price' => 1014.321, 'type_of_expense' => 1, 'year' => 2019],
    ['system_id' => 3, 'price' => 10.3333, 'type_of_expense' => 1, 'year' => 2020],
    ['system_id' => 4, 'price' => 2, 'type_of_expense' => 3, 'year' => 2021]
];

Ok, and now we need some modifications here. First, in target export, we need to hide the system_id column. Next round prices to two decimal places, and change separator from dot to the comma. Next, replace type_of_expense to a user friendly text value.

Sounds like plan 馃挭

Let’s start with hiding the system_id column:

foreach ($data as &$row) {
    if (isset($row['system_id'])) {
        unset $row['system_id'];
    }
}

Ok, so that should work, it’s simple, so everything is fine.
Let’s try the next case, price value formatting:

foreach ($data as &$row) {
    if (isset($row['system_id'])) {
        unset $row['system_id'];
    }

    if (isset($row['price'])) {
        $row['price'] = number_format($row['price'], 2, ',', ' ');
    }
}

Ok, next case is done. Now replace the type_of_expence to real value. Let’s assume, that we are using php enum library, that I described in one of my previous posts:

foreach ($data as &$row) {
    if (isset($row['system_id'])) {
        unset $row['system_id'];
    }

    if (isset($row['price'])) {
        $row['price'] = number_format($row['price'], 2, ',', ' ');
    }

    if (isset($row['type_of_expense'])) {
        $typeOfExpense = new TypeOfExpence($row['type_of_expense']);
        $row['type_of_expense'] = $typeOfExpense->getKey();
    }
}

And, the next… Wait! I can see some problems. This code on each next case becomes more and more unreadable. The second problem is that maybe we need to do some more exports with different data. Here we just hardcoded all cases in the loop. If we will need to use this code for another case, we will have to make a second loop and duplicate some cases.

Let’s try to do something with this. This was the step when I had started refactoring, and I remembered something from devupgrade course, with S艂awek Sob贸tka and Jakub Pilimon. They told something about splitting logic into three spaces. Those spaces are “what”, “how”, and “why” something supposed to change.
In our case, “why” always depends on the column name which is a key of the row array.
For “what” we know, that the “value” needs to change. And for “how”, we have some different cases, so it will be variable thing.

Then I had started by creating a new class that will hold all cases and executes them on the specific data:

interface RowModifier
{
    public function modify(array $row): array;
}

interface ColumnModifier
{
    public function supports($key): bool;
    public function modify($value);
}

class Modifier
{
    public function __construct(
        private array $rowModifiers,
        private array $columnModifires
    ) {}

    public function modify(array $data): array
    {
        $dataClone = $data;
        foreach ($dataClone as &$row) {
            $row = $this->modifyRow($row);

            foreach ($row as $key => &$value) {
                $value = $this->modifyValue($key, $value);
            }
        }

        return $dataClone;
    }

    private function modifyRow(array $row): array
    {
        foreach ($this->rowModifiers as $rowModifier) {
            $rowClone = $rowModifier->modify($row);
        }

        return $rowClone;
    }

    private function modifyValue($key, $value)
    {
        foreach ($this->rowModifiers as $rowModifier) {
            if ($rowModifier->supports($key)) {
                $valueClone = $rowModifier->modify($value);
            }
        }

        return $valueClone;
    }
}

First I started from create two interfaces for specific modifiers. Why column, and row? Because if we need to remove the column, we need access to the entire row to unset the specific key in the array. That’s may help us in some other cases like for example value modifying that depends on few columns.
I always make a clone of the data that will be changed because I’m a fan of the immutable approach. It sometimes avoids undesirable data changes.
This class has a modify method, which is iterating each row, and tuple, and apply all predefined modifiers, that are injected by the constructor.

The ColumnModifier interface has two methods. The supports method will told us “when” to change the data. The $value argument tolds us what will be changed, and the modify method implementation will told us “how” something will be changed.

During writing this post, I had an idea. If you want to get rid of two types of modifiers, you may always put the row reference (& prefix) as a second modify parameter, that will allow you to modify entire rows. It’s not a validated idea, so be careful.

Ok, and now let’s rewrite our cases to modifiers:

class FormatPrice implements ColumnModifier
{
    public function __construct(
        private string $supportsKey
    ) {}

    public function supports($key): bool
    {
        return $this->supportsKey === $key;
    }

    public function modify($value)
    {
        return number_format($value, 2, ',', ' ')
    }
}

class RemoveColumn implements RowModifier
{
    public function __construct(
        private string $columnName
    ) {}

    public function modify(array $row)
    {
        if (isset($row[$this->columnName])) {
            unset $row[$this->columnName];
        }
    }
}

class FillEnumValue implements ColumnModifier
{
    public function __construct(
        private string $supportsKey,
        private string $enumClass
    ) {}

    public function supports($key): bool
    {
        return $this->supportsKey === $key;
    }

    public function modify($value)
    {
        return (new $this->enumClass($value))->getKey();
    }
}

All right, now it looks very clean, and readable. All of the cases are described by the class name, and all classes will do exactly one thing. The code is even extensible, by just creating new modifiers. Awesome! 馃槃 馃挭

Ok, and now let’s try to use this:

$modifier = new Modifier(
    new FormatPrice('price'),
    new RemoveColumn('system_id'),
    new FillEnumValue('type_of_expense', TypeOfExpense::class)
);

And how clear is that? At first glance, you can see what’s happened here. We have a modifier definition, that will format the price and remove the system_id column, and fill enum int with more user friendly value.

There is one more thing to improve. The new word, makes coupling to the specific implementation. I think this is a good case to use builder design pattern right there:

class ModifierBuilder
{
    private array $rowModifiers = [];
    private array $columnModifiers = [];

    public function addPriceFormatter(string $supportsKey): self
    {
        $this->columnModifiers[] = new FormatPrice($supportsKey));

        return $this;
    }

    public function addColumnRemover(string $columnName): self
    {
        $this->rowModifiers[] = new RemoveColumn($columnName);

        return $this;
    }

    public function addEnumValueFiller(string $supportsKey, string $enumClass): self
    {
        $this->rowModifiers[] = new FillEnumValue($supportsKey, $enumClass);

        return $this;
    }

    public function build(): Modifier
    {
        return new Modifier($this->rowModifiers, $this->columnModifiers);
    }
}

And just use it:

class SomeClass
{
    public function __construct(
        private ModifierBuilder $modifierBuilder
    ) {}

    public function prepareData(array $data): array
    {
        $modifier = $this->modifierBuilder
            ->addPriceFormatter('price')
            ->addColumnRemover('system_id')
            ->addEnumValueFiller('type_of_expense', TypeOfExpense::class)
            ->build();

        return $modifier->modify($data);
    }
}

The builder is a great place to make some injections, like for example, TranslatorInterface implementation to build value translator modifier:

class ModifierBuilder
{
    public function __construct(
        private TranslatorInterface $translator
    ) {}

    public function addValueTranslator(
        string $supportsKey,
        string $translationPrefix
    ): self {
        $this->columnModifiers[] = new TranslateValue(
                $this->translator,
                $supportsKey,
                $translationPrefix
            );

        return $this;
    }
}

At the and our data should look like this:

price type_of_expense year
14,32 FOOD 2018
1014,32 TAXES 2019
10,33 TAXES 2020
2,00 HOME 2021

And that’s how I solved the problem 馃槃

Photo by Lukas Blazek on Unsplash