Practical Techniques to Reduce the Harm of Active Record

last revised November 3rd, 2023, originally published October 28th, 2023

Properties of Effective Business Software
Principle: Expose Behavior, Not State
Typical Approach
Preferred Approach
Providing Guardrails
Principle: Encapsulate Local and Distributed State Changes
Typical Approach
Out of Sync Behaviors
Transactional Consistency
Preferred Approach
Encapsulate State Changes in Models
Exclusively Use Repositories and Transactional Outbox Event Dispatch
Principle: Prefer Passing Independent Variables to Methods
Typical Approach
Preferred Approach
Prefer Flat Structures
Prefer to Pass Individual Fields Instead of Data Models
Summary
Additional Resources

Data model objects such as Active Record models have many negative impacts on software:

extreme coupling of code to database schema
increased cost of isolation testing
longer lead times for code refactoring
longer lead times for database restructuring
high opportunity for creating inconsistent state

This article presents a few effective mitigation strategies to reduce the negative business impact of this approach.

For direct factual content about the short and long-term costs of using Active Record for fully automated business processes, read this thorough article.

Properties of Effective Business Software

The software that enables critical business value often has properties which are a critical liability. For example, the failure of the system to quickly adapt to changing business requirements is blamed on shortcuts that were taken in the past.

I assert that this is often an incorrect assessment.

Software often fails not because the business cut corners in order to meet business demands, but because the software was not written to support the speed of change that the business needs.

In order to fix such systems, developers secure the budget for a system rewrite. Then they end up building a system with the exact same medium and long-term life cycles.

Corners MUST be cut if the cost to change a system is too high.

Business software has a single goal: to empower the business in its mission. It generally needs to be reliable and to enable low lead times for business process changes.

While this seems to be common sense, I assert that we are not measuring our software development approach against these metrics. When we apply this measurement, our software falls short.

Just as declarative programmers discover ways to improve their code, behavioral modeling can help us to improve our data-model code. The following are some easy principles to apply to build much better systems using data model objects.

Principle: Expose Behavior, Not State

Due to massive leakage of the data model, the cost of comprehending and modifying all code coupled to data-model objects is significantly increased. Encapsulation is completely absent and its benefits are lost.

Typical Approach

ORM implementations, including but not limited to ActiveRecord, often expose the database schema as public properties on data model objects.

$order->discount_code

$order is an ActiveRecord entity and discount_code is a database column in the orders table.

Use cases are implemented as one or more “service objects” working together to mutate these model objects field by field.

$order = Order::create();

// every caller assigns the columns one by one
$order->discount_code = $discountCode;
$order->other_field = $otherField;
$order->something_id = $somethingId;
$order->other_id = $otherId;
$order->timestamp = $timestamp;

return $order;

Preferred Approach

Instead of assigning values to public properties, define life cycle behaviors.

$order = Order::place(
    $discountCode,
    $otherField,
    $somethingId,
    $otherId,
    $timestamp
);

class Order extends ORM
{
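    public static function place(
        $discountCode,
        $otherField,
        $somethingId,
        $otherId,
        $timestamp
    ): self {
        $order = Order::create();
        
        // column assignment now lives in one place
        $order->discount_code = $discountCode;
        $order->other_field = $otherField;
        $order->something_id = $somethingId;
        $order->other_id = $otherId;
        $order->timestamp = $timestamp;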
        
        return $order;
    }
    
    public function pay($paidAt): void
    {
        // assign paid at timestamp
    }
}

Define ‘state change’ methods on the models and encapsulate the implementation details related to these behaviors.

This pays off in a few ways. It’s easier to build models to the correct state in both production and testing scopes. It also disincentivizes deeply nested control structures.

Testing in isolation improves because test cases no longer need to manually assign many fields, which can become out of sync with production systems.

It’s now possible to construct entity state for tests using production code.

$order = Order::place(...);
$order->finalize(...);
$order->pay(...);

The important behavior is encapsulated comfortably inside the object boundary, and if the production code changes, the tests won’t need to be updated.

Providing Guardrails

Other developers still have access to the public properties. This is part of the negative cost of Active Record. Some implementations support removal of public property access. Make use of this feature to gain the benefits of encapsulation.

Disable access to public properties and force usage of behavioral methods.

This reduces the leakage of database schemas in the application and emphasizes behaviors over direct state manipulation.
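
As a minimal sketch of these guardrails, the following uses a plain PHP class rather than a real ORM model. That is an assumption made so the example runs on its own; how a given Active Record implementation restricts property access varies. State is private, so place() and pay() are the only ways to change it. The isPaid() method, the already-paid guard, and the sample values are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical plain-PHP stand-in for an ORM model; no ORM base class is used.
final class Order
{
    // State is private, so callers cannot assign columns directly.
    private ?DateTimeImmutable $paidAt = null;

    private function __construct(
        private string $discountCode,
        private DateTimeImmutable $placedAt,
    ) {
    }

    // Life cycle behavior: the only way to create an order.
    public static function place(string $discountCode, DateTimeImmutable $placedAt): self
    {
        return new self($discountCode, $placedAt);
    }

    // Life cycle behavior: the only way to mark an order as paid.
    public function pay(DateTimeImmutable $paidAt): void
    {
        if ($this->paidAt !== null) {
            throw new LogicException('Order is already paid.');
        }

        $this->paidAt = $paidAt;
    }

    public function isPaid(): bool
    {
        return $this->paidAt !== null;
    }
}

// Tests and production code build state through the same behaviors.
$order = Order::place('SPRING10', new DateTimeImmutable('2023-10-28 09:00:00'));
$order->pay(new DateTimeImmutable('2023-10-28 09:05:00'));

// Direct state manipulation is rejected by the language itself.
$blocked = false;
try {
    $order->paidAt = null;
} catch (Error $e) {
    $blocked = true;
}
```

Because the guard lives inside pay(), every caller gets it for free, and nothing outside the class depends on the column names.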

Principle: Encapsulate Local and Distributed State Changes

Increasingly we implement message-driven systems in order to handle concerns such as scale, complexity, governance, and more. It’s critical that we ensure both local and distributed state changes occur correctly and atomically.

Typical Approach

Active Record models utilize their direct access to a database handle to store changes. Event dispatchers are used to dispatch messages to a bus.

$order->save();

$this->eventDispatcher->dispatch(
    new OrderWasPaid(...)
);

Another common approach is to use “observers” to react to state changes.

public function onPaid(Order $order): void
{
    $this->events->dispatch(
        new OrderWasPaid(...)
    );
}

There are issues with both approaches.

Multiple behavioral routes often result in similar conclusions. For example, an order can become PAID when a customer “checks out” on the website or when a recurring payment is charged automatically at the end of every month.

Out of Sync Behaviors

When multiple routes arrive at the same conclusion, they must duplicate behavior. The dispatch of the events or correct assignment of fields may become out of sync. This chance is increased as new developers are added to the project.

Transactional Consistency

If the system fails between the storage of local data and the dispatch of events, critical message consumers will not receive the event. Important business processes may not be triggered.

It’s possible for an Order to become paid without the order fulfillment processes being triggered. The customer now must contact support and ask why their purchase stalled.

Preferred Approach

To mitigate these issues, prefer the following techniques.

Encapsulate State Changes in Models

Couple and encapsulate relational and distributed state changes.

Don’t:

mutate the order model
call save()
then dispatch an event

Instead, encapsulate the distributed state change within the data model itself, so that there is only one entry point for this state change.

public function pay($paidAt): void
{
    // assign paid at timestamp
    $this->recordedEvents[] = new OrderWasPaid(...);
}

public function flushEvents(): Collection
{
    $eventsToFlush = $this->recordedEvents;
    $this->recordedEvents = [];

    return Collection::of($eventsToFlush);
}

Now we have encapsulated both local state changes and distributed state changes behind a single entry point, making it impossible to pay an order without buffering the correct event for dispatch.
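
To make this concrete, here is a self-contained sketch in which two behavioral routes (a checkout and a recurring charge) both go through pay(). It assumes a plain PHP class instead of an ORM model and returns a plain array from flushEvents() instead of a Collection. The two route functions and the event's constructor arguments are hypothetical.

```php
<?php
declare(strict_types=1);

final class OrderWasPaid
{
    public function __construct(
        public readonly string $orderId,
        public readonly DateTimeImmutable $paidAt,
    ) {
    }
}

// Hypothetical plain-PHP stand-in for the Order model.
final class Order
{
    private ?DateTimeImmutable $paidAt = null;

    /** @var list<object> */
    private array $recordedEvents = [];

    public function __construct(private readonly string $id)
    {
    }

    public function pay(DateTimeImmutable $paidAt): void
    {
        // The local state change and the buffered event happen together.
        $this->paidAt = $paidAt;
        $this->recordedEvents[] = new OrderWasPaid($this->id, $paidAt);
    }

    /** @return list<object> */
    public function flushEvents(): array
    {
        $eventsToFlush = $this->recordedEvents;
        $this->recordedEvents = [];

        return $eventsToFlush;
    }
}

// Route 1 (hypothetical): the customer checks out on the website.
function checkOut(Order $order, DateTimeImmutable $now): void
{
    $order->pay($now);
}

// Route 2 (hypothetical): a recurring payment is charged automatically.
function chargeRecurringPayment(Order $order, DateTimeImmutable $now): void
{
    $order->pay($now);
}

$now = new DateTimeImmutable('2023-10-28 09:05:00');

$websiteOrder = new Order('order-1');
checkOut($websiteOrder, $now);
$websiteEvents = $websiteOrder->flushEvents();

$recurringOrder = new Order('order-2');
chargeRecurringPayment($recurringOrder, $now);
$recurringEvents = $recurringOrder->flushEvents();
```

Neither route knows that an event exists, so neither route can forget to record it.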

Exclusively Use Repositories and Transactional Outbox Event Dispatch

Do not use the Active Record save() method outside of repositories. Implement a repository interface.

interface OrderRepository
{
    public function findById($orderId): Order;
    public function store(Order $order): void;
}

Within the repository implementation:

begin a database transaction
store the local state changes
dispatch the pending events to an outbox table
commit the database transaction

class OrmOrderRepository implements OrderRepository
{
    public function __construct(
        private readonly OutboxEventDispatcher $outboxEventDispatcher,
    ) {
    }

    public function store(
        Order $order
    ): void {
        /* 1. Open a database transaction */
        DB::transaction(function() use ($order) {
            /* 2. Store the model */
            $order->save();
            
            /* 3. Dispatch Events to a relational db table. */
            $this->outboxEventDispatcher->dispatch(
                $order->flushEvents()
            );
        });
    }
}

If you aren’t familiar with the outbox pattern, I recommend immediately learning about transactional outbox dispatch from Frank de Jonge. The core concept is that events are stored in a database table in the same commit as the local state changes. An external “relayer” process then reads from this table and dispatches the events to a message bus, resulting in improved resiliency.
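
The relayer side of the pattern could be sketched roughly as follows. This is not the API of any particular outbox library: the InMemoryOutbox class, the relay() function, and the $publish callback are hypothetical names, and an in-memory array stands in for the outbox table. The point it illustrates is that a row is marked as dispatched only after the publish call succeeds.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory outbox; a real one would be a database table.
final class InMemoryOutbox
{
    /** @var array<int, array{payload: string, dispatched: bool}> */
    private array $rows = [];

    public function append(string $payload): void
    {
        $this->rows[] = ['payload' => $payload, 'dispatched' => false];
    }

    /** @return array<int, string> pending payloads keyed by row id */
    public function pending(): array
    {
        $pending = [];
        foreach ($this->rows as $id => $row) {
            if (!$row['dispatched']) {
                $pending[$id] = $row['payload'];
            }
        }

        return $pending;
    }

    public function markDispatched(int $id): void
    {
        $this->rows[$id]['dispatched'] = true;
    }
}

// One relay pass: publish pending rows in order and return how many were sent.
function relay(InMemoryOutbox $outbox, callable $publish): int
{
    $sent = 0;
    foreach ($outbox->pending() as $id => $payload) {
        try {
            $publish($payload);
        } catch (RuntimeException $e) {
            break; // bus unavailable: the row stays pending for the next pass
        }

        $outbox->markDispatched($id);
        $sent++;
    }

    return $sent;
}

$outbox = new InMemoryOutbox();
$outbox->append('OrderWasPaid:order-1');

// Simulated message bus that can be switched between down and up.
$busIsUp = false;
$delivered = [];
$publish = function (string $payload) use (&$busIsUp, &$delivered): void {
    if (!$busIsUp) {
        throw new RuntimeException('message bus unreachable');
    }
    $delivered[] = $payload;
};

$firstPass = relay($outbox, $publish);  // bus is down, nothing is sent
$busIsUp = true;
$secondPass = relay($outbox, $publish); // bus has recovered, the event goes out
```

In this sketch a crash between the publish call and markDispatched() would cause the same event to be published again on the next pass, so consumers should tolerate duplicates.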

Exclusively using this OrderRepository provides important guarantees.

If a transaction state change occurs, the related event WILL be stored in the outbox table for dispatch. This is true no matter how many use cases trigger pay behavior.
Event dispatch isn’t subject to network connectivity errors between the application and message bus. If an error occurs, the messages will be dispatched once the system recovers.
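
The first guarantee can be illustrated with a small sketch. It assumes an in-memory stand-in for the database whose transaction() method restores a snapshot on failure; InMemoryDatabase, InMemoryOrderRepository, the failOnTable test hook, and the array-shaped events are all hypothetical and replace the ORM, DB facade, and outbox dispatcher used above.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a relational database that supports transactions.
final class InMemoryDatabase
{
    /** @var array<string, list<array<string, string>>> */
    private array $tables = ['orders' => [], 'outbox' => []];

    // Test hook: writes to this table fail, simulating a crash mid-transaction.
    public ?string $failOnTable = null;

    public function transaction(callable $work): void
    {
        $snapshot = $this->tables; // PHP arrays are copied by value
        try {
            $work();
        } catch (Throwable $e) {
            $this->tables = $snapshot; // roll back every write made by $work
            throw $e;
        }
    }

    public function insert(string $table, array $row): void
    {
        if ($table === $this->failOnTable) {
            throw new RuntimeException("write to {$table} failed");
        }
        $this->tables[$table][] = $row;
    }

    public function count(string $table): int
    {
        return count($this->tables[$table]);
    }
}

final class Order
{
    /** @var list<array<string, string>> */
    private array $recordedEvents = [];

    public function __construct(private readonly string $id)
    {
    }

    public function id(): string
    {
        return $this->id;
    }

    public function pay(string $paidAt): void
    {
        $this->recordedEvents[] = [
            'type' => 'OrderWasPaid',
            'order_id' => $this->id,
            'paid_at' => $paidAt,
        ];
    }

    public function flushEvents(): array
    {
        $events = $this->recordedEvents;
        $this->recordedEvents = [];

        return $events;
    }
}

final class InMemoryOrderRepository
{
    public function __construct(private readonly InMemoryDatabase $db)
    {
    }

    public function store(Order $order): void
    {
        // The order row and its outbox rows are written in one transaction.
        $this->db->transaction(function () use ($order): void {
            $this->db->insert('orders', ['id' => $order->id()]);
            foreach ($order->flushEvents() as $event) {
                $this->db->insert('outbox', $event);
            }
        });
    }
}

$db = new InMemoryDatabase();
$repository = new InMemoryOrderRepository($db);

$order = new Order('order-1');
$order->pay('2023-10-28T09:05:00Z');
$repository->store($order); // one orders row and one outbox row, committed together
```

Setting failOnTable to 'outbox' before calling store() leaves both tables empty, which is the behavior the first guarantee relies on: either the state change and its event are both stored, or neither is.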

Principle: Prefer Passing Independent Variables to Methods

Because data models carry an incredible amount of context, they make poor function arguments. Unit test suites suffer in applications that make heavy use of data model arguments, and the resulting systems resist refactoring.

Typical Approach

Often, functions accept data models, read or write any of their fields, and then pass the models on for further mutation.

class FirstBehavior
{
    public function do(...): Model
    {
        $entity = $this->createEntity(...);
        
        // other validations
        // other assignments
        
        return $this->secondBehavior->do($entity);
    }
}

class SecondBehavior
{
    public function do(Model $entity): Model
    {
        // do stuff using the $entity
    }
}

While this is sometimes desirable or unavoidable, it comes at a cost.

Unit test suites must build a data model which matches the function’s expectations.
The code becomes resistant to refactoring as functions become part of chains of dependent events.

Preferred Approach

Avoid passing data models to functions where practical.

Prefer Flat Structures

If we receive a request for an order, we first process the order, then we process a charge for the order.

The following example couples the processing of the order with the processing of the charge using nesting.

Nesting Calls
-> ProcessOrder -> ProcessCharge

class ProcessOrder
{
    public function process(...): Order
    {
        $order = Order::create(...);
        
        // validations
        // mutations
        // decisions  
     
        $this->processCharge->process($order);
        
        return $order;
    }
}

class ProcessCharge
{
    public function process(...): Charge
    {
        // ...
        return $charge;
    }
}

The following example processes the order and the charge independently, utilizing a third scope (implemented as a command handler) to represent the whole of the process.

Instead of tracing through ProcessOrder and following it to ProcessCharge in order to understand the steps of this business process, we now have a single location which represents the process.

Instead of using nested calls, we have made the process more flat.

class ProcessOrder
{
    public function process(...): Order
    {
        // ...
        return $order;
    }
}

class ProcessCharge
{
    public function charge(...): Charge
    {
        // ...
        return $charge;
    }
}

class MakeOrderCommandHandler
{
    public function handle(MakeOrder $command): void
    {
        $order = $this->processOrder->process(...);
        $charge = $this->processCharge->charge(...);
    }
}

Flat Calls
-> ProcessOrder
-> ProcessCharge

The coupling between ProcessOrder and ProcessCharge has been removed from those classes and relocated to MakeOrderCommandHandler.
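
A runnable version of the flat arrangement might look like the sketch below. The fields on the MakeOrder command, the pricing arithmetic, and the choice to return the Charge from handle() (so the result can be inspected) are assumptions made for illustration. ProcessCharge receives only an order id and an amount, so it has no knowledge of ProcessOrder or of the Order model.

```php
<?php
declare(strict_types=1);

final class Order
{
    public function __construct(
        private readonly string $id,
        private readonly int $totalInCents,
    ) {
    }

    public function id(): string
    {
        return $this->id;
    }

    public function totalInCents(): int
    {
        return $this->totalInCents;
    }
}

final class Charge
{
    public function __construct(
        public readonly string $orderId,
        public readonly int $amountInCents,
    ) {
    }
}

// Hypothetical command carrying the request data.
final class MakeOrder
{
    public function __construct(
        public readonly string $orderId,
        public readonly int $unitPriceInCents,
        public readonly int $quantity,
    ) {
    }
}

final class ProcessOrder
{
    public function process(string $orderId, int $unitPriceInCents, int $quantity): Order
    {
        return new Order($orderId, $unitPriceInCents * $quantity);
    }
}

final class ProcessCharge
{
    // Accepts only the fields it needs, not the Order model.
    public function charge(string $orderId, int $amountInCents): Charge
    {
        if ($amountInCents <= 0) {
            throw new InvalidArgumentException('Charge amount must be positive.');
        }

        return new Charge($orderId, $amountInCents);
    }
}

// The single location that represents the whole business process.
final class MakeOrderCommandHandler
{
    public function __construct(
        private readonly ProcessOrder $processOrder,
        private readonly ProcessCharge $processCharge,
    ) {
    }

    public function handle(MakeOrder $command): Charge
    {
        $order = $this->processOrder->process(
            $command->orderId,
            $command->unitPriceInCents,
            $command->quantity,
        );

        return $this->processCharge->charge($order->id(), $order->totalInCents());
    }
}

$handler = new MakeOrderCommandHandler(new ProcessOrder(), new ProcessCharge());
$charge = $handler->handle(new MakeOrder('order-1', 1250, 2));
```

Each step can now be tested on its own, and reordering or inserting a step only touches the handler.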

Prefer to Pass Individual Fields Instead of Data Models

The following examples show the difference between passing data models as arguments versus individual fields.

function makeDecision(Order $order): string
{
    if (
        PaymentMethod::CreditCard()->equals($order->paymentMethod())
    ) {
        // do thing
        return 'outcome 1';
    }
    
    // do other thing
    return 'outcome 2';
}

function usingArgument(PaymentMethod $paymentMethod): void
{
    if (PaymentMethod::CreditCard()->equals($paymentMethod))
    {
        // ...
    }
}

Some benefits of only passing the necessary arguments:

Testing in isolation is far simpler. There’s no need to create dummy models or to interpret the state that they need to be in.
The tests become less brittle. They’re coupled to fewer vectors for change.
It encourages a flatter, easier to comprehend design.
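
The testing benefit is easiest to see in a small, self-contained sketch. It assumes a native PHP enum as a stand-in for the PaymentMethod value object used above; the decideOutcome() name and the BankTransfer case are hypothetical.

```php
<?php
declare(strict_types=1);

// Native enum used as a stand-in for the PaymentMethod value object.
enum PaymentMethod
{
    case CreditCard;
    case BankTransfer;
}

// Depends on exactly one value, so that is all it asks for.
function decideOutcome(PaymentMethod $paymentMethod): string
{
    if ($paymentMethod === PaymentMethod::CreditCard) {
        return 'outcome 1';
    }

    return 'outcome 2';
}

// No Order model has to be built or put into a particular state.
$creditCardOutcome = decideOutcome(PaymentMethod::CreditCard);
$bankTransferOutcome = decideOutcome(PaymentMethod::BankTransfer);
```

If the orders table gains or loses columns, neither this function nor its tests need to change.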

Summary

These are just some mitigation patterns for improving reliability and reducing lead-time for changes.