Inspired by Felix Geisendorfer blog post I implemented a database FSM (Finite-State Machine) with Postgresql. I brought some improvements to Felix’s implementation but before reading the following I recommend you to read carefully the original post.

TL;DR Here are the changes I did:

  • Introduced a versioning control of the FSM.
  • Optimized performances and data storage.
  • Added some protection against user mistakes

To keep it simple I’ll use exactly the same FSM graph than Felix: an ordering process with payment and shipment steps.

Types optimization

First I replaced the storage of states/events from text to enum. Because I have a finite number of states and transitions I can properly store them as custom enum. It reduces the storage of these to a lighter and constant 4 bytes.

CREATE TYPE order_state AS ENUM (
  'start',
  'awaiting_payment',
  'awaiting_shipment',
  'awaiting_refund',
  'shipped',
  'canceled',
  'error'
);
CREATE TYPE order_event AS ENUM (
  'create',
  'pay',
  'ship',
  'refund',
  'cancel'
);

If later I have to add new state or event I can easily do:

ALTER TYPE order_event ADD VALUE 'my_new_event';

Obviously it impacts the order_events table:

CREATE TABLE order_events (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  order_id uuid NOT NULL DEFAULT uuid_generate_v4(),
  event order_event NOT NULL DEFAULT 'create',
  time timestamp DEFAULT now() NOT NULL
);

As you noticed I replaced the order_id type from INT to uuid. In brief because it’s out of scope: I never use serial for fields going out of database.

Transition mechanism improvements

In his implementation, Felix Geisendorfer is using a switch statement to implement the state transition. It’s a fine and straightforward solution. However, if you have more states/events it could become hard to maintain. Furthermore if you want to version your FSM, you’ll often have to CREATE OR REPLACE FUNCTION. Instead I created a mapping table with 3 columns:

  • state: the current state
  • event: the event which causes the transition
  • next_state: the resulting state
CREATE TABLE order_events_transitions (
  state      order_state NOT NULL,
  event      order_event NOT NULL,
  next_state order_state NOT NULL,
  PRIMARY KEY (state, event, next_state)
);

AN: I can limit transitions to only one path for two given states with PRIMARY KEY (state, event).

Any transition which is not in this table will be resolved as error state in our new transition function:

CREATE FUNCTION order_events_transition(_state order_state, _event order_event)
RETURNS order_state LANGUAGE sql AS $$
SELECT COALESCE(
  (SELECT next_state FROM order_events_transitions WHERE state=_state AND event=_event),
  'error'::order_state
);
$$;

AN: There is maybe a better way then “COALESCE” to implement the “default”…

OK! It’s now time to write valid transitions:

INSERT INTO order_events_transitions VALUES
  ('start',             'create', 'awaiting_payment' ),
  ('awaiting_payment',  'pay',    'awaiting_shipment'),
  ('awaiting_payment',  'cancel', 'canceled'         ),
  ('awaiting_shipment', 'cancel', 'awaiting_refund'  ),
  ('awaiting_shipment', 'ship',   'shipped'          ),
  ('awaiting_refund',   'refund', 'canceled'         );

FSM graph versioning

In real life my use cases and business processes are moving (fast). Probably yours as well. Since the FSM I build is the implementation of this processes, each change IRL impacts my database design.

For instance a state awaiting_approval could be added before allowing to pay.

If I change my transition function and table without taking care of past order_events, I will completly break my FSM. Indeed because I’m moving the create event, all the already recorded transitions will raise an error state. It’s not maintainable!

How to handle multiple versions of my FSM graph ?

Introducing version control in my design will solve this issue.

First I create a table to store the versions and their status:

CREATE TYPE order_fsm_version_status AS ENUM (
  'live',
  'deprecated',
  'obsolete'
);
CREATE TABLE order_fsm_versions(
  version integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status  order_fsm_version_status NOT NULL DEFAULT 'live'
);

In the meanwhile I create a function to get the last FSM up-and-running version:

CREATE FUNCTION order_fsm_last_version()
RETURNS integer LANGUAGE sql AS $$
SELECT version FROM order_fsm_versions WHERE status='live' ORDER BY version DESC LIMIT 1;
$$;

Both order_events and order_events_transitions will need to reference this version:

CREATE TABLE order_events (
  id       bigint      GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  order_id uuid        NOT NULL DEFAULT uuid_generate_v4(),
  event    order_event NOT NULL DEFAULT 'create',
  version  integer     NOT NULL DEFAULT order_fsm_last_version(),
  time     timestamp   NOT NULL DEFAULT now(),
  FOREIGN KEY (version) REFERENCES order_fsm_versions(version)
);

CREATE TABLE order_events_transitions (
  state      order_state NOT NULL,
  event      order_event NOT NULL,
  version    integer     NOT NULL DEFAULT order_fsm_last_version(),
  next_state order_state NOT NULL,
  PRIMARY KEY (state, event, version, next_state),
  FOREIGN KEY (version) REFERENCES order_fsm_versions(version)
);

The transition function and its aggregate also have to take care of this version number:

CREATE FUNCTION order_events_transition(
  _state order_state,
  _event order_event,
  _version integer DEFAULT order_fsm_last_version()
) RETURNS order_state LANGUAGE sql AS $$
  SELECT COALESCE(
    (SELECT next_state
     FROM order_events_transitions
     WHERE state=_state AND event=_event AND version=_version),
    'error'::order_state);
$$;
CREATE AGGREGATE order_events_fsm(order_event, integer) (
  SFUNC = order_events_transition,
  STYPE = order_state,
  INITCOND = 'start'
);

And finally we have to restrict some transitions according to version status. It is done in the order_events_trigger:

CREATE FUNCTION order_events_trigger_func() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
  next_state order_state;
  transition_status order_fsm_version_status;
BEGIN
  SELECT status FROM order_fsm_versions WHERE version=new.version INTO transition_status;
  IF transition_status = 'deprecated'::order_fsm_version_status THEN
    RAISE NOTICE 'version % is deprecated', new.version;
  END IF;
  IF transition_status = 'obsolete'::order_fsm_version_status THEN
    RAISE EXCEPTION 'version % is obsolete', new.version;
  END IF;

  SELECT order_events_fsm(event, version ORDER BY id)
  FROM (
    SELECT id, event, version FROM order_events WHERE order_id = new.order_id
    UNION
    SELECT new.id, new.event, new.version
  ) s
  INTO next_state;

  IF next_state = 'error'::order_state THEN
    RAISE EXCEPTION 'invalid order event';
  END IF;

  RETURN new;
END
$$;

I can now introduce new versions of my FSM graph and deprecate previous one. When I insert row in order_events it is using by default the last live version of the graph.

INSERT INTO order_fsm_versions (version, status) VALUES (2, 'live') RETURNING *;
UPDATE order_fsm_versions SET status='deprecated' WHERE version = 1;

INSERT INTO order_events_transitions (state, event, next_state, version) VALUES
  ('start',             'create',  'awaiting_approval', 2),
  ('awaiting_approval', 'approve', 'awaiting_payment',  2),
  ('awaiting_payment',  'pay',     'awaiting_shipment', 2),
(...)

INSERT INTO order_events (order_id, event, time) VALUES
  ('0d687d74-bab3-4c76-beed-0f55ec8a3af2', 'create',  '2017-07-23 00:00:00'),
  ('0d687d74-bab3-4c76-beed-0f55ec8a3af2', 'approve', '2017-07-23 00:00:00'),
  ('0d687d74-bab3-4c76-beed-0f55ec8a3af2', 'pay',     '2017-07-23 12:00:00');

Conclusion

I wrote a small SQL script which is testing all these features. I hope it’s useful even if you can’t use it out of the box. Let me know if you have feedback.

Finally, keep in mind that it is still experimental. Consider it as a POC and not like something production-ready!

R.