The Neos CR (Content Repository) is the piece of functionality that Neos relies on to build its content tree. Without the Neos CR, no content could be created and no page could be rendered. It has worked very well for us in the past and still does, but we find it increasingly hard to add the features we envision for Neos in the coming years. For quite some time, we have been experimenting with the CQRS and Event Sourcing patterns to tackle these issues.
To gain deeper insights, we met with Mathias Verraes, locked him and ourselves in a room at DE-CIX for two days, and thought about how to remodel the Neos CR using CQRS/ES. We went into the workshop with lots of ideas and just as many questions, and we wanted to challenge the experiments we had built so far. And we were challenged (in a great way)!
DDD and CQRS
Neos and Flow have been among the loudest advocates for Domain-Driven Design for almost as long as it has existed, and the paradigm is deeply rooted in the way Flow and Neos work. Flow is a DDD framework at heart, and consequently many Flow / Neos developers are very comfortable thinking in these patterns. Designing systems using entities, repositories, services and such is something that comes very naturally to us.
Because of that, discovering just how different the write and read side of the data model can be was, at first, surprising for us. Before the workshop, we had used our DDD-inspired understanding of aggregates as "collections" of entities. We had built quite large aggregates that dealt with every issue related to the things we considered entities before. This approach, however, resulted in large classes which basically replicated what had been done with entities before. Mathias encouraged us to make aggregates as small as possible, and have them deal with exactly one thing - such as claiming a unique URL. A very basic, but all the more important insight was that not everything has to be in an aggregate.
"CQRS aggregates are not entities. They are boundaries that encapsulate a problem space and deal with exactly one thing."
With our new understanding, we are now able to design systems that have an extremely decoupled write model. The write model should be "pure", which means we aim towards building small components that do not have side effects. We use aggregates only to ensure that hard constraints are met, such as "a user name must be unique". Aggregates are the places where the inherent asynchronicity in CQRS/ES systems is resolved, by making each of them responsible for exactly one specific use case. A "UsernameUniquenessAggregate" would know about every event relevant to it and, based on its aggregate history, would respond to a new request in the correct way - in our example, by allowing the assignment of the username or denying it.
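To make the idea of a small, single-purpose aggregate more concrete, here is a minimal sketch in Python. It is not taken from the Neos code base: the event classes `UsernameClaimed` and `UsernameReleased`, the exception `UsernameAlreadyTaken` and the method names are assumptions made up for illustration. The aggregate rebuilds its state from its own event history and decides one thing only, namely whether a username may be claimed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsernameClaimed:  # hypothetical event
    username: str
    user_id: str


@dataclass(frozen=True)
class UsernameReleased:  # hypothetical event
    username: str


class UsernameAlreadyTaken(Exception):
    """Raised when the hard constraint 'usernames are unique' would be violated."""


class UsernameUniquenessAggregate:
    def __init__(self, history=()):
        # State is derived purely from the events relevant to this aggregate.
        self._owners = {}
        for event in history:
            self._apply(event)

    def _apply(self, event):
        if isinstance(event, UsernameClaimed):
            self._owners[event.username] = event.user_id
        elif isinstance(event, UsernameReleased):
            self._owners.pop(event.username, None)

    def claim(self, username, user_id):
        # The one decision this aggregate is responsible for.
        if username in self._owners:
            raise UsernameAlreadyTaken(username)
        event = UsernameClaimed(username, user_id)
        self._apply(event)
        return event  # the caller would append this to the event store
```

Note that the aggregate knows nothing about users beyond their ID; everything else about a user can live outside of it.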
Constraints, Hard and Soft
A new way of thinking about constraints quickly emerged. From our discussion about aggregates, we learned that there is, in reality, a surprisingly small number of "hard" constraints in our system. Hard constraints are things that absolutely, positively need to happen in a certain way (such as assigning a username to only one user) to ensure the consistency of the system. There are, however, a great many things that only seem to be hard constraints at first sight, but actually are not.
How bad is it really, for example, if a node is renamed by two users at the same time? Do we need an aggregate to ensure that these events happen in a specific order, and are they dependent upon each other? They are not, of course. The node will be renamed and have either one or the other name - no matter what happens, we can only fulfill the request of one user in case of conflicting operations.
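The rename case can be sketched without any aggregate at all. In the following Python snippet, which assumes a hypothetical `NodeRenamed` event and a hypothetical projection function, both rename events are simply recorded, and the read side ends up with whichever name was recorded last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRenamed:  # hypothetical event
    node_id: str
    new_name: str


def project_node_names(events):
    """Build a node_id -> name read model; a later event overwrites an earlier one."""
    names = {}
    for event in events:
        names[event.node_id] = event.new_name
    return names


# Two users rename the same node at (almost) the same time.
# Both events are accepted; no ordering guarantee is needed.
events = [NodeRenamed("n1", "about"), NodeRenamed("n1", "about-us")]
names = project_node_names(events)
```

Whichever event is stored last wins, which is the same outcome a user would see if the two requests had arrived a minute apart.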
Consistency - Eventually
Thinking about constraints necessarily leads to thinking about consistency. How bad is it, really, if the system is not immediately consistent - that is, if a read directly after a write does not yet reflect that write? Your first instinct would be to respond "very!", right? We see things a bit differently now. Is it really bad if the search index is not updated immediately, or even if it takes a few seconds after a user clicked "publish" until their changes are actually available to other users? Even our username example from above might not be a hard constraint after all, if we implement the system so that it can deal with such cases. Looking at consistency in a new way allows us to loosen constraints, which leads to reduced complexity in the write model.
A Place for Cross-Cutting Concerns
We furthermore discussed cross-cutting concerns such as security or logging. In a CQRS system, the place to handle them is the command pipeline. The command pipeline is where commands are put before a command handler processes them. Services (such as an authorization service) can subscribe to the command pipeline to perform additional checks or execute functionality on commands before they are turned into events. However, since these services only have access to the projections, and therefore to the read side of the model, there is no guarantee that the data they read is up to date at that very moment - there could have been an event that changes that exact piece of data, but it hasn't been projected yet. Therefore, only soft constraints should be handled by services that subscribe to the command pipeline.
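As a rough illustration of such a pipeline, consider the following Python sketch. All names in it (`CommandPipeline`, `PublishNode`, `CommandRejected`, `make_authorization_check`, `handle_publish`) are hypothetical and not part of Flow or Neos. Subscribers see each command before the handler does, and the authorization check reads from a projection, so it can only ever be as current as that projection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishNode:  # hypothetical command
    user_id: str
    node_id: str


class CommandRejected(Exception):
    pass


class CommandPipeline:
    def __init__(self, handler):
        self._handler = handler
        self._subscribers = []

    def subscribe(self, check):
        self._subscribers.append(check)

    def dispatch(self, command):
        # Every subscriber may veto the command by raising an exception.
        for check in self._subscribers:
            check(command)
        # Only then is the command turned into events by its handler.
        return self._handler(command)


def make_authorization_check(permissions_projection):
    """permissions_projection: read-model dict of user_id -> set of permissions."""
    def check(command):
        if "publish" not in permissions_projection.get(command.user_id, set()):
            raise CommandRejected(f"{command.user_id} may not publish")
    return check


def handle_publish(command):
    # Stand-in for a real command handler; returns the resulting events.
    return [("NodePublished", command.node_id)]
```

A logging service could subscribe in exactly the same way; it would simply record the command instead of raising.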
If you've followed along, you probably just raised an eyebrow. Authorization as a soft constraint? There's no guarantee that it will work as expected? What a mess!
Well, no. Eventual consistency does not mean that soft constraints are ensured "sometimes", or "maybe". It means that they are always ensured once the system is in a consistent state.
There can be a gap - when an event has been recorded, but the read model has not been updated yet - where a soft constraint cannot be enforced, but the important thing to remember is that once the read model has been updated, it will be. Now, we need to think about the gap - is it very important that, if an administrator takes away your permission, you are not able to perform an action if you try it 100ms later? In most scenarios, the answer is probably no. Once the system is consistent again - maybe that takes 200ms, maybe a second - soft constraints will be enforced.
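The gap can be made visible in a few lines of Python. This is a deliberately simplified sketch under our own assumptions: the event log is a plain list, events are `(kind, user_id)` tuples, and `PermissionProjection` is a hypothetical read model that only changes when `catch_up` is called.

```python
class PermissionProjection:
    def __init__(self):
        self._allowed = set()
        self._position = 0  # number of events from the log already projected

    def catch_up(self, event_log):
        # Apply only the events that have not been projected yet.
        for kind, user_id in event_log[self._position:]:
            if kind == "PermissionGranted":
                self._allowed.add(user_id)
            elif kind == "PermissionRevoked":
                self._allowed.discard(user_id)
        self._position = len(event_log)

    def is_allowed(self, user_id):
        return user_id in self._allowed


event_log = [("PermissionGranted", "alice")]
projection = PermissionProjection()
projection.catch_up(event_log)

# An administrator revokes the permission: the event is recorded...
event_log.append(("PermissionRevoked", "alice"))
# ...but the read model has not been updated yet. This is the gap.
allowed_during_gap = projection.is_allowed("alice")

# Once the projection has caught up, the soft constraint is enforced again.
projection.catch_up(event_log)
allowed_after_catch_up = projection.is_allowed("alice")
```

Here `allowed_during_gap` is `True` and `allowed_after_catch_up` is `False`: the check is never "maybe", it just runs against a read model that lags behind by a short, bounded amount.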
Our heads are still spinning a bit from all the new things we learned during our workshop, so we'll probably need a few days to digest them. Mathias, thank you very much for discussing with us and bringing our understanding of CQRS/ES to the next level. This will help us a great deal in our goal to rewrite the Neos CR as an event-sourced, decoupled application that will hopefully allow us to implement the features we envision for Neos in the future.
Additionally, join the Neos Conference 2017 if this is interesting to you; we'll cover CQRS and Event Sourcing in depth there!