In any system that contains multiple services, there are a number of different ways to segregate services and the contexts (objects) over which they operate. Whether you're actively doing domain modeling with bounded contexts or shooting from the hip, eventually you're probably going to have a situation where multiple services care about the same object – or projections of the same object.
Think of a profile service and a billing service. Both might be interested in the customer object, and both might need to know when the name changes. The customer might go into their profile on the public website and change their name, or they might call into the billing department and tell an agent about their name change (say, if they get married for instance). It’s not unreasonable for the profile service to back the public website and the billing service to back the software that the billing team uses.
What follows is an architecture that I've worked on with our enterprise architect at my current client. It allows services to maintain their own projections of entities that cross bounded contexts/services while staying eventually consistent with each other.
Publish the event; Query the change
The idea is pretty straightforward: each service has a database where it keeps its entities, and whenever it needs to update one of those entities, it pushes that change to the service that is the source-of-truth for that particular entity. From there, the source-of-truth service will publish a change event that is consumed by all other subscribers via a fanout or topic exchange.
The change event contains the id of the entity that was changed, and subscribers then query for the latest state of the entity from the source-of-truth service.
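The shape of that thin event, and a subscriber's reaction to it, can be sketched like this. This is a minimal in-memory stand-in for the exchange and for Billing's GET endpoint; the names (`CustomerChanged`, `fetch_customer`, `SOURCE_OF_TRUTH`) are illustrative, not from any real implementation:

```python
from dataclasses import dataclass

# The change event is deliberately thin: it carries only the entity's id,
# never the changed fields, so subscribers always query for the latest state.
@dataclass(frozen=True)
class CustomerChanged:
    customer_id: int

# Stand-in for Billing's GET /customers/{id} endpoint.
SOURCE_OF_TRUTH = {42: {"id": 42, "name": "Ada King"}}

def fetch_customer(customer_id):
    return dict(SOURCE_OF_TRUTH[customer_id])

# A subscriber reacts to the event by querying the source of truth
# and upserting the result into its own local store.
local_store = {}

def on_customer_changed(event):
    local_store[event.customer_id] = fetch_customer(event.customer_id)

on_customer_changed(CustomerChanged(customer_id=42))
print(local_store[42]["name"])  # -> Ada King
```

In a real system `fetch_customer` would be an HTTP GET against the source-of-truth service, and the event would arrive via the broker's fanout/topic exchange.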
Here’s an example sequence diagram to talk through the primary scenario. Here’s what’s happening… There are 3 services: Billing, Profile, and Other, and they all care about a customer object. Billing is considered the source-of-truth service for a customer.
- First, Profile starts up and GETs all of its customer data from Billing (this would only happen once)
- Second, Other starts up and GETs all of its customer data from Billing (again, this would only happen once)
- Profile makes a change to a customer, so it PUTs the update to Billing. Billing returns a 200 with the updated customer, as is idiomatic for a RESTful interface.
- Billing saves the change and publishes the id of the customer to the relevant exchange
- Profile receives the event to update the customer
- Profile GETs the customer from Billing and upserts it into its local database
- Other receives the event to update the customer
- Other GETs the customer from Billing and upserts it into its local database
- All services are now consistent
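The whole walkthrough can be simulated end to end with an in-memory fanout standing in for the exchange and plain dicts standing in for each service's database. All names here are illustrative:

```python
# Billing is the source of truth; Profile and Other keep local copies.
billing_db = {1: {"id": 1, "name": "Grace"}}
profile_db = {}
other_db = {}

subscribers = []  # fanout: every subscriber sees every published event

def publish(customer_id):
    for handler in subscribers:
        handler(customer_id)

def billing_get(customer_id):           # stand-in for GET /customers/{id}
    return dict(billing_db[customer_id])

def billing_put(customer_id, changes):  # stand-in for PUT /customers/{id}
    billing_db[customer_id].update(changes)
    publish(customer_id)                # publish only the id, never the change
    return billing_get(customer_id)     # 200 with the updated customer

# Each consumer reacts by GETting the latest state and upserting it locally.
def profile_handler(customer_id):
    profile_db[customer_id] = billing_get(customer_id)

def other_handler(customer_id):
    other_db[customer_id] = billing_get(customer_id)

subscribers.extend([profile_handler, other_handler])

# Initial seeding: each service GETs all customer data once on start-up.
profile_db.update({cid: billing_get(cid) for cid in billing_db})
other_db.update({cid: billing_get(cid) for cid in billing_db})

# Profile pushes a name change to Billing; the event fans out to everyone.
billing_put(1, {"name": "Grace Hopper"})
assert profile_db[1]["name"] == other_db[1]["name"] == "Grace Hopper"
```

The final assertion is the point of the pattern: after the event fans out and each subscriber re-queries the source, every service converges on the same state.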
It's important to note that the responses from Billing to Profile and Other could be the same or different, and Profile/Other could each save their own projection of the customer. I would love to (and intend to) try this with Billing serving a GraphQL endpoint.
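Since each subscriber saves its own projection, a consumer can keep only the fields it cares about. A tiny sketch of what that might look like (the field names here are made up for illustration):

```python
# Billing's full customer record (source of truth).
full_customer = {"id": 7, "name": "Alan", "plan": "premium", "balance": 120.0}

# Profile only cares about identity, so its projection drops the billing fields.
def project_for_profile(customer):
    return {"id": customer["id"], "name": customer["name"]}

profile_view = project_for_profile(full_customer)
print(profile_view)  # -> {'id': 7, 'name': 'Alan'}
```

A GraphQL endpoint on the source-of-truth service would let each consumer ask for exactly this subset instead of projecting client-side.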
What I like about it…
- Requests no longer need to leave the service to get dependent data; each service queries its own database for the latest state it has, so there's no cross-service aggregation latency.
- Requests that need searching/sorting/filtering/paging can all be done in-service, against a single database containing only the data that the service cares about.
- Each service can still be designed like any other RESTful service and be exposed to public consumers without having to make the exchange(s) available.
- It scales well and easily. Any number of services can subscribe to the same exchange and get updates, and the scaling factors are a function of the queueing technology you're using (e.g. RabbitMQ, ActiveMQ, Azure, AWS, etc.).
- It’s consistent. All services receive updates the same way and all services get the same data by querying the source. Spinning up new services that depend on data is fast and safe.
- It’s easy to monitor and maintain in production, again as a result of the queuing technology you’re using.
- Services can be truly isolated and developed independently, across the service, backend, and database layers. This also lets each one use the most appropriate technology for its particular job.
- It’s easy to understand.
What I dislike about it…
- It’s really chatty
- In order to avoid race conditions and similar issues, you can't send the change itself in the event. So every change to an entity sends a message to each subscriber, which results in a request back from each of those subscribers. There are options to mitigate this, but they break a lot of the simplicity of the solution.
- A service posting a change to an entity also receives the event for that entity. You can include the originating service in the message so that a service ignores its own updates, but you can't use routing keys in the exchange to exclude a subscriber.
- Initial seeding of data. The pattern introduces, but does not address, the problem of a service needing to sync all or most entities of a certain type when it first comes online. Possible solutions that I've looked at are:
  - SSIS package that runs on service start-up
    - Easy to write
    - Annoying to maintain
    - Annoying to deploy
  - Bulk GET endpoints (probably with paging/filtering) on the source-of-truth service
    - Easy to implement
    - Easy to deploy
    - Easy to maintain
    - Not necessarily efficient
- It requires that eventual consistency be acceptable in your domain for it to be a valid solution.
- Data is duplicated, a lot, across the system.
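On the self-update problem mentioned above: one way to keep a service from reacting to its own writes is to stamp each event with the originating service and filter on it in the consumer. A sketch, assuming a hypothetical `origin` field on the event (this is something you add to the message yourself; the broker doesn't provide it):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerChanged:
    customer_id: int
    origin: str  # name of the service that made the change

SERVICE_NAME = "profile"
handled = []

def on_customer_changed(event):
    if event.origin == SERVICE_NAME:
        # We made this change ourselves; our local copy is already current
        # (the PUT returned a 200 with the updated customer).
        return
    handled.append(event.customer_id)  # otherwise re-query the source of truth

on_customer_changed(CustomerChanged(5, origin="profile"))  # ignored
on_customer_changed(CustomerChanged(5, origin="billing"))  # processed
```

This keeps the exchange a pure fanout while still avoiding one redundant round trip per self-originated change.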
Well there you have it. This architecture is currently in development and we’re starting to see it proven out in a live system. In the next post, I’ll show and talk through a proof-of-concept implementation that can be shared publicly.
Hope it helps!