Sharing Entities Between Services

In any system with multiple services, there are a number of ways to segregate services and the contexts (objects) over which they operate. Whether you’re actively doing domain modeling with bounded contexts or shooting from the hip, eventually you’re probably going to hit a situation where multiple services care about the same object – or projections of the same object.

Think of a profile service and a billing service. Both might be interested in the customer object, and both might need to know when the name changes. The customer might go into their profile on the public website and change their name, or they might call into the billing department and tell an agent about their name change (say, if they get married for instance). It’s not unreasonable for the profile service to back the public website and the billing service to back the software that the billing team uses.

What follows is an architecture that I’ve worked on with the enterprise architect at my current client. It allows services to maintain their own projections of entities that cross bounded contexts/services while staying eventually consistent with each other.

Publish the event; Query the change

The idea is pretty straightforward: each service has a database where it keeps its entities, and whenever it needs to update one of those entities, it pushes that change to the service that is the source-of-truth for that particular entity. From there, the source-of-truth service will publish a change event that is consumed by all other subscribers via a fanout or topic exchange.

The change event contains the id of the entity that was changed, and subscribers then query for the latest state of the entity from the source-of-truth service.
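As a sketch of the subscriber side, this is roughly what handling a change event looks like. Everything here is a hypothetical stand-in: `local_db` plays the subscriber’s own database, and `fetch_from_source_of_truth` plays the HTTP GET against the source-of-truth service.

```python
import json

# Hypothetical in-memory stand-ins: local_db plays the subscriber's own
# database, and fetch_from_source_of_truth plays an HTTP GET against the
# source-of-truth service.
local_db = {}

def fetch_from_source_of_truth(customer_id):
    # Stand-in for: GET /api/customers/{id} on the source-of-truth service
    return {"id": customer_id, "name": "Jane Smith"}

def on_change_event(message_body):
    # The event carries only the id; query the source of truth for the
    # latest state and upsert it into the local projection.
    event = json.loads(message_body)
    customer = fetch_from_source_of_truth(event["id"])
    local_db[customer["id"]] = customer

on_change_event('{"id": "42"}')
```

The key point is that the message body never contains the entity itself, only its id.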

An Example

Here’s an example sequence diagram to talk through the primary scenario. There are three services: Billing, Profile, and Other (I couldn’t think of a name for a third service that would care about customers), and they all care about a customer object. Billing is the source-of-truth service for the customer.
  • First Profiles starts up and GETs all of its customer data from Billing (this would only happen once)
  • Second Other starts up and GETs all of its customer data from Billing (again, this would only happen once)
  • Profiles makes a change to a customer, so it PUTs the update to the customer on Billing. Billing returns a 200 with the updated customer, as is idiomatic for a RESTful service interface.
  • Billing saves the change and publishes the Id of the customer to the relevant exchange
  • Profiles receives the event to update the customer
    • Profiles GETs the customer from billing and upserts it into its local database
  • Other receives the event to update the customer
  • Other GETs the customer from Billing and upserts it into its local database
  • All services are consistent

It’s important to note that the response from Billing to Profiles and Other could be different, could be the same, and Profiles/Other could save their own projection of the customer. I would love to (and intend to) try this with Billing serving a GraphQL endpoint.
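The Billing side of the save-and-publish step could be sketched like this. It’s a minimal, hypothetical stand-in: real code would write to a database and publish to a fanout or topic exchange on RabbitMQ or similar.

```python
import json

saved = {}       # stand-in for Billing's own database
published = []   # stand-in for the fanout/topic exchange

def put_customer(customer):
    # Hypothetical PUT handler on the source-of-truth service: save the
    # change, then publish only the entity's id (no payload) to the exchange.
    saved[customer["id"]] = customer
    published.append(json.dumps({"id": customer["id"]}))
    return customer  # returned with the 200, per the sequence above

put_customer({"id": "42", "name": "Jane Smith"})
```

Publishing only the id is what keeps subscribers from acting on stale payloads; they always query Billing for the current state.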

What I like about it…

  • Requests no longer need to leave the service to get dependent data. Each service queries its own database for the latest data it has, so there’s no added latency from aggregating calls across services.
  • Requests that need searching/sorting/filtering/paging can all be handled in-service against a single database, and against only the data that the service cares about.
  • Each service can still be designed like any other RESTful service and be exposed to public consumers without having to make the exchange(s) available.
  • It scales well and easily. Any number of services can subscribe to the same exchange and get updates, and the scaling characteristics are a function of the queueing technology you’re using (e.g. RabbitMQ, ActiveMQ, Azure, AWS, etc.).
  • It’s consistent. All services receive updates the same way and all services get the same data by querying the source. Spinning up new services that depend on data is fast and safe.
  • It’s easy to monitor and maintain in production, again thanks to the queueing technology you’re using.
  • Services can be truly isolated and developed independently, across the service, backend, and database layers. This also lets each one use the most appropriate technology for that particular service.
  • It’s easy to understand.

What I dislike about it…

  • It’s really chatty
    • In order to avoid race conditions and similar ordering issues, you can’t include the change itself in the event. So every change to an entity sends a message to each subscriber, which results in a request back from that subscriber. There are options to mitigate this, but they break a lot of the simplicity of the solution.
    • A service that posts a change to an entity also receives the event to update that entity. You can include the originating service in the message so that a service ignores its own updates, but you can’t use routing keys in the exchange to exclude a subscriber.
  • Initial seeding of data. The architecture introduces, but does not fully address, the problem of a service needing to sync all or most entities of a certain type. Possible solutions I’ve looked at are:
    • SSIS package that runs on service start-up
      • Easy to write
      • Annoying to maintain
      • Annoying to deploy
    • Bulk GET endpoints (probably with paging/filtering) on the source-of-truth service
      • Easy to implement
      • Easy to deploy
      • Easy to maintain
      • Not necessarily efficient
  • It requires that eventual consistency be acceptable for your domain; otherwise it isn’t a valid solution.
  • Data is duplicated, a lot, across the system.
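The “ignore your own updates” mitigation mentioned above could look like this. It’s a sketch only; the `origin` field name and `SERVICE_NAME` are assumptions, not a standard.

```python
import json

SERVICE_NAME = "profiles"  # this subscriber's own name (hypothetical)
handled = []

def on_change_event(message_body):
    # Skip events that this service originated itself, using an 'origin'
    # field stamped into the message by the publisher.
    event = json.loads(message_body)
    if event.get("origin") == SERVICE_NAME:
        return  # our own update coming back around; nothing to do
    handled.append(event["id"])

on_change_event('{"id": "1", "origin": "profiles"}')  # ignored
on_change_event('{"id": "2", "origin": "billing"}')   # processed
```

This trades a slightly fatter message for one fewer redundant GET per publishing service.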

Conclusion

Well there you have it. This architecture is currently in development and we’re starting to see it proven out in a live system. In the next post, I’ll show and talk through a proof-of-concept implementation that can be shared publicly.

Hope it helps!

Query-by-POST

Just a quick one today.

When implementing a REST endpoint that does some filtering, the approach is generally easy and obvious: just add the filters as query string parameters. For example:

GET /api/employees?lastName=Smith

… and the response should be an HTTP 200, with a collection of employees with the last name Smith. Standard fare. You can continue to add filtering parameters and it’s all really straightforward.

What if you wanted to query for all employees named Smith who started before 01/01/2019? That’s more like a search than a filter. For searching, there’s a common pattern that some of my peers and I have come to call “Query-by-POST”. I can’t seem to find decent documentation on it, so I’m writing it down now. It looks something like the following:

POST /api/employees/searches
{
  "lastName": "Smith",
  "hireDate": {
    "lessThan": "01/01/2019"
  }
}

… and the response is:

HTTP/1.1 303 See Other
Location: https://.../searches/results/<id>

That is, you’re POSTing a new search to the API, and the API is returning a redirect to the results it created.

The id of the search results can be anything you want, but ideally it should represent an actual resource. I’ve used an encoded list of the ids from the search result, and it worked well. If the search is computationally expensive, you can persist the results like any other resource. For the endpoint to be RESTful, though, you should get back the same resource (the same results) each time you call the results endpoint.
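One way the “encoded list of ids” approach could work is sketched below, using base64. The helper names are hypothetical; sorting before encoding makes the token stable, so the same result set always maps to the same resource.

```python
import base64
import json

def results_id_from_ids(ids):
    # Encode the matching entity ids into a URL-safe token that serves as
    # the id of the search-results resource.
    payload = json.dumps(sorted(ids)).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")

def ids_from_results_id(token):
    # Decode the token back into the list of entity ids.
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))

token = results_id_from_ids(["7", "3", "12"])
# The Location header would then point at .../searches/results/<token>
```

Because the token is derived purely from the ids, the results endpoint can rebuild the resource from the token alone, with nothing persisted server-side.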

Under inspection, one thing probably looks odd: you’re POSTing to the searches collection and getting redirected to the search results resource… instead of just getting the search results back in the original response. That’s twice as many HTTP requests as when you POST to create any other resource. Here’s why…

Normally, when you POST a resource to an endpoint like this:

POST /api/employees
{
  "id": "",
  "firstName": "Jane",
  "lastName": "Smith",
  "hireDate": "12/30/2018"
}

You will often get back a 201 with the employee object, now containing a populated id. You’re POSTing the object to the same collection where it will be located.

With the search endpoint, you are actually asking the system to create search results for you. If the endpoint were going to return anything in the body, it could reasonably return your search object (with the lastName and hireDate comparison) along with the Location header, and that would be idiomatically RESTful. Because it’s logically creating a resource somewhere else, it redirects you to it.

Hope it helps!