This blog is about Java (advanced Java topics like Reflection, Byte Code transformation, Code Generation), Maven, Web technologies, Raspberry Pi and IT in general.

Thursday, June 18, 2015

The Importance of structuring Microservices


Grouping microservices well is very important; otherwise you end up with a mess. Several microservices together should form a larger unit of functionality.

Example: Image Functionality in a Microservice System
Let's approach the problem by example. Assume we want to build image processing functionality for our system. Each microservice should do exactly one thing, so we end up with the following microservices:

  • image scaling
  • image watermarking  
  • image storing
  • image retrieving 

All these image services are very tightly coupled - the storing microservice and the retrieving microservice use the same database and the same files. Of course it should be possible to use each of them in another context or by itself. Within your own system, however, you will treat them as an image-sub-system. Therefore this sub-system should have its own abstraction, its own API - and this API is a microservice of its own.

None of the non-image-related microservices know anything about the individual image microservices. They only know about the image API microservice. This API can group together the typical use cases, for example resizing an image and watermarking it.
You don't want this code anywhere other than in your image microservices. Otherwise your whole microservice system becomes very tightly interconnected, which is very bad. Imagine that all images are scaled and watermarked and that you have performance issues because of high network traffic. One simple optimization is to merge the scaling microservice and the watermarking microservice. Then the image has to be sent over the network one time less. If you use an image API microservice, the code has to be changed only once and only one microservice needs to be redeployed. If, on the other hand, all other microservices called the internal image microservices directly, you would have to change the code in several places and redeploy several microservices.
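The idea of an image API that hides the individual image services can be sketched in Java roughly as follows. This is only an illustration under assumptions: the names `Image`, `ScalingService`, `WatermarkService` and `ImageApi` are hypothetical, and the interfaces stand in for what would be remote calls in a real system.

```java
/** Hypothetical image payload; a real service would exchange bytes over the network. */
final class Image {
    final int width;
    final int height;
    final boolean watermarked;

    Image(int width, int height, boolean watermarked) {
        this.width = width;
        this.height = height;
        this.watermarked = watermarked;
    }
}

/** Internal contract of the scaling microservice (assumed, known only inside the image sub-system). */
interface ScalingService {
    Image scale(Image image, int width, int height);
}

/** Internal contract of the watermarking microservice (assumed, known only inside the image sub-system). */
interface WatermarkService {
    Image watermark(Image image);
}

/** The only entry point that non-image microservices are supposed to call. */
final class ImageApi {
    private final ScalingService scaling;
    private final WatermarkService watermarking;

    ImageApi(ScalingService scaling, WatermarkService watermarking) {
        this.scaling = scaling;
        this.watermarking = watermarking;
    }

    /** Typical use case grouped behind the API: resize first, then watermark. */
    Image scaleAndWatermark(Image original, int width, int height) {
        Image scaled = scaling.scale(original, width, height);
        return watermarking.watermark(scaled);
    }
}
```

If scaling and watermarking are later merged into one service, only the body of `scaleAndWatermark` and the constructor of `ImageApi` have to change; callers keep using the same method.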

Abstraction like in a monolith
This abstraction and grouping of functionality is nothing other than modules in a monolith. In a monolith all the image functionality would be in a package "image". This package would have the following sub packages: "scaling", "watermarking", "storing" and "retrieving". Just as in the microservice architecture, you don't want other parts of the monolith to use these sub packages directly. Otherwise you get interconnected modules and it becomes hard to change internal details of the image implementation. Everything should sit behind a high level interface, and everything else should be hidden and not used anywhere else. Then it's possible to rewrite the complete image package without adapting code in other packages. The only thing you need to take care of is to keep supporting the interface.

I find it very appealing that this concept of abstraction applies to both a monolith and a microservice architecture. If the concepts were completely different, something would most likely be wrong.

The module abstraction is often violated in monoliths, and the complete codebase ends up very interconnected. That's bad, but it's manageable, because you have the complete code in your IDE with all its useful refactoring tools. But if you have separate code bases it's very hard to refactor anything. So do it right in the first place and don't postpone it.

Microservices grouped by Domains
One way to group the system is by domains. A domain is a (mostly) independent part of the system. It's very important to get your domains right, because if you don't, nearly every new feature will require changes in several domains. That should be the exception, not the rule.
For example, the user part of a system consists of login, registration, password reset mail, user profile and so on. These basic user functions are more or less the same in every system. Therefore the user-sub-system should also be usable in any other system. This only works if the domain is completely independent: it is not allowed to have a dependency on any other domain. Otherwise you have to remove those dependencies before the user-sub-system can be used in another system.
Another domain could be the product-domain. This domain stores all the data of a product: price, stock count, description and so on. The user-domain and the product-domain must be independent of each other.

How to deal with features that require two domains
Let's approach the problem again with an example: a shopping cart. A shopping cart is individual to each user, so it must be in the user-domain. The problem is that you also need data from the product-domain, because the shopping cart is quite useless if you can't see the details of the stored products. But since the user-domain isn't allowed to have a dependency on the product-domain, how do you solve this problem?

Actually it's quite simple: with an aggregation microservice. In the user-domain all stored cart data is related to the user, like added-date, user-id and so on. There is only one exception: the product-id is also stored.
The user-domain provides an API to get all items of a specific user. This API is consumed by an aggregation microservice. The aggregation microservice collects the product-ids and then fetches the product details from the product-domain. The data is merged, and then the shopping cart can be displayed or the data can be passed to the next microservice. This way you can implement the shopping cart feature while both domains stay independent.
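The aggregation steps described above could look roughly like this in Java. It is a sketch under assumptions: `UserDomainApi`, `ProductDomainApi` and all class and field names are hypothetical, the two interfaces stand in for remote calls to the two domains, and skipping products that the product-domain no longer returns is one possible choice, not a requirement from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Cart entry as stored in the user-domain: user-related data plus the product-id. */
final class CartItem {
    final String userId;
    final String productId;
    final String addedDate;

    CartItem(String userId, String productId, String addedDate) {
        this.userId = userId;
        this.productId = productId;
        this.addedDate = addedDate;
    }
}

/** Product details owned by the product-domain. */
final class Product {
    final String productId;
    final String description;
    final long priceInCents;

    Product(String productId, String description, long priceInCents) {
        this.productId = productId;
        this.description = description;
        this.priceInCents = priceInCents;
    }
}

/** Merged view that the aggregation microservice hands out. */
final class CartLine {
    final CartItem item;
    final Product product;

    CartLine(CartItem item, Product product) {
        this.item = item;
        this.product = product;
    }
}

/** Assumed API of the user-domain. */
interface UserDomainApi {
    List<CartItem> cartItemsOf(String userId);
}

/** Assumed API of the product-domain. */
interface ProductDomainApi {
    Map<String, Product> productsById(List<String> productIds);
}

/** Knows both domains, so that the domains do not have to know each other. */
final class ShoppingCartAggregator {
    private final UserDomainApi userDomain;
    private final ProductDomainApi productDomain;

    ShoppingCartAggregator(UserDomainApi userDomain, ProductDomainApi productDomain) {
        this.userDomain = userDomain;
        this.productDomain = productDomain;
    }

    List<CartLine> loadCart(String userId) {
        // 1. Get the cart items of the user from the user-domain.
        List<CartItem> items = userDomain.cartItemsOf(userId);

        // 2. Collect the product-ids and fetch the details from the product-domain.
        List<String> productIds = new ArrayList<>();
        for (CartItem item : items) {
            productIds.add(item.productId);
        }
        Map<String, Product> products = productDomain.productsById(productIds);

        // 3. Merge both data sets; items without product details are skipped (assumption).
        List<CartLine> lines = new ArrayList<>();
        for (CartItem item : items) {
            Product product = products.get(item.productId);
            if (product != null) {
                lines.add(new CartLine(item, product));
            }
        }
        return lines;
    }
}
```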

The aggregation microservice/layer is nothing more than a Lego™ block which connects two other Lego™ blocks. If you use third party APIs you do exactly the same: you create some kind of Lego™ block which aggregates the data from several third party APIs. It helps to treat your own microservice APIs as if they were third party APIs, because then you will get the independence of the microservices right and won't violate it.

Performance Issue with the Aggregation layer
Let's assume that it's common for a shopping cart to have thousands of items. Additionally, you need to support sorting the items in a shopping cart by price. Then the aggregation microservice has to load thousands of items from the user-domain, load thousands of product details from the product-domain, merge the data, sort it by price and then throw away all but the 100 items which will be shown to the user (pagination). That's very inefficient.

The only way to solve the problem is data duplication. It's necessary to store the price of the item in the shopping cart database as well. Then you can do the pagination in the database and fetch only the 100 items you need.
That means you have to supply the additional data when an item is put into the shopping cart. Additionally, the data in the shopping cart database must be updated when the price changes in the product database. To do that you need events. When a product is updated, an event must be published. An event listener takes this event and updates the price in the shopping cart database.
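Pagination on the duplicated price can be sketched as follows. The sketch assumes hypothetical names (`CartRow`, `CartRepository`) and uses an in-memory list as a stand-in for the shopping cart database; a real store would do the same work with something like an ORDER BY on the price column combined with a limit and an offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Row in the shopping cart database, carrying a duplicated copy of the product price. */
final class CartRow {
    final String userId;
    final String productId;
    final long priceInCents; // duplicated from the product-domain

    CartRow(String userId, String productId, long priceInCents) {
        this.userId = userId;
        this.productId = productId;
        this.priceInCents = priceInCents;
    }
}

/** In-memory stand-in for the shopping cart database (assumption for illustration). */
final class CartRepository {
    private final List<CartRow> rows = new ArrayList<>();

    void save(CartRow row) {
        rows.add(row);
    }

    /** Returns one page of a user's cart, sorted by the duplicated price, cheapest first. */
    List<CartRow> pageSortedByPrice(String userId, int page, int pageSize) {
        return rows.stream()
                .filter(row -> row.userId.equals(userId))
                .sorted(Comparator.comparingLong((CartRow row) -> row.priceInCents))
                .skip((long) page * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

Because the price lives next to the cart rows, only the requested page leaves the store; the aggregation microservice then needs product details for at most `pageSize` items.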

Eventual Consistency
This means that you only have eventual consistency, because the price could have changed while the corresponding event is still in the queue. In that case the item will be sorted incorrectly. The displayed price itself is correct, because it is loaded from the product-domain together with the other product details.
There is no way around this problem unless you use just one database, use distributed transactions, or do the inefficient loading of thousands of records. But that conflicts with scalability and/or the microservice architecture mindset. Therefore it must be acceptable for all data you duplicate to be out of sync for a short time. In some cases that is not acceptable, for example with payment data. In that case, don't duplicate the data.
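The event-based price update and the short window of inconsistency can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: `PriceChangedEvent`, `CartPriceStore`, `PriceChangedListener` and `PriceEventQueue` are hypothetical names, and the in-memory queue only stands in for a real message broker.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical event published by the product-domain whenever a price changes. */
final class PriceChangedEvent {
    final String productId;
    final long newPriceInCents;

    PriceChangedEvent(String productId, long newPriceInCents) {
        this.productId = productId;
        this.newPriceInCents = newPriceInCents;
    }
}

/** Duplicated prices in the shopping cart database, keyed by product-id (in-memory stand-in). */
final class CartPriceStore {
    private final Map<String, Long> priceByProductId = new HashMap<>();

    void put(String productId, long priceInCents) {
        priceByProductId.put(productId, priceInCents);
    }

    /** Returns the duplicated price, or null if the product is in no cart. */
    Long priceOf(String productId) {
        return priceByProductId.get(productId);
    }
}

/** Listener on the shopping cart side that keeps the duplicated price up to date. */
final class PriceChangedListener {
    private final CartPriceStore store;

    PriceChangedListener(CartPriceStore store) {
        this.store = store;
    }

    void onPriceChanged(PriceChangedEvent event) {
        // Only products that are already in a cart have a duplicated price to update.
        if (store.priceOf(event.productId) != null) {
            store.put(event.productId, event.newPriceInCents);
        }
    }
}

/** In-memory stand-in for the queue between the product-domain and the cart. */
final class PriceEventQueue {
    private final Queue<PriceChangedEvent> pending = new ArrayDeque<>();

    void publish(PriceChangedEvent event) {
        pending.add(event);
    }

    /** Delivers all pending events to the listener and returns how many were delivered. */
    int drainTo(PriceChangedListener listener) {
        int delivered = 0;
        PriceChangedEvent event;
        while ((event = pending.poll()) != null) {
            listener.onPriceChanged(event);
            delivered++;
        }
        return delivered;
    }
}
```

Between `publish` and `drainTo` the store still holds the old price. That gap is the window in which sorting by the duplicated price can be wrong.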

Conclusion
You need guidelines for which microservices are allowed to talk to which other microservices. Structure the communication workflow and use grouping and abstraction. Otherwise you will not know which microservices rely on a specific microservice - directly or indirectly. That's a very bad spot to be in. Another thing you probably need is good monitoring, so that you can see which microservice calls were made for a user request and how long they took. Without this data you will have a very hard time finding performance issues. Consider using Zipkin.
Design the system very carefully and have good high level documentation for each microservice.
If you get all of these things right, you should be fine and have a lot of fun with your microservice system :-)
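One way to make such guidelines checkable is to record the allowed calls explicitly and compute from them who relies on a given microservice, directly or indirectly. The following Java sketch assumes a hypothetical `ServiceCallGraph` class and made-up service names; it is not tied to any particular tool.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Records which microservice may call which, and answers "who relies on X?". */
final class ServiceCallGraph {
    // Maps a callee to the set of services that call it directly.
    private final Map<String, Set<String>> callersOf = new HashMap<>();

    void allowCall(String caller, String callee) {
        callersOf.computeIfAbsent(callee, key -> new HashSet<>()).add(caller);
    }

    /** All services that depend on the given service, directly or indirectly. */
    Set<String> dependentsOf(String service) {
        Set<String> seen = new TreeSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(service);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            Set<String> callers = callersOf.getOrDefault(current, Collections.<String>emptySet());
            for (String caller : callers) {
                // Follow each caller only once, so cycles cannot loop forever.
                if (seen.add(caller)) {
                    toVisit.push(caller);
                }
            }
        }
        return seen;
    }
}
```

With the allowed calls `image-api -> image-scaling` and `shop-frontend -> image-api` registered, `dependentsOf("image-scaling")` contains both `image-api` and `shop-frontend`, which tells you who is affected when the scaling microservice changes.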

Comments:

  1. This comment was removed by the author.

  2. I've read with interest this post and found the image service a very good pattern on how to hide the complexity of several operations done behind the scenes in the image service.

    The cart service could be expanded to hold reference to payment service, because the end-purpose of a cart is not to sum up products (like a grocery list), but to be checked out.
    It would be nice to see visually instead of textually the interactions (image processing, cart check-out) between the services for achieving their purposes.
