- How do you choose the best database for a microservice?
- The CAP Theorem for Distributed Databases
- **Database vs. Service Requirements**
- What are the tips for choosing the correct database for a microservice?
- Tip #1 - Consider the CAP Theorem
- Tip #2 - Gather all requirements upfront
- Tip #3 - Use Amplication 😁 💜
- Wrapping up
- Can microservices use multiple databases?
- Can microservices use SQL databases?
- Should I use a relational or a NoSQL database for my microservice?
- What are the trade-offs between using a single database for all microservices and multiple databases?
Microservices have become a go-to application architecture for many software projects because of the benefits they offer, including:
- Service decoupling
- Faster development times
- Faster release times
- Tailored datastores
As a result, developers can select the tools and platforms that deliver the best performance for each specific microservice. One part of doing so is moving away from a monolithic data store. Microservices favour independent service components, where each service runs in its own runtime and connects to its own database.
This means services are encouraged to exchange data with one another rather than read and write one large database shared by all your microservices, as shown below.
**Figure: A microservices architecture**
However, this raises the question: how should you pick the correct (distributed) database for each microservice?
How do you choose the best database for a microservice?
To answer this question, you need to understand that different types of databases are made to cater to different purposes and requirements.
Therefore, you must consider factors such as performance, reliability, and data modelling requirements in your decision-making process to ensure that you select the correct database.
The CAP Theorem for Distributed Databases
It's important to understand that when selecting a database, you must consider its Consistency, Availability, and (network) Partition tolerance capability.
These three properties are the subject of the CAP theorem, and it's vital to be aware that database design involves tradeoffs among them: strengthening two of them comes at the expense of the third. In a nutshell, the CAP theorem considers the following properties of a database in a distributed system:
- (Sequential) Consistency: Distributed Databases that satisfy this property will always return the same data (latest committed data) from all DB nodes/shards, which means that all your DB clients will get the latest data regardless of the node they query.
- Availability: Distributed Databases that satisfy this property guarantee to always respond to read and write requests in a timely manner from every reachable node.
- (Network) Partition Tolerance: Distributed Databases that satisfy this property guarantee to function even if there is a network disconnection between the DB nodes (which partitions the DB nodes into two or more network partitions).
These three properties characterize modern distributed databases, but the CAP theorem states that no distributed database can guarantee all three at the same time. Any database implementation can choose two of them at the expense of the third.
Distributed Databases therefore fall into one of the following combinations:
- CA (Consistency + Availability): Your database can serve the most recent data from all the nodes while remaining highly available.
- CP (Consistency + Partition Tolerance): Your database can serve the most recent data from all the nodes with a high resilience to network errors.
- AP (Availability + Partition Tolerance): Your database nodes always respond in a timely manner, even in the face of network failures, but they don't guarantee returning the latest data from every node. These databases adopt a principle known as "eventual consistency," where data is replicated eventually rather than instantly (eventual consistency is a weaker form of consistency than sequential consistency, which is the "C" in the CAP theorem).
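The difference between the CP and AP combinations can be illustrated with a toy model. The sketch below is not how any real database works; `ToyCluster`, `Replica`, and the two-node setup are hypothetical names invented for illustration. It only shows the behaviour described above: during a partition, a CP-style cluster rejects writes, while an AP-style cluster accepts them and serves stale data from the lagging node until the partition heals.

```python
class Replica:
    """One node holding a copy of a simple key-value map."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ToyCluster:
    """Two replicas with a switchable network partition (illustrative only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False
        self.pending = []  # writes accepted by a but not yet copied to b

    def write(self, key, value):
        """Write through replica a, replicating to b when the network allows."""
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability: refuse rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # AP: stay available, accept locally, replicate later.
            self.a.data[key] = value
            self.pending.append((key, value))
            return
        self.a.data[key] = value
        self.b.data[key] = value

    def read_from_b(self, key):
        """Read from replica b, which may lag behind in AP mode."""
        return self.b.data.get(key)

    def heal(self):
        """End the partition and replay pending writes (eventual consistency)."""
        self.partitioned = False
        for key, value in self.pending:
            self.b.data[key] = value
        self.pending.clear()


# AP: the write succeeds during the partition, but replica b is stale until heal().
ap = ToyCluster("AP")
ap.write("stock", 10)
ap.partitioned = True
ap.write("stock", 9)
stale = ap.read_from_b("stock")
ap.heal()
fresh = ap.read_from_b("stock")

# CP: the write is rejected during the partition, so replicas never diverge.
cp = ToyCluster("CP")
cp.write("stock", 10)
cp.partitioned = True
try:
    cp.write("stock", 9)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True
```

Here `stale` holds the old value and `fresh` the new one, which is the "replicated eventually and not instantly" behaviour of an AP database in miniature.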
So, it's essential to understand the CAP theorem before selecting a database. The table below showcases some popular distributed databases according to their "CAP Theorem preference".
By evaluating your non-functional requirements, you can use this as a guide to understanding the direction you need to look at.
Figure: CAP Theorem preferences in popular databases
Database vs. Service Requirements
Apart from the CAP theorem, it's essential to understand that selecting the correct database for your microservice ultimately depends on your service requirements. Using different databases for different services, depending on the requirements of each service, is known as polyglot persistence.
For example, your microservice might be read- or write-intensive, need rapid scaling, or simply need high durability. Therefore, it's essential to understand your requirements clearly before deciding on a database.
Performance (Read/Write) Requirements
The first aspect you may need to look at is performance.
If you're building a microservice that needs to be high-performing, you'll likely need a database that can meet that exact demand.
For example, suppose you're building your microservice using an API Gateway and AWS Lambda. In that case, your service can scale out rapidly to absorb large spikes in traffic, so you'll need a database that can scale as your Lambda functions scale. If it can't, the database becomes a bottleneck, which could lead to inter-service latency and timeout errors.
So, in such cases, it's essential to consider the number of IOPS (Input/Output Operations Per Second) your service will process. Here are some typical numbers for operations per second:
- Very high — Greater than one million IOPS
- High — Between 500,000 and one million IOPS
- Moderate — Between 10,000 and 500,000 IOPS
- Low — Less than 10,000 IOPS
So, it's essential to consider the IOPS you'll be processing in your service before picking a database.
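The IOPS bands above can be turned into a small helper for a first rough estimate. This is a minimal sketch: `estimate_iops` and `classify_iops` are hypothetical helpers, the estimate assumes database load is simply peak requests per second multiplied by database operations per request, and the handling of the exact boundary values (10,000, 500,000, and one million) is an assumption, since the bands do not say which side they fall on.

```python
def estimate_iops(peak_requests_per_second: int, db_ops_per_request: int) -> int:
    """Rough estimate: every request triggers a fixed number of database operations."""
    if peak_requests_per_second < 0 or db_ops_per_request < 0:
        raise ValueError("inputs must be non-negative")
    return peak_requests_per_second * db_ops_per_request


def classify_iops(iops: int) -> str:
    """Map an IOPS figure onto the bands listed above.

    Assumption: boundary values belong to the higher band, except that
    exactly one million counts as "high" ("very high" is strictly greater).
    """
    if iops < 0:
        raise ValueError("iops must be non-negative")
    if iops > 1_000_000:
        return "very high"
    if iops >= 500_000:
        return "high"
    if iops >= 10_000:
        return "moderate"
    return "low"


# Example: 2,000 requests/s at peak, each doing 3 database operations.
checkout_iops = estimate_iops(2_000, 3)
checkout_band = classify_iops(checkout_iops)
```

A real estimate would also account for caching, batching, and background jobs, so treat the result as a starting point for load testing rather than a sizing decision.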
The next requirement to look at is latency. Latency refers to the delay incurred when serving a read/write request.
For latency, the typical numbers are:
- Low — Less than one millisecond
- Moderate — one to 10 milliseconds
- High — Greater than 10 milliseconds
If you're building microservices that need instant communication, you'll likely need to adopt a low-latency database.
For example, let's say you're modelling a Search Service:
Figure: A Product Searching Service
Ideally, a search operation should not take more than a few seconds, regardless of the payload. Therefore, in such cases, you'll need to pick a database that can deliver responses within the defined period.
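To check a candidate database against the latency bands above, you can time a representative operation and look at a high percentile rather than the average. The sketch below assumes a nearest-rank percentile and uses hypothetical helper names (`measure_ms`, `percentile`, `classify_latency`); the operation being timed is a stand-in for a real query.

```python
import math
import time


def classify_latency(latency_ms: float) -> str:
    """Map a latency in milliseconds onto the low/moderate/high bands above."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 1:
        return "low"
    if latency_ms <= 10:
        return "moderate"
    return "high"


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def measure_ms(operation, runs=50):
    """Call `operation` repeatedly and return each duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


# Stand-in for a real database call; replace with an actual query to benchmark.
samples = measure_ms(lambda: sum(range(1000)), runs=20)
p95_band = classify_latency(percentile(samples, 95))
```

Measuring from the service's own network location matters, because the round trip to the database is usually part of the latency your users see.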
Data Modelling Requirements
One of the most significant advantages of choosing microservices over a monolith is that developers get to define different data models for different services. A typical microservices architecture may consist of data models comprising key-value, graph, time-series, JSON, streams, search engines, and more.
For example, if you were modelling an e-commerce app with microservices, you could have a data requirement as follows:
Figure: Metric requirement for services
Some of your services would need very high read performance with low latency, while others can tolerate a moderate level of latency.
Each of these services could have a data model as follows:
Figure: Modelling microservice data structures
For example, DynamoDB is a strong candidate for the Cache Server, as that service requires very high read performance (less than 1 ms) and high write performance with low latency.
You should formalize the performance requirements for your microservices in terms of acceptable latency and IOPS to ensure you're selecting the correct database for your microservice.
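One way to formalize these requirements is to write them down as data and filter candidate databases against them. The sketch below is only an illustration: `ServiceRequirement`, `DatabaseProfile`, and `shortlist` are hypothetical names, and the database profiles and their numbers are made up for the example rather than measured from any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequirement:
    service: str
    min_read_iops: int
    min_write_iops: int
    max_latency_ms: float
    data_model: str


@dataclass(frozen=True)
class DatabaseProfile:
    name: str
    read_iops: int
    write_iops: int
    typical_latency_ms: float
    data_models: frozenset


def satisfies(db: DatabaseProfile, req: ServiceRequirement) -> bool:
    """True if the database meets every stated requirement of the service."""
    return (
        db.read_iops >= req.min_read_iops
        and db.write_iops >= req.min_write_iops
        and db.typical_latency_ms <= req.max_latency_ms
        and req.data_model in db.data_models
    )


def shortlist(req, candidates):
    """Names of the candidate databases that satisfy the requirement."""
    return [db.name for db in candidates if satisfies(db, req)]


# Made-up profiles; real numbers should come from your own benchmarks.
candidates = [
    DatabaseProfile("kv_store_a", 1_200_000, 600_000, 0.8, frozenset({"key-value"})),
    DatabaseProfile("relational_b", 50_000, 20_000, 5.0, frozenset({"relational"})),
    DatabaseProfile("document_c", 300_000, 100_000, 3.0,
                    frozenset({"document", "key-value"})),
]

cache_req = ServiceRequirement("cache", 1_000_000, 500_000, 1.0, "key-value")
orders_req = ServiceRequirement("orders", 10_000, 10_000, 10.0, "relational")

cache_options = shortlist(cache_req, candidates)
orders_options = shortlist(orders_req, candidates)
```

Keeping the requirements in a structure like this makes it easy to re-run the comparison when a service's traffic profile changes.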
What are the tips for choosing the correct database for a microservice?
Tip #1 - Consider the CAP Theorem
When you pick a database, look into how it works and identify where it sits among the CAP combinations (CA, CP, or AP). Proceed with the database only if that position matches what your service needs, as there will always be tradeoffs.
Tip #2 - Gather all requirements upfront
It's essential to understand the requirements of your microservice before you pick a database for it. If your microservice's read and write workloads differ significantly (for example, it is write-heavy but not read-heavy), you could consider using two databases (one for reads, one for writes) that are kept in sync with eventual consistency, following the CQRS (Command Query Responsibility Segregation) pattern.
Apart from that, gain an insight into the acceptable latency and IOPS your database will need.
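The CQRS suggestion in this tip can be sketched with two in-memory stores standing in for the write and read databases. This is a simplified illustration under stated assumptions: `OrderWriteStore`, `OrderReadStore`, and `sync` are hypothetical names, and a plain in-process queue plays the role that a message broker or change stream would play in a real system.

```python
from collections import deque


class OrderWriteStore:
    """Command side: accepts writes and records an event for each one."""

    def __init__(self):
        self.orders = {}
        self.events = deque()  # events not yet applied to the read side

    def place_order(self, order_id, customer, total):
        self.orders[order_id] = {"customer": customer, "total": total}
        self.events.append(("order_placed", order_id, customer, total))


class OrderReadStore:
    """Query side: a denormalized view optimized for one read pattern."""

    def __init__(self):
        self.totals_by_customer = {}

    def apply(self, event):
        kind, _order_id, customer, total = event
        if kind == "order_placed":
            current = self.totals_by_customer.get(customer, 0)
            self.totals_by_customer[customer] = current + total

    def total_spent(self, customer):
        return self.totals_by_customer.get(customer, 0)


def sync(write_store, read_store):
    """Drain pending events into the read store; returns how many were applied."""
    applied = 0
    while write_store.events:
        read_store.apply(write_store.events.popleft())
        applied += 1
    return applied


writes = OrderWriteStore()
reads = OrderReadStore()
writes.place_order("o1", "alice", 20)
writes.place_order("o2", "alice", 10)

before_sync = reads.total_spent("alice")  # read side has not caught up yet
applied = sync(writes, reads)
after_sync = reads.total_spent("alice")
```

The gap between `before_sync` and `after_sync` is the eventual-consistency window; whether that delay is acceptable is one of the requirements to settle upfront.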
Tip #3 - Use Amplication 😁 💜
Consider using tools like Amplication to build your microservices. Amplication lets you bootstrap and build microservices in just a few clicks while allowing you to select specific databases such as PostgreSQL, MySQL, and MongoDB for each particular service, depending on your requirements. Swapping one database for another takes just four clicks. This allows you to experiment with different databases very quickly, which can be a game changer when testing multiple databases per service until you find the most suitable one.
Wrapping up
Microservices have gained a significant advantage over monoliths due to their capability to support loosely coupled services, where each service can be developed, tested, and maintained in isolation while using a separate datastore that is most suitable for that microservice.
Hence, it's essential to understand how to pick the most suitable database for each microservice. You need to dive into aspects like IOPS, Latency, and Data Modeling and gain a strong understanding of the CAP Theorem to ensure that you pick the correct database. You should strive to build your services using architectures and platforms that will allow you to easily swap databases in the future.
By doing so, you're on the right path to building highly scalable and high-performing microservices that can serve requests at optimal capacity.
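One common way to keep a service's database swappable, as recommended above, is to hide it behind a small interface so that business logic never talks to a specific database directly. The sketch below assumes a hypothetical `ProductRepository` interface with two interchangeable implementations: an in-memory dictionary and SQLite (via Python's standard `sqlite3` module), chosen only because both run without extra setup.

```python
import sqlite3
from typing import Optional, Protocol


class ProductRepository(Protocol):
    """The only surface the service depends on."""

    def save(self, product_id: str, name: str) -> None: ...
    def find(self, product_id: str) -> Optional[str]: ...


class InMemoryProductRepository:
    def __init__(self):
        self._items = {}

    def save(self, product_id, name):
        self._items[product_id] = name

    def find(self, product_id):
        return self._items.get(product_id)


class SqliteProductRepository:
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS products (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, product_id, name):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO products (id, name) VALUES (?, ?)",
                (product_id, name),
            )

    def find(self, product_id):
        row = self._conn.execute(
            "SELECT name FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        return row[0] if row else None


class ProductService:
    """Business logic that works with any ProductRepository implementation."""

    def __init__(self, repo: ProductRepository):
        self._repo = repo

    def rename(self, product_id: str, new_name: str) -> None:
        if self._repo.find(product_id) is None:
            raise KeyError(product_id)
        self._repo.save(product_id, new_name)


# The same service code runs unchanged against either backing store.
names = []
for repo in (InMemoryProductRepository(), SqliteProductRepository()):
    repo.save("p1", "Keyboard")
    ProductService(repo).rename("p1", "Mechanical keyboard")
    names.append(repo.find("p1"))
```

Swapping databases then means writing one new repository class, which also makes it cheap to benchmark several candidates against the same service code.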
Can microservices use multiple databases?
Yes, you are highly encouraged to use separate databases for your microservices, as this helps break down the monolithic data store and lets you independently scale your database services up and down based on your requirements.
Can microservices use SQL databases?
Yes. You can choose between SQL, key-value, and graph databases for your microservice, depending on your requirements.
Should I use a relational or a NoSQL database for my microservice?
There is no "one size fits all" and no silver bullet. It depends on the requirements that you wish to satisfy. Consider using a normalized relational database if consistency is more important than performance. If performance is important, consider using a NoSQL database.
What are the trade-offs between using a single database for all microservices and multiple databases?
With a single database for all of your microservices, it's challenging to scale individual parts of your database. In addition, different services might have different access patterns and need different data models, which cannot all be implemented well if you use a single database for all your microservices.