7 Tips to Build Scalable Node.js Applications

Yuval Hazaz
May 16, 2023

Scaling should not be an afterthought when it comes to building software. As the number of users of an application increases, the application should scale and handle the increased load effectively.

Many technologies can be used to build such scalable applications, and Node.js is a popular choice among them. Node.js is a JavaScript runtime built on Chrome's V8 JavaScript engine and, if utilized correctly, can be used to build highly scalable mission-critical applications. This article discusses several tips that can help when building scalable applications using Node.js.

1. Worker threads & concurrency

Node.js executes JavaScript code on a single thread. Under the hood, however, it uses the libuv C library, which handles I/O operations and network requests asynchronously through the event loop, backed by a pool of hidden threads for some operations. CPU-intensive tasks such as image or video processing, on the other hand, run on the main thread, so they can block the event loop and prevent subsequent requests from executing, increasing the application's latency.

To handle such scenarios, worker threads were introduced as an experimental feature in Node.js v10 and became stable in Node.js v12.

How Does The Worker Thread Work?

A worker thread is an execution thread within a Node.js process with its own isolated environment, including its own event loop. This lets it run in parallel with other threads and perform expensive operations without blocking the main event loop.

The parent thread creates worker threads to execute resource-intensive tasks in isolation from other threads. This keeps the parent thread responsive while the heavy work runs elsewhere.

Creating a worker thread is as simple as importing the worker_threads module and creating a new Worker object.

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Main thread
  console.log('Starting worker threads...');

  // Start a worker thread that runs this same file
  const worker1 = new Worker(__filename);

  // Send data to the worker thread
  worker1.postMessage({ id: 1, start: 0, end: 99 });

  // Listen for messages from the worker thread
  worker1.on('message', (message) => {
    console.log(`Main thread received message from worker ${message.id}: ${message.result}`);
    // Stop the worker so the process can exit
    worker1.terminate();
  });
} else {
  // Worker thread
  parentPort.on('message', (data) => {
    console.log(`Worker ${data.id} received data: start=${data.start}, end=${data.end}`);

    // Perform a computationally expensive task
    let result = 0;
    for (let i = data.start; i <= data.end; i++) {
      result += i;
    }

    // Send the result back to the main thread
    parentPort.postMessage({ id: data.id, result });
  });
}

The snippet above shows a working example of a worker thread in Node.js.

The primary (parent) thread creates a worker thread from the code in the same file and passes data to it over the message channel. The worker thread then executes the assigned task using that data. Once the expensive operation finishes, the worker sends the result back to the main thread, where a callback function processes it.

This example can be extended to multiple worker threads that perform different operations by creating more Worker instances and passing a different script path as the first argument of the Worker constructor. In this way, any CPU-intensive task can be distributed among worker threads so that the main thread's event loop is not blocked.
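
As a rough sketch of that extension, the example below splits a range into chunks and sums each chunk in its own worker. The helper names (splitRange, runWorker, sumInParallel) are made up for illustration, and the worker source is passed inline with the eval option only to keep the sketch in one file. A real application would more likely point each Worker at its own script and reuse a pool of workers.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept inline (eval: true) so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = workerData.start; i <= workerData.end; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Split the inclusive range [start, end] into at most `parts` contiguous chunks.
function splitRange(start, end, parts) {
  const total = end - start + 1;
  const size = Math.ceil(total / parts);
  const chunks = [];
  for (let s = start; s <= end; s += size) {
    chunks.push({ start: s, end: Math.min(s + size - 1, end) });
  }
  return chunks;
}

// Run one worker for a chunk and resolve with that chunk's partial sum.
function runWorker(chunk) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: chunk });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Sum the range using one worker per chunk, then combine the partial sums.
async function sumInParallel(start, end, parts) {
  const partials = await Promise.all(splitRange(start, end, parts).map(runWorker));
  return partials.reduce((a, b) => a + b, 0);
}
```

Calling sumInParallel(0, 99, 4) starts four workers, each summing 25 numbers, and resolves with the same total the single-worker example produces.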

2. Scaling out to multiple servers/clusters

When an application faces a spike in demand, horizontal scaling can come in handy. Node.js applications can be scaled horizontally using two techniques.

  1. Scale with clustering.
  2. Scale across multiple servers.

Scale with clustering

Clustering is commonly used to scale a Node.js application horizontally within the same server. It allows developers to take full advantage of a multi-core system while reducing application downtime and outages by distributing the requests among the child processes.

You can create child processes that run concurrently and share the same server port, which helps scale the application within the same server. Clusters can be implemented using the built-in cluster module of Node.js or a library like PM2, which is widely used in production applications.

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

// cluster.isPrimary replaces the deprecated cluster.isMaster (Node.js v16+)
if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case, it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('Hello World\n');
  }).listen(8000); // example port

  console.log(`Worker ${process.pid} started`);
}

The snippet above highlights the use of horizontal scaling via clustering. It uses the built-in cluster module to fork and create child processes based on the available CPU count.

As shown above, you must write a fair amount of code to handle clustering, even for a simple application. For complex mission-critical production applications, this approach quickly becomes hard to maintain.

Therefore, libraries like PM2 can come in handy to handle the added complexity behind the scenes. All we need to do is install PM2 globally on the server and run the application with PM2 in cluster mode, which can spawn one process per available CPU core.

Additionally, it provides many more features to manage and visualize the processes. More details on PM2 can be found in their official documentation.

Scale with multiple servers

A Node.js application can be scaled horizontally using multiple servers as long as the application is running as an independent process. A load balancer can be introduced to handle scaling across servers, where the load balancer will distribute the requests among servers depending on the load.

However, using a single load balancer is not a good practice, as it creates a single point of failure. Therefore, it is best to introduce multiple load balancers pointing to the same set of servers, depending on the application's criticality.
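
To make "distributing requests depending on the load" concrete, here is a minimal sketch of a least-connections strategy, one of several strategies a load balancer may use. The class name and the server labels are hypothetical. In production this logic lives inside a dedicated load balancer rather than in application code.

```javascript
// Illustrative only: picks the server with the fewest in-flight requests.
class LeastConnectionsBalancer {
  constructor(servers) {
    // Track the number of active requests per server.
    this.active = new Map(servers.map((server) => [server, 0]));
  }

  // Choose the least-loaded server and count the new request against it.
  acquire() {
    let best = null;
    for (const [server, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = server;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Call when a request finishes so the server's load drops again.
  release(server) {
    this.active.set(server, Math.max(0, this.active.get(server) - 1));
  }
}
```

Each incoming request would call acquire() to pick a target and release() once the response has been sent.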

3. Breaking the application into microservices

Microservices is a software architecture pattern that breaks the application into smaller, independent, functional units where each unit can function and scale independently without affecting other services. Additionally, it helps improve scalability in several ways.

Scalability at the component level

Since each service is independent of the others, it can be scaled on its own, which reduces the complexity of scaling the whole application. Furthermore, it allows developers to scale each microservice according to demand. For example, you can scale only the microservices that receive high traffic.

Improved reliability

Breaking down a large monolith application into smaller, independent microservices makes observing and tracing each service easier. This improved observability can help identify and isolate issues within the system, making it easier to implement failover strategies that minimize the impact on the overall application.

Additionally, because each microservice operates independently, a failure in one service is far less likely to take down the entire application, provided the services that depend on it handle the failure gracefully.

Efficient scalability

The small size of the services allows better resource utilization where needed. Hence, when the demand increases, the application can effectively scale while utilizing the available resources.

Additionally, microservices can make development more efficient, since small services are quicker to build, deploy, and release. Developers can also use third-party tools to simplify microservice generation.

Tools like Amplication help developers generate fully functional services based on TypeScript and Node.js with popular technologies such as (but not limited to) NestJS, Prisma, PostgreSQL, GraphQL, and MongoDB while ensuring your services are highly scalable and secure.

4. Optimizing static assets through a CDN

Node.js is fast when it comes to handling dynamic content like JSON objects. However, it tends to underperform when serving static assets such as images. Serving static content from the Node.js application is resource-intensive and can increase the application's latency.

To avoid this, you can use a web server like Nginx or Apache to serve static content. These web servers are optimized for serving static content and can also cache it.

Alternatively, you can use Content Delivery Networks (CDNs), which are built to serve static content. CDNs have edge (POP) locations worldwide, bringing the static content closer to the users, thus reducing latency significantly while freeing the Node.js application to handle the dynamic requests.
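
A CDN or caching web server generally relies on HTTP caching headers to decide what it may store. The sketch below shows one possible policy, assuming static assets are identified by file extension and have versioned file names. The function name, the extension list, and the one-year lifetime are illustrative choices, not requirements.

```javascript
const path = require('path');

// Assumption: these extensions mark static assets with versioned file names.
const STATIC_EXTENSIONS = new Set(['.js', '.css', '.png', '.jpg', '.svg', '.woff2']);

// Return a Cache-Control value for the given request path.
function cacheControlFor(urlPath) {
  const pathname = urlPath.split('?')[0]; // ignore any query string
  const ext = path.extname(pathname).toLowerCase();
  return STATIC_EXTENSIONS.has(ext)
    ? 'public, max-age=31536000, immutable' // cacheable by CDNs and browsers for a year
    : 'no-cache';                           // dynamic responses are revalidated
}
```

The origin server would set this value as the Cache-Control response header so that the CDN can serve repeat requests for static assets without reaching the Node.js application.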

5. Stateless authentication

Stateless authentication is an authentication technique in which most session details, such as user properties, are saved on the client side, while the server has no information on any previous requests.

It is generally implemented using token-based authentication via a JWT (JSON Web Token), which contains information (basic information + authorities) about the logged-in user. The JWT is signed and verified using either a shared secret or a private and public key pair.

The client sends the JWT to the server on every request, and the server validates the token to authenticate and verify the request. This lets the Node.js backend scale without any dependency on user sessions, as the server only has to check that the token's signature matches its header and payload, using the secret or public key.

This process reduces the workload on the server side to validate user requests and makes the authentication process scalable, as it has no dependencies on any specific server.
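
To illustrate why no session store is needed, the sketch below signs and verifies an HS256 token using only the built-in crypto module. The function names sign and verify are made up for this example, and it assumes a shared secret rather than a key pair. It is a teaching sketch; a production service would normally rely on a maintained JWT library.

```javascript
const crypto = require('crypto');

const b64url = (value) => Buffer.from(value).toString('base64url');

// Create a token: base64url(header).base64url(payload).signature
function sign(payload, secret) {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${signature}`;
}

// Return the payload if the signature is valid and the token is not expired, else null.
function verify(token, secret) {
  const [header, body, signature] = token.split('.');
  if (!header || !body || !signature) return null;

  const expected = crypto.createHmac('sha256', secret).update(`${header}.${body}`).digest();
  const actual = Buffer.from(signature, 'base64url');
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return null;
  }

  // The signature matched, so the header and payload were produced with the secret.
  if (JSON.parse(Buffer.from(header, 'base64url').toString('utf8')).alg !== 'HS256') return null;
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  if (typeof payload.exp === 'number' && payload.exp * 1000 <= Date.now()) return null;
  return payload;
}
```

Because verify needs only the token and the secret, any server instance behind the load balancer can authenticate the request without consulting shared session state.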

6. Use of timeouts for I/O operations

The performance of an application can be affected by external services, no matter how resilient and fast the application itself is. Therefore, it is vital to implement timeouts so that the application does not wait on another service for an extended time. To do so, developers can use the built-in timeout options of third-party libraries such as HTTP clients.

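const axios = require('axios');

// Set the timeout to 5 seconds; a request that gets no response within 5 seconds is aborted.
const instance = axios.create({
  timeout: 5000
});

// The URL below is a placeholder for the external service being called
instance.get('https://example.com/api')
  .then(response => {
    console.log(response.data);
  })
  .catch(error => {
    if (error.code === 'ECONNABORTED') {
      console.error('Timeout occurred');
    } else {
      console.error(error.message);
    }
  });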
The snippet above shows an example of a timeout.

It ensures that every API request made through this Axios instance is automatically timed out if no response arrives from the service within 5 seconds. It is implemented using the timeout option of the library and prevents requests from hanging indefinitely on a slow service.
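
Not every library exposes a timeout option. In that case, a generic wrapper along the following lines can help. The helper name withTimeout and the ETIMEDOUT error code are choices made for this sketch. Note that it only stops waiting; it does not cancel the underlying operation.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Operation timed out after ${ms} ms`);
      err.code = 'ETIMEDOUT';
      reject(err);
    }, ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Any promise-returning I/O call, such as a database query, can be wrapped this way: withTimeout(someQuery(), 2000), where someQuery stands for your own function.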

7. Implement tracing, monitoring & observability to debug and solve performance issues actively

Identifying performance bottlenecks of applications is crucial when building a scalable application. Therefore, implementing tracing, monitoring, and observability can provide insights into the actual bottlenecks and can help resolve the issues quickly.

Tracing is the ability to track a request in each stage. It helps to trace the exact sequence of events in a request before it finishes its execution. This provides developers with the information they need to determine areas of concern regarding latency. They can inspect and fix parts of the request before it becomes a significant issue.
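
A basic building block of tracing is attaching the same identifier to everything that happens during one request. The sketch below uses the built-in AsyncLocalStorage class for that. The names runWithTrace and log are hypothetical, and dedicated tracing tools do considerably more than this.

```javascript
const { AsyncLocalStorage } = require('async_hooks');
const { randomUUID } = require('crypto');

const requestContext = new AsyncLocalStorage();

// Run `fn` so that everything it calls, including async work, sees the same trace ID.
function runWithTrace(fn, traceId = randomUUID()) {
  return requestContext.run({ traceId }, fn);
}

// Prefix each log line with the current trace ID, if there is one.
function log(message) {
  const store = requestContext.getStore();
  const traceId = store ? store.traceId : 'no-trace';
  const line = `[trace=${traceId}] ${message}`;
  console.log(line);
  return line;
}
```

In an HTTP server, the request handler would be wrapped in runWithTrace so that every log line written while serving that request carries the same ID and can be followed from stage to stage.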

Monitoring refers to tracking fundamental metrics of the application, such as response time, error rates, and resource utilization so that developers are kept aware of all activities within the application. They can then utilize the generated logs to troubleshoot errors within the application while using the metrics to identify areas of bottlenecks.
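
As a minimal illustration of such metrics, the sketch below records request durations and failures in memory and derives a latency percentile and an error rate. The class and function names are invented for this example. A real setup would export these values to a monitoring system instead of keeping them in an array.

```javascript
// In-memory collector for response times and errors (illustrative only).
class RequestMetrics {
  constructor() {
    this.durations = [];
    this.errors = 0;
  }

  record(durationMs, ok) {
    this.durations.push(durationMs);
    if (!ok) this.errors += 1;
  }

  // Nearest-rank percentile, e.g. percentile(95) for the p95 response time.
  percentile(p) {
    if (this.durations.length === 0) return 0;
    const sorted = [...this.durations].sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
  }

  errorRate() {
    return this.durations.length === 0 ? 0 : this.errors / this.durations.length;
  }
}

// Time an async operation and record whether it succeeded.
async function timed(metrics, fn) {
  const start = process.hrtime.bigint();
  try {
    const result = await fn();
    metrics.record(Number(process.hrtime.bigint() - start) / 1e6, true);
    return result;
  } catch (err) {
    metrics.record(Number(process.hrtime.bigint() - start) / 1e6, false);
    throw err;
  }
}
```

Wrapping a request handler in timed gives a running picture of response time and error rate, the same figures the paragraph above lists as fundamental metrics.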

Finally, developers can combine tracing and monitoring to gain a holistic view of the application through observability. Observability helps developers determine the internal state of the application from its generated outputs (metrics + logs). Hence, developers can find and isolate performance issues and fix them quickly.

Many open-source and commercial tools are available to implement tracing and monitoring and so improve an application's observability. These tools provide extensive insights into the application's performance and help identify issues early.


In this article, we discussed 7 tips for building scalable Node.js applications. Adopting these techniques will improve the performance and scalability of your Node.js application.

However, it is essential to identify and isolate bottlenecks before jumping in with a solution. It's also critical to understand that bottlenecks can change as your application scales, so hot spots may appear in different parts of the application at different scales. It is therefore vital to maintain observability and regularly monitor the critical metrics to identify issues early.

How Amplication can help

Amplication is an open-source platform that helps you build backend services without spending time on repetitive coding tasks and boilerplate code. Instead, Amplication auto-generates a fully functional, production-ready backend based on TypeScript and Node.js.

Whether you build a single service or a microservices architecture, Amplication allows you to build at any scale.

With Amplication, development teams can create multiple services, manage microservices communication, use Kafka, connect to storage, or add an API Gateway.

Amplication can sync the generated code with a monorepo where each service goes to a different folder or with various repositories. You can manage dozens or hundreds of services with maximum consistency from a single source of truth and centralized management and visibility.