Building Resilient Systems: Securing Your APIs from DoS Attacks
12/5/2024 · 4 min read
In the era of cloud computing, APIs, and distributed applications, systems are required to handle a massive influx of traffic, both legitimate and malicious. Without effective rate limiting, services are prone to performance degradation, crashes, and exploitation. Rate limiting is essential to protect systems, maintain a fair allocation of resources, and ensure the reliability of services.
Without an appropriate rate-limiting mechanism, a handful of users can consume a disproportionate share of resources. Here are the challenges this creates in real-world systems:
1. Resource Exhaustion: Protecting Infrastructure
2. Defending Against Denial of Service (DoS) Attacks
3. Preventing Brute-Force Login Attacks
4. Cost Control and Fair Usage
5. Handling Traffic Spikes and Scalability Issues
6. Protecting API Stability and Consistency
Rate limiting is not just a feature, but a necessity in modern systems to ensure performance, security, fairness, and cost control. By throttling traffic and enforcing usage limits, rate limiting safeguards systems against abuse and overload, ensuring they remain available, responsive, and scalable even in the face of heavy demand. Whether it’s APIs like Twitter and Google Maps, or platforms like Facebook and GitHub, rate limiting forms a key part of the infrastructure that keeps these services running smoothly for millions of users.
Rate-Limiting Methodologies:
1. Fixed Window Rate Limiting: Limits requests in a fixed time window. Simple but can allow bursts at window boundaries.
2. Sliding Window Rate Limiting: Tracks requests in a continuous rolling window. Smoother traffic control compared to fixed window.
3. Token Bucket Algorithm: Allows bursts by adding tokens to a bucket at a constant rate. Requests consume tokens, and excess requests are throttled or denied (see the sketch after this list).
4. Leaky Bucket Algorithm: Enforces a strict, constant rate by “leaking” requests at a fixed rate. Excess requests are dropped if they exceed the processing rate.
5. Concurrent Rate Limiting: Limits the number of concurrent requests or connections. Focuses on managing server load.
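To make the token bucket concrete, here is a minimal sketch in plain JavaScript; the rate and capacity numbers are illustrative, not taken from any particular system:

```javascript
// Minimal token bucket: tokens refill at a constant rate up to a cap,
// and each admitted request spends one token.
function tokenBucket(ratePerSec, capacity) {
  let tokens = capacity;
  let last = Date.now();
  return function allow() {
    const now = Date.now();
    // Refill in proportion to elapsed time, never beyond capacity.
    tokens = Math.min(capacity, tokens + ((now - last) / 1000) * ratePerSec);
    last = now;
    if (tokens >= 1) {
      tokens -= 1;
      return true;  // admit the request
    }
    return false;   // throttle: the bucket is empty
  };
}

// Example: sustain 5 requests/second, allowing bursts of up to 10.
const allow = tokenBucket(5, 10);
```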
Rate Limiting at Enterprise Scale:
In large-scale enterprise applications, rate limiting is not a one-size-fits-all solution. With a mix of transactional APIs that handle varying levels of read and write operations, a more sophisticated approach is required to ensure fairness, prevent system overload, and cater to different client demands. Let’s break down the specific needs of such systems and how they can be addressed, particularly with NGINX.
Key Use Cases:
1. Differentiated Rate Limits for Read-Heavy and Write-Heavy APIs
Enterprise applications often host APIs with varying workloads. For example, a user profile service could have write-heavy operations like user registration and read-heavy operations like fetching user authorizations. A blanket rate limit applied to both leaves no room for this difference (a minimal NGINX sketch of the split follows this list):
Write-heavy operations (e.g., new user registration) need stricter controls to prevent system overload.
Read-heavy operations (e.g., fetching user authorizations) can typically tolerate higher traffic without strain.
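A minimal sketch of this split in NGINX; the paths, rates, and backend address are placeholders:

```nginx
# Stricter zone for writes, looser zone for reads (rates are illustrative).
limit_req_zone $binary_remote_addr zone=user_writes:10m rate=50r/m;
limit_req_zone $binary_remote_addr zone=user_reads:10m  rate=1000r/m;

upstream user_service {
    server 10.0.0.10:8080;   # hypothetical backend
}

server {
    listen 80;

    location /users/register {
        limit_req zone=user_writes burst=10;
        proxy_pass http://user_service;
    }

    location /users/authorizations {
        limit_req zone=user_reads burst=100;
        proxy_pass http://user_service;
    }
}
```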
2. Multiple Rate Limits per Client and Operation
Not all clients are the same, and their rate limits shouldn't be either. Enterprise systems need the capability to define different rate limits for different clients or even for specific operations of the same client.
Example: Client A might need a stricter limit for creating users (e.g., 50 users/min) compared to Client B, who may need to create 50,000 users per minute.
This requires a dynamic assignment of rate limits at runtime, giving you the flexibility to adjust based on the client’s needs and traffic patterns.
3. Rate Limiting at the Cluster Level
In modern distributed systems, APIs are often deployed across multiple instances, and rate limiting should be applied at the cluster level, not just at individual nodes. This ensures that the combined traffic hitting the entire API is controlled, no matter which instance receives the requests.
Required Capabilities for Enterprise Rate Limiting:
To support these use cases, a rate-limiting solution for enterprise-scale applications should include the following capabilities:
1. Unique Client Identification: The system should be able to uniquely identify clients making requests. This could be done using API keys, JWT tokens, or IP-based identification.
2. Support for Multiple Rate Limits with Runtime Assignment
There should be the flexibility to define multiple rate limits for different clients or operations and assign these limits dynamically at runtime. Additionally, the ability to toggle rate limits on and off for specific clients or API methods should be available for more granular control.
This allows for real-time adjustments and tailored rate limiting based on client performance, use cases, or SLAs.
3. Cluster-Level Rate Limiting
When using load balancers such as NGINX deployed on VMs, rate limiting needs to be applied across the entire cluster to avoid overloading individual instances and to prevent an unfair distribution of traffic.
This would ensure that no matter how requests are routed, the overall system remains within its capacity limits.
NGINX: Exploring Out-of-the-Box Options for Rate Limiting
NGINX, one of the most popular load balancers and reverse proxies, provides powerful out-of-the-box capabilities to implement rate limiting at an enterprise scale. Let’s explore how NGINX can address the key requirements identified:
1. Identifying Clients Uniquely
This can be done in a variety of ways, depending on your system's approach to identifying and authorizing clients.
For example, a simpler system might issue a static token to each client and expect it to be passed in a request header, which NGINX can read directly.
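A minimal sketch, assuming the token arrives in a hypothetical X-Api-Key header:

```nginx
# Use the static token from the X-Api-Key header as the client identity;
# fall back to the client address when the header is missing.
map $http_x_api_key $client_id {
    ""      $binary_remote_addr;
    default $http_x_api_key;
}
```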
In systems that use JWT tokens to authorize clients, njs can decode the token and return the top-level payload or any individual claim.
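A sketch using njs on open-source NGINX; the file name, function name, and the choice of the sub claim are assumptions, and a reasonably recent njs version with Buffer support is assumed (NGINX Plus users can also read claims via auth_jwt and variables like $jwt_claim_sub):

```javascript
// jwt.js - decode the JWT payload (no signature verification here; this
// is for identification only) and return the "sub" claim as the client ID.
function clientId(r) {
    try {
        var token = (r.headersIn.Authorization || '').split(' ')[1];
        var payload = token.split('.')[1];
        var claims = JSON.parse(Buffer.from(payload, 'base64url').toString());
        return claims.sub || '';
    } catch (e) {
        return '';   // unidentified clients get an empty key
    }
}
export default { clientId };
```

```nginx
# nginx.conf: expose the njs function as the $client_id variable.
js_import jwt from conf.d/jwt.js;
js_set $client_id jwt.clientId;
```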
2. Defining and Assigning Multiple Rate Limits
NGINX supports the definition of multiple rate limit zones, which allows you to apply different rate limits to different types of clients or operations. By using NGINX’s built-in limit_req_zone directive, you can assign different rate limits for different methods or paths dynamically.
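For instance, here is a sketch that keys zones off the HTTP method, so writes and reads on the same paths are throttled independently; names and rates are illustrative, and $client_id is the variable produced above:

```nginx
# Only one of the two keys is non-empty for any request, so each request
# is counted in exactly one zone; empty keys are never rate limited.
map $request_method $write_key {
    POST    $client_id;
    PUT     $client_id;
    DELETE  $client_id;
    default "";
}
map $request_method $read_key {
    GET     $client_id;
    HEAD    $client_id;
    default "";
}

limit_req_zone $write_key zone=api_writes:10m rate=50r/m;
limit_req_zone $read_key  zone=api_reads:10m  rate=1000r/m;
```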
Once the rate limits are identified, we assign one to each client using a key-value store in NGINX, in the format "ClientUniqueIdentifier:ratelimitzone", e.g. "abc123:1", where abc123 is the unique client identifier returned from the njs code and 1 is the rate-limit zone.
The logic below decides which zone applies to a client. Because each client is mapped to exactly one zone, only one of the map blocks yields a non-empty key; the others stay empty and are ignored.
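A sketch of that selection, with hypothetical tier names and rates (user_service is the hypothetical upstream from earlier):

```nginx
# Only the map matching the client's assigned zone number yields a key;
# the other stays empty, and empty keys are never counted by limit_req.
map $client_zone $tier1_key {
    "1"     $client_id;
    default "";
}
map $client_zone $tier2_key {
    "2"     $client_id;
    default "";
}

limit_req_zone $tier1_key zone=tier1:10m rate=50r/m;
limit_req_zone $tier2_key zone=tier2:10m rate=50000r/m;

server {
    listen 80;
    location /users/ {
        # Both limits are declared; only the client's own tier counts.
        limit_req zone=tier1 burst=10  nodelay;
        limit_req zone=tier2 burst=500 nodelay;
        proxy_pass http://user_service;
    }
}
```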
3. Cluster-Level Rate Limiting: Out of the box, limit_req_zone shares its counters only across the worker processes of a single NGINX instance. To enforce a limit across all nodes in a cluster, NGINX Plus's zone_sync module can replicate the zone: marking a limit_req_zone with the sync parameter keeps its state synchronized on every node, so the limit applies to the cluster's combined traffic.
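A sketch of cluster-wide enforcement, assuming NGINX Plus; the hostnames, ports, and upstream are placeholders, and each node lists its peers:

```nginx
# Replicate shared-memory zones between nodes (NGINX Plus zone_sync).
stream {
    server {
        listen 9000;
        zone_sync;
        zone_sync_server nginx-node1.example.com:9000;
        zone_sync_server nginx-node2.example.com:9000;
    }
}

http {
    # The "sync" flag makes this zone's counters cluster-wide.
    limit_req_zone $client_id zone=cluster_api:10m rate=100r/s sync;

    upstream backend {
        server 10.0.0.10:8080;   # hypothetical backend
    }

    server {
        listen 80;
        location /api/ {
            limit_req zone=cluster_api burst=20 nodelay;
            proxy_pass http://backend;
        }
    }
}
```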
Conclusion
For enterprise-scale applications, standard rate limiting approaches often fall short of the necessary flexibility required to manage diverse client needs, traffic patterns, and system architectures. By leveraging NGINX’s built-in rate-limiting capabilities, combined with flexible configuration options, you can tailor rate limits to meet the demands of both read-heavy and write-heavy APIs, dynamically assign limits based on client requirements, and enforce limits at the cluster level for distributed systems.
These solutions not only ensure fair usage of system resources but also protect your infrastructure from overloads, denial-of-service attacks, and unpredictable traffic spikes, all while maintaining optimal performance and user experience.
GitHub Repo for sample code: ThameemIshraq/ratelimiting-nginx: Rate limiting in Nginx