Designing a URL Shortener (TinyURL)

1. Introduction

A URL Shortener is a service that transforms a long URL into a significantly shorter URL and redirects users to the original URL when the short link is accessed. Well-known examples include TinyURL, Bitly, and Google URL Shortener (deprecated).

From an interview perspective, designing a URL shortener is a classic system design problem because it looks simple but touches almost every important concept in distributed systems: scalability, high availability, data modeling, caching, consistency, traffic skew, and fault tolerance.
In software engineering, traffic skew (closely related to data skew and workload imbalance) is the uneven distribution of requests, data, or computational load across a system's resources, such as servers, partitions, or processing nodes. This imbalance can lead to significant performance bottlenecks, and it matters for a URL shortener because a handful of popular links can attract most of the traffic.

2. Requirements

A good system design always begins with clearly defining requirements. This prevents overengineering and guides architectural decisions.

2.1 Functional Requirements

Assume the URL https://www.examlio.com/?q=India&s=Delhi&c=Delhi&l=hi is the original URL. Your service creates a shorter alias: https://tinyurl.com/y7keocwj. When you click the alias, it redirects you to the original URL.

1. The system must allow users to submit a long URL and receive a short URL in response. (URL shortening: Given a long URL, return a much shorter URL.)
2. When a user accesses the short URL, the system must redirect them to the corresponding long URL. (URL redirecting: Given a shorter URL, redirect to the original URL.)
3. The system should optionally support custom aliases, where users can choose their own short key instead of a generated one.
4. It should also allow URLs to have an expiration time, after which the short link becomes invalid.
5. For real-world usage, the system should support basic analytics, such as counting the number of times a short URL is accessed.

2.2 Non-Functional Requirements

Traffic volume: 100 million URLs are generated per day.

1. The system must be highly available, since broken short links are unacceptable once they are published.
2. Redirects should be extremely fast, typically requiring single-digit millisecond latency.
3. The system must be horizontally scalable, as the number of redirects can grow to billions per day.
4. The design must tolerate failures gracefully and avoid single points of failure.
5. From a consistency standpoint, the system can tolerate eventual consistency for analytics, but the mapping between short URL and long URL must be reliable once created.

3. High-Level Estimation

Write operation: 100 million URLs are generated per day.
Write operations per second: 100 million ÷ 24 ÷ 3600 ≈ 1,160 writes per second.

Read operation: Assuming the ratio of read operations to write operations is 10:1, the read operations per second are: 11,600 reads per second.

Assuming the URL shortener service runs for 10 years, it must support 100 million × 365 × 10 = 365 billion records.

Assume the average URL length is 100 bytes. Storage requirement over 10 years:
365 billion × 100 bytes
= 365 × 10⁹ × 10² bytes
= 3.65 × 10¹³ bytes
= 36.5 TB (using 1 TB ≈ 1.0 × 10¹² B)
Unit       Symbol   Conversion
Byte       B        1 B
Kilobyte   KB       1 KB = 1,024 B
Megabyte   MB       1 MB = 1,024 KB = 1,048,576 B
Gigabyte   GB       1 GB = 1,024 MB = 1,073,741,824 B
Terabyte   TB       1 TB = 1,024 GB = 1,099,511,627,776 B ≈ 1.1 × 10¹² B
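These estimates can be reproduced with a few lines of Python. The 10:1 read/write ratio and the 100-byte average URL size are the assumptions stated above:

```python
# Back-of-the-envelope estimation for the URL shortener.
URLS_PER_DAY = 100_000_000      # 100 million writes per day (given)
READ_WRITE_RATIO = 10           # assumed 10:1 reads to writes
AVG_URL_BYTES = 100             # assumed average long-URL size
YEARS = 10

writes_per_sec = URLS_PER_DAY / (24 * 3600)
reads_per_sec = writes_per_sec * READ_WRITE_RATIO
total_records = URLS_PER_DAY * 365 * YEARS
storage_bytes = total_records * AVG_URL_BYTES

print(f"writes/s ~ {writes_per_sec:,.0f}")          # ~ 1,157
print(f"reads/s  ~ {reads_per_sec:,.0f}")           # ~ 11,574
print(f"records  = {total_records:,}")              # 365,000,000,000
print(f"storage  ~ {storage_bytes / 1e12:.1f} TB")  # ~ 36.5 TB
```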

4. API Design

In this section, we discuss the API endpoints.

4.1 URL Creation API

The URL creation API accepts a long URL as input and returns a short URL as output. This API represents the write path of the system. While it is write-heavy by nature, its overall request volume is relatively low compared to redirects.

POST /api/v1/data/shorten
Request parameters: { longUrl: "" }
Response: Returns the generated short URL.
The primary responsibility of this API is to generate a globally unique short key and ensure that the mapping between the short key and the long URL is persisted reliably. Once a short URL is returned to the client, the system must guarantee that this mapping will remain valid for its intended lifetime.

Because correctness is more important than raw throughput on this path, the API typically performs synchronous persistence to the database before responding. This ensures that no short URL is ever issued without a durable backing record.

4.2 Redirect API

The redirect API is invoked when a user accesses a short URL. This API represents the read path of the system and dominates overall traffic. Its design prioritizes low latency and high throughput above all else.

GET /api/v1/{shortUrl}
Response: Returns the corresponding long URL for HTTP redirection.
When a request arrives, the system looks up the short key, resolves it to the original long URL, and returns an HTTP redirect response. A 302 redirect is commonly used instead of a 301 to ensure that the redirect decision remains under the control of the service.

This allows the system to collect analytics, apply expiration rules, or change redirect behavior dynamically. The redirect API performs minimal logic and avoids any unnecessary processing to keep response times consistently low.
301 Redirect
A 301 redirect indicates that the requested URL has been permanently moved to the long URL. Since the redirection is permanent, the browser caches the response, and subsequent requests for the same short URL are not sent to the URL shortening service. Instead, the browser redirects the request directly to the long URL server.

302 Redirect
A 302 redirect indicates that the URL has been temporarily moved to the long URL. As a result, subsequent requests for the same short URL are first sent to the URL shortening service and then redirected to the long URL server.

Each redirection method has its own pros and cons. If the priority is to reduce server load, a 301 redirect is preferable because only the first request reaches the URL shortening service. However, if analytics are important, a 302 redirect is a better choice, as it allows the service to track click counts and traffic sources more effectively.
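The redirect handler can be sketched as a small function that resolves the short key and chooses between the two status codes; the store and function names here are illustrative:

```python
# Minimal redirect handler sketch. A real service would back the
# lookup with cache + database rather than a dictionary.
URL_STORE = {"2Bi": "https://www.examlio.com/?q=India&s=Delhi&c=Delhi&l=hi"}

def redirect(short_key: str, permanent: bool = False):
    """Return (status_code, headers) for an HTTP redirect response."""
    long_url = URL_STORE.get(short_key)
    if long_url is None:
        return 404, {}
    # 302 keeps every request flowing through the service (good for
    # analytics); 301 lets browsers cache the hop (less server load).
    status = 301 if permanent else 302
    return status, {"Location": long_url}
```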

5. Data Model

At its core, a URL shortener is a distributed key–value system. Every operation in the system revolves around a single fundamental mapping:

short_key → long_url

This mapping is immutable after creation in most systems, which greatly simplifies the design and enables aggressive caching.

The short key is the unique identifier of a shortened URL and acts as the primary key in the data store. It is what the system receives on every redirect request and therefore must support extremely fast lookups.

Real-world systems attach additional metadata to each short URL. These fields are not required for basic redirection but are essential for operational and business requirements.

A typical extended model includes:

- Creation timestamp, which records when the short URL was generated. This is useful for auditing, cleanup jobs, and analytics.
- Expiration timestamp, which defines the lifetime of the short URL. Once expired, the redirect should return an error or fallback page.
- Click count, which tracks how many times the short URL has been accessed.

Logically, the model becomes:

ShortKey (PK)
LongURL
CreatedAt
ExpiresAt
ClickCount

The critical point is that only the short key participates in lookup paths. All other fields are either informational or updated asynchronously.
Base62 encoding converts data into a shorter, URL-friendly alphanumeric string and is reversible. In contrast, hashing produces a fixed-length, irreversible fingerprint used for data integrity and security.

Base62 is primarily used for representation (such as short URLs), whereas hashing is used for verification (such as password storage), relying on one-way functions that prevent data reconstruction.

6. Short URL Generation Strategy

The most important design decision is how to generate the short URL key.

6.1 Auto-Increment ID + Base62 Encoding (Recommended)

A widely used and preferred approach is auto-increment ID + Base62 encoding. The system generates a unique numeric ID, which is then converted into a Base62 string using characters [a–z][A–Z][0–9]. This produces compact, URL-friendly keys.
What is Base62?

Base62 is a number encoding system that represents a decimal number using 62 distinct characters:

26 lowercase letters: a–z
26 uppercase letters: A–Z
10 digits: 0–9

So the base (radix) is: 62 = 26 + 26 + 10

Just like:

Base10 uses digits 0–9
Base16 (Hex) uses 0–9 + A–F
Base62 uses 0–9 + a–z + A–Z

In Base10, the number 345 means: 3 × 10² + 4 × 10¹ + 5 × 10⁰
In Base62, the idea is identical, but instead of powers of 10, we use powers of 62, and digits are replaced by characters.

A typical Base62 mapping looks like this:

Index Range   Character Mapping
0–9           '0'–'9'
10–35         'a'–'z'
36–61         'A'–'Z'
Note: The exact ordering can vary by implementation, but consistency matters, not the order itself.

Let's encode a decimal number into Base62.

Example: Encode 125
125 ÷ 62 = 2 remainder 1
2 ÷ 62 = 0 remainder 2

Read remainders from bottom to top and map remainders to characters:

2 → '2'
1 → '1'

Base62(125) = "21"

Suppose your database generates an auto-increment ID: ID = 10,000

Convert to Base62:

10000 ÷ 62 = 161 remainder 18
161 ÷ 62 = 2 remainder 37
2 ÷ 62 = 0 remainder 2

Mapping remainders:

2 → '2'
37 → 'B'
18 → 'i'

Final Base62 string: "2Bi"
This becomes your short URL key.

Decoding: Base62 → Decimal Number

Decoding is the reverse process. Example: Decode "2Bi"

Using character indices:

'2' → 2
'B' → 37
'i' → 18

Apply positional values:

2 × 62² + 37 × 62¹ + 18 × 62⁰
= 2 × 3844 + 37 × 62 + 18
= 7688 + 2294 + 18
= 10,000

So we recover the original ID.
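The encoding and decoding procedures translate directly into code. This sketch uses the mapping from the table above (digits, then lowercase, then uppercase):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62

def base62_encode(n: int) -> str:
    """Convert a non-negative integer to a Base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # remainders are read bottom to top

def base62_decode(s: str) -> int:
    """Convert a Base62 string back to the original integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this mapping, `base62_encode(125)` returns "21" and `base62_encode(10000)` returns "2Bi", matching the worked examples above.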
With Base62 encoding, a 7-character key can represent more than 3.5 trillion unique URLs, which is sufficient for most systems.
For a key of length 7, each position independently takes one of 62 characters, so the total number of possible combinations is 62⁷.

62⁷ = 3,521,614,606,208, which is approximately 3.5 trillion unique URLs.

Position 1 → 62 choices
Position 2 → 62 choices
…
Position 7 → 62 choices

Because the positions do not constrain each other, the total number of combinations is 62 × 62 × 62 × 62 × 62 × 62 × 62 = 62⁷.
An alternative approach is hashing the long URL using algorithms like MD5 or SHA-256. While hashing is fast, it introduces the risk of collisions, which complicates the design. For this reason, ID-based encoding is generally preferred in production systems.

Aspect                Hash + Collision Resolution                 Base-62 Conversion
Short URL length      Fixed                                       Not fixed; grows as the ID increases
Unique ID generator   Not needed                                  Required
Collisions            Possible; must be resolved                  Not possible, since each ID is unique
Predictability        Next short URL cannot be inferred,          Next short URL is easy to infer when IDs
                      as it does not depend on an ID              increment by 1 (a security concern)

Hashing and Collision Resolution (Optional)

To shorten a long URL, we can also implement a hash function that converts the long URL into a 7-character string. A straightforward approach is to use well-known hash functions such as CRC32, MD5, or SHA-1.

Hash Function   Hash Value (Hexadecimal)
CRC32           a3f1c92e
MD5             9f86d081884c7d659a2feaa0c55ad015
SHA-1           f572d396fae9206628714fb2ce00f72e94f2258f

As the table shows, even the shortest hash value (produced by CRC32) is still longer than the required 7 characters. This raises the question: how can we make it shorter?

One simple approach is to take the first 7 characters of the hash value. However, this method introduces the risk of hash collisions, where two different long URLs generate the same short URL.

To resolve collisions, we can use a recursive strategy:

- Append a predefined string to the original long URL.
- Recompute the hash.
- Generate a new short URL.
- Repeat the process until a unique short URL is found.

Although this method effectively eliminates collisions, it is expensive because it requires querying the database to check whether a short URL already exists for every request.

To improve performance, a Bloom filter can be used. A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. It significantly reduces unnecessary database lookups by quickly identifying whether a short URL is definitely not present or possibly present.
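The truncate-and-retry strategy can be sketched as follows. A plain Python set stands in for the existence check that a real system would answer with a Bloom filter backed by the database:

```python
import hashlib

_existing_keys = set()  # stand-in for the Bloom filter / database check

def hash_shorten(long_url: str, length: int = 7) -> str:
    """Take the first `length` hex characters of the MD5 digest; on a
    collision, append a predefined string and re-hash until the key is unused."""
    candidate_input = long_url
    while True:
        digest = hashlib.md5(candidate_input.encode()).hexdigest()
        key = digest[:length]
        if key not in _existing_keys:   # "definitely not present"
            _existing_keys.add(key)
            return key
        candidate_input += "#"          # predefined suffix, then retry
```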

7. Database Design

A URL shortener behaves like a distributed key–value store. Both relational and NoSQL databases can work, but at large scale, NoSQL databases such as Cassandra or DynamoDB are often preferred.

These databases provide horizontal scalability, high write throughput, and built-in replication. The system usually favors availability over strict consistency, which aligns with the CAP trade-offs made by such databases.

The short key is the primary access pattern of the system. That means:
- Reads are always driven by the short key
- Writes (creation) also generate and store a short key

In relational databases, the short key is the primary index. In NoSQL systems, it is the partition key.

Assume:
- You have N database nodes
- You compute a hash of the short key

shard_id = hash(short_key) % N

The resulting shard_id tells you which database node should store and serve that key.
Sharding means splitting data across multiple database nodes so that:

- No single node stores all data
- Load (reads + writes) is spread horizontally
- The system can scale by adding more nodes

Assume:
4 database shards: DB0, DB1, DB2, DB3
Short keys: aZ3k9P, Qe72Lm, 09XaBc

hash(aZ3k9P) % 4 → DB2
hash(Qe72Lm) % 4 → DB0
hash(09XaBc) % 4 → DB1
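The shard routing above can be sketched with a deterministic hash. Note that Python's built-in hash() is salted per process for strings, so a stable digest is used instead:

```python
import hashlib

NUM_SHARDS = 4  # DB0..DB3

def shard_for(short_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a short key to a shard using a stable digest (MD5 here),
    so the same key always lands on the same database node."""
    digest = hashlib.md5(short_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

One caveat of simple modulo sharding: changing the number of shards remaps most keys, which is why production systems often use consistent hashing instead.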

8. Caching Strategy

Caching is the single most important performance optimization in a URL shortener. The system is inherently read-heavy, where redirect requests outnumber URL creation requests by several orders of magnitude.

Without caching, every redirect would require a database lookup, which would quickly become the dominant bottleneck in terms of latency, cost, and throughput.

The primary goal of caching here is to ensure that the redirect path remains memory-bound, not disk-bound.

A redirect operation is conceptually simple: short_key → lookup → long_url → HTTP redirect

However, at scale:

- A single popular short URL can receive millions of requests per hour
- Database reads are orders of magnitude slower than memory reads
- Databases are expensive to scale for read-heavy workloads

By placing a cache in front of the database, the system ensures that:

- Most redirects are served directly from memory
- The database is protected from read amplification
- Latency stays consistently low even during traffic spikes

In practice, a well-designed cache can serve 95–99% of redirect traffic.

In the cache-aside pattern, the application explicitly controls cache access. When a redirect request arrives:

- The URL service first checks the cache using the short key
- If the entry exists (cache hit), the long URL is returned immediately
- If the entry does not exist (cache miss), the database is queried
- The database result is written to the cache so subsequent lookups are served from memory
- The redirect response is sent to the client

The cache key is typically the short key itself, optionally prefixed with a namespace:

url:{short_key} → long_url

This avoids collisions and allows easy bulk invalidation or monitoring. Because the short key is already compact and unique, it makes an ideal cache key.

Although URL mappings may be valid indefinitely, caches are finite. Therefore:

- A TTL is often applied to cached entries
- Common TTL values range from minutes to hours
- Frequently accessed URLs naturally stay hot due to repeated access

Eviction policies such as LRU (Least Recently Used) work extremely well because access patterns are highly skewed—most traffic goes to a small percentage of URLs.

9. Scalability and Load Balancing

The URL shortener is designed with a stateless application layer. Each instance of the URL service does not store any user session data or request-specific state in memory. All state is externalized to shared systems such as the cache and the database.

This statelessness is a deliberate architectural choice because it enables horizontal scalability. Any incoming request can be handled by any service instance, which makes it trivial to scale the system by simply adding or removing instances based on traffic demand.

A load balancer sits in front of the URL service instances and distributes incoming traffic among them. Since requests are independent and stateless, the load balancer does not need session affinity or sticky sessions.

This allows common strategies such as round-robin or least-connections to work efficiently, ensuring even utilization of service instances and preventing any single instance from becoming a bottleneck.

The load balancer absorbs the spike and distributes requests across a larger pool of stateless services. The cache layer absorbs most of the read load, preventing the database from being overwhelmed.

Independent Scaling of System Layers

One of the key strengths of this architecture is that each layer of the system scales independently based on its own bottlenecks.

The application layer scales with CPU and network usage, which is driven by the number of incoming requests. When traffic increases, more service instances are added behind the load balancer without impacting other layers.

The cache layer scales based on read throughput and memory usage. As redirect traffic grows, additional cache nodes can be added to distribute the load and increase effective cache capacity.

The database layer scales based on data volume and write throughput. Sharding and replication allow the database to grow horizontally without affecting application logic.

10. Fault Tolerance and Reliability

Fault tolerance is a core requirement for a URL shortener because once a short URL is published, it becomes a permanent dependency for users, applications, and even other systems. Any downtime directly translates into broken links, which is unacceptable at scale.

High availability is achieved by replicating every critical layer of the system. The application layer runs multiple instances of the URL service, ensuring that the failure of a single instance does not impact request handling. Traffic is automatically rerouted to healthy instances by the load balancer.

The cache layer is also deployed in a replicated or clustered configuration. If a cache node fails, requests are transparently served by other nodes. While a cache failure may temporarily increase database load, it does not break correctness. This design treats the cache as a performance optimization, not a source of truth, which greatly improves system robustness.

The database layer is replicated to ensure data durability and availability. Writes are acknowledged only after the data is safely persisted, often to multiple replicas. This guarantees that once a short URL is returned to the client, it will not be lost even if a node fails immediately afterward.

Failures are isolated to individual components rather than cascading across the system. If part of the cache becomes unavailable, the system continues to function by falling back to the database. If a service instance crashes, the load balancer routes traffic to healthy instances.

This approach ensures graceful degradation, where performance may be temporarily impacted but availability is preserved.

In advanced setups, cross-region replication is used to protect against regional outages such as data center failures or network partitions. Data is asynchronously replicated to multiple geographic regions, allowing traffic to be shifted if an entire region becomes unavailable.

Although cross-region replication may introduce slight replication lag, this trade-off is acceptable because availability is prioritized over strict consistency.

These reliability mechanisms allow the system to undergo routine maintenance, rolling deployments, and unexpected failures without user-visible impact. The system continues to serve redirects even in degraded states, which is critical for maintaining trust.

11. Analytics and Asynchronous Processing

In a URL shortener, the redirect path is the most latency-sensitive operation. Any additional work performed synchronously during a redirect directly impacts user-perceived performance. For this reason, analytics processing must be completely decoupled from the redirect flow.

Rather than updating counters or analytics data inline, each redirect event is published asynchronously to a message queue. This allows the system to record analytics without blocking or delaying the redirect response.

When a redirect occurs, the URL service emits a lightweight event containing essential information such as the short key, timestamp, and optional metadata like IP address or user agent. This event is sent to a message queue in a fire-and-forget manner.

The redirect response is returned immediately after the event is published, ensuring that analytics overhead does not affect request latency.

A separate analytics pipeline consumes events from the queue and processes them independently. This pipeline is responsible for:

- Aggregating click counts
- Updating dashboards
- Generating reports

Because this pipeline operates asynchronously, it can scale independently of the core redirect service. If analytics traffic increases, more consumers can be added without impacting the redirect path.

Message queues provide built-in buffering, which protects the system during traffic spikes. If the analytics pipeline falls behind, events accumulate in the queue rather than slowing down redirects. This design introduces natural backpressure handling and improves overall system stability.

Analytics data is typically eventually consistent. Click counts may lag behind real-time by a few seconds or minutes, which is an acceptable trade-off. This allows the system to prioritize availability and performance over strict real-time accuracy.
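The fire-and-forget pattern can be sketched with the standard library, using queue.Queue as a stand-in for a real message broker:

```python
import queue
import threading
import time
from collections import Counter

events: queue.Queue = queue.Queue()   # stand-in for the message queue
click_counts = Counter()              # aggregated analytics state

def record_redirect(short_key: str) -> None:
    """Called on the redirect path: publish the event and return immediately."""
    events.put({"short_key": short_key, "ts": time.time()})

def consume() -> None:
    """Analytics consumer: aggregates counts off the hot path."""
    while True:
        event = events.get()
        if event is None:             # shutdown sentinel
            events.task_done()
            break
        click_counts[event["short_key"]] += 1
        events.task_done()

worker = threading.Thread(target=consume, daemon=True)
worker.start()
```

If the consumer falls behind, events simply accumulate in the queue, and the redirect path is never blocked.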

12. Security and Abuse Prevention

Security and abuse prevention are critical in a URL shortener because the system sits at the intersection of user-generated content and public internet traffic. Without safeguards, the service can easily be misused for spam URL creation, phishing campaigns, malware distribution, or denial-of-service attacks. A robust design anticipates these threats and mitigates them proactively.

The URL creation API is the most common target for abuse. To prevent mass generation of short URLs, rate limiting is applied based on factors such as IP address, user account, or API key. This ensures that no single actor can overwhelm the system or create large volumes of malicious links.

Rate limits are enforced at the edge or application layer, allowing legitimate users to proceed while throttling abusive behavior early in the request lifecycle.
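A common enforcement mechanism is the token bucket, sketched below; the rate and capacity values are illustrative, and per-client buckets would be keyed by IP address, user account, or API key:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens per second,
    allowing bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```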

Before a long URL is shortened, it can be validated and checked against known blacklists of malicious domains or phishing sites. This helps prevent the service from becoming a vector for harmful content.

Validation may include basic checks such as URL format, supported schemes, and domain reputation. These checks are typically lightweight and can be extended over time as new threats emerge.

Short URLs are publicly accessible, which makes enumeration attacks a concern. If short keys are predictable or sequential, attackers can systematically crawl the key space to discover valid links.

To mitigate this, short keys are designed to be sufficiently random-looking. Techniques such as Base62 encoding with large key spaces, randomized starting offsets, or non-sequential ID generation make guessing valid URLs computationally impractical.

Security is not a one-time decision. Metrics such as URL creation rate, redirect patterns, and error rates are continuously monitored to detect suspicious behavior. These signals can be used to tighten rate limits or enhance validation rules over time.

Conclusion

In interviews, designing a URL shortener demonstrates strong fundamentals. Interviewers expect clarity on why reads dominate writes, why caching is essential, and why ID-based encoding is preferred over hashing.

A senior engineer clearly explains trade-offs, keeps the design simple, and focuses on scalability and reliability.

Mastering this design not only prepares you for interviews but also strengthens your understanding of real-world distributed system architecture.