Understanding WSGI vs ASGI: Performance and Concurrency

August 23, 2025

Python web frameworks have evolved significantly over the years, with the transition from WSGI (Web Server Gateway Interface) to ASGI (Asynchronous Server Gateway Interface) representing one of the most important shifts in how Python web applications handle concurrency and performance.

In this article, we’ll explore the fundamental differences between WSGI and ASGI, implement basic examples of both, and benchmark their performance to understand the real-world implications of choosing one over the other.

Understanding WSGI: The Synchronous Standard

WSGI has been the standard interface between web servers and Python web applications since PEP 333 was introduced in 2003 (later updated in PEP 3333). It provided a common ground that allowed Python web frameworks (like Flask and Django) to be server-agnostic, enabling them to work with various web servers like Gunicorn, uWSGI, and mod_wsgi.

A Simple WSGI Application

Let’s look at a basic WSGI application:

wsgi/main.py

def app(environ, start_response):
    response_body = b"Hello, WSGI World!"
    status = "200 OK"
    headers = [("Content-Type", "text/plain")]

    start_response(status, headers)
    return [response_body]

The WSGI interface is simple and straightforward:

A WSGI application is a callable (function or class) that takes two arguments:
- environ: A dictionary containing CGI-like environment variables
- start_response: A function that starts the response by sending status and headers

The callable must return an iterable of byte strings (the response body).
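Because a WSGI application is just a callable, the contract described here can be exercised without any server. The following is a minimal sketch, not a replacement for a real server: it calls the app directly and uses the standard library's wsgiref.util.setup_testing_defaults to fill in a plausible environ. The call_wsgi helper and its recording start_response are hypothetical names introduced only for illustration.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Same shape as the basic WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, WSGI World!"]


def call_wsgi(application, path="/"):
    """Invoke a WSGI callable directly and capture what it sends (illustrative helper)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required CGI/WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record the status line and headers instead of writing them to a socket
        captured["status"] = status
        captured["headers"] = headers

    # The return value is an iterable of byte strings; join it into one body
    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_wsgi(app)
print(status, headers, body)
```

This is roughly what a WSGI server does for every request, minus the socket handling and error paths.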

WSGI Middleware and Routing

WSGI also supports middleware - components that sit between the server and application to modify requests or responses. Here’s an example:

wsgi/middleware_routing.py

def simple_middleware(app):
    """A simple WSGI middleware that adds a header"""
    def middleware(environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            headers.append(("X-Custom-Header", "WSGI-Middleware"))
            return start_response(status, headers, exc_info)

        return app(environ, custom_start_response)

    return middleware


def router_app(environ, start_response):
    path = environ["PATH_INFO"]

    if path == "/":
        response_body = b"Hello from WSGI Root!"
    elif path == "/about":
        response_body = b"This is the About page (WSGI)."
    else:
        response_body = b"404 Not Found"
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [response_body]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [response_body]


# Wrap the app with middleware
app = simple_middleware(router_app)

This pattern is foundational to how web frameworks like Flask and Django handle middleware and routing.
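Since a middleware is itself a WSGI callable, wrappers can be stacked. The sketch below composes two header-adding middlewares to show the order in which they touch the response. The add_header factory, the hello app, and the header names are hypothetical and exist only for this illustration.

```python
def add_header(name, value):
    """Return a WSGI middleware factory that appends one response header."""
    def factory(app):
        def middleware(environ, start_response):
            def custom_start_response(status, headers, exc_info=None):
                # Copy the list rather than mutating the application's own headers
                return start_response(status, list(headers) + [(name, value)], exc_info)
            return app(environ, custom_start_response)
        return middleware
    return factory


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# The wrapper applied last is outermost: it receives the request first
# and adds its header to the response last.
app = add_header("X-Outer", "2")(add_header("X-Inner", "1")(hello))

seen = {}


def record(status, headers, exc_info=None):
    """Stand-in for the server's start_response; just records the headers."""
    seen["headers"] = headers


body = app({"PATH_INFO": "/"}, record)
print(seen["headers"])
```

The inner header appears before the outer one because each wrapper appends its header before handing the list to the layer outside it.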

The WSGI Concurrency Problem

While WSGI works well for many applications, it has one significant limitation: it’s synchronous by nature. This means that:

Each request ties up a Python process/thread until the response is complete
I/O operations block the worker handling the request for their full duration
Scaling requires more processes/threads (e.g., adding more containers to our deployment or adding more gunicorn workers), which can be resource-intensive

Let’s see this in action with a simple example that demonstrates the blocking nature of WSGI:

wsgi/speedcheck.py

import time

def app(environ, start_response):
    path = environ["PATH_INFO"]

    if path == "/fast":
        body = b"Fast response from WSGI!"
        status = "200 OK"
    elif path == "/slow":
        time.sleep(5)  # blocks this worker for the full 5 seconds
        body = b"Slow response from WSGI after 5s!"
        status = "200 OK"
    else:
        body = b"404 Not Found"
        status = "404 Not Found"

    start_response(status, [("Content-Type", "text/plain")])
    return [body]

In this example, the /slow endpoint uses time.sleep(5) to simulate a slow operation. While this request is being processed, the worker thread/process handling it is blocked and cannot serve any other request. With a single synchronous worker, a request to /fast that arrives while /slow is executing has to wait until the slow request finishes.
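The queueing effect can be reproduced without a web server. The sketch below is a simulation under stated assumptions: a thread pool stands in for the server's workers (Gunicorn's default workers are processes, not threads), the sleep is scaled down to 0.2 seconds, and handle and time_fast_request are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SLOW_SECONDS = 0.2  # scaled down from the article's 5 seconds


def handle(path):
    """Stand-in for a synchronous request handler."""
    if path == "/slow":
        time.sleep(SLOW_SECONDS)  # blocks the worker running this request
    return path


def time_fast_request(num_workers):
    """Submit /slow and then /fast to a pool of workers; return how long /fast took."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        start = time.perf_counter()
        pool.submit(handle, "/slow")
        fast = pool.submit(handle, "/fast")
        fast.result()  # wait only for the fast request
        return time.perf_counter() - start


one_worker = time_fast_request(1)
two_workers = time_fast_request(2)
print(f"1 worker : /fast answered after {one_worker:.2f}s")
print(f"2 workers: /fast answered after {two_workers:.2f}s")
```

With one worker the fast request waits behind the slow one; with two workers it is answered almost immediately, which is the same effect as adding Gunicorn workers.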

Enter ASGI: The Asynchronous Evolution

ASGI was developed to address the limitations of WSGI, particularly regarding concurrency. It provides a standardized interface for asynchronous web servers and applications, letting a single process interleave many I/O-bound requests without dedicating a process or thread to each one.

A Simple ASGI Application

Here’s what a basic ASGI application looks like:

asgi/main.py

async def app(scope, receive, send):
    if scope["type"] == "http":
        body = b"Hello, ASGI World!"
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({
            "type": "http.response.body",
            "body": body,
        })

The ASGI interface differs from WSGI:

It’s asynchronous (uses async/await)
Takes three arguments:
- scope: A dictionary containing request information
- receive: An async function to receive events from the client
- send: An async function to send events to the client
Supports multiple protocols (HTTP, WebSockets, etc.) via the scope["type"]
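As with WSGI, an ASGI application can be driven by hand to see the events it emits. This is a minimal sketch: the scope below contains only a few keys (a real server supplies more, such as the ASGI version and client address), and call_asgi is a hypothetical helper introduced for illustration.

```python
import asyncio


async def app(scope, receive, send):
    """Same shape as the basic ASGI application."""
    if scope["type"] == "http":
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"Hello, ASGI World!"})


async def call_asgi(application, path="/"):
    """Invoke an ASGI callable directly and collect the events it sends (illustrative helper)."""
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        # A request with an empty body and nothing more to come
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await application(scope, receive, send)
    return sent


messages = asyncio.run(call_asgi(app))
print([m["type"] for m in messages])
```

The response arrives as two separate events, which is what lets ASGI stream bodies and support protocols other than plain request/response.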

ASGI Middleware and Routing

Similar to WSGI, ASGI supports middleware patterns:

asgi/middleware_routing.py

def simple_middleware(app):
    """ASGI middleware that adds a header to every HTTP response"""
    async def middleware(scope, receive, send):
        async def custom_send(message):
            if message["type"] == "http.response.start":
                headers = message.setdefault("headers", [])
                headers.append((b"x-custom-header", b"ASGI-Middleware"))
            await send(message)

        await app(scope, receive, custom_send)

    return middleware


async def router_app(scope, receive, send):
    if scope["type"] != "http":
        return

    path = scope["path"]

    if path == "/":
        body = b"Hello from ASGI Root!"
    elif path == "/about":
        body = b"This is the About page (ASGI)."
    else:
        body = b"404 Not Found"
        await send({
            "type": "http.response.start",
            "status": 404,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": body})
        return

    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


# Wrap the app with middleware
app = simple_middleware(router_app)

Non-Blocking I/O with ASGI

The key advantage of ASGI is its support for non-blocking I/O. Let’s see a comparison similar to our WSGI example:

asgi/speedcheck.py

import asyncio

async def app(scope, receive, send):
    if scope["type"] != "http":
        return

    path = scope["path"]

    if path == "/fast":
        body = b"Fast response from ASGI!"
        status = 200
    elif path == "/slow":
        await asyncio.sleep(5)  # non-blocking sleep
        body = b"Slow response from ASGI after 5s!"
        status = 200
    else:
        body = b"404 Not Found"
        status = 404

    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})

In this ASGI example, the /slow endpoint uses await asyncio.sleep(5) instead of time.sleep(5). This is a crucial difference: while the endpoint waits for 5 seconds, the server can still process other requests because the sleep is non-blocking.
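The same behavior can be observed with plain asyncio, independent of any server. The sketch below is a simulation with a scaled-down sleep: handle and main are hypothetical names, and both "requests" run on a single thread.

```python
import asyncio
import time

SLOW_SECONDS = 0.2  # scaled down from the article's 5 seconds


async def handle(path, finished):
    """Stand-in for an async request handler; records the order of completion."""
    if path == "/slow":
        await asyncio.sleep(SLOW_SECONDS)  # yields control back to the event loop
    finished.append(path)


async def main():
    finished = []
    start = time.perf_counter()
    # /slow is started first, then /fast, on the same event loop
    await asyncio.gather(handle("/slow", finished), handle("/fast", finished))
    return finished, time.perf_counter() - start


finished, elapsed = asyncio.run(main())
print(finished, f"{elapsed:.2f}s")
```

The fast handler finishes first even though it was started second. Note that this only holds for awaitable waits: a blocking call such as time.sleep inside an async handler would stall the whole event loop, so blocking work is usually moved off the loop, for example with asyncio.to_thread.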

Benchmarking WSGI vs ASGI

Let’s quantify the performance difference with a benchmark script that tests both WSGI and ASGI servers handling a mix of fast and slow requests:

benchmark/benchmark.py

import time
import httpx
import asyncio
from tabulate import tabulate

WSGI_BASE = "http://127.0.0.1:8000"
ASGI_BASE = "http://127.0.0.1:8001"

NUM_SLOW = 3
NUM_FAST = 15


def sync_test(base_url: str) -> float:
    # The context manager closes the client's connections when the test ends
    with httpx.Client(timeout=30.0) as client:
        start = time.perf_counter()

        # Run all slow requests sequentially
        for _ in range(NUM_SLOW):
            client.get(f"{base_url}/slow")

        # Run fast requests sequentially
        for _ in range(NUM_FAST):
            client.get(f"{base_url}/fast")

        return time.perf_counter() - start


async def async_test(base_url: str) -> float:
    async with httpx.AsyncClient(timeout=30.0) as client:
        start = time.perf_counter()

        tasks = []
        # Fire all slow + fast requests concurrently
        for _ in range(NUM_SLOW):
            tasks.append(client.get(f"{base_url}/slow"))
        for _ in range(NUM_FAST):
            tasks.append(client.get(f"{base_url}/fast"))

        await asyncio.gather(*tasks)

        return time.perf_counter() - start


if __name__ == "__main__":
    results = []
    total_reqs = NUM_SLOW + NUM_FAST

    # Run sync tests
    elapsed = sync_test(WSGI_BASE)
    results.append(
        ["WSGI", "Sync", NUM_SLOW, NUM_FAST, f"{elapsed:.2f}s", f"{total_reqs/elapsed:.2f} req/s"]
    )

    elapsed = sync_test(ASGI_BASE)
    results.append(
        ["ASGI", "Sync", NUM_SLOW, NUM_FAST, f"{elapsed:.2f}s", f"{total_reqs/elapsed:.2f} req/s"]
    )

    # Run async tests
    elapsed = asyncio.run(async_test(WSGI_BASE))
    results.append(
        ["WSGI", "Async", NUM_SLOW, NUM_FAST, f"{elapsed:.2f}s", f"{total_reqs/elapsed:.2f} req/s"]
    )

    elapsed = asyncio.run(async_test(ASGI_BASE))
    results.append(
        ["ASGI", "Async", NUM_SLOW, NUM_FAST, f"{elapsed:.2f}s", f"{total_reqs/elapsed:.2f} req/s"]
    )

    # Print results as table
    print("\nBenchmark Results")
    print(
        tabulate(
            results,
            headers=["Server", "Mode", "# Slow", "# Fast", "Elapsed", "Throughput"],
            tablefmt="github",
        )
    )

This benchmark performs two types of tests for each server:

Synchronous test - sequential requests
Asynchronous test - concurrent requests

Understanding Gunicorn and Uvicorn

Before running our benchmarks, let’s take a closer look at the servers we’ll be using for our WSGI and ASGI applications: Gunicorn and Uvicorn.

Gunicorn: The Green Unicorn

Gunicorn (Green Unicorn) is a WSGI HTTP server for Python web applications. It’s widely used in production environments and has been a standard choice for deploying frameworks like Flask and Django.

Concurrency Model in Gunicorn

Gunicorn uses a pre-fork worker model, which means:

A master process manages multiple worker processes
Each worker handles one request at a time (with the default sync worker class)
Concurrency is achieved by running multiple worker processes

Gunicorn supports different worker types:

Sync workers: Handle one request at a time (default)
Gevent/Eventlet workers: Event-based workers that can handle concurrent connections but still within the WSGI paradigm
Tornado workers: Similar to the event-based workers above

Even with event-based workers, Gunicorn is still limited by the synchronous nature of WSGI. While gevent/eventlet can provide better concurrency than sync workers, they use monkey patching to achieve this, which can lead to complex debugging scenarios and compatibility issues.

A significant disadvantage of greenlets (used by gevent/eventlet) is the difficulty in profiling applications that use them. Standard Python profilers often struggle to properly trace and attribute execution time when greenlets are involved, making performance optimization challenging. This is because greenlets use cooperative multitasking that can confuse conventional profiling tools which expect traditional threading or process models. I ran into this issue some time ago, and it is likely still unresolved.

Uvicorn: ASGI Server with Performance in Mind

Uvicorn is a lightning-fast ASGI server built on uvloop and httptools. It’s designed specifically for modern Python applications that leverage asynchronous programming.

Concurrency Model in Uvicorn

Uvicorn uses an event loop architecture:

A single process handles multiple connections concurrently
It leverages Python’s asyncio (or uvloop, which is an accelerated implementation)
Requests are processed asynchronously within the event loop

Uvicorn can also run in multiprocess mode, combining process-based concurrency with the event loop model.

Why Uvicorn Is Faster

Uvicorn’s performance advantages come from:

uvloop: A drop-in replacement for asyncio’s event loop that’s built on top of libuv (the same library that powers Node.js). Its authors report it as roughly 2-4x faster than the standard asyncio loop.
httptools: Fast HTTP parsing through Python bindings for the C parser used by Node.js (http-parser originally, llhttp in newer releases).
Non-blocking I/O: Instead of dedicating a worker to each request, Uvicorn can handle many concurrent connections efficiently with minimal overhead.
Lower Memory Footprint: Since it doesn’t need to spawn multiple processes for concurrency, it can utilize resources more efficiently.

For applications with I/O-bound operations (database queries, API calls, file operations), Uvicorn’s event-loop-based architecture allows it to handle hundreds or thousands of concurrent connections with minimal resource usage compared to Gunicorn’s process-based approach.
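To get a feel for the scale involved, the following sketch runs a thousand simulated I/O waits on one thread with plain asyncio. It is an illustration of the event-loop model rather than a measurement of Uvicorn itself: fake_io_request is a hypothetical stand-in for a database query or upstream call, and no sockets are involved.

```python
import asyncio
import time


async def fake_io_request(i):
    """Simulate a request that spends 0.1s waiting on I/O."""
    await asyncio.sleep(0.1)
    return i


async def main(n=1000):
    start = time.perf_counter()
    # All n waits overlap on a single event loop
    results = await asyncio.gather(*(fake_io_request(i) for i in range(n)))
    return len(results), time.perf_counter() - start


count, elapsed = asyncio.run(main())
print(f"{count} simulated requests in {elapsed:.2f}s on one thread")
```

Handled one at a time, the same waits would add up to 100 seconds; overlapped on the event loop they finish in a small fraction of that.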

Running the Benchmark

To run the benchmark, we’ll test both single worker and multiple worker scenarios:

Single Worker Test

A WSGI server running our WSGI application on port 8000 with a single worker:
```
uv run gunicorn speedcheck:app -b 127.0.0.1:8000 --workers=1
```
An ASGI server running our ASGI application on port 8001 (defaults to one worker):
```
uv run uvicorn speedcheck:app --host 127.0.0.1 --port 8001
```
Run the benchmark script
```
uv run python benchmark.py
```

Here are typical results from running this benchmark with a single worker:

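| Server | Mode  | # Slow | # Fast | Elapsed | Throughput |
|--------|-------|--------|--------|---------|------------|
| WSGI   | Sync  | 3      | 15     | 15.04s  | 1.20 req/s |
| ASGI   | Sync  | 3      | 15     | 15.04s  | 1.20 req/s |
| WSGI   | Async | 3      | 15     | 15.05s  | 1.20 req/s |
| ASGI   | Async | 3      | 15     | 5.01s   | 3.59 req/s |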

Multiple Worker Test

A WSGI server running our WSGI application on port 8000 with 4 workers
```
uv run gunicorn speedcheck:app -b 127.0.0.1:8000 --workers=4
```
An ASGI server running our ASGI application on port 8001 (still using one worker)
```
uv run uvicorn speedcheck:app --host 127.0.0.1 --port 8001
```
Run the benchmark script again
```
uv run python benchmark.py
```

When we increase the number of WSGI workers to 4, the Async test scenario improves significantly:

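| Server | Mode  | # Slow | # Fast | Elapsed | Throughput |
|--------|-------|--------|--------|---------|------------|
| WSGI   | Sync  | 3      | 15     | 15.04s  | 1.20 req/s |
| ASGI   | Sync  | 3      | 15     | 15.03s  | 1.20 req/s |
| WSGI   | Async | 3      | 15     | 5.02s   | 3.59 req/s |
| ASGI   | Async | 3      | 15     | 5.01s   | 3.59 req/s |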

The results clearly show that adding workers to a WSGI server can significantly improve its ability to handle concurrent requests. However, ASGI achieves similar concurrency with a single worker due to its event-loop architecture, making it more resource-efficient.

Analysis of Benchmark Results

The benchmark results reveal several key insights:

Sequential Requests: When making sequential requests, WSGI and ASGI perform similarly because they’re both limited by the total processing time (3 slow requests × 5 seconds each = 15 seconds).
Concurrent Requests with WSGI (Single Worker): With only one worker, even when using an asynchronous client with WSGI, we don’t see performance improvements because a single WSGI worker processes requests synchronously, creating a bottleneck.
Concurrent Requests with WSGI (Multiple Workers): When we increase to 4 workers, WSGI performance dramatically improves for concurrent requests. Each worker can handle one request at a time, so with 4 workers, we can process 4 requests simultaneously. This allows the 3 slow requests to be processed in parallel, reducing the total time to around 5 seconds (the duration of a single slow request).
Concurrent Requests with ASGI: ASGI shines regardless of worker count. With a single worker, ASGI can handle multiple concurrent requests efficiently through its event loop architecture. When running concurrent requests against an ASGI server, we see a dramatic improvement in throughput. Instead of taking 15+ seconds (the sum of all slow request times), it only takes about 5 seconds (the duration of a single slow request).
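The measured times line up with a simple back-of-the-envelope model. The sketch below is idealized and rests on assumptions that are not part of the benchmark: fast requests are treated as taking zero time, network and server overhead are ignored, and each concurrent request is given to whichever sync worker frees up first (real Gunicorn workers compete to accept connections from a shared socket, so this is only an approximation). All function names are illustrative.

```python
import heapq

SLOW, FAST = 5.0, 0.0  # seconds; fast requests treated as instantaneous


def sequential_time(durations):
    """A client that waits for each response before sending the next request."""
    return sum(durations)


def sync_workers_time(durations, workers):
    """Concurrent client against sync workers: each request takes the worker that frees up first."""
    free_at = [0.0] * workers  # time at which each worker becomes available
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def event_loop_time(durations):
    """Concurrent client against one event loop: all waits overlap."""
    return max(durations)


requests = [SLOW] * 3 + [FAST] * 15

print("Sequential client:        ", sequential_time(requests))
print("Concurrent, 1 sync worker:", sync_workers_time(requests, 1))
print("Concurrent, 4 sync workers:", sync_workers_time(requests, 4))
print("Concurrent, event loop:   ", event_loop_time(requests))
```

The model yields 15, 15, 5, and 5 seconds respectively, matching the roughly 15s and 5s figures in the tables once per-request overhead is added back.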

Real-World Performance Considerations

Our benchmarks clearly demonstrate ASGI’s advantage in handling concurrent requests, but there are several other important aspects to consider when evaluating real-world performance.

Request Processing Overhead

In production environments, the difference between Gunicorn and Uvicorn can become more pronounced under high load:

Memory Usage: Gunicorn’s worker-based model requires more memory as each worker process has its own memory space. For example, a typical Django application might use 100-150MB per worker, so running with 4 workers would consume 400-600MB of RAM. Uvicorn, with its event loop architecture, can often achieve the same throughput with a single process using significantly less memory.
Worker Exhaustion: With Gunicorn, if all workers are busy processing long requests, new requests must wait in a queue until a worker becomes available. This often leads to timeout errors under high load, especially with slow database queries or external API calls.
Connection Handling: Uvicorn can maintain thousands of open connections simultaneously, making it ideal for applications with WebSockets or long-polling. With sync workers, Gunicorn would need one worker process per open connection, which quickly becomes impractical.

I/O Wait Efficiency

Consider an application that makes three 1-second API calls for each request:

Gunicorn (WSGI): Each worker will be blocked for at least 3 seconds, limiting throughput to N/3 requests per second (where N is the number of workers).
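Uvicorn (ASGI): If the handler issues the three API calls concurrently (for example with asyncio.gather), the request completes in just over 1 second. Even when the calls are awaited one after another, the event loop stays free to serve other requests during each wait, so a single process can potentially handle hundreds of such requests concurrently.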
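The difference between awaiting the calls one by one and issuing them together can be sketched as follows. The delays are scaled down to 0.1 seconds, and fake_api_call along with the service names are hypothetical placeholders for real upstream requests.

```python
import asyncio
import time

CALL_SECONDS = 0.1  # scaled down from the 1-second calls in the text
SERVICES = ("users", "orders", "inventory")  # placeholder upstream services


async def fake_api_call(name):
    """Simulate one upstream API call."""
    await asyncio.sleep(CALL_SECONDS)
    return name


async def handle_sequentially():
    """Await each call in turn: total time is the sum of the calls."""
    start = time.perf_counter()
    for name in SERVICES:
        await fake_api_call(name)
    return time.perf_counter() - start


async def handle_concurrently():
    """Issue all calls at once: total time is roughly the slowest call."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_api_call(name) for name in SERVICES))
    return time.perf_counter() - start


sequential = asyncio.run(handle_sequentially())
concurrent = asyncio.run(handle_concurrently())
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The concurrent version only helps when the calls are independent of each other; if one call needs the result of another, they have to be awaited in order.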

Real-World Implications

The performance differences between WSGI and ASGI have significant implications for different types of web applications:

When to Use WSGI (Gunicorn)

Simple applications with quick responses
CPU-bound applications where asynchronous processing offers limited benefit
Applications with consistent, predictable request-response patterns
When using frameworks or libraries that don’t support ASGI
When your application doesn’t need to handle many concurrent connections

When to Use ASGI (Uvicorn)

Applications with long-running operations
APIs that need to handle many concurrent connections
Real-time applications with WebSockets
Microservices that make multiple upstream API calls
Applications that need to perform background tasks while handling requests
Services with unpredictable spikes in traffic where resource efficiency matters

Modern Python Web Frameworks

Most major Python web frameworks have adapted to support ASGI:

Django: Added ASGI support in version 3.0, with async views following in 3.1 (sadly, Django REST Framework still does not natively support async views)
FastAPI: Built specifically for ASGI
Starlette: A lightweight ASGI framework
Quart: An ASGI alternative to Flask
Flask: Still primarily WSGI, with some ASGI adapters available

Advanced Deployment Considerations

When deploying Python web applications in production, understanding the performance characteristics of your server is crucial.

Optimizing Gunicorn Deployments

If you need to use Gunicorn (perhaps due to framework compatibility or team expertise), consider these optimizations:

Worker Count: As our benchmark demonstrated, increasing the number of workers can significantly improve WSGI’s performance with concurrent requests. Each worker can handle one request at a time, so more workers mean more concurrent request handling. The general rule of thumb is (2 × CPU cores) + 1. This formula balances CPU utilization while providing some headroom for I/O operations. However, each worker consumes memory, so there’s a tradeoff between concurrency and resource usage.
Worker Class: For I/O-bound applications, consider using gevent or eventlet workers
```
gunicorn myapp:app --worker-class=gevent --workers=4
```
Timeouts: Configure appropriate timeouts to prevent worker processes from being tied up by slow clients
```
gunicorn myapp:app --timeout=30 --keep-alive=2
```
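The same options can be kept in a configuration file instead of on the command line. The following is a sketch of a gunicorn.conf.py using Gunicorn's bind, workers, worker_class, timeout, and keepalive settings; the specific values are illustrative assumptions, not recommendations for any particular application.

```python
# gunicorn.conf.py
import multiprocessing

bind = "127.0.0.1:8000"

# The (2 x CPU cores) + 1 rule of thumb
workers = multiprocessing.cpu_count() * 2 + 1

# "sync" is the default; "gevent" requires gevent to be installed
worker_class = "sync"

# Seconds a worker may stay silent before it is killed and restarted
timeout = 30

# Seconds to wait for the next request on a keep-alive connection
keepalive = 2
```

It can then be loaded with `gunicorn -c gunicorn.conf.py myapp:app`, where myapp:app is the placeholder module path used in the commands above.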

Optimizing Uvicorn Deployments

For Uvicorn deployments:

Enable uvloop: Ensure you have uvloop installed to get the maximum performance benefit:
```
uvicorn myapp:app --loop=uvloop
```
Worker Processes: For multi-core systems, you can run Uvicorn with multiple workers using Gunicorn as a process manager:
```
gunicorn myapp:app -w 4 -k uvicorn.workers.UvicornWorker
```
This combines the process management capabilities of Gunicorn with the asynchronous processing power of Uvicorn.

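Lifespan Support: Leverage the ASGI lifespan protocol for efficient application startup and shutdown. With FastAPI, for example (recent releases deprecate on_event in favor of a lifespan handler):

from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup():
    # Initialize resources asynchronously (initialize_database_pool is a placeholder)
    await initialize_database_pool()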
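At the protocol level, lifespan is just another scope type that the server opens once per process. The sketch below shows a bare ASGI application answering the startup and shutdown events, plus a small driver that plays the server's role so the exchange can be run standalone. The simulate_server helper is hypothetical, and the lifespan scope is reduced to its type key.

```python
import asyncio


async def app(scope, receive, send):
    if scope["type"] == "lifespan":
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                # Acquire shared resources here (connection pools, caches, ...)
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                # Release those resources here
                await send({"type": "lifespan.shutdown.complete"})
                return
    # HTTP and WebSocket scopes would be handled here


async def simulate_server(application):
    """Play the server's side of the lifespan protocol (illustrative helper)."""
    incoming = asyncio.Queue()
    sent = []

    async def send(message):
        sent.append(message["type"])

    # A real server sends startup when it boots and shutdown when it stops
    await incoming.put({"type": "lifespan.startup"})
    await incoming.put({"type": "lifespan.shutdown"})
    await application({"type": "lifespan"}, incoming.get, send)
    return sent


events = asyncio.run(simulate_server(app))
print(events)
```

Framework-level hooks such as startup handlers are built on top of this exchange.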

Conclusion

The transition from WSGI to ASGI represents a significant evolution in Python web development. While WSGI served the Python community well for years, ASGI brings Python web applications into the modern era of concurrency and asynchronous processing.

As our benchmarks demonstrate, ASGI’s ability to handle concurrent connections efficiently makes it the better choice for applications that need to handle multiple simultaneous requests, especially when those requests involve waiting for I/O operations. Uvicorn’s implementation of the ASGI standard leverages modern, high-performance components like uvloop and httptools to deliver substantial gains over traditional WSGI servers like Gunicorn on concurrent, I/O-bound workloads.

Understanding the fundamental architectural differences between Gunicorn’s worker-based model and Uvicorn’s event-loop architecture is key to making the right choice for your application’s needs. The right server can dramatically impact your application’s performance, resource utilization, and ability to handle concurrent load.

Whether you’re building a new application or considering upgrading an existing one, understanding the differences between WSGI and ASGI, and their respective server implementations like Gunicorn and Uvicorn, will help you make an informed decision based on your application’s specific requirements and performance goals.

Last updated on August 23, 2025
