Benchmarks

A note on benchmarks

Measuring dependency injection overhead in isolation often produces numbers that do not fully reflect real-world usage. In most practical applications, the DI container is rarely the performance bottleneck; database queries, network calls, and business logic usually dominate response times.

That said, optimizations compound: a faster DI layer, faster serialization, faster validation, etc. all add up to meaningful improvements.

While these results aim to be as objective as possible, Wireup is actively optimized for performance, so I expect it to perform well in this benchmark. Even so, I would not pick a DI container solely from performance benchmarks, but if you're happy with Wireup's features and want to see how it stacks up against the field, here are the results.

Benchmark Design & Stress Test

This benchmark uses an artificial workload to measure the overhead of the dependency injection container. By using empty services, the test focuses on how fast the library can resolve and inject dependencies without the results being hidden by application logic.

Testing runs within a FastAPI + Uvicorn stack to measure performance in a realistic web setting. This also allows fastapi.Depends to be included in the comparison, as it is the most popular choice by virtue of being the FastAPI default.

This setup also tests each library's overall dependency injection package, which includes container resolution, scoping, injection into functions/route handlers, and framework integration. It is not a microbenchmark that repeatedly resolves dependencies from the raw container instance in a tight loop. The benchmark intentionally uses non-trivial singleton/scoped graphs to stress test the containers.

The workload uses two separate, independent graphs: the singleton graph (Settings -> A -> B) and the scoped graph (services C through I), in which each service depends on multiple others. H is a context manager and I is an async context manager.

This graph is intentionally non-trivial: large enough to emulate realistic container behavior, but still representative of practical applications (rather than 100-node chains). The exact shape is less important than the workload characteristics: multiple dependencies and lifecycle-managed resources, which stress container resolution, scoping, and teardown more than a simple linear graph.
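To make "lifecycle-managed resources" concrete, here is a minimal sketch of what a sync and an async context-managed service look like in plain Python, independent of any DI library. The names ServiceH, ServiceI, provide_h, provide_i, and handle_request are hypothetical stand-ins and are not taken from the benchmark code; the point is only that a container must enter both resources at the start of a request scope and exit them in reverse order at the end.

```python
import asyncio
from contextlib import asynccontextmanager, contextmanager

# Records setup/teardown order so the lifecycle is visible.
events: list[str] = []


class ServiceH:
    """Hypothetical stand-in for service H (sync context manager)."""


class ServiceI:
    """Hypothetical stand-in for service I (async context manager)."""


@contextmanager
def provide_h():
    events.append("enter H")
    try:
        yield ServiceH()
    finally:
        events.append("exit H")


@asynccontextmanager
async def provide_i():
    events.append("enter I")
    try:
        yield ServiceI()
    finally:
        events.append("exit I")


async def handle_request() -> None:
    # One request scope: both resources are entered, used by the handler,
    # and then torn down in reverse order when the scope ends.
    with provide_h() as h:
        async with provide_i() as i:
            assert isinstance(h, ServiceH) and isinstance(i, ServiceI)
            events.append("handle")


asyncio.run(handle_request())
```

A DI container performs this enter/exit bookkeeping on every request, which is part of the work the scoped benchmark exercises.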

Singleton graph

graph LR
    Settings[Settings] --> A[Service A]
    A --> B[Service B]

Per-Request Injected graph

graph TD
    C --> D
    C --> E
    D --> E
    C --> F
    D --> F
    E --> F
    C --> G
    D --> G
    E --> G
    F --> G
    C --> H
    D --> H
    E --> I
    F --> I

For library authors

If you are benchmarking Wireup against your library, container.get(...) is not Wireup's canonical entry point. It is primarily intended as an advanced feature for users who want to access the container directly in edge cases, not as the main way to resolve dependencies.

For representative results, benchmark function-based injection via inject_from_container(...), which reflects Wireup's recommended dependency-injection usage pattern rather than service-locator style access.

Summary comparisons below always use Wireup (not Wireup Class-Based) as the reference to avoid cherry-picking the best-performing variant.

Benchmark Setup

The benchmarks were run on a local machine with 50 rounds of 100,000 requests per round per library. The tables and bar charts show the result of the Representative Median Run:

  1. All rounds are sorted by RPS.
  2. The median round is selected.
  3. RPS and Latency are taken from this specific run to ensure consistency.
  4. RSS (Memory) shows the peak usage observed across all rounds.
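The selection steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the benchmark runner's code: the Round type and its field names are assumptions, and because 50 rounds have no single middle element, the sketch arbitrarily picks the upper of the two middle rounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Round:
    """One measured round (hypothetical shape, for illustration only)."""
    rps: float
    p50_ms: float
    rss_mb: float


def representative_median_run(rounds: list[Round]) -> tuple[Round, float]:
    # Steps 1-2: sort all rounds by RPS and select the median round.
    # Assumption: for an even number of rounds, take the upper middle one.
    ordered = sorted(rounds, key=lambda r: r.rps)
    median = ordered[len(ordered) // 2]
    # Step 4: memory is the peak observed across all rounds,
    # not the value from the median round.
    peak_rss = max(r.rss_mb for r in rounds)
    # Step 3: RPS and latency are read from the same selected round.
    return median, peak_rss


rounds = [Round(100.0, 5.0, 50.0), Round(300.0, 3.0, 55.0), Round(200.0, 4.0, 60.0)]
median_round, peak_rss = representative_median_run(rounds)
```

Taking RPS and latency from one concrete round keeps the reported figures mutually consistent, whereas taking the median of each metric independently could combine numbers that never occurred together.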

For each run, the server is started, a liveness probe must pass, warmup traffic is sent, and only then the measured run begins.

Actual requests per second (RPS) will change based on your hardware. The most important metric is how the libraries perform relative to each other.

Manual Wiring (No DI) represents the theoretical maximum performance. In this setup, services are manually instantiated within the route handler, bypassing DI containers entirely. This row exists purely to establish a zero-overhead baseline against which each library's DI overhead can be compared, not as an endorsement of global state or manual wiring.

Wireup Class-Based represents the performance of Wireup when using the Class-Based Handlers for FastAPI.

Metrics

The benchmark measures the following:

  • RPS (Requests Per Second): The number of requests the server can handle in one second. Higher is better.
  • Latency (p50, p95, p99): The time it takes for a request to be completed, measured in milliseconds. Lower is better.
    • p50 (Median): Half of the requests are faster than this.
    • p95: 95% of requests are faster than this.
    • p99: 99% of requests are faster than this.
  • σ (Standard Deviation): Measures the stability of response times (Jitter). A lower number means more consistent performance with fewer outliers. Lower is better.
  • RSS Memory Peak (MB): The highest post-iteration RSS sample observed across runs. Lower is better. This includes the full server process footprint (Uvicorn + FastAPI app + framework runtime), not only service objects.

Summary percentages and relative-throughput comparisons on this page are computed from the main Median Run tables, not the Stability or Total Time tables.
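As a rough illustration of the latency metrics listed above, the following sketch derives p50, p95, p99, and σ from a list of per-request latencies using Python's standard library. The function name latency_summary is hypothetical, and the load generator may use a different percentile interpolation method, so treat the exact values as an approximation of the idea rather than a reproduction of the reported numbers.

```python
import statistics


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    # quantiles(..., n=100) returns the 99 cut points between percentiles,
    # so index 49 is p50, index 94 is p95, and index 98 is p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        # Population standard deviation of response times (jitter).
        "sigma": statistics.pstdev(latencies_ms),
    }


# Synthetic latencies of 1 ms through 100 ms, one request each.
summary = latency_summary([float(ms) for ms in range(1, 101)])
```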

Hardware Environment

  • CPU: 12th Gen Intel(R) Core(TM) i7-12700K
  • Memory: 32 GB RAM
  • OS: Fedora Linux 43 (Workstation Edition); Kernel 6.18.13-200.fc43.x86_64

Execution Details

  • Python: v3.14.3
  • Server: Uvicorn with 1 worker process
  • Event Loop: uvloop
  • Load generator: hey v0.1.5 (must be installed and available on PATH)
  • CPU pinning: The benchmark runner pins the load generator to CPU 1 and the server process to CPU 2, both mapped to performance cores on this machine. For reproducibility on hybrid CPUs, pin benchmark processes to performance cores (P-cores), not efficiency cores (E-cores).
  • Startup liveness probe: Each server process is polled on /healthz before warmup and measurement begin.
  • Load Parameters: 50 concurrent connections
  • Warmup: 2,000 warmup requests per run, using the same concurrency as measured runs.
  • Verification: All endpoints are verified for correctness (status code 200 plus endpoint-level assertions on dependency values and scoping behavior).
  • Workload assertions: When BENCH_ASSERT=1, creation/lifecycle counters are checked against the expected workload shape and mismatches are reported as observed.
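The run lifecycle described by these settings (liveness probe, warmup, measured run) could be driven along the following lines. This is a sketch under stated assumptions, not the actual benchmark runner: the helper names, the port 8000, and the use of taskset for CPU pinning are all assumptions. Only the hey flags -n (request count) and -c (concurrency), the /healthz and /scoped paths, and the numeric parameters come from this page.

```python
import time
import urllib.request
from typing import Optional


def wait_until_live(url: str, timeout_s: float = 10.0, interval_s: float = 0.1) -> bool:
    """Poll a health endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: server not ready yet.
            pass
        time.sleep(interval_s)
    return False


def hey_command(url: str, requests: int, concurrency: int,
                cpu: Optional[int] = None) -> list[str]:
    """Build a hey invocation, optionally pinned to one CPU via taskset."""
    cmd = ["hey", "-n", str(requests), "-c", str(concurrency), url]
    if cpu is not None:
        # Assumption: pinning via taskset; the real runner may pin differently.
        cmd = ["taskset", "-c", str(cpu), *cmd]
    return cmd


BASE_URL = "http://127.0.0.1:8000"  # hypothetical address
warmup_cmd = hey_command(f"{BASE_URL}/scoped", requests=2_000, concurrency=50, cpu=1)
measured_cmd = hey_command(f"{BASE_URL}/scoped", requests=100_000, concurrency=50, cpu=1)
```

A runner would call wait_until_live(f"{BASE_URL}/healthz") first and only then pass warmup_cmd and measured_cmd to subprocess.run, in that order.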

Exact package versions used:

| Package | Version |
| --- | --- |
| wireup | local@6f38e1d |
| fastapi | 0.124.4 |
| uvicorn[standard] | 0.40.0 |
| aioinject | 1.10.2 |
| dishka | 1.7.2 |
| dependency-injector | 4.48.3 |
| lagom | 2.7.7 |
| injector | 0.24.0 |
| fastapi-injector | 0.9.0 |
| svcs | 25.1.0 |
| that-depends | 3.9.1 |
| diwire | 1.3.0 |

Feature Completeness

Not all libraries support the same features. Some required test simplifications and are marked with a † in the tables and charts below. In general, these simplifications tend to favor the affected libraries because they skip work that fully modeled implementations still perform.

  • FastAPI: FastAPI DI is request-scoped by default; singletons are not a first-class DI concept. This benchmark uses the documented Depends + @lru_cache pattern for singletons, which is the recommended approach in the FastAPI documentation. See: Creating the settings only once with lru_cache.

  • Injector: Uses fastapi-injector for FastAPI integration. The library does not support async dependencies or request-scoped context managers, so services H and I are implemented as plain request-scoped objects (no enter/exit).

  • Lagom: Does not support async context managers. Service I (which should be an async iterator in the specification) is implemented as a sync iterator.

  • Dependency Injector: The resource lifecycle for H and I is not modeled per request. They are provided as context-local singletons without entering/exiting context managers.
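For readers unfamiliar with the lru_cache pattern mentioned in the FastAPI item above, here is a minimal standard-library sketch of how it yields singleton-like behavior. The empty Settings class is a stand-in for the benchmark's service. In a FastAPI route the cached provider would typically be declared as a parameter annotated with Depends(get_settings); that part is shown only as a comment so the sketch runs without FastAPI installed.

```python
from functools import lru_cache


class Settings:
    """Stand-in for the benchmark's Settings service (fields omitted)."""


@lru_cache
def get_settings() -> Settings:
    # The body runs once; later calls return the cached instance, so every
    # request that depends on this provider sees the same object.
    return Settings()


# In a FastAPI route, the provider would typically be wired in as:
#   def handler(settings: Annotated[Settings, Depends(get_settings)]): ...

first = get_settings()
second = get_settings()
```

FastAPI still resolves the dependency on every request; the cache only avoids constructing the object again, which is one reason this row still pays per-request resolution cost in the singleton test.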

Per-Request Injection Performance

Each request to /scoped creates new instances only for scoped services (C through I), without resolving any singleton services. This test emphasizes container lifecycle and graph traversal performance, as the container must create and tear down a dense dependency graph on every request.

In this benchmark, Wireup Class-Based operates at 99.38% of manual wiring throughput, and Wireup at 99.87%, placing both near the manual-wiring upper bound in this workload. For context, this corresponds to roughly 2.79x the throughput of FastAPI Depends and 1.29x the next closest library in this benchmark (Dishka).
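The percentages and multipliers quoted above follow directly from the median-run RPS values in the table for this test. The short sketch below reproduces the arithmetic; the helper names percent_of and speedup are illustrative only.

```python
# Median-run RPS values copied from the scoped results table.
rps = {
    "Manual Wiring (No DI)": 11_044,
    "Wireup": 11_030,
    "Wireup Class-Based": 10_976,
    "Dishka": 8_538,
    "FastAPI Depends": 3_950,
}


def percent_of(value: float, baseline: float) -> float:
    """Throughput as a percentage of a baseline, rounded to two decimals."""
    return round(100 * value / baseline, 2)


def speedup(value: float, other: float) -> float:
    """How many times faster one throughput is than another."""
    return round(value / other, 2)


manual = rps["Manual Wiring (No DI)"]
wireup_pct = percent_of(rps["Wireup"], manual)
class_based_pct = percent_of(rps["Wireup Class-Based"], manual)
vs_fastapi = speedup(rps["Wireup"], rps["FastAPI Depends"])
vs_dishka = speedup(rps["Wireup"], rps["Dishka"])
```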

Figure: Scoped Performance (bar chart)

| Project | RPS (Median Run) | P50 (ms) | P95 (ms) | P99 (ms) | σ (ms) | Mem Peak |
| --- | --- | --- | --- | --- | --- | --- |
| Manual Wiring (No DI) | 11,044 (100.00%) | 4.20 | 4.50 | 4.70 | 0.70 | 52.93 MB |
| Wireup | 11,030 (99.87%) | 4.20 | 4.50 | 4.70 | 0.83 | 53.69 MB |
| Wireup Class-Based | 10,976 (99.38%) | 4.30 | 4.50 | 4.70 | 0.70 | 53.80 MB |
| Dishka | 8,538 (77.30%) | 5.30 | 6.30 | 9.40 | 1.30 | 103.23 MB |
| Svcs | 8,394 (76.00%) | 5.70 | 6.00 | 6.20 | 0.93 | 67.09 MB |
| Aioinject | 8,177 (74.04%) | 5.60 | 6.60 | 10.40 | 1.31 | 100.52 MB |
| diwire | 7,390 (66.91%) | 6.50 | 6.90 | 7.10 | 1.07 | 58.22 MB |
| That Depends | 4,892 (44.30%) | 9.80 | 10.40 | 10.60 | 0.59 | 53.82 MB |
| FastAPI Depends | 3,950 (35.76%) | 12.30 | 13.80 | 14.10 | 1.39 | 57.68 MB |
| Injector † | 3,192 (28.90%) | 15.20 | 15.40 | 16.10 | 0.58 | 53.52 MB |
| Dependency Injector † | 2,576 (23.33%) | 19.10 | 19.70 | 20.10 | 0.75 | 60.55 MB |
| Lagom † | 898 (8.13%) | 55.30 | 57.20 | 58.30 | 1.63 | 1.32 GB |

Stability (Across Runs)

These values summarize all runs for each project in this test. Median P50/P95/P99 are the medians of those per-run latency percentiles, while Within ±3% shows the share of runs whose RPS stayed within 3% of that project's median-run RPS. Look for smaller Δ RPS, higher Within ±3%, and lower median tail latencies (P95/P99) for the most consistent behavior.

| Project | Min RPS | Max RPS | Δ RPS | Within ±3% | Med P50 (ms) | Med P95 (ms) | Med P99 (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Manual Wiring (No DI) | 10,910 | 11,112 | 1.84% | 100.0% | 4.20 | 4.50 | 4.70 |
| Wireup | 10,917 | 11,108 | 1.73% | 100.0% | 4.20 | 4.50 | 4.70 |
| Wireup Class-Based | 10,838 | 11,076 | 2.17% | 100.0% | 4.20 | 4.50 | 4.70 |
| Dishka | 8,466 | 8,639 | 2.02% | 100.0% | 5.30 | 6.30 | 9.40 |
| Svcs | 8,268 | 8,513 | 2.91% | 100.0% | 5.70 | 6.00 | 6.20 |
| Aioinject | 8,102 | 8,283 | 2.21% | 100.0% | 5.60 | 6.60 | 10.10 |
| diwire | 7,257 | 7,483 | 3.06% | 100.0% | 6.50 | 6.90 | 7.10 |
| That Depends | 4,817 | 4,968 | 3.09% | 100.0% | 9.80 | 10.20 | 10.60 |
| FastAPI Depends | 3,922 | 3,973 | 1.28% | 100.0% | 12.25 | 13.70 | 14.10 |
| Injector † | 3,136 | 3,225 | 2.79% | 100.0% | 15.20 | 15.40 | 15.90 |
| Dependency Injector † | 2,559 | 2,605 | 1.76% | 100.0% | 19.00 | 19.70 | 20.00 |
| Lagom † | 893 | 903 | 1.19% | 100.0% | 55.30 | 57.10 | 58.20 |
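The stability columns can be approximated with the sketch below. The page does not state how Δ RPS is computed; the formula used here, (max − min) divided by the median-run RPS, is an inference that appears consistent with the published rows (for example, Wireup: (11,108 − 10,917) / 11,030 ≈ 1.73%), so treat it as an assumption. The function name stability is hypothetical.

```python
def stability(run_rps: list[float], median_run_rps: float,
              tolerance: float = 0.03) -> tuple[float, float]:
    """Return (delta_rps_percent, within_tolerance_percent) for one project."""
    # Assumed definition: spread between the slowest and fastest run,
    # expressed as a percentage of the median-run RPS.
    delta_pct = 100 * (max(run_rps) - min(run_rps)) / median_run_rps
    # Share of runs whose RPS lies within +/- tolerance of the median-run RPS.
    within = sum(
        1 for rps in run_rps
        if abs(rps - median_run_rps) <= tolerance * median_run_rps
    )
    within_pct = 100 * within / len(run_rps)
    return delta_pct, within_pct


# Synthetic example: three runs close to the median, one outlier.
delta_pct, within_pct = stability([98.0, 100.0, 102.0, 110.0], median_run_rps=100.0)

# Cross-check against the Wireup row using only its published min and max.
wireup_delta, _ = stability([10_917.0, 11_108.0], median_run_rps=11_030.0)
```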

Time to Complete All Runs (Lower Is Better)

This aggregates measured request-phase runtime across all runs for each project in this test.

| Project | Total Time (HH:MM:SS) | Total Time (s) | + vs Fastest | Avg Time / Run | Runs |
| --- | --- | --- | --- | --- | --- |
| Manual Wiring (No DI) | 07:33 | 452.97 | +00:00 | 9.06s | 50 |
| Wireup | 07:34 | 453.51 | +00:01 | 9.07s | 50 |
| Wireup Class-Based | 07:36 | 455.81 | +00:03 | 9.12s | 50 |
| Dishka | 09:45 | 585.49 | +02:13 | 11.71s | 50 |
| Svcs | 09:56 | 596.05 | +02:23 | 11.92s | 50 |
| Aioinject | 10:11 | 611.25 | +02:38 | 12.23s | 50 |
| diwire | 11:17 | 677.42 | +03:44 | 13.55s | 50 |
| That Depends | 17:03 | 1023.26 | +09:30 | 20.47s | 50 |
| FastAPI Depends | 21:06 | 1265.92 | +13:33 | 25.32s | 50 |
| Injector † | 26:06 | 1566.24 | +18:33 | 31.32s | 50 |
| Dependency Injector † | 32:20 | 1940.33 | +24:47 | 38.81s | 50 |
| Lagom † | 1:32:46 | 5566.48 | +1:25:14 | 111.33s | 50 |

Singleton Performance

Services are created once when the app starts and reused throughout. In this test, the endpoint injects Services A, B, and Settings from the graph. This tests the container's bookkeeping performance and how efficiently it can return existing instances.

In this benchmark, both Wireup Class-Based (99.93%) and Wireup (98.97%) operate very close to manual wiring throughput in this workload. For context, Wireup reaches roughly 2.15x the throughput of FastAPI Depends and 1.25x that of the next closest library in this benchmark (diwire).

Figure: Singleton Performance (bar chart)

| Project | RPS (Median Run) | P50 (ms) | P95 (ms) | P99 (ms) | σ (ms) | Mem Peak |
| --- | --- | --- | --- | --- | --- | --- |
| Manual Wiring (No DI) | 13,351 (100.00%) | 3.40 | 3.60 | 3.90 | 0.72 | 52.98 MB |
| Wireup Class-Based | 13,342 (99.93%) | 3.40 | 3.60 | 3.80 | 0.73 | 53.73 MB |
| Wireup | 13,214 (98.97%) | 3.50 | 3.70 | 3.90 | 0.58 | 53.64 MB |
| diwire | 10,532 (78.88%) | 4.50 | 4.80 | 4.90 | 0.74 | 58.21 MB |
| Svcs | 10,447 (78.25%) | 4.50 | 4.80 | 5.00 | 0.75 | 67.02 MB |
| Injector † | 10,269 (76.92%) | 4.60 | 4.90 | 5.00 | 0.75 | 53.46 MB |
| Aioinject | 10,219 (76.54%) | 4.40 | 5.20 | 8.00 | 1.17 | 103.16 MB |
| Dishka | 9,650 (72.28%) | 4.70 | 5.30 | 8.20 | 1.21 | 105.03 MB |
| That Depends | 7,792 (58.36%) | 6.20 | 6.50 | 6.70 | 1.00 | 53.82 MB |
| Dependency Injector † | 6,905 (51.71%) | 6.80 | 7.30 | 7.70 | 0.60 | 60.42 MB |
| FastAPI Depends | 6,153 (46.08%) | 7.70 | 8.20 | 8.60 | 0.37 | 55.74 MB |
| Lagom † | 2,936 (21.99%) | 16.70 | 18.30 | 20.10 | 1.29 | 238.93 MB |

Stability (Across Runs)

These values summarize all runs for each project in this test. Median P50/P95/P99 are the medians of those per-run latency percentiles, while Within ±3% shows the share of runs whose RPS stayed within 3% of that project's median-run RPS. Look for smaller Δ RPS, higher Within ±3%, and lower median tail latencies (P95/P99) for the most consistent behavior.

| Project | Min RPS | Max RPS | Δ RPS | Within ±3% | Med P50 (ms) | Med P95 (ms) | Med P99 (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Manual Wiring (No DI) | 13,169 | 13,460 | 2.18% | 100.0% | 3.40 | 3.70 | 3.80 |
| Wireup Class-Based | 13,168 | 13,439 | 2.03% | 100.0% | 3.40 | 3.70 | 3.80 |
| Wireup | 13,031 | 13,332 | 2.28% | 100.0% | 3.50 | 3.70 | 3.90 |
| diwire | 8,871 | 10,667 | 17.05% | 98.0% | 4.40 | 4.80 | 4.90 |
| Svcs | 10,291 | 10,536 | 2.34% | 100.0% | 4.50 | 4.80 | 5.00 |
| Injector † | 10,164 | 10,333 | 1.65% | 100.0% | 4.60 | 4.90 | 5.10 |
| Aioinject | 10,133 | 10,301 | 1.64% | 100.0% | 4.40 | 5.20 | 7.90 |
| Dishka | 9,551 | 9,732 | 1.88% | 100.0% | 4.70 | 5.30 | 8.20 |
| That Depends | 7,647 | 7,898 | 3.22% | 100.0% | 6.20 | 6.50 | 6.70 |
| Dependency Injector † | 6,852 | 6,991 | 2.00% | 100.0% | 6.80 | 7.30 | 7.70 |
| FastAPI Depends | 6,069 | 6,197 | 2.07% | 100.0% | 7.70 | 8.20 | 8.60 |
| Lagom † | 2,908 | 2,949 | 1.40% | 100.0% | 16.70 | 18.30 | 20.30 |

Time to Complete All Runs (Lower Is Better)

This aggregates measured request-phase runtime across all runs for each project in this test.

| Project | Total Time (HH:MM:SS) | Total Time (s) | + vs Fastest | Avg Time / Run | Runs |
| --- | --- | --- | --- | --- | --- |
| Manual Wiring (No DI) | 06:15 | 374.52 | +00:00 | 7.49s | 50 |
| Wireup Class-Based | 06:15 | 375.27 | +00:01 | 7.51s | 50 |
| Wireup | 06:18 | 378.28 | +00:04 | 7.57s | 50 |
| diwire | 07:57 | 477.04 | +01:43 | 9.54s | 50 |
| Svcs | 07:59 | 478.80 | +01:44 | 9.58s | 50 |
| Injector † | 08:07 | 487.14 | +01:53 | 9.74s | 50 |
| Aioinject | 08:09 | 489.38 | +01:55 | 9.79s | 50 |
| Dishka | 08:38 | 518.09 | +02:24 | 10.36s | 50 |
| That Depends | 10:42 | 642.13 | +04:28 | 12.84s | 50 |
| Dependency Injector † | 12:04 | 724.05 | +05:50 | 14.48s | 50 |
| FastAPI Depends | 13:33 | 813.17 | +07:19 | 16.26s | 50 |
| Lagom † | 28:23 | 1703.37 | +22:09 | 34.07s | 50 |

Reproducibility

Prerequisite:

  • Install hey and ensure the hey binary is available on your PATH.

Run from repository root:

make bench

Enable workload-shape assertions:

make bench bench_assert=1

If you want to reproduce this exact run, pass iterations=50 requests=100000 to the make command.

This command reruns benchmarks and regenerates charts/tables/versions for this page.

Source Code

The benchmark code is available in the benchmarks/ directory.