Autional logo
Architecture 10 min #Go#Performance#High Concurrency

Go Microservices vs PHP Monolith: Identity System Performance Showdown

Note: Performance data in this article comes from internal benchmark environments. Production results may vary depending on hardware configuration, network conditions, and concurrency patterns.

In the web development world, there’s a long-standing myth: “Language performance doesn’t matter; the database is the bottleneck.” In identity authentication systems, this myth is shattered by reality. Modern SaaS platforms handle tens of millions of logins, token validations, and MFA checks daily — operations where a significant amount of CPU time is spent on cryptographic computation and protocol processing, not database I/O.

Autional chose Go as its primary language from the start. Let’s look at real data to see exactly where Go microservices outperform traditional PHP monoliths in identity systems.

Concurrency Model: goroutine vs Process

PHP-FPM’s concurrency model is “one request, one process”:

  • PHP 7.4+ with pm = static pre-starts 50-200 worker processes
  • Each request occupies an entire process until the response is complete
  • Requests exceeding the worker count queue up — a classic “request queue avalanche” scenario

Go’s concurrency model is “one connection, one goroutine”:

  • A goroutine’s initial stack is only 2KB, dynamically expandable
  • A single server can easily run hundreds of thousands of goroutines
  • The scheduler (GMP model) efficiently multiplexes goroutines across kernel threads, avoiding frequent syscalls

Real data comparison: On a 4-core 8GB VM, PHP-FPM configured with 100 workers handles login requests concurrently; Go’s identity-service handles the same requests in a single process. Results:

MetricPHP-FPM (Laravel)Go (Autional)Gap
Concurrent connections100 (worker-limited)50,000+500×
Login requests/sec~2,100~51,00024×
P99 latency (1000 concurrent)3,200ms87ms37×

Once concurrency exceeds the worker count, PHP’s p99 latency grows exponentially — this isn’t a database issue, it’s a fundamental limitation of the process model.

Memory Usage: 30MB vs 5KB

PHP-FPM memory consumption is an ops team nightmare:

  • Each PHP-FPM worker process uses 20-50MB of memory (depending on loaded extensions and framework)
  • 100 workers = 2-5GB baseline memory before processing any requests
  • Symfony/Laravel frameworks load hundreds of class files on startup, even for a single API call

Go service memory usage:

  • A goroutine initial stack is only 2KB, growing at runtime
  • identity-service resident memory is ~80-120MB (including all business logic, connection pools, caches)
  • 100,000 concurrent goroutines add approximately 200MB of memory

Real-world comparison:

PHP-FPM (Laravel):  100 workers × 35MB = 3.5GB baseline memory
Go (Autional):         1 process × 100MB = 100MB baseline memory
                     + 10K goroutines × 5KB = 50MB
                     Total: 150MB

Memory efficiency gap exceeds 20×. In Kubernetes, this means a 2GB node can run 10+ Go microservices but only 1-2 PHP applications.

Cold Start Time

PHP-FPM startup time is severely underestimated:

  • Laravel’s php artisan optimize can cache routes and configuration, but cold starts still take 200-500ms
  • PHP 8.1+ JIT compiler improves loops and mathematical operations, but offers limited help for web request framework initialization
  • OpCache can cache compiled bytecode, but the first request still needs to load all classes

Go compiles to a static binary:

  • identity-service startup (including database connection pool initialization): 0.8 seconds
  • First request handling: no warmup needed, processes directly
  • Binary size: ~25MB (includes full runtime)

Kubernetes Pod startup comparison:

PHP-FPM:     Pod start 3s + worker warmup 1s + first request 500ms = 4.5s
Go:          Pod start 1s + process start 0.8s + first request 5ms  = 1.8s

During rolling updates, Go service’s new Pod can take over traffic almost instantly, significantly shortening deployment windows.

Connection Pooling and Resource Reuse

This is PHP’s weakest area. PHP’s share-nothing architecture means:

  • Every request establishes a new database connection (or uses persistent connections, which have severe memory leak issues under PHP-FPM)
  • Redis connections face the same problem
  • Cannot reuse in-memory caches or intermediate computation results between requests

Go connection pool management:

ResourcePHP ModeGo Mode
Database connectionsNew connection per request (or dangerous long-living connections)Connection pool (25 connections, shared across requests)
Redis connectionsSame as aboveConnection pool (auto-scaling)
In-memory cacheOffloaded to RedisIn-process sync.Map + periodic refresh
gRPC connectionsNot supportedHTTP/2 multiplexing, single connection carries thousands of concurrent streams

In Autional, identity-service creates a connection pool of 25 database connections on startup, with a max idle time of 3600 seconds. All requests share these connections, avoiding repeated handshake overhead. To achieve the same effect in PHP, you’d need to introduce a long-running process solution like Swoole or RoadRunner — essentially imitating Go’s runtime model.

Compile-Time Optimization

Optimizations performed by the Go compiler at build time are unmatched by PHP interpreters:

  • Escape analysis: The compiler decides whether variables are allocated on the stack or heap. 90%+ of identity-service’s temporary variables are stack-allocated, resulting in zero garbage collection pressure
  • Inlining: Function calls are expanded at compile time, eliminating call overhead
  • Dead code elimination: go build -ldflags="-s -w" further reduces binary size
  • PGO (Profile-Guided Optimization): Supported since Go 1.21+, optimizing compilation based on production profiles

PHP’s OpCache and JIT partially compensate for interpreted execution, but for system-level optimizations like inter-request state sharing, connection management, and concurrency scheduling, interpreted languages cannot reach the level of compiled languages.

Real-World Scenario: Flash Sale Login Surge

This is the scenario that best demonstrates architectural differences. Imagine an e-commerce platform running a flash sale at midnight on Singles’ Day:

  • 200,000 users flood in instantly, 150,000 of whom need to log in first
  • Login process: password hash verification (bcrypt/argon2) + token issuance (JWT signing) + audit log write
  • Traffic spikes 200× in 5 seconds

PHP-FPM approach:

Peak QPS: 150K logins / 5s = 30,000 QPS
Worker count: 200 (already the limit for a 12-core machine)
200 workers × 1 request = 200 concurrent
30,000 / 200 = 150 rounds (serial)
Per-round time: 80ms (bcrypt alone takes 50ms)
150 × 80ms = 12,000ms = 12 seconds

Result: Users beyond the 200th position wait 12 seconds for login — worst case, the browser times out. The ops team starts emergency scaling, but adding machines to PHP-FPM doesn’t linearly reduce queue waiting time.

Go microservice approach:

Peak QPS: 30,000 QPS
goroutines: 30,000 concurrent (only 60MB goroutine stack)
Single login time: 75ms (bcrypt same as PHP, no difference)
But 30,000 goroutines execute concurrently — no queuing
P99 latency: ~180ms (bottleneck is bcrypt, CPU-intensive)
Throughput: 30,000 / second

Go’s CPU utilization approaches 100% (bcrypt is compute-intensive), but requests never queue. bcrypt computation is the bottleneck — Autional uses asynchronous hash verification (offloading bcrypt operations to a goroutine pool to avoid blocking the scheduler) and connection pool reuse to ensure the database doesn’t become a secondary bottleneck.

Even when scaling is needed, Kubernetes HPA can spin up new Pods within 30 seconds based on CPU usage, and Go services’ lightning-fast startup makes scaling effects immediately visible.

Autional’s Go Architecture Experience

Here are the key practices Autional has accumulated with Go microservice architecture:

1. Independent Compilation per Service

Each of the 15 microservices is an independent Go module, referencing local dependencies via replace directives. This means modifying base/error only requires recompiling the 2-3 affected services, not the entire project.

2. Generics Reduce Code Duplication

Go 1.18+ generics are widely used in the dto_base package: dto_base.NewDataResponse[T], dto_base.NewListResponse[T] eliminate large amounts of boilerplate while maintaining type safety.

3. Interface-Driven Development

identity-service’s Repository interface allows gomock to generate mocks for unit tests and real PostgreSQL repository injection for integration tests. The same pattern is replicated across all 15 services.

4. Compile-Time Safety

forbidigo + depguard catches architectural violations at lint time — like a handler layer directly importing a database driver. In PHP, such issues are only discoverable at runtime.

Summary

DimensionPHP-FPMGo (Autional)
Concurrency modelProcess pool (50-200)goroutine (tens of thousands)
Memory usage20-50MB/worker100MB + 2-5KB/goroutine
Startup time3-5s (with framework init)< 1s
Login throughput~2,000 QPS~50,000 QPS
Connection poolNone (rebuilt per request)Built-in connection pool
Code safetyRuntime discoveryCompile-time + lint check
DeploymentComposer + hot restart (request interruption)Static binary + graceful shutdown

Go isn’t “better than PHP” — PHP remains excellent for rapid prototyping and CMS. But for a SaaS platform processing tens of thousands of identity verification requests per second, Go’s concurrency model, compile-time safety, and memory efficiency represent architectural gaps that PHP cannot bridge. Autional chose Go not as a technology preference, but as an engineering necessity in the face of business scale.

Note: Performance figures are from internal benchmark environments. Production results may vary.