gRPC Security Practices for Internal Service Communication
One of the core challenges of a microservice architecture is enabling secure and efficient communication between services. REST + JSON may seem simple, but it shows several weaknesses in inter-service communication: high serialization overhead, no strong type constraints, and awkward streaming. Autional's choice: REST for external APIs, gRPC for internal calls.
Why gRPC for Internal Calls?
Efficiency Comparison: Protobuf vs JSON
Suppose identity-service needs to return user information to compliance-service:
JSON (REST):
{
  "user_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "email": "user@example.com",
  "display_name": "张三",
  "roles": ["admin", "developer"],
  "created_at": "2026-05-13T10:30:00Z"
}
Raw payload: approximately 180 bytes, requiring JSON encode/decode on every call.
Protobuf (gRPC):
message GetUserResponse {
  string user_id = 1;
  string email = 2;
  string display_name = 3;
  repeated string roles = 4;
  google.protobuf.Timestamp created_at = 5;
}
Serialized: approximately 80 bytes (binary). Decoding is still required, but it is far cheaper than parsing text-based JSON.
For tens of thousands of internal calls per second (authentication, permission checks, data validation), Protobuf’s serialization efficiency directly translates to lower CPU usage and faster response times.
Strongly Typed Contracts
REST API contracts are “documentation + convention”—Swagger/OpenAPI standardizes the description, but cannot verify at compile time whether the caller passed the correct parameter types.
gRPC contracts are .proto files—guaranteed at compile time:
- Caller and server generate code from the same .proto file
- Field type errors are caught at compile time
- New fields don’t affect existing callers (Protobuf backward compatibility)
- Deprecated fields marked reserved cause compile errors if reused
In Autional, code for all .proto files is generated uniformly by scripts/generate-proto.ps1, and check-grpc-compliance.py in the CI pipeline ensures the generated code is consistent with the proto definitions. This eliminates runtime bugs like “the doc says accept int, but the code passes string.”
Streaming
REST struggles to elegantly handle large data transfers:
- compliance-service exports audit logs: requires a pagination API (?page=1, ?page=2…), n+1 HTTP calls
- audit-service pushes real-time alert events: requires WebSocket or SSE, adding protocol complexity
gRPC natively supports four communication modes:
Unary: Request→Response (traditional RPC)
Server Streaming: Request→Streaming Response (large data export)
Client Streaming: Streaming Request→Single Response (batch upload)
Bidirectional Streaming: Streaming Request↔Streaming Response (real-time alerts, conversations)
In the compliance report export scenario, compliance-service calls audit-service’s ExportAuditLogs method, audit-service pushes data in batches via Server Streaming, and compliance-service writes to CSV as it receives—without waiting for the full dataset to load into memory.
Autional’s gRPC Security Architecture
Transport Security: TLS / mTLS
Autional’s internal gRPC communication enables TLS by default:
grpc:
  enabled: true
  port: 12018
  tls:
    enabled: true
    cert_file: "/certs/server.crt"
    key_file: "/certs/server.key"
    ca_file: "/certs/ca.crt"
In production this is upgraded to mTLS (mutual authentication): each service has its own client certificate, and the server verifies the caller’s identity. This prevents unauthorized internal calls. Even if an attacker breaches network isolation, they cannot call gRPC endpoints without a valid certificate.
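On the server side, mutual authentication comes down to a tls.Config that demands a client certificate signed by the internal CA. The sketch below uses only the standard library and the file paths from the configuration above; the function name newMTLSServerConfig and the TLS 1.2 minimum are assumptions for illustration. With grpc-go, the resulting config would typically be wrapped with credentials.NewTLS and passed to the server via grpc.Creds.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// newMTLSServerConfig loads the server key pair and the CA bundle, and
// requires every client to present a certificate signed by that CA.
func newMTLSServerConfig(certFile, keyFile, caFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, fmt.Errorf("load server key pair: %w", err)
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, fmt.Errorf("read CA bundle: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("CA bundle contains no valid certificates")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,                           // CA used to verify client certificates
		ClientAuth:   tls.RequireAndVerifyClientCert, // this line is what makes it mutual TLS
		MinVersion:   tls.VersionTLS12,               // assumed floor, not taken from the article
	}, nil
}

func main() {
	cfg, err := newMTLSServerConfig("/certs/server.crt", "/certs/server.key", "/certs/ca.crt")
	if err != nil {
		fmt.Println("mTLS not configured:", err)
		return
	}
	fmt.Println("client certificate required:", cfg.ClientAuth == tls.RequireAndVerifyClientCert)
}
```

Plain TLS would leave ClientAuth at its default (tls.NoClientCert); the caller is then encrypted but anonymous at the transport layer.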
Authentication: JWT + API Key Dual Mode
Internal inter-service calls have two authentication scenarios, and Autional supports both modes:
JWT (User Context Propagation):
When the gateway forwards a user request to internal services, the user_id and tenant_id from the JWT token are passed downstream via gRPC metadata:
// Injected in the gRPC client interceptor
md := metadata.Pairs(
    "authorization", "Bearer "+token,
    "x-tenant-id", tenantID,
    "x-user-id", userID,
)
ctx := metadata.NewOutgoingContext(ctx, md)
API Key (Service-to-Service Trust):
For internal calls that don’t carry user context (e.g., scheduled tasks triggering compliance scans), a pre-provisioned API Key is used:
md := metadata.Pairs("x-api-key", internalAPIKey)
ctx := metadata.NewOutgoingContext(ctx, md)
Unified Interceptor Chain
Autional’s gRPC server is created via the grpc_mw.NewServer factory method, which auto-injects a four-layer interceptor chain:
Client Request
↓
[Recovery] ← panic recovery, prevents a single request from crashing the entire service
↓
[Logging] ← records method, duration, status
↓
[Metrics] ← Prometheus metrics: request_count, latency_histogram
↓
[Auth] ← validates JWT or API Key, injects user_id/tenant_id
↓
Business Handler ← actual gRPC method implementation
The Auth interceptor automatically skips health check endpoints (/grpc.health.v1.Health/*), ensuring Kubernetes liveness probes are always reachable:
// Health check whitelist inside grpc_mw.NewServer
if info.FullMethod == "/grpc.health.v1.Health/Check" ||
    info.FullMethod == "/grpc.health.v1.Health/Watch" {
    return handler(ctx, req) // skip auth
}
Full-Link Tracing: OpenTelemetry
Inter-service call chains are complex, and debugging latency issues requires full-link tracing. All Autional gRPC calls are injected with W3C Trace Context:
Client Side:
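conn, err := grpc.NewClient(addr,
    // Insecure credentials keep this snippet short; real deployments use
    // the TLS/mTLS credentials described in the transport security section.
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)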
Server Side: grpc_mw.NewServer auto-injects otelgrpc.NewServerHandler(). This way, when a request from the gateway traverses the gRPC call chain, Jaeger displays the complete trace:
Gateway → [identity-service: GetUser] → [profile-service: GetProfile] → [compliance-service: CheckCompliance]
↑ Span: 45ms
↑ Span: 12ms
↑ Span: 8ms
↑ Trace: 3a2b1c4d5e6f...
Each Span records the caller service name, method name, status code, and duration. When P99 latency spikes, you can quickly identify which downstream service method is slowing down the overall response.
In Practice: Compliance Scan Authentication Chain
Taking a GDPR data export run by compliance-service as an example, here’s the complete gRPC call chain:
1. Admin initiates export request (HTTP → gateway)
2. Gateway forwards to compliance-service (HTTP)
3. compliance-service calls identity-service (gRPC):
→ GetUser(user_id) → gets user basic info
→ ListUserRoles(user_id) → gets role list
4. compliance-service calls profile-service (gRPC):
→ GetProfile(user_id) → gets extended attributes
5. compliance-service calls audit-service (gRPC):
→ ExportAuditLogs(user_id, stream) → streams audit logs
6. compliance-service assembles data → generates export file → uploads to storage-service
Steps 3-5 are all gRPC calls, each carrying the same Trace ID. If GetUser in step 3 fails, compliance-service can quickly return an error (rather than timing out) and log the failing gRPC status code:
level=ERROR msg="gdpr export failed" user_id=01ARZ...
error="rpc error: code = NotFound desc = user not found"
step=get_user grpc_code=NotFound
gRPC vs REST Division in Autional
Autional does not recommend using gRPC for end-user-facing APIs:
| Scenario | Approach | Reason |
|---|---|---|
| Browser → Backend | REST + JSON (gateway proxy) | Browsers can’t call gRPC directly; they would need a gRPC-Web proxy |
| Mobile → Backend | REST + JSON | Adding gRPC layer offers limited value for mobile |
| Service-to-Service | gRPC + Protobuf | Highest efficiency, type safety, streaming support |
| Third-party API | REST + OAuth 2.0 | Industry standard, ecosystem compatibility |
| Webhook Callback | HTTP POST + JSON | Easy for receivers to process |
| Real-time Push | WebSocket / SSE | Browser-friendly |
Summary
gRPC’s role in Autional internal communication can be summarized as:
- Efficiency: Protobuf binary serialization, 50%+ smaller payload than JSON, lower CPU overhead
- Security: TLS/mTLS transport encryption + JWT/API Key dual-mode authentication + unified interceptor chain
- Reliability: Compile-time type safety, backward-compatible proto changes, CI-enforced consistency
- Observability: OpenTelemetry full-link tracing, each Span records method name, status code, and duration
If you run more than five microservices and inter-service calls are becoming frequent, now is a good time to introduce gRPC.