Scaling WebSocket Connections: Lessons Learned

Practical insights from scaling WebSocket infrastructure to handle 10K+ concurrent connections


The Challenge

When our XR platform started growing, we hit the classic WebSocket scaling wall at around 5,000 concurrent connections. Here’s how we solved it.

Problem 1: Connection Limits

Each server instance has an operating-system limit on open file descriptors, and every WebSocket connection holds one open. We hit this wall at ~4,000 connections per instance.

Solution: Increased ulimit settings and implemented horizontal scaling with a load balancer.

# /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
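The raised limit can also be checked (and bumped up to the hard cap) from inside the server process itself. A minimal sketch using Python's standard resource module — illustrative, not part of our actual stack:

```python
import resource

# Query the soft and hard limits on open file descriptors (RLIMIT_NOFILE).
# Each WebSocket connection consumes one descriptor, so the soft limit
# effectively caps concurrent connections per process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")

# An unprivileged process may raise its own soft limit up to the hard limit.
if 0 < soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

The same check belongs in service startup: failing fast on a too-low limit beats debugging mysterious connection refusals at peak load.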

Problem 2: Sticky Sessions

WebSocket connections are long-lived and must stay on the server that accepted the upgrade, but round-robin load balancing can send a reconnecting client to any instance.

Solution: IP hash-based routing in our load balancer:

upstream websocket_servers {
    ip_hash;
    server ws1.example.com:8080;
    server ws2.example.com:8080;
    server ws3.example.com:8080;
}
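ip_hash only solves server selection; the upgrade handshake itself also has to survive the proxy. A typical companion location block using nginx's standard WebSocket proxying directives (the /ws/ path and timeout value are assumptions — adjust to your routes and traffic):

```nginx
server {
    listen 80;

    location /ws/ {
        proxy_pass http://websocket_servers;
        # WebSocket upgrades require HTTP/1.1 plus the Upgrade/Connection headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Long-lived sockets: the 60s default read timeout would drop idle clients
        proxy_read_timeout 3600s;
    }
}
```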

Problem 3: Cross-Server Communication

Users on different servers needed to communicate with each other.

Solution: Redis Pub/Sub for message broadcasting:

// Publish message to all servers
await redis.PublishAsync("room:123", message);

// Each server subscribes and forwards to local clients
subscriber.Subscribe("room:*", (channel, message) => {
    BroadcastToLocalClients(channel, message);
});
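The fan-out each server performs is small enough to model in isolation. Below is a hypothetical in-process Python sketch of the same pattern — no real Redis, with fnmatch standing in for Redis's glob matching on a room:* pattern subscription (PSUBSCRIBE):

```python
from fnmatch import fnmatch

class ServerNode:
    """Models one WebSocket server: a set of locally connected clients per room."""
    def __init__(self, name):
        self.name = name
        self.local_clients = {}  # channel -> list of client ids
        self.delivered = []      # (client, message) pairs, for inspection

    def on_message(self, channel, message):
        # Forward a published message only to clients connected to THIS node.
        for client in self.local_clients.get(channel, []):
            self.delivered.append((client, message))

class FakePubSub:
    """Stand-in for Redis Pub/Sub pattern subscriptions."""
    def __init__(self):
        self.subscriptions = []  # (pattern, callback)

    def psubscribe(self, pattern, callback):
        self.subscriptions.append((pattern, callback))

    def publish(self, channel, message):
        # Every matching subscriber (i.e. every server) receives the message.
        for pattern, callback in self.subscriptions:
            if fnmatch(channel, pattern):
                callback(channel, message)

bus = FakePubSub()
a, b = ServerNode("ws1"), ServerNode("ws2")
a.local_clients["room:123"] = ["alice"]
b.local_clients["room:123"] = ["bob"]
for node in (a, b):
    bus.psubscribe("room:*", node.on_message)

# A client on ws1 sends to room:123; members on BOTH servers receive it.
bus.publish("room:123", "hello")
```

The key property this models: the publisher never knows which server a recipient lives on — every server gets every message and forwards only to its own clients.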

Results

After implementing these solutions:

  • Scaled from 5K to 15K concurrent connections
  • 99.9% uptime over 6 months
  • Average latency reduced from 150ms to 45ms

Key Takeaways

  1. Plan for horizontal scaling from the start
  2. Use Redis Pub/Sub for cross-server messaging
  3. Monitor connection counts and memory usage closely
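The third takeaway can be made concrete with a minimal counter. This is a hypothetical sketch (names and threshold are illustrative), using only Python's standard resource module for a peak-memory reading:

```python
import resource

class ConnectionMonitor:
    """Tracks live connection count and peak process memory (illustrative)."""
    def __init__(self, max_connections=15000):
        self.active = 0
        self.max_connections = max_connections

    def on_connect(self):
        self.active += 1

    def on_disconnect(self):
        self.active = max(0, self.active - 1)

    def snapshot(self):
        # ru_maxrss is reported in KiB on Linux (bytes on macOS)
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return {
            "connections": self.active,
            "utilization": self.active / self.max_connections,
            "peak_rss": rss,
        }

mon = ConnectionMonitor()
mon.on_connect(); mon.on_connect(); mon.on_disconnect()
print(mon.snapshot())
```

In practice these gauges would feed a metrics backend and alert well before the file-descriptor or memory ceiling is reached.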