Scaling gRPC Server Streaming to 100K Subscribers on GKE (Java Spring Boot Reactive)


I'm building a gRPC server streaming application using Java, Spring Boot, and reactive programming that acts as a wrapper around Google Cloud Pub/Sub. The application has two methods: /publish and /subscribe. I expect a small number of publishers and a large number of subscribers (80K - 100K).

Here's how it works:

Publishers call the /publish method to send a message to a specific topic.
The MQ server (this application) publishes the message to the specified Pub/Sub topic.
The MQ server also subscribes to the same topic, so it receives the message back from Pub/Sub.
The MQ server then streams the message to every subscriber connected through /subscribe.
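To make the fan-out step concrete, here is a minimal, library-free sketch of it: one message arriving from upstream is copied into a bounded buffer per connected subscriber. It is not the implementation from the question. The class name FanOutHub, its methods, and the drop-oldest policy for slow subscribers are all assumptions made for illustration; in the real server each buffer would sit in front of a gRPC response stream or a reactive sink rather than a plain queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory fan-out hub: each upstream message is copied to every subscriber of its topic. */
final class FanOutHub {
    private final int maxPending; // per-subscriber buffer bound (assumed policy)
    // topic -> (subscriberId -> messages not yet written to that subscriber)
    private final Map<String, Map<String, Queue<String>>> topics = new ConcurrentHashMap<>();

    FanOutHub(int maxPending) {
        this.maxPending = maxPending;
    }

    void subscribe(String topic, String subscriberId) {
        topics.computeIfAbsent(topic, t -> new ConcurrentHashMap<>())
              .putIfAbsent(subscriberId, new ArrayDeque<>());
    }

    void unsubscribe(String topic, String subscriberId) {
        Map<String, Queue<String>> subs = topics.get(topic);
        if (subs != null) {
            subs.remove(subscriberId);
        }
    }

    /** Called once per message received from upstream; returns how many subscribers got a copy. */
    int onUpstreamMessage(String topic, String message) {
        Map<String, Queue<String>> subs = topics.getOrDefault(topic, Map.of());
        int delivered = 0;
        for (Queue<String> pending : subs.values()) {
            synchronized (pending) {
                if (pending.size() == maxPending) {
                    pending.poll(); // buffer full: drop the oldest message for this slow subscriber
                }
                pending.add(message);
            }
            delivered++;
        }
        return delivered;
    }

    /** Removes and returns everything currently buffered for one subscriber. */
    List<String> drain(String topic, String subscriberId) {
        Queue<String> pending = topics.getOrDefault(topic, Map.of()).get(subscriberId);
        if (pending == null) {
            return List.of();
        }
        synchronized (pending) {
            List<String> out = new ArrayList<>(pending);
            pending.clear();
            return out;
        }
    }

    public static void main(String[] args) {
        FanOutHub hub = new FanOutHub(2);
        hub.subscribe("test", "subscriber-1");
        hub.subscribe("test", "subscriber-2");
        hub.onUpstreamMessage("test", "hello");
        System.out.println(hub.drain("test", "subscriber-1"));
    }
}
```

The bound matters because the work per published message grows with the number of subscribers: without a cap, one slow subscriber among 100K can hold an ever-growing backlog in memory. Dropping the oldest message is only one possible policy; disconnecting the slow subscriber is another.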

Setup:

gRPC Server: Java with Spring Boot and reactive programming.
GKE: e2-standard-4 machine type.
OS Limits: Increased the open-file limit (ulimit) to 20000.

The Problem:

I'm using the script below to simulate a large number of subscribers locally:

#!/bin/bash

# Number of grpcurl subscribers to start from this instance
NUM_REQUESTS=$1
# Offset added to the loop index to build each subscriber ID
SUBSCRIBER_OFFSET=$2

# Start NUM_REQUESTS grpcurl streaming calls in the background
for i in $(seq 1 "$NUM_REQUESTS"); do
  subscriber_id=$((SUBSCRIBER_OFFSET + i))

  grpcurl -cert /home/client-cert.pem -key /home/client-cert-key.pem \
    -import-path /home \
    -proto contract.proto \
    -d "{\"topic_name\": \"test\", \"subscriber_id\": \"subscriber-$subscriber_id\"}" \
    mq.abc:443 SubscriptionService.Subscribe &
done

echo "All grpcurl connections have been started in the background."

When I try to establish more than 5,000 connections, I get the following error:

Failed to dial target host "mq.abc:443": context deadline exceeded

Active connections also start dropping after a while.

Troubleshooting:

Increased ulimit to 20000 to allow more open files/sockets.
Increased the connection timeout on the client side with the -connect-timeout option.
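As a rough sanity check on the ulimit figure above: every open TCP connection consumes one file descriptor in the process that holds it, so a limit of 20000 cannot cover 80K to 100K connections in a single process. The sketch below only does that arithmetic. It assumes each subscriber holds its own connection (as with one grpcurl process per subscriber), that the limit applies per process, and that some descriptors are reserved for other uses; the class name FdBudget and the reserved value of 1000 are made up for illustration. It says nothing about memory, CPU, or load balancer limits.

```java
/** Hypothetical capacity check: how many processes are needed to hold N connections under a file-descriptor limit. */
final class FdBudget {

    /** Connections one process can hold: its descriptor limit minus descriptors reserved for other uses. */
    static long connectionsPerProcess(long fdLimit, long reservedFds) {
        if (reservedFds >= fdLimit) {
            throw new IllegalArgumentException("reserved descriptors must be below the limit");
        }
        return fdLimit - reservedFds;
    }

    /** Minimum number of processes needed to hold targetConnections, rounding up. */
    static long processesNeeded(long targetConnections, long fdLimit, long reservedFds) {
        long perProcess = connectionsPerProcess(fdLimit, reservedFds);
        return (targetConnections + perProcess - 1) / perProcess;
    }

    public static void main(String[] args) {
        long fdLimit = 20_000; // the ulimit value from the question
        long reserved = 1_000; // assumption: descriptors used by logs, jars, upstream clients
        System.out.println("processes needed for 100000 connections: "
                + processesNeeded(100_000, fdLimit, reserved));
    }
}
```

With these assumed numbers, each process can hold 19000 connections, so 100K subscribers need at least six such processes, or a higher limit per process.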

Questions:

Could the "context deadline exceeded" error be related to network latency, request timeouts, or server overload?
Are there specific configurations for Spring Boot Reactive gRPC to handle a large number of concurrent streams efficiently?
Are there limits on concurrent connections imposed by GKE's networking or my Ingress configuration?
What strategies can I use to optimize gRPC performance for a large number of subscribers in a reactive Spring Boot application (e.g., connection pooling, flow control)?
Are there alternative solutions I should consider for this use case, given the high fan-out requirement?

Any help would be greatly appreciated!
