Designing a backend (cloud) server to avoid 'hotspot' scenarios

I'm trying to design a real-time group chat application specifically targeted towards large groups (>50 users) in each chatroom. Not all users will be actively chatting at once, but one can expect many users to simply idle/listen and receive updates as chats come into the chatrooms.

I've worked out a prototype that is not cloud-oriented and am in the process of redesigning for a cloud-based system.

I expect to have one 'redirecting/load balancing' server (LBServer) that redirects to a series of backend 'chat' servers (CServers). When the user requests to join a particular chatroom from the client, the client will connect to the LBServer, and the LBServer will reply with the connection information for the particular CServer that maintains an instance of that chatroom in memory. The client then disconnects from the LBServer and connects to the CServer. This connection to the CServer is persisted for as long as the user remains in the chatroom. The CServer is responsible for updating a backend database that logs chatroom state, as well as notifying the other clients connected to it of updates in the chatroom.
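
To make the redirect step concrete, here is a minimal sketch of what the LBServer's lookup could look like, assuming a trivial in-memory room-to-CServer table. The `/locate` endpoint, `roomTable`, and the addresses are placeholders I made up for illustration, not part of the actual design:

```go
// Minimal sketch of the LBServer lookup: the client asks which CServer
// holds a room, then drops this connection and dials that CServer directly.
package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

type locateReply struct {
	Room    string `json:"room"`
	CServer string `json:"cserver"` // host:port the client should reconnect to
}

var (
	mu        sync.RWMutex
	roomTable = map[string]string{ // roomID -> CServer address (hypothetical)
		"general": "cserver-1.internal:9000",
	}
)

func locate(w http.ResponseWriter, r *http.Request) {
	room := r.URL.Query().Get("room")
	mu.RLock()
	addr, ok := roomTable[room]
	mu.RUnlock()
	if !ok {
		http.Error(w, "unknown room", http.StatusNotFound)
		return
	}
	json.NewEncoder(w).Encode(locateReply{Room: room, CServer: addr})
}

func main() {
	http.HandleFunc("/locate", locate)
	http.ListenAndServe(":8080", nil)
}
```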

You can already envision that if too many users are in one chatroom (so one CServer must maintain persistent connections to all of them), a 'hotspot' scenario will unfold once activity in the room outpaces the CServer's ability to keep up with all the updates.
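
To make the hotspot mechanics explicit, here is a rough sketch (types are hypothetical) of the fan-out loop a single CServer ends up running per room. The work is roughly (messages per second) x (connections in the room), which is exactly what blows up in a hot room:

```go
// Illustrative fan-out on one CServer: every incoming message is written
// to every connection in the room, so per-room load scales with both
// message rate and member count.
package chat

import "net"

type room struct {
	conns    []net.Conn  // one persistent connection per idle or active user
	incoming chan []byte // messages posted by active users
}

func (r *room) fanOut() {
	for msg := range r.incoming {
		for _, c := range r.conns {
			// A single slow client can also stall the whole loop, which is
			// another reason pinning a room to one server hits a wall.
			c.Write(msg)
		}
	}
}
```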

At this point, I've come up with one naive solution to keep my system scalable: I could spin up a larger CServer instance, copy over the state of the chatroom, and ask all users on the 'hot' CServer to reconnect to the new, larger instance. I don't believe this is the correct way to handle the scalability of such a system.

I have a few questions:

Given that I wish to preserve the real-time nature of the chat, is there a more appropriate way to design my backend system so I can avoid persisting all of a room's connections on one server instance?

Do I even need to bother isolating each chatroom's processing to a single CServer when I'm already keeping track of state in a database? I want to leave room for users to participate in multiple chatrooms simultaneously. Under my current model, the client would have to maintain multiple connections to my cloud (one for each chatroom the user is in), which sucks for the client end. As a revision, I'm envisioning clients maintaining a connection to a 'universal' CServer that listens for changes in the chatrooms the user is currently in and pushes updates accordingly.
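
For what it's worth, here is a sketch of how that 'universal' CServer idea might look: one connection per client, with per-room subscriptions multiplexed over it, and room fan-out handled by a broker rather than by whichever server happens to hold the connections. The `Broker`/`Subscribe`/`Publish` names are my own stand-ins; in a real deployment this layer would typically be an external pub/sub system so that any CServer can serve any room:

```go
// In-process stand-in for a pub/sub layer between CServers.
package chat

import "sync"

type Message struct {
	Room string
	Body string
}

type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan Message // roomID -> subscriber channels
}

func NewBroker() *Broker {
	return &Broker{subs: make(map[string][]chan Message)}
}

// Subscribe registers interest in a room and returns a channel that the
// gateway drains and forwards down the client's single connection.
func (b *Broker) Subscribe(room string) <-chan Message {
	ch := make(chan Message, 16)
	b.mu.Lock()
	b.subs[room] = append(b.subs[room], ch)
	b.mu.Unlock()
	return ch
}

// Publish fans a message out to every subscriber of the room, regardless
// of which server the publishing client is connected to.
func (b *Broker) Publish(m Message) {
	b.mu.RLock()
	for _, ch := range b.subs[m.Room] {
		select {
		case ch <- m: // deliver
		default: // drop rather than block on a slow subscriber
		}
	}
	b.mu.RUnlock()
}
```

The point of the sketch is only that the connection-holding server and the room-owning server no longer have to be the same machine, which is what I'm unsure about.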

All feedback and input would be extremely appreciated, and I would be glad to elaborate on anything that is unclear. Thanks.
