Don’t try this at home: Why software engineers shouldn’t build chat in-house
Before I helped co-found Sendbird as its Chief Technology Officer, I built chat as a feature for two products. Both were consumer apps for families that required a real-time social component. In the first product, users were treating comment threads as though they were chat rooms, so we decided to build chat to make the prevailing model of online community, based almost exclusively on forum posting, more immediately engaging. Neither project was successful. This article shares the lessons I learned trying and, ultimately, failing to build chat for two products.
When consulting with Sendbird clients and partners, I often find myself asking technology companies whether they’re in the business of chat or in the business of marketplaces, communities, on-demand services, or media. If chat is not your core product, building it in-house is probably not the better choice.
Is building chat simple?
As I learned through experience, software engineers tend to approach building chat and messaging with optimistic expectations that are later disappointed by the necessity to build rich moderation and to compete with feature-rich messenger apps – features that add to the complexity of managing and maintaining server traffic.
One of the primary mistakes I see when developers implement chat in-house is either reaching for a polling-based solution or failing to appreciate that managing a large number of persistent connections is a very different problem from scaling a request/response REST API. Typically, when a software engineer wants to model a chat server for their product, they’ll quickly come up with some simple pseudocode:
User A sends a message to the server; the server forwards the message to its recipient, user B.
This is highly simplified server-side pseudo-code. Simple as it is, it contains the essential component of chat: sending and receiving messages to and from other users.
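To make the pseudo-code concrete, here is a minimal sketch of that naive model in Python. The class name `NaiveChatServer`, its methods, and the in-memory inbox are assumptions made for illustration only; they are not taken from either of the projects described here.

```python
from collections import defaultdict


class NaiveChatServer:
    """Relays a message from its sender to its recipient, and nothing more."""

    def __init__(self):
        # user_id -> list of (sender_id, text) tuples waiting to be read
        self.inboxes = defaultdict(list)

    def send(self, sender_id, recipient_id, text):
        # "The server forwards the message to its recipient."
        self.inboxes[recipient_id].append((sender_id, text))

    def receive(self, user_id):
        # Hand over everything queued for this user and clear the queue.
        messages = self.inboxes[user_id]
        self.inboxes[user_id] = []
        return messages


server = NaiveChatServer()
server.send("user_a", "user_b", "Hello!")
delivered = server.receive("user_b")
print(delivered)
```

Note what the sketch leaves out: persistent connections, storage, ordering, retries, receipts, and moderation. Those omissions are where most of the real work turns out to be.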
After a few quick improvements to the pseudo-code, you might think that building chat is simple. After all, the pseudo-code already accomplishes the essential function.
Now, you might move on to chat’s cutting-edge technology or the “fun” work like network programming, distributed computing at a large scale, and features like natural language processing (NLP) and an AI-assisted chatbot. Excited by the prospect, engineers might decide to build their own chat system in-house.
This was my experience to a tee – long before I started Sendbird. I was optimistic both times and both times I failed.
Bad alternatives: Low-level messaging libraries and Managed services
Technically, both versions of chat worked – barely. But I didn’t build either chat feature well enough to successfully and significantly impact the business metrics important to our apps at that time – primarily, retention and engagement as the engines for better user acquisition.
To be honest, after the failure of the first chat project, I thought the root cause was an overly complex library. I used a low-level messaging library that afforded me a lot of flexibility in the implementation and was popular for its performance, but I discovered that it had only very limited support for mobile devices and required much more time to build with. Even though it technically supported mobile, it was extremely unstable, and I had to put a lot of effort into fixing the instabilities on the client side.
So when I started the second project, I chose a simple managed service so that I could focus on the most important parts of chat. A managed service is better than a low-level library in some respects. It takes care of communication, back-end stability, and multi-platform support. I simply followed the instructions to integrate it and then added more chat features on top of the platform. But adding those features was more difficult than I expected. The service solved the underlying connectivity problem, but I had to replicate the chat architecture on each platform in exactly the same way, which increased execution risk, maintenance burden and, ultimately, total cost of ownership (TCO).
When we released both projects to production, they failed to attract users to the messaging feature. Basically, modern consumer messaging applications such as iMessage, Facebook Messenger, and WhatsApp have set the bar for quality and any chat implementation that does not meet or exceed these expectations is doomed to be perceived as poorly made.
I made the mistake of assuming that users would use chat and keep using it because the core component – sending and receiving messages – was available and reliable. What’s apparent now, however, is that consumers take the features of the popular messaging apps for granted and see them as the standard for a satisfying chat experience.
The challenge: competing with messengers for quality and with your architecture for quantity
About six billion people are using messengers at least once every month across the world. There are several leading messengers competing in the market and each one has evolved to satisfy user needs over many years. This gives them a significant head start over any business that wants to add chat functionality to their application. Such a huge lead means that chat systems that are built in-house simply cannot provide the same quality of experience that users expect — and deserve.
Then, there’s quantity – of users, messages, features and more. Every feature must be carefully designed to cooperate with other features. For example, typing indicators must be aware of users sending a message or going idle so that other participants don’t see it spinning forever.
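As a rough illustration of how one feature has to cooperate with others, the sketch below tracks typing state so that it ends when a message is sent and expires when the user goes idle. The `TypingTracker` class and the five-second timeout are assumptions chosen for the example, and timestamps are passed in explicitly to keep the behavior easy to follow.

```python
TYPING_TIMEOUT_SECONDS = 5.0  # assumed value; tune per product


class TypingTracker:
    def __init__(self, timeout=TYPING_TIMEOUT_SECONDS):
        self.timeout = timeout
        self.last_keystroke = {}  # user_id -> timestamp of last typing event

    def on_typing(self, user_id, now):
        self.last_keystroke[user_id] = now

    def on_message_sent(self, user_id):
        # Sending a message ends the typing state immediately.
        self.last_keystroke.pop(user_id, None)

    def who_is_typing(self, now):
        # Idle users expire on their own, so the indicator never spins forever.
        return sorted(
            user_id
            for user_id, seen in self.last_keystroke.items()
            if now - seen < self.timeout
        )


tracker = TypingTracker()
tracker.on_typing("alice", now=0.0)
tracker.on_typing("bob", now=1.0)
tracker.on_message_sent("alice")            # alice sent her message
typing_at_2 = tracker.who_is_typing(now=2.0)    # only bob is still typing
typing_at_10 = tracker.who_is_typing(now=10.0)  # bob went idle and expired
```

Even this small feature has to hook into the message-send path and into some notion of time, which hints at how quickly features become entangled.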
More often than not, engineering managers realize they need to re-design the system architecture repeatedly to accommodate new features. Product teams recognize that they’re no longer creating new innovative features. Instead, they’re benchmarking existing messengers and replicating those features in their own project.
Example 1. Read Receipts
Take read receipts as an example. These indicate whether a recipient of a message has read it or not. It’s a must-have feature. Basically, every single messenger supports it.
The read receipt is actually trickier to build than it looks because every user’s read status for every conversation must be kept on the server forever, or for some period of time, and the status must be synced to the devices of other users when they’re online. In addition, the read receipt must be updated for every message from each member of the conversation. Because a single message in a group can trigger a read event from every member, and each of those events must be sent on to the other members, the chat system needs to be tested to handle 10x to 100x more event traffic for every message sent. To accomplish this, you might need to redesign your architecture.
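The traffic multiplier is easier to see with a small model. The sketch below is a simplified assumption rather than a description of any particular product: it stores one "last read" marker per member and counts how many sync events a single message generates in a ten-person conversation. The `ReadReceipts` class and its method names are hypothetical.

```python
class ReadReceipts:
    def __init__(self, members):
        self.members = set(members)
        # user_id -> id of the newest message that user has read (0 = none)
        self.last_read = {user_id: 0 for user_id in self.members}
        self.events_emitted = 0

    def mark_read(self, reader_id, message_id):
        if message_id <= self.last_read[reader_id]:
            return  # stale or duplicate receipt; nothing to sync
        self.last_read[reader_id] = message_id
        # Every other member's device must be told about the new status.
        self.events_emitted += len(self.members) - 1

    def unread_count(self, message_id, sender_id):
        # Members other than the sender who have not read this message yet.
        return sum(
            1
            for user_id, last in self.last_read.items()
            if user_id != sender_id and last < message_id
        )


receipts = ReadReceipts([f"user_{i}" for i in range(10)])
unread_before = receipts.unread_count(message_id=1, sender_id="user_0")
for i in range(1, 10):
    receipts.mark_read(f"user_{i}", message_id=1)
unread_after = receipts.unread_count(message_id=1, sender_id="user_0")
print(unread_before, unread_after, receipts.events_emitted)
```

In this model one message read by nine recipients produces 81 receipt events, before counting storage writes or offline devices that need to catch up later.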
Example 2. Preventing abuse
It’s even tougher after releasing chat to production. Once users start to chat, they sometimes receive unwanted communication – cyberstalking, inappropriate content sharing, or spamming.
2.1. User blocks and bans
Cyberstalking and inappropriate content sharing are frequent types of abuse. Basically, the abuser continues to send the unwanted message or inappropriate content to the target user.
This type of abuse is easily solved by implementing a user block feature, which, when used by the abuser’s target, does not allow the abuser to message their target. This works well for a targeted user. However, if the abuser targets users randomly, the user block feature is no longer useful. Once they’re blocked, the abuser can easily move to the next target.
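A user block can be sketched as a per-user set that is checked before delivery. The `BlockList` class below is a hypothetical illustration, and its last line shows the limitation just described: the block protects one target, not the next one.

```python
class BlockList:
    def __init__(self):
        self.blocked_by = {}  # user_id -> set of user_ids they have blocked

    def block(self, blocker_id, blocked_id):
        self.blocked_by.setdefault(blocker_id, set()).add(blocked_id)

    def can_message(self, sender_id, recipient_id):
        # Delivery is refused only if the recipient has blocked the sender.
        return sender_id not in self.blocked_by.get(recipient_id, set())


blocks = BlockList()
blocks.block("target", "abuser")
to_target = blocks.can_message("abuser", "target")            # refused
to_next_target = blocks.can_message("abuser", "next_target")  # still allowed
```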
Enter banning users. The ban user feature helps eliminate this kind of random abuse. Once banned, the abuser can no longer access chat at all.
But the problem gets more complicated, again, if the abuser creates multiple accounts to avoid a single ban.
Multiple account creation requires a different approach. You can ban abusers by making them invisible, an approach often called a shadow ban. They can carry on with all their abusive activities, but none of it is visible to other users. Everyone’s happy.
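One way to picture the invisible ban is as a filter applied when messages are read rather than when they are written. The `Channel` class below is an assumed, in-memory sketch: the abuser’s posts are accepted and remain visible to the abuser, so nothing signals that a ban is in place, while everyone else never sees them.

```python
class Channel:
    def __init__(self):
        self.messages = []  # list of (sender_id, text)
        self.shadow_banned = set()

    def shadow_ban(self, user_id):
        self.shadow_banned.add(user_id)

    def post(self, sender_id, text):
        # Posting always "succeeds" so the abuser sees no error.
        self.messages.append((sender_id, text))

    def visible_to(self, viewer_id):
        # Hide banned senders from everyone except themselves.
        return [
            (sender_id, text)
            for sender_id, text in self.messages
            if sender_id not in self.shadow_banned or sender_id == viewer_id
        ]


channel = Channel()
channel.shadow_ban("abuser")
channel.post("alice", "Good morning")
channel.post("abuser", "Unwanted message")
seen_by_alice = channel.visible_to("alice")
seen_by_abuser = channel.visible_to("abuser")
```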
2.2. Preventing Spam
Spamming is another type of abuse. Spammers typically try to advertise illegal sites or spread viruses or malware through a given link within the message. They usually run bots to send these spam messages. It can be difficult to detect because spammers have a lot of experience dealing with different types of anti-spam features from other platforms. So they try to send spam messages at a very low speed to avoid suspicious activity detection. There’s no silver bullet or filter to prevent all spam messages. It requires cobbling together a lot of different anti-spam tools and policies to eliminate most spamming attempts.
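To illustrate why a single filter is not enough, the sketch below combines two signals: a short-window burst limit and a long-window limit on messages containing links. The `SpamScorer` class, its thresholds, and the link pattern are all assumptions picked for the example. A real system would layer many more signals and policies on top.

```python
import re
from collections import deque

LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


class SpamScorer:
    def __init__(self, burst_limit=5, burst_window=10.0,
                 link_limit=3, link_window=3600.0):
        self.burst_limit = burst_limit
        self.burst_window = burst_window
        self.link_limit = link_limit
        self.link_window = link_window
        self.sent = {}   # user_id -> timestamps of recent messages
        self.links = {}  # user_id -> timestamps of recent messages with links

    def _recent(self, log, user_id, now, window):
        # Drop timestamps that have fallen out of the window.
        queue = log.setdefault(user_id, deque())
        while queue and now - queue[0] > window:
            queue.popleft()
        return queue

    def is_suspicious(self, user_id, text, now):
        sent = self._recent(self.sent, user_id, now, self.burst_window)
        sent.append(now)
        links = self._recent(self.links, user_id, now, self.link_window)
        if LINK_PATTERN.search(text):
            links.append(now)
        return len(sent) > self.burst_limit or len(links) > self.link_limit


scorer = SpamScorer()
# A slow bot: one link every ten minutes never trips the burst limit,
# but the long-window link count eventually flags it.
verdicts = [
    scorer.is_suspicious("bot", "win at http://spam.example", now=600.0 * i)
    for i in range(5)
]
print(verdicts)
```

The burst check alone would never catch this sender, which is the sense in which low-rate spam slips past simple rate limiting.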
Engineering teams must immediately prevent these types of abuse to keep users from having a toxic experience of the chat feature. But these anti-spam and anti-abuse features are not easy to build because they require various filters and policies to block all abusive activity effectively without blocking normal user activity. It might take weeks to months to develop.
Meanwhile, you risk poor adoption of chat once users start to complain about abuse and spam. Abuse not only hurts the user experience but also hampers chat adoption in the crucial early stages of the release. This can quickly lead to a failed feature.
Everything depends on the chat server
While the team works long days preventing abuse, rather than adding more “desirable” features to the product, they also need to dedicate more resources to operating their chat servers. As the number of users increases, the chat server must be carefully monitored and maintained. Beyond that, you might need to rebuild the architecture or optimize performance.
If high availability for the server was not considered at the design stage, it’s time to start. Since chat is real-time communication, any type of outage might impact the user experience severely.
Needless to say, it requires extra effort and resources, too, if your company must comply with ISO 27001, SOC 2, HIPAA, or GDPR.
This type of follow-up development and operational work after the production release will easily add weeks or months to the team’s sprint schedule.
Building chat in-house transformed
At this stage, chat looks completely different from what the engineering team had in mind when it decided to build in-house. It has probably taken quite a long time to build only a very limited feature set compared to existing messengers in the market. After the release, it requires a lot more resources to maintain.
Since the chat feature requires network connectivity and real-time communication, many experienced software engineers were brought into the project who could have focused on adding more valuable features to your core product. Instead, they spent their time “reinventing the wheel.”
I learned all this through a lot of effort and personal experience before I helped found Sendbird. Building chat required more effort and resources than I expected, and after a long time implementing it, I ended up providing only a very basic set of features.
Don’t re-invent the wheel
Don’t make the same mistakes I made — not every lesson needs to be learned the hard way. The next time you need to incorporate chat or messaging into your product or service, don’t waste time reinventing the wheel. Choose Sendbird and focus on delivering the features and experiences that make your application unique and special.