H.323 and SIP: Which Won the Protocol War?
November 19, 2008
In the VoIP and videoconferencing space, this has been a question on people's minds for years. Which protocol should I use in my enterprise? Which protocol is best for transmitting audio or video? If I install a video system, which protocol would be most interoperable? If I want to create a new service provider, which protocol works best for my customers?
Today, nearly 13 years after the introduction of SIP and H.323, the question of which protocol won the protocol war remains unanswered. The fact is, we will likely go forward with both protocols for a very long time. Why? There are several reasons, which I will discuss in the following paragraphs.
First, let's take a step back and look at the predecessor to VoIP: the PSTN. The signaling protocol used in the PSTN today is Signaling System #7 (or SS7). It was first published in 1980 by the ITU in the Q.700-series of Recommendations. Wow, that really sounds old. But SS7 was just 16 years old when SIP and H.323 were first published. What is the life expectancy of a protocol? If SS7 was considered ancient in 1996, then by the same token, SIP and H.323 are both nearly ancient now. Even so, we still have not settled on a protocol that we can say won the protocol war.
Both H.323 and SIP have strengths and weaknesses and the protocol of choice should really be one that best suits your particular requirements. But, how do you decide which to use? Let's examine the history of each of the two mainstream VoIP protocols to help us find out.
H.323 was defined by the ITU as a protocol designed for enterprise video conferencing. It borrowed many of the rich features defined in the previous-generation H.320 systems that were designed for ISDN, and added functionality that was only possible on an IP network. It introduced an architecture that proved to be very robust and highly scalable, which then led to the widespread deployment of VoIP worldwide. But, wait! H.323 was designed for video conferencing, yet it was used for VoIP? Yes, indeed. Today, H.323 is still the most widely deployed inter-carrier VoIP protocol, carrying billions of minutes of voice traffic around the world every month. It is also widely deployed within numerous carrier networks for transiting voice calls over long distances.
SIP was defined by the IETF as a protocol to enable end-to-end voice calls over the Internet. SIP has always been heralded as the protocol that will kill the PSTN and dominate the world, while at the same time it was touted as being a very simple, flexible protocol that anybody could employ in their products. For these reasons, there was immediate uptake of SIP and there have been hundreds, if not thousands, of articles about this "simple, yet flexible" protocol. There is even a magazine devoted to SIP! Yet, with all of this attention, SIP failed miserably at accomplishing one very important function: enabling end-to-end voice calls over the Internet.
Rather, SIP was used only within small service providers like Vonage, where the PSTN was used as the means of interconnecting with other carriers. Fueled by the hype around SIP, hundreds of small start-ups tried to follow a similar model, most of which failed. The problem is that SIP was proving to be a bit more complicated than people were led to believe. In fact, SIP is complicated on many levels, from the verbosity of messages that must be exchanged over a network, to the idiosyncrasies brought about by the overly flexible design, to fundamental encoding and decoding problems perpetuated by the error-prone text-based syntax that defines the protocol. (And, I will note that this text encoding was also touted as a virtue.)
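To make the point about the text-based syntax a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a real SIP stack: the function name `parse_sip_headers`, the sample messages, and the example.com addresses are all made up for this purpose. It relies on three rules from RFC 3261: header names are case-insensitive, several headers have one-letter compact forms, and a header value may be folded across lines. The two messages below look different on the wire, yet a receiver must treat them as equivalent.

```python
# Compact header forms from RFC 3261 (a subset, chosen for illustration).
COMPACT_FORMS = {
    "v": "via",
    "f": "from",
    "t": "to",
    "i": "call-id",
    "m": "contact",
    "l": "content-length",
    "c": "content-type",
}


def parse_sip_headers(raw: str) -> dict[str, list[str]]:
    """Hypothetical helper: parse a SIP header section into {name: [values]}."""
    # Unfold continuation lines: a line that starts with a space or tab
    # continues the value of the previous header.
    logical_lines: list[str] = []
    for line in raw.split("\r\n"):
        if not line:
            break  # a blank line ends the header section
        if line[0] in " \t" and logical_lines:
            logical_lines[-1] += " " + line.strip()
        else:
            logical_lines.append(line)

    headers: dict[str, list[str]] = {}
    for line in logical_lines[1:]:  # skip the request line
        name, _, value = line.partition(":")
        key = name.strip().lower()         # header names are case-insensitive
        key = COMPACT_FORMS.get(key, key)  # expand one-letter compact forms
        headers.setdefault(key, []).append(value.strip())
    return headers


# The same request written two ways (made-up sample data).
LONG_FORM = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

COMPACT_FORM = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "v: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
    "FROM:Alice\r\n"
    " <sip:alice@example.com>;tag=1928301774\r\n"
    "t: Bob <sip:bob@example.com>\r\n"
    "call-id : a84b4c76e66710\r\n"
    "l: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    same = parse_sip_headers(LONG_FORM) == parse_sip_headers(COMPACT_FORM)
    print("equivalent after normalization:", same)
```

Even this toy version has to deal with case, abbreviations, folding, and stray whitespace before it can compare two headers, and it still ignores things a real implementation must handle, such as comma-separated multi-value headers and quoted strings.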
While these issues were plaguing fixed-network providers, wireless carriers were struggling to get SIP to work over wireless networks. A tremendous amount of effort was expended trying to make that happen, with many wireless carriers deciding to put SIP plans on hold and use a cousin to H.323 called H.324. And, they did. Today, virtually all mobile phones that enable real-time video communication utilize H.324.
Meanwhile, fixed-network providers and wireless providers, determined to make VoIP work with SIP as it was promised, continued to define protocol extensions and elaborate network architectures that, for all intents and purposes, recreate the PSTN on an IP network. That complex system, known as the IP Multimedia Subsystem (IMS), is a monster in terms of size and complexity. And, what do users get out of all of that complexity? Well, you can make a phone call at so many pennies per minute like you could with your old PSTN phone line. Feel the excitement?
Oh, but there is more. Recognizing that IMS was complex and unsuitable, some carriers decided to define alternative systems or, more accurately, profiles of SIP. There are at least two or three different standards organizations that are actively working on profiles of SIP, and they all define things slightly differently. And, so the interworking woes continue, but nobody has given up, yet! That persistence in the face of insurmountable challenge amazes me, to be honest.
But, I do not mean to suggest that SIP failed. On the contrary, it and H.323 are both widely deployed now. And, certain companies have created products called Session Border Controllers (SBCs) to address the need to bridge voice calls between these two incompatible systems. And, given enough time, I am sure that the proponents of IMS will mature the standards and products to a point where they are usable, and SIP can lay a claim to being the global standard for voice communication.
That said, video conferencing is seeing tremendous new growth and H.323 dominates in that space. Remember, H.323 was defined first for video conferencing, not for voice. And, it marries nicely with all of those wireless H.324 devices on the market today. So, with the growing H.323 installed base for both voice and video, H.323 is definitely positioned to stay around for quite some time. And, while SIP can carry video, its video support is extremely weak and it would require a lot of additional standardization work to catch up to H.323. In fact, some very fundamental video capabilities are not defined at all for SIP, leading vendors to implement incompatible proprietary solutions.
The existing H.323 and SIP protocols are now growing old. Adding new functionality is more challenging than before, and deploying new complex architectures based on what are now legacy systems will be even more challenging. Recognizing this, the ITU decided to initiate work on the next-generation multimedia system called H.325 (or "Advanced Multimedia System", not to be confused with the aforementioned IMS). With H.323 and SIP now nearly 13 years old, the time is right. H.325 aims to enable video, of course, but also voice, application sharing, file exchange, and a host of other new capabilities that might be distributed over a multiplicity of devices. You can imagine sitting in a conference room with video, writing on an electronic whiteboard seen locally and remotely, and exchanging files by placing them on a "virtual table" for other participants to have. You might control your conference with your mobile phone, yet still have full HD video capabilities from the video equipment located in the conference room. The concept is bold, but the technologies to enable this kind of functionality are materializing.
H.325 may still be a few years away, but the future is starting to look clearer now. H.323 and SIP are here and they are here to stay, just as the now extremely old SS7 networks are still in operation. Equipment manufacturers must support both protocols in order to be successful, and the protocol employed between any two points should be considered on a case-by-case basis. If the application is video, then H.323 is the clear choice in almost every case. If the application is voice, the choice is a little less clear. SIP excels in the residential market, largely due to the way in which it is deployed (using a specific profile of the protocol and special configuration in the carrier network). But, between carriers, it could equally be either protocol. Within the enterprise, it could also be either protocol. After all, the next hop out of the enterprise is likely an SBC that would serve to protect the carrier network and provide any necessary protocol translation. The dream of end-to-end SIP calls is effectively dead.
Now is the time to look toward the future. The war is over and largely irrelevant now. We know what VoIP has delivered, but we can do so much more. What new capabilities should we expect from the next multimedia system? I am looking for increased functionality that goes well beyond just enabling voice and video, taking communications to the next level.