Introduction
Live streaming has become increasingly popular. Building such a system from scratch can be complex, so it is worth looking for a simpler approach. The good news is that newer browser infrastructure makes the goal much easier to achieve.
Concepts
WebRTC provides a mechanism for browsers to start real-time peer-to-peer communication. The media carried over such a connection can be video, audio, and other kinds of data. Since recent versions of Firefox, Chrome, and other browsers already support this feature, WebRTC meets our needs well.
Design
To start a live show, a host sets up video and audio devices, then creates a room and waits for an audience to watch. To follow a host and watch his or her show, an audience member only needs to log in and enter the host’s room number. To capture the media stream on the host’s end, we can use the HTML5 media APIs; to transmit and receive the stream, we can use WebRTC. Combining the two gives us a live streaming system that is simple and efficient.
Technical considerations
The implementation is simple and compact. The diagrams on MDN show the steps needed to complete a WebRTC communication. As they illustrate, the core steps are those related to exchanging information, known as signaling. A signaling server must therefore be designed first, so that clients have a path for communicating with each other.
To exchange data between the server and clients, either AJAX or WebSocket can be used. To simplify the overall process, we pick WebSocket here because it is bidirectional and easy to implement. To make each step clear, I decided to use Erlang and Cowboy, and I will not use any library that handles the communication itself.
Details
In this section, I’ll write down all the steps needed to set up a WebRTC communication. I’ll also walk through each step in detail and show some code where necessary.
The caller creates a room and starts the show
When a caller decides to go live, he or she enters a room number and clicks the live button. Once the button is clicked, the client code running in the caller’s browser calls
navigator.getUserMedia(constraints, successCallback, errorCallback);
to ask the caller for permission to access the local media devices. Once permission is granted, the client code can access the media stream in the success callback passed to navigator.getUserMedia. It then saves the stream in a JavaScript object and plays it through a video tag.
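The capture step can also be sketched with the promise-based `navigator.mediaDevices.getUserMedia()`, which current browsers offer in place of the callback form shown above. This is a minimal sketch rather than the article's actual client code: the function name `startLocalPreview` and the `state` object are hypothetical, and the media devices object and video element are passed in as parameters so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: ask for camera and microphone access, keep the
// resulting stream, and show it in a <video> element.
// `mediaDevices` is expected to behave like navigator.mediaDevices and
// `videoElement` like an HTMLVideoElement; both are injected.
async function startLocalPreview(mediaDevices, videoElement, state) {
  const constraints = { audio: true, video: true };
  // Prompts the user for permission; the promise rejects if it is denied.
  const stream = await mediaDevices.getUserMedia(constraints);
  state.localStream = stream;       // saved for later use with WebRTC
  videoElement.srcObject = stream;  // attach the stream for local playback
  return stream;
}
```

In a browser this might be called as `startLocalPreview(navigator.mediaDevices, document.querySelector("video"), {})`, assuming the page contains a video element with the `autoplay` attribute.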
Having obtained and played the media stream, the caller must tell the server that he or she wants a room for the live show. To do so, the client sends the server a WebSocket message containing the room number and some other information, such as the user’s identity.
Once the server receives the message, it extracts the room number and looks it up in the database to see whether the room already exists. If it does, the server rejects the request and passes error information back to the client.
If the room number is unique, the server creates a record for the room that includes the room number and the creator’s process ID. The record is an Erlang record with the following definition:
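As an illustration, the room request could be sent as a small JSON message. The article does not specify the wire format, so the field names `type`, `room`, and `user`, the `"create_room"` value, and both function names are assumptions made for this sketch.

```javascript
// Hypothetical message format: the field names below are assumptions,
// not the format used by the article's server.
function buildCreateRoomMessage(room, userId) {
  if (!Number.isInteger(room)) {
    throw new TypeError("room must be an integer");
  }
  return JSON.stringify({ type: "create_room", room: room, user: userId });
}

// `socket` is expected to behave like a browser WebSocket, that is,
// to expose a send(string) method on an open connection.
function requestRoom(socket, room, userId) {
  socket.send(buildCreateRoomMessage(room, userId));
}
```

The integer check mirrors the `room :: integer()` field of the server-side record, so a malformed room number is caught before anything is sent.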
-record(session, {room :: integer(),
Once this is done, the room has been created, and the server replies to the room creator with a message confirming the successful creation.
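The uniqueness check and room creation can be modeled in a few lines. The article's server is written in Erlang; the sketch below is only a JavaScript model of the same logic, with a `Map` standing in for the database. The function name, the result shapes, and the `owner` and `audiences` fields are assumptions.

```javascript
// Model of the server-side bookkeeping, not the article's Erlang code.
// `rooms` is a Map from room number to { room, owner, audiences }.
function createRoom(rooms, room, ownerId) {
  if (rooms.has(room)) {
    // Room numbers must be unique, so a duplicate request is rejected.
    return { ok: false, error: "room_exists" };
  }
  rooms.set(room, { room: room, owner: ownerId, audiences: [] });
  return { ok: true, room: room };
}
```

Returning a result object instead of throwing keeps both outcomes easy to turn into a reply message for the client.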
Audience members join a room and watch the show
When an audience member wants to watch a live show, he or she must supply a room number. Once the audience member enters a room number and an identity, the JavaScript code running in the browser uses the WebSocket API’s send method to send the server a message containing those values.
After receiving the join request, the server checks whether the room has been created. If not, the server rejects the request and sends an error message back to the audience member.
If the room does exist, the server inserts the audience member’s process ID (remember that every client connected to the server through WebSocket lives in its own process) into the audiences field of that room. The audience member has now joined the room, and the server must notify the room’s owner that someone is interested in the show. Since WebSocket connections are bidirectional, it is easy for the server to send messages directly to a client. The server only needs to use Erlang’s built-in inter-process message passing: the owner’s process receives the message in the callback function registered with Cowboy, and the message is then sent on to the browser.
In the next steps, I’ll demonstrate how signals are exchanged through the signaling server.
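The join step can be modeled the same way. This is again a JavaScript model of logic that the article implements in Erlang: `rooms` is assumed to be a `Map` from room number to an object with `owner` and `audiences` fields, `notify` is a hypothetical stand-in for Erlang message passing to the owner's process, and the message shape is an assumption.

```javascript
// Model of the join logic, not the article's Erlang code.
// `notify(targetId, message)` stands in for sending an Erlang message
// to the process that holds the target's WebSocket connection.
function joinRoom(rooms, room, audienceId, notify) {
  const session = rooms.get(room);
  if (session === undefined) {
    // The room was never created, so the join request is rejected.
    return { ok: false, error: "no_such_room" };
  }
  session.audiences.push(audienceId);
  // Tell the room owner that a new audience member has arrived.
  notify(session.owner, { type: "audience_joined", room: room, audience: audienceId });
  return { ok: true, room: room };
}
```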
The caller transmits its session descriptor
For two WebRTC peers to communicate, they must first negotiate. The negotiation involves the two peers exchanging their session descriptors. Once the caller receives the server’s message announcing an incoming client, it must create a session descriptor and pass it to that client through the signaling server. To create the descriptor, the caller first creates an RTCPeerConnection object for the incoming client, then calls the RTCPeerConnection interface’s createOffer() method. Once the descriptor has been created, the caller sets it as its local descriptor by calling RTCPeerConnection.setLocalDescription(), then sends the signaling server a message containing the descriptor.
As for the signaling server, when it receives the caller’s descriptor, it has to forward the descriptor to the specified callee. To do so, it first finds the callee’s PID by searching the database, then passes the descriptor to that process as a message; the process finally transmits the descriptor to the callee.
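The caller's side of this exchange might look like the sketch below. The function name `sendOffer` and the message fields `type`, `to`, and `sdp` are assumptions, and the peer connection, media stream, and socket are passed in so the flow can be tested without a browser. The sketch also attaches the host's tracks with `addTrack()` before creating the offer, a step the text does not spell out but which is needed for the offer to describe the media being sent.

```javascript
// `pc` is expected to behave like an RTCPeerConnection, `localStream`
// like a MediaStream, and `socket` like a WebSocket.
async function sendOffer(pc, localStream, socket, audienceId) {
  // Attach the host's tracks so the offer describes the media to send.
  for (const track of localStream.getTracks()) {
    pc.addTrack(track, localStream);
  }
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // Hypothetical message format; the signaling server relays it to `to`.
  socket.send(JSON.stringify({ type: "offer", to: audienceId, sdp: offer }));
  return offer;
}
```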
Callees transmit their session descriptors to the caller
When the caller’s descriptor reaches the callee, the callee calls RTCPeerConnection.setRemoteDescription() to set it as its remote descriptor. To complete the negotiation, the callee also needs to transmit its own descriptor. It does this by calling RTCPeerConnection.createAnswer() to create the descriptor, setting it as its local descriptor with RTCPeerConnection.setLocalDescription(), and sending it to the caller through the signaling server. The final step for the caller is to set the received descriptor as its remote session descriptor by calling RTCPeerConnection.setRemoteDescription(). At this point the negotiation is finished; the next step is to exchange the peers’ addressing information.
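The answer half of the negotiation can be sketched as two small functions, one for each side. The function names and the message fields are assumptions; only the `RTCPeerConnection` methods named in the text are used, and the connection and socket are injected so the order of calls can be checked outside a browser.

```javascript
// Callee side: apply the caller's offer, then create and send an answer.
// `pc` is expected to behave like an RTCPeerConnection.
async function answerOffer(pc, socket, offer, callerId) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  // Hypothetical message format; the signaling server relays it to `to`.
  socket.send(JSON.stringify({ type: "answer", to: callerId, sdp: answer }));
  return answer;
}

// Caller side: the final negotiation step is to store the callee's answer.
async function acceptAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}
```

The order matters on the callee's side: `createAnswer()` only works after the remote offer has been set.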
Peers exchange their ICE candidates
To finish setting up the communication paths, peers need to obtain their address information and exchange it with each other. To obtain network information, peers specify STUN/TURN server information when creating their RTCPeerConnection objects. The address information can then be accessed in the onicecandidate callback of the RTCPeerConnection interface. As soon as peers have ICE candidates, they exchange them through the signaling server. After these steps, clients can watch the live show and chat.
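The candidate exchange can be sketched as follows. The function names and the `candidate` message format are assumptions. `buildRtcConfig` only covers servers that need a URL; TURN servers usually also need a `username` and `credential`, which this sketch leaves out.

```javascript
// Build an RTCPeerConnection configuration from a list of STUN/TURN URLs.
function buildRtcConfig(iceServerUrls) {
  return { iceServers: iceServerUrls.map((url) => ({ urls: url })) };
}

// Forward every locally gathered candidate to the remote peer.
// `pc` is expected to behave like an RTCPeerConnection.
function sendLocalCandidates(pc, socket, peerId) {
  pc.onicecandidate = (event) => {
    // A null candidate signals that gathering has finished.
    if (event.candidate) {
      socket.send(JSON.stringify({ type: "candidate", to: peerId, candidate: event.candidate }));
    }
  };
}

// Apply a candidate received from the remote peer via the signaling server.
async function addRemoteCandidate(pc, candidate) {
  await pc.addIceCandidate(candidate);
}
```

In a browser the configuration would be passed to the constructor, for example `new RTCPeerConnection(buildRtcConfig(["stun:stun.example.org"]))`, where the STUN URL is a placeholder.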
Conclusion
WebRTC provides a convenient mechanism for browsers to exchange media streams and data. Since the WebRTC standard provides a set of APIs and a clear flow, the only thing we need to concentrate on is building a signaling server, whose sole purpose is to exchange the information peers need to set up communication paths.
Resources
The source code for this article can be found in my GitHub repo.