An Introduction to the getUserMedia API


In the mid-90s, chat was one of the best products available on the web. Raise your hand if you were young and thought how cool it would be to develop your own chat application. One of the best features of those applications was the ability to capture microphone audio and/or video from a webcam and send it over the Internet. To implement such features, developers relied for a long time on plugins like Flash and Silverlight. However, Flash and Silverlight can be a problem if you don’t have the proper permissions to install them or you aren’t tech-savvy. Today, such plugins aren’t required anymore, thanks to the WebRTC project and its related APIs. This article introduces the getUserMedia API, one of the APIs derived from the WebRTC project.

What’s the getUserMedia API?

The getUserMedia API provides access to multimedia streams (video, audio, or both) from local devices. There are several use cases for this API. The first one is obviously real-time communication, but we can also employ it to record tutorials or lessons for online courses. Another interesting use case is the surveillance of your home or workplace.

On its own, this API is only capable of acquiring audio and video, not sending the data or storing it in a file. To have a complete working chat, for example, we need to send data across the Internet. This can be done using the RTCPeerConnection API. To store the data, we can use the MediaStreamRecorder API.

The getUserMedia API is amazing for both developers and users. Developers can now access audio and video sources with a single function call, while users don’t need to install additional software. From the user’s perspective, this also means less time needed to start using the feature, and an increased use of the software by non-tech-savvy people.

Although the getUserMedia API has been around for a while now, as of December 30th, 2013 it’s still a W3C Working Draft, so the specification may still undergo several changes. The API exposes only one method, getUserMedia(), which belongs to the window.navigator object. The method accepts as its parameters a constraints object, a success callback, and a failure callback. The constraints parameter is an object having one or both of the properties audio and video. The value of these properties is a Boolean, where true means request the stream (audio or video) and false means don’t request it. So, to request both audio and video, pass the following object.
{
  video: true,
  audio: true
}
Alternatively, the value can be a Constraints object. This type of object allows us to have more control over the requested stream. In fact, we can choose to retrieve a video source at a high resolution, for example 1280×720, or a low one, for example 320×180. Each Constraints object contains two properties, mandatory and optional. mandatory is an object that specifies the set of constraints that the UA (user agent) must satisfy or else call the errorCallback. optional is an array of objects that specifies the set of constraints that the UA should try to satisfy but may ignore if they can’t be satisfied.

Let’s say that we want audio and video of the user, where the video must be at least at high resolution and have a frame rate of 30. In addition, if available, we want the video at a frame rate of 60. To perform this task, we have to pass the following object.
{
  video: {
    mandatory: {
      minWidth: 1280,
      minHeight: 720,
      minFrameRate: 30
    },
    optional: [
      { minFrameRate: 60 }
    ]
  },
  audio: true
}
You can find more information on the properties available in the specifications. The other two arguments to getUserMedia() are simply two callbacks invoked on success or failure, respectively. On success, the retrieved stream(s) are passed to the callback. The error callback is passed a MediaError object containing information on the error that occurred.
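To make the call shape concrete, here’s a minimal sketch of the callback-style invocation described above. It assumes a browser exposing an unprefixed navigator.getUserMedia(); in practice you may need the vendor-prefixed variants shown in the demo later in this article.

// A minimal sketch of the callback-style call described above.
// Assumes an unprefixed navigator.getUserMedia(); real browsers may
// expose webkitGetUserMedia or mozGetUserMedia instead (see the demo).
var constraints = {
  video: {
    mandatory: {
      minWidth: 1280,
      minHeight: 720,
      minFrameRate: 30
    },
    optional: [
      { minFrameRate: 60 }
    ]
  },
  audio: true
};

navigator.getUserMedia(
  constraints,
  function(stream) {
    // Success: the acquired MediaStream can be attached to a <video> element.
    console.log('Stream acquired: ' + stream.id);
  },
  function(error) {
    // Failure: inspect the error object to see what went wrong.
    console.log('getUserMedia error:', error);
  }
);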

Browser Compatibility

The support for the getUserMedia API is decent on desktop but quite poor on mobile. Besides, the majority of the browsers that support it still expose the vendor-prefixed version. Currently, the desktop browsers that implement the API are Chrome 21+ (-webkit prefix), Firefox 17+ (-moz prefix), and Opera 12+ (unsupported from version 15 to 17), with some issues in older versions. On mobile browsers, only Chrome 21+ (-webkit prefix) and Opera 12+ (-webkit prefix from version 16) support the API. Also note that if a page containing the instructions to work with this API is opened through the file:// protocol in Chrome, it won’t work.

The case of Opera is really interesting and deserves a note. This browser implemented the API but, for a reason unknown to me, after the switch to the Blink rendering engine in version 15 it didn’t support the API anymore. Support was finally restored in version 18, which is also the first version to support the audio stream.

That said, we can ignore the compatibility issues thanks to a shim called getUserMedia.js. The latter tests the browser and, if the API isn’t implemented, falls back to Flash.

Demo

In this section I’ll show you a basic demo so that you can see how the getUserMedia API works and how its parameters are used in practice. The goal of this demo is to create a “mirror”, in the sense that everything captured from the webcam and the microphone will be streamed back through the screen and the audio speakers. We’ll ask the user for permission to access both multimedia streams, and then output them using the HTML5 video element.

The markup is pretty simple. In addition to the video element, we have two buttons: one to start execution and one to stop it.

Regarding the scripting part, we first test for browser support. If the API isn’t supported, we display the message “API not supported” and disable the two buttons. If the browser supports the getUserMedia API, we attach a listener to the click event of the buttons. If the “Play demo” button is clicked, we test whether we’re dealing with an old version of Opera because of the issues described in the previous section. Then, we request the audio and video data from the user’s device. If the request is successful, we stream the data using the video element; otherwise, we show the error that occurred in the console. The “Stop demo” button causes the video to be paused and the streams to be stopped.

A live demo of the code below is available here.
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <title>getUserMedia Demo</title>
    <style>
      body
      {
        max-width: 500px;
        margin: 2em auto;
        font-size: 20px;
      }

      h1
      {
        text-align: center;
      }
         
      .buttons-wrapper
      {
        text-align: center;
      }

      .hidden
      {
        display: none;
      }

      #video
      {
        display: block;
        width: 100%;
      }

      .button-demo
      {
        padding: 0.5em;
        display: inline-block;
        margin: 1em auto;
      }

      .author
      {
        display: block;
        margin-top: 1em;
      }
    </style>
  </head>
  <body>
    <h1>getUserMedia API</h1>
    <video id="video" autoplay="autoplay" controls="true"></video>
    <div class="buttons-wrapper">
      <button id="button-play-gum" class="button-demo" href="#">Play demo</button>
      <button id="button-stop-gum" class="button-demo" href="#">Stop demo</button>
    </div>
    <span id="gum-unsupported" class="hidden">API not supported</span>
    <span id="gum-partially-supported" class="hidden">API partially supported (video only)</span>
    <script>
      var videoStream = null;
      var video = document.getElementById("video");

      // Test browser support
      window.navigator = window.navigator || {};
      navigator.getUserMedia = navigator.getUserMedia       ||
                               navigator.webkitGetUserMedia ||
                               navigator.mozGetUserMedia    ||
                               null;

      if (navigator.getUserMedia === null) {
        document.getElementById('gum-unsupported').classList.remove('hidden');
        document.getElementById('button-play-gum').setAttribute('disabled', 'disabled');
        document.getElementById('button-stop-gum').setAttribute('disabled', 'disabled');
      } else {
        // Opera <= 12.16 accepts the direct stream.
        // More on this here: http://dev.opera.com/articles/view/playing-with-html5-video-and-getusermedia-support/
        var createSrc = window.URL ? window.URL.createObjectURL : function(stream) {return stream;};

        // Opera <= 12.16 support video only.
        var audioContext = window.AudioContext       ||
                           window.webkitAudioContext ||
                           null;
        if (audioContext === null) {
          document.getElementById('gum-partially-supported').classList.remove('hidden');
        }

        document.getElementById('button-play-gum').addEventListener('click', function() {
          // Capture user's audio and video source
          navigator.getUserMedia({
            video: true,
            audio: true
          },
          function(stream) {
            videoStream = stream;
            // Stream the data
            video.src = createSrc(stream);
            video.play();
          },
          function(error) {
            console.log("Video capture error: ", error.code);
          });
        });
        document.getElementById('button-stop-gum').addEventListener('click', function() {
          // Pause the video
          video.pause();
          // Stop the stream
          videoStream.stop();
        });
      }
    </script>
  </body>
</html>

Conclusion

This article has introduced you to the WebRTC project, one of the most exciting web projects of recent years. In particular, this article discussed the getUserMedia API. The possibility of creating a real-time communication system using only the browser and very few lines of code is terrific and opens up a lot of new opportunities. As we’ve seen, the getUserMedia API is simple yet very flexible. It exposes just one method, but its first parameter, constraints, allows us to request the audio and video streams that best fit our application’s needs. The compatibility among browsers isn’t very wide, but it’s increasing, and this is good news! To better understand the concepts in this article, don’t forget to play with the provided demo. As a final note, I strongly encourage you to try to change the code to perform some task, for example applying a CSS filter to change how the video stream is shown.
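As a starting point for that experiment, a minimal sketch could apply the filter from JavaScript right after the stream starts playing, for example inside the success callback of the demo (grayscale is just an arbitrary choice):

// Hypothetical tweak to the demo: render the video stream in grayscale.
// Older Blink/WebKit browsers need the prefixed property.
video.style.webkitFilter = 'grayscale(100%)';
video.style.filter = 'grayscale(100%)';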

Frequently Asked Questions about the getUserMedia API

What is the getUserMedia API and how does it work?

The getUserMedia API is a part of the WebRTC (Web Real-Time Communication) technology that allows access to the user’s media devices such as a webcam or microphone. It works by prompting the user for permission to access these devices. Once permission is granted, the API returns a MediaStream object that contains the live media data. This data can then be used in various ways, such as displaying a live video feed on a webpage or recording audio.

How can I use the getUserMedia API in my web application?

To use the getUserMedia API in your web application, you first need to call the navigator.mediaDevices.getUserMedia() method. This method takes a constraints object as a parameter, which specifies the types of media to request. For example, to request both video and audio, you would pass {video: true, audio: true}. The method returns a Promise that resolves to a MediaStream object.
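For instance, a minimal sketch of that Promise-based call might look like the following, assuming a <video id="video"> element is present on the page:

// Promise-based form of the API described above.
navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(function(stream) {
    var video = document.getElementById('video');
    video.srcObject = stream; // attach the live stream to the video element
    video.play();
  })
  .catch(function(error) {
    console.log('getUserMedia error: ' + error.name);
  });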

What are the constraints in getUserMedia API?

Constraints in the getUserMedia API are the parameters that you can use to specify the type and quality of media you want to access. For example, you can specify whether you want to access video or audio, the resolution of the video, the frame rate, and so on. These constraints are passed as an object to the getUserMedia() method.
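As an example, a constraints object using the modern syntax could look like the sketch below; width, height, and frameRate are standard constrainable properties, and the specific values are only placeholders:

// Ask for audio plus a video track that is ideally 1280x720 with at least 30 fps.
var constraints = {
  audio: true,
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    frameRate: { min: 30 }
  }
};

navigator.mediaDevices.getUserMedia(constraints)
  .then(function(stream) {
    // Use the stream...
  })
  .catch(function(error) {
    console.log(error.name);
  });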

What happens if the user denies permission to access their media devices?

If the user denies permission to access their media devices, the Promise returned by the getUserMedia() method will be rejected with a PermissionDeniedError. Your code should handle this error appropriately, for example by informing the user that they need to grant permission for the application to work correctly.

Can I use the getUserMedia API on all browsers?

The getUserMedia API is supported by most modern browsers, including Chrome, Firefox, Safari, and Edge. However, it is not supported by Internet Explorer. You can check the current level of support on the Can I Use website.

How can I handle errors when using the getUserMedia API?

Errors when using the getUserMedia API can be handled using the catch() method of the Promise returned by the getUserMedia() method. This method takes a function as a parameter, which will be called with an error object if an error occurs.
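A small sketch of such a handler, which also distinguishes a permission denial from other failures (NotAllowedError is the name used by current browsers, PermissionDeniedError by some older ones):

navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(function(stream) {
    // Use the stream...
  })
  .catch(function(error) {
    if (error.name === 'NotAllowedError' || error.name === 'PermissionDeniedError') {
      console.log('Access to the camera and/or microphone was denied.');
    } else {
      console.log('getUserMedia failed: ' + error.name);
    }
  });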

Can I use the getUserMedia API to record video or audio?

Yes, you can use the getUserMedia API to record video or audio. Once you have a MediaStream object, you can use the MediaRecorder API to record the media data.
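A minimal recording sketch with the MediaRecorder API might look like this; the five-second duration and the way the recorded chunks are handled are purely illustrative:

// Record roughly five seconds of the captured stream with MediaRecorder.
navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(function(stream) {
    var recorder = new MediaRecorder(stream);
    var chunks = [];

    recorder.ondataavailable = function(event) {
      chunks.push(event.data);
    };
    recorder.onstop = function() {
      // Combine the recorded chunks into a single Blob (e.g. to download or upload).
      var recording = new Blob(chunks, { type: recorder.mimeType });
      console.log('Recorded ' + recording.size + ' bytes');
    };

    recorder.start();
    setTimeout(function() { recorder.stop(); }, 5000);
  })
  .catch(function(error) {
    console.log('getUserMedia error: ' + error.name);
  });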

What is the MediaStream object in the getUserMedia API?

The MediaStream object represents a stream of media content. It contains one or more MediaStreamTrack objects, each of which represents a stream of audio or video data.
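For instance, given a stream obtained from a successful getUserMedia() call, you can inspect its tracks like this (a small sketch; stream is assumed to come from the examples above):

// List every track contained in the MediaStream.
stream.getTracks().forEach(function(track) {
  console.log(track.kind + ' track: ' + track.label);
});

// Or access them by kind.
var audioTracks = stream.getAudioTracks();
var videoTracks = stream.getVideoTracks();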

Can I change the constraints of a MediaStream after it has been created?

No, the constraints of a MediaStream cannot be changed after it has been created. However, you can stop the stream, change the constraints, and then create a new stream with the updated constraints.
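In practice, that workflow might look like the following sketch, where newConstraints stands for whatever updated constraints object your application needs:

// Stop every track of the current stream, then request a new one.
function restartStream(currentStream, newConstraints) {
  currentStream.getTracks().forEach(function(track) {
    track.stop();
  });
  return navigator.mediaDevices.getUserMedia(newConstraints);
}

// Usage example: switch to a lower video resolution.
// restartStream(stream, { video: { width: { ideal: 320 } }, audio: true })
//   .then(function(newStream) { video.srcObject = newStream; });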

Can I use the getUserMedia API in a mobile web application?

Yes, the getUserMedia API can be used in a mobile web application. However, the exact level of support may vary depending on the mobile browser and the specific media devices on the device.

Aurelio De Rosa

I'm a (full-stack) web and app developer with more than 5 years' experience programming for the web using HTML, CSS, Sass, JavaScript, and PHP. I'm an expert in JavaScript and HTML5 APIs, but my interests include web security, accessibility, performance, and SEO. I'm also a regular writer for several networks, a speaker, and the author of the books jQuery in Action, third edition and Instant jQuery Selectors.
