Simulating depth perception with face tracking
Motion parallax is the apparent relative motion of objects at different distances when an observer moves: nearby objects seem to shift more than distant ones. It's one of the monocular cues that enable depth perception. This demo simulates this phenomenon with TensorFlow.js and three.js.
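Here is a quick back-of-the-envelope check of why this cue works. If the observer steps sideways by a distance d, a point straight ahead at depth z appears to rotate by atan(d / z), so nearer points sweep through larger angles. The numbers used here (a 10 cm head movement, objects at 1 m and 10 m) are only an illustration.

```typescript
// Angular displacement (radians) of a point straight ahead at depth `depth`
// when the observer moves sideways by `shift` (same unit for both).
function parallaxAngle(shift: number, depth: number): number {
  return Math.atan2(shift, depth);
}

const toDegrees = (rad: number): number => (rad * 180) / Math.PI;

// A 10 cm head movement: the near object appears to move roughly ten times
// more than the far one (the small-angle regime is close to linear in 1/depth).
console.log(toDegrees(parallaxAngle(0.1, 1)).toFixed(2)); // near object: "5.71"
console.log(toDegrees(parallaxAngle(0.1, 10)).toFixed(2)); // far object: "0.57"
```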
Just launch the demo with Chrome, Firefox or Edge on your webcam-equipped laptop or desktop and wait a little. You can use the drop-down list or add ?object=1 or ?object=2 at the end of the URL to try other 3D models.
Detecting the orientation of the camera/face axis
I take advantage of TensorFlow.js' Face Landmarks Detection model, which provides the location of 468 face landmarks from the webcam's video stream. I only keep one of them, located between the eyes, and compute the azimuthal and polar angles of the axis going from the webcam to that point. With these angles, I can then move a three.js camera and render the 3D scene from the proper point of view.
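To make the geometry concrete, here is a minimal sketch that turns the pixel position of the landmark between the eyes into azimuthal and polar angles, then into a camera position on a sphere around the scene. It is not the demo's code: it assumes a mirrored (selfie-style) video frame, a known horizontal field of view for the webcam and a fixed orbit radius, and all names are hypothetical. The spherical convention matches three.js's `Spherical` (polar angle measured from +y, azimuth measured from +z).

```typescript
// Hypothetical input: pixel coordinates of the landmark between the eyes.
// Assumption: the frame is mirrored, so image x grows toward the observer's right.
interface Landmark2D {
  x: number; // pixels from the left edge of the video frame
  y: number; // pixels from the top edge of the video frame
}

interface CameraPose {
  azimuth: number; // radians around the vertical axis, 0 = straight in front of the screen
  polar: number; // radians from the +y axis, PI / 2 = level with the webcam
  position: [number, number, number];
}

function faceToCameraPose(
  landmark: Landmark2D,
  videoWidth: number,
  videoHeight: number,
  webcamHfovDeg: number, // assumed horizontal field of view of the webcam
  radius: number, // assumed fixed distance between the camera and the scene origin
): CameraPose {
  // Pinhole model: focal length in pixels from the horizontal field of view.
  const focalPx = videoWidth / 2 / Math.tan((webcamHfovDeg * Math.PI) / 360);
  // Offsets from the image centre; image y grows downward, so flip it to point up.
  const dx = landmark.x - videoWidth / 2;
  const dy = videoHeight / 2 - landmark.y;
  // The webcam-to-face direction is (dx, dy, focalPx); express it in spherical angles.
  const azimuth = Math.atan2(dx, focalPx);
  const polar = Math.atan2(Math.hypot(dx, focalPx), dy);
  // Same formula as three.js's Vector3.setFromSphericalCoords(radius, polar, azimuth).
  const s = Math.sin(polar) * radius;
  const position: [number, number, number] = [
    s * Math.sin(azimuth),
    Math.cos(polar) * radius,
    s * Math.cos(azimuth),
  ];
  return { azimuth, polar, position };
}

// Face slightly right of and above the centre of a 640x480 frame.
const pose = faceToCameraPose({ x: 400, y: 200 }, 640, 480, 60, 5);
console.log(pose.position.map((v) => v.toFixed(2)).join(", "));
```

In three.js, the same position could be applied with `camera.position.setFromSphericalCoords(radius, pose.polar, pose.azimuth)` followed by `camera.lookAt(0, 0, 0)`.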
The resulting image is very stable, which suggests that the model's predictions are consistent from one frame to the next. I also tried a simpler and faster model (you can too, by adding ?blaze=true to the URL), but it makes the demo a bit too jittery.
Detecting the distance between the face and the camera
The Face Landmarks Detection model can optionally predict the location and shape of the irises as well. Since the diameter of the human iris is remarkably constant from one person to another, we can then estimate the distance between the camera and the eyes, provided we know the webcam's focal length. This distance should be taken into account to determine the field of view of the three.js camera. However, this estimate turned out to be too noisy, so I didn't use it (you can still see it for yourself with ?distanceMethod=1). I also tried using the distance from the landmark between the eyes to a landmark on the forehead, which should be largely invariant to facial expressions and to left/right rotations of the head. This was a bit more stable but still unsatisfying. In the end, I assumed that the distance between the observer's face and the camera is constant.
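Here is a minimal sketch of the pinhole-camera arithmetic behind the iris method, and of how the resulting distance could feed the three.js camera's field of view. The iris diameter (11.7 mm, a commonly used average), the webcam's horizontal field of view (used to derive the focal length in pixels) and the screen height are assumptions, and the function names are hypothetical, not the demo's.

```typescript
// Assumption: average human iris diameter, used as a fixed reference size.
const IRIS_DIAMETER_MM = 11.7;

// Focal length in pixels, derived from an assumed horizontal field of view.
function focalLengthPx(videoWidth: number, webcamHfovDeg: number): number {
  return videoWidth / 2 / Math.tan((webcamHfovDeg * Math.PI) / 360);
}

// Pinhole model: an object of real size S at distance Z projects to
// s = f * S / Z pixels, hence Z = f * S / s.
function eyeDistanceMm(irisDiameterPx: number, focalPx: number): number {
  return (focalPx * IRIS_DIAMETER_MM) / irisDiameterPx;
}

// Vertical field of view (degrees, the unit of three.js's PerspectiveCamera.fov)
// under which the observer sees a screen of the given physical height.
function screenFovDeg(screenHeightMm: number, distanceMm: number): number {
  return (2 * Math.atan(screenHeightMm / 2 / distanceMm) * 180) / Math.PI;
}

const focal = focalLengthPx(640, 60); // ≈ 554 px for a 640 px wide frame
const distance = eyeDistanceMm(12, focal); // iris measured 12 px wide in the frame
console.log(distance.toFixed(0), screenFovDeg(190, distance).toFixed(1));
```

Because the iris spans only a dozen pixels or so, an error of a single pixel changes the estimate by several percent, which is consistent with the noise described above.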
All this can only work for one observer, since the screen can only display one picture at a time.
Moreover, in theory I shouldn't just position the three.js camera where the observer's face was detected. I should also warp the rendered image to account for the fact that the three.js camera's image plane and the screen's plane aren't parallel. I didn't do it, but I don't expect it to make a big difference given the relatively narrow field of view of most webcams.
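A common alternative for head-coupled perspective, instead of warping the rendered image afterwards, is an off-axis (asymmetric) projection: the camera stays unrotated, so its image plane remains parallel to the screen, and the frustum is skewed so that its edges pass through the screen's edges. This is a swapped-in technique, not what the demo does. The minimal sketch below computes the frustum bounds under my own conventions (screen in the plane z = 0 with its centre at the origin, observer at positive z, all lengths in one unit); with the demo's constant-distance assumption, ez would simply be that constant.

```typescript
interface Frustum {
  left: number;
  right: number;
  bottom: number;
  top: number;
  near: number;
  far: number;
}

// Off-axis frustum for an eye at (ex, ey, ez) in front of a screen of the given size.
// The camera sits at the eye position, looking down -z without any rotation.
function offAxisFrustum(
  eye: [number, number, number],
  screenWidth: number,
  screenHeight: number,
  near: number,
  far: number,
): Frustum {
  const [ex, ey, ez] = eye;
  if (ez <= 0) throw new Error("the eye must be in front of the screen (ez > 0)");
  // Similar triangles: project the screen edges onto the near plane.
  const scale = near / ez;
  return {
    left: (-screenWidth / 2 - ex) * scale,
    right: (screenWidth / 2 - ex) * scale,
    bottom: (-screenHeight / 2 - ey) * scale,
    top: (screenHeight / 2 - ey) * scale,
    near,
    far,
  };
}

// Observer 10 cm right of the centre of a 30 x 20 cm screen, 50 cm away (units: mm).
console.log(offAxisFrustum([100, 0, 500], 300, 200, 1, 5000));
```

In three.js, such bounds could be passed to `Matrix4.makePerspective(left, right, top, bottom, near, far)` on the camera's projection matrix; note that `top` comes before `bottom` in that signature.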
Besides, it would be more efficient to compute the predictions in a web worker, off the main thread. Unfortunately, as far as I understand, TensorFlow.js can only use WebGL in a worker when OffscreenCanvas is available, which is currently not the case in some browsers, e.g. Firefox.
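If the worker route becomes possible, a cheap feature check could decide where inference runs. This minimal sketch only encodes the condition described above (both `Worker` and `OffscreenCanvas` must exist); it doesn't verify that a WebGL context can actually be created, and the function name is hypothetical.

```typescript
// Returns true when the given global scope exposes both APIs that, per the text,
// running TensorFlow.js with WebGL inside a worker requires.
function canRunWebGLInWorker(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  return typeof scope["Worker"] === "function" && typeof scope["OffscreenCanvas"] === "function";
}

console.log(canRunWebGLInWorker() ? "run inference in a worker" : "run inference on the main thread");
```

Passing the scope as a parameter keeps the check testable outside a browser.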
Thanks to Google, the TensorFlow.js community and the three.js community for making such cool libraries available. I used Discover three.js, a great interactive book by Lewy Blue, to get started with three.js. Thanks also to my friend Fabien for his feedback.
Credits for the 3D models: