Just before the holiday break I had an interesting chat with Elias Cohenca (someone I’ve mentored on and off over the last few years). He was about to participate in his 6th annual hackathon in the Tel Aviv office, and wanted to bounce a few ideas off me regarding technologies.
Our Tel Aviv office has held these hackathons – named Bazinga, after Sheldon Cooper’s catchphrase in The Big Bang Theory – since 2015. Over the years, Elias has used them to try out some really interesting concepts, from adding BIM 360 issues into a mixed-reality environment using HoloLens to building synthesizers from the noises made by 3D printers and stepper motors. Last year Elias collaborated on Amit Diamant’s great geometry loader implementation, which has since found its way into the Forge viewer.
The themes for Bazinga 2020 – the first “Home Edition”, given the current pandemic – related to improving our products, contributing to the Tel Aviv site and facilitating remote working/collaboration. Elias teamed up with Doreen Dvoretz and Amit Diamant to tackle two of these themes at once: creating a remote collaboration capability integrated directly into the Forge viewer.
Their concept was called Big Room, which would allow people to visualize construction projects before they’re built. It would enable collaboration between members of the design team and other stakeholders (clients, construction workers, etc.) in the context of a virtual model.
Elias was curious about what I’d done in this space, so I pointed him at Vrok-It, which uses socket.io to communicate between a presenter and session participants. I also mentioned the possibility of using Microsoft’s Fluid Framework, an emerging technology for synchronising shared state in real time that might also provide some interesting capabilities.
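To give a flavour of that approach, here’s a minimal sketch of a Vrok-It-style relay over socket.io, where a presenter broadcasts their Forge viewer state and the other participants apply it locally. The event names, session id and server URL are illustrative assumptions, not the actual Vrok-It protocol.

```typescript
import { io } from "socket.io-client";

// The Forge viewer script is assumed to be loaded globally, hence the declares.
declare const Autodesk: any;
declare const viewer: any; // an initialised GuiViewer3D instance

const socket = io("https://collab.example.com"); // placeholder server URL
const sessionId = "my-session";
socket.emit("join-session", sessionId);

// Presenter: broadcast the viewer state whenever the camera moves.
viewer.addEventListener(Autodesk.Viewing.CAMERA_CHANGE_EVENT, () => {
  socket.emit("view-changed", sessionId, viewer.getState({ viewport: true }));
});

// Participants: restore the presenter's state so everyone sees the same view.
socket.on("view-changed", (state: any) => viewer.restoreState(state));
```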
Anyway, fast-forward to this week, when I had the chance to take a look at what the Big Room team had achieved. It turns out that they had managed to meet all of their MVP goals:
- Enable participants to easily join a shared virtual space within a 3D model.
- Allow different users to see each other as avatars.
- Have each participant’s name picked up automatically from their login.
- Allow participants to point at objects with a laser beam.
- Create a type of shared-screen experience, where all participants can follow one of the avatars in a guided tour.
- Enable hiding / isolating objects as participants navigate the 3D space.
- Take measurements of objects and share them with other participants.
They had even managed to achieve their stretch goal of grabbing the user’s face via their webcam and mapping it onto their avatar.
Elias joined me for a remote collaboration session and it was super-cool. I really liked the laser pointer, which allows you not only to highlight items visually in the model but also to select them.
It was great to be able to get a virtual tour: one of the avatars can share their viewpoint as they move around the model.
A shared view of measurements proved to be very useful, too.
While it was only ever intended as a “nice to have”, the mapping of faces onto avatars was strangely compelling. Here’s an animation that gives a sense of how it worked:
You may notice that my face is animated at a lower framerate than Elias’s: this is actually an artefact of me watching the session from a second tab. I’m logged in twice, and the main participant is the one doing the watching – as it’s the tab that has focus in the browser – rather than the one I’m watching. The watcher’s face apparently animated at the same rate as Elias’s.
Elias mentioned the team had gone with socket.io for the communication, as we had with Vrok-It, and that they’d opted to capture low-res face bitmaps using face-api.js. This is a browser-based computer vision library that uses machine learning (via tensorflow.js) to extract faces from images. They had considered going with WebRTC, but really wanted to do frame-by-frame face extraction to be able to map it onto the avatar more realistically. (Hans Kellner showed me a really cool demo he put together back in 2015 that maps WebRTC video onto the side of an avatar cube in the Forge viewer, so this is definitely an alternative the team is interested in exploring.)
As face-api.js provides facial keypoints, it would be cool to use these to do a better mapping of faces onto the avatar, or even to consider a lower-bandwidth approach that animates a static face image via keypoints, much as NVIDIA has done with the Maxine SDK. One suggestion I made was to use the emotion information provided by face-api.js to change the colour of the avatar’s head based on what they appear to be feeling, but this could well prove to be annoying. It needs testing. :-)
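To give a sense of what that frame-by-frame capture might look like, here’s a hedged sketch using face-api.js’s tiny face detector to grab a small face crop from the webcam video. The model path, crop size, capture interval and callback are my own assumptions rather than the team’s actual code.

```typescript
import * as faceapi from "face-api.js";

// Periodically detect the face in a webcam <video> element and hand back a
// small bitmap that could be sent over socket.io and mapped onto an avatar.
async function startFaceCapture(
  video: HTMLVideoElement,
  onFace: (face: ImageData) => void
) {
  // Load the lightweight detector model (serving it from /models is an assumption).
  await faceapi.nets.tinyFaceDetector.loadFromUri("/models");

  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d")!;

  setInterval(async () => {
    const detection = await faceapi.detectSingleFace(
      video,
      new faceapi.TinyFaceDetectorOptions()
    );
    if (!detection) return; // no face in this frame

    // Crop the detected face region into a low-res 64x64 bitmap.
    const { x, y, width, height } = detection.box;
    canvas.width = canvas.height = 64;
    ctx.drawImage(video, x, y, width, height, 0, 0, 64, 64);
    onFace(ctx.getImageData(0, 0, 64, 64));
  }, 100); // roughly every 100 ms, i.e. ~10 fps
}
```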
Some work is still needed on performance and scalability, whether WebRTC is adopted or not: right now, for instance, the frame rate is limited to around 10 fps to maintain performance, and it does get choppy when there are lots of concurrent participants. That said, there are a number of mitigation tactics that would be interesting to explore: at the moment the client ignores messages from participants outside its field of view (and beyond a certain distance), but the messaging overhead still impacts performance. Moving this filtering server-side would stop those messages from even being sent, but would require much more state to be managed on the server. Socket.io does have the concept of rooms – right now the team uses one big room (haha) for all participants – so mapping socket.io rooms to physical rooms in the model might also help.
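Here’s a rough server-side sketch of that last idea: mapping physical rooms in the model to socket.io rooms, so that avatar updates are only relayed to participants in the same room. The event names and payload shape are assumptions I’ve made for illustration.

```typescript
import { Server, Socket } from "socket.io";

const io = new Server(3000);

io.on("connection", (socket: Socket) => {
  let currentRoom: string | null = null;

  // The client reports which physical room of the model its avatar is in
  // (for example, a room/space id extracted from the design).
  socket.on("enter-room", (roomId: string) => {
    if (currentRoom) socket.leave(currentRoom);
    socket.join(roomId);
    currentRoom = roomId;
  });

  // Avatar updates are only broadcast to sockets in the same physical room,
  // instead of to one big room containing every participant.
  socket.on("avatar-update", (state: { position: number[]; faceBitmap?: string }) => {
    if (currentRoom) {
      socket.to(currentRoom).emit("avatar-update", socket.id, state);
    }
  });
});
```

This keeps the proximity filtering coarse-grained but server-side, so clients never even receive updates from participants in other parts of the model.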
All in all, this is a really fun project with some real potential. The Big Room team won second prize (the first-placed project must have been incredible!), so congratulations to Doreen, Elias and Amit!
While they’re all now back at their day jobs, they’d love to hear from people who would be interested in seeing this developed further. If you’d like to talk more about this, please do get in touch – I’ll pass your feedback on to Doreen, Elias and Amit.