September 2024 ~ October 2025
🌐 ACM UIST 2025 Adjunct
📺️ IEEE ISMAR 2025 Research Demonstration
Abstract
In remote collaboration systems, remote users often experience information asymmetry and limited interactivity when collaborating with on-site users through virtually reconstructed scenes of the physical environment. Real-time 360° camera streaming mitigates the narrow field-of-view limitations of conventional video conferencing by providing a wide-angle, rapidly rendered view; however, the lack of depth information still restricts active, free spatial exploration. Conversely, offline CAD-based scene reconstruction allows free navigation but requires substantial time and cost to produce. To address these issues, this study adopts 3D Gaussian Splatting (3DGS), a learning-based neural rendering technique capable of rapidly and accurately reconstructing large-scale physical environments with high responsiveness. CrossGaussian integrates real-time 360° video streaming and 3DGS-based large-scale scene reconstruction through an automated pipeline, thereby presenting the first room-scale remote collaboration design space that enables free-viewpoint exploration and novel visual interactions in remote collaborative environments.
Introduction
CrossGaussian is a first-author project I led in an HCI research lab, from topic ideation through publication, between my junior winter and senior fall. I defined the research direction, conducted literature reviews, designed the system, ran user studies, and presented the work at international conferences. In the early phase, I analyzed 20+ top-tier papers from CHI, UIST, and CVPR on remote collaboration systems, 3D reconstruction, and AI-based novel view synthesis. This review revealed the limitations of photogrammetry, NeRF, and Instant-NGP in remote collaboration, particularly high computational cost, slow processing, and limited interactivity. Based on this, I identified 3D Gaussian Splatting (3DGS) as a promising approach due to its explicit representation and real-time rendering performance. I independently designed an end-to-end prototype pipeline and developed it with co-authors. After building the prototype, I conducted a user study with 24 participants, collecting data via NASA-TLX, SUS, and custom questionnaires. Insights from early feedback led me to refine the research focus toward defining and exploring a design space for 3DGS-driven visualization and interaction techniques in remote collaboration. The work culminated in acceptance to the ACM UIST 2025 Poster Session and the IEEE ISMAR 2025 Demo Session, where I demonstrated the system live for three days. CrossGaussian demonstrates my ability to bridge cutting-edge AI rendering technology with human-centered design, delivering a functioning system that advances remote collaboration through HCI-driven insights.
In co-located collaboration, participants can freely move, explore, and interact within the shared physical space. In remote collaboration, however, this autonomy is significantly constrained: remote users depend on the on-site collaborator to look behind objects or change viewpoints within a video feed, which increases communication burden, causes unnecessary coordination, and ultimately limits interaction. Some prior work mounts cameras on robotic platforms to provide spatial context, but such systems often induce simulator sickness in remote users. Enabling remote collaborators to freely explore the physical environment therefore remains an open challenge.
RESEARCH
Real-time 360° video streaming partially mitigates viewpoint limitations by providing a wide field of view; however, the lack of depth information prevents users from actively understanding spatial structure or estimating object distance. While manually created 3D models offer another alternative, fully modeling every space is inefficient and costly. To address this, recent remote collaboration research has explored progressive reconstruction using camera-based photogrammetry. Yet, because this approach relies on image-based Structure-from-Motion (SfM) to produce surface-centric meshes, it suffers from limitations in resolution, accuracy, and responsiveness. More recently, Neural Radiance Fields (NeRF) have been adopted for remote collaboration, but their high computational cost still makes them unsuitable for real-time interactive environments.


In contrast, 3D Gaussian Splatting (3DGS), a recently emerging neural rendering approach for novel view synthesis, represents a scene as a collection of numerous Gaussian primitives (defined by position, color, covariance, and opacity) that can be rasterized quickly. Unlike NeRF, which encodes a scene implicitly through neural networks, 3DGS adopts an explicit, computation-efficient structure optimized for rapid processing. Owing to these characteristics, 3DGS offers significantly faster training, higher rendering performance, and better scalability to large-scale or dynamic environments than NeRF. Building on this, our study integrates real-time contextual 360° video streaming with the fast, precise, and responsive capabilities of 3DGS for remote collaboration. Furthermore, we explore a room-scale design space for remote environment exploration and interaction techniques based on this integration.
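To make the contrast with NeRF's implicit encoding concrete, the sketch below shows one plausible GPU-side layout for a single Gaussian primitive. The struct, field names, and buffer name are illustrative assumptions, not the exact format used by CrossGaussian or the reference 3DGS implementation.

```hlsl
// Minimal sketch of a 3DGS primitive as it might be laid out in a GPU buffer.
// Field names and packing are illustrative, not CrossGaussian's actual format.
struct GaussianSplat
{
    float3 position;   // world-space center of the Gaussian
    float4 rotation;   // unit quaternion; with scale it defines the covariance
    float3 scale;      // per-axis extent (covariance = R * S * S^T * R^T)
    float  opacity;    // alpha used during front-to-back compositing
    float3 color;      // base color; view dependence is usually stored as SH coefficients
};

StructuredBuffer<GaussianSplat> _Splats;   // assumed buffer name bound from the Unity side
```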
SYSTEM ARCHITECTURE
DESIGN IMPLEMENTATION

Leveraging the explicit scene representation structure of 3DGS and its precise depth rendering capabilities at room scale, we explored a design space to enhance explorability and interactivity in remote collaboration environments. Inspired by existing Cross-Reality scene blending research, we designed the following key features for remote collaboration.
Blending of Overlapping Scenes
Abrupt transitions between real-time streaming and 3DGS scenes can cause motion sickness and reduced presence. To mitigate this, our system overlays the 3DGS scene on the 360-degree video stream while keeping the two layers visually distinguishable. By adjusting the transparency of each overlaid scene and applying color scaling, users can maintain real-time environmental context (the 360-degree stream) while simultaneously exploring with free viewpoints (the 3DGS scene). This reduces cognitive load during context switching and preserves presence. The overlay structure also enables visual distinction of non-salient regions through color scaling of the 3DGS scene or pixel-value adjustments of the 360-degree footage. Similar to Gruenefeld et al.'s adjustable scene blending, users can customize their collaboration experience by controlling the blending ratio between the 3DGS scene and the 360-degree video to balance realism and exploration freedom.
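As a rough illustration of this adjustable blending, the compute-shader sketch below composites a rasterized 3DGS frame with the reprojected 360-degree frame under a user-controlled ratio and a simple color scale. The kernel, texture, and parameter names are assumptions for illustration rather than the shaders shipped in the system.

```hlsl
// Illustrative scene-blending kernel (a sketch, not the published shader):
// blends the rasterized 3DGS frame with the 360-degree video frame using a
// user-controlled ratio, plus a color scale to visually separate the layers.

#pragma kernel BlendScenes

Texture2D<float4>   _GaussianFrame;   // rendered 3DGS view (assumed input name)
Texture2D<float4>   _StreamFrame;     // reprojected 360-degree video view (assumed)
RWTexture2D<float4> _Output;

float  _BlendRatio;    // 0 = pure 360 stream, 1 = pure 3DGS scene
float3 _StreamTint;    // color scaling applied to the video layer

[numthreads(8, 8, 1)]
void BlendScenes(uint3 id : SV_DispatchThreadID)
{
    float4 gs     = _GaussianFrame[id.xy];
    float4 stream = _StreamFrame[id.xy];
    stream.rgb *= _StreamTint;                       // de-emphasize one layer
    _Output[id.xy] = lerp(stream, gs, _BlendRatio);  // adjustable scene blending
}
```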
Occlusion-Aware Exploration
Automatic Occlusion Detection and Visualization: Because the camera in the remote space captures the scene from a single viewpoint, it cannot visualize areas occluded by structures such as walls or pillars. For example, when an on-site worker needs to inspect equipment located behind a column, the 360° camera alone cannot reveal the hidden region. To address this, our system uses the 3D spatial information of the 3D Gaussian Splatting (3DGS) model to automatically detect and visualize occluded areas based on the camera's position within the physical environment. Specifically, the system first computes which regions are blocked by surrounding structures relative to the current position and orientation of the 360° camera in the 3DGS model. It then compares the depth values of adjacent pixels to estimate each pixel's precise depth and pseudo-normal direction, determining which parts correspond to shadowed or occluded regions. Using Unity compute shaders and HLSL, the system performs real-time GPU-based shadow computation to quickly identify these occluded areas and visually highlight them for the user. Through this approach, remote users can intuitively perceive the camera's viewpoint coverage of the on-site environment and use it for more effective collaboration.
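The sketch below illustrates one way such a camera-relative occlusion test can be written as a Unity compute shader, in the spirit of shadow mapping over the 3DGS depth: the 3DGS scene is rendered to a depth map from the 360° camera's pose, and each pixel of the remote user's free-viewpoint render is reprojected into that map to check whether a closer surface blocks the camera's line of sight. All buffer names, the depth convention, and the bias value are illustrative assumptions, not the published implementation.

```hlsl
// Sketch of a shadow-map style occlusion test over the 3DGS depth.
// Buffer names, depth convention, and thresholds are illustrative assumptions.

#pragma kernel MarkCameraOcclusion

Texture2D<float>    _CameraDepthMap;   // 3DGS depth rendered from the 360 camera's pose
Texture2D<float4>   _UserWorldPos;     // per-pixel world position of the user's free viewpoint
RWTexture2D<float4> _OcclusionMask;    // highlight written for camera-occluded pixels

float4x4 _CameraViewProj;   // view-projection of the 360 camera inside the 3DGS scene
float    _DepthBias;        // small offset against self-occlusion artifacts
float2   _DepthMapSize;

[numthreads(8, 8, 1)]
void MarkCameraOcclusion(uint3 id : SV_DispatchThreadID)
{
    float3 worldPos = _UserWorldPos[id.xy].xyz;

    // Reproject the point seen by the remote user into the 360 camera's clip space.
    float4 clip = mul(_CameraViewProj, float4(worldPos, 1.0));
    float3 ndc  = clip.xyz / clip.w;
    uint2  px   = (uint2)((ndc.xy * 0.5 + 0.5) * _DepthMapSize);

    // If the depth map recorded a closer surface along that ray, the point is hidden
    // from the 360 camera (assuming a 0..1, near-to-far depth range).
    float storedDepth = _CameraDepthMap[px];
    bool  occluded    = ndc.z > storedDepth + _DepthBias;

    _OcclusionMask[id.xy] = occluded ? float4(1, 0.4, 0, 0.5) : float4(0, 0, 0, 0);
}
```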
See-Through Capability: Our system provides see-through capabilities for remote 3D environments by leveraging depth information inherent in the 3DGS model. While photogrammetry relies on mesh-based representations with fixed surfaces, making transparency control difficult, Gaussian Splatting uses 3D Gaussians with alpha values, enabling natural semi-transparent rendering through alpha blending at the rendering stage. This allows users to see through objects and directly inspect spaces beyond them without complex viewpoint manipulation, enabling intuitive exploration and novel interactions without information loss caused by occlusion.
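Under the same caveats, a minimal sketch of the see-through control might simply attenuate the opacity of Gaussians lying between the viewer and a chosen focus distance, so that the space behind them remains visible after alpha blending; the buffer and parameter names below are hypothetical.

```hlsl
// Illustrative see-through kernel: fades splats in front of a focus distance so
// the occluded space behind them stays visible after alpha blending.

#pragma kernel FadeFrontSplats

RWStructuredBuffer<float> _SplatOpacity;    // assumed per-splat alpha buffer
StructuredBuffer<float3>  _SplatPosition;   // assumed per-splat world-space centers

float3 _ViewerPosWS;
float  _FocusDistance;    // splats closer than this to the viewer become semi-transparent
float  _SeeThroughAlpha;  // residual opacity for faded splats, e.g. 0.15

[numthreads(64, 1, 1)]
void FadeFrontSplats(uint3 id : SV_DispatchThreadID)
{
    float dist = distance(_SplatPosition[id.x], _ViewerPosWS);
    if (dist < _FocusDistance)
        _SplatOpacity[id.x] = min(_SplatOpacity[id.x], _SeeThroughAlpha);
}
```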
This research was accepted as a first-author poster at ACM UIST (ACM Symposium on User Interface Software and Technology) 2025, one of the most prestigious conferences in user interface and interaction technology. It was also accepted for the demo session at IEEE ISMAR (International Symposium on Mixed and Augmented Reality) 2025, the world's leading conference in augmented and mixed reality, where we conducted live demonstrations for three days. At both conferences, we received significant interest and positive feedback from renowned researchers in the HCI field worldwide and experts from global companies regarding the system's real-time performance, practical utility in remote collaboration, and the innovative nature of our 3DGS-based approach.
MATERIALS
This project has been published as an Adjunct Proceedings paper in the ACM Digital Library. To access the full paper, click the image on the right to be redirected to the publication page; the paper is freely available under Open Access.