Abstract
For augmented reality (AR) and virtual reality (VR) applications, accurate estimates of a scene's acoustic characteristics are critical for creating a sense of immersion. However, directly estimating room impulse responses (RIRs) from scene geometry is often a challenging, data-expensive task. We instead propose a method to infer spatially distributed acoustic parameters (such as C50 and T60) for an entire scene from lightweight information readily available in an AR/VR context. We formulate this as an image-to-image translation task that transforms a 2D floormap, conditioned on a calibration RIR measurement, into 2D heatmaps of acoustic parameters. Moreover, we show that the method also works for directionally dependent (i.e., beamformed) parameter prediction. We introduce and release a 1000-room, complex-scene dataset to study the task, and demonstrate improvements over strong statistical baselines.