
James Gardner

University of York

iGGi PG Researcher

I am a second-year PhD student at the University of York, supervised by Dr William Smith. My research focuses on deep inverse rendering for photorealistic augmented reality, specifically on designing models that can infer properties of the world, such as illumination, reflectance, and shape, from a small number of visual observations. I am particularly interested in implicit neural representations, generative models, and self-supervised learning. I hold an MEng in Electronic Engineering from the University of York, for which I was awarded the IET Prize for outstanding performance and the Malden Owen Award for the best graduating student on an MEng programme. I am currently on a research internship with the Toshiba Computer Vision Research Group in Cambridge, working on SLAM applications using neural fields such as NeRF.

A description of James' research:

Humans are extraordinarily good at understanding their physical world. When entering a new environment, we instantly understand the objects in the scene, including their positions, materials, and uses. We can predict what the scene would look like from another, unseen perspective, and we can model the intents of other dynamic actors within that environment. Our brains draw on prior knowledge to reason and make these inferences. This level of scene understanding is one of the grand challenges of artificial intelligence, and achieving it would unlock exciting applications in autonomous robotic navigation and augmented reality (AR).

In recent years, deep neural networks have shown great success on many supervised tasks, for example object detection, classification, and image segmentation. However, supervised learning requires large amounts of labelled data, which can be prohibitively expensive or impossible to obtain, and labels do not capture the complete information present in a scene. Humans, by contrast, learn complex scene understanding without direct supervision for perception.

My research is in self-supervised scene representation learning: developing algorithms that consume images of an environment and convert them into compact representations for use in downstream tasks such as AR and robotics, without human-labelled data. Primarily, I focus on self-supervised inverse rendering, estimating a scene's shape, material properties, and lighting from a small number of images.
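The core idea of self-supervised inverse rendering can be illustrated with a toy example: guess the scene properties, re-render them with a forward shading model, and minimise the difference to the observed images, so the images themselves are the only supervision. The sketch below, a deliberate simplification and not James' actual method, recovers a single surface normal and albedo under a Lambertian model from three observed intensities; all values and names are illustrative.

```python
import numpy as np

# Toy self-supervised inverse rendering: recover an albedo and a unit
# surface normal from observed intensities alone, by minimising a
# photometric (re-rendering) loss. Real systems replace this with
# neural scene representations and differentiable renderers.

# Known unit light directions, one per observation.
lights = np.array([[0.0, 0.0, 1.0],
                   [0.6, 0.0, 0.8],
                   [0.0, 0.6, 0.8]])

# Ground-truth scene properties (hidden from the optimiser).
true_albedo = 0.7
true_normal = np.array([0.3, 0.1, 0.95])
true_normal /= np.linalg.norm(true_normal)

def render(albedo, normal):
    """Lambertian shading: intensity = albedo * max(0, n . l)."""
    return albedo * np.clip(lights @ normal, 0.0, None)

observed = render(true_albedo, true_normal)  # the only "supervision"

# Projected gradient descent on the squared photometric loss.
albedo = 0.5
normal = np.array([0.0, 0.0, 1.0])
lr = 0.1
for _ in range(2000):
    shading = np.clip(lights @ normal, 0.0, None)
    residual = albedo * shading - observed            # per-view error
    albedo -= lr * 2.0 * residual @ shading           # d loss / d albedo
    normal -= lr * 2.0 * albedo * (residual[:, None] * lights).sum(axis=0)
    normal /= np.linalg.norm(normal)                  # keep n a unit vector

print("recovered albedo:", albedo)
print("recovered normal:", normal)
```

With three non-coplanar lights the Lambertian image formation is invertible, so the optimiser converges to the true albedo and normal; the same re-rendering principle scales up when the hand-written shading model is replaced by a differentiable renderer over a neural scene representation.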

