2nd Workshop on Urban Scene Modeling

Where Vision Meets Photogrammetry and Graphics - CVPR 2025

3D reconstruction and modeling are central to modern computer vision, with rapid urbanization creating urgent social and environmental challenges that demand innovative solutions. While substantial progress has been made in reconstructing basic 3D structures from images and point clouds, the next frontier lies in advancing structured and semantic reconstruction—moving beyond low-level 3D information to produce high-level, parametric models that capture both the structure and semantics of human environments.

This workshop aims to bridge the gap between state-of-the-art 3D scene modeling and structured, semantic 3D reconstruction by bringing together researchers from photogrammetry, computer vision, generative models, learned representations, and computer graphics. Through invited talks, spotlight presentations, workshop challenges, and a poster session, we will foster interdisciplinary interaction and empower the next generation of 3D reconstruction technologies by integrating techniques from multi-view learning, geometric modeling, and machine perception. We welcome original contributions in structured reconstruction, learning-based approaches, and all areas related to urban scene modeling.

News

  • March 31, 2025: Notification to authors! 🎉
  • March 24, 2025: Paper submission deadline on CMT! ❗
  • February 8, 2025: Call for papers online 📝
  • December 20, 2024: Workshop accepted for CVPR 2025 🎉

Keynote Speakers

Daniel G. Aliaga

Associate Professor of Computer Science, Purdue University

Daniel Aliaga is an Associate Professor in the Department of Computer Science at Purdue University. He holds a Bachelor of Science degree from Brown University and a Ph.D. from the University of North Carolina at Chapel Hill. Prof. Aliaga's research interests encompass urban modeling, reconstruction, and procedural and parametric modeling techniques. He is a pioneer in inverse procedural modeling for urban spaces, aiming to facilitate semi-automatic and controllable content creation and editing of large-scale geometric models. His interdisciplinary work bridges computer science with urban planning, architecture, meteorology, and more, focusing on urban visual computing and artificial intelligence tools to improve urban ecosystems and enable "what-if" exploration of sustainable urban designs. Dr. Aliaga has an extensive publication record and has held visiting positions at institutions such as ETH Zurich, INRIA Sophia-Antipolis, and KAUST.

Despoina Paschalidou

Senior Research Scientist, NVIDIA Toronto AI Lab

Despoina Paschalidou is a Senior Research Scientist at the NVIDIA Toronto AI Lab, based in Santa Clara, California. She completed her Ph.D. at the Max Planck ETH Center for Learning Systems, where she was advised by Andreas Geiger and Luc Van Gool. Following her Ph.D., she served as a Postdoctoral Researcher at Stanford University under the supervision of Leonidas Guibas. Dr. Paschalidou's research focuses on developing representations that can reliably perceive, capture, and recreate the 3D world to facilitate seamless human interaction. She has worked on a range of problems, including 3D reconstruction of objects using interpretable primitive-based representations, generative models for objects, scenes, and videos, as well as indoor scene synthesis. Her work aims to advance the state of the art in generative modeling and contribute to the development of technologies that allow for better understanding and interaction with 3D environments.

Federico Tombari

Research Scientist and Manager, Google Zurich; Lecturer, Technical University of Munich (TUM)

Federico Tombari is a Research Scientist and Manager at Google Zurich, where he leads an applied research team in computer vision and machine learning. He is also affiliated with the Faculty of Computer Science at the Technical University of Munich (TUM) as a lecturer (Privatdozent). Dr. Tombari's research focuses on 3D computer vision, including areas such as 3D scene understanding, object recognition, 3D reconstruction and modeling, and simultaneous localization and mapping (SLAM). His work has significant applications in robotics, augmented reality, autonomous driving, and healthcare. He is actively involved in the academic community, serving as an Area Chair for leading conferences like CVPR and NeurIPS, and as an Associate Editor for the International Journal of Robotics Research (IJRR). Dr. Tombari has an extensive publication record and is known for his contributions to the advancement of 3D computer vision.

Jonathan Li

Professor of Geospatial Data Science, University of Waterloo

Jonathan Li is a full professor at the University of Waterloo, holding appointments in the Department of Geography and Environmental Management and the Department of Systems Design Engineering. His research specializes in urban remote sensing and geospatial data science, with a focus on the automated extraction of geometric and semantic information from Earth observation images and LiDAR point clouds using artificial intelligence algorithms. Prof. Li's recent work involves generating high-definition maps and digital terrain models to support the development of digital twin cities and autonomous vehicles. He is an elected Fellow of several prestigious organizations, including the Institute of Electrical and Electronics Engineers (IEEE), the Royal Society of Canada Academy of Science, the Canadian Academy of Engineering, and the Engineering Institute of Canada. Currently, he serves as the President of the Canadian Institute of Geomatics and is the Editor-in-Chief of the International Journal of Applied Earth Observation and Geoinformation.

Peter Wonka

Full Professor of Computer Science, King Abdullah University of Science and Technology (KAUST)

Peter Wonka is a Full Professor in the Computer Science Program at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. His research interests lie in deep learning for visual computing, encompassing computer vision, machine learning, and computer graphics. Prof. Wonka focuses on topics such as generative models, 3D representation learning, geometry processing, and large-scale urban modeling. He is particularly interested in deep learning techniques applied to visual computing tasks, including generative adversarial networks, 3D reconstruction, and neural fields. Prior to joining KAUST, Prof. Wonka held positions at notable institutions and has a strong background in both mathematical and computational aspects of visual computing. He emphasizes impactful research and collaborates closely with students and postdoctoral researchers to advance the field.

Vasileios Balntas

Research Science Manager, Reality Labs Research

Vasileios Balntas is a Research Science Manager at Reality Labs Research, where he focuses on machine perception-related problems. His work involves developing advanced perception algorithms to enable machines to understand and interact with the physical world effectively. Prior to his current role, Dr. Balntas was the Head of Research at Scape Technologies, specializing in large-scale visual localization and mapping solutions. He also held a position as a Postdoctoral Researcher at the University of Oxford. With a strong background in computer vision and machine learning, Dr. Balntas has contributed to advancements in scene understanding and has worked on projects such as SceneScript. His experience bridges both academic research and industrial applications, aiming to bring cutting-edge machine perception technologies to real-world use cases.

Call for Papers

We invite submissions of original research related to urban scene modeling. Topics of interest include, but are not limited to:

  • Structured 3D Reconstruction/Modeling of human environments, including indoor and outdoor spaces, from sparse, noisy, or partial point clouds and images
  • Semantic, Instance, and Panoptic Segmentation and Parsing of 3D point clouds and images in complex human-centered environments
  • Fusion of Images and Point Clouds to improve the accuracy, detail, and structure of human-centric 3D scene modeling
  • Structured Representation of 3D Scenes, including parametric (e.g., CAD, B-Rep, Wireframe, etc.) models for buildings, interiors, and other human-made structures
  • Neural Implicit Representations for efficient and scalable modeling of large human environments
  • Learning Priors for Structured 3D Modeling, focusing on generating plausible, real-world human environments with structural consistency
  • Generative Models for Occlusion-Free Image Generation, enabling realistic texture mapping and enhanced 3D model quality
  • Multiview 3D Matching and Registration techniques for capturing and integrating scans of complex human spaces
  • Pose Estimation and Structured 3D Recovery from sparse image sets, with applications in architecture, robotics, and other domains
  • Differentiable Rendering and Occlusion Reasoning in human environments, such as indoor spaces, cityscapes, and public infrastructure
  • Applications of Structured and Semantic 3D Reconstruction in smart cities, construction, autonomous navigation, and digital twins
  • Benchmarks and Datasets for large-scale 3D modeling of human environments, driving new challenges and setting the standard for the field

We will accept submissions in two tracks: extended abstracts (up to 4 pages) and full papers (up to 8 pages), both in the standard CVPR format. Accepted submissions will be presented as posters, and some will be selected for spotlight talks.

Paper submission deadline: March 24, 2025, via CMT.

Full details about the submission process are provided below.

Where to Submit

All submissions will be handled through the CMT platform and can be submitted via the link below.

Submit Here

How to Submit

Authors should submit their papers through CMT. Each submission must include:

  • The paper in PDF format following the CVPR format
  • Supplementary materials (optional)
  • Authors' information including names, affiliations, and contact details

Submission Guidelines

  • Papers must be submitted in PDF format
  • Papers must follow the CVPR format
  • Extended abstracts should be up to 4 pages (excluding references)
  • Full papers should be up to 8 pages (excluding references)
  • Supplementary material is allowed but should be limited to 100MB
  • All submissions must be in English

Author Guidelines

  • At least one author of each accepted paper must register for the workshop and present the paper
  • Authors are responsible for ensuring that their submissions do not violate any publication agreements or copyrights
  • Double submissions are not allowed
  • Papers will be reviewed by at least two expert reviewers

Important Dates

  • Paper submission deadline: March 24, 2025
  • Notification to authors: March 31, 2025
  • Camera-ready deadline: April 7, 2025

Acknowledgment: The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

Challenges

2nd Building3D Challenge

As part of this workshop, we are hosting the 2nd Building3D challenge. Building3D is a publicly available urban-scale dataset consisting of more than 160,000 buildings with corresponding point clouds, meshes, and wireframe models, covering 16 cities in Estonia. For this challenge, approximately 36,000 buildings from the city of Tallinn serve as the training and testing data; among them, we selected 6,000 relatively dense and structurally simple buildings as the entry-level dataset. A wireframe model is composed of points and edges that represent the shape and outline of a building. Algorithms must take the original point cloud as input and regress the wireframe model. For evaluation, mean precision and recall are used to measure the accuracy of both points and edges, and the overall offset of the model is calculated; the wireframe edit distance (WED) serves as an additional metric for the quality of the generated wireframes. In contrast to the first Building3D challenge, submissions will be evaluated on a new test dataset with building styles entirely different from those in the Building3D dataset. Enjoy 😀.
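To make the point-level evaluation concrete, below is a minimal sketch of how corner precision and recall could be computed for a predicted wireframe. The one-to-one Hungarian matching, the distance threshold, and the array layout are illustrative assumptions on our part; the official Building3D evaluation scripts define the exact protocol, including the edge metrics and WED.

```python
# Minimal sketch: corner precision/recall for a predicted wireframe.
# The threshold value and matching scheme are illustrative assumptions,
# not the official Building3D evaluation protocol.
import numpy as np
from scipy.optimize import linear_sum_assignment


def corner_precision_recall(pred_pts, gt_pts, thresh=0.1):
    """Match predicted corners to ground-truth corners one-to-one and
    count a match as correct if its distance is within `thresh`
    (in the point cloud's units)."""
    if len(pred_pts) == 0 or len(gt_pts) == 0:
        return 0.0, 0.0
    # Pairwise Euclidean distances between predictions and ground truth.
    dists = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # Optimal one-to-one assignment (Hungarian algorithm).
    rows, cols = linear_sum_assignment(dists)
    matched = np.sum(dists[rows, cols] <= thresh)
    precision = matched / len(pred_pts)
    recall = matched / len(gt_pts)
    return precision, recall


# Example: three predicted corners against two ground-truth corners.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
gt = np.array([[0.05, 0.0, 0.0], [1.0, 0.02, 0.0]])
p, r = corner_precision_recall(pred, gt)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=1.00
```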

Awards & Submissions

The winning submissions will receive cash prizes provided by the workshop sponsors, and the selected finalists will be invited to present their work at the workshop. To receive a cash prize, winners must submit a write-up detailing their solution to the workshop, either as an extended abstract (4 pages) or a full paper (8 pages), together with the code required to reproduce the winning submission, released under a CC BY 4.0 license.

There is a $10,000 prize pool for this challenge.

  • 1st Place: $5,000
  • 2nd Place: $3,000
  • 3rd Place: $1,000
  • Additional Prizes: $500

Please see the Competition Rules for additional information.

We thank our gold sponsor, Intelligence.Ally Technology, and our silver sponsor, Shenzhen WUCE SPATIAL, for generously supporting this competition.

Important Dates

  • March 1, 2025 (Sat.): Competition starts
  • May 25, 2025 (Sun.): Competition ends
  • May 29, 2025 (Thu.): Notification to participants
  • June 1, 2025 (Sun.): Write-up deadline

Organizers