CVPR Workshop Urban Scene Modeling 2024

Rapid urbanization poses social and environmental challenges. Addressing them effectively requires accurate, up-to-date 3D building models that can be obtained promptly and cost-effectively. Urban modeling is an interdisciplinary topic spanning computer vision, graphics, and photogrammetry. Demand for automated interpretation of scene geometry and semantics has surged due to applications including autonomous navigation, augmented reality, smart cities, and digital twins. As a result, substantial research effort has been dedicated to urban scene modeling within the computer vision and graphics communities, with a particular focus on photogrammetry, which has tackled urban modeling challenges for decades. This workshop is intended to bring researchers from these communities together. Through invited talks, spotlight presentations, a workshop challenge, and a poster session, it will increase interdisciplinary interaction and collaboration among photogrammetry, computer vision, and graphics. We also solicit original contributions in areas related to urban scene modeling.

6 Keynotes
Full Day Workshop
2 Competition Tracks

News!

May 31, 2024: Building3D Challenge has completed! 🎉

March 15, 2024: S23DR Challenge Competition Begins! 🎉

Feb 15, 2024: Building3D Challenge Competition Begins! 🎉

Feb 8, 2024: USM3D 2024 Call for Papers released! 🎉

Feb 7, 2024: CVPR Workshop list released! 🎉

Feb 3, 2024: We are live! 🎉

Challenges

S23DR Challenge!

Structured Semantic 3D Reconstruction (S23DR) Challenge

🎉 2024 Challenge Winners 🎉

  1. Denys Rozumnyi
  2. Kunal Chelani
  3. Kuo-Chin Lien
Discretionary Awards:
  • Weihang Li & Wenzhao Tang
  • Serhii Ivanov

Challenge Description

As part of this workshop, we are hosting the S23DR challenge. What's next after Structure from Motion? The objective of this competition is to facilitate the development of methods for transforming posed images (sometimes also called "oriented images") / SfM outputs into a structured geometric representation (wire frame) from which semantically meaningful measurements can be extracted. In short: More Structured Structure from Motion.

In conjunction with the challenge, we release a new dataset: the HoHo Dataset. These data were gathered over the course of several years throughout the United States from a variety of smartphone and camera platforms. Each training sample/scene consists of a set of posed image features (segmentation, depth, etc.) and a sparse point cloud as input, and a sparse wire frame (3D embedded graph) with semantically tagged edges as the target. Additionally, a mesh with semantically tagged faces is provided for each scene during training. In order to preserve privacy, original images are not provided.
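To make the target representation concrete, here is a minimal sketch of a wire frame as a 3D embedded graph with semantically tagged edges, and of one measurement that could be extracted from it. The class name, field names, and the edge tags "eave" and "rake" are hypothetical choices for illustration; they are not the HoHo Dataset's actual schema or label set.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Wireframe:
    """Hypothetical wire frame: 3D vertices plus semantically tagged edges."""
    vertices: list      # list of (x, y, z) tuples
    edges: list         # list of (i, j) index pairs into `vertices`
    edge_labels: list   # one semantic tag per edge (assumed labels)

    def __post_init__(self):
        # Every edge needs exactly one tag and two distinct, valid endpoints.
        if len(self.edges) != len(self.edge_labels):
            raise ValueError("each edge needs exactly one semantic label")
        n = len(self.vertices)
        for a, b in self.edges:
            if a == b or not (0 <= a < n and 0 <= b < n):
                raise ValueError(f"invalid edge ({a}, {b})")

    def edge_length(self, k):
        """Euclidean length of the k-th edge."""
        a, b = self.edges[k]
        return math.dist(self.vertices[a], self.vertices[b])

    def total_length_by_label(self):
        """Sum edge lengths per semantic tag, e.g. total eave length."""
        totals = defaultdict(float)
        for k, label in enumerate(self.edge_labels):
            totals[label] += self.edge_length(k)
        return dict(totals)


# Usage: two tagged edges meeting at a corner.
wf = Wireframe(
    vertices=[(0, 0, 0), (4, 0, 0), (4, 3, 0)],
    edges=[(0, 1), (1, 2)],
    edge_labels=["eave", "rake"],
)
print(wf.total_length_by_label())
```

Keeping the tags on the edges, rather than on the vertices, is what lets a measurement such as "total eave length" be read directly off the graph.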

Awards & Submissions

The winning submission will receive a cash prize provided by the workshop sponsor, and the selected finalists will be invited to present their research at the workshop. To be eligible for prizes, teams must follow all the rules; provide a write-up detailing their solution in a submission to the workshop, in the form of an extended abstract (4 pages) or a full paper (8 pages); and release all artifacts (code, weights, etc.) required to generate a winning submission under a CC BY 4.0 license.

There is a $25,000 prize pool for this challenge.

  • 1st Place: $10,000
  • 2nd Place: $7,000
  • 3rd Place: $5,000
  • Additional Prizes: $3,000

Please see the Competition Rules for additional information.

We thank Hover Inc. for their generous sponsorship of this competition.

Important Dates

March 14, 2024 Competition starts
June 4, 2024 Entry Deadline
June 4, 2024 Team Merging
June 10, 2024 Competition ends
June 13, 2024 Writeup Deadline



Building3D Challenge!

Building3D Challenge

🎉 2024 Challenge Winners 🎉

  1. Yuzhou Liu, Lingjie Zhu, Hanqiao Ye, Xiang Gao, Shuhan Shen
  2. Hongxin Yang, Siyu Chen, YuJun Liu
    • Jiahao Zhang, Qi Liu, Yuchao Dai, Le Hui, Zhixiang Pei
    • Hongye Hou, Fangyu Du, Jiaxin Ren, Yuxuan Jiang, Xiaobin Zhai
Honorable Mention: Fuhai Sun, Yingliang Zhang, Qiaoqiao Hao, Ting Han Duxin Zhu, Wenjing Wu, Tianrui Bayles-Rea, Guang Gao

As part of this workshop, we are hosting the Building3D challenge. Building3D is an urban-scale, publicly available dataset consisting of more than 160 thousand buildings with corresponding point clouds, meshes, and wireframe models, covering 16 cities in Estonia. For this challenge, approximately 36,000 buildings from the city of Tallinn are used as the training and testing dataset. Among them, we selected 6,000 relatively dense and structurally simple buildings as the Entry-level dataset. The wireframe model is composed of points and edges representing the shape and outline of the object. We require algorithms to take the original point cloud as input and regress the wireframe model. For the evaluation, mean precision and recall are used to measure the accuracy of both points and edges, and the overall offset of the model is calculated. Additionally, the wireframe edit distance (WED) is used as a further metric to evaluate the accuracy of generated wireframe models.
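As a rough illustration of point-level precision and recall, the sketch below matches predicted wireframe points to ground-truth points and counts the matches. The greedy closest-first matching, the one-to-one constraint, and the 0.1 distance threshold are assumptions made for this example; the official Building3D evaluation may define matches differently, and edge accuracy, model offset, and WED are not covered here.

```python
import math


def match_points(pred, gt, threshold):
    """Greedily pair predicted and ground-truth points within `threshold`.

    Closest pairs are matched first and each point is used at most once.
    This matching rule is an assumption, not the official protocol.
    """
    candidates = sorted(
        (math.dist(p, g), i, j)
        for i, p in enumerate(pred)
        for j, g in enumerate(gt)
        if math.dist(p, g) <= threshold
    )
    used_pred, used_gt, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
            pairs.append((i, j))
    return pairs


def precision_recall(pred, gt, threshold=0.1):
    """Precision = matched / predicted; recall = matched / ground truth."""
    true_positives = len(match_points(pred, gt, threshold))
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall


# Usage: two of three predictions land near ground-truth corners,
# and two of the four ground-truth corners are recovered.
pred = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
gt = [(0, 0, 0.05), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
precision, recall = precision_recall(pred, gt)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A mean over a dataset would then average these per-building values; whether the average is taken per building or over pooled points is again a choice this sketch does not settle.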

Awards & Submissions

The winning submission will receive a cash prize provided by the workshop sponsor, and the chosen finalists will be invited to present their research at the workshop. To receive a cash prize, teams must provide a write-up detailing their solution in a submission to the workshop, in the form of an extended abstract (4 pages) or a full paper (8 pages), as well as the code required to generate a winning submission under a CC BY 4.0 license.

We thank PopSmart Inc. for their generous sponsorship of this competition.

Important Dates

Feb 15, 2024 (Thu) Competition starts
May 31, 2024 (Fri) Competition ends
June 7, 2024 (Fri) Notification to Participants
June 11, 2024 (Tue) Writeup Deadline

Schedule

  • 09:00am - 09:10am

    Welcome and introduction

  • 09:10am - 09:50am

    Iro Armeni: Keynote 1
    Building a Global Resource Cadastre

    Iro Armeni is an assistant professor at the Department of Civil and Environmental Engineering, Stanford University, leading the Gradient Spaces group. She is interested in interdisciplinary research spanning Architecture, Civil Engineering, and Machine Perception. Her focus is on developing quantitative and data-driven methods that learn from real-world visual data to generate, predict, and simulate new or renewed built environments that place the human in the center. Her goal is to create sustainable, inclusive, and adaptive built environments that can support our current and future physical and digital needs.


  • 09:50am - 10:30am

    Konrad Schindler: Keynote 2
    Urban Modelling in the Deep Learning Age: from U-Nets to Next-Token Prediction

    Konrad Schindler received a Ph.D. degree from Graz University of Technology (Austria) in 2003. He was a Photogrammetric Engineer in private industry and held research positions at Graz University of Technology, Monash University (Melbourne, Australia), and ETH Zürich (Switzerland). He was appointed Assistant Professor of Image Understanding at TU Darmstadt (Germany) in 2009. Since 2010, he has been a tenured Professor of Photogrammetry and Remote Sensing at ETH Zürich. His research interests include remote sensing, photogrammetry, computer vision, and machine learning.


  • 10:30am - 11:20am

    Poster session & Social (coffee break)

  • 11:20am - 11:55am

    Building3D Challenge

  • 11:55am - 12:30pm

    S23DR Challenge

  • 12:30pm - 01:30pm

    Lunch

  • 01:30pm - 02:10pm

    Thomas Funkhouser: Keynote 3
    3D Foundation Features for Urban Environments

    Thomas Funkhouser is the David M. Siegel Professor of Computer Science, Emeritus, at Princeton University. Prior to joining Princeton, he was a member of the technical staff at AT&T Bell Laboratories. Funkhouser's research focuses on computer graphics and vision, with interests in interactive graphics, acoustic modeling, multi-user systems, global illumination, and algorithms for managing large-scale 3D data. Funkhouser has contributed to multiple SIGGRAPH conferences and received honors including the ACM SIGGRAPH Computer Graphics Achievement Award (2014), Sloan Foundation Fellowship (1999), and National Science Foundation Career Award (2000).


  • 02:10pm - 02:50pm

    Paper Spotlights

    • AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation
      Siqi Du, Weixi Wang, Shengjun Tang, Ruisheng Wang, Renzhong Guo
      Poster #169
    • SimpliCity: Reconstructing Buildings with Simple Regularized 3D Models
      Jean-Philippe Bauchet, Raphael Sulzer, Yuliya Tarabalka, Florent Lafarge
      Poster #170
    • ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
      Iaroslav Melekhov, Anand C Umashankar, Hyeong-Jin E Kim, Vladislav Serkov, Dusty Argyle
      Poster #171
    • Point2Building: Reconstructing Buildings from airborne LiDAR Point Clouds
      Yujia Liu, Anton Obukhov, Jan Dirk Wegner, Konrad Schindler
      Poster #172
    • uTRAND: Unsupervised Anomaly Detection in Traffic Trajectories
      Giacomo D'Amicantonio, Egor Bondarev, P. H. N. de With
      Poster #173
    • OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities
      Lasse H Hansen, Simon Buus Jensen, Andreas Møgelmose, Mark Philip Philipsen, Lars Bodum, Thomas B. Moeslund
      Poster #174
    • DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
      Lu Ling, Yicehn Sheng, Zhi Tu, Wentian Zhao, Xin Cheng, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera
      Poster #175
  • 02:50pm - 03:30pm

    Noah Snavely: Keynote 4
    MegaScenes: Reconstructing All the World's Landmarks

    Noah Snavely is a Professor of Computer Science at Cornell Tech and a member of the Cornell Graphics and Vision Group. He also works at Google Research in NYC. His research interests are in computer vision and graphics, in particular in 3D understanding and depiction of scenes from images. He is an ACM Fellow, and the recipient of a PECASE, a Microsoft New Faculty Fellowship, an Alfred P. Sloan Fellowship, and the SIGGRAPH Significant New Researcher Award.


  • 03:30pm - 04:20pm

    Poster session & Social (coffee break)

  • 04:20pm - 05:00pm

    Richard Zhang: Keynote 5
    Learning Differentiable Primitive Representations for CAD and Urban Modeling

    Hao (Richard) Zhang is a Distinguished Professor at Simon Fraser University, and an Amazon Scholar. He earned his Ph.D. from the University of Toronto, and MMath and BMath degrees from the University of Waterloo. His research is in visual computing with special interests in geometric modeling, shape analysis, 3D vision, and geometric deep learning. Awards won by Richard include a Canadian Human-Computer Communications Society Achievement Award in Graphics, a Google Faculty Award, an NSERC Discovery Accelerator Supplement Award, and IEEE Fellowship. He and his students have won the CVPR 2020 Best Student Paper Award and Best Paper Awards at Symposium on Geometry Processing 2008 and CAD/Graphics 2017. He has served as an editor-in-chief for Computer Graphics Forum and will be the Technical Papers Chair for SIGGRAPH 2025.


  • 05:00pm - 05:40pm

    Yasutaka Furukawa: Keynote 6
    Pushing the Frontiers of 3D Content Generation: Tales of very complex CAD models and very loose image conditioning

    Yasutaka Furukawa, a principal scientist at Wayve and an associate professor of Computing Science at Simon Fraser University, has held positions as an assistant professor at Washington University in St. Louis, a software engineer at Google, and a post-doctoral research associate at UW, collaborating with Profs. Seitz, Curless, and Rick Szeliski. He earned his Ph.D. under Prof. Ponce at the University of Illinois at Urbana-Champaign. He has received the PAMI Longuet-Higgins prize (2020), CS-CAN: Outstanding Young CS Researcher Award (2018), NSF CAREER Award (2015), Best Student Paper Award at ECCV 2012, and multiple Google Faculty Research Awards (2018, 2017, 2016).


  • 05:40pm - 06:20pm

    Collaboration and Discussion Session (≈interactive panel)

  • 06:20pm - 06:30pm

    Closing

Keynotes

Iro Armeni

Iro Armeni is an assistant professor at the Department of Civil and Environmental Engineering, Stanford University, leading the Gradient Spaces group. She is interested in interdisciplinary research spanning Architecture, Civil Engineering, and Machine Perception. Her focus is on developing quantitative and data-driven methods that learn from real-world visual data to generate, predict, and simulate new or renewed built environments that place the human in the center. Her goal is to create sustainable, inclusive, and adaptive built environments that can support our current and future physical and digital needs.


Konrad Schindler

Konrad Schindler received a Ph.D. degree from Graz University of Technology (Austria) in 2003. He was a Photogrammetric Engineer in private industry and held research positions at Graz University of Technology, Monash University (Melbourne, Australia), and ETH Zürich (Switzerland). He was appointed Assistant Professor of Image Understanding at TU Darmstadt (Germany) in 2009. Since 2010, he has been a tenured Professor of Photogrammetry and Remote Sensing at ETH Zürich. His research interests include remote sensing, photogrammetry, computer vision, and machine learning.


Yasutaka Furukawa

Yasutaka Furukawa, a principal scientist at Wayve and an associate professor of Computing Science at Simon Fraser University, has held positions as an assistant professor at Washington University in St. Louis, a software engineer at Google, and a post-doctoral research associate at UW, collaborating with Profs. Seitz, Curless, and Rick Szeliski. He earned his Ph.D. under Prof. Ponce at the University of Illinois at Urbana-Champaign. He has received the PAMI Longuet-Higgins prize (2020), CS-CAN: Outstanding Young CS Researcher Award (2018), NSF CAREER Award (2015), Best Student Paper Award at ECCV 2012, and multiple Google Faculty Research Awards (2018, 2017, 2016).


Richard Zhang

Hao (Richard) Zhang is a Distinguished Professor at Simon Fraser University, and an Amazon Scholar. He earned his Ph.D. from the University of Toronto, and MMath and BMath degrees from the University of Waterloo. His research is in visual computing with special interests in geometric modeling, shape analysis, 3D vision, and geometric deep learning. Awards won by Richard include a Canadian Human-Computer Communications Society Achievement Award in Graphics, a Google Faculty Award, an NSERC Discovery Accelerator Supplement Award, and IEEE Fellowship. He and his students have won the CVPR 2020 Best Student Paper Award and Best Paper Awards at Symposium on Geometry Processing 2008 and CAD/Graphics 2017. He has served as an editor-in-chief for Computer Graphics Forum and will be the Technical Papers Chair for SIGGRAPH 2025.


Noah Snavely

Noah Snavely is a Professor of Computer Science at Cornell Tech and a member of the Cornell Graphics and Vision Group. He also works at Google Research in NYC. His research interests are in computer vision and graphics, in particular in 3D understanding and depiction of scenes from images. He is an ACM Fellow, and the recipient of a PECASE, a Microsoft New Faculty Fellowship, an Alfred P. Sloan Fellowship, and the SIGGRAPH Significant New Researcher Award.


Thomas Funkhouser

Thomas Funkhouser, the David M. Siegel Professor of Computer Science, Emeritus at Princeton University, received his B.S. from Stanford University, M.S. from UCLA, and Ph.D. from UC Berkeley. Prior to joining Princeton, he was a member of the technical staff at AT&T Bell Laboratories. Funkhouser's research focuses on computer graphics and computer vision, with interests in interactive graphics, acoustic modeling, multi-user systems, global illumination, and algorithms for managing large-scale 3D data. He was a principal developer of the UC Berkeley Architectural Walkthrough System, which achieved real-time visualization of building models with millions of polygons. Funkhouser has contributed to multiple SIGGRAPH conferences and received honors including the ACM SIGGRAPH Computer Graphics Achievement Award (2014), Sloan Foundation Fellowship (1999), and National Science Foundation Career Award (2000).


Accepted Papers

In Proceedings

non-Proceedings

Open Access Paper Links

Call for Papers

Important Dates

Papers (in proceedings)

  • Submission Deadline: March 24, 2024
  • Notification of Acceptance: April 9, 2024
  • Camera Ready Deadline: April 12, 2024
========> Submit Here <========

Topics

The goal of this workshop is to push the frontier in urban scene modeling. Focal points for papers include but are not limited to:
  • Semantic/instance segmentation of 3D point clouds and images on urban scenes
  • Scene representation: Neural implicit scene representation, SDF, NeRF, Gaussian splats, mesh, CAD, etc.
  • 2.5D/3D reconstruction and modeling from remote sensing data (satellite images, LiDAR, etc.)
  • Generative models: Diffusion models and GANs for occlusion-free image and 3D scene generation
  • Meta learning for image/point cloud registration, segmentation, and modeling
  • Self-, weakly, and semi-supervised learning for urban scene modeling
  • Fusion of images, point clouds and other sensor data for urban scene modeling
  • Rendering and visualization of large-scale point clouds
  • Parametric reconstruction
  • 3D reconstruction and registration, joint registration and segmentation of images, multi-view matching, pose estimation, deployment on mobile and embedded devices
  • Leveraging (learned) priors for structured/parametric 3D reconstruction from multimodal data
  • New datasets, and new labeling and capture methods for ground truth
  • Human-in-the-loop modeling and reconstruction
  • Methods to bridge traditional modeling tools with neural representations
  • Inverse and forward procedural modeling approaches for large scale scene generation
  • Novel contributions on depth, panoramic, and other image representations for scene understanding, modeling, and reconstruction
We welcome PC self-nominations. If you're willing to review for the workshop, please reach out to us at usm3d@googlegroups.com.

Organizers

Ruisheng Wang

Professor
University of Calgary

Jack Langerman

Sr. Applied Scientist
HOVER Inc.
 

Ilke Demir

Senior Research Scientist
Intel Corporation
 

Qixing Huang

Associate Professor
University of Texas at Austin

Florent Lafarge

Researcher
INRIA
 

Dmytro Mishkin

Researcher
Czech Technical University in Prague

Tolga Birdal

Assistant Professor
Imperial College London, UK

Hui Huang

Professor
Shenzhen University
 

Shangfeng Huang

Researcher
University of Calgary
 

Daoyi Gao

Researcher
Technical University of Munich

Xiang Ma

Head of Research
Amazon Web Services
 

Hanzhi Chen

Researcher
Technical University of Munich

Clement Mallet

Research Scientist
LASTIG

Caner Korkmaz

Researcher
Imperial College London

Yang Wang

Associate Professor
Concordia University

Marc Pollefeys

Professor
ETH Zurich

Program Committee

Name Affiliation
Daniel G. Aliaga Purdue University
Daniela Cabiddu CNR - Istituto di Matematica Applicata e Tecnologie Informatiche (IMATI)
Yiping Chen Sun Yat-Sen University
Ian Endres Hover Inc.
Hongchao Fan Norwegian University of Science and Technology
Antoine Guédon École des Ponts ParisTech
Liu He Purdue University
Qingyong Hu National University of Defense Technology
Yuzhong Huang Hover Inc. / USC Information Sciences Institute
Loic Landrieu Ecole des Ponts ParisTech
David Marx CoreWeave, EleutherAI
Philippos Mordohai Stevens Institute of Technology
Bisheng Yang Wuhan University
Liangliang Nan Delft University of Technology
Rongjun Qin The Ohio State University
Ziming Qui NYU / Lowe's
Gunho Sohn York University
Ioannis Stamos City University of New York
Gábor Sörös Bell Labs
George Vosselman University of Twente
Jun Wang Nanjing University of Aeronautics and Astronautics
Chenglu Wen Xiamen University
Michael Ying Yang University of Bath
Bo Yang The Hong Kong Polytechnic University
Wei Yao The Hong Kong Polytechnic University
Saurabh Prasad University of Houston
Jiju Poovvancheri Saint Mary’s University

This is a CVPR 2024 workshop