Dynamic Urban Digital Twins: Physics-Coherent Traffic Video Synthesis with Spatiotemporal 3D Semantic Constraints
Keywords:
urban digital twins, traffic video synthesis, physics-coherent generation, spatiotemporal 3D semantic constraints, neural rendering, infrastructure governance
Abstract
Urban digital twins have opened new ways to simulate and manage complex metropolitan systems, yet generating realistic traffic video streams within them remains a fundamental challenge. Existing video synthesis approaches often produce visually plausible outputs that violate the physical laws governing traffic flow, object dynamics, or temporal continuity, which limits their utility for infrastructure planning, autonomous vehicle testing, and policy analysis. This paper presents a framework for dynamic urban digital twins that achieves physics-coherent traffic video synthesis by integrating spatiotemporal three-dimensional semantic constraints. The proposed architecture uses a hierarchical representation of urban scenes that couples neural rendering with physics-based vehicle behavior models and semantically annotated 3D point clouds. A key innovation is the use of real-world traffic flow data and geometric reasoning to enforce lane adherence, collision avoidance, and velocity consistency across consecutive frames. We discuss the structural trade-offs between perceptual fidelity and physical plausibility, and examine the computational infrastructure required for real-time deployment across large-scale city models. We also address governance and policy implications, in particular the use of synthetic data for equitable infrastructure design, bias mitigation in training datasets, and the ethical boundaries of simulating public urban spaces. The framework is evaluated with quantitative metrics, including Fréchet Video Distance and physics compliance scores, and with qualitative case studies drawn from metropolitan traffic corridors. Results demonstrate a significant improvement in long-term temporal coherence and physical realism compared to conventional generative approaches. This work contributes a scalable blueprint for constructing urban digital twins that represent not only static geometry but also the dynamic, physics-governed behavior of real-world traffic, advancing simulation-driven urban science and policy.
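The abstract names three constraints (lane adherence, collision avoidance, and velocity consistency) and a physics compliance score, but does not define how the score is computed. The sketch below is one plausible reading, not the authors' metric: it assumes vehicle positions have already been extracted from the synthesized frames as ground-plane trajectories, and it reports the fraction of samples that satisfy each constraint. The function name `physics_compliance`, the straight-road coordinate convention, and all thresholds are hypothetical choices made for illustration.

```python
import numpy as np


def physics_compliance(positions, lane_centers, dt=0.1,
                       lane_half_width=1.75, min_gap=2.0, max_accel=4.0):
    """Illustrative per-constraint compliance fractions (all assumptions).

    positions    : array (T, N, 2), T >= 3 frames, N vehicles,
                   columns are (x along the road, y lateral) in metres.
    lane_centers : array (L,), lateral coordinate of each lane centre.
    Returns a dict of three fractions in [0, 1].
    """
    positions = np.asarray(positions, dtype=float)
    lane_centers = np.asarray(lane_centers, dtype=float)

    # Lane adherence: lateral offset to the nearest lane centre must stay
    # within half a lane width (hypothetical threshold).
    lateral = positions[..., 1]                                   # (T, N)
    offset = np.abs(lateral[..., None] - lane_centers).min(axis=-1)
    lane_score = float((offset <= lane_half_width).mean())

    # Collision avoidance: a frame passes if every pair of vehicle
    # centres is at least min_gap apart.
    diff = positions[:, :, None, :] - positions[:, None, :, :]    # (T, N, N, 2)
    dist = np.linalg.norm(diff, axis=-1)                          # (T, N, N)
    i, j = np.triu_indices(positions.shape[1], k=1)
    collision_score = float((dist[:, i, j] >= min_gap).all(axis=1).mean())

    # Velocity consistency: finite-difference acceleration between
    # consecutive frames must not exceed max_accel (m/s^2).
    velocity = np.diff(positions, axis=0) / dt                    # (T-1, N, 2)
    accel = np.diff(velocity, axis=0) / dt                        # (T-2, N, 2)
    accel_mag = np.linalg.norm(accel, axis=-1)
    velocity_score = float((accel_mag <= max_accel).mean())

    return {
        "lane_adherence": lane_score,
        "collision_avoidance": collision_score,
        "velocity_consistency": velocity_score,
    }


if __name__ == "__main__":
    # Two vehicles at a constant 10 m/s in adjacent 3.5 m lanes.
    t = np.arange(5, dtype=float)
    car_a = np.stack([t, np.zeros(5)], axis=-1)
    car_b = np.stack([20.0 + t, np.full(5, 3.5)], axis=-1)
    clip = np.stack([car_a, car_b], axis=1)                       # (5, 2, 2)
    print(physics_compliance(clip, lane_centers=[0.0, 3.5]))
```

Reporting the three fractions separately, rather than as a single scalar, keeps visible which constraint a generated clip violates. A vehicle that jumps several metres between frames lowers only the velocity term, for example.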
License
Copyright (c) 2026 International Journal of Artificial Intelligence Research

This work is licensed under a Creative Commons Attribution 4.0 International License.
This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



