
Updates from our group
A full list of publications is available here.

Building properties, such as height, usage, and material, play a crucial role in spatial data infrastructures, supporting various urban applications. Despite their importance, comprehensive building attribute data remain scarce in many urban areas. Recent advances have enabled the extraction of objective building attributes using remote sensing and street-level imagery. However, establishing a pipeline that integrates diverse open datasets, acquires holistic building imagery, and infers comprehensive building attributes at scale remains a significant challenge. Among the first of its kind, this study bridges these gaps by introducing OpenFACADES, an open framework that leverages multimodal crowdsourced data to enrich building profiles with both objective attributes and semantic descriptors through multimodal large language models. First, we integrate street-level image metadata from Mapillary with OpenStreetMap geometries via isovist analysis, identifying images that provide suitable vantage points for observing target buildings. Second, we automate the detection of building facades in panoramic imagery and tailor a reprojection approach to convert objects into holistic perspective views that approximate real-world observation. Third, we introduce an innovative approach that harnesses and investigates the capabilities of open-source large vision-language models (VLMs) for multi-attribute prediction and open-vocabulary captioning in building-level analytics, leveraging a globally sourced dataset of 31,180 labeled images from seven cities. Evaluation shows that the fine-tuned VLM excels in multi-attribute inference, outperforming single-attribute computer vision models and zero-shot ChatGPT-4o. Further experiments confirm its superior generalization and robustness across culturally distinct regions and varying image conditions. Finally, the model is applied for large-scale building annotation, generating a dataset of 1.2 million images for half a million buildings.
This open-source framework enhances the scope, adaptability, and granularity of building-level assessments, enabling more fine-grained and interpretable insights into the built environment. Our dataset and code are available openly at: https://github.com/seshing/OpenFACADES.
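The first step of the pipeline, selecting street-level images that actually see a target building, can be illustrated with a simplified visibility check. This is only a sketch of the idea, not the paper's isovist implementation: the field-of-view and distance thresholds, function names, and planar-coordinate assumption are all illustrative.

```python
import math

def bearing_deg(cam, target):
    """Compass-style bearing from camera to target in degrees [0, 360),
    with 0 = north, assuming planar (projected) coordinates."""
    dx, dy = target[0] - cam[0], target[1] - cam[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def is_suitable_vantage(cam, heading_deg, target, fov_deg=90.0, max_dist=50.0):
    """True if the target lies within the camera's horizontal field of view
    and within a maximum observation distance (both thresholds illustrative)."""
    dist = math.hypot(target[0] - cam[0], target[1] - cam[1])
    if dist > max_dist:
        return False
    # Smallest signed angular difference between camera heading and target bearing.
    diff = abs((bearing_deg(cam, target) - heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= fov_deg / 2.0

# A camera at the origin facing north sees a building 30 m to the north,
# but not one 30 m to the east or one 100 m away.
print(is_suitable_vantage((0, 0), 0.0, (0, 30)))   # in view
print(is_suitable_vantage((0, 0), 0.0, (30, 0)))   # outside field of view
```

A full isovist analysis would additionally intersect sight lines with surrounding OpenStreetMap footprints to rule out occlusion by other buildings; this sketch only covers the angular and distance criteria.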

Understanding people’s preferences is crucial for urban planning, yet current approaches often combine responses from multi-cultural populations, obscuring demographic differences and risking the amplification of bias. We conducted a large-scale urban visual perception survey of streetscapes worldwide using street view imagery, examining how demographics—including gender, age, income, education, race and ethnicity, and personality traits—shape perceptions among 1,000 participants with balanced demographics from five countries and 45 nationalities. This dataset, Street Perception Evaluation Considering Socioeconomics, reveals demographic- and personality-based differences across six traditional indicators—safe, lively, wealthy, beautiful, boring, depressing—and four new ones: live nearby, walk, cycle, green. Location-based sentiments further shape these preferences. Machine-learning models trained on existing global datasets tend to overestimate positive indicators and underestimate negative ones compared to human responses, underscoring the need for local context. Our study aspires to rectify the myopic treatment of street perception, which rarely considers demographics or personality traits.

The 15-minute city concept emphasizes accessible urban living by ensuring essential services are reachable within walking or biking distance. However, most evaluations rely on two-dimensional (2D) analyses, neglecting the vertical complexity of high-density cities. This study introduces a 3D framework for assessing 15-minute accessibility in Nanjing, China. Using natural language processing and rule-based methods, we construct a 3D functional composition dataset from multi-source data. We then develop floor-level proximity indices that account for both horizontal travel time and vertical circulation (e.g., stairs, elevators). Analyzing over 90 million simulated trips, we find that accessibility generally declines with building height, though access to offices and commercial facilities improves on the 20th floor and above. Spatial inequalities emerge not only between central and peripheral zones but also across building levels and regional GDP levels, with a U-shaped disparity tied to distance from downtown. Notably, 11%–17% of trips considered accessible in 2D analyses exceed the 15-minute threshold when vertical travel is included. Our findings highlight the need to incorporate vertical space in 15-minute city evaluations and offer a scalable method to support inclusive, fair, and livable 3D urban planning in the context of the 15-minute city.
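The core idea, that a trip deemed accessible in 2D can exceed the 15-minute threshold once vertical circulation is added, can be sketched as a simple travel-time model. The speeds, elevator wait time, and function names below are illustrative assumptions, not the parameters used in the study.

```python
def total_travel_min(horizontal_min, floor, floors_per_min_stairs=2.0,
                     elevator_wait_min=1.5, elevator_floors_per_min=10.0):
    """Door-to-door travel time (minutes) from a given floor, adding
    vertical circulation down to street level (floor 1) to the 2D
    horizontal travel time. Assumes travellers take the faster of
    stairs or elevator; all rates are illustrative."""
    floors = max(floor - 1, 0)
    by_stairs = floors / floors_per_min_stairs
    by_elevator = elevator_wait_min + floors / elevator_floors_per_min
    return horizontal_min + min(by_stairs, by_elevator)

def accessible_15min(horizontal_min, floor, **kwargs):
    """15-minute accessibility check including vertical travel."""
    return total_travel_min(horizontal_min, floor, **kwargs) <= 15.0

# A 14-minute (2D) trip passes the threshold from the ground floor,
# but fails once a 21st-floor origin adds vertical travel time.
print(accessible_15min(14, 1))    # accessible
print(accessible_15min(14, 21))   # exceeds 15 minutes in 3D
```

Under these assumed rates, the 21st-floor origin adds 3.5 minutes of elevator time, which is exactly the kind of trip counted in the 11%–17% that flips from accessible to inaccessible when the analysis moves from 2D to 3D.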

Urban environments are increasingly recognised for their potential to support psychological restoration, yet most studies assess green and grey spaces in isolation and rely on static, lab-based measures. This study introduces a multi-layered analytical framework that integrates experimental walking, momentary perception tracking, and machine learning to investigate how multisensory urban features shape restoration. Conducted on a university campus, the experiment exposed 20 participants to sequential grey–green–grey walking routes. Restoration was measured through pre/post psychometric surveys, heart rate variability (HRV), and minute-level micro-surveys during walking. Results reveal three key insights: (1) green exposure induces a short-term “inoculation effect”, with restorative benefits persisting even after re-entering grey environments; (2) visual features are the most influential predictors of restoration, followed by noise and microclimate; and (3) solar irradiance, when balanced with moderate temperature and humidity, positively contributes to relaxation and stress reduction. Beyond experiments, we simulated design interventions on low-restoration scenarios using a large language model to enhance visual attributes, followed by predictive evaluation via machine learning. These simulations showed measurable improvements in predicted restoration, validating a data-driven approach for environmental optimisation. This research contributes to neurourbanism by bridging spatial sensing, physiological feedback, and AI-driven interpretation. It offers practical guidance for creating psychologically supportive urban environments, such as prioritising early green exposure and mitigating noise pollution, and introduces a replicable pipeline for evaluating restorative potential in future urban design.
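HRV, one of the restoration measures above, is commonly summarised with time-domain indices such as RMSSD computed from successive RR intervals. The sketch below shows that standard computation; the abstract does not specify which HRV index the study used, so treat this as a generic illustration, with the sample intervals invented.

```python
import math

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals
    (milliseconds), a standard time-domain HRV index. Higher values are
    generally associated with greater parasympathetic (relaxation-related)
    activity."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Invented one-minute window of RR intervals (ms): a perfectly steady
# heartbeat yields RMSSD of 0; variability raises it.
print(rmssd([800, 800, 800]))        # 0.0
print(rmssd([800, 810, 790, 805]))   # > 0
```

Computing such an index per minute of walking would align the physiological signal with the minute-level micro-surveys described in the study.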

Urban street environments are vital to supporting human activity in public spaces. The emergence of big data, such as street view images (SVI) combined with multi-modal large language models (MLLM), is transforming how researchers and practitioners investigate, measure, and evaluate semantic and visual elements of urban environments. Considering the low threshold for creating automated evaluative workflows using MLLM, it is crucial to explore both the risks and opportunities associated with these probabilistic models. In particular, the extent to which the integration of expert knowledge can influence the performance of MLLM in the evaluation of the quality of urban design has not been fully explored. This study sets out an initial exploration of how integrating more formal and structured representations of expert urban design knowledge (e.g., formal quantifiers and descriptions from existing methods) into the input prompts of an MLLM (ChatGPT-4) can enhance the model’s capability and reliability to evaluate the walkability of built environments using SVIs. We collect walkability metrics from the existing literature and categorise them using relevant ontologies. Then we select a subset of these metrics, used for assessing the subthemes of pedestrian safety and attractiveness, and develop prompts for MLLMs accordingly. We analyse MLLM’s abilities to evaluate SVI walkability subthemes through prompts with multiple levels of clarity and specificity about evaluation criteria. Our experiments demonstrate that MLLMs are capable of providing assessments and interpretations based on general knowledge and can support the automation of image–text multimodal evaluations. However, they generally provide more optimistic scores and can make mistakes when interpreting the provided metrics, resulting in incorrect evaluations. By integrating expert knowledge, the MLLM’s evaluative scores exhibit greater consistency and concentration. Therefore, this paper highlights the importance of formally and effectively integrating domain knowledge into MLLMs for evaluating urban design quality.
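The prompt-design idea, the same evaluation request with or without injected expert criteria, can be sketched as a small prompt builder. This is an illustration of the approach, not the study's actual prompts: the wording, scale, function name, and example metric are all assumptions.

```python
def build_walkability_prompt(subtheme, expert_metrics=None):
    """Compose an image-evaluation prompt for an MLLM. When
    `expert_metrics` (a {name: description} dict) is given, formal
    criteria from the urban-design literature are injected, mirroring
    the study's expert-knowledge condition."""
    lines = [
        f"Rate the {subtheme} of the street shown in this image on a 1-5 scale.",
        "Return the score and a one-sentence justification.",
    ]
    if expert_metrics:
        lines.insert(1, "Apply these evaluation criteria strictly:")
        lines[2:2] = [f"- {name}: {desc}" for name, desc in expert_metrics.items()]
    return "\n".join(lines)

# Baseline prompt relies on the model's general knowledge; the expert
# variant constrains the evaluation with an (example) formal metric.
baseline = build_walkability_prompt("pedestrian safety")
expert = build_walkability_prompt(
    "pedestrian safety",
    {"Traffic buffer": "a planted or parked buffer separates the sidewalk "
                       "from moving traffic"},
)
print(expert)
```

Comparing scores across such prompt variants is, in spirit, the multi-level clarity and specificity experiment the abstract describes.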