April 12, 2025

Spatial data describe objects that possess spatial attributes, namely the spatial locations and/or geometric shapes of these objects. In the past decade, the volume of available spatial data has increased tremendously. For instance, in November 2013 NASA announced the release of petabytes of its Earth data generated by remote sensing and satellite imagery technologies. Such data include, but are not limited to, weather maps, socioeconomic data, vegetation indices, and geological maps. At the same time, novel technology allows hundreds of millions of users to frequently use their mobile devices to access their healthcare information and bank accounts, interact with friends, shop online, search for interesting places to visit on the go, ask for driving directions, and more. As a consequence, everything we do leaves breadcrumbs of spatial digital traces, e.g., geo-tagged tweets and venue check-ins. Spatial data science is the field of analyzing and extracting knowledge from such spatial data. Making sense of spatial data will benefit several applications that may transform science and society, e.g., Earth science, socio-economic analysis, urban planning and infrastructure, healthcare, and disaster response.

State-of-the-art systems face the following data infrastructure challenges: (1) Heterogeneity: Spatial data come from different sources and hold a variety of attributes. Many cities install sensors (e.g., air quality, temperature, road traffic) in buildings and at traffic intersections to monitor the environment. Such sensors possess spatial location attributes, sensor reading attributes (e.g., carbon monoxide level, temperature, barometric pressure), and temporal attributes.
Furthermore, citizens use their mobile devices to post geo-tagged social media content that consists of spatial data (e.g., the location of the user who posted a tweet), graph data (i.e., social relationships among social networking users living in a city), and textual data such as tweets and Facebook posts. (2) Scalability and Interactivity: The volume of available geospatial data has increased tremendously, and because data continuously stream in (e.g., check-ins, Uber trips), it is challenging to store, index, query, maintain, and visualize this large and evolving body of spatial data. For example, consider a data scientist who visualizes air pollution using a geospatial heatmap or analyzes the spatial autocorrelation between city traffic and air pollution. Such analytics tasks require a large amount of interactive computation over large volumes of readings from traffic sensors and carbon monoxide sensors (as well as sensors for other air pollution metrics). Hence, it is necessary to design and develop data systems that can ingest massive amounts of geospatial data, store them effectively, and allow users to retrieve and analyze them with interactive performance.