©2019 BY DATA SYSTEMS LAB.

WELCOME TO THE DATA SYSTEMS LAB

Our vision is to grow data systems research through education, innovation, and discovery, and to bridge the gap between the data systems community and other scientific and business domains that may benefit from data management solutions. Our mission is to design experimental data systems that enable emerging applications, incorporate human awareness into system design, process non-traditional data types, or support emerging software platforms and hardware paradigms.

 

RESEARCH THEMES


SPATIAL DATA SYSTEMS

INTERNET OF THINGS DATA

CORE DATABASES

 

PEOPLE


DR. MOHAMED SARWAT

Lab Director and Principal Investigator


JIA YU

Research Assistant (PhD Student)

VAMSI MEDURI

Research Assistant (PhD Student)

YUHAN SUN

Research Assistant (PhD Student)

KANCHAN CHOWDHURY

Research Assistant (PhD Student)

ZISHAN FU

Research Assistant (MS Student)

SETU SHAH

Research Assistant (MS Student)

 

WHAT'S NEW?

Stay in the Know

 

NSF CAREER AWARD

Prof. Mohamed Sarwat (lab director) has received the prestigious National Science Foundation (NSF) CAREER Award. The award funds a research project that aims to extend the state of the art in spatial and spatio-temporal data management to support efficient and scalable processing of large-scale Internet of Things (IoT) data.

SELECTED PUBLICATIONS

Members of the Data Systems Lab have contributed to more than 30 publications in major database systems and spatial computing venues. Below is a sample:

GEOSPARK: A CLUSTER COMPUTING FRAMEWORK FOR PROCESSING LARGE-SCALE SPATIAL DATA

ACM SIGSPATIAL International Conference on Geographic Information Systems, GIS 2015

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Spark functionality, including loading and storing data to disk as well as regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend regular Apache Spark RDDs to support geometrical and spatial objects. GeoSpark provides a geometrical operations library that accesses Spatial RDDs to perform basic geometrical operations (e.g., Overlap, Intersect). System users can leverage the newly defined SRDDs to effectively develop spatial data processing programs in Spark. The Spatial Query Processing Layer efficiently executes spatial query processing algorithms (e.g., Spatial Range, Join, and KNN queries) on SRDDs. GeoSpark also allows users to create a spatial index (e.g., R-tree, Quad-tree) that boosts spatial data processing performance in each SRDD partition. Preliminary experiments show that GeoSpark achieves better run-time performance than its Hadoop-based counterparts (e.g., SpatialHadoop).
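To make the "partition, then index each partition" idea in the GeoSpark abstract concrete, here is a minimal single-machine sketch in plain Python. It is not GeoSpark's API and not Spark code. It assumes point data, a uniform grid as the partitioning scheme, and a list sorted by x-coordinate as a simple stand-in for the per-partition R-tree or Quad-tree; the names `Partition`, `partition_points`, and `spatial_range_query` are hypothetical. A range query first skips partitions whose bounding box misses the query window, then searches each remaining partition's local index.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

Point = tuple[float, float]
Window = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Partition:
    """One grid cell of a hypothetical spatial dataset, with a simple local index."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    points: list[Point] = field(default_factory=list)
    xs: list[float] = field(default_factory=list, init=False, repr=False)

    def build_index(self) -> None:
        # Stand-in for an R-tree/Quad-tree: keep points sorted by x so a
        # range query scans only the points whose x falls in the window.
        self.points.sort()
        self.xs = [p[0] for p in self.points]

    def intersects(self, w: Window) -> bool:
        # True if this partition's bounding box overlaps the query window.
        return not (w[2] < self.min_x or w[0] > self.max_x
                    or w[3] < self.min_y or w[1] > self.max_y)

    def range_query(self, w: Window) -> list[Point]:
        # Binary-search the x-interval, then filter on y.
        lo = bisect_left(self.xs, w[0])
        hi = bisect_right(self.xs, w[2])
        return [p for p in self.points[lo:hi] if w[1] <= p[1] <= w[3]]


def partition_points(points: list[Point], extent: Window, n: int) -> list[Partition]:
    """Assign points (assumed to lie inside extent) to an n x n uniform grid."""
    x0, y0, x1, y1 = extent
    w, h = (x1 - x0) / n, (y1 - y0) / n
    grid = {(i, j): Partition(x0 + i * w, y0 + j * h, x0 + (i + 1) * w, y0 + (j + 1) * h)
            for i in range(n) for j in range(n)}
    for p in points:
        i = min(int((p[0] - x0) / w), n - 1)  # clamp points on the far edge
        j = min(int((p[1] - y0) / h), n - 1)
        grid[(i, j)].points.append(p)
    for part in grid.values():
        part.build_index()
    return list(grid.values())


def spatial_range_query(partitions: list[Partition], w: Window) -> list[Point]:
    result = []
    for part in partitions:
        if part.intersects(w):  # prune partitions that cannot contain matches
            result.extend(part.range_query(w))
    return sorted(result)
```

In a distributed setting, each partition's local search is a natural unit of parallel work, while the pruning step keeps untouched partitions from being scanned at all.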

TWO BIRDS, ONE STONE: A FAST, YET LIGHTWEIGHT, INDEXING SCHEME FOR MODERN DATABASE SYSTEMS

Proceedings of the VLDB Endowment, PVLDB 2016

This paper proposes Hippo, a fast, yet scalable, database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo leverages the page ranges and histogram-based page summaries to identify pages whose tuples are guaranteed not to satisfy the query predicates, and inspects only the remaining pages. Experiments based on real and synthetic datasets show that Hippo occupies up to two orders of magnitude less storage space than the B+-Tree while still achieving query execution performance comparable to that of the B+-Tree for 0.1% to 1% selectivity factors. The experiments also show that Hippo outperforms BRIN (Block Range Index) in executing queries with various selectivity factors. Furthermore, Hippo achieves up to three orders of magnitude less maintenance overhead and up to an order of magnitude higher throughput (for hybrid query/update workloads) than its counterparts.
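The page-range filtering described in the Hippo abstract can be sketched in a few lines of Python. This is a simplified model built on our own assumptions, not Hippo's actual implementation: each "page" is a list of integer key values, the complete histogram is given as a sorted list of bucket boundaries, each page range is summarized by the set of buckets its tuples fall into, and a range is closed once its summary would cover more than `max_density` of all buckets. That grouping rule is a hypothetical stand-in for the paper's similarity criterion, and `bucket_of`, `build_index`, and `range_query` are illustrative names.

```python
from bisect import bisect_right

Page = list[int]                         # key values stored on one disk page
Entry = tuple[int, int, frozenset[int]]  # (first_page, last_page, buckets hit)


def bucket_of(value: int, boundaries: list[int]) -> int:
    # Bucket i covers [boundaries[i-1], boundaries[i]); the first and last
    # buckets are unbounded below and above, respectively.
    return bisect_right(boundaries, value)


def build_index(pages: list[Page], boundaries: list[int], max_density: float) -> list[Entry]:
    """Group contiguous pages into ranges, each summarized by the buckets it touches."""
    n_buckets = len(boundaries) + 1
    entries: list[Entry] = []
    start, summary = 0, set()
    for page_no, page in enumerate(pages):
        page_buckets = {bucket_of(v, boundaries) for v in page}
        merged = summary | page_buckets
        # Hypothetical grouping rule: close the current range once adding this
        # page would make its summary cover too large a fraction of all buckets.
        if summary and len(merged) / n_buckets > max_density:
            entries.append((start, page_no - 1, frozenset(summary)))
            start, merged = page_no, page_buckets
        summary = merged
    if pages:
        entries.append((start, len(pages) - 1, frozenset(summary)))
    return entries


def range_query(pages, entries, boundaries, lo, hi):
    """Return (matching values, number of pages inspected) for lo <= v <= hi."""
    wanted = set(range(bucket_of(lo, boundaries), bucket_of(hi, boundaries) + 1))
    matches, inspected = [], 0
    for first, last, buckets in entries:
        if buckets.isdisjoint(wanted):
            continue  # no tuple in this page range can satisfy the predicate
        for page in pages[first:last + 1]:
            inspected += 1
            matches.extend(v for v in page if lo <= v <= hi)
    return matches, inspected
```

The pruning is safe because `bucket_of` is monotone: any value in `[lo, hi]` must land in a bucket between those of `lo` and `hi`, so a page range whose summary misses all of those buckets cannot hold a match. Storing one small entry per page range, rather than one pointer per tuple, is what keeps this kind of index compact.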

LARS*: AN EFFICIENT AND SCALABLE LOCATION-AWARE RECOMMENDER SYSTEM

IEEE Transactions on Knowledge and Data Engineering, TKDE 2014

This paper proposes LARS*, a location-aware recommender system that uses location-based ratings to produce recommendations. Traditional recommender systems consider the spatial properties of neither users nor items; LARS*, on the other hand, supports a taxonomy of three novel classes of location-based ratings, namely, spatial ratings for non-spatial items, non-spatial ratings for spatial items, and spatial ratings for spatial items. LARS* exploits user rating locations through user partitioning, a technique that influences recommendations with ratings spatially close to querying users in a manner that maximizes system scalability without sacrificing recommendation quality. LARS* exploits item locations using travel penalty, a technique that favors recommendation candidates closer in travel distance to querying users in a way that avoids exhaustive access to all spatial items. LARS* can apply these techniques separately or together, depending on the type of location-based rating available. Experimental evidence using large-scale real-world data from both the Foursquare location-based social network and the MovieLens movie recommendation system shows that LARS* is efficient, scalable, and capable of producing recommendations twice as accurate as those of existing recommendation approaches.
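The travel-penalty idea in the LARS* abstract can be illustrated with a small top-k search. This sketch is an illustration under stated assumptions, not LARS*'s actual algorithm: an item's score is its predicted rating minus a hypothetical linear penalty `alpha * distance`, predicted ratings are assumed never to exceed `max_rating`, and items are visited in increasing distance from the user (the `sorted` call stands in for incremental nearest-neighbor retrieval from a spatial index). Because distance only grows, `max_rating - alpha * d` bounds the score of every unvisited item, so the loop can stop early without scoring far-away items. `travel_penalty_top_k`, `predict`, `alpha`, and `max_rating` are hypothetical names.

```python
import heapq
import math
from typing import Callable

Item = tuple[str, tuple[float, float]]  # (item_id, (x, y))


def travel_penalty_top_k(user_xy, items: list[Item], predict: Callable[[str], float],
                         k: int, max_rating: float = 5.0, alpha: float = 1.0):
    """Return (top-k item ids, number of items scored).

    Score = predicted rating - alpha * distance (a hypothetical linear penalty).
    Assumes predict(item_id) <= max_rating for every item.
    """
    if k <= 0:
        return [], 0
    # Stand-in for incremental nearest-neighbor retrieval from a spatial index:
    # visit items in order of increasing distance from the user.
    by_distance = sorted((math.dist(user_xy, xy), item_id) for item_id, xy in items)
    top: list[tuple[float, str]] = []  # min-heap of (score, item_id)
    scored = 0
    for d, item_id in by_distance:
        # Every unvisited item is at least d away, so none can score above
        # max_rating - alpha * d; stop once that cannot beat the k-th best.
        if len(top) == k and top[0][0] >= max_rating - alpha * d:
            break
        scored += 1
        score = predict(item_id) - alpha * d
        if len(top) < k:
            heapq.heappush(top, (score, item_id))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, item_id))
    return [item_id for _, item_id in sorted(top, reverse=True)], scored
```

With a user at `(0, 0)` and items at distances 1, 2, 10, and 20, a top-1 query scores only the two nearest items before the bound rules out the rest; a larger `alpha` favors nearby items more strongly and lets the search stop sooner.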

 

CONTACT US

Thanks for your interest in our research. Get in touch with us for any questions or comments regarding our work and publications. We’d love to hear from you.