I am a Senior Researcher at Microsoft Research, focusing on data science tools and platforms.
My recent projects have focused on query processing.
We developed a cost-based, platform-independent query plan rewrite rule for MATCH_RECOGNIZE queries in general-purpose SQL engines.
In Trino, the new rule delivers a 5.4X improvement in median query latency.
In an even more ambitious undertaking, we developed a specialized execution engine for MATCH_RECOGNIZE, which includes an extended set of operators and a query optimizer based on a novel cost model. Our work resulted in a 6X median performance improvement over state-of-the-art specialized execution engines.
Before joining Microsoft Research, I completed my PhD in Computer Science at the University of Toronto under the guidance of Prof. Renée J. Miller. My thesis is on dataset search over massive Open Data archives. Specifically, I contributed algorithms for large-scale set similarity search and data sketches, which allow joinable or unionable tables to be found among over 100K tables in milliseconds. Based on this research, I built an Open Data search engine stack that makes it easy for people to use Open Data in their applications.
I am constantly exploring new ideas in computing through my open-source projects and writings. My goal is to work towards the democratization of computation, whereby advanced A.I. and algorithms are easily accessible to everyone without turning users into mere products.
[GitHub] [Google Scholar] [Blog] [Email]