Michael Wawrzoniak

hi@mhwaw.net

Ephemeral Per-query Engines for Serverless Analytics
Michael Wawrzoniak, Rodrigo Bruno, Ana Klimovic, Gustavo Alonso
Workshop on Serverless Data Analytics (SDA '23)
Vancouver, Canada, August 2023

Abstract: We challenge the common assumption that queries are submitted to a pre-configured, already running engine and put forward the idea of dynamically instantiating a chosen data processing engine upon query submission by leveraging Function-as-a-Service (FaaS) platforms. We demonstrate the idea by running unmodified data processing engines (we use Apache Drill as an initial example) on real-world serverless FaaS platforms and show that such engines can be instantiated on demand when a query arrives. We aim to eventually support a wide range of queries and workloads. Wide access to such functionality would be a game changer in data processing. First, it would enable pay-per-query models supporting sporadic, interactive data analysis on arbitrary engines. Second, it would significantly increase the flexibility for data processing by enabling the possibility of dynamically choosing the actual engine, its configuration, and the resource allocation on a per-query basis. Logically, this amounts to dynamically attaching a query engine to the query rather than sending the query to a pre-configured and already deployed engine. In this paper we elaborate on this vision, outline the design of the MetaQ prototype that we are building to explore the idea, demonstrate that it is realistic through initial experiments, and discuss its many exciting practical implications.

@inproceedings{epqe23,
    title = {Ephemeral Per-query Engines for Serverless Analytics},
    author = {Michael Wawrzoniak and Rodrigo Bruno and Ana Klimovic and Gustavo Alonso},
    booktitle = {Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023) --- Workshop on Serverless Data Analytics (SDA'23)},
    year = {2023}
}