The DEEM workshop will be held on Sunday, June 18th, in conjunction with SIGMOD/PODS 2023. The workshop will be held in hybrid (in-person and virtual) form. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.
The workshop solicits regular research papers (10 pages plus unlimited references) describing preliminary or completed research results, as well as short papers (up to 4 pages) such as reports on applications and tools or preliminary results. With this paper category on applications and tools (introduced in 2022), the DEEM workshop aims to establish a broader forum for sharing interesting use cases, problems, datasets, benchmarks, visionary ideas, system designs, and descriptions of system components and tools related to end-to-end ML pipelines. Submissions should follow the same guidelines as for SIGMOD, i.e., use the sigconf template for the ACM proceedings format.
Follow us on Twitter @deem_workshop or contact us via email at info[at]deem-workshop[dot]org. We also provide archived websites of previous editions of the workshop: DEEM 2017, DEEM 2018, DEEM 2019, DEEM 2020, DEEM 2021, and DEEM 2022.
DEEM 2022 Proceedings: ACM DL Link
Abstract: A large fraction of the data science and machine learning workflow is performed in computational notebooks such as Jupyter with libraries such as pandas, NumPy, and scikit-learn in an ad-hoc, highly iterative manner. However, this process is not without its challenges. We describe three open-source tools that we've built that address scalability, interactivity, and reproducibility challenges along the way -- and have been adopted widely by data scientists. We also reflect on how our recipe -- of enhancing existing tools as opposed to replacing them -- may need revisiting in the exciting arena of LLM-powered data work, which forms the focus of our new EPIC Data lab at Berkeley.
Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the data management community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and a number of orthogonal data management problems arise from the large-scale use of ML.
For example, data preprocessing and feature extraction workloads may be complicated and require simultaneous execution of relational and linear algebraic operations. Next, model selection may involve searching many combinations of model architectures, features, and hyper-parameters to find the best-performing model. After model training, the resulting model may have to be deployed and integrated into business workflows and require lifecycle management using metadata and lineage. As a further complication, the resulting system may have to take into account a heterogeneous audience, ranging from domain experts without programming skills to data engineers and statisticians who develop custom algorithms.
Additionally, the importance of incorporating ethics and legal compliance into machine-assisted decision-making is being broadly recognized. Critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. DEEM welcomes research on providing system-level support to data scientists who wish to develop and deploy responsible machine learning methods.
DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.
We invite submissions in the following two tracks:
Submission Website: https://cmt3.research.microsoft.com/DEEM2023
Inclusion and Diversity in Writing: http://2023.sigmod.org/calls_papers_inclusion_and_diversity.shtml
We are very pleased that we can award one talented researcher a travel grant of $1000, with the help of our sponsors.
Applications for this travel award are due 26 April 2023 to enable early-bird registration by 1 May.
Please find more information (e.g., eligibility criteria) and apply through the form below:
Application Form Travel Award DEEM @SIGMOD 2023
We will also present awards for the best paper and the best presentation during the workshop!