Analytics Modelling Over Multiple Datasets Using Vector Embeddings
2025 · Andreas Loizou, Dimitrios Tsoumakos
Abstract
The massive increase in data volume and in the number of datasets available to analysts compels researchers to focus on data content and to select high-quality datasets that enhance the performance of analytics operators. Although selecting high-quality data significantly boosts analytical accuracy and efficiency, the selection process itself is very challenging when large numbers of datasets are available. To address this issue, we propose a novel methodology that infers the outcome of analytics operators by creating a model from the available datasets. Each dataset is transformed into a vector embedding generated by our proposed deep learning model, NumTabData2Vec, and similarity search is then performed over these embeddings. Through experimental evaluation, we compare the prediction performance and execution time of our framework with those of another state-of-the-art operator modelling framework, showing that our approach predicts analytics outcomes accurately and achieves a speedup. Furthermore, our vectorization model can proje
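The similarity-search step described in the abstract can be illustrated with a minimal sketch. It assumes each dataset has already been mapped to a fixed-length vector (NumTabData2Vec itself is not reproduced here) and that an operator's outcome is known for some of those datasets. The function names, the cosine metric, and the similarity-weighted k-nearest-neighbour average are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_outcome(query_vec, embeddings, outcomes, k=3):
    """Estimate an operator's outcome on a new dataset from its k most similar datasets.

    Hypothetical helper: `embeddings` are precomputed dataset vectors and
    `outcomes` are the operator results already measured on those datasets.
    """
    if not embeddings or len(embeddings) != len(outcomes):
        raise ValueError("need one known outcome per embedding")
    # Rank known datasets by similarity to the query and keep the top k.
    scored = sorted(
        ((cosine(query_vec, vec), out) for vec, out in zip(embeddings, outcomes)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Weight each neighbour by its (non-negative) similarity.
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0:
        # No positively similar neighbour: fall back to a plain mean.
        return sum(out for _, out in scored) / len(scored)
    return sum(w * out for w, (_, out) in zip(weights, scored)) / total
```

With this approach the operator does not have to be executed on the new dataset, which is one plausible source of the speedup the abstract reports. Prediction quality then depends on how well the embeddings capture the properties that drive the operator's behaviour.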
Related papers
- Experimental Analysis Of Large-scale Learnable Vector Storage Compression (2023)
- Table2vec: Neural Word And Entity Embeddings For Table Population And Retrieval (2019)
- Leanvec: Searching Vectors Faster By Making Them Fit (2023)
- Gleanvec: Accelerating Vector Search With Minimalist Nonlinear Dimensionality Reduction (2024)
- Vector Embedding Of Multi-modal Texts: A Tool For Discovery? (2025)
- Semantic Certainty Assessment In Vector Retrieval Systems: A Novel Framework For Embedding Quality Evaluation (2025)
- Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications (2024)
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)