Tabular Embedding Model (TEM): Finetuning Embedding Models For Tabular RAG Applications
2024 · Sujit Khanna, Shishir Subedi
Abstract
In recent times, Large Language Models (LLMs) have exhibited tremendous capabilities, especially in mathematics, code generation, and general-purpose reasoning. However, in specialized domains, especially in applications that require parsing and analyzing large chunks of numeric or tabular data, even state-of-the-art (SOTA) models struggle. In this paper, we introduce a new approach to domain-specific tabular data analysis tasks by presenting a RAG workflow that mitigates the scalability issues of existing tabular LLM solutions. Specifically, we present Tabular Embedding Model (TEM), a novel approach to fine-tuning embedding models for tabular Retrieval-Augmented Generation (RAG) applications. Embedding models are a crucial component of the RAG workflow, yet even current SOTA embedding models are trained predominantly on textual datasets and thus underperform in scenarios involving complex tabular data. The evaluation results showcase that our approac
Related papers
- Advancing Retrieval-augmented Generation For Structured Enterprise And Internal Data (2025)
- REFINE On Scarce Data: Retrieval Enhancement Through Fine-tuning Via Model Fusion Of Embedding Models (2024)
- Dewey Long Context Embedding Model: A Technical Report (2025)
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)
- Rethinking Hybrid Retrieval: When Small Embeddings And LLM Re-ranking Beat Bigger Models (2025)
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)
- CGPT: Cluster-guided Partial Tables With LLM-generated Supervision For Table Retrieval (2026)
- Table2Vec: Neural Word And Entity Embeddings For Table Population And Retrieval (2019)