Abstract
Large language models (LLMs) have demonstrated strong capabilities in natural language understanding, yet they struggle with tasks that require real-time information retrieval, complex computation, or integration with external tools. Tool calling has emerged as a key paradigm for extending LLM functionality, but existing methods often rely on fine-tuning, which is computationally expensive, or on generic retrieval models, which are suboptimal for task-specific demonstration selection. In this paper, we propose DRanker, a lightweight and effective in-context learning framework that enhances tool calling through demonstration retrieval and reranking. DRanker first retrieves a candidate set of demonstrations using dense embeddings, then applies a fine-tuned reranker model, optimized with a ranking-aware loss function, to select high-quality demonstrations from that set. Evaluated on the ToolACE and BFCL benchmarks, DRanker consistently outperforms competitive baselines across multiple LLM backbones. Our work highlights the importance of task-aligned demonstration retrieval and offers a scalable way to improve LLM-based tool calling without significant computational overhead.
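The two-stage selection described above (dense retrieval of candidates, followed by reranking) can be illustrated with a minimal sketch. This is not the DRanker implementation: the function names, the toy two-dimensional embeddings, and the `toy_score` function are all hypothetical, and `toy_score` merely stands in for the fine-tuned reranker, whose architecture and loss are not specified here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_vec, pool, k):
    """Stage 1: keep the k demonstrations closest to the query embedding."""
    ranked = sorted(pool, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def rerank(query, candidates, score_fn, n):
    """Stage 2: reorder the candidates with a scoring model and keep the top n."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:n]


def toy_score(query, demo):
    """Hypothetical placeholder for a fine-tuned reranker.

    It counts the tool names shared by the query and the demonstration.
    """
    return len(set(query["tools"]) & set(demo["tools"]))


def select_demonstrations(query, pool, score_fn, k=3, n=2):
    """Retrieve k candidates by embedding similarity, then rerank down to n."""
    return rerank(query, retrieve(query["vec"], pool, k), score_fn, n)


# Toy demonstration pool with made-up embeddings and tool names.
pool = [
    {"id": "a", "vec": [1.0, 0.0], "tools": ["get_weather"]},
    {"id": "b", "vec": [0.9, 0.1], "tools": ["get_weather", "convert_units"]},
    {"id": "c", "vec": [0.0, 1.0], "tools": ["convert_units"]},
    {"id": "d", "vec": [0.7, 0.7], "tools": ["search_web"]},
]
query = {"vec": [1.0, 0.2], "tools": ["get_weather", "convert_units"]}

selected = select_demonstrations(query, pool, toy_score, k=3, n=2)
print([d["id"] for d in selected])
```

In a realistic setting, the embeddings would come from a dense encoder and `score_fn` would wrap a learned model; the selected demonstrations would then be placed in the LLM prompt as in-context examples.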