We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To handle the longer context that HTML introduces, we propose Lossless HTML Cleaning and Two-Step ...
Abstract: Cross-modal remote sensing image-text retrieval (CMRSITR) aims to extract comprehensive information from diverse modalities. The primary challenge in this field is developing effective ...
Abstract: Scene text detection and recognition have attracted much attention in recent years because of their potential applications. Detecting and recognizing text in images may suffer from scene ...