State-of-the-art RAG system built directly into our API.
Today, we're excited to announce the Collections API. With Collections, you can upload and search entire datasets. From PDFs and Excel sheets to whole codebases, your files go into a knowledge base that supports fast, precise search. This lets developers build RAG applications without the headache of managing indexing and retrieval infrastructure.
To help you get started, we're making file indexing and storage free for the first week*, with retrieval priced at a flat rate of $2.50 per 1,000 searches.
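As a rough illustration of the flat retrieval rate, the sketch below estimates cost from a search count. It assumes billing is strictly pro rata (the announcement does not say whether usage is rounded up to whole blocks of 1,000), and `retrieval_cost` is a hypothetical helper name, not part of any API.

```python
from decimal import Decimal

# Flat retrieval rate quoted in the announcement: $2.50 per 1,000 searches.
PRICE_PER_1000_SEARCHES = Decimal("2.50")


def retrieval_cost(num_searches: int) -> Decimal:
    """Estimate retrieval cost in USD for a number of searches.

    Assumption: cost scales linearly with usage (no rounding to blocks
    of 1,000, no minimum charge). Actual billing rules may differ.
    """
    if num_searches < 0:
        raise ValueError("num_searches must be non-negative")
    return PRICE_PER_1000_SEARCHES * num_searches / 1000


if __name__ == "__main__":
    for n in (1, 1_000, 40_000):
        print(f"{n:>6} searches -> ${retrieval_cost(n):.4f}")
```

Under that assumption, 40,000 searches would come to $100.00. Indexing and storage costs are not modeled here, since the post only states that they are free for the first week.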
Choose the retrieval method that best fits your use case, such as semantic search on its own or hybrid search.
Our Collections API delivers state-of-the-art retrieval performance, matching or outperforming leading models in real-world RAG tasks across finance, legal, and coding domains.
These fields are especially challenging due to their long, dense documents. To avoid hallucinations and deliver reliable answers, models must retrieve the exact passages and reason over them accurately.
Benchmark scores (higher is better):
| Domain | Task | xAI Grok 4.1 Fast | Google Gemini Pro 3 | OpenAI GPT 5.1 |
|---|---|---|---|---|
| Finance | Tabular and numerical questions | 93.0 | 85.9 | 84.7 |
| Legal | Complex reasoning over multiple chunks | 73.9 | 74.5 | 71.2 |
| Coding | Code understanding and large file systems | 86 | 85 | 81 |
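To make the "matching or outperforming" claim concrete, the short sketch below computes, for each domain, the gap between the Grok 4.1 Fast score and the best competing score in the table. The numbers are copied from the table; the helper name `margin_vs_best_other` is only illustrative.

```python
# Scores copied from the table above (higher is better).
SCORES = {
    "Finance": {"xAI Grok 4.1 Fast": 93.0, "Google Gemini Pro 3": 85.9, "OpenAI GPT 5.1": 84.7},
    "Legal": {"xAI Grok 4.1 Fast": 73.9, "Google Gemini Pro 3": 74.5, "OpenAI GPT 5.1": 71.2},
    "Coding": {"xAI Grok 4.1 Fast": 86, "Google Gemini Pro 3": 85, "OpenAI GPT 5.1": 81},
}
GROK = "xAI Grok 4.1 Fast"


def margin_vs_best_other(task_scores: dict, model: str = GROK) -> float:
    """Return the model's score minus the best competing score, to 1 decimal."""
    best_other = max(s for name, s in task_scores.items() if name != model)
    return round(task_scores[model] - best_other, 1)


MARGINS = {task: margin_vs_best_other(scores) for task, scores in SCORES.items()}

for task, margin in MARGINS.items():
    print(f"{task}: {margin:+.1f} points vs. best other model")
```

On these figures the margin is +7.1 points in Finance and +1.0 in Coding, while Legal is the near-match case at -0.6 points.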
Extracting tabular and numerical data from files can be challenging with semantic search alone. Hybrid search lets you accurately retrieve this data from documents such as SEC filings, so the model can reference the information precisely.
Finance benchmark metric: Retrieval Score.
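The idea behind hybrid search, combining keyword matches with semantic similarity, can be sketched with reciprocal rank fusion (RRF), one common way to merge two rankings. This post does not describe how Collections fuses its signals, so treat this as an illustration of the general technique only. The documents, the precomputed semantic ranking, and all function names below are made up for the example.

```python
from collections import defaultdict

# Made-up snippets standing in for passages from a filing.
DOCS = {
    "doc_a": "Total revenue for Q1 2026 was 4.2 million dollars",
    "doc_b": "Sales grew strongly at the start of the year",
    "doc_c": "Operating expenses for Q1 2026 are listed in the table below",
}
QUERY = "Q1 2026 revenue"

# Assumption: a semantic (embedding-based) search already produced this order.
# It is hard-coded here; no embedding model is called.
SEMANTIC_RANKING = ["doc_a", "doc_b", "doc_c"]


def keyword_ranking(query: str, docs: dict) -> list:
    """Toy keyword search: rank docs by how often query terms appear."""
    terms = query.lower().split()
    scores = {
        doc_id: sum(text.lower().split().count(t) for t in terms)
        for doc_id, text in docs.items()
    }
    hits = [doc_id for doc_id, score in scores.items() if score > 0]
    # Highest score first; ties broken by id so the result is deterministic.
    return sorted(hits, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge rankings: each doc earns 1 / (k + rank) per list it appears in."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


KEYWORD_RANKING = keyword_ranking(QUERY, DOCS)
HYBRID_RANKING = reciprocal_rank_fusion([SEMANTIC_RANKING, KEYWORD_RANKING])

print("keyword:", KEYWORD_RANKING)  # ['doc_a', 'doc_c']
print("hybrid: ", HYBRID_RANKING)   # ['doc_a', 'doc_c', 'doc_b']
```

In this toy case the semantic ranking alone places `doc_b` above `doc_c`, but the exact matches on "Q1" and "2026" lift `doc_c` in the fused result. That is the kind of behavior that helps with numerical and tabular lookups.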
The LegalBench dataset tests retrieval and reasoning over nuanced legal language and complex cross-references. It consists of 128 challenging question-answer pairs drawn from a large corpus of authentic commercial contracts spanning multiple source datasets.
Legal benchmark metric: Retrieval Score.
Code understanding is crucial for applications such as code summarization and generation. We benchmark it with the DeepCodeBench dataset, which features a diverse set of tasks drawn from real-world open-source repositories, API usage, and complex algorithmic problems.
Coding benchmark metric: Accuracy Score.
We do not use user data stored in Collections for model training unless the user has given consent.
*You may be charged after the free trial period. We will follow up with more information.