
📚 Knowledge Tool

The Knowledge Tool lets you upload files and add URLs as data sources for your AI agent, enabling it to store, retrieve, and use custom information. The content is indexed when you add it, so the agent can pull the relevant passages at chat time and generate more accurate and relevant responses.

How to Use the Knowledge Tool

  1. Open the Knowledge Tool in your agent’s settings.
  2. Add a Description:
    Provide a brief description of the data source and its intended purpose.
  3. Add Data Sources:
    • Files: Upload one or more supported files.
    • URLs: Add one or more web page URLs, or use your site’s sitemap.xml for bulk import.
  4. Set Max Retriever Results (k):
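    Choose how many results the agent should retrieve from this data source per question (default: 5). A higher k gives the model more context but uses more of the LLM’s context window; a lower k keeps prompts short but may leave out relevant passages.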
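
To make the trade-off between k and the context window concrete, here is a rough budgeting sketch. Every number in it is an assumption for illustration: the Knowledge Tool does not document its chunk size, and `max_k_for_budget` is a hypothetical helper, not part of the product.

```python
def max_k_for_budget(context_window_tokens: int,
                     reserved_tokens: int,
                     avg_chunk_tokens: int) -> int:
    """Largest k whose retrieved chunks fit in the remaining token budget.

    reserved_tokens covers everything that is not retrieved content:
    system prompt, chat history, and room for the model's answer.
    """
    if avg_chunk_tokens <= 0:
        raise ValueError("avg_chunk_tokens must be positive")
    available = context_window_tokens - reserved_tokens
    return max(available // avg_chunk_tokens, 0)


# Hypothetical numbers: 8,000-token window, 3,000 tokens reserved,
# chunks averaging 500 tokens each.
print(max_k_for_budget(8000, 3000, 500))
```

If the result is lower than the k you had in mind, either reduce k or shorten the rest of the prompt.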

Supported File Formats

  • Text: .txt
  • Markdown: .md
  • PDF: .pdf
  • CSV: .csv
  • Word: .docx

Limits: up to 10 files per agent, 32MB per file.
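
If you prepare uploads with a script, a pre-flight check against the formats and limits above can catch problems before you reach the upload form. This is a minimal sketch: `check_uploads` is a hypothetical helper, and it assumes "32MB" means 32 × 1,048,576 bytes, which the documentation does not specify.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".md", ".pdf", ".csv", ".docx"}
MAX_FILES = 10
MAX_FILE_BYTES = 32 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def check_uploads(files: dict[str, int]) -> list[str]:
    """Return a list of problems for a batch of files.

    `files` maps each file name to its size in bytes.
    An empty list means the batch fits the documented limits.
    """
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files.items():
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 32MB")
    return problems


print(check_uploads({"faq.md": 12_000, "deck.pptx": 500_000}))
```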

Note: Image, audio, video, and PowerPoint (.pptx) files are not supported for knowledge uploads.

Adding URLs

  • Add the URL of a web page or a direct link to a file.
  • For bulk import, provide your site’s sitemap.xml.
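
Before handing over a sitemap.xml, it can help to preview which pages it lists. The sketch below parses a sitemap in the standard sitemaps.org format with Python's standard library. It illustrates the file format only; how the Knowledge Tool itself reads the sitemap is not documented here, and `urls_from_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed under <url> entries in a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

for url in urls_from_sitemap(example):
    print(url)
```

This sketch does not follow sitemap index files (a sitemap that points to other sitemaps).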

How Retrieval Works (RAG)

Under the hood, the Knowledge Tool implements RAG — Retrieval-Augmented Generation. Instead of fine-tuning a model on your content, the agent retrieves relevant snippets at chat time and grounds its answers in them.

What happens to your content:

  1. Chunking — each file or URL is split into smaller passages that respect document structure (headings, paragraphs).
  2. Embedding — every chunk is converted into a numeric vector that represents its semantic meaning and stored in a vector index. This happens once at ingestion time.
  3. Retrieval at chat time — when a visitor asks a question, the question is embedded with the same model and the index returns the top-k most semantically similar chunks (where k is the Max Retriever Results setting above).
  4. Grounded response — those retrieved chunks are added to the prompt for that turn, and the model answers from them rather than from generic training data.
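
The four steps above can be sketched end to end in a few lines. This is a toy illustration, not the Knowledge Tool's implementation: it chunks on blank lines only, and `embed` is a word-count stand-in for a real embedding model, so it matches on shared words rather than on meaning. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Step 1: split on blank lines so each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> Counter:
    """Step 2 (toy): lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Ingestion: chunk and embed once, keep (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 5) -> list[str]:
    """Step 3: embed the question the same way and return the top-k chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def grounded_prompt(question: str, chunks: list[str]) -> str:
    """Step 4: put the retrieved chunks into the prompt for this turn."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


docs = ["Refunds are issued within 14 days.\n\nShipping takes 3 to 5 business days."]
index = build_index(docs)
top = retrieve(index, "How long does shipping take?", k=1)
print(grounded_prompt("How long does shipping take?", top))
```

Note that the index is built once, while `retrieve` and `grounded_prompt` run on every question. Only the k best-matching chunks reach the prompt; everything else in the index is invisible to the model for that turn.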

A few implications worth knowing:

  • URL sources are snapshots. A URL is fetched once at ingestion and embedded; the live page is not re-fetched per query. If your source changes, re-ingest the URL.
  • Retrieval is semantic, not keyword-based. Matching depends on meaning, not exact words. Recall still improves when your source documents use the same vocabulary your visitors use in their questions.
  • The agent only sees the top-k chunks. A detail buried in a chunk that isn’t a strong match for the question won’t make it into the prompt. Break dense content into focused sections to improve coverage.
  • Retrieval is triggered by a tool call, not automatically. Once you have files or URLs attached, the agent gets a search_knowledge_base tool. The Prompt Builder automatically instructs the model when to call it. If you write your prompt by hand, reference search_knowledge_base explicitly so the model knows when to use it.
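
Because URL sources are snapshots, you may want your own check for when a page has changed enough to re-ingest. One simple approach, sketched below under the assumption that you fetch the page text yourself, is to store a hash of the text at ingestion time and compare it later. `fingerprint` and `needs_reingest` are hypothetical helpers on your side; the Knowledge Tool does not expose them.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """SHA-256 hash of the page text, recorded when the URL is ingested."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def needs_reingest(stored_fingerprint: str, current_page_text: str) -> bool:
    """True if the live page no longer matches the ingested snapshot."""
    return fingerprint(current_page_text) != stored_fingerprint


snapshot = fingerprint("Pro plan: $10 per month")
print(needs_reingest(snapshot, "Pro plan: $12 per month"))
```

A raw hash flags any change, including cosmetic ones such as a new footer date, so in practice you may want to hash only the main content of the page.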

If you have questions or need help, please contact [email protected].