PDF Bytes Knowledge Base

The PDFBytesKnowledgeBase reads PDF content from bytes or IO streams, converts them into vector embeddings and loads them to a vector database. This is useful when working with dynamically generated PDFs, API responses, or file uploads without needing to save files to disk.
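Because the bytes often come from uploads or API responses, it can help to run a cheap sanity check before loading them. The sketch below is an illustration only: `looks_like_pdf` and `filter_pdf_payloads` are hypothetical helpers, not part of agno, and checking for the `%PDF-` header marker is a quick filter rather than full validation.

```python
PDF_MAGIC = b"%PDF-"  # PDF files start with this header marker


def looks_like_pdf(data: bytes) -> bool:
    """Hypothetical helper: cheap check that a payload starts like a PDF."""
    return data.startswith(PDF_MAGIC)


def filter_pdf_payloads(payloads: list) -> list:
    """Hypothetical helper: keep only payloads that pass the header check."""
    return [p for p in payloads if looks_like_pdf(p)]


# Placeholder payloads standing in for an upload and a failed API response.
uploads = [
    b"%PDF-1.7\n% placeholder content, not a real document\n",
    b"<html>error page, not a pdf</html>",
]
valid = filter_pdf_payloads(uploads)
```

Only the payloads in `valid` would then be passed on as the `pdfs` argument.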

Usage

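This example uses a local LanceDB database. LanceDB runs embedded in the Python process and stores its tables under the configured uri (tmp/lancedb here), so there is no separate server to start.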

pip install pypdf

knowledge_base.py

from agno.agent import Agent
from agno.knowledge.pdf import PDFBytesKnowledgeBase
from agno.vectordb.lancedb import LanceDb

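# Local LanceDB table that will hold the PDF embeddings
vector_db = LanceDb(
    table_name="recipes_async",
    uri="tmp/lancedb",
)

# Read the PDF into memory; any source of PDF bytes works here
with open("data/pdfs/ThaiRecipes.pdf", "rb") as f:
    pdf_bytes = f.read()

# Build the knowledge base directly from the in-memory bytes
knowledge_base = PDFBytesKnowledgeBase(
    pdfs=[pdf_bytes],
    vector_db=vector_db,
)
knowledge_base.load(recreate=False)  # Comment out after first run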

agent = Agent(
    knowledge=knowledge_base,
    search_knowledge=True,
)

agent.print_response("How to make Tom Kha Gai?", markdown=True)

Params

Parameter	Type	Default	Description
pdfs	Union[List[bytes], List[IO]]	-	List of PDF content as bytes or IO streams.
exclude_files	List[str]	[]	List of file patterns to exclude (inherited from base class).
reader	Union[PDFReader, PDFImageReader]	PDFReader()	A PDFReader or PDFImageReader that converts the PDFs into Documents for the vector database.

PDFBytesKnowledgeBase is a subclass of the AgentKnowledge class and has access to the same params.
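Since pdfs accepts either bytes or IO streams, one practical pitfall is a stream that has already been read: its position sits at the end, so reading it again yields nothing. The following is a minimal sketch, assuming you prefer to normalize everything to bytes up front. `to_pdf_bytes` and `normalize_sources` are hypothetical helpers and not part of agno.

```python
import io
from typing import IO, List, Union

PdfSource = Union[bytes, IO[bytes]]


def to_pdf_bytes(source: PdfSource) -> bytes:
    """Hypothetical helper: return the full content of a bytes object or binary stream."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    # Rewind seekable streams so one that was already consumed is read from the start.
    if source.seekable():
        source.seek(0)
    return source.read()


def normalize_sources(sources: List[PdfSource]) -> List[bytes]:
    """Hypothetical helper: convert a mixed list of bytes and streams to bytes."""
    return [to_pdf_bytes(s) for s in sources]


# Placeholder bytes standing in for real PDF content.
raw = b"%PDF-1.7\n% placeholder content, not a real document\n"
stream = io.BytesIO(raw)
stream.read()  # simulate an earlier consumer that left the stream at EOF

pdfs = normalize_sources([raw, stream])
```

The resulting list could then be supplied as `pdfs=pdfs` when constructing the knowledge base, as in the usage example above.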

Developer Resources

View Sync loading Cookbook
View Async loading Cookbook
