Documentation Index
Fetch the complete documentation index at: https://docs-v1.agno.com/llms.txt
Use this file to discover all available pages before exploring further.
Code
cookbook/models/vllm/async_basic.py
You are viewing v1 docs. For the latest documentation, visit docs.agno.com
Documentation Index
Fetch the complete documentation index at: https://docs-v1.agno.com/llms.txt
Use this file to discover all available pages before exploring further.
import asyncio
from agno.agent import Agent
from agno.models.vllm import vLLM
agent = Agent(model=vLLM(id="Qwen/Qwen2.5-7B-Instruct"), markdown=True)
asyncio.run(agent.aprint_response("Share a 2 sentence horror story"))
Create a virtual environment
Terminal and create a python virtual environment.python3 -m venv .venv
source .venv/bin/activate
Start vLLM server
vllm serve Qwen/Qwen2.5-7B-Instruct \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--dtype float16 \
--max-model-len 8192 \
--gpu-memory-utilization 0.9
Was this page helpful?