Give your OpenVoiceOS persona a memory.
By default a chat persona is amnesiac: every turn starts from scratch. A memory plugin fixes that — it remembers what was said and quietly feeds the relevant bits back into the next prompt, so the assistant can follow up, recall facts, and stay on topic. This package is a bundle of local-first memory backends plus an orchestrator that combines them.
"Local-first" means every backend runs on your machine. Some are pure standard library (zero extra dependencies); the heavier ones use local models or a local LLM endpoint — none of them phone home to a cloud service.
Backend (memory_module) |
What it remembers with | Extra setup |
|---|---|---|
ovos-memory-plugin-recency |
the last few turns (sliding window) | none |
ovos-memory-plugin-lexical |
keyword search over past turns (SQLite FTS5 + BM25) | none |
ovos-memory-plugin-local-rag |
semantic search over past turns (embeddings + vector DB) | local model stack |
ovos-memory-plugin-longterm |
a rolling summary of the whole conversation | a local chat endpoint |
ovos-memory-plugin-entity |
durable facts about the user (name, preferences…) | a local chat endpoint |
ovos-memory-plugin-composite |
several of the above at once, results merged | the members' setup |
New here? Jump to Quick start. Building something? See How it works and Write your own backend.
pip install ovos-memory-plugins
# add the local semantic-RAG stack (embeddings model + vector store):
pip install 'ovos-memory-plugins[local-rag]'recency and lexical need nothing beyond the base install — they are pure
Python standard library.
A persona is a small JSON file. The memory_module key picks a memory backend;
a block with the same name configures it. Drop the file in your ovos-persona
personas directory.
The simplest memory: a sliding window of recent turns. No models, no endpoints.
{
"name": "MyAssistant",
"memory_module": "ovos-memory-plugin-recency",
"ovos-memory-plugin-recency": {
"max_history": 10,
"system_prompt": "You are a helpful assistant."
}
}The assistant can now handle "and what about tomorrow?" because the previous turns are still in context. That is all most short conversations need.
A window forgets. To recall something from much earlier, search past turns by meaning:
{
"name": "MyAssistant",
"memory_module": "ovos-memory-plugin-local-rag",
"ovos-memory-plugin-local-rag": {
"retrieval": {"max_num_results": 4},
"system_prompt": "You are a helpful assistant."
}
}Every exchange is embedded and stored in a local vector database; before each
reply the most relevant past exchanges are retrieved and added to the prompt.
Ask "what was that book I mentioned last week?" and it can answer. Needs the
[local-rag] extra.
Real assistants want more than one kind of memory. The composite backend
loads several members and merges their results, so one memory_module gives you
hybrid recall (meaning and keywords) plus durable user facts:
{
"name": "MyAssistant",
"memory_module": "ovos-memory-plugin-composite",
"ovos-memory-plugin-composite": {
"members": [
{"module": "ovos-memory-plugin-local-rag", "config": {"collection": "kb"}},
{"module": "ovos-memory-plugin-lexical", "config": {"db_path": "~/.local/share/ovos/lex.db"}},
{"module": "ovos-memory-plugin-entity", "config": {"api_url": "http://localhost:8000/v1"}}
],
"fusion": "rrf",
"system_prompt": "You are a helpful assistant."
}
}local-rag catches paraphrases, lexical catches exact terms (names, codes,
rare words), and entity remembers who the user is. Their hits are merged with
Reciprocal Rank Fusion — see composite.
Prefer to see it run before wiring a persona? The examples/ folder
has ready persona files and offline demo scripts:
python examples/demo_composite.py # hybrid recall, fully offlineA memory backend is an AgentContextManager. The persona owns the chat model and
tools; the memory owns conversation state and prompt assembly — it never
generates the answer itself, it just shapes the messages the model sees. Three
methods make up the whole contract:
get_history(session_id) -> list[AgentMessage]
update_history(new_messages, session_id) -> None
build_conversation_context(utterance, session_id) -> list[AgentMessage]The persona calls build_conversation_context before each turn (to assemble the
prompt) and update_history after each turn (to record what happened). Two rules
hold for the returned message list:
- the first message MAY be a
systemmessage (the persona prompt); - the last message is ALWAYS the current user utterance.
Everything else — summaries, retrieved snippets, known facts, recent turns — goes
in between. The overview explains the shared knobs: the five
inject_mode strategies (how recalled context is placed in the prompt) and the
retrieval settings (max_num_results, min_score, query_mode).
| Want… | Use |
|---|---|
| just the last few turns | recency |
| exact-term recall (names, IDs, codes) | lexical |
| meaning-based recall of past detail | local-rag |
| robust recall (meaning + keywords) | composite of local-rag + lexical |
| to remember who the user is across sessions | entity |
| a compact gist of very long chats | longterm |
| more than one of the above | composite |
The overview has a full comparison table (persistence, dependencies, offline behaviour, cost per turn).
composite is a pure orchestrator: it loads member backends by name and
consolidates them.
- Retriever members (
local-rag,lexical, or any backend exposingsearch()) have their hits fused into one ranked, deduplicated list. Reciprocal Rank Fusion is the default — it ranks by position, not raw score, so it combines backends whose scores live on different scales (cosine vs BM25) without one drowning out the other. Other modes:weighted,merge,priority,interleave. - Context members (
longterm,entity,recency) contribute their system block (a summary, known facts…). - New turns are recorded in every member; recent history comes from a chosen
primarymember.
If a member fails to load or errors at runtime, it is skipped and the rest carry on. Full details and the config schema: composite.
Two paths, depending on what you are building:
- A retrieval backend (stores documents, recalls them by some search) —
subclass
BaseRetrievalMemoryand implement just two hooks,_store_documentand_query_backend(returningMemoryHits). You inherit history handling, the five inject modes, the context renderer, and asearch()that plugs straight into the composite. - Any other memory — subclass
AgentContextManagerand implement the three contract methods directly.
Register the class under the opm.agents.memory entry-point group and it becomes
selectable as a memory_module. Step-by-step guide with code:
write a backend.
ovos-persona
└─ memory_module: "ovos-memory-plugin-composite"
└─ CompositeMemory
├─ ovos-memory-plugin-local-rag (retriever — semantic)
├─ ovos-memory-plugin-lexical (retriever — keyword)
└─ ovos-memory-plugin-entity (context — user facts)
AgentContextManager (the contract every backend implements)
├─ get_history(session_id)
├─ update_history(messages, session_id)
└─ build_conversation_context(utterance, session_id) -> list[AgentMessage]
Retrieval backends and the composite share a BaseRetrievalMemory that provides
history, the inject-mode strategies, and the context renderer; a concrete
retriever only implements store and query. The fusion helpers and the
MemoryHit type live in ovos_memory_plugins.common.
pip install -e ".[test]"
pytest tests -v # everything (unit + end-to-end), no external services
pytest tests/test_composite.py -v # just the composite (fast)The end-to-end RAG test exercises the real embeddings + vector-store stack; its first run downloads the embeddings model into the shared cache, then is fast.
Apache License 2.0 — see LICENSE.
Developed by TigreGótico for OpenVoiceOS.
This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429.
