ovos-memory-plugins

Give your OpenVoiceOS persona a memory.

By default a chat persona is amnesiac: every turn starts from scratch. A memory plugin fixes that — it remembers what was said and quietly feeds the relevant bits back into the next prompt, so the assistant can follow up, recall facts, and stay on topic. This package is a bundle of local-first memory backends plus an orchestrator that combines them.

"Local-first" means every backend runs on your machine. Some are pure standard library (zero extra dependencies); the heavier ones use local models or a local LLM endpoint — none of them phone home to a cloud service.

Backend (`memory_module`)	What it remembers with	Extra setup
`ovos-memory-plugin-recency`	the last few turns (sliding window)	none
`ovos-memory-plugin-lexical`	keyword search over past turns (SQLite FTS5 + BM25)	none
`ovos-memory-plugin-local-rag`	semantic search over past turns (embeddings + vector DB)	local model stack
`ovos-memory-plugin-longterm`	a rolling summary of the whole conversation	a local chat endpoint
`ovos-memory-plugin-entity`	durable facts about the user (name, preferences…)	a local chat endpoint
`ovos-memory-plugin-composite`	several of the above at once, results merged	the members' setup

New here? Jump to Quick start. Building something? See How it works and Write your own backend.

Install

pip install ovos-memory-plugins

# add the local semantic-RAG stack (embeddings model + vector store):
pip install 'ovos-memory-plugins[local-rag]'

recency and lexical need nothing beyond the base install — they are pure Python standard library.

Quick start

A persona is a small JSON file. The memory_module key picks a memory backend; a block with the same name configures it. Drop the file in your ovos-persona personas directory.

Step 1 — remember the last few turns (no setup)

The simplest memory: a sliding window of recent turns. No models, no endpoints.

{
  "name": "MyAssistant",
  "memory_module": "ovos-memory-plugin-recency",
  "ovos-memory-plugin-recency": {
    "max_history": 10,
    "system_prompt": "You are a helpful assistant."
  }
}

The assistant can now handle "and what about tomorrow?" because the previous turns are still in context. That is all most short conversations need.

Step 2 — recall things said long ago (semantic)

A window forgets. To recall something from much earlier, search past turns by meaning:

{
  "name": "MyAssistant",
  "memory_module": "ovos-memory-plugin-local-rag",
  "ovos-memory-plugin-local-rag": {
    "retrieval": {"max_num_results": 4},
    "system_prompt": "You are a helpful assistant."
  }
}

Every exchange is embedded and stored in a local vector database; before each reply the most relevant past exchanges are retrieved and added to the prompt. Ask "what was that book I mentioned last week?" and it can answer. Needs the [local-rag] extra.

Step 3 — combine memories (hero mode)

Real assistants want more than one kind of memory. The composite backend loads several members and merges their results, so one memory_module gives you hybrid recall (meaning and keywords) plus durable user facts:

{
  "name": "MyAssistant",
  "memory_module": "ovos-memory-plugin-composite",
  "ovos-memory-plugin-composite": {
    "members": [
      {"module": "ovos-memory-plugin-local-rag", "config": {"collection": "kb"}},
      {"module": "ovos-memory-plugin-lexical",   "config": {"db_path": "~/.local/share/ovos/lex.db"}},
      {"module": "ovos-memory-plugin-entity",    "config": {"api_url": "http://localhost:8000/v1"}}
    ],
    "fusion": "rrf",
    "system_prompt": "You are a helpful assistant."
  }
}

local-rag catches paraphrases, lexical catches exact terms (names, codes, rare words), and entity remembers who the user is. Their hits are merged with Reciprocal Rank Fusion — see composite.

Prefer to see it run before wiring a persona? The examples/ folder has ready persona files and offline demo scripts:

python examples/demo_composite.py     # hybrid recall, fully offline

How it works

A memory backend is an AgentContextManager. The persona owns the chat model and tools; the memory owns conversation state and prompt assembly — it never generates the answer itself, it just shapes the messages the model sees. Three methods make up the whole contract:

get_history(session_id) -> list[AgentMessage]
update_history(new_messages, session_id) -> None
build_conversation_context(utterance, session_id) -> list[AgentMessage]

The persona calls build_conversation_context before each turn (to assemble the prompt) and update_history after each turn (to record what happened). Two rules hold for the returned message list:

the first message MAY be a system message (the persona prompt);
the last message is ALWAYS the current user utterance.

Everything else — summaries, retrieved snippets, known facts, recent turns — goes in between. The overview explains the shared knobs: the five inject_mode strategies (how recalled context is placed in the prompt) and the retrieval settings (max_num_results, min_score, query_mode).

Choosing a backend

Want…	Use
just the last few turns	`recency`
exact-term recall (names, IDs, codes)	`lexical`
meaning-based recall of past detail	`local-rag`
robust recall (meaning + keywords)	`composite` of `local-rag` + `lexical`
to remember who the user is across sessions	`entity`
a compact gist of very long chats	`longterm`
more than one of the above	`composite`

The overview has a full comparison table (persistence, dependencies, offline behaviour, cost per turn).

The composite

composite is a pure orchestrator: it loads member backends by name and consolidates them.

Retriever members (local-rag, lexical, or any backend exposing search()) have their hits fused into one ranked, deduplicated list. Reciprocal Rank Fusion is the default — it ranks by position, not raw score, so it combines backends whose scores live on different scales (cosine vs BM25) without one drowning out the other. Other modes: weighted, merge, priority, interleave.
Context members (longterm, entity, recency) contribute their system block (a summary, known facts…).
New turns are recorded in every member; recent history comes from a chosen primary member.

If a member fails to load or errors at runtime, it is skipped and the rest carry on. Full details and the config schema: composite.

Write your own backend

Two paths, depending on what you are building:

A retrieval backend (stores documents, recalls them by some search) — subclass BaseRetrievalMemory and implement just two hooks, _store_document and _query_backend (returning MemoryHits). You inherit history handling, the five inject modes, the context renderer, and a search() that plugs straight into the composite.
Any other memory — subclass AgentContextManager and implement the three contract methods directly.

Register the class under the opm.agents.memory entry-point group and it becomes selectable as a memory_module. Step-by-step guide with code: write a backend.

Architecture

ovos-persona
  └─ memory_module: "ovos-memory-plugin-composite"
       └─ CompositeMemory
            ├─ ovos-memory-plugin-local-rag   (retriever — semantic)
            ├─ ovos-memory-plugin-lexical     (retriever — keyword)
            └─ ovos-memory-plugin-entity      (context  — user facts)

AgentContextManager  (the contract every backend implements)
  ├─ get_history(session_id)
  ├─ update_history(messages, session_id)
  └─ build_conversation_context(utterance, session_id) -> list[AgentMessage]

Retrieval backends and the composite share a BaseRetrievalMemory that provides history, the inject-mode strategies, and the context renderer; a concrete retriever only implements store and query. The fusion helpers and the MemoryHit type live in ovos_memory_plugins.common.

Running tests

pip install -e ".[test]"

pytest tests -v                       # everything (unit + end-to-end), no external services
pytest tests/test_composite.py -v     # just the composite (fast)

The end-to-end RAG test exercises the real embeddings + vector-store stack; its first run downloads the embeddings model into the shared cache, then is fast.

License

Apache License 2.0 — see LICENSE.

Credits

Developed by TigreGótico for OpenVoiceOS.

This project was funded through the NGI0 Commons Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101135429.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
ovos_memory_plugins		ovos_memory_plugins
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
mypy.ini		mypy.ini
ngi.png		ngi.png
pyproject.toml		pyproject.toml
renovate.json		renovate.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ovos-memory-plugins

Install

Quick start

Step 1 — remember the last few turns (no setup)

Step 2 — recall things said long ago (semantic)

Step 3 — combine memories (hero mode)

How it works

Choosing a backend

The composite

Write your own backend

Architecture

Running tests

License

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ovos-memory-plugins

Install

Quick start

Step 1 — remember the last few turns (no setup)

Step 2 — recall things said long ago (semantic)

Step 3 — combine memories (hero mode)

How it works

Choosing a backend

The composite

Write your own backend

Architecture

Running tests

License

Credits

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages