# pdf-processing

Here are 618 public repositories matching this topic...

Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.

  • Updated Jun 6, 2026
  • Python
document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

  • Updated Oct 25, 2021
  • Python

📚 AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

  • Updated Sep 28, 2025
  • Python
