
VirtualLUOUCAS/StrucTab


StrucTab

A Structured Optimization Framework for Table Parsing

Chinese version | GitHub Repo | HuggingFace Dataset | ModelScope Dataset | Paper

News

  • [2026.06] 📖 Code and the TableVerse-5K benchmark are released!
  • [2026.06] 🎉 StrucTab has been accepted to ECCV 2026!

Overview

StrucTab is a structured optimization framework for table parsing, the task of converting a table image into structured HTML. Instead of treating parsing as a flat image-to-text problem, StrucTab decomposes it into three coupled subtasks: row/column counting, merged-cell analysis, and final HTML generation. It then optimizes a reinforcement-learning reward that is likewise decomposed into three components (validity, structure, content).
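To make the first two subtasks concrete, the sketch below computes their target quantities (row count, column count, and the list of merged cells) from a ground-truth HTML table. It is an illustration only, not the repository's data-construction code: the names `TableStats` and `table_stats` are hypothetical, and it assumes a single, non-nested table whose spans are given as `rowspan` / `colspan` attributes.

```python
from html.parser import HTMLParser


class TableStats(HTMLParser):
    """Collect (rowspan, colspan) for every cell, grouped by <tr>. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one list of (rowspan, colspan) tuples per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self.rows[-1].append(span)


def table_stats(html):
    """Return (n_rows, n_cols, merged_cells) for one non-nested HTML table.

    merged_cells holds (row, col, rowspan, colspan) for every spanning cell.
    Rows and columns are measured as the extent of the occupied grid.
    """
    parser = TableStats()
    parser.feed(html)
    occupied = set()  # grid slots already covered by some cell
    merged = []
    for r, cells in enumerate(parser.rows):
        c = 0
        for rowspan, colspan in cells:
            while (r, c) in occupied:  # skip slots filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            if rowspan > 1 or colspan > 1:
                merged.append((r, c, rowspan, colspan))
            c += colspan
    n_rows = max((r for r, _ in occupied), default=-1) + 1
    n_cols = max((c for _, c in occupied), default=-1) + 1
    return n_rows, n_cols, merged
```

For example, a two-row table whose first cell spans both rows and whose second cell spans two columns yields 2 rows, 3 columns, and two merged cells.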

Repository Layout

This repository releases:

  • code/ — the training-data construction pipeline, the Uni-TabRL reward, its four dependency services, and the analysis scripts behind the paper's figures and tables.
  • benchmark/ — a self-contained inference and evaluation harness for the TableVerse-5K table-parsing benchmark, scored with the structure-aware TEDS / TEDS-S metrics.
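For reference, TEDS (Tree-Edit-Distance-based Similarity) is commonly defined in the table-recognition literature as a normalized tree edit distance between the predicted and ground-truth HTML trees. The formula below is that standard definition, not a statement about this repository's scorer, which may differ in details such as the node cost function. The symbols $T_a$, $T_b$, and $\mathrm{EditDist}$ are notation introduced here for illustration.

```latex
\mathrm{TEDS}(T_a, T_b) \;=\; 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\bigl(|T_a|,\, |T_b|\bigr)}
```

Here $T_a$ and $T_b$ are the two HTML trees and $|T|$ is the number of nodes in $T$, so the score lies in $[0, 1]$ and equals 1 for identical trees. TEDS-S is usually understood as the same quantity computed on the table structure alone, with cell content ignored.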
Full directory tree:
StrucTab/
├── README.md
├── code/                         # training + RL reward + analysis (see code/README.md)
│   ├── training_data/            # build sequential-reasoning data from (image, HTML) pairs
│   ├── Uni_TabRL/
│   │   ├── reward/               # the decomposed RL reward (validity / structure / content)
│   │   ├── server/               # the four reward dependency services
│   │   │   └── TEDS_judger/      # TEDS / TEDS-S scoring service (also used by the benchmark)
│   │   └── configs/servers/      # endpoint lists for the reward services
│   └── analysis/                 # scripts behind the paper figures / tables
└── benchmark/                    # inference + evaluation harness for TableVerse-5K
    ├── apis/                     # pluggable backends: openai_compat | local_vllm
    ├── utils/                    # incremental writer, image encoding, signal handling
    ├── data/                     # ← place the TableVerse-5K dataset here
    ├── infer.py                  # entry: inference  → infer_results/<tag>/results.jsonl
    ├── judge.py                  # entry: TEDS scoring → judge_results/<tag>/results.jsonl
    ├── judger_server.json        # TEDS service endpoint list (host:port)
    └── requirements.txt

Code

The model-side code (training-data construction, the Uni-TabRL reward, the four dependency services, and the analysis scripts) is documented separately in code/README.md (Chinese version: code/README_zh.md).

[Figure: the StrucTab optimization framework]

Benchmark

benchmark/ is a self-contained harness for the TableVerse-5K table-parsing benchmark. It runs a two-stage pipeline (inference, then judging) with pluggable API backends (openai_compat / local_vllm) and a structure-aware TEDS / TEDS-S scorer. Both stages can resume an interrupted run, because every sample is keyed on its image_path.
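The resume behaviour can be sketched as follows. This is a minimal illustration of keying on image_path with an append-only JSONL file, not the code in infer.py or judge.py: the function names `load_done_keys` and `run_resumable`, the `process` callback, and the `output` field are all assumptions made for the example.

```python
import json
import os


def load_done_keys(results_path):
    """Return the set of image_path values already recorded in a JSONL file."""
    done = set()
    if not os.path.exists(results_path):
        return done
    with open(results_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["image_path"])
            except (json.JSONDecodeError, KeyError, TypeError):
                continue  # skip lines that do not parse into a keyed record
    return done


def run_resumable(samples, results_path, process):
    """Process only samples not yet recorded; return how many were newly written.

    `process` is a hypothetical callback that maps one sample to its output.
    """
    done = load_done_keys(results_path)
    os.makedirs(os.path.dirname(results_path) or ".", exist_ok=True)
    n_new = 0
    with open(results_path, "a", encoding="utf-8") as f:
        for sample in samples:
            key = sample["image_path"]
            if key in done:
                continue  # finished in an earlier run
            record = {"image_path": key, "output": process(sample)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            f.flush()  # an interruption then loses at most the in-flight sample
            done.add(key)
            n_new += 1
    return n_new
```

Calling `run_resumable` a second time with the same sample list and results path skips every sample whose image_path is already in the file, so rerunning the same command after a crash continues where the previous run stopped.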

For full setup, data download, and step-by-step inference / judging instructions, see benchmark/README.md (Chinese version: benchmark/README_zh.md).

[Figure: the benchmark pipeline]

Citation

If you find StrucTab useful, please consider citing:

TBD

License

This project is released for research purposes only.

About

ECCV 2026 | StrucTab: A Structured Optimization Framework for Table Parsing
