refactor: core/ restructuring, dedup, grader.py split, Sprints 6–8.1 by ArtVsMark · Pull Request #22 · ArtVsMark/Stepik-Python-Grader

ArtVsMark · 2026-06-30T08:50:30Z

What was done

This PR fully closes epic #18 (issues #23 → #19 → #20 → #21) and the entire CLAUDE.md backlog (Sprints 6, 7, 8.1), along with three new issues (#24, #25, #26) found while testing mode 4.

🏗️ #23 - Restructuring: internal modules → `core/`

Moved executor.py, normalizers.py, parsers.py, storage.py, stepik_client.py, oauth_flow.py, and microbench_runner.py into core/ via git mv (history preserved)
Added core/__init__.py
Updated every import across all files and tests, including unittest.mock.patch target strings and subprocess self-invocation paths
Updated pyproject.toml, conftest.py, README.md, CONTRIBUTING.md

🔍 #19 - Parser dedup + import cycle + doc drift

Removed the duplicate _parse_testblock_file from grader.py (the single remaining copy lives in core/parsers.py)
downloader.py no longer imports grader.py at all, so the import cycle is gone
CHANGELOG and CLAUDE.md brought in line with the actual state of the code

✂️ #20 - Splitting `grader.py`, codegen validation, ranking dedup

Finding #5: added identifier validation before interpolation in codegen; a real injection risk is closed, 4 new tests
Finding #6: the duplicated ranking logic is extracted into core/microbench_runner.apply_relative_ranking(); 3 call sites consolidated
Finding #4 (Sprint 7): grader.py (~1460 lines) is split into grader_core.py / reporter.py / cli.py; grader.py is now a thin 8-line compat facade; all patch targets in the tests are updated

🧹 #21 - Low-priority cleanups

except Exception narrowed to specific types in core/microbench_runner.py
float(str(x or 0)) → float(x or 0) in core/stepik_client.py
cli.py coverage: 40% → 97% (18 new tests in test_cli.py)
Security-assumption documentation updated

Sprint 6

6.1: sys.executable instead of the hardcoded "python3"/"python" in core/executor.py
6.2: normalizers.py: sort_lines/normalize_whitespace are formally exported via __all__ with an explicit "experimental" note
6.3: new config.py backed by a [tool.stepik-grader] section in pyproject.toml

Sprint 7.2 / 7.3

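BenchStats dataclass: deduplicates the stats calculation between modes 3 and 4
run_microbench_with_timeout(): a timeout-guard helper intended for mode 4 (per its commit message, deliberately not wired into run_microbench_mode())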

Sprint 8.1

Full argparse CLI: --mode, --file, --dir, --repeats, --number, --version
The interactive menu is kept when launched without arguments
12 new tests for the argparse functionality

🐛 #24 - Adaptive time formatting in mode 4

Added fmt_time(t: float) -> str to core/reporter.py with automatic unit selection (s / ms / µs / ns)
Applied to the Min / Median / Mean / Max / Std dev columns in modes 3 and 4
Fixed: values around 1e-7 to 1e-6 s are no longer truncated to 0.0000
Added tests in test_formatters.py; fixed the _SEP width

📊 #25 - Real memory measurement in mode 4 via `tracemalloc`

core/microbench_runner.py: bench_script now runs tracemalloc around timeit.repeat and returns peak_memory_kb via stdout
core/grader_core.py: run_microbench_mode() aggregates the peak across cases instead of using a hardcoded 0.0
Column header renamed: Memory → Memory (tracemalloc, KB)
Added tests: a direct check of run_microbench() and of the aggregation in run_microbench_mode()

🏗️ #26 - Moving `grader_core.py` and `reporter.py` into `core/` (continuation of #23)

git mv grader_core.py core/grader_core.py, git mv reporter.py core/reporter.py
Updated cross-imports between the modules, plus the imports in grader.py (compat facade) and cli.py
Updated patch targets in 5 test files
Updated CLAUDE.md, README.md, CHANGELOG.md
The target core/ structure is fully implemented

Final state

Metric	Before	After
Tests	458 passed, 3 skipped	520 passed, 3 skipped
Coverage	87.71%	95.21%
`ruff check` / `ruff format`	✅	✅
Modules in `core/`	0	9 (`executor`, `normalizers`, `parsers`, `storage`, `stepik_client`, `oauth_flow`, `microbench_runner`, `grader_core`, `reporter`)

Closes #23
Closes #19
Closes #20
Closes #21
Closes #18
Closes #24
Closes #25
Closes #26

…rt (fix #19) grader.py's local _parse_testblock_file had drifted into an exact duplicate of core/parsers.py's parse_testblock_file. Both grader.py and downloader.py now import the canonical function from core.parsers; downloader.py no longer imports grader.py at all (the removed local import was only masking the fact that both modules depended on logic that belonged in parsers.py). Also corrects stale test-count references (355 -> 461) and updates the DAG/structure diagrams in README.md and CLAUDE.md. All 458 tests pass (3 skipped) at 87.58% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

#20 finding #5) _build_function_wrapper() interpolated function_name and the solution filename's stem directly into a generated Python wrapper script (`from {module_stem} import {safe_func}`) without repr(), unlike the path parameter next to it. A function_name or filename stem containing a newline (e.g. from a crafted meta.json) could inject arbitrary statements into the wrapper that then gets executed via subprocess. Both values are now validated with str.isidentifier() before interpolation, raising ValueError on failure. run_single_test() catches that ValueError and reports it as a graceful RE verdict instead of letting it crash the whole grading run. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
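The validation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's `_build_function_wrapper()`: the helper names `build_import_line` and `safe_verdict` are hypothetical, and only the `str.isidentifier()` check and the "ValueError becomes an RE verdict" behaviour come from the commit message.

```python
def build_import_line(module_stem: str, function_name: str) -> str:
    """Return the import statement for a generated wrapper script.

    Hypothetical helper: stands in for the interpolation step of the
    real wrapper codegen, which is not shown in this PR page.
    """
    for label, value in (("module stem", module_stem), ("function name", function_name)):
        # str.isidentifier() rejects newlines, spaces, semicolons and any other
        # character that could smuggle extra statements into generated code.
        if not value.isidentifier():
            raise ValueError(f"invalid {label}: {value!r}")
    return f"from {module_stem} import {function_name}"


def safe_verdict(module_stem: str, function_name: str) -> str:
    """Report a validation failure as a verdict instead of crashing the run."""
    try:
        build_import_line(module_stem, function_name)
    except ValueError:
        return "RE"
    return "OK"


print(build_import_line("solution", "solve"))
print(safe_verdict("solution", "solve\nimport os"))
```

Note that `str.isidentifier()` returns True for reserved words such as `class`, so a stricter variant could additionally consult `keyword.iskeyword()`; whether the project does so is not stated here.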

…grader.py (#20 finding #6) grader.py had two identical inline blocks (mode 3's interactive-menu benchmark ranking and mode 4's run_microbench_mode) that computed each result's relative time and verdict via the same min-median-plus-_verdict() pattern. Extracted this into apply_relative_ranking() in core/microbench_runner.py, parameterized by threshold, so both call sites now share one implementation instead of duplicating the loop. grader.py's own _verdict() is left in place (still directly unit-tested) since it's semantically identical logic, not a broken duplicate -- only the two full inline blocks that repeated it were consolidated. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
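A rough sketch of the min-median ranking pattern this commit describes. The signature, the dictionary keys (`median`, `relative`, `verdict`) and the verdict labels are assumptions made for illustration; the real `apply_relative_ranking()` may differ.

```python
def apply_relative_ranking(results: list[dict], threshold: float = 1.5) -> list[dict]:
    """Annotate each result with its time relative to the fastest median.

    Assumed shape: every result is a dict with a positive "median" key.
    """
    medians = [r["median"] for r in results if r["median"] > 0]
    if not medians:
        return results  # nothing to rank against
    best = min(medians)
    for r in results:
        r["relative"] = r["median"] / best
        # Hypothetical labels; the project delegates this step to _verdict().
        r["verdict"] = "similar" if r["relative"] <= threshold else "slower"
    return results


ranked = apply_relative_ranking([{"median": 0.5}, {"median": 0.25}], threshold=1.5)
for row in ranked:
    print(row)
```

Taking the threshold as a parameter is what lets mode 3 and mode 4 share one loop while keeping their own cut-offs.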

…finding #4) grader.py (1460 lines) violated SRP by combining test-case loading, solution execution, table/rich output, and CLI menu logic in one file. Splits it into three modules per Sprint 7:

- grader_core.py: load_test_cases, run_mode detection, wrapper codegen, run_single_test/run_tests/run_benchmark/run_microbench_mode.
- reporter.py: the _console/_RICH rich-optional singleton, format_*/print_* table functions, _cprint, _print_case_verbose.
- cli.py: _interactive_menu, load-profile prompts, new main() entry point.

grader.py itself becomes an 8-statement backward-compatibility facade: `from X import *` plus explicit re-exports of every private name and non-__all__ public name (run_microbench, apply_relative_ranking) the test suite references directly as grader.X, since `import *` skips underscore-prefixed names.

Several tests patched grader._RICH/_console/Table/Text/run_tests/run_single_test/run_microbench expecting to affect functions that now live in reporter.py/grader_core.py/cli.py -- those functions read their own module's globals at call time, not grader.py's re-exported copy, so the old patch targets silently stopped having any effect (two cases actually failed with AssertionError/KeyError; others degraded into testing the wrong branch without erroring). Updated patch targets in test_grader_coverage_gap.py, test_menu_modes.py, test_grader_extra.py, and test_formatters.py to point at the module that actually owns each name.

Added a ruff per-file-ignore for grader.py (F401/F403/F405/I001) since every import in the facade is an intentional re-export.

465 passed (3 skipped), 88.97% coverage; ruff check/format clean; `echo 0 | python grader.py` smoke-tested end-to-end.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
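The patch-target problem described in this commit can be reproduced in isolation. The sketch below builds two throwaway in-memory modules; the names `owner`, `facade`, `FLAG` and `report` are hypothetical stand-ins for reporter.py, grader.py, `_RICH` and a print function, not the project's code.

```python
import types
from unittest import mock

# "owner" plays the role of the module that defines the function.
owner = types.ModuleType("owner")
exec(
    "FLAG = True\n"
    "def report():\n"
    "    return 'rich' if FLAG else 'plain'\n",
    owner.__dict__,
)

# "facade" re-exports the names, as a compat facade would.
facade = types.ModuleType("facade")
facade.FLAG = owner.FLAG
facade.report = owner.report

# Patching the facade's copy does not reach report(): the function looks
# FLAG up in the globals of the module where it was defined.
with mock.patch.object(facade, "FLAG", False):
    via_facade = facade.report()

# Patching the owning module changes what report() sees.
with mock.patch.object(owner, "FLAG", False):
    via_owner = facade.report()

print(via_facade, via_owner)
```

The first call still takes the "rich" branch without raising any error, which is the silent degradation the commit message describes.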

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

…li.py coverage (#21) Closes the four low-priority findings from issue #21, and with them the #18 tracker epic (issues #19/#20/#21 all done):

- core/microbench_runner.py: narrowed the broad `except Exception` around subprocess.run/float() parsing to `except (OSError, ValueError)`, the only two exception types that code path can actually raise.
- core/stepik_client.py: simplified three redundant float(str(x or default)) conversions to float(x or default) -- float() already accepts int/float/str directly.
- tests/test_cli.py (new): covers _interactive_menu() branches left untested by the Sprint 7 split (mode 1-4 error paths, mode 3/4 happy paths, profile-prompt custom values). cli.py coverage 40% -> 97%, total project coverage 88.97% -> 95.48%.
- README.md: rewrote "Ограничения и безопасность" ("Limitations and security") to state the no-sandbox threat model explicitly and fix stale module paths.

483 passed (3 skipped); ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
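Two of the cleanups above are small enough to show in a few lines. This is an illustrative sketch only: `to_float` and `parse_timings` are hypothetical helpers, not functions from core/stepik_client.py or core/microbench_runner.py.

```python
def to_float(value, default=0.0) -> float:
    """Convert an int, float, numeric string, or None to float.

    float() accepts all three types directly, so wrapping the value in
    str() first adds nothing. Falsy inputs (None, "", 0) yield the default.
    """
    return float(value or default)


def parse_timings(lines: list[str]) -> list[float]:
    """Parse timing lines, returning [] on malformed input.

    Only ValueError is caught, because that is what float() raises for a
    non-numeric string; unrelated bugs are left to propagate.
    """
    try:
        return [float(line) for line in lines]
    except ValueError:
        return []


print(to_float("1.5"), to_float(None), to_float(2))
print(parse_timings(["0.1", "0.2"]), parse_timings(["0.1", "oops"]))
```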

…n cmd (Sprint 6.1) "python3"/"python" can resolve to a system interpreter outside the active venv on Windows, running solutions against the wrong Python. sys.executable always points at the interpreter that launched grader, guaranteeing the same venv and installed packages.

chore(normalizers): mark sort_lines/normalize_whitespace as experimental (Sprint 6.2) Neither function is wired into grader_core.py yet (no UI option to compare output order-insensitively or ignoring extra whitespace). Rather than deleting fully-implemented, tested utilities or relocating them out of the public module, added __all__ and explicit "experimental" docstring notes so the module's public surface is accurate.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
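The Sprint 6.1 idea, launching child processes with the interpreter that is already running, looks roughly like this. `run_snippet` is a hypothetical helper for illustration; core/executor.py itself is not shown on this page.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> str:
    """Run a code string in a child process using the current interpreter."""
    completed = subprocess.run(
        [sys.executable, "-c", code],  # not a hardcoded "python3" or "python"
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout.strip()


# The child reports the same major.minor version as the parent process.
print(run_snippet("import sys; print(sys.version_info[:2])"))
print(sys.version_info[:2])
```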

…r (Sprint 6.3) Adds config.py: a frozen GraderConfig dataclass + load_config() + module-level CONFIG singleton, reading overrides from [tool.stepik-grader] in pyproject.toml (falls back to documented defaults if absent). Replaces the hardcoded constant literals in grader_core.py (TIMEOUT_SECONDS, ENCODING, SIMILAR_THRESHOLD, MUCH_SLOWER_THRESHOLD, MEASURE_CHILD_MEMORY, MICROBENCH_MAX_CASES) with CONFIG-derived values -- same names, same defaults, so grader.py's __all__ re-exports are unaffected.

core/executor.py's TIMEOUT also reads CONFIG.executor_timeout, wrapped in try/except ImportError with a literal 10 fallback: when executor.py runs as a subprocess script (python core/executor.py), sys.path[0] becomes core/, not the project root, so a bare `from config import CONFIG` would crash every function/main-mode test that shells out to it. Caught this via the full test suite before committing (13 failures) rather than assuming the import would just work.

490 passed (3 skipped), 95.44% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
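A minimal sketch of the frozen-dataclass-with-overrides pattern described in this commit. Only `GraderConfig`, `executor_timeout`, its default of 10, and the `[tool.stepik-grader]` table name come from the source; the other fields, their defaults, and the `config_from_pyproject` helper are assumptions. The function takes an already parsed mapping (for example the result of `tomllib.load()` on Python 3.11+) so that the sketch does not depend on a TOML parser.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GraderConfig:
    executor_timeout: int = 10       # default taken from the commit message
    encoding: str = "utf-8"          # hypothetical field and default
    similar_threshold: float = 1.1   # hypothetical field and default


def config_from_pyproject(data: dict) -> GraderConfig:
    """Build a config from parsed pyproject.toml data, ignoring unknown keys."""
    table = data.get("tool", {}).get("stepik-grader", {})
    known = {f.name for f in fields(GraderConfig)}
    overrides = {key: value for key, value in table.items() if key in known}
    return GraderConfig(**overrides)


parsed = {"tool": {"stepik-grader": {"executor_timeout": 30, "unknown_key": 1}}}
print(config_from_pyproject(parsed))
print(config_from_pyproject({}))  # falls back to the defaults
```

Freezing the dataclass means an accidental assignment such as `CONFIG.executor_timeout = 5` raises `dataclasses.FrozenInstanceError` instead of silently changing behaviour for every importer.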

…meout helper (Sprint 7.2/7.3) BenchStats (grader_core.py): unifies the min/median/mean/stdev/max computation that run_benchmark() and _micro_stats() each independently reimplemented. Both now build a BenchStats from their timings list and read its properties -- same dict-shaped return values as before, so reporter.py and existing callers/tests are unaffected. Added to grader_core.__all__ (re-exported via grader.BenchStats). run_microbench_with_timeout() (core/microbench_runner.py): runs an arbitrary fn() in a single-worker ThreadPoolExecutor, returning [] if it doesn't finish within timeout. Deliberately NOT wired into run_microbench_mode(): run_microbench() already wraps its subprocess.run() in timeout=60, which reliably kills the child and unblocks the caller -- layering a ThreadPoolExecutor on top of an already subprocess-bounded call adds no protection, and on an actual timeout would abandon the worker thread without killing whatever it was running. Kept available (with this reasoning in its docstring) for a future fn() that isn't already subprocess-bounded. 497 passed (3 skipped), 95.53% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
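One way the shared statistics object could look, as a sketch under assumptions: the property names and the `as_dict()` helper are hypothetical, and only the idea of a `BenchStats` dataclass computing min/median/mean/stdev/max from a timings list comes from the commit message.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchStats:
    timings: list[float]

    @property
    def minimum(self) -> float:
        return min(self.timings)

    @property
    def maximum(self) -> float:
        return max(self.timings)

    @property
    def median(self) -> float:
        return statistics.median(self.timings)

    @property
    def mean(self) -> float:
        return statistics.fmean(self.timings)

    @property
    def stdev(self) -> float:
        # statistics.stdev() needs at least two samples.
        return statistics.stdev(self.timings) if len(self.timings) > 1 else 0.0

    def as_dict(self) -> dict[str, float]:
        """Dict-shaped view, so callers expecting a plain dict keep working."""
        return {
            "min": self.minimum,
            "median": self.median,
            "mean": self.mean,
            "stdev": self.stdev,
            "max": self.maximum,
        }


print(BenchStats([1.0, 2.0, 3.0]).as_dict())
```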

python grader.py --mode {1,2,3,4} [--file PATH] [--dir PATH] [--repeats N] [--number N], and python grader.py --version, alongside the existing interactive menu (still the default when --mode is omitted). Extracted each menu branch's body into standalone _run_mode_1/2/3/4() functions with no logic changes -- both _interactive_menu() and the new argparse dispatch in main() call the same code, so there's one implementation per mode instead of two. main() now takes an explicit argv: list[str] | None = None parameter instead of implicitly reading sys.argv, so tests can pass argument lists directly rather than depending on sys.argv (which contains pytest's own CLI flags during a test run). __version__ moved from grader.py into cli.py (where --version needs it) and re-exported back through grader.py's facade import -- the reverse (cli.py importing from grader.py) would have created a cycle, since grader.py already imports main from cli.py. Verified end-to-end against a real solution/test-dir for all four modes. Hit a pre-existing UnicodeEncodeError under Git Bash on Windows (console defaults to cp1251) -- reproduced identically on the unchanged interactive-menu path too, confirming it predates this change; not fixed here (environment-specific, PYTHONIOENCODING=utf-8 resolves it). 509 passed (3 skipped), 95.30% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
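The `argv` parameter described above can be sketched as follows. The flag names match the commit message; the placeholder version string, the `build_parser` helper and the string return values are assumptions made so the example is easy to test.

```python
from __future__ import annotations

import argparse

__version__ = "0.0.0"  # placeholder, not the project's real version


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="grader.py")
    parser.add_argument("--mode", type=int, choices=[1, 2, 3, 4])
    parser.add_argument("--file")
    parser.add_argument("--dir")
    parser.add_argument("--repeats", type=int)
    parser.add_argument("--number", type=int)
    parser.add_argument("--version", action="version", version=__version__)
    return parser


def main(argv: list[str] | None = None) -> str:
    """Dispatch on --mode; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    if args.mode is None:
        return "interactive-menu"  # default path when --mode is omitted
    return f"mode-{args.mode}"


print(main([]))
print(main(["--mode", "3", "--repeats", "5"]))
```

Because tests pass an explicit list, they never see pytest's own command-line flags, which is the motivation the commit gives for the parameter.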

All GitHub issues (#18 epic: #19/#20/#21, plus #23) and all of CLAUDE.md's Sprint 6/7/8.1 backlog are now done. Only Sprint 8.2 (optional src/-layout, gated on a PyPI-publishing decision) remains, deliberately not started. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Fixed :.4f (seconds) formatting truncated sub-millisecond timings in modes 3/4 to "0.0000" -- e.g. a 150us call showed as zero, useless for comparing fast solutions. Added fmt_time(t) to reporter.py, auto-selecting s/ms/us/ns based on magnitude, applied to the min/median/mean/max/stdev columns in format_benchmark_row() and print_benchmark_results()'s rich branch. format_correctness_row() (modes 1/2) is untouched -- out of scope per the issue. Widened the plain-text column width (7 -> 10 chars) and rich table min_width to fit unit-suffixed values ("150.000 ms"), and widened _SEP from 92 to 107 to match. Confirmed via research pass that no existing test asserts the exact :.4f-formatted digit string, so no other test files needed updates. 517 passed (3 skipped), 95.19% coverage; ruff check/format clean. Verified end-to-end: mode 3 benchmark on a real solution now shows "89.274 ms" instead of the old "0.0893". Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
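A possible shape for the adaptive formatter, as a sketch: the unit thresholds and the three-decimal precision are assumptions inferred from the sample outputs quoted in the commit ("89.274 ms", "150.000 ms"), not the actual body of `fmt_time()`.

```python
def fmt_time(t: float) -> str:
    """Format a duration in seconds using the largest unit that fits."""
    if t >= 1.0:
        return f"{t:.3f} s"
    if t >= 1e-3:
        return f"{t * 1e3:.3f} ms"
    if t >= 1e-6:
        return f"{t * 1e6:.3f} µs"
    return f"{t * 1e9:.3f} ns"


# Fixed-width seconds formatting loses very small values entirely:
print(f"{1.5e-7:.4f}")
# The adaptive version keeps them readable:
print(fmt_time(1.5e-7))
print(fmt_time(0.089274))
```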

Mode 4's Memory column always showed 0.00 because all 5 timeit.repeat runs share a single subprocess -- mode 3's psutil-RSS-in-a-thread approach can't attribute memory to one run within that. core/microbench_runner.py: wrapped the timeit.repeat() call in bench_script with tracemalloc.start()/get_traced_memory(); the peak is printed as a distinct "MEM:<bytes>" line after the timing lines and parsed separately, returned as peak_memory_mb (MB) on every return path (success, timeout, OSError) so the key always exists. This measures Python-heap peak, not process RSS -- doesn't see C-extension allocations, documented in the module docstring. grader_core.py: run_microbench_mode() no longer hardcodes peak_memory_mb=0.0; tracks a running max across benchmarked cases per solution like run_benchmark() does for mode 3. Function-call blocks (run_single_test) already had real psutil memory; stdin blocks now get the tracemalloc value. Fixed a stale mock in test_menu_modes.py that returned a dict without peak_memory_mb, which would now raise KeyError. Verified end-to-end: a solution allocating a 500k-element list now shows 7.90 MB instead of 0.00. 520 passed (3 skipped), 95.21% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
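The measurement and the "MEM:<bytes>" line protocol described above might be sketched like this. The function names `bench_with_memory`, `format_report` and `parse_report` are hypothetical; only the use of `tracemalloc` around `timeit.repeat()` and the `MEM:` prefix come from the commit message.

```python
import timeit
import tracemalloc


def bench_with_memory(stmt, repeat: int = 5, number: int = 1):
    """Time stmt with timeit.repeat and record the Python-heap peak in bytes."""
    tracemalloc.start()
    try:
        timings = timeit.repeat(stmt, repeat=repeat, number=number)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return timings, peak


def format_report(timings, peak_bytes: int) -> str:
    """One timing per line, followed by a distinct MEM:<bytes> line."""
    return "\n".join([repr(t) for t in timings] + [f"MEM:{peak_bytes}"])


def parse_report(text: str):
    """Inverse of format_report: split timing lines from the MEM line."""
    timings, peak = [], 0
    for line in text.splitlines():
        if line.startswith("MEM:"):
            peak = int(line[len("MEM:"):])
        else:
            timings.append(float(line))
    return timings, peak


timings, peak = bench_with_memory(lambda: [x**2 for x in range(10_000)], repeat=3)
print(len(timings), "runs, peak", peak / (1024 * 1024), "MB")
```

As the commit notes, `tracemalloc` only sees allocations made through Python's memory allocator, so this figure is a heap peak rather than process RSS.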

Continuation of the Issue #23 restructuring: relocated grader_core.py and reporter.py into core/ (via git mv, history preserved) so all internal (non-entry-point) modules live under core/. Only grader.py, cli.py, config.py, downloader.py, and diagnostik_stepik.py remain at the project root. Updated the cross-import between the two moved modules (core/grader_core.py's `from reporter import _print_case_verbose` and core/reporter.py's TYPE_CHECKING-only `from grader_core import TestCase`), grader.py's facade imports, and cli.py's imports to core.grader_core / core.reporter. Updated tests that imported these modules directly (bypassing the grader.py facade, which itself was unaffected): `import grader_core` / `import reporter` -> `from core import grader_core` / `from core import reporter`, and unittest.mock.patch("reporter.X", ...) string targets -> "core.reporter.X" (13 occurrences in test_grader_coverage_gap.py). Ran an exhaustive grep audit for every import/patch reference before editing -- all 520 tests passed on the first run after the move, no follow-up fixes needed. Verified end-to-end: --version, interactive menu, and --mode 1 against a real solution all still work. 520 passed (3 skipped), 95.21% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

ArtVsMark · 2026-07-02T13:55:03Z

📝 Session 02.07.2026 - issues #24, #25, #26 closed

The ArtVsMark-patch-1 branch gained 4 more commits. All three new issues are implemented, verified end-to-end, and closed in the PR body.

#24 - fix: adaptive time formatting in modes 3/4

Added fmt_time() to core/reporter.py with automatic unit selection (s/ms/µs/ns). Fixed the truncation of values around 1e-7 s to 0.0000 (.4f). Smoke test: 89.274 ms instead of 0.0893.

#25 - feat: real memory measurement in mode 4 via `tracemalloc`

run_microbench() now wraps timeit.repeat in tracemalloc and returns peak_memory_kb. run_microbench_mode() aggregates the peak instead of using a hardcoded 0.0. Smoke test: 7.90 MB for a solution with [x**2 for x in range(100000)].

#26 - refactor: `grader_core.py` and `reporter.py` → `core/`

git mv for both files. Updated cross-imports, grader.py, cli.py, and patch targets in 5 test files. The core/ structure now fully matches CONTRIBUTING.md.

Branch state: 520 passed, 3 skipped; 95.21% coverage; ruff ✅

➡️ Required: git push origin ArtVsMark-patch-1 from the local machine, then merge the PR.

Update issue templates

2e9fc43

ArtVsMark closed this Jun 30, 2026

ArtVsMark reopened this Jun 30, 2026

ArtVsMark and others added 14 commits July 2, 2026 11:53

Add files via upload

eefe78e

refactor: move internal modules to core/

707a214

docs(checkpoint): record issues #23/#19/#20 completion, update backlog

a959f54

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

style: fix ruff findings and normalize line endings

f05e12f

ArtVsMark changed the title ~~Update issue templates~~ refactor: core/ restructuring, dedup, grader.py split, Sprints 6–8.1 Jul 2, 2026

ArtVsMark and others added 5 commits July 2, 2026 16:17

docs: add file placement rules to CONTRIBUTING.md (issue #26)

f99ee1e

docs(checkpoint): record issues #24/#25/#26 completion

421e287

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

ArtVsMark merged commit 311b5ee into main Jul 2, 2026
3 checks passed

ArtVsMark merged 20 commits into main from ArtVsMark-patch-1