
refactor: core/ restructuring, dedup, grader.py split, Sprints 6–8.1#22

Merged
ArtVsMark merged 20 commits into main from ArtVsMark-patch-1
Jul 2, 2026


@ArtVsMark ArtVsMark commented Jun 30, 2026


What was done

This PR fully closes epic #18 (issues #23, #19, #20, #21) and the entire CLAUDE.md backlog (Sprints 6, 7, 8.1), plus three new issues (#24, #25, #26) found while testing mode 4.


🏗️ #23 - Restructuring: internal modules → core/

  • Moved executor.py, normalizers.py, parsers.py, storage.py, stepik_client.py, oauth_flow.py, microbench_runner.py into core/ via git mv (history preserved)
  • Added core/__init__.py
  • Updated every import in all files and tests, including unittest.mock.patch target strings and subprocess self-invocation paths
  • Updated pyproject.toml, conftest.py, README.md, CONTRIBUTING.md

🔍 #19 - Parser dedup + import cycle + doc drift

  • Removed the duplicate _parse_testblock_file from grader.py (the single remaining copy lives in core/parsers.py)
  • downloader.py no longer imports grader.py at all, so the cycle is gone
  • CHANGELOG and CLAUDE.md brought in line with the actual state of the code

✂️ #20 - Splitting grader.py, codegen validation, ranking dedup

🧹 #21 - Low-priority cleanups

  • except Exception narrowed to specific exception types in core/microbench_runner.py
  • float(str(x or 0)) → float(x or 0) in core/stepik_client.py
  • cli.py coverage: 40% → 97% (18 new tests in test_cli.py)
  • Security assumption documentation updated

Sprint 6

  • 6.1: sys.executable instead of the hardcoded "python3"/"python" in core/executor.py
  • 6.2: normalizers.py: sort_lines/normalize_whitespace are formally exported via __all__ and explicitly marked as experimental
  • 6.3: new config.py backed by a [tool.stepik-grader] section in pyproject.toml

Sprint 7.2 / 7.3

  • BenchStats dataclass: deduplicates the stats computation shared by modes 3/4
  • run_microbench_with_timeout(): timeout guard for mode 4

Sprint 8.1

  • Full argparse CLI: --mode, --file, --dir, --repeats, --number, --version
  • The interactive menu is kept when the program is started without arguments
  • 12 new tests for the argparse functionality

🐛 #24 - Adaptive time formatting in mode 4

  • Added fmt_time(t: float) -> str to core/reporter.py, which picks the unit automatically (s / ms / µs / ns)
  • Applied to the Min / Median / Mean / Max / Std dev columns in modes 3 and 4
  • Fixed: values of roughly 1e-7 to 1e-6 s are no longer truncated to 0.0000
  • Added tests in test_formatters.py; fixed the width of _SEP

📊 #25 - Real memory measurement in mode 4 via tracemalloc

  • core/microbench_runner.py: bench_script now runs tracemalloc around timeit.repeat and returns peak_memory_kb via stdout
  • core/grader_core.py: run_microbench_mode() aggregates the peak across cases instead of the hardcoded 0.0
  • Column header renamed: Memory → Memory (tracemalloc, KB)
  • Added tests: a direct check of run_microbench() and of the aggregation in run_microbench_mode()

🏗️ #26 - Moving grader_core.py and reporter.py into core/ (continuation of #23)

  • git mv grader_core.py core/grader_core.py, git mv reporter.py core/reporter.py
  • Updated the cross-imports between the modules and the imports in grader.py (compat facade) and cli.py
  • Updated patch targets in 5 test files
  • Updated CLAUDE.md, README.md, CHANGELOG.md
  • The target core/ structure is now fully implemented

Final state

| Metric | Before | After |
| --- | --- | --- |
| Tests | 458 passed, 3 skipped | 520 passed, 3 skipped |
| Coverage | 87.71% | 95.21% |
| ruff check / ruff format | | |
| Modules in core/ | 0 | 9 (executor, normalizers, parsers, storage, stepik_client, oauth_flow, microbench_runner, grader_core, reporter) |

Closes #23
Closes #19
Closes #20
Closes #21
Closes #18
Closes #24
Closes #25
Closes #26

@ArtVsMark ArtVsMark closed this Jun 30, 2026
@ArtVsMark ArtVsMark reopened this Jun 30, 2026
ArtVsMark and others added 14 commits July 2, 2026 11:53
…rt (fix #19)

grader.py's local _parse_testblock_file had drifted into an exact duplicate
of core/parsers.py's parse_testblock_file. Both grader.py and downloader.py
now import the canonical function from core.parsers; downloader.py no
longer imports grader.py at all (the removed local import was only masking
the fact that both modules depended on logic that belonged in parsers.py).
Also corrects stale test-count references (355 -> 461) and updates the
DAG/structure diagrams in README.md and CLAUDE.md.

All 458 tests pass (3 skipped) at 87.58% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
#20 finding #5)

_build_function_wrapper() interpolated function_name and the solution
filename's stem directly into a generated Python wrapper script
(`from {module_stem} import {safe_func}`) without repr(), unlike the
path parameter next to it. A function_name or filename stem containing
a newline (e.g. from a crafted meta.json) could inject arbitrary
statements into the wrapper that then gets executed via subprocess.

Both values are now validated with str.isidentifier() before
interpolation, raising ValueError on failure. run_single_test() catches
that ValueError and reports it as a graceful RE verdict instead of
letting it crash the whole grading run.
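
The validation described above could look roughly like the sketch below. The names build_function_wrapper and wrapper_or_verdict, the exact wrapper text, and the "RE: ..." string are illustrative assumptions, not the project's actual code; only the str.isidentifier() check, the ValueError, and the repr() of the path come from the commit message.

```python
def build_function_wrapper(module_stem: str, function_name: str, solution_dir: str) -> str:
    """Return wrapper source, refusing names that are not plain identifiers."""
    for label, value in (("module stem", module_stem), ("function name", function_name)):
        # A newline, space, or semicolon makes isidentifier() return False,
        # so nothing but a bare name can reach the generated import line.
        if not value.isidentifier():
            raise ValueError(f"unsafe {label}: {value!r}")
    return (
        "import sys\n"
        f"sys.path.insert(0, {solution_dir!r})\n"  # repr() quotes the path safely
        f"from {module_stem} import {function_name}\n"
    )


def wrapper_or_verdict(module_stem: str, function_name: str) -> str:
    """Hypothetical caller: turn a rejected name into an RE verdict string."""
    try:
        return build_function_wrapper(module_stem, function_name, "/tmp/solutions")
    except ValueError as exc:
        return f"RE: {exc}"
```

Note that str.isidentifier() also accepts reserved words such as `class`; a wrapper built from one would fail with a SyntaxError in the child process rather than execute injected statements.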

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…grader.py (#20 finding #6)

grader.py had two identical inline blocks (mode 3's interactive-menu
benchmark ranking and mode 4's run_microbench_mode) that computed each
result's relative time and verdict via the same min-median-plus-_verdict()
pattern. Extracted this into apply_relative_ranking() in
core/microbench_runner.py, parameterized by threshold, so both call
sites now share one implementation instead of duplicating the loop.

grader.py's own _verdict() is left in place (still directly unit-tested)
since it's semantically identical logic, not a broken duplicate -- only
the two full inline blocks that repeated it were consolidated.
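
As a rough illustration of the shared min-median ranking, here is a minimal sketch. The dict keys ("median", "relative", "verdict"), the verdict labels, and the default threshold are assumptions made for the example; the real apply_relative_ranking() in core/microbench_runner.py may differ in all of them.

```python
def verdict(relative: float, threshold: float) -> str:
    """Hypothetical two-way verdict: within threshold of the fastest, or not."""
    return "similar" if relative <= threshold else "slower"


def apply_relative_ranking(results: list[dict], threshold: float = 1.1) -> list[dict]:
    """Annotate each result with its time relative to the fastest median."""
    medians = [r["median"] for r in results if r.get("median")]
    if not medians:
        return results  # nothing measurable, leave results untouched
    best = min(medians)
    for r in results:
        median = r.get("median")
        if median:
            r["relative"] = median / best
            r["verdict"] = verdict(r["relative"], threshold)
    return results
```

Because the threshold is a parameter, the mode 3 and mode 4 call sites can pass their own value while sharing the loop.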

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…finding #4)

grader.py (1460 lines) violated SRP by combining test-case loading,
solution execution, table/rich output, and CLI menu logic in one file.
Splits it into three modules per Sprint 7:

- grader_core.py: load_test_cases, run_mode detection, wrapper codegen,
  run_single_test/run_tests/run_benchmark/run_microbench_mode.
- reporter.py: the _console/_RICH rich-optional singleton, format_*/
  print_* table functions, _cprint, _print_case_verbose.
- cli.py: _interactive_menu, load-profile prompts, new main() entry point.

grader.py itself becomes an 8-statement backward-compatibility facade:
`from X import *` plus explicit re-exports of every private name and
non-__all__ public name (run_microbench, apply_relative_ranking) the test
suite references directly as grader.X, since `import *` skips
underscore-prefixed names.

Several tests patched grader._RICH/_console/Table/Text/run_tests/
run_single_test/run_microbench expecting to affect functions that now live
in reporter.py/grader_core.py/cli.py -- those functions read their own
module's globals at call time, not grader.py's re-exported copy, so the
old patch targets silently stopped having any effect (two cases actually
failed with AssertionError/KeyError; others degraded into testing the
wrong branch without erroring). Updated patch targets in
test_grader_coverage_gap.py, test_menu_modes.py, test_grader_extra.py, and
test_formatters.py to point at the module that actually owns each name.
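
The effect described above can be reproduced in isolation. The sketch below builds two throwaway modules (owner_mod and facade_mod are made-up names standing in for a core module and the grader.py facade) and shows that patching the facade's re-exported name leaves the owning module's function untouched.

```python
import sys
import types
from unittest.mock import patch

# "Owning" module: run() looks up helper in its own globals at call time.
owner = types.ModuleType("owner_mod")
exec(
    "def helper():\n"
    "    return 'real'\n"
    "\n"
    "def run():\n"
    "    return helper()\n",
    owner.__dict__,
)
sys.modules["owner_mod"] = owner

# Facade module: re-exports the same function objects under its own namespace.
facade = types.ModuleType("facade_mod")
facade.helper = owner.helper
facade.run = owner.run
sys.modules["facade_mod"] = facade

# Patching the facade only rebinds facade_mod.helper; run() never reads it.
with patch("facade_mod.helper", return_value="mocked"):
    via_facade_patch = facade.run()

# Patching the owning module changes the name run() actually resolves.
with patch("owner_mod.helper", return_value="mocked"):
    via_owner_patch = facade.run()
```

Here via_facade_patch stays 'real' while via_owner_patch becomes 'mocked', which is why the patch strings had to follow the functions to their new modules.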

Added a ruff per-file-ignore for grader.py (F401/F403/F405/I001) since
every import in the facade is an intentional re-export.

465 passed (3 skipped), 88.97% coverage; ruff check/format clean;
`echo 0 | python grader.py` smoke-tested end-to-end.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…li.py coverage (#21)

Closes the four low-priority findings from issue #21, and with them the
#18 tracker epic (issues #19/#20/#21 all done):

- core/microbench_runner.py: narrowed the broad `except Exception` around
  subprocess.run/float() parsing to `except (OSError, ValueError)`, the
  only two exception types that code path can actually raise.
- core/stepik_client.py: simplified three redundant
  float(str(x or default)) conversions to float(x or default) --
  float() already accepts int/float/str directly.
- tests/test_cli.py (new): covers _interactive_menu() branches left
  untested by the Sprint 7 split (mode 1-4 error paths, mode 3/4 happy
  paths, profile-prompt custom values). cli.py coverage 40% -> 97%,
  total project coverage 88.97% -> 95.48%.
- README.md: rewrote "Ограничения и безопасность" to state the no-sandbox
  threat model explicitly and fix stale module paths.

483 passed (3 skipped); ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…n cmd (Sprint 6.1)

"python3"/"python" can resolve to a system interpreter outside the active
venv on Windows, running solutions against the wrong Python. sys.executable
always points at the interpreter that launched grader, guaranteeing the
same venv and installed packages.
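
A minimal sketch of launching a child with the current interpreter; run_snippet is a hypothetical helper for illustration, not a function from core/executor.py.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> str:
    """Run code in a child process using the same interpreter as the parent."""
    completed = subprocess.run(
        [sys.executable, "-c", code],  # not "python"/"python3" resolved via PATH
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout.strip()


# The child reports which interpreter it runs under.
child_interpreter = run_snippet("import sys; print(sys.executable)")
```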

chore(normalizers): mark sort_lines/normalize_whitespace as experimental (Sprint 6.2)

Neither function is wired into grader_core.py yet (no UI option to compare
output order-insensitively or ignoring extra whitespace). Rather than
deleting fully-implemented, tested utilities or relocating them out of the
public module, added __all__ and explicit "experimental" docstring notes
so the module's public surface is accurate.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…r (Sprint 6.3)

Adds config.py: a frozen GraderConfig dataclass + load_config() + module-
level CONFIG singleton, reading overrides from [tool.stepik-grader] in
pyproject.toml (falls back to documented defaults if absent). Replaces the
hardcoded constant literals in grader_core.py (TIMEOUT_SECONDS, ENCODING,
SIMILAR_THRESHOLD, MUCH_SLOWER_THRESHOLD, MEASURE_CHILD_MEMORY,
MICROBENCH_MAX_CASES) with CONFIG-derived values -- same names, same
defaults, so grader.py's __all__ re-exports are unaffected.

core/executor.py's TIMEOUT also reads CONFIG.executor_timeout, wrapped in
try/except ImportError with a literal 10 fallback: when executor.py runs
as a subprocess script (python core/executor.py), sys.path[0] becomes
core/, not the project root, so a bare `from config import CONFIG` would
crash every function/main-mode test that shells out to it. Caught this via
the full test suite before committing (13 failures) rather than assuming
the import would just work.

490 passed (3 skipped), 95.44% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…meout helper (Sprint 7.2/7.3)

BenchStats (grader_core.py): unifies the min/median/mean/stdev/max
computation that run_benchmark() and _micro_stats() each independently
reimplemented. Both now build a BenchStats from their timings list and
read its properties -- same dict-shaped return values as before, so
reporter.py and existing callers/tests are unaffected. Added to
grader_core.__all__ (re-exported via grader.BenchStats).
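
One possible shape for such a dataclass is sketched below. The property names, the as_dict() helper, and the choice of sample standard deviation are assumptions for illustration; the commit only states that BenchStats is built from a timings list and exposes min/median/mean/stdev/max.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchStats:
    timings: list[float]

    @property
    def minimum(self) -> float:
        return min(self.timings)

    @property
    def maximum(self) -> float:
        return max(self.timings)

    @property
    def median(self) -> float:
        return statistics.median(self.timings)

    @property
    def mean(self) -> float:
        return statistics.fmean(self.timings)

    @property
    def stdev(self) -> float:
        # statistics.stdev needs at least two samples.
        return statistics.stdev(self.timings) if len(self.timings) > 1 else 0.0

    def as_dict(self) -> dict[str, float]:
        """Hypothetical helper keeping a dict-shaped result for existing callers."""
        return {
            "min": self.minimum,
            "median": self.median,
            "mean": self.mean,
            "stdev": self.stdev,
            "max": self.maximum,
        }
```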

run_microbench_with_timeout() (core/microbench_runner.py): runs an
arbitrary fn() in a single-worker ThreadPoolExecutor, returning [] if it
doesn't finish within timeout. Deliberately NOT wired into
run_microbench_mode(): run_microbench() already wraps its subprocess.run()
in timeout=60, which reliably kills the child and unblocks the caller --
layering a ThreadPoolExecutor on top of an already subprocess-bounded call
adds no protection, and on an actual timeout would abandon the worker
thread without killing whatever it was running. Kept available (with this
reasoning in its docstring) for a future fn() that isn't already
subprocess-bounded.
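
A minimal sketch of the thread-based guard and of the limitation the commit describes. run_with_timeout is a stand-in name; the real run_microbench_with_timeout() may be structured differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def run_with_timeout(fn: Callable[[], list[float]], timeout: float) -> list[float]:
    """Return fn()'s result, or [] if it does not finish within timeout seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # The worker thread is NOT killed here; it keeps running in the background.
        return []
    finally:
        # wait=False so a timed-out call does not block the caller on shutdown.
        executor.shutdown(wait=False)


def slow() -> list[float]:
    time.sleep(0.3)
    return [1.0]


fast_result = run_with_timeout(lambda: [0.1, 0.2], timeout=1.0)
slow_result = run_with_timeout(slow, timeout=0.05)
```

The abandoned worker in the timed-out case is exactly why this guard adds nothing on top of subprocess.run(timeout=...), which does terminate the child.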

497 passed (3 skipped), 95.53% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
python grader.py --mode {1,2,3,4} [--file PATH] [--dir PATH] [--repeats N]
[--number N], and python grader.py --version, alongside the existing
interactive menu (still the default when --mode is omitted).

Extracted each menu branch's body into standalone _run_mode_1/2/3/4()
functions with no logic changes -- both _interactive_menu() and the new
argparse dispatch in main() call the same code, so there's one
implementation per mode instead of two. main() now takes an explicit
argv: list[str] | None = None parameter instead of implicitly reading
sys.argv, so tests can pass argument lists directly rather than depending
on sys.argv (which contains pytest's own CLI flags during a test run).

__version__ moved from grader.py into cli.py (where --version needs it)
and re-exported back through grader.py's facade import -- the reverse
(cli.py importing from grader.py) would have created a cycle, since
grader.py already imports main from cli.py.

Verified end-to-end against a real solution/test-dir for all four modes.
Hit a pre-existing UnicodeEncodeError under Git Bash on Windows (console
defaults to cp1251) -- reproduced identically on the unchanged
interactive-menu path too, confirming it predates this change; not fixed
here (environment-specific, PYTHONIOENCODING=utf-8 resolves it).

509 passed (3 skipped), 95.30% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
All GitHub issues (#18 epic: #19/#20/#21, plus #23) and all of CLAUDE.md's
Sprint 6/7/8.1 backlog are now done. Only Sprint 8.2 (optional src/-layout,
gated on a PyPI-publishing decision) remains, deliberately not started.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
ArtVsMark and others added 5 commits July 2, 2026 16:17
Fixed: the :.4f (seconds) format truncated sub-millisecond timings in
modes 3/4 to "0.0000" -- e.g. a 150us call showed as zero, useless for
comparing fast solutions. Added fmt_time(t) to reporter.py, auto-selecting
s/ms/us/ns based on magnitude, applied to the min/median/mean/max/stdev
columns in format_benchmark_row() and print_benchmark_results()'s rich
branch. format_correctness_row() (modes 1/2) is untouched -- out of scope
per the issue.
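
A magnitude-based formatter along these lines could be sketched as below. The unit boundaries and the three-decimal precision are assumptions inferred from the "89.274 ms" and "150.000 ms" examples in this PR; the real fmt_time() in reporter.py may round or pad differently.

```python
def fmt_time(t: float) -> str:
    """Format a duration in seconds using the largest unit that keeps it >= 1."""
    if t >= 1.0:
        return f"{t:.3f} s"
    if t >= 1e-3:
        return f"{t * 1e3:.3f} ms"
    if t >= 1e-6:
        return f"{t * 1e6:.3f} us"
    return f"{t * 1e9:.3f} ns"
```

With this sketch, fmt_time(0.089274) gives "89.274 ms", where the old :.4f format printed "0.0893".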

Widened the plain-text column width (7 -> 10 chars) and rich table
min_width to fit unit-suffixed values ("150.000 ms"), and widened _SEP
from 92 to 107 to match. Confirmed via research pass that no existing
test asserts the exact :.4f-formatted digit string, so no other test
files needed updates.

517 passed (3 skipped), 95.19% coverage; ruff check/format clean.
Verified end-to-end: mode 3 benchmark on a real solution now shows
"89.274 ms" instead of the old "0.0893".

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Mode 4's Memory column always showed 0.00 because all 5 timeit.repeat
runs share a single subprocess -- mode 3's psutil-RSS-in-a-thread
approach can't attribute memory to one run within that.

core/microbench_runner.py: wrapped the timeit.repeat() call in
bench_script with tracemalloc.start()/get_traced_memory(); the peak is
printed as a distinct "MEM:<bytes>" line after the timing lines and
parsed separately, returned as peak_memory_mb (MB) on every return path
(success, timeout, OSError) so the key always exists. This measures
Python-heap peak, not process RSS -- doesn't see C-extension allocations,
documented in the module docstring.
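
The measurement and the "MEM:<bytes>" protocol could be sketched as follows. The helper names, the timing line format, and the bytes-to-MB conversion are assumptions; the sketch also runs in-process, whereas the commit describes a generated bench_script executed in a subprocess.

```python
import timeit
import tracemalloc


def bench_with_peak(stmt: str, repeat: int = 5, number: int = 10) -> tuple[list[float], int]:
    """Time stmt with timeit.repeat while tracemalloc records the Python-heap peak."""
    tracemalloc.start()
    try:
        timings = timeit.repeat(stmt, repeat=repeat, number=number)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return timings, peak


def render(timings: list[float], peak: int) -> str:
    """Emit one timing per line, then a distinct MEM:<bytes> line."""
    return "\n".join([*(f"{t:.9f}" for t in timings), f"MEM:{peak}"])


def parse(stdout: str) -> dict:
    """Split the child's output back into timings and a peak in MB."""
    timings: list[float] = []
    peak_bytes = 0
    for line in stdout.splitlines():
        if line.startswith("MEM:"):
            peak_bytes = int(line[4:])
        elif line.strip():
            timings.append(float(line))
    return {"timings": timings, "peak_memory_mb": peak_bytes / (1024 * 1024)}


result = parse(render(*bench_with_peak("[x * x for x in range(10000)]", repeat=3, number=2)))
```

Tracing adds overhead to every allocation, so timings taken with tracemalloc active are likely to be slower than untraced ones.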

grader_core.py: run_microbench_mode() no longer hardcodes
peak_memory_mb=0.0; tracks a running max across benchmarked cases per
solution like run_benchmark() does for mode 3. Function-call blocks
(run_single_test) already had real psutil memory; stdin blocks now get
the tracemalloc value.

Fixed a stale mock in test_menu_modes.py that returned a dict without
peak_memory_mb, which would now raise KeyError.

Verified end-to-end: a solution allocating a 500k-element list now shows
7.90 MB instead of 0.00.

520 passed (3 skipped), 95.21% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Continuation of the Issue #23 restructuring: relocated grader_core.py and
reporter.py into core/ (via git mv, history preserved) so all internal
(non-entry-point) modules live under core/. Only grader.py, cli.py,
config.py, downloader.py, and diagnostik_stepik.py remain at the project
root.

Updated the cross-import between the two moved modules
(core/grader_core.py's `from reporter import _print_case_verbose` and
core/reporter.py's TYPE_CHECKING-only `from grader_core import TestCase`),
grader.py's facade imports, and cli.py's imports to core.grader_core /
core.reporter.

Updated tests that imported these modules directly (bypassing the
grader.py facade, which itself was unaffected): `import grader_core` /
`import reporter` -> `from core import grader_core` / `from core import
reporter`, and unittest.mock.patch("reporter.X", ...) string targets ->
"core.reporter.X" (13 occurrences in test_grader_coverage_gap.py).

Ran an exhaustive grep audit for every import/patch reference before
editing -- all 520 tests passed on the first run after the move, no
follow-up fixes needed. Verified end-to-end: --version, interactive menu,
and --mode 1 against a real solution all still work.

520 passed (3 skipped), 95.21% coverage; ruff check/format clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
@ArtVsMark


📝 Session 02.07.2026: issues #24, #25, #26 closed

The ArtVsMark-patch-1 branch received 4 more commits. All three new issues are implemented, verified end-to-end, and closed in the PR body.

#24 - fix: adaptive time formatting in modes 3/4

Added fmt_time() to core/reporter.py, which picks the unit automatically (s/ms/µs/ns). Fixed the truncation of values around 1e-7 s to 0.0000 (.4f). Smoke test: 89.274 ms instead of 0.0893.

#25 - feat: real memory measurement in mode 4 via tracemalloc

run_microbench() now wraps timeit.repeat in tracemalloc and returns peak_memory_kb. run_microbench_mode() aggregates the peak instead of the hardcoded 0.0. Smoke test: 7.90 MB for a solution with [x**2 for x in range(100000)].

#26 - refactor: grader_core.py and reporter.py → core/

git mv for both files. Updated the cross-imports, grader.py, cli.py, and the patch targets in 5 test files. The core/ structure now fully matches CONTRIBUTING.md.


Branch state: 520 passed, 3 skipped; 95.21% coverage; ruff ✅

➡️ Required: git push origin ArtVsMark-patch-1 from the local machine, then merge the PR.

@ArtVsMark ArtVsMark merged commit 311b5ee into main Jul 2, 2026
3 checks passed