refactor: core/ restructuring, dedup, grader.py split, Sprints 6–8.1 #22
…rt (fix #19) grader.py's local _parse_testblock_file had drifted into an exact duplicate of core/parsers.py's parse_testblock_file. Both grader.py and downloader.py now import the canonical function from core.parsers; downloader.py no longer imports grader.py at all (the removed local import was only masking the fact that both modules depended on logic that belonged in parsers.py). Also corrects stale test-count references (355 -> 461) and updates the DAG/structure diagrams in README.md and CLAUDE.md. All 458 tests pass (3 skipped) at 87.58% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
#20 finding #5) _build_function_wrapper() interpolated function_name and the solution filename's stem directly into a generated Python wrapper script (`from {module_stem} import {safe_func}`) without repr(), unlike the path parameter next to it. A function_name or filename stem containing a newline (e.g. from a crafted meta.json) could inject arbitrary statements into the wrapper that then gets executed via subprocess. Both values are now validated with str.isidentifier() before interpolation, raising ValueError on failure. run_single_test() catches that ValueError and reports it as a graceful RE verdict instead of letting it crash the whole grading run. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
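The validation described above can be sketched as follows. This is a minimal illustration, not the project's `_build_function_wrapper()`: the helper name `build_import_line` is hypothetical, and the extra `keyword.iskeyword()` check is an assumption added here, not something the commit mentions.

```python
import keyword


def build_import_line(module_stem: str, function_name: str) -> str:
    """Return the wrapper's import statement, refusing unsafe names."""
    for label, value in (("module stem", module_stem), ("function name", function_name)):
        # str.isidentifier() rejects newlines, spaces, dots, quotes and semicolons,
        # so nothing beyond a bare name can reach the generated source.
        # The keyword check is an extra assumption of this sketch.
        if not value.isidentifier() or keyword.iskeyword(value):
            raise ValueError(f"unsafe {label}: {value!r}")
    return f"from {module_stem} import {function_name}"


print(build_import_line("solution", "solve"))
try:
    build_import_line("solution", "solve\nimport os")
except ValueError as exc:
    print("rejected:", exc)
```

A caller in the position of `run_single_test()` would catch the `ValueError` and turn it into an RE verdict rather than letting it propagate.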
…grader.py (#20 finding #6) grader.py had two identical inline blocks (mode 3's interactive-menu benchmark ranking and mode 4's run_microbench_mode) that computed each result's relative time and verdict via the same min-median-plus-_verdict() pattern. Extracted this into apply_relative_ranking() in core/microbench_runner.py, parameterized by threshold, so both call sites now share one implementation instead of duplicating the loop. grader.py's own _verdict() is left in place (still directly unit-tested) since it's semantically identical logic, not a broken duplicate -- only the two full inline blocks that repeated it were consolidated. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
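A rough sketch of the shared ranking step, under stated assumptions: the result dict keys (`median`, `relative`, `verdict`) and the verdict labels are hypothetical, and the real `apply_relative_ranking()` delegates to `_verdict()` rather than inlining the comparison.

```python
def apply_relative_ranking(results: list[dict], threshold: float) -> list[dict]:
    """Annotate each result in place with 'relative' and 'verdict' keys."""
    if not results:
        return results
    # The fastest solution is the one with the smallest median time.
    best = min(r["median"] for r in results)
    for r in results:
        r["relative"] = r["median"] / best if best > 0 else 1.0
        if r["median"] == best:
            r["verdict"] = "fastest"
        elif r["relative"] <= threshold:
            r["verdict"] = "similar"
        else:
            r["verdict"] = "slower"
    return results


ranked = apply_relative_ranking(
    [{"median": 0.010}, {"median": 0.012}, {"median": 0.030}], threshold=1.5
)
print([r["verdict"] for r in ranked])
```

Passing `threshold` in as a parameter is what lets both call sites share the loop while keeping their own cut-offs.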
…finding #4)

grader.py (1460 lines) violated SRP by combining test-case loading, solution execution, table/rich output, and CLI menu logic in one file. Splits it into three modules per Sprint 7:
- grader_core.py: load_test_cases, run_mode detection, wrapper codegen, run_single_test/run_tests/run_benchmark/run_microbench_mode.
- reporter.py: the _console/_RICH rich-optional singleton, format_*/print_* table functions, _cprint, _print_case_verbose.
- cli.py: _interactive_menu, load-profile prompts, new main() entry point.

grader.py itself becomes an 8-statement backward-compatibility facade: `from X import *` plus explicit re-exports of every private name and non-__all__ public name (run_microbench, apply_relative_ranking) the test suite references directly as grader.X, since `import *` skips underscore-prefixed names.

Several tests patched grader._RICH/_console/Table/Text/run_tests/run_single_test/run_microbench expecting to affect functions that now live in reporter.py/grader_core.py/cli.py -- those functions read their own module's globals at call time, not grader.py's re-exported copy, so the old patch targets silently stopped having any effect (two cases actually failed with AssertionError/KeyError; others degraded into testing the wrong branch without erroring). Updated patch targets in test_grader_coverage_gap.py, test_menu_modes.py, test_grader_extra.py, and test_formatters.py to point at the module that actually owns each name.

Added a ruff per-file-ignore for grader.py (F401/F403/F405/I001) since every import in the facade is an intentional re-export.

465 passed (3 skipped), 88.97% coverage; ruff check/format clean; `echo 0 | python grader.py` smoke-tested end-to-end.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
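The patch-target problem above can be reproduced in isolation. The sketch below builds two throwaway modules (`owner_mod` and `facade_mod`, both hypothetical names) instead of touching the real grader.py/reporter.py, and shows that patching the facade's re-exported copy has no effect on a function that looks the name up in its own module.

```python
import sys
import types
from unittest import mock

# owner_mod plays the role of reporter.py / grader_core.py: it owns the names.
owner = types.ModuleType("owner_mod")
exec(
    "def helper():\n"
    "    return 'real'\n"
    "\n"
    "def run():\n"
    "    return helper()  # resolved in owner_mod's globals at call time\n",
    owner.__dict__,
)

# facade_mod plays the role of grader.py: it only re-exports.
facade = types.ModuleType("facade_mod")
facade.helper = owner.helper
facade.run = owner.run

sys.modules["owner_mod"] = owner
sys.modules["facade_mod"] = facade

with mock.patch("facade_mod.helper", return_value="mocked"):
    via_facade = facade.run()  # patch replaced only the facade's copy

with mock.patch("owner_mod.helper", return_value="mocked"):
    via_owner = facade.run()  # patch replaced the name run() actually reads

print(via_facade, via_owner)
```

The first patch succeeds without error, which is why the stale targets in the test suite mostly degraded silently instead of failing.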
…li.py coverage (#21)

Closes the four low-priority findings from issue #21, and with them the #18 tracker epic (issues #19/#20/#21 all done):
- core/microbench_runner.py: narrowed the broad `except Exception` around subprocess.run/float() parsing to `except (OSError, ValueError)`, the only two exception types that code path can actually raise.
- core/stepik_client.py: simplified three redundant float(str(x or default)) conversions to float(x or default) -- float() already accepts int/float/str directly.
- tests/test_cli.py (new): covers _interactive_menu() branches left untested by the Sprint 7 split (mode 1-4 error paths, mode 3/4 happy paths, profile-prompt custom values). cli.py coverage 40% -> 97%, total project coverage 88.97% -> 95.48%.
- README.md: rewrote "Ограничения и безопасность" ("Limitations and security") to state the no-sandbox threat model explicitly and fix stale module paths.

483 passed (3 skipped); ruff check/format clean.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
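For illustration, the narrowed handler might look roughly like this. The function name `read_timings` and the one-float-per-line output format are assumptions of this sketch, and the separate `TimeoutExpired` branch is likewise assumed (a timeout is not an `OSError`, so it needs its own path).

```python
import subprocess
import sys


def read_timings(cmd: list[str], timeout: float = 60.0) -> list[float]:
    """Run cmd and parse one float per stdout line; return [] on expected failures."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return [float(line) for line in proc.stdout.splitlines() if line.strip()]
    except subprocess.TimeoutExpired:
        # Assumption: timeouts are handled on their own path.
        return []
    except (OSError, ValueError):
        # OSError: the command could not be launched.
        # ValueError: a stdout line was not a valid float.
        return []


print(read_timings([sys.executable, "-c", "print(0.5); print(1.25)"]))
```

Anything outside these types (a `TypeError` from a bad argument, say) now surfaces as a real traceback instead of being swallowed.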
…n cmd (Sprint 6.1)

"python3"/"python" can resolve to a system interpreter outside the active venv on Windows, running solutions against the wrong Python. sys.executable always points at the interpreter that launched grader, guaranteeing the same venv and installed packages.

chore(normalizers): mark sort_lines/normalize_whitespace as experimental (Sprint 6.2)

Neither function is wired into grader_core.py yet (no UI option to compare output order-insensitively or ignoring extra whitespace). Rather than deleting fully-implemented, tested utilities or relocating them out of the public module, added __all__ and explicit "experimental" docstring notes so the module's public surface is accurate.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
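A small check of the Sprint 6.1 reasoning: a child started through `sys.executable` should report the same `sys.prefix` (and therefore the same venv) as the parent. The helper name `child_prefix` is hypothetical and exists only for this demonstration.

```python
import subprocess
import sys


def child_prefix() -> str:
    """Return sys.prefix as seen by a child started via sys.executable."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_prefix() == sys.prefix)
```

With a bare `"python"` in place of `sys.executable`, the result depends on PATH resolution, which is the failure mode the commit describes.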
…r (Sprint 6.3)

Adds config.py: a frozen GraderConfig dataclass + load_config() + module-level CONFIG singleton, reading overrides from [tool.stepik-grader] in pyproject.toml (falls back to documented defaults if absent). Replaces the hardcoded constant literals in grader_core.py (TIMEOUT_SECONDS, ENCODING, SIMILAR_THRESHOLD, MUCH_SLOWER_THRESHOLD, MEASURE_CHILD_MEMORY, MICROBENCH_MAX_CASES) with CONFIG-derived values -- same names, same defaults, so grader.py's __all__ re-exports are unaffected.

core/executor.py's TIMEOUT also reads CONFIG.executor_timeout, wrapped in try/except ImportError with a literal 10 fallback: when executor.py runs as a subprocess script (python core/executor.py), sys.path[0] becomes core/, not the project root, so a bare `from config import CONFIG` would crash every function/main-mode test that shells out to it. Caught this via the full test suite before committing (13 failures) rather than assuming the import would just work.

490 passed (3 skipped), 95.44% coverage; ruff check/format clean.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
…meout helper (Sprint 7.2/7.3)

BenchStats (grader_core.py): unifies the min/median/mean/stdev/max computation that run_benchmark() and _micro_stats() each independently reimplemented. Both now build a BenchStats from their timings list and read its properties -- same dict-shaped return values as before, so reporter.py and existing callers/tests are unaffected. Added to grader_core.__all__ (re-exported via grader.BenchStats).

run_microbench_with_timeout() (core/microbench_runner.py): runs an arbitrary fn() in a single-worker ThreadPoolExecutor, returning [] if it doesn't finish within timeout. Deliberately NOT wired into run_microbench_mode(): run_microbench() already wraps its subprocess.run() in timeout=60, which reliably kills the child and unblocks the caller -- layering a ThreadPoolExecutor on top of an already subprocess-bounded call adds no protection, and on an actual timeout would abandon the worker thread without killing whatever it was running. Kept available (with this reasoning in its docstring) for a future fn() that isn't already subprocess-bounded.

497 passed (3 skipped), 95.53% coverage; ruff check/format clean.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
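One possible shape for the shared stats holder, as a sketch only: the property names (`minimum`, `maximum`), the `as_dict()` helper and its keys are assumptions, and the single-sample `stdev` fallback of 0.0 is a choice made here, not something the commit states.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchStats:
    """Derives summary statistics from one list of timings (in seconds)."""

    timings: tuple[float, ...]

    @property
    def minimum(self) -> float:
        return min(self.timings)

    @property
    def maximum(self) -> float:
        return max(self.timings)

    @property
    def median(self) -> float:
        return statistics.median(self.timings)

    @property
    def mean(self) -> float:
        return statistics.mean(self.timings)

    @property
    def stdev(self) -> float:
        # statistics.stdev() needs at least two samples.
        return statistics.stdev(self.timings) if len(self.timings) > 1 else 0.0

    def as_dict(self) -> dict[str, float]:
        # Hypothetical dict shape for callers that expect plain keys.
        return {
            "min": self.minimum,
            "median": self.median,
            "mean": self.mean,
            "stdev": self.stdev,
            "max": self.maximum,
        }


stats = BenchStats((0.1, 0.2, 0.3))
print(stats.as_dict())
```

Keeping a dict-returning method alongside the properties is one way to leave existing dict-based callers untouched, as the commit says reporter.py was.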
python grader.py --mode {1,2,3,4} [--file PATH] [--dir PATH] [--repeats N]
[--number N], and python grader.py --version, alongside the existing
interactive menu (still the default when --mode is omitted).
Extracted each menu branch's body into standalone _run_mode_1/2/3/4()
functions with no logic changes -- both _interactive_menu() and the new
argparse dispatch in main() call the same code, so there's one
implementation per mode instead of two. main() now takes an explicit
argv: list[str] | None = None parameter instead of implicitly reading
sys.argv, so tests can pass argument lists directly rather than depending
on sys.argv (which contains pytest's own CLI flags during a test run).
__version__ moved from grader.py into cli.py (where --version needs it)
and re-exported back through grader.py's facade import -- the reverse
(cli.py importing from grader.py) would have created a cycle, since
grader.py already imports main from cli.py.
Verified end-to-end against a real solution/test-dir for all four modes.
Hit a pre-existing UnicodeEncodeError under Git Bash on Windows (console
defaults to cp1251) -- reproduced identically on the unchanged
interactive-menu path too, confirming it predates this change; not fixed
here (environment-specific, PYTHONIOENCODING=utf-8 resolves it).
509 passed (3 skipped), 95.30% coverage; ruff check/format clean.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Fixed :.4f (seconds) formatting truncated sub-millisecond timings in
modes 3/4 to "0.0000" -- e.g. a 150us call showed as zero, useless for
comparing fast solutions. Added fmt_time(t) to reporter.py, auto-selecting
s/ms/us/ns based on magnitude, applied to the min/median/mean/max/stdev
columns in format_benchmark_row() and print_benchmark_results()'s rich
branch. format_correctness_row() (modes 1/2) is untouched -- out of scope
per the issue.
Widened the plain-text column width (7 -> 10 chars) and rich table
min_width to fit unit-suffixed values ("150.000 ms"), and widened _SEP
from 92 to 107 to match. Confirmed via research pass that no existing
test asserts the exact :.4f-formatted digit string, so no other test
files needed updates.
517 passed (3 skipped), 95.19% coverage; ruff check/format clean.
Verified end-to-end: mode 3 benchmark on a real solution now shows
"89.274 ms" instead of the old "0.0893".
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
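A minimal version of the unit auto-selection, assuming three decimal places and the thresholds shown; the actual `fmt_time()` in reporter.py may choose boundaries or precision differently.

```python
def fmt_time(t: float) -> str:
    """Format a duration in seconds using the largest unit that keeps it >= 1."""
    if t >= 1.0:
        return f"{t:.3f} s"
    if t >= 1e-3:
        return f"{t * 1e3:.3f} ms"
    if t >= 1e-6:
        return f"{t * 1e6:.3f} us"
    return f"{t * 1e9:.3f} ns"


for seconds in (2.5, 0.089274, 0.000042):
    # Old fixed-width seconds formatting next to the adaptive one.
    print(f"{seconds:.4f}", "->", fmt_time(seconds))
```

The 42 us value prints as `0.0000` under the old `:.4f` formatting, which is the loss of information the change addresses.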
Mode 4's Memory column always showed 0.00 because all 5 timeit.repeat runs share a single subprocess -- mode 3's psutil-RSS-in-a-thread approach can't attribute memory to one run within that. core/microbench_runner.py: wrapped the timeit.repeat() call in bench_script with tracemalloc.start()/get_traced_memory(); the peak is printed as a distinct "MEM:<bytes>" line after the timing lines and parsed separately, returned as peak_memory_mb (MB) on every return path (success, timeout, OSError) so the key always exists. This measures Python-heap peak, not process RSS -- doesn't see C-extension allocations, documented in the module docstring. grader_core.py: run_microbench_mode() no longer hardcodes peak_memory_mb=0.0; tracks a running max across benchmarked cases per solution like run_benchmark() does for mode 3. Function-call blocks (run_single_test) already had real psutil memory; stdin blocks now get the tracemalloc value. Fixed a stale mock in test_menu_modes.py that returned a dict without peak_memory_mb, which would now raise KeyError. Verified end-to-end: a solution allocating a 500k-element list now shows 7.90 MB instead of 0.00. 520 passed (3 skipped), 95.21% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
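The measurement and the `MEM:<bytes>` line protocol can be sketched in-process as below. This is not the generated `bench_script`: the function names are hypothetical, the benchmark runs in the current process rather than a subprocess, and the bytes-to-MB divisor (1024 * 1024) is an assumption.

```python
import timeit
import tracemalloc


def bench_with_memory(stmt, repeat: int = 5, number: int = 1) -> tuple[list[float], int]:
    """Time stmt with timeit.repeat while tracemalloc records the Python-heap peak."""
    tracemalloc.start()
    try:
        timings = timeit.repeat(stmt, repeat=repeat, number=number)
        _current, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return timings, peak_bytes


def render_output(timings: list[float], peak_bytes: int) -> str:
    """One timing per line, then a distinct MEM:<bytes> line."""
    return "\n".join([*(repr(t) for t in timings), f"MEM:{peak_bytes}"])


def parse_output(text: str) -> tuple[list[float], float]:
    """Split timing lines from the MEM line; peak is returned in MB."""
    timings: list[float] = []
    peak_mb = 0.0  # key always has a value, even if no MEM line is present
    for line in text.splitlines():
        if line.startswith("MEM:"):
            peak_mb = int(line[4:]) / (1024 * 1024)
        elif line.strip():
            timings.append(float(line))
    return timings, peak_mb


timings, peak = bench_with_memory(lambda: [0] * 100_000, repeat=3)
print(parse_output(render_output(timings, peak)))
```

Note that tracemalloc itself slows allocation-heavy code, so timings taken while it is active are not directly comparable with untraced runs.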
Continuation of the Issue #23 restructuring: relocated grader_core.py and reporter.py into core/ (via git mv, history preserved) so all internal (non-entry-point) modules live under core/. Only grader.py, cli.py, config.py, downloader.py, and diagnostik_stepik.py remain at the project root. Updated the cross-import between the two moved modules (core/grader_core.py's `from reporter import _print_case_verbose` and core/reporter.py's TYPE_CHECKING-only `from grader_core import TestCase`), grader.py's facade imports, and cli.py's imports to core.grader_core / core.reporter. Updated tests that imported these modules directly (bypassing the grader.py facade, which itself was unaffected): `import grader_core` / `import reporter` -> `from core import grader_core` / `from core import reporter`, and unittest.mock.patch("reporter.X", ...) string targets -> "core.reporter.X" (13 occurrences in test_grader_coverage_gap.py). Ran an exhaustive grep audit for every import/patch reference before editing -- all 520 tests passed on the first run after the move, no follow-up fixes needed. Verified end-to-end: --version, interactive menu, and --mode 1 against a real solution all still work. 520 passed (3 skipped), 95.21% coverage; ruff check/format clean. Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
📝 Session 02.07.2026: issues #24, #25, #26 closed
Branch
#24: fix: adaptive time formatting in modes 3/4
Added
#25: feat: real memory measurement in mode 4 via `tracemalloc`
What was done
This PR fully closes epic #18 (issues #23 → #19 → #20 → #21) and the entire CLAUDE.md backlog (Sprints 6, 7, 8.1), plus three new issues (#24, #25, #26) found while testing mode 4.

🏗️ #23: Restructuring: internal modules → `core/`
- `executor.py`, `normalizers.py`, `parsers.py`, `storage.py`, `stepik_client.py`, `oauth_flow.py`, `microbench_runner.py` moved into `core/` via `git mv` (history preserved)
- `core/__init__.py`
- `import` statements in all files and tests, including `unittest.mock.patch` strings and subprocess self-invocation paths
- `pyproject.toml`, `conftest.py`, `README.md`, `CONTRIBUTING.md`

🔍 #19: Parser dedup + import cycle + doc drift
- `_parse_testblock_file` removed from `grader.py` (the only remaining copy is in `core/parsers.py`)
- `downloader.py` no longer imports `grader.py` at all; the cycle is eliminated

✂️ #20: `grader.py` split, codegen validation, ranking dedup
- `core/microbench_runner.apply_relative_ranking()`, 3 call sites consolidated
- `grader.py` (~1460 lines) split into `grader_core.py` / `reporter.py` / `cli.py`; `grader.py` is now a thin 8-line compat facade; all patch targets in tests updated

🧹 #21: Low-priority cleanups
- `except Exception` narrowed to specific types in `core/microbench_runner.py`
- `float(str(x or 0))` → `float(x or 0)` in `core/stepik_client.py`
- `cli.py` coverage: 40% → 97% (18 new tests in `test_cli.py`)

Sprint 6
- `sys.executable` instead of the hardcoded `"python3"`/`"python"` in `core/executor.py`
- `normalizers.py`: `sort_lines`/`normalize_whitespace` formally exported via `__all__` with an explicit experimental note
- `config.py` with a `[tool.stepik-grader]` section in `pyproject.toml`

Sprint 7.2 / 7.3
- `BenchStats` dataclass: deduplicates the stats computation between modes 3/4
- `run_microbench_with_timeout()`: timeout guard for mode 4

Sprint 8.1
- `--mode`, `--file`, `--dir`, `--repeats`, `--number`, `--version`

🐛 #24: Adaptive time formatting in mode 4
- `fmt_time(t: float) -> str` in `core/reporter.py` with automatic unit selection (s / ms / µs / ns)
- sub-millisecond timings no longer shown as `0.0000`
- `test_formatters.py`; `_SEP` width fixed

📊 #25: Real memory measurement in mode 4 via `tracemalloc`
- `core/microbench_runner.py`: `bench_script` now runs `tracemalloc` around `timeit.repeat` and returns `peak_memory_kb` via stdout
- `core/grader_core.py`: `run_microbench_mode()` aggregates the peak across cases instead of the hardcoded `0.0`
- `Memory` → `Memory (tracemalloc, KB)`
- `run_microbench()` and the aggregation in `run_microbench_mode()`

🏗️ #26: Moving `grader_core.py` and `reporter.py` into `core/` (continuation of #23)
- `git mv grader_core.py core/grader_core.py`, `git mv reporter.py core/reporter.py`
- `grader.py` (compat facade) and `cli.py`
- `CLAUDE.md`, `README.md`, `CHANGELOG.md`
- `core/` fully implemented

Final state
- `ruff check` / `ruff format`
- `core/` (`executor`, `normalizers`, `parsers`, `storage`, `stepik_client`, `oauth_flow`, `microbench_runner`, `grader_core`, `reporter`)

Closes #23
Closes #19
Closes #20
Closes #21
Closes #18
Closes #24
Closes #25
Closes #26