
soluchi07/ADE-project


Adverse Drug Event Detection with n2c2, SIDER, and LLM Evaluation

This repository contains two related workflows for adverse drug event (ADE) extraction and evaluation:

  1. A classical preprocessing pipeline that extracts ADE-drug pairs from n2c2 annotations, normalizes terms, links them to SIDER, and filters validated versus potentially novel pairs.
  2. An LLM-based inference and evaluation workflow that runs ADE detection on either annotation files or parquet context windows and compares performance with and without SIDER context.

The current project state is centered on the parquet-window LLM workflow and its full-batch evaluation on the n2c2 test set.


Current State

As of 2026-03-14, the repository contains completed full-batch parquet inference outputs and evaluation artifacts for paired SIDER and no-SIDER runs across two run families (full and 2pred).

Current status:

  • Full parquet-window inference completed for both SIDER-enabled and no-SIDER prompts.
  • Full-batch comparison completed on all 202 test files.
  • Confidence-threshold sweep completed for 0.30, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, and 0.80.
  • Best tested SIDER operating region is the 0.75-0.80 plateau with overall F1 0.5140.
  • Without thresholding, the no-SIDER run performs better overall; SIDER overtakes once min-confidence >= 0.55.

Key current artifacts:

  • data/outputs/llm_predictions_parquet_full_parquet_sider.csv
  • data/outputs/llm_predictions_parquet_full_parquet_no_sider.csv
  • data/outputs/full_batch_comparison_eval.txt
  • findings/eval.md

Repository Layout

ade-project/
├── README.md
├── requirements.txt
├── example.env
├── drug_atc.tsv
├── data/
│   ├── n2c2/
│   │   ├── raw/
│   │   │   ├── train/
│   │   │   ├── test/
│   │   │   ├── test_txts/
│   │   │   └── entity_dataset_w_3_sentence_grouping.parquet
│   │   └── processed/
│   │       ├── ade_drug_relations.csv
│   │       ├── n2c2_clean.csv
│   │       ├── n2c2_entities.csv
│   │       ├── n2c2_relations.csv
│   │       ├── n2c2_with_sider_context.csv
│   │       ├── potential_novel_ade_pairs.csv
│   │       └── validated_ade_drug_pairs.csv
│   ├── sider/
│   │   ├── raw/
│   │   │   ├── drug_names.tsv
│   │   │   └── meddra_all_se.tsv
│   │   └── processed/
│   │       └── sider_clean.csv
│   └── outputs/
│       ├── evaluation_gold_truth_ade_drug.csv
│       ├── evaluation_selected_test_ids.csv
│       ├── full_batch_comparison_eval.txt
│       ├── full_batch_comparison_eval_thr_07.txt
│       ├── full_batch_comparison_eval_thr_08.txt
│       ├── full_batch_comparison_eval_2pred_thr_080.txt
│       ├── full_batch_comparison_eval_2pred_thr_090.txt
│       ├── llm_predictions_parquet_2pred_parquet_full_no_sider.csv
│       ├── llm_predictions_parquet_2pred_parquet_full_sider.csv
│       ├── llm_predictions_parquet_full_parquet_no_sider.csv
│       └── llm_predictions_parquet_full_parquet_sider.csv
├── findings/
│   ├── eval.md
│   ├── OPENROUTER_TEST_RESULTS.md
│   └── overview.txt
├── notebooks/
│   └── 01_extract_n2c2_entities.ipynb
└── scripts/
    ├── config.py
    ├── estimate_costs.py
    ├── evaluate_results.py
    ├── extract_n2c2_entities.py
    ├── filter_validate.py
    ├── jsonl_to_csv.py
    ├── link_sider.py
    ├── llm_ade_detection.py
    ├── normalize_terms.py
    └── run_pipeline.py

Data Requirements

You need both of the following datasets available locally:

  • n2c2 ADE extraction dataset in data/n2c2/raw/
  • SIDER raw files in data/sider/raw/

Required raw files:

  • data/n2c2/raw/train/*.ann
  • data/n2c2/raw/train/*.txt
  • data/n2c2/raw/test/*.ann
  • data/n2c2/raw/test/*.txt
  • data/n2c2/raw/test_txts/*.txt
  • data/n2c2/raw/entity_dataset_w_3_sentence_grouping.parquet
  • data/sider/raw/drug_names.tsv
  • data/sider/raw/meddra_all_se.tsv

The parquet file is required for the current LLM workflow.
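Before running anything, you can confirm that the inputs are in place by globbing for each required path. The sketch below uses a hypothetical helper (`missing_inputs` is not part of the repository) that mirrors the list above and reports every pattern with no matching file. Run it from the repository root.

```python
from pathlib import Path

# Glob patterns copied from the "Required raw files" list in this README.
REQUIRED_PATTERNS = [
    "data/n2c2/raw/train/*.ann",
    "data/n2c2/raw/train/*.txt",
    "data/n2c2/raw/test/*.ann",
    "data/n2c2/raw/test/*.txt",
    "data/n2c2/raw/test_txts/*.txt",
    "data/n2c2/raw/entity_dataset_w_3_sentence_grouping.parquet",
    "data/sider/raw/drug_names.tsv",
    "data/sider/raw/meddra_all_se.tsv",
]


def missing_inputs(root: Path) -> list[str]:
    """Return the patterns that match no file under `root` (hypothetical helper)."""
    return [pattern for pattern in REQUIRED_PATTERNS if not any(root.glob(pattern))]
```

For example, `missing_inputs(Path("."))` returns an empty list when every input is present.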


Environment Setup

Python

The project is currently run in a local virtual environment with Python 3.14.3.

Create and activate a virtual environment:

Windows PowerShell

python -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

Git Bash on Windows

python -m venv .venv
source .venv/Scripts/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Required packages

Core dependencies are listed in requirements.txt and include:

  • pandas
  • numpy
  • pyarrow
  • fastparquet
  • rapidfuzz
  • drug_named_entity_recognition

Notes:

  • drug_named_entity_recognition is used for normalization where available; parts of the code fall back to simpler normalization if it is unavailable.

Environment variables

The LLM workflow uses OpenRouter and expects an API key in a local .env file.

Copy the sample file and set your key:

cp example.env .env

Then edit .env so that it contains your key and the endpoint URL:

OPENROUTER_API_KEY=your_key_here
OPENROUTER_API_URL=https://openrouter.ai/api/v1/chat/completions

llm_ade_detection.py reads .env directly at runtime.
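A minimal .env reader needs only the standard library. The sketch below shows that kind of loader; it is not the code in llm_ade_detection.py. `load_dotenv_file` is a hypothetical name, and the sketch assumes plain `KEY=VALUE` lines. The python-dotenv package's `load_dotenv` is a common off-the-shelf alternative.

```python
import os
from pathlib import Path


def load_dotenv_file(path: Path, override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them into os.environ (hypothetical helper).

    Blank lines, '#' comments, and lines without '=' are skipped. Surrounding quotes on
    values are stripped. Existing environment variables win unless `override` is True.
    """
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        key, value = key.strip(), value.strip().strip('"').strip("'")
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Calling `load_dotenv_file(Path(".env"))` and then checking `os.environ.get("OPENROUTER_API_KEY")` produces an explicit error at startup, before any API call is attempted.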


Classical Pipeline

The classical preprocessing pipeline operates on n2c2 annotations and SIDER tables.
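n2c2 .ann files use the brat standoff format. `T` lines hold entities (type, character offsets, surface text), and `R` lines hold relations between entity IDs. The sketch below shows how ADE-drug pairs can be read from that format. It assumes the n2c2 2018 Track 2 conventions (relation type `ADE-Drug`, entity labels `Drug` and `ADE`). `parse_ann` and `ade_drug_pairs` are hypothetical names, not the functions in extract_n2c2_entities.py. Relation arguments are resolved by entity label rather than by Arg1/Arg2 position, so the sketch does not depend on argument order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    ident: str
    label: str
    text: str


def parse_ann(ann_text: str) -> tuple[dict[str, Entity], list[tuple[str, str, str]]]:
    """Split brat standoff lines into entities (T...) and relations (R...)."""
    entities: dict[str, Entity] = {}
    relations: list[tuple[str, str, str]] = []
    for line in ann_text.splitlines():
        if line.startswith("T"):
            parts = line.split("\t", 2)
            if len(parts) < 3:
                continue  # skip malformed entity lines
            ident, meta, text = parts
            label = meta.split(" ", 1)[0]  # "ADE 10 15;20 25" -> "ADE" (handles discontinuous spans)
            entities[ident] = Entity(ident, label, text.strip())
        elif line.startswith("R"):
            _, meta = line.split("\t", 1)
            rel_type, arg1, arg2 = meta.split()[:3]  # "ADE-Drug Arg1:T2 Arg2:T1"
            relations.append((rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1]))
    return entities, relations


def ade_drug_pairs(ann_text: str) -> list[tuple[str, str]]:
    """Return (drug, ade) surface-text pairs from ADE-Drug relations."""
    entities, relations = parse_ann(ann_text)
    pairs = []
    for rel_type, a1, a2 in relations:
        if rel_type != "ADE-Drug":
            continue
        by_label = {entities[a].label: entities[a].text for a in (a1, a2) if a in entities}
        if "Drug" in by_label and "ADE" in by_label:
            pairs.append((by_label["Drug"], by_label["ADE"]))
    return pairs
```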

Run the full preprocessing pipeline

python scripts/run_pipeline.py

Useful variants

Run a subset of steps:

python scripts/run_pipeline.py --steps 1-3
python scripts/run_pipeline.py --steps 3,4
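The two `--steps` examples suggest that the option accepts both ranges and comma-separated lists. Below is a hypothetical parser for that syntax. `parse_steps` is not taken from run_pipeline.py, and the sketch makes no assumption about which script each step number maps to.

```python
def parse_steps(spec: str) -> list[int]:
    """Expand a spec such as "1-3", "3,4", or "1-2,4" into sorted, de-duplicated step numbers."""
    steps: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            steps.update(range(start, end + 1))  # inclusive range
        else:
            steps.add(int(part))
    return sorted(steps)
```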

Skip steps with existing outputs:

python scripts/run_pipeline.py --skip-existing

Force rerun:

python scripts/run_pipeline.py --force

Include evaluation step if predictions already exist:

python scripts/run_pipeline.py --include-eval

Individual classical pipeline commands

python scripts/extract_n2c2_entities.py
python scripts/normalize_terms.py
python scripts/link_sider.py
python scripts/filter_validate.py

Outputs from this workflow are written mainly under data/n2c2/processed/ and data/sider/processed/.
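Linking and filtering reduce to a join on the SIDER side and a set-membership test on the n2c2 side. The sketch below assumes the column layout described in SIDER's own README (drug_names.tsv: STITCH flat ID, drug name; meddra_all_se.tsv: flat ID, stereo ID, label UMLS ID, MedDRA concept type, MedDRA UMLS ID, side-effect name) and uses exact, lowercased matching. The real link_sider.py and filter_validate.py may normalize and match differently. `load_sider_pairs` and `split_validated` are hypothetical.

```python
import csv
import io


def load_sider_pairs(drug_names_tsv: str, meddra_all_se_tsv: str) -> set[tuple[str, str]]:
    """Join SIDER drug names to side effects on the STITCH flat compound ID.

    Assumed layouts (tab-separated, no header rows):
      drug_names.tsv:    flat_id, drug_name
      meddra_all_se.tsv: flat_id, stereo_id, umls_label_id, meddra_type, umls_meddra_id, side_effect_name
    """
    names: dict[str, str] = {}
    for row in csv.reader(io.StringIO(drug_names_tsv), delimiter="\t"):
        if len(row) >= 2:
            names[row[0]] = row[1].strip().lower()
    pairs: set[tuple[str, str]] = set()  # a set also collapses duplicate LLT/PT rows
    for row in csv.reader(io.StringIO(meddra_all_se_tsv), delimiter="\t"):
        if len(row) >= 6 and row[0] in names:
            pairs.add((names[row[0]], row[5].strip().lower()))
    return pairs


def split_validated(
    n2c2_pairs: list[tuple[str, str]], sider_pairs: set[tuple[str, str]]
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Partition normalized (drug, ade) pairs into SIDER-validated and unmatched."""
    validated = [p for p in n2c2_pairs if p in sider_pairs]
    unmatched = [p for p in n2c2_pairs if p not in sider_pairs]
    return validated, unmatched
```

To run it on the raw files, pass `Path("data/sider/raw/drug_names.tsv").read_text(encoding="utf-8")` and the equivalent text of meddra_all_se.tsv.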


LLM Workflow

The current research workflow uses scripts/llm_ade_detection.py to run inference with OpenRouter.

Supported modes:

  • --source ann: use annotation and note files
  • --source parquet: use 3-sentence parquet windows
  • --mode pilot: quick test on the first file only
  • --mode batch: multi-file processing
  • --mode full: full parquet processing
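This README does not show how llm_ade_detection.py calls OpenRouter. OpenRouter's chat-completions endpoint (the `OPENROUTER_API_URL` value in .env) accepts OpenAI-style JSON containing a model name and a list of messages, authenticated with a Bearer token. A standard-library sketch follows. The function names and the temperature of 0 are assumptions. `build_openrouter_request` only constructs the request, so it can be inspected without network access.

```python
import json
import urllib.request

OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_openrouter_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    payload = {"model": model, "messages": messages, "temperature": 0}  # temperature 0 is an assumption
    return urllib.request.Request(
        OPENROUTER_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```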

Full parquet inference with SIDER context

python scripts/llm_ade_detection.py \
  --source parquet \
  --mode full \
  --output-suffix full_parquet_sider

Full parquet inference without SIDER context

python scripts/llm_ade_detection.py \
  --source parquet \
  --mode full \
  --disable-sider-context \
  --output-suffix full_parquet_no_sider

These commands generate:

  • data/outputs/llm_predictions_partial_parquet_<suffix>.jsonl
  • data/outputs/llm_predictions_parquet_<suffix>.csv
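The partial .jsonl file appears to be written incrementally during inference, with the .csv as the consolidated table. The repository also includes scripts/jsonl_to_csv.py, but its behavior is not documented here. Below is a hypothetical conversion in the same spirit that tolerates a truncated final line. The function name and the column handling are assumptions.

```python
import csv
import json
from typing import TextIO


def jsonl_to_csv(src: TextIO, dst: TextIO) -> int:
    """Copy one JSON object per line into CSV rows and return the number of rows written.

    Columns are the union of keys in first-seen order. Missing values become "".
    Blank, non-object, or unparseable lines (e.g. a truncated final write) are skipped.
    """
    records: list[dict] = []
    for line in src:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line cut off by an interrupted run
        if isinstance(obj, dict):
            records.append(obj)
    fieldnames: list[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return len(records)
```

When writing to a real file, open the destination with `newline=""`, as the `csv` module documentation recommends.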

Example model override

python scripts/llm_ade_detection.py \
  --source parquet \
  --mode full \
  --model meta-llama/llama-3.1-8b-instruct \
  --output-suffix full_parquet_llama31

Default model at present:

  • meta-llama/llama-3.3-70b-instruct

Evaluation Commands

Compare SIDER versus no-SIDER on the full parquet run

python scripts/evaluate_results.py \
  --predictions-with-sider data/outputs/llm_predictions_parquet_full_parquet_sider.csv \
  --predictions-without-sider data/outputs/llm_predictions_parquet_full_parquet_no_sider.csv \
  --batch-type full

This writes:

  • data/outputs/full_batch_comparison_eval.txt
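This README does not describe how evaluate_results.py matches predictions to gold pairs (exact versus relaxed string match, per-file versus pooled). Under the simplest assumption (exact match on lowercased (file, drug, ADE) triples, pooled across files), micro precision, recall, and F1 reduce to set arithmetic. `as_triples`, `prf1`, and the column names are hypothetical.

```python
from collections.abc import Iterable, Mapping


def as_triples(rows: Iterable[Mapping[str, object]]) -> set[tuple[str, str, str]]:
    """Normalize rows to (file_id, drug, ade) triples; column names are hypothetical."""
    return {
        (str(r["file_id"]), str(r["drug"]).strip().lower(), str(r["ade"]).strip().lower())
        for r in rows
    }


def prf1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Micro precision, recall, and F1 over exact-match triple sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```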

Run thresholded comparison

Example at 0.75:

python scripts/evaluate_results.py \
  --predictions-with-sider data/outputs/llm_predictions_parquet_full_parquet_sider.csv \
  --predictions-without-sider data/outputs/llm_predictions_parquet_full_parquet_no_sider.csv \
  --batch-type full \
  --min-confidence 0.75 \
  --reuse-gold-truth \
  --batch-report-output data/outputs/full_batch_comparison_eval_thr_075.txt
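`--min-confidence` presumably drops predictions below the cutoff before scoring. The sketch assumes an inclusive `>=` comparison and a numeric `confidence` field; both are assumptions. It also shows why quantized confidences produce plateaus. With hypothetical values at 0.1 steps, any two thresholds that fall between the same pair of adjacent levels retain exactly the same rows.

```python
def apply_min_confidence(predictions: list[dict], min_confidence: float) -> list[dict]:
    """Keep predictions whose confidence is at least the threshold (inclusive, assumed)."""
    return [p for p in predictions if float(p.get("confidence", 0.0)) >= min_confidence]


if __name__ == "__main__":
    # Hypothetical confidences quantized to 0.1 steps (illustrative, not project data).
    preds = [{"id": i, "confidence": c} for i, c in enumerate([0.5, 0.6, 0.7, 0.8, 0.9])]
    for thr in (0.55, 0.60, 0.65, 0.70, 0.75, 0.80):
        print(thr, [p["id"] for p in apply_min_confidence(preds, thr)])
```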

Single prediction file evaluation

python scripts/evaluate_results.py \
  --predictions data/outputs/llm_predictions_complete.csv

Configuration

Shared path and threshold settings live in scripts/config.py.

Important current settings:

  • PARQUET_CONTEXT_PATH: parquet window input path
  • N2C2_TEST_TXTS_DIR: preferred test note text source
  • MIN_N2C2_FREQUENCY_FOR_NOVEL = 2
  • SIDER_FREQUENCY_THRESHOLD = 0
  • LLM_MAX_FILES = 202
  • LLM_NOTE_TRUNCATION_LENGTH = 3000
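`MIN_N2C2_FREQUENCY_FOR_NOVEL` is listed without a definition. One reading, assumed here rather than taken from filter_validate.py, is that a pair missing from SIDER is kept as potentially novel only if it occurs at least that many times in the n2c2 annotations. `potential_novel_pairs` is hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

MIN_N2C2_FREQUENCY_FOR_NOVEL = 2  # value listed for scripts/config.py; its meaning is assumed


def potential_novel_pairs(
    n2c2_pairs: Iterable[tuple[str, str]],
    sider_pairs: set[tuple[str, str]],
    min_frequency: int = MIN_N2C2_FREQUENCY_FOR_NOVEL,
) -> dict[tuple[str, str], int]:
    """Return {pair: count} for pairs absent from SIDER that occur at least min_frequency times."""
    counts = Counter(n2c2_pairs)
    return {pair: n for pair, n in counts.items() if pair not in sider_pairs and n >= min_frequency}
```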

Current Findings Summary

From the current full-batch parquet evaluation:

  • No threshold: no-SIDER performs better overall with F1 0.4836 versus SIDER 0.4673.
  • Starting at min-confidence 0.55, SIDER becomes better overall than no-SIDER.
  • Best tested SIDER operating point is the 0.75-0.80 plateau with precision 0.4700, recall 0.5670, and F1 0.5140.
  • Confidence values are quantized, so thresholds within 0.55-0.60, 0.65-0.70, and 0.75-0.80 produce identical retained predictions.
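As a consistency check, the plateau figures above agree with the usual F1 definition, the harmonic mean of precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.4700 \times 0.5670}{0.4700 + 0.5670}
    = \frac{0.53298}{1.0370}
    \approx 0.5140
```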

For detailed write-ups, see:

  • findings/eval.md

Notes and Caveats

  • Access to n2c2 data requires appropriate authorization.
  • The OpenRouter workflow will fail without OPENROUTER_API_KEY.
  • scripts/evaluate_results.py supports both single-file evaluation and paired SIDER versus no-SIDER comparison.

License

This repository is intended for academic and research use. Ensure that you have the necessary rights and approvals for all datasets and API services used with it.

About

A pipeline for detecting and validating adverse drug events by combining n2c2 clinical notes with SIDER drug–side effect data. Includes NLP-based entity and relation extraction, terminology normalization, cross-dataset linking, filtering, and evaluation tools for research in pharmacovigilance and clinical text mining.
