
Project Detail

PhyloDIVaS

Web platform for running genome-wide selection pressure (dN/dS) analyses. Users choose species; the app then builds orthogroups, generates codon alignments, runs HyPhy analyses in Galaxy, and loads the per-orthogroup results back into a searchable UI.

React · .NET 6 · Galaxy API · PostgreSQL · AWS Lightsail · NCBI E-utilities

Deep dive

Galaxy workflow

Built the analysis layer that bridges user-selected genomes to Galaxy: queueing jobs across many orthogroups, tracking long-running executions, and surfacing concise progress + results back in-app.

Designed and implemented every analysis feature in the application, owning the frontend UX, backend APIs, and Galaxy workflow integration end-to-end.

  • Automated ortholog detection, codon alignments, and selection pressure analysis across many species
  • End-to-end Galaxy workflow orchestration via REST APIs (submit → poll → parse → persist → display)
  • Real-time job tracking UI with batching, retries/backoff, and clear status updates for large orthogroup runs
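
The poll step with retries and backoff can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `fetch_state` is a hypothetical callable standing in for whatever request reads a job's state from Galaxy, and the state names, delays, and attempt limit are assumptions.

```python
import time
from typing import Callable, Iterator

# Assumed terminal states for this sketch; the real set may differ.
TERMINAL_STATES = {"ok", "error"}


def backoff_delays(base: float = 2.0, factor: float = 2.0, cap: float = 60.0) -> Iterator[float]:
    """Yield delays base, base*factor, ... with each value capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def poll_until_done(
    fetch_state: Callable[[], str],
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_state() until it reports a terminal state or attempts run out."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        try:
            state = fetch_state()
        except ConnectionError:
            # Treat a dropped connection as transient: wait, then retry.
            state = "unreachable"
        if state in TERMINAL_STATES:
            return state
        sleep(next(delays))
    # Give up without raising so the caller can show a clear status.
    return "timeout"
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, and returning `"timeout"` instead of raising lets a UI report a stalled run as just another status.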

Role

Sole analysis engineer (implemented the complete analysis stack end-to-end)

Timeline

Multi-release build with iterative workflow tuning

Galaxy Workflow Snapshot

Galaxy workflow visualization for PhyloDIVaS dN/dS pipeline

The workflow chains ortholog identification, codon alignment (MAFFT/MUSCLE/ClustalW), dN/dS estimation (HyPhy-BUSTED), and aggregation of results for multi-species comparison.

Species selection & session resume

First step: choose species to analyze, pick alignment tooling, or resume an existing Galaxy history to continue work.

Orthogroup analysis flow

Demonstrates how users assemble orthogroups, inspect gene members, and stage runs before launching HyPhy analysis.

HyPhy results in-app

Shows the per-orthogroup outputs from the primary HyPhy tool surfaced directly in the UI for quick review.

Responsibilities

  • Built the analysis backend: Galaxy job orchestration endpoints (submit → poll → parse → persist) with per-step logs to keep runs debuggable.
  • Designed and implemented the analysis UI with resumable sessions, live job status updates, and exportable analysis results.
  • Designed a custom Galaxy workflow that runs genomes through a multi-step data pipeline using parallelized tools mapped over dataset collections.
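
One way to picture the per-step logging is a small runner that records the outcome of each stage, so a failed run shows where it stopped. This is a hedged sketch under assumed names: `StepLog`, `RunRecord`, and `run_steps` are hypothetical, and the real backend is a .NET API whose structure is not shown here.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class StepLog:
    step: str
    ok: bool
    seconds: float
    detail: str = ""


@dataclass
class RunRecord:
    orthogroup_id: str
    logs: List[StepLog] = field(default_factory=list)
    result: Any = None

    @property
    def failed_step(self) -> Optional[str]:
        """Name of the first step that failed, or None if all succeeded."""
        return next((log.step for log in self.logs if not log.ok), None)


def run_steps(orthogroup_id: str, steps: List[Tuple[str, Callable[[Any], Any]]]) -> RunRecord:
    """Run steps in order, feeding each output to the next and logging every step."""
    record = RunRecord(orthogroup_id)
    value: Any = orthogroup_id
    for name, fn in steps:
        started = time.monotonic()
        try:
            value = fn(value)
        except Exception as exc:  # record the failure and stop this run only
            record.logs.append(StepLog(name, False, time.monotonic() - started, repr(exc)))
            return record
        record.logs.append(StepLog(name, True, time.monotonic() - started))
    record.result = value
    return record
```

Because a failure ends only the run it belongs to, one broken orthogroup leaves a log entry naming the failed step without stopping the rest of the batch.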

Challenges solved

  • Made long-running external compute feel productized (polling cadence, timeouts, partial failures).
  • Prevented UI + API overload during large runs (batch sizing, retries, and concise status surfacing).
  • Normalized scientific tool outputs into stable UI contracts for quick review and comparison.
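
Batch sizing and output normalization might look something like the sketch below. Everything here is illustrative: the record fields (`orthogroupId`, `status`, `pValue`, `lrt`) are a hypothetical UI contract, and the `"test results"`, `"p-value"`, and `"LRT"` keys are an assumption about the shape of the tool's JSON output, not a confirmed schema.

```python
from typing import Any, Dict, Iterator, List, Optional


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def normalize(orthogroup_id: str, raw: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Map a raw tool payload (None for a failed run) onto a fixed set of keys."""
    test = (raw or {}).get("test results", {})  # assumed key name
    p_value = test.get("p-value")               # assumed key name
    lrt = test.get("LRT")                       # assumed key name
    if raw is None:
        status = "failed"
    elif p_value is None:
        status = "incomplete"
    else:
        status = "ok"
    return {
        "orthogroupId": orthogroup_id,
        "status": status,
        "pValue": float(p_value) if p_value is not None else None,
        "lrt": float(lrt) if lrt is not None else None,
    }
```

Returning the same keys for failed, incomplete, and successful runs lets a results table render partial failures without special cases.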

Why it matters

Researchers can launch genome-wide selection scans without wiring together multiple bioinformatics tools by hand. The platform keeps datasets, alignment settings, and run history organized so results stay reproducible. Thoroughly tested for accuracy and performance, PhyloDIVaS can analyze thousands of gene groups across dozens of species at a time.

Workflow design · Observability · Reliability · Bioinformatics