Troubleshooting Index

A consolidated, searchable index of the issues that come up most often across local development, Supabase/auth, CI/CD gates, deployment, and the database. Each entry follows symptom → cause → fix. Use your browser's find (Ctrl/Cmd-F) to jump to a message.

Two environments

Paths differ per environment: the test environment lives in `/opt/app-name`, production in `/opt/app-name-prod`. Substitute the path for your environment throughout.

Local dev

Symptom	Cause	Fix
`"Tests failed"` when running `git commit`	Backend not running, or a code change broke an API endpoint. The pre-commit hook runs tests and blocks the commit on failure.	Start the backend (`cd backend && uvicorn app.main:app --reload --port 5001`), see which test failed, fix it, re-stage, commit again. Emergency only: `git commit --no-verify`.
Husky hooks don't run at all on commit	Husky wasn't installed during `npm install`.	Run `npm run prepare` (or `npm run install-hooks`) from the repo root.
`"Connection refused"` errors during tests/dev	Backend server isn't running.	Start it in another terminal: `cd backend && uvicorn app.main:app --reload --port 5001`.
`"No auth token available"` warnings during tests	`TEST_USER_EMAIL` / `TEST_USER_PASSWORD` aren't set in `.env`.	Add the test credentials to your `.env`. Without them the suite still passes, but the auth-specific tests are skipped.
Variable fallbacks behaving unexpectedly	Resolution order between local `.env` and defaults is unclear.	Run the diagnostic: `cd backend && npm run test:config` to print how each variable resolves.

Use `--no-verify` sparingly

Bypassing the pre-commit hook ships untested code. Failed tests usually signal a real problem — fix the cause rather than skipping the gate.
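
Both the `"Tests failed"` and `"Connection refused"` rows above come down to whether anything is listening on port 5001. A minimal sketch of a pre-flight check follows; the function name `backend_is_listening` is hypothetical and not part of the repository, and the default host is an assumption.

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 5001,
                         timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only proves a listener exists, not that the
    API behind it is healthy.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `backend_is_listening()` before `git commit` tells you whether a test failure is likely to be "backend not running" rather than a broken endpoint.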

See Developer Setup and the Environment Variable Matrix.

Supabase / Auth

Symptom	Cause	Fix
Test user cannot log in during smoke/E2E	`VITE_SUPABASE_URL` / `VITE_SUPABASE_ANON_KEY` point at a different project than the one where `TEST_USER_EMAIL` exists. Tests authenticate directly with Supabase.	Point both `VITE_*` values at the project that owns the test user.
`"Network Error"` in the browser	`VITE_API_URL` isn't set to the public HTTPS domain of the backend.	Set `VITE_API_URL` to `https://<your-domain>` (or `https://test.<your-domain>`).
Service-role operations unexpectedly blocked by RLS	App is using the anon key where it needs the service key.	Use `SUPABASE_SERVICE_KEY` (server-side only) for RLS-bypassing operations. Never prefix it with `VITE_`.
Supabase service key visible in the frontend bundle	The key was accidentally prefixed with `VITE_`, bundling it into shipped JS.	Remove the `VITE_` prefix immediately and rotate the key — it has been exposed.
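
The `VITE_` prefix rule in the last two rows can be checked mechanically. The sketch below flags variable names that are both `VITE_`-prefixed and secret-looking; the marker list and the function name `find_exposed_names` are assumptions for illustration, not an existing project check.

```python
# Assumed heuristic: substrings that suggest a value must stay server-side.
SENSITIVE_MARKERS = ("SERVICE", "SECRET", "PASSWORD", "PRIVATE")


def find_exposed_names(variable_names):
    """Return VITE_-prefixed names that look like secrets.

    Anything prefixed with VITE_ is bundled into the shipped JS, so a
    secret-looking name with that prefix is almost certainly a mistake.
    """
    exposed = []
    for name in variable_names:
        upper = name.upper()
        if upper.startswith("VITE_") and any(m in upper for m in SENSITIVE_MARKERS):
            exposed.append(name)
    return exposed
```

For example, passing the variable names from a `.env` file would flag `VITE_SUPABASE_SERVICE_KEY` but leave `VITE_SUPABASE_ANON_KEY` and `SUPABASE_SERVICE_KEY` alone. A name-based check is only a first line of defence; a flagged key should still be rotated.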

See Supabase (Hosted) and Secrets & Environment.

CI/CD gate failures

The deploy job only runs after quality-gate (Sonar), E2E results, and security-audit succeed. Any failure blocks deployment.

Symptom	Cause	Fix
Sonar quality gate fails (`Quality Gate FAILED for commit …`)	New code introduced issues (coverage, duplications, bugs) that breach the gate conditions for the analysed commit.	Open the SonarCloud project, review the failing conditions printed in the job log, fix them, and push again. To disable the gate temporarily, set `vars.SONAR_ENABLED` to anything other than `true` (it then resolves as `skipped`).
`"No SonarCloud analysis found for commit … after scan"`	The scan didn't register an analysis for the exact `GITHUB_SHA` (timing, or a mismatched project key).	Re-run the job; the check waits/polls. Confirm `SONAR_TOKEN` is valid and the project key matches.
E2E gate fails: `"No E2E results file found"`	`tests/e2e/.results.json` is missing — E2E tests were never recorded for this branch.	Run `/test-e2e` or `/pr-ready` locally, then push.
E2E gate fails: `"E2E results SHA … is not reachable from HEAD"`	Recorded results belong to a commit that isn't an ancestor of HEAD (rebased/amended).	Re-run `/test-e2e` or `/pr-ready` to record results against the current branch tip.
E2E gate fails: `"Code changed since E2E tests ran"`	Non-doc/non-config source files changed after the last recorded E2E run.	Re-run E2E and push the updated `.results.json` together with the code.
E2E gate fails: `"E2E tests did not pass (status: …)"`	The recorded run itself failed.	Fix the failing E2E tests, re-record, push.
Security audit fails (`npm audit`)	A dependency has a high/critical advisory (`--audit-level=high`).	Update the offending package (`npm audit fix` or bump manually); re-run.
Trivy fails with a `CRITICAL` vulnerability	A built image or a filesystem dependency has a fixable CRITICAL CVE (`ignore-unfixed: true`, so only fixable ones fail).	Update the base image or dependency. If it's a verified false positive, add it to `.trivyignore`.
`"Verify Architecture Standards"` step fails	`scripts/check-architecture.js` found a layering/structure violation.	Read the script's output and refactor the offending module to satisfy the architecture rules.
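
The four E2E gate messages in the table can be read as one decision sequence. The sketch below is an illustration only: the `sha` and `status` field names, the `"passed"` value, and the order of the checks are assumptions, since the real format of `tests/e2e/.results.json` and the real gate script are not shown here.

```python
def evaluate_e2e_gate(results, sha_is_ancestor_of_head, changed_source_files):
    """Return (ok, message) for a recorded E2E run.

    results: parsed results file as a dict, or None if the file is missing.
    sha_is_ancestor_of_head: True if the recorded SHA is an ancestor of HEAD
        (git can answer this with `git merge-base --is-ancestor <sha> HEAD`).
    changed_source_files: source files changed since the recorded SHA.
    """
    if results is None:
        return False, "No E2E results file found"

    sha = results.get("sha", "unknown")        # hypothetical field name
    status = results.get("status", "unknown")  # hypothetical field name

    if not sha_is_ancestor_of_head:
        return False, f"E2E results SHA {sha} is not reachable from HEAD"
    if changed_source_files:
        return False, "Code changed since E2E tests ran"
    if status != "passed":                     # assumed success value
        return False, f"E2E tests did not pass (status: {status})"
    return True, "E2E gate passed"
```

Read this way, a rebase or amend fails the ancestry check first, which is why re-recording against the current branch tip clears more than one of these messages at once.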

See Pipeline Overview.

Deployment

Symptom	Cause	Fix
Deploy aborts: `"ERROR: .env file is missing on the server!"`	`/opt/app-name/.env` (or `-prod`) doesn't exist. The deploy script refuses to run without it.	SSH in and create the server `.env` with all runtime secrets (see Secrets Matrix). CI never creates this file.
`"failed to create connection … missing CF-Access-Client-Id"`	Cloudflare Access service-token credentials are absent or wrong in GitHub.	Verify `CF_ACCESS_CLIENT_ID` and `CF_ACCESS_CLIENT_SECRET` exist with no stray spaces, and that the service token is still valid in Cloudflare Zero Trust. See Cloudflare Tunnel.
`"Permission denied (publickey)"` on SSH	The deploy key in GitHub doesn't match the server's `authorized_keys`, or is malformed.	Confirm `SSH_KEY` (or `PROD_SSH_KEY`) includes the full `BEGIN/END` lines; check the public key is in `~/.ssh/authorized_keys` on the server.
`docker compose pull` hangs / times out	Insufficient disk space, or GHCR auth failed.	SSH in, check free space with `df -h /` (roughly 2 GB or more is needed), reclaim space with `docker image prune -a -f`, then retry. GHCR auth uses `GITHUB_TOKEN` + `GITHUB_ACTOR` from `.env.deploy`.
`"High disk usage detected … aggressive cleanup"` / disk full	Root filesystem usage on the server is above 80%. The script runs `docker system prune -af --volumes` automatically.	If it persists, manually prune images and check for large logs/volumes; ensure old image tags are being cleaned.
`"Docker Compose Up failed or timed out!"` / unhealthy containers	A container did not become healthy within the 60s `--wait-timeout`.	The job prints the last 50 app log lines. SSH in and run `cd /opt/app-name && docker compose ps` and `docker compose logs app --tail 100`, then confirm the `.env` values are correct.
Health check fails after deploy / rollback	The app didn't return healthy on `/api/health`.	Inspect `docker compose logs app`; verify DB connectivity and required env vars; redeploy once fixed.
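
The two disk figures in the table (cleanup above 80% usage, roughly 2 GB free for a pull) can be checked together. The sketch below is not the deploy script's own logic; the function names are hypothetical, and the percentage is a simple used/total ratio, so it can differ slightly from the `Use%` column that `df` prints.

```python
import shutil

CLEANUP_THRESHOLD_PERCENT = 80   # from the table: cleanup triggers above 80%
MIN_FREE_BYTES = 2 * 1024**3     # roughly the 2 GB of headroom for a pull


def assess_disk(total, used, free):
    """Return (percent_used, problems) for the given byte counts."""
    percent_used = used / total * 100
    problems = []
    if percent_used > CLEANUP_THRESHOLD_PERCENT:
        problems.append("usage above 80%: the deploy script will prune")
    if free < MIN_FREE_BYTES:
        problems.append("less than ~2 GB free: image pulls may hang")
    return percent_used, problems


def assess_root():
    """Run the same assessment against the root filesystem."""
    usage = shutil.disk_usage("/")
    return assess_disk(usage.total, usage.used, usage.free)
```

Running `assess_root()` on the server before a deploy gives an early warning for both the pull timeout and the automatic cleanup.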

See Pipeline Overview and Cloudflare Tunnel.

Database

Symptom	Cause	Fix
Migrations hang or fail in CI/deploy	Migrations are running over the transaction pooler (port 6543), which does not support the long-lived sessions and prepared statements that migrations need.	Run migrations against the direct connection (`DATABASE_URL_DIRECT`, port 5432). Keep the app on `DATABASE_URL` (port 6543).
`docker compose run --rm app alembic upgrade head` errors during deploy	Migration step failed — bad SQL, missing direct URL, or unreachable DB.	Check the migration logs; verify `DATABASE_URL_DIRECT` and `DB_SSL_MODE=require` are set in the server `.env`; fix the migration and redeploy.
Intermittent connection drops under load	App pointed at the direct connection instead of the pooler.	Use the pooler (`DATABASE_URL`, 6543) for normal app traffic; reserve direct (5432) for migrations and batch jobs.
SSL/TLS handshake errors connecting to Postgres	`DB_SSL_MODE` not set for a managed Postgres that requires TLS.	Set `DB_SSL_MODE=require`.
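
Most rows in this table reduce to three settings: which port each URL uses and whether TLS is required. The sketch below validates them from a dictionary of variables; the function names are hypothetical, and it assumes the URLs are well-formed (credentials with unescaped special characters may not parse).

```python
from urllib.parse import urlsplit

POOLER_PORT = 6543   # transaction pooler, for normal app traffic
DIRECT_PORT = 5432   # direct connection, for migrations and batch jobs


def connection_kind(url):
    """Classify a Postgres URL as 'pooler', 'direct', or 'unknown' by port."""
    parts = urlsplit(url)
    # libpq falls back to 5432 when the URL omits the port.
    port = parts.port or (DIRECT_PORT if parts.hostname else None)
    if port == POOLER_PORT:
        return "pooler"
    if port == DIRECT_PORT:
        return "direct"
    return "unknown"


def check_database_env(env):
    """Return a list of problems found in the database variables."""
    problems = []
    if connection_kind(env.get("DATABASE_URL", "")) != "pooler":
        problems.append("DATABASE_URL should use the pooler (port 6543)")
    if connection_kind(env.get("DATABASE_URL_DIRECT", "")) != "direct":
        problems.append("DATABASE_URL_DIRECT should use port 5432")
    if env.get("DB_SSL_MODE") != "require":
        problems.append("DB_SSL_MODE should be set to require")
    return problems
```

Passing `dict(os.environ)` to `check_database_env` (after `import os`) checks the current process; an empty list means the three settings match the guidance above.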

See the Environment Variable Matrix for the database variables and Supabase (Hosted) for pooler vs direct details.