Troubleshooting Index

A consolidated, searchable index of the issues that come up most often across local development, Supabase/auth, CI/CD gates, deployment, and the database. Each entry follows symptom → cause → fix. Use your browser's find (Ctrl/Cmd-F) to jump to a message.

Two environments

Paths differ per environment: test lives in /opt/app-name, production in /opt/app-name-prod. Substitute accordingly throughout.


Local dev

| Symptom | Cause | Fix |
| --- | --- | --- |
| "Tests failed" when running git commit | Backend not running, or a code change broke an API endpoint. The pre-commit hook runs tests and blocks the commit on failure. | Start the backend (cd backend && uvicorn app.main:app --reload --port 5001), see which test failed, fix it, re-stage, commit again. Emergency only: git commit --no-verify. |
| Husky hooks don't run at all on commit | Husky wasn't installed during npm install. | Run npm run prepare (or npm run install-hooks) from the repo root. |
| "Connection refused" errors during tests/dev | Backend server isn't running. | Start it in another terminal: cd backend && uvicorn app.main:app --reload --port 5001. |
| "No auth token available" warnings during tests | TEST_USER_EMAIL / TEST_USER_PASSWORD aren't set in .env. | Add the test credentials to your .env. Tests still pass; the auth-specific tests are skipped. |
| Variable fallbacks behaving unexpectedly | Resolution order between local .env and defaults is unclear. | Run the diagnostic: cd backend && npm run test:config to print how each variable resolves. |

Use --no-verify sparingly

Bypassing the pre-commit hook ships untested code. Failed tests usually signal a real problem — fix the cause rather than skipping the gate.

See Developer Setup and the Environment Variable Matrix.
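
Before digging into a failed commit or a "Connection refused" error, it can help to confirm whether anything is listening on the backend port at all. The following is a minimal sketch, not part of the project's tooling; the helper name is_port_open is hypothetical, and port 5001 is taken from the uvicorn command above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 5001 is the port used by the uvicorn command in this section.
    if is_port_open("127.0.0.1", 5001):
        print("backend reachable on :5001")
    else:
        print("nothing listening on :5001; start the backend first")
```

If this reports nothing listening, start the backend before re-running the tests or the commit.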


Supabase / Auth

| Symptom | Cause | Fix |
| --- | --- | --- |
| Test user cannot log in during smoke/E2E | VITE_SUPABASE_URL / VITE_SUPABASE_ANON_KEY point at a different project than the one where TEST_USER_EMAIL exists. Tests authenticate directly with Supabase. | Point both VITE_* values at the project that owns the test user. |
| "Network Error" in the browser | VITE_API_URL isn't set to the public HTTPS domain of the backend. | Set VITE_API_URL to https://<your-domain> (or https://test.<your-domain>). |
| Service-role operations unexpectedly blocked by RLS | App is using the anon key where it needs the service key. | Use SUPABASE_SERVICE_KEY (server-side only) for RLS-bypassing operations. Never prefix it with VITE_. |
| Supabase service key visible in the frontend bundle | The key was accidentally prefixed with VITE_, bundling it into shipped JS. | Remove the VITE_ prefix immediately and rotate the key, because it has been exposed. |

See Supabase (Hosted) and Secrets & Environment.
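
Because any VITE_-prefixed variable ends up in the shipped bundle, a quick scan of a .env file can catch a mis-prefixed secret before it is built. This is a rough sketch under stated assumptions: the function names and the SECRET_HINTS pattern are hypothetical heuristics, not something the project defines, and the anon key is deliberately not flagged since it is meant to be public.

```python
import re

# Hypothetical heuristic: name fragments that suggest a secret. Adjust to your project.
SECRET_HINTS = re.compile(r"(SERVICE|SECRET|PASSWORD|PRIVATE)", re.IGNORECASE)


def parse_env_names(text: str) -> list:
    """Extract variable names from .env-style text, skipping blanks and comments."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.append(line.split("=", 1)[0].strip())
    return names


def exposed_secret_names(names) -> list:
    """Return VITE_-prefixed names that look like secrets (they would be bundled)."""
    return sorted(
        n for n in names if n.startswith("VITE_") and SECRET_HINTS.search(n)
    )
```

Running exposed_secret_names(parse_env_names(open(".env").read())) and failing on a non-empty result is one way to wire this into a local check. A hit still means the key should be rotated, not just renamed.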


CI/CD gate failures

The deploy job only runs after quality-gate (Sonar), E2E results, and security-audit succeed. Any failure blocks deployment.

| Symptom | Cause | Fix |
| --- | --- | --- |
| Sonar quality gate fails (Quality Gate FAILED for commit …) | New code introduced issues (coverage, duplications, bugs) that breach the gate conditions for the analysed commit. | Open the SonarCloud project, review the failing conditions printed in the job log, fix them, and push again. To disable the gate temporarily, set vars.SONAR_ENABLED to anything other than true (it then resolves as skipped). |
| "No SonarCloud analysis found for commit … after scan" | The scan didn't register an analysis for the exact GITHUB_SHA (timing, or a mismatched project key). | Re-run the job; the check waits/polls. Confirm SONAR_TOKEN is valid and the project key matches. |
| E2E gate fails: "No E2E results file found" | tests/e2e/.results.json is missing; E2E tests were never recorded for this branch. | Run /test-e2e or /pr-ready locally, then push. |
| E2E gate fails: "E2E results SHA … is not reachable from HEAD" | Recorded results belong to a commit that isn't an ancestor of HEAD (rebased/amended). | Re-run /test-e2e or /pr-ready to record results against the current branch tip. |
| E2E gate fails: "Code changed since E2E tests ran" | Non-doc/non-config source files changed after the last recorded E2E run. | Re-run E2E and push the updated .results.json together with the code. |
| E2E gate fails: "E2E tests did not pass (status: …)" | The recorded run itself failed. | Fix the failing E2E tests, re-record, push. |
| Security audit fails (npm audit) | A dependency has a high/critical advisory (--audit-level=high). | Update the offending package (npm audit fix or bump manually); re-run. |
| Trivy fails with a CRITICAL vulnerability | A built image or a filesystem dependency has a fixable CRITICAL CVE (ignore-unfixed: true, so only fixable ones fail). | Update the base image or dependency. If it's a verified false positive, add it to .trivyignore. |
| "Verify Architecture Standards" step fails | scripts/check-architecture.js found a layering/structure violation. | Read the script's output and refactor the offending module to satisfy the architecture rules. |

See Pipeline Overview.
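
The four E2E gate failures above can be read as one chain of checks. The sketch below illustrates that chain; it is not the pipeline's actual implementation. The "sha" and "status" keys, the "passed" value, the IGNORED_SUFFIXES rule for doc/config files, and the order of the checks are all assumptions, since this index does not show the results file format or the gate script.

```python
from typing import Optional

# Hypothetical rule for "doc/config" files; the real gate's rule is not shown here.
IGNORED_SUFFIXES = (".md", ".yml", ".yaml", ".json")


def e2e_gate_error(
    results: Optional[dict], ancestor_shas: set, changed_files: list
) -> Optional[str]:
    """Return the first gate failure message, or None if the gate passes.

    results        parsed tests/e2e/.results.json, or None if the file is missing
    ancestor_shas  commit SHAs reachable from HEAD
    changed_files  files changed between the recorded SHA and HEAD
    """
    if results is None:
        return "No E2E results file found"
    sha = results.get("sha")  # assumed key name
    if sha not in ancestor_shas:
        return f"E2E results SHA {sha} is not reachable from HEAD"
    if any(not path.endswith(IGNORED_SUFFIXES) for path in changed_files):
        return "Code changed since E2E tests ran"
    status = results.get("status")  # assumed key name
    if status != "passed":  # assumed value for a successful run
        return f"E2E tests did not pass (status: {status})"
    return None
```

Reading the failures this way explains why a rebase or amend invalidates otherwise green results: the recorded SHA is no longer an ancestor of HEAD, so the gate fails before it looks at the status.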


Deployment

| Symptom | Cause | Fix |
| --- | --- | --- |
| Deploy aborts: "ERROR: .env file is missing on the server!" | /opt/app-name/.env (or /opt/app-name-prod/.env in production) doesn't exist. The deploy script refuses to run without it. | SSH in and create the server .env with all runtime secrets (see Secrets Matrix). CI never creates this file. |
| "failed to create connection … missing CF-Access-Client-Id" | Cloudflare Access service-token credentials are absent or wrong in GitHub. | Verify CF_ACCESS_CLIENT_ID and CF_ACCESS_CLIENT_SECRET exist with no stray spaces, and that the service token is still valid in Cloudflare Zero Trust. See Cloudflare Tunnel. |
| "Permission denied (publickey)" on SSH | The deploy key in GitHub doesn't match the server's authorized_keys, or is malformed. | Confirm SSH_KEY (or PROD_SSH_KEY) includes the full BEGIN/END lines; check the public key is in ~/.ssh/authorized_keys on the server. |
| docker compose pull hangs / times out | Insufficient disk space, or GHCR auth failed. | SSH in, run df -h / (need ~2GB+ free), docker image prune -a -f, then retry. Auth uses GITHUB_TOKEN + GITHUB_ACTOR from .env.deploy. |
| "High disk usage detected … aggressive cleanup" / disk full | Server root above 80%. The script runs docker system prune -af --volumes automatically. | If it persists, manually prune images and check for large logs/volumes; ensure old image tags are being cleaned. |
| "Docker Compose Up failed or timed out!" / unhealthy containers | A container failed its healthcheck within the 60s --wait-timeout. The job prints the last 50 app log lines. | SSH in: cd /opt/app-name && docker compose ps, docker compose logs app --tail 100, and confirm .env values are correct. |
| Health check fails after deploy / rollback | The app didn't return healthy on /api/health. | Inspect docker compose logs app; verify DB connectivity and required env vars; redeploy once fixed. |
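
The two disk-related rows come down to two numbers: root usage above 80% triggers cleanup, and roughly 2 GB free is needed before a pull. A small sketch for checking both on a server follows. The function and constant names are hypothetical, and the percentage here is used/total, which can differ slightly from what df reports.

```python
import shutil

CLEANUP_THRESHOLD = 0.80      # "Server root above 80%" from the table
MIN_FREE_BYTES = 2 * 1024**3  # approximates the "~2GB+ free" needed for a pull


def disk_report(total: int, used: int, free: int) -> dict:
    """Summarise disk state against the two thresholds mentioned in this section."""
    used_fraction = used / total
    return {
        "used_percent": round(used_fraction * 100, 1),
        "needs_cleanup": used_fraction > CLEANUP_THRESHOLD,
        "enough_for_pull": free >= MIN_FREE_BYTES,
    }


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(disk_report(usage.total, usage.used, usage.free))
```

If needs_cleanup is true even after the script's automatic prune, the space is probably held by logs or volumes rather than images.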

See Pipeline Overview and Cloudflare Tunnel.
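
When a deploy reports an unhealthy app, polling /api/health by hand shows whether the app is slow to start or never comes up. The following is a minimal sketch, not the deploy script's own check. The function names, the 2-second interval, and the rule that only HTTP 200 counts as healthy are assumptions; the 60-second default is borrowed from the --wait-timeout mentioned above.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if the request fails entirely."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def wait_until_healthy(url, deadline_s=60.0, interval_s=2.0,
                       probe=http_status, sleep=time.sleep) -> bool:
    """Poll url until it returns 200 or roughly deadline_s seconds of waiting pass."""
    waited = 0.0
    while True:
        if probe(url) == 200:
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A call such as wait_until_healthy("https://<your-domain>/api/health") returns False if the endpoint never answers with 200; at that point the app logs are the next place to look.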


Database

| Symptom | Cause | Fix |
| --- | --- | --- |
| Migrations hang or fail in CI/deploy | Migrations are running over the transaction pooler (port 6543), which can't hold the long-lived/prepared connections migrations need. | Run migrations against the direct connection (DATABASE_URL_DIRECT, port 5432). Keep the app on DATABASE_URL (6543). |
| docker compose run --rm app alembic upgrade head errors during deploy | Migration step failed: bad SQL, missing direct URL, or unreachable DB. | Check the migration logs; verify DATABASE_URL_DIRECT and DB_SSL_MODE=require are set in the server .env; fix the migration and redeploy. |
| Intermittent connection drops under load | App pointed at the direct connection instead of the pooler. | Use the pooler (DATABASE_URL, 6543) for normal app traffic; reserve direct (5432) for migrations and batch jobs. |
| SSL/TLS handshake errors connecting to Postgres | DB_SSL_MODE not set for a managed Postgres that requires TLS. | Set DB_SSL_MODE=require. |

See the Environment Variable Matrix for the database variables and Supabase (Hosted) for pooler vs direct details.
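
Most of the database rows above are the same mistake in two directions: the wrong connection string for the task. A small guard can make that explicit. This is a sketch under stated assumptions: database_url_for and the task names "migrate" and "batch" are hypothetical, while the variable names and ports come from the table.

```python
from urllib.parse import urlsplit

POOLER_PORT = 6543  # DATABASE_URL: transaction pooler, normal app traffic
DIRECT_PORT = 5432  # DATABASE_URL_DIRECT: migrations and batch jobs


def database_url_for(task: str, env: dict) -> str:
    """Pick the connection string for a task and check it uses the expected port."""
    use_direct = task in ("migrate", "batch")  # hypothetical task names
    key = "DATABASE_URL_DIRECT" if use_direct else "DATABASE_URL"
    url = env.get(key)
    if not url:
        raise KeyError(f"{key} is not set")
    expected = DIRECT_PORT if use_direct else POOLER_PORT
    port = urlsplit(url).port
    if port != expected:
        raise ValueError(f"{key} uses port {port}, expected {expected}")
    return url
```

Calling database_url_for("migrate", os.environ) before running alembic would fail fast if the direct URL is missing or points at the pooler, instead of hanging mid-migration.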