Troubleshooting Index
A consolidated, searchable index of the issues that come up most often across local development, Supabase/auth, CI/CD gates, deployment, and the database. Each entry follows symptom → cause → fix. Use your browser's find (Ctrl/Cmd-F) to jump to a message.
Two environments
Paths differ per environment: test lives in /opt/app-name, production in /opt/app-name-prod. Substitute accordingly throughout.
Local dev
| Symptom | Cause | Fix |
|---|---|---|
| "Tests failed" when running `git commit` | Backend not running, or a code change broke an API endpoint. The pre-commit hook runs tests and blocks the commit on failure. | Start the backend (`cd backend && uvicorn app.main:app --reload --port 5001`), see which test failed, fix it, re-stage, and commit again. Emergency only: `git commit --no-verify`. |
| Husky hooks don't run at all on commit | Husky wasn't installed during `npm install`. | Run `npm run prepare` (or `npm run install-hooks`) from the repo root. |
| "Connection refused" errors during tests/dev | Backend server isn't running. | Start it in another terminal: `cd backend && uvicorn app.main:app --reload --port 5001`. |
| "No auth token available" warnings during tests | `TEST_USER_EMAIL` / `TEST_USER_PASSWORD` aren't set in `.env`. | Add the test credentials to your `.env`. Tests still pass; the auth-specific tests are skipped. |
| Variable fallbacks behaving unexpectedly | Resolution order between local `.env` and defaults is unclear. | Run the diagnostic: `cd backend && npm run test:config` to print how each variable resolves. |
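For the "Connection refused" entry above, a quick way to confirm whether anything is listening on the backend port is a plain TCP probe. The following is a minimal sketch, not part of the project's tooling: the function name `backend_is_listening` is hypothetical, and the host `127.0.0.1` is an assumption (only port 5001 comes from the table).

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 5001, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if backend_is_listening():
        print("Backend reachable on port 5001")
    else:
        print("Nothing listening on port 5001; start uvicorn first")
```

A successful probe only shows that the port is open; it does not prove the process behind it is the backend or that its endpoints are healthy.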
Use --no-verify sparingly
Bypassing the pre-commit hook ships untested code. Failed tests usually signal a real problem — fix the cause rather than skipping the gate.
See Developer Setup and the Environment Variable Matrix.
Supabase / Auth
| Symptom | Cause | Fix |
|---|---|---|
| Test user cannot log in during smoke/E2E | `VITE_SUPABASE_URL` / `VITE_SUPABASE_ANON_KEY` point at a different project than the one where `TEST_USER_EMAIL` exists. Tests authenticate directly with Supabase. | Point both `VITE_*` values at the project that owns the test user. |
| "Network Error" in the browser | `VITE_API_URL` isn't set to the public HTTPS domain of the backend. | Set `VITE_API_URL` to `https://<your-domain>` (or `https://test.<your-domain>`). |
| Service-role operations unexpectedly blocked by RLS | App is using the anon key where it needs the service key. | Use `SUPABASE_SERVICE_KEY` (server-side only) for RLS-bypassing operations. Never prefix it with `VITE_`. |
| Supabase service key visible in the frontend bundle | The key was accidentally prefixed with `VITE_`, bundling it into shipped JS. | Remove the `VITE_` prefix immediately and rotate the key, since it has been exposed. |
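The last two rows hinge on one rule: anything prefixed with `VITE_` ends up in the shipped bundle. A small scan of an env file can catch an accidental prefix before it is committed. This is a rough sketch under stated assumptions: the function name `exposed_vite_secrets` and the `SECRET_HINTS` pattern are hypothetical heuristics, not an existing project check, and the pattern will miss secrets whose names contain none of these fragments.

```python
import re

# Hypothetical heuristic: name fragments that suggest a server-side secret.
SECRET_HINTS = re.compile(r"SERVICE|SECRET|PRIVATE|PASSWORD", re.IGNORECASE)


def exposed_vite_secrets(env_text: str) -> list[str]:
    """Return VITE_-prefixed variable names that look like server-side secrets."""
    flagged = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name.startswith("VITE_") and SECRET_HINTS.search(name):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = (
        "VITE_SUPABASE_URL=https://example.supabase.co\n"
        "VITE_SUPABASE_SERVICE_KEY=abc\n"
        "SUPABASE_SERVICE_KEY=abc\n"
    )
    print(exposed_vite_secrets(sample))  # ['VITE_SUPABASE_SERVICE_KEY']
```

A hit here means the key should be treated as exposed and rotated, as the table says; renaming the variable alone does not undo a leak that has already shipped.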
See Supabase (Hosted) and Secrets & Environment.
CI/CD gate failures
The deploy job only runs after quality-gate (Sonar), E2E results, and security-audit succeed. Any failure blocks deployment.
| Symptom | Cause | Fix |
|---|---|---|
| Sonar quality gate fails (`Quality Gate FAILED for commit …`) | New code introduced issues (coverage, duplications, bugs) that breach the gate conditions for the analysed commit. | Open the SonarCloud project, review the failing conditions printed in the job log, fix them, and push again. To disable the gate temporarily, set `vars.SONAR_ENABLED` to anything other than `true` (it then resolves as skipped). |
| "No SonarCloud analysis found for commit … after scan" | The scan didn't register an analysis for the exact `GITHUB_SHA` (timing, or a mismatched project key). | Re-run the job; the check waits/polls. Confirm `SONAR_TOKEN` is valid and the project key matches. |
| E2E gate fails: "No E2E results file found" | `tests/e2e/.results.json` is missing: E2E tests were never recorded for this branch. | Run `/test-e2e` or `/pr-ready` locally, then push. |
| E2E gate fails: "E2E results SHA … is not reachable from HEAD" | Recorded results belong to a commit that isn't an ancestor of HEAD (rebased/amended). | Re-run `/test-e2e` or `/pr-ready` to record results against the current branch tip. |
| E2E gate fails: "Code changed since E2E tests ran" | Non-doc/non-config source files changed after the last recorded E2E run. | Re-run E2E and push the updated `.results.json` together with the code. |
| E2E gate fails: "E2E tests did not pass (status: …)" | The recorded run itself failed. | Fix the failing E2E tests, re-record, push. |
| Security audit fails (`npm audit`) | A dependency has a high/critical advisory (`--audit-level=high`). | Update the offending package (`npm audit fix` or bump manually); re-run. |
| Trivy fails with a CRITICAL vulnerability | A built image or a filesystem dependency has a fixable CRITICAL CVE (`ignore-unfixed: true`, so only fixable ones fail). | Update the base image or dependency. If it's a verified false positive, add it to `.trivyignore`. |
| "Verify Architecture Standards" step fails | `scripts/check-architecture.js` found a layering/structure violation. | Read the script's output and refactor the offending module to satisfy the architecture rules. |
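The four E2E gate messages above read most easily as a sequence of checks. The sketch below models that sequence for illustration only; it is not the gate's actual implementation. The function name `evaluate_e2e_gate`, the `sha` and `status` field names, the `"passed"` value, the order of the checks, and the `IGNORED_SUFFIXES` list are all assumptions, since this page does not document the real schema of `.results.json` or which files count as doc/config.

```python
from typing import Optional

# Assumption: which paths count as doc/config is hypothetical; the real gate may differ.
IGNORED_SUFFIXES = (".md", ".yml", ".yaml", ".json")


def evaluate_e2e_gate(
    results: Optional[dict],
    sha_is_ancestor: bool,
    changed_files: list[str],
) -> Optional[str]:
    """Return the message of the first failing check, or None if the gate passes."""
    if results is None:
        return "No E2E results file found"
    if not sha_is_ancestor:
        return f"E2E results SHA {results['sha']} is not reachable from HEAD"
    # Only source changes invalidate a recorded run; doc/config changes are ignored.
    source_changes = [f for f in changed_files if not f.endswith(IGNORED_SUFFIXES)]
    if source_changes:
        return "Code changed since E2E tests ran"
    if results["status"] != "passed":
        return f"E2E tests did not pass (status: {results['status']})"
    return None
```

In a real check, `sha_is_ancestor` could come from the exit code of `git merge-base --is-ancestor <sha> HEAD`, which is 0 when the recorded commit is an ancestor of the branch tip. After a rebase or amend, the recorded SHA no longer satisfies this, which is why re-recording is the fix.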
See Pipeline Overview.
Deployment
| Symptom | Cause | Fix |
|---|---|---|
| Deploy aborts: "ERROR: .env file is missing on the server!" | `/opt/app-name/.env` (or `/opt/app-name-prod/.env`) doesn't exist. The deploy script refuses to run without it. | SSH in and create the server `.env` with all runtime secrets (see Secrets Matrix). CI never creates this file. |
| "failed to create connection … missing CF-Access-Client-Id" | Cloudflare Access service-token credentials are absent or wrong in GitHub. | Verify `CF_ACCESS_CLIENT_ID` and `CF_ACCESS_CLIENT_SECRET` exist with no stray spaces, and that the service token is still valid in Cloudflare Zero Trust. See Cloudflare Tunnel. |
| "Permission denied (publickey)" on SSH | The deploy key in GitHub doesn't match the server's `authorized_keys`, or is malformed. | Confirm `SSH_KEY` (or `PROD_SSH_KEY`) includes the full BEGIN/END lines; check the public key is in `~/.ssh/authorized_keys` on the server. |
| `docker compose pull` hangs / times out | Insufficient disk space, or GHCR auth failed. | SSH in, run `df -h /` (need ~2GB+ free), `docker image prune -a -f`, then retry. Auth uses `GITHUB_TOKEN` + `GITHUB_ACTOR` from `.env.deploy`. |
| "High disk usage detected … aggressive cleanup" / disk full | Server root above 80%. The script runs `docker system prune -af --volumes` automatically. | If it persists, manually prune images and check for large logs/volumes; ensure old image tags are being cleaned. |
| "Docker Compose Up failed or timed out!" / unhealthy containers | A container failed its healthcheck within the 60s `--wait-timeout`. | The job prints the last 50 app log lines. SSH in and run `cd /opt/app-name && docker compose ps`, then `docker compose logs app --tail 100`, and confirm the `.env` values are correct. |
| Health check fails after deploy / rollback | The app didn't return healthy on `/api/health`. | Inspect `docker compose logs app`; verify DB connectivity and required env vars; redeploy once fixed. |
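The two disk-related rows above use two different thresholds: root usage above 80% triggers the cleanup, and roughly 2GB free is needed for image pulls. The sketch below applies both to figures from `shutil.disk_usage`. It is an illustration, not the deploy script's logic; the function name `disk_report`, the constant names, and reading "~2GB" as 2 GiB are assumptions.

```python
import shutil

USAGE_THRESHOLD = 0.80        # "Server root above 80%" from the table above
MIN_FREE_BYTES = 2 * 1024**3  # "~2GB+ free" from the table above, read here as 2 GiB


def disk_report(total: int, used: int, free: int) -> list[str]:
    """Return human-readable warnings for the given disk figures (in bytes)."""
    warnings = []
    if total > 0 and used / total > USAGE_THRESHOLD:
        warnings.append(f"usage {used / total:.0%} exceeds {USAGE_THRESHOLD:.0%}")
    if free < MIN_FREE_BYTES:
        warnings.append(f"only {free / 1024**3:.1f} GiB free; image pulls may fail")
    return warnings


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # named tuple: total, used, free
    for line in disk_report(usage.total, usage.used, usage.free) or ["disk looks fine"]:
        print(line)
```

Keeping the thresholds separate from the measurement makes the check easy to run against figures copied from `df` output on a server where Python is not convenient to run.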
See Pipeline Overview and Cloudflare Tunnel.
Database
| Symptom | Cause | Fix |
|---|---|---|
| Migrations hang or fail in CI/deploy | Migrations are running over the transaction pooler (port 6543), which can't hold the long-lived/prepared connections migrations need. | Run migrations against the direct connection (`DATABASE_URL_DIRECT`, port 5432). Keep the app on `DATABASE_URL` (6543). |
| `docker compose run --rm app alembic upgrade head` errors during deploy | Migration step failed: bad SQL, missing direct URL, or unreachable DB. | Check the migration logs; verify `DATABASE_URL_DIRECT` and `DB_SSL_MODE=require` are set in the server `.env`; fix the migration and redeploy. |
| Intermittent connection drops under load | App pointed at the direct connection instead of the pooler. | Use the pooler (`DATABASE_URL`, 6543) for normal app traffic; reserve direct (5432) for migrations and batch jobs. |
| SSL/TLS handshake errors connecting to Postgres | `DB_SSL_MODE` not set for a managed Postgres that requires TLS. | Set `DB_SSL_MODE=require`. |
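Most rows in this table come down to three settings: `DATABASE_URL` on the pooler port, `DATABASE_URL_DIRECT` on the direct port, and `DB_SSL_MODE=require`. The following sketch checks a set of variables against those expectations. It is a hypothetical helper (`check_db_env` is not an existing project function) and assumes both URLs carry an explicit port.

```python
from urllib.parse import urlsplit

POOLER_PORT = 6543  # transaction pooler, per the table above
DIRECT_PORT = 5432  # direct connection, per the table above


def check_db_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems found in the database-related variables."""
    problems = []
    expected = {"DATABASE_URL": POOLER_PORT, "DATABASE_URL_DIRECT": DIRECT_PORT}
    for name, port in expected.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        actual = urlsplit(value).port  # None when the URL has no explicit port
        if actual != port:
            problems.append(f"{name} uses port {actual}, expected {port}")
    if env.get("DB_SSL_MODE") != "require":
        problems.append("DB_SSL_MODE should be 'require'")
    return problems
```

Passing `dict(os.environ)` on the server, or the parsed contents of the server `.env`, would give a quick pre-migration sanity check. An empty list means only that the ports and SSL mode match the convention described here, not that the database is reachable.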
See the Environment Variable Matrix for the database variables and Supabase (Hosted) for pooler vs direct details.