Common Issues

Solutions to frequently encountered problems with the New Hires Reporting System.

Service Connection Issues

"Backend API is not available"

Symptoms: Frontend shows error message, can't upload or validate files

Solutions:

# Check service status
docker-compose -f docker-compose.prod.yml ps

# Check backend health
curl http://localhost:8000/health

# Restart backend
docker-compose -f docker-compose.prod.yml restart backend

# Check backend logs for errors
docker-compose -f docker-compose.prod.yml logs backend --tail=50
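Right after a restart, a single health check can fail simply because the backend has not finished starting. A minimal sketch of a retry helper is shown below; the function name `retry`, the attempt count, and the delay are illustrative assumptions, not part of the system.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS runs out.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      echo "ok after $i attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "failed after $attempts attempt(s)" >&2
  return 1
}

# Example: poll the health endpoint for up to about 30 seconds
# retry 10 3 curl -fsS http://localhost:8000/health
```

The same helper can wrap any of the other checks in this guide, for example the `pg_isready` database check.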

Common causes:
- Backend service crashed → Check logs for Python errors
- Database connection failed → Verify database is running
- Port 8000 conflict → Check if another service is using the port

"Workers not processing jobs"

Symptoms: Jobs stuck in pending status, no corrections happening

Solutions:

# Check worker status
docker-compose -f docker-compose.prod.yml ps workers

# Check worker logs
docker-compose -f docker-compose.prod.yml logs workers --tail=50

# Look for AWS Bedrock errors (2>&1 includes the container's stderr in the search)
docker logs newhires-workers 2>&1 | grep -i "error\|exception"

# Restart workers
docker-compose -f docker-compose.prod.yml restart workers

Common causes:
- AWS credentials invalid → Check .env file has valid AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
- Bedrock access denied → Verify IAM policy includes bedrock:InvokeModel permission
- Model access not enabled → Check AWS Bedrock console for model access
- Throttling → Reduce MAX_CONCURRENT_BEDROCK_CALLS in .env

See AWS Bedrock Error Troubleshooting for detailed Bedrock error solutions.
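To see which of the causes above is most likely, it can help to count how often each error appears in recent worker logs. The sketch below assumes the log lines contain the literal strings listed in the loop: AccessDeniedException and "Could not resolve foundation model" appear elsewhere in this guide, and ThrottlingException is the name Bedrock uses for throttling errors. The function name is hypothetical, and the patterns may need adjusting to the actual log format.

```bash
# summarize_bedrock_errors: reads log text on stdin and prints one line per
# pattern with the number of matching log lines.
# The patterns are assumptions about what the worker logs contain.
summarize_bedrock_errors() {
  local log pattern count
  log=$(cat)
  for pattern in "AccessDeniedException" "ThrottlingException" \
                 "Could not resolve foundation model"; do
    # grep -c prints the number of matching lines (0 if there are none)
    count=$(printf '%s\n' "$log" | grep -c -F -- "$pattern")
    printf '%-40s %s\n' "$pattern" "$count"
  done
}

# Example:
# docker logs newhires-workers --tail=500 2>&1 | summarize_bedrock_errors
```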

AWS Bedrock Issues

"AccessDeniedException" in worker logs

Symptoms: Workers show AWS access denied errors

Solutions:

Verify the AWS credentials from .env reached the worker container (the output includes secret values, so do not share it):

docker exec newhires-workers env | grep AWS

Check IAM permissions:
Go to AWS Console → IAM → Users → Your User → Permissions
Ensure policy includes bedrock:InvokeModel action
See AWS Bedrock Setup for correct IAM policy

Test credentials:

docker exec newhires-workers python3 -c "
import boto3
print(boto3.client('sts').get_caller_identity())
"

Restart workers:

docker-compose -f docker-compose.prod.yml restart workers

"Could not resolve foundation model"

Symptoms: Workers can't find Claude or Llama model

Solutions:

Enable model access in AWS Console:
Go to: https://console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess
Click "Manage model access"
Enable "Claude Sonnet 4.5" and/or "Llama 4 Scout"
Wait for "Access granted" status

Verify model ID in .env:

grep BEDROCK_MODEL_ID .env
# Should be empty (uses default) or:
# BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0

Restart workers:

docker-compose -f docker-compose.prod.yml restart workers

Database Issues

"Database connection failed"

Symptoms: Backend can't connect to PostgreSQL

Solutions:

# Check database is running
docker ps | grep newhires-db

# Check database health
docker exec newhires-db pg_isready -U newhires

# Check database logs
docker logs newhires-db --tail=50

# Verify password in .env
grep POSTGRES_PASSWORD .env

# Restart database (WARNING: may cause brief downtime)
docker-compose -f docker-compose.prod.yml restart db

"Database migration failed"

Symptoms: Backend won't start, shows Alembic errors

Solutions:

# Check current migration status
docker exec newhires-backend alembic current

# View migration history
docker exec newhires-backend alembic history

# Retry migration
docker exec newhires-backend alembic upgrade head

# If migration is stuck, check logs (2>&1 includes the container's stderr in the search)
docker logs newhires-backend 2>&1 | grep alembic
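The output of `alembic current` can be read mechanically: Alembic appends "(head)" to the revision when the database is fully migrated, and prints no revision when none has been applied. The sketch below interprets that output; the function name and the three status messages are hypothetical, and it assumes only stdout is piped in (Alembic's INFO lines go to stderr).

```bash
# migration_state: reads `alembic current` output on stdin and prints
# a one-line summary of where the database stands.
migration_state() {
  local out
  out=$(cat)
  if [ -z "$out" ]; then
    echo "no revision recorded"
  elif printf '%s\n' "$out" | grep -q -F '(head)'; then
    echo "up to date"
  else
    echo "behind head"
  fi
}

# Example:
# docker exec newhires-backend alembic current 2>/dev/null | migration_state
```

If the result is "behind head", the `alembic upgrade head` command above is the next step.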

Frontend Issues

"Frontend not loading"

Symptoms: Browser shows blank page or connection refused

Solutions:

# Check frontend is running
docker ps | grep newhires-frontend

# Check frontend logs
docker logs newhires-frontend --tail=50

# Test frontend locally
curl -I http://localhost:8080

# Check nginx config
docker exec newhires-frontend cat /etc/nginx/conf.d/default.conf

# Restart frontend
docker-compose -f docker-compose.prod.yml restart frontend

"API calls failing from frontend"

Symptoms: Frontend loads but shows API errors

Solutions:

Check VITE_API_URL in .env:

grep VITE_API_URL .env
# Should be: http://localhost:8000/api/v1
# Or your custom domain: https://api.your-domain.com/api/v1

Rebuild frontend if you changed VITE_API_URL:

docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml pull
docker-compose -f docker-compose.prod.yml up -d

Check backend is reachable from frontend:

docker exec newhires-frontend wget -O- http://backend:8000/health
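A malformed VITE_API_URL is easy to miss by eye. The following sketch checks a value against the two example forms shown above (an http or https scheme, and a path ending in /api/v1). It assumes those examples are representative of every valid value; the function name is hypothetical.

```bash
# check_api_url URL: prints "ok" if URL has an http(s) scheme and ends in
# /api/v1, otherwise prints the reason and returns non-zero.
check_api_url() {
  local url=$1
  case $url in
    http://*|https://*) ;;
    *) echo "missing http:// or https:// scheme"; return 1 ;;
  esac
  case $url in
    */api/v1) echo "ok" ;;
    *) echo "path does not end in /api/v1"; return 1 ;;
  esac
}

# Example:
# check_api_url "$(grep '^VITE_API_URL=' .env | cut -d= -f2-)"
```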

Validation Issues

"INVALID_LENGTH" on every line

Cause: Line ending issues (Windows CRLF vs Unix LF)

Solution: The system normally handles this automatically. If the error persists:

# Check file type and line endings (CRLF files are reported "with CRLF line terminators")
file your_file.txt

# Convert line endings (if needed)
dos2unix your_file.txt

"Unknown record type"

Cause: File contains header codes not configured in the system

Solution:

Check which record type is failing in validation results
Verify file format matches one of the supported state formats
Contact support if you need a new state format added
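For the "INVALID_LENGTH" case above, a carriage return at the end of each line makes every line one character longer than expected. The sketch below shows one way to confirm that without extra tools, and to strip the carriage returns when dos2unix is not installed. The function names are hypothetical.

```bash
# count_crlf_lines FILE: number of lines that end in a carriage return
count_crlf_lines() {
  local cr
  cr=$(printf '\r')
  grep -c "${cr}\$" "$1"
}

# line_lengths FILE: the distinct line lengths found in FILE, one per line
# (a trailing carriage return counts as one character)
line_lengths() {
  awk '{ print length($0) }' "$1" | sort -nu
}

# strip_cr IN OUT: copy IN to OUT with all carriage returns removed
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}

# Example:
# count_crlf_lines your_file.txt
# strip_cr your_file.txt your_file_unix.txt
```

If `line_lengths` reports a single length that is exactly one more than the record length of your state format, line endings are the likely cause.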

Performance Issues

"Corrections taking too long"

Symptoms: Jobs sit in processing status for extended time

Solutions:

Check worker load:
```
docker stats newhires-workers
```

Scale up workers for faster processing:

docker-compose -f docker-compose.prod.yml up -d --scale workers=3

Increase concurrency in .env:

MAX_CONCURRENT_BEDROCK_CALLS=5  # Increase from 2

Warning: Higher concurrency can raise AWS costs and makes Bedrock throttling more likely

Switch to faster model:

# Edit .env
BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0

Llama 4 Scout is ~3x faster than Claude but slightly less accurate

"High AWS Bedrock costs"

Symptoms: Unexpected AWS charges

Solutions:

Check token usage in logs:

docker logs newhires-workers 2>&1 | grep "tokens used"

Reduce concurrent calls:

# Edit .env
MAX_CONCURRENT_BEDROCK_CALLS=1  # Reduce from 2

Reduce retry attempts:

# Edit .env
MAX_AI_ATTEMPTS=3  # Reduce from 5

Switch to cheaper model:

# Edit .env
BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0

Set up AWS billing alerts:
AWS Console → Billing → Budgets
Create alert for Bedrock usage
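Several of the fixes in the performance sections come down to changing one value in .env. A minimal sketch of a helper that does this without leaving duplicate entries is shown below. The function name is hypothetical, and it assumes plain KEY=VALUE lines with no quoting; the updated key is moved to the end of the file.

```bash
# set_env_var FILE KEY VALUE: replace any existing KEY=... line in FILE
# with KEY=VALUE, or append it if KEY is not present.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Keep every line except existing assignments of KEY
  grep -v -E "^${key}=" "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back with cat so the original file keeps its permissions
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example:
# set_env_var .env MAX_CONCURRENT_BEDROCK_CALLS 1
# docker-compose -f docker-compose.prod.yml up -d
```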

Docker Issues

"Permission denied" errors

Solutions:

# Add user to docker group (Linux)
sudo usermod -aG docker $USER

# Log out and back in, then verify
docker ps

"Port already in use"

Symptoms: Can't start services, port conflict error

Solutions:

# Find what's using the port
sudo lsof -i :8000  # Backend
sudo lsof -i :8080  # Frontend

# Kill the process or change ports in docker-compose.prod.yml

"Out of disk space"

Symptoms: Containers won't start, disk space errors

Solutions:

# Check disk usage
docker system df

# Clean up unused images and containers
docker system prune -a

# Remove old volumes (CAREFUL - deletes data!)
docker volume prune

Environment Variable Issues

"Missing required environment variable"

Symptoms: Services fail to start with env var errors

Solutions:

Verify .env file exists:
```
ls -la .env
```

Check required variables are set:

grep -E "IMAGE_TAG|AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|POSTGRES_PASSWORD" .env

Ensure no placeholder values:

grep -E "EXAMPLE|your_|changeme" .env
# This should return NOTHING

Verify .env is in same directory as docker-compose.prod.yml:
```
ls -la
# Should show both files
```

Restart services after fixing .env:

docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml up -d
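The individual checks above can be combined into one pass over the file. This sketch uses the same four variable names and the same placeholder patterns as the grep commands in this section; the function name and the messages are hypothetical.

```bash
# check_env_file FILE: prints one line per problem and returns non-zero if
# a required variable is missing or empty, or a placeholder value remains.
check_env_file() {
  local file=$1 var problems=0
  if [ ! -f "$file" ]; then
    echo "missing file: $file"
    return 1
  fi
  for var in IMAGE_TAG AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY POSTGRES_PASSWORD; do
    # Require "VAR=" followed by at least one character
    if ! grep -q -E "^${var}=.+" "$file"; then
      echo "not set: $var"
      problems=$((problems + 1))
    fi
  done
  # Look for placeholder text, ignoring comment lines
  if grep -v '^[[:space:]]*#' "$file" | grep -q -E 'EXAMPLE|your_|changeme'; then
    echo "placeholder value found"
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]
}

# Example:
# check_env_file .env && echo ".env looks complete"
```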

Quick Diagnostics

Run this comprehensive check to diagnose multiple issues:

#!/bin/bash
# Save as: quick-check.sh

echo "=== Service Status ==="
docker-compose -f docker-compose.prod.yml ps

echo -e "\n=== Backend Health ==="
curl -s http://localhost:8000/health

echo -e "\n=== Frontend Health ==="
curl -sI http://localhost:8080 | head -1

echo -e "\n=== Database Health ==="
docker exec newhires-db pg_isready -U newhires

echo -e "\n=== Worker Status ==="
docker logs newhires-workers --tail=10

echo -e "\n=== Recent Errors ==="
docker logs newhires-workers --tail=50 2>&1 | grep -i error | tail -10

echo -e "\n=== AWS Credentials ==="
docker exec newhires-workers env | grep AWS_REGION

echo -e "\n=== Job Queue Status ==="
docker exec newhires-db psql -U newhires -d newhires -c \
  "SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;"

echo -e "\n=== Resource Usage ==="
docker stats --no-stream

Make executable and run:

chmod +x quick-check.sh
./quick-check.sh
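The script above prints raw output and leaves interpretation to the reader. If a pass/fail summary is more useful, for example in a cron job, each check can be wrapped as in the sketch below. The `run_check` function and the `failures` counter are hypothetical additions, not part of quick-check.sh.

```bash
failures=0

# run_check NAME COMMAND [ARGS...]: runs COMMAND quietly, prints PASS or
# FAIL followed by NAME, and counts failed checks in $failures.
run_check() {
  local name=$1
  shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS  $name"
  else
    echo "FAIL  $name"
    failures=$((failures + 1))
  fi
}

# Example, reusing commands from quick-check.sh:
# run_check "backend health" curl -fsS http://localhost:8000/health
# run_check "database ready" docker exec newhires-db pg_isready -U newhires
# exit "$failures"
```

Exiting with the failure count lets a caller treat any non-zero status as "at least one check failed".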

Getting More Help

Detailed Troubleshooting Guides

AWS Bedrock Errors - Comprehensive Bedrock troubleshooting
Docker Problems - Docker-specific issues
Logs & Debugging - How to read and analyze logs

Useful Commands

# View all logs
docker-compose -f docker-compose.prod.yml logs -f

# Restart everything
docker-compose -f docker-compose.prod.yml restart

# Fresh start (keeps data)
docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml up -d

# Check environment
docker exec newhires-workers env | grep -E "AWS|BEDROCK|POSTGRES"

# Export logs for support (2>&1 captures the container's stderr as well)
docker logs newhires-backend > backend.log 2>&1
docker logs newhires-workers > workers.log 2>&1
docker logs newhires-frontend > frontend.log 2>&1
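When preparing material for support, two details are easy to get wrong: `docker logs` replays the container's stderr on stderr, so it must be merged to land in the file, and the environment check above prints secret values. The sketch below addresses both. The function names are hypothetical, and the list of key fragments treated as secret is an assumption.

```bash
# collect_logs OUTDIR CONTAINER...: writes each container's combined
# stdout and stderr log to OUTDIR/CONTAINER.log
collect_logs() {
  local outdir=$1 name
  shift
  mkdir -p "$outdir" || return 1
  for name in "$@"; do
    if docker logs "$name" > "$outdir/$name.log" 2>&1; then
      echo "saved $outdir/$name.log"
    else
      echo "could not read logs for $name" >&2
    fi
  done
}

# redact_env: masks the value of KEY=VALUE lines whose key contains
# SECRET, PASSWORD, TOKEN, or ACCESS_KEY; other lines pass through.
redact_env() {
  sed -E 's/^([A-Za-z0-9_]*(SECRET|PASSWORD|TOKEN|ACCESS_KEY)[A-Za-z0-9_]*)=.*/\1=****/'
}

# Example:
# collect_logs support-logs newhires-backend newhires-workers newhires-frontend
# docker exec newhires-workers env | grep -E "AWS|BEDROCK|POSTGRES" | redact_env
```

Log files can still contain sensitive data, so review them before sending.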

Prevention Tips

Monitor worker logs regularly:
```
docker logs newhires-workers --tail=50
```
Set up AWS billing alerts to avoid surprise costs

Back up the database regularly:

docker exec newhires-db pg_dump -U newhires newhires > backup_$(date +%Y%m%d).sql

Keep .env file secure:
```
chmod 600 .env
```
Update regularly:
Check for new IMAGE_TAG from development team
Rotate AWS credentials every 90 days
Test after changes:
After updating .env, restart services and verify health endpoints
After scaling workers, monitor AWS costs
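A backup is only useful if the dump completed. Plain-text `pg_dump` output starts with a "PostgreSQL database dump" comment and ends with a "PostgreSQL database dump complete" comment, so a missing final marker suggests the dump was cut short. The sketch below checks for both; the function name is hypothetical, and the check does not replace an occasional test restore.

```bash
# verify_backup FILE: succeeds only if FILE is non-empty and contains the
# header and completion comments that pg_dump writes in plain-text dumps.
verify_backup() {
  local file=$1
  [ -s "$file" ] || { echo "empty or missing: $file"; return 1; }
  grep -q -e '^-- PostgreSQL database dump$' "$file" \
    || { echo "no pg_dump header: $file"; return 1; }
  grep -q -e '^-- PostgreSQL database dump complete$' "$file" \
    || { echo "dump looks truncated: $file"; return 1; }
  echo "ok: $file"
}

# Example:
# verify_backup "backup_$(date +%Y%m%d).sql"
```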

Still Having Issues?

If problems persist:

Collect diagnostic information:

# Run the quick-check.sh script above (2>&1 captures error output as well)
./quick-check.sh > diagnostics.txt 2>&1

Check documentation:
Deployment Overview
Environment Variables
AWS Bedrock Setup
Review recent changes:
Did you update .env recently?
Did you change IMAGE_TAG?
Did you modify docker-compose.prod.yml?
Contact support with diagnostic logs and details of what changed