Common Issues

Solutions to frequently encountered problems with the New Hires Reporting System.

Service Connection Issues

"Backend API is not available"

Symptoms: Frontend shows error message, can't upload or validate files

Solutions:

# Check service status
docker-compose -f docker-compose.prod.yml ps

# Check backend health
curl http://localhost:8000/health

# Restart backend
docker-compose -f docker-compose.prod.yml restart backend

# Check backend logs for errors
docker-compose -f docker-compose.prod.yml logs backend --tail=50

Common causes:

  - Backend service crashed → Check logs for Python errors
  - Database connection failed → Verify database is running
  - Port 8000 conflict → Check if another service is using the port
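After a restart, the backend may need a few seconds before its health endpoint responds. The following is a minimal sketch of a polling helper for that situation; `retry` is a hypothetical name introduced here for illustration and is not part of the New Hires Reporting System.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
# The body runs in a subshell so its variables do not leak into the caller.
retry() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      exit 0
    fi
    echo "Attempt $attempt/$max_attempts failed: $*" >&2
    attempt=$((attempt + 1))
    # Sleep only if another attempt will follow
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  exit 1
)

# Example (not run here): poll the backend health endpoint for up to ~30 seconds
# retry 15 2 curl -fsS http://localhost:8000/health
```

The `-f` flag makes curl return a non-zero exit code on HTTP error responses, so the loop keeps polling until the endpoint answers successfully.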


"Workers not processing jobs"

Symptoms: Jobs stuck in pending status, no corrections happening

Solutions:

# Check worker status
docker-compose -f docker-compose.prod.yml ps workers

# Check worker logs
docker-compose -f docker-compose.prod.yml logs workers --tail=50

# Look for AWS Bedrock errors (2>&1 so the container's stderr is searched too)
docker logs newhires-workers 2>&1 | grep -i "error\|exception"

# Restart workers
docker-compose -f docker-compose.prod.yml restart workers

Common causes:

  - AWS credentials invalid → Check .env file has valid AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  - Bedrock access denied → Verify IAM policy includes bedrock:InvokeModel permission
  - Model access not enabled → Check AWS Bedrock console for model access
  - Throttling → Reduce MAX_CONCURRENT_BEDROCK_CALLS in .env

See AWS Bedrock Error Troubleshooting for detailed Bedrock error solutions.
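To narrow down which of the causes above applies, it can help to count error types in the worker logs. The sketch below reads log text on stdin and tallies lines by category. `summarize_bedrock_errors` is a hypothetical helper, and it assumes the workers log the AWS error code or message text verbatim; the exact log format of this system is not documented here.

```shell
# summarize_bedrock_errors
# Reads log text on stdin and prints one "category=count" line per category.
# Assumption: AWS error codes and messages appear verbatim in the log lines.
summarize_bedrock_errors() {
  awk '
    /AccessDeniedException/                              { denied++ }
    /ThrottlingException/                                { throttled++ }
    /Could not resolve (the )?foundation model/          { model++ }
    /UnrecognizedClientException|InvalidSignatureException/ { creds++ }
    END {
      printf "access_denied=%d\n", denied
      printf "throttled=%d\n", throttled
      printf "model_not_found=%d\n", model
      printf "bad_credentials=%d\n", creds
    }
  '
}

# Example (not run here):
# docker logs newhires-workers 2>&1 | summarize_bedrock_errors
```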


AWS Bedrock Issues

"AccessDeniedException" in worker logs

Symptoms: Workers show AWS access denied errors

Solutions:

  1. Verify AWS credentials in .env:

    docker exec newhires-workers env | grep AWS
    

  2. Check IAM permissions:

     - Go to AWS Console → IAM → Users → Your User → Permissions
     - Ensure policy includes bedrock:InvokeModel action
     - See AWS Bedrock Setup for correct IAM policy

  3. Test credentials:

    docker exec newhires-workers python3 -c "
    import boto3
    print(boto3.client('sts').get_caller_identity())
    "
    

  4. Restart workers:

    docker-compose -f docker-compose.prod.yml restart workers
    


"Could not resolve foundation model"

Symptoms: Workers can't find Claude or Llama model

Solutions:

  1. Enable model access in AWS Console:

     - Go to: https://console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess
     - Click "Manage model access"
     - Enable "Claude Sonnet 4.5" and/or "Llama 4 Scout"
     - Wait for "Access granted" status

  2. Verify model ID in .env:

    grep BEDROCK_MODEL_ID .env
    # Should be empty (uses default) or:
    # BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
    

  3. Restart workers:

    docker-compose -f docker-compose.prod.yml restart workers
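
The model ID check in the steps above can be sketched as a small function. `model_id_status` is a hypothetical helper; it only recognizes the two model IDs that appear in this guide, so an "unrecognized" result means "not one of those two" and does not by itself prove the ID is invalid.

```shell
# model_id_status ENV_FILE
# Prints "default" if BEDROCK_MODEL_ID is unset or empty, "known: ID" if it
# matches one of the two model IDs named in this guide, else "unrecognized: ID".
model_id_status() (
  env_file=$1
  # Take the last assignment, then drop any inline comment and outer whitespace
  value=$(sed -n 's/^BEDROCK_MODEL_ID=//p' "$env_file" | tail -n 1 \
    | sed 's/[[:space:]]*#.*$//; s/^[[:space:]]*//; s/[[:space:]]*$//')
  case $value in
    '')
      echo "default" ;;
    us.anthropic.claude-sonnet-4-5-20250929-v1:0|us.meta.llama4-scout-17b-instruct-v1:0)
      echo "known: $value" ;;
    *)
      echo "unrecognized: $value" ;;
  esac
)

# Example (not run here):
# model_id_status .env
```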
    


Database Issues

"Database connection failed"

Symptoms: Backend can't connect to PostgreSQL

Solutions:

# Check database is running
docker ps | grep newhires-db

# Check database health
docker exec newhires-db pg_isready -U newhires

# Check database logs
docker logs newhires-db --tail=50

# Verify password in .env
grep POSTGRES_PASSWORD .env

# Restart database (WARNING: may cause brief downtime)
docker-compose -f docker-compose.prod.yml restart db

"Database migration failed"

Symptoms: Backend won't start, shows Alembic errors

Solutions:

# Check current migration status
docker exec newhires-backend alembic current

# View migration history
docker exec newhires-backend alembic history

# Retry migration
docker exec newhires-backend alembic upgrade head

# If migration is stuck, check logs (2>&1 so stderr output is searched too)
docker logs newhires-backend 2>&1 | grep -i alembic

Frontend Issues

"Frontend not loading"

Symptoms: Browser shows blank page or connection refused

Solutions:

# Check frontend is running
docker ps | grep newhires-frontend

# Check frontend logs
docker logs newhires-frontend --tail=50

# Test frontend locally
curl -I http://localhost:8080

# Check nginx config
docker exec newhires-frontend cat /etc/nginx/conf.d/default.conf

# Restart frontend
docker-compose -f docker-compose.prod.yml restart frontend

"API calls failing from frontend"

Symptoms: Frontend loads but shows API errors

Solutions:

  1. Check VITE_API_URL in .env:

    grep VITE_API_URL .env
    # Should be: http://localhost:8000/api/v1
    # Or your custom domain: https://api.your-domain.com/api/v1
    

  2. Rebuild frontend if you changed VITE_API_URL:

    docker-compose -f docker-compose.prod.yml down
    docker-compose -f docker-compose.prod.yml pull
    docker-compose -f docker-compose.prod.yml up -d
    

  3. Check backend is reachable from frontend:

    docker exec newhires-frontend wget -O- http://backend:8000/health
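
A quick way to catch a malformed VITE_API_URL before rebuilding is to check its shape against the two forms shown in step 1. This is a sketch only: `api_url_ok` is a hypothetical helper, and it assumes the URL must end in exactly `/api/v1` with no trailing slash, as in the examples above.

```shell
# api_url_ok URL
# Returns 0 if URL starts with http:// or https://, has a host part,
# and ends with /api/v1. Returns 1 otherwise.
api_url_ok() {
  case $1 in
    http://?*/api/v1|https://?*/api/v1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (not run here):
# api_url_ok "$(sed -n 's/^VITE_API_URL=//p' .env)" || echo "VITE_API_URL looks wrong"
```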
    


Validation Issues

"INVALID_LENGTH" on every line

Cause: Line ending issues (Windows CRLF vs Unix LF)

Solution: The system handles this automatically. If the problem persists:

# Check file encoding
file your_file.txt

# Convert line endings (if needed)
dos2unix your_file.txt
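
If `dos2unix` is not installed, the same check and conversion can be done with standard tools. The sketch below counts lines ending in a carriage return and writes a copy with all carriage returns removed; both function names are hypothetical, and stripping every CR assumes the file contains none that are meant to be data.

```shell
# count_crlf_lines FILE
# Prints the number of lines in FILE that end with a carriage return (CRLF).
count_crlf_lines() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# strip_cr INPUT OUTPUT
# Writes a copy of INPUT to OUTPUT with every carriage return removed.
strip_cr() {
  tr -d '\r' < "$1" > "$2"
}

# Example (not run here):
# count_crlf_lines your_file.txt
# strip_cr your_file.txt your_file_unix.txt
```

Writing to a separate output file keeps the original intact, so the two can be compared before re-uploading.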

"Unknown record type"

Cause: File contains header codes not configured in the system

Solution:

  1. Check which record type is failing in validation results
  2. Verify file format matches one of the supported state formats
  3. Contact support if you need a new state format added

Performance Issues

"Corrections taking too long"

Symptoms: Jobs sit in processing status for extended time

Solutions:

  1. Check worker load:

    docker stats newhires-workers
    

  2. Scale up workers for faster processing:

    docker-compose -f docker-compose.prod.yml up -d --scale workers=3
    

  3. Increase concurrency in .env:

    MAX_CONCURRENT_BEDROCK_CALLS=5  # Increase from 2
    

     Warning: Higher concurrency = higher AWS costs

  4. Switch to faster model:

    # Edit .env
    BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0
    

     Llama 4 Scout is ~3x faster than Claude but slightly less accurate


"High AWS Bedrock costs"

Symptoms: Unexpected AWS charges

Solutions:

  1. Check token usage in logs:

    docker logs newhires-workers 2>&1 | grep "tokens used"
    

  2. Reduce concurrent calls:

    # Edit .env
    MAX_CONCURRENT_BEDROCK_CALLS=1  # Reduce from 2
    

  3. Reduce retry attempts:

    # Edit .env
    MAX_AI_ATTEMPTS=3  # Reduce from 5
    

  4. Switch to cheaper model:

    # Edit .env
    BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0
    

  5. Set up AWS billing alerts:

     - AWS Console → Billing → Budgets
     - Create alert for Bedrock usage

Docker Issues

"Permission denied" errors

Solutions:

# Add user to docker group (Linux)
sudo usermod -aG docker $USER

# Log out and back in, then verify
docker ps

"Port already in use"

Symptoms: Can't start services, port conflict error

Solutions:

# Find what's using the port
sudo lsof -i :8000  # Backend
sudo lsof -i :8080  # Frontend

# Kill the process or change ports in docker-compose.prod.yml

"Out of disk space"

Symptoms: Containers won't start, disk space errors

Solutions:

# Check disk usage
docker system df

# Clean up unused images and containers
docker system prune -a

# Remove old volumes (CAREFUL - deletes data!)
docker volume prune
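
To notice a filling disk before containers fail, the used-space percentage can be checked from a script. This is a sketch under assumptions: the function names are hypothetical, and the example path `/var/lib/docker` assumes Docker's default data root on Linux, which may differ on your host.

```shell
# disk_used_percent PATH
# Prints the used-space percentage (digits only) of the filesystem holding PATH.
disk_used_percent() {
  # -P forces the portable one-line-per-filesystem output format
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_at_or_above PATH THRESHOLD
# Returns 0 if usage is at or above THRESHOLD percent, else 1.
disk_at_or_above() {
  [ "$(disk_used_percent "$1")" -ge "$2" ]
}

# Example (not run here): show Docker disk usage once the disk is 90% full
# disk_at_or_above /var/lib/docker 90 && docker system df
```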

Environment Variable Issues

"Missing required environment variable"

Symptoms: Services fail to start with env var errors

Solutions:

  1. Verify .env file exists:

    ls -la .env
    

  2. Check required variables are set:

    grep -E "IMAGE_TAG|AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|POSTGRES_PASSWORD" .env
    

  3. Ensure no placeholder values:

    grep -E "EXAMPLE|your_|changeme" .env
    # This should return NOTHING
    

  4. Verify .env is in same directory as docker-compose.prod.yml:

    ls -la
    # Should show both files
    

  5. Restart services after fixing .env:

    docker-compose -f docker-compose.prod.yml down
    docker-compose -f docker-compose.prod.yml up -d
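
Steps 1 to 3 above can be combined into one check that reports variable names without printing secret values. A minimal sketch: `check_env_file` is a hypothetical helper, and it uses the same required variables and placeholder patterns as the grep commands in those steps.

```shell
# check_env_file ENV_FILE
# Prints one line per problem found and returns 1 if there were any.
check_env_file() (
  env_file=$1
  problems=0
  if [ ! -f "$env_file" ]; then
    echo "missing file: $env_file"
    exit 1
  fi
  for name in IMAGE_TAG AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY POSTGRES_PASSWORD; do
    # A variable counts as set only if it has a non-empty value
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "not set: $name"
      problems=$((problems + 1))
    fi
  done
  # Same placeholder patterns as step 3; only the variable names are printed
  placeholders=$(grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$env_file" \
    | grep -E 'EXAMPLE|your_|changeme' | cut -d= -f1)
  for name in $placeholders; do
    echo "placeholder value: $name"
    problems=$((problems + 1))
  done
  [ "$problems" -eq 0 ]
)

# Example (not run here):
# check_env_file .env && echo ".env looks complete"
```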
    


Quick Diagnostics

Run this comprehensive check to diagnose multiple issues:

#!/bin/bash
# Save as: quick-check.sh

echo "=== Service Status ==="
docker-compose -f docker-compose.prod.yml ps

echo -e "\n=== Backend Health ==="
curl -s http://localhost:8000/health

echo -e "\n=== Frontend Health ==="
curl -sI http://localhost:8080 | head -1

echo -e "\n=== Database Health ==="
docker exec newhires-db pg_isready -U newhires

echo -e "\n=== Worker Status ==="
docker logs newhires-workers --tail=10

echo -e "\n=== Recent Errors ==="
docker logs newhires-workers --tail=50 2>&1 | grep -i error | tail -10

echo -e "\n=== AWS Credentials ==="
docker exec newhires-workers env | grep AWS_REGION

echo -e "\n=== Job Queue Status ==="
docker exec newhires-db psql -U newhires -d newhires -c \
  "SELECT status, COUNT(*) FROM correction_jobs GROUP BY status;"

echo -e "\n=== Resource Usage ==="
docker stats --no-stream

Make executable and run:

chmod +x quick-check.sh
./quick-check.sh
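
The script above prints raw output for each check. If a pass/fail summary is more convenient, a small wrapper can run each command quietly and count failures. This is a sketch, not part of the shipped tooling: `run_check` and `FAILED_CHECKS` are hypothetical names.

```shell
# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND with its output discarded, prints PASS or FAIL with the label,
# and counts failures in the global FAILED_CHECKS.
FAILED_CHECKS=0
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    FAILED_CHECKS=$((FAILED_CHECKS + 1))
  fi
}

# Example (not run here), using the same commands as quick-check.sh:
# run_check "backend health" curl -fsS http://localhost:8000/health
# run_check "frontend"       curl -fsSI http://localhost:8080
# run_check "database"       docker exec newhires-db pg_isready -U newhires
# [ "$FAILED_CHECKS" -eq 0 ] || echo "$FAILED_CHECKS check(s) failed"
```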


Getting More Help

Detailed Troubleshooting Guides

Useful Commands

# View all logs
docker-compose -f docker-compose.prod.yml logs -f

# Restart everything
docker-compose -f docker-compose.prod.yml restart

# Fresh start (keeps data)
docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml up -d

# Check environment
docker exec newhires-workers env | grep -E "AWS|BEDROCK|POSTGRES"

# Export logs for support (2>&1 captures the container's stderr as well)
docker logs newhires-backend > backend.log 2>&1
docker logs newhires-workers > workers.log 2>&1
docker logs newhires-frontend > frontend.log 2>&1

Prevention Tips

  1. Monitor worker logs regularly:

    docker logs newhires-workers --tail=50
    

  2. Set up AWS billing alerts to avoid surprise costs

  3. Backup database regularly:

    docker exec newhires-db pg_dump -U newhires newhires > backup_$(date +%Y%m%d).sql
    

  4. Keep .env file secure:

    chmod 600 .env
    

  5. Update regularly:

     - Check for new IMAGE_TAG from development team
     - Rotate AWS credentials every 90 days

  6. Test after changes:

     - After updating .env, restart services and verify health endpoints
     - After scaling workers, monitor AWS costs
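
One caveat for the backup tip in the list above: the shell creates the output file even if `pg_dump` fails, so an empty or truncated backup can go unnoticed. The sketch below checks for the header and footer comments that `pg_dump` writes in plain-text dumps. `backup_looks_valid` is a hypothetical helper, and passing it is only a sanity check, not proof that the dump restores cleanly.

```shell
# backup_looks_valid FILE
# Returns 0 if FILE is non-empty, starts with the pg_dump plain-text header,
# and ends with the "dump complete" footer. Returns 1 otherwise.
backup_looks_valid() {
  [ -s "$1" ] || return 1
  head -n 5 "$1" | grep -q 'PostgreSQL database dump' || return 1
  tail -n 10 "$1" | grep -q 'PostgreSQL database dump complete'
}

# Example (not run here):
# backup_looks_valid "backup_$(date +%Y%m%d).sql" || echo "Backup looks incomplete"
```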

Still Having Issues?

If problems persist:

  1. Collect diagnostic information:

    # Run the quick-check.sh script above
    ./quick-check.sh > diagnostics.txt
    

  2. Check documentation:

     - Deployment Overview
     - Environment Variables
     - AWS Bedrock Setup

  3. Review recent changes:

     - Did you update .env recently?
     - Did you change IMAGE_TAG?
     - Did you modify docker-compose.prod.yml?

  4. Contact support with diagnostic logs and details of what changed