
File Validation

Learn how to validate employer reporting files using the New Hires Reporting System.

Workflow Overview

graph LR
    A[Upload File] --> B[Select State]
    B --> C[Validate]
    C --> D{Errors Found?}
    D -->|Yes| E[Review Errors]
    D -->|No| F[Download Valid File]
    E --> G[Request Corrections]
    G --> H[Download Corrected File]

Step-by-Step Guide

1. Access the Application

Open your browser and navigate to:

2. Upload File

  1. Click "Browse files" or drag-and-drop
  2. Select your employer reporting file
  3. Supported format: .txt (fixed-width text)
     • Louisiana: 1132 characters per record
     • Texas: 801 characters per record
     • Colorado: 860 characters per record
     • Ohio: Variable-length format
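Before uploading, you can pre-check record lengths locally. The sketch below (Python; `check_record_lengths` is illustrative, not part of the system) uses the fixed lengths listed above:

```python
# Sketch: pre-check record lengths before uploading.
# Lengths come from the state formats listed above; Ohio is
# variable-length, so it is skipped here.
EXPECTED_LENGTHS = {
    "LA": 1132,  # Louisiana LA-1132
    "TX": 801,   # Texas ENHR-E
    "CO": 860,   # Colorado CO-2.00
}

def check_record_lengths(lines, state):
    """Return (line_number, actual_length) for records of the wrong length."""
    expected = EXPECTED_LENGTHS.get(state)
    if expected is None:  # e.g. Ohio's variable-length format
        return []
    return [(i, len(line)) for i, line in enumerate(lines, start=1)
            if len(line) != expected]
```

Running this before upload catches INVALID_LENGTH errors without a round trip to the validator.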

3. Select State Format

Choose the appropriate state from the dropdown:

  • Louisiana - LA-1132 format
  • Texas - ENHR-E format
  • Colorado - CO-2.00 format
  • Ohio - OH format

4. Start Validation

Click the "Validate File" button.

Processing Time:

  • Small files (< 100 records): 2-5 seconds
  • Medium files (100-500 records): 5-15 seconds
  • Large files (500+ records): 15-60 seconds

AI-Powered Corrections

After validation, you can request AI-powered corrections using AWS Bedrock. The system analyzes errors and suggests intelligent fixes. This adds processing time but makes many errors auto-correctable.

5. Review Results

After validation completes, you'll see:

Error Summary:

Total Errors: 28
❌ Validation Errors: 28
📋 Records with Errors: 15

Error Details:

  • Line number
  • Field name
  • Error type
  • Current value
  • Error description
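One way to model a single error-details row (the field names in this sketch are hypothetical, not the system's actual schema):

```python
from dataclasses import dataclass

@dataclass
class ValidationError:
    """One row of the error details shown after validation.
    Field names are illustrative only."""
    line_number: int
    field_name: str
    error_type: str      # e.g. MISSING_REQUIRED_FIELD, INVALID_FORMAT
    current_value: str
    description: str

err = ValidationError(5, "EMPLOYEE-FIRSTNAME", "MISSING_REQUIRED_FIELD",
                      "", "Required field cannot be blank")
```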

6. Request AI Corrections (Optional)

If errors are found, you can:

  1. Request Corrections: Click to submit correction job to AI workers
  2. Monitor Progress: Job status shows in queue (pending → processing → completed)
  3. Download Corrected File: Once complete, download the corrected version

Processing Time for Corrections:

  • Depends on queue and number of errors
  • Typically: 30 seconds to 2 minutes
  • Status updates in real-time
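The pending → processing → completed flow above can be monitored with a simple polling loop. A sketch (Python; `get_status` stands in for however your client fetches job state — it is not a real API of this system):

```python
import time

def wait_for_job(get_status, poll_seconds=2, timeout=180):
    """Poll a correction job until it reaches a terminal state.
    `get_status` is a placeholder callable returning the job status string."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)  # still pending or processing: keep waiting
    raise TimeoutError("correction job did not finish in time")
```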


Validation Error Types

MISSING_REQUIRED_FIELD

Cause: Required field is empty

Example:

Line 5, Field: EMPLOYEE-FIRSTNAME
Current Value: (empty)
Error: Required field cannot be blank

AI Correction: May infer from other employee data or employer information


INVALID_FORMAT

Cause: Field doesn't match expected format

Example:

Line 8, Field: EMPLOYEE-SSN
Current Value: "12345678"
Error: Must be 9 digits

AI Correction: Suggests proper format based on context and validation rules


INVALID_LENGTH

Cause: Record isn't the correct length for the state format

Example:

Line 3
Expected: 1132 characters
Actual: 1130 characters
Error: Record too short

AI Correction: Identifies missing or extra data and adjusts spacing


INVALID_VALUE

Cause: Value is outside allowed range or invalid code

Example:

Line 12, Field: EMPLOYEE-STATE
Current Value: "XX"
Error: Must be valid 2-letter state code

AI Correction: Suggests nearest valid value based on context
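The field-level error types above can be illustrated with a toy classifier (a sketch only: real rules are state-specific and far more detailed, and the state-code set here is truncated):

```python
import re

STATE_CODES = {"LA", "TX", "CO", "OH", "NY", "CA"}  # partial list for illustration

def classify(field, value, required=True):
    """Toy classifier mirroring the error types described above."""
    if required and not value.strip():
        return "MISSING_REQUIRED_FIELD"
    if field == "EMPLOYEE-SSN" and not re.fullmatch(r"\d{9}", value):
        return "INVALID_FORMAT"
    if field == "EMPLOYEE-STATE" and value not in STATE_CODES:
        return "INVALID_VALUE"
    return None  # INVALID_LENGTH is record-level, checked separately
```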


Understanding AI Corrections

How AWS Bedrock Analyzes Errors

  1. Context Analysis: Reads entire record for context
  2. Pattern Recognition: Identifies data patterns across the file
  3. Intelligent Inference: Infers missing values from available data
  4. Validation: Verifies suggestions against format rules
  5. Structured Output: Returns corrected records in proper format
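The analysis steps above might translate into a correction request along these lines (purely illustrative; this is not the system's actual prompt or API call):

```python
def build_correction_prompt(record, errors):
    """Assemble a correction request for the model (illustrative only).
    `record` is the raw fixed-width line; `errors` is a list of dicts
    with hypothetical keys field/message/value."""
    error_lines = "\n".join(
        f"- {e['field']}: {e['message']} (current: {e['value']!r})"
        for e in errors)
    return (
        "Correct the following fixed-width employer record.\n"
        f"Record:\n{record}\n"
        f"Errors:\n{error_lines}\n"
        "Return only the corrected record, preserving field positions.")
```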

AI Models Available

The system uses AWS Bedrock with:

| Model | Provider | Speed | Accuracy | Cost | Best For |
|-------|----------|-------|----------|------|----------|
| Claude Sonnet 4.5 | Anthropic | Fast | Excellent | Medium | Production (default) |
| Llama 4 Scout | Meta | Very Fast | Good | Low | Budget deployments |

To change models, update BEDROCK_MODEL_ID in .env:

# Claude Sonnet 4.5 (default)
BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0

# Llama 4 Scout (budget option)
BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0
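In application code, the variable might be read with a fallback to the documented default (a sketch; `resolve_model_id` is illustrative, not part of the system):

```python
import os

# Fall back to the documented default when BEDROCK_MODEL_ID is unset.
DEFAULT_MODEL_ID = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"

def resolve_model_id(env=os.environ):
    return env.get("BEDROCK_MODEL_ID", DEFAULT_MODEL_ID)
```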

When AI Can Help Most

  • ✅ Missing employee address (infer from employer)
  • ✅ Invalid formats (SSN, dates, ZIP codes)
  • ✅ Missing employer data
  • ✅ Obvious typos and formatting issues
  • ✅ Consistent data patterns across records
  • ✅ Space/padding corrections

When AI May Struggle

  • ❌ Missing critical unique data (SSN, names) with no context
  • ❌ Ambiguous field relationships
  • ❌ Complex business rule violations
  • ❌ Data requiring external verification
  • ❌ Completely corrupted records


File Size Limits

| Environment | Max File Size | Max Records | Notes |
|-------------|---------------|-------------|-------|
| Development | 50 MB | ~40,000 | Configurable |
| Production | 50 MB | ~40,000 | Adjust in reverse proxy |

To increase limits, update reverse proxy configuration:

# Nginx
client_max_body_size 100M;

# Caddy
request_body {
    max_size 100MB
}

Performance Tips

Faster Validation

  • Validate without corrections first: See if errors exist before requesting corrections
  • Process during off-peak hours: AWS Bedrock API may be faster at different times
  • Split large files: Process in smaller batches for faster turnaround
  • Scale workers: Increase worker count for higher throughput

Reduce AWS Costs

  • Fix recurring errors: Prevent same errors in future files
  • Use cheaper model: Switch to Llama 4 Scout (70% cheaper than Claude)
  • Reduce retry attempts: Lower MAX_AI_ATTEMPTS in .env
  • Batch processing: Process multiple files in one session
  • Set concurrency limits: Adjust MAX_CONCURRENT_BEDROCK_CALLS based on budget
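MAX_CONCURRENT_BEDROCK_CALLS can be pictured as a semaphore around in-flight Bedrock calls. A minimal sketch, assuming an async worker (this is not the system's actual implementation):

```python
import asyncio

async def run_with_limit(tasks, max_concurrent):
    """Run coroutines while capping how many execute at once,
    mirroring what a MAX_CONCURRENT_BEDROCK_CALLS setting might do."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(coro):
        async with sem:  # at most `max_concurrent` inside at a time
            return await coro

    return await asyncio.gather(*(guarded(t) for t in tasks))
```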

Cost Estimates:

  • Claude Sonnet 4.5: ~$0.045 per correction job
  • Llama 4 Scout: ~$0.015 per correction job
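The arithmetic behind comparing models at a given volume, using the rough per-job estimates above (at these figures Llama 4 Scout works out to about two-thirds cheaper):

```python
# Back-of-envelope comparison from the per-job estimates above.
CLAUDE_PER_JOB = 0.045
LLAMA_PER_JOB = 0.015

def estimated_cost(jobs, per_job):
    """Estimated spend in dollars for a batch of correction jobs."""
    return round(jobs * per_job, 2)
```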


Troubleshooting

Validation Stuck or Spinning

Cause: Network issue or backend problem

Solutions:

  1. Wait 60 seconds (timeout period)
  2. Refresh browser page (Ctrl+Shift+R)
  3. Check backend logs: docker-compose -f docker-compose.prod.yml logs -f backend
  4. Restart backend: docker-compose -f docker-compose.prod.yml restart backend


"Invalid File Format"

Cause: File doesn't match selected state format

Solutions:

  1. Verify correct state selected
  2. Check file is plain text (not Excel, PDF)
  3. Verify record length matches state specification
  4. Check for hidden characters: cat -A yourfile.txt | head
  5. Ensure file uses UTF-8 encoding


No Errors Found (But Expected)

Cause: File is actually valid OR wrong validation rules

Solutions:

  1. Verify correct state format selected
  2. Check if file has correct record length
  3. Review state specifications for expected format
  4. Contact support if rules seem incorrect


Too Many Errors (Unexpected)

Cause: Wrong state format OR line ending issues

Solutions:

  1. Double-check state selection
  2. System auto-handles line endings (CRLF → LF)
  3. Try downloading file again from source
  4. Verify file wasn't modified (encoding changes)
  5. Check if file is corrupted
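If you'd rather normalize line endings yourself before re-uploading, a CRLF → LF conversion like the one mentioned above can be done locally (a sketch; the system's internal normalization may differ):

```python
def normalize_line_endings(text):
    """Convert CRLF (and any stray CR) to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```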


Corrections Not Processing

Cause: Workers not running or AWS Bedrock issues

Solutions:

# Check worker status
docker-compose -f docker-compose.prod.yml ps workers

# Check worker logs
docker logs newhires-workers --tail=50

# Look for AWS errors
docker logs newhires-workers | grep -i "error\|bedrock"

# Verify AWS credentials
docker exec newhires-workers env | grep AWS

# Restart workers
docker-compose -f docker-compose.prod.yml restart workers

See AWS Bedrock Troubleshooting for detailed error solutions.


Job Stuck in "Processing"

Cause: Worker crashed or Bedrock timeout

Solutions:

  1. Wait 2-3 minutes (workers have retry logic)
  2. Check worker logs for errors
  3. Verify worker is running: docker ps | grep newhires-workers
  4. Check job queue status:

docker exec newhires-db psql -U newhires -d newhires -c \
  "SELECT id, status, created_at FROM correction_jobs ORDER BY created_at DESC LIMIT 10;"
  5. Restart workers if needed


Best Practices

Before Uploading

  • ✅ Verify file is the correct state format
  • ✅ Check file hasn't been modified (open it in Notepad, not Excel)
  • ✅ Ensure file uses UTF-8 encoding
  • ✅ Keep backup of original file
  • ✅ Remove any header/footer rows not part of data

During Validation

  • ✅ Wait for full completion (don't refresh prematurely)
  • ✅ Review error summary before requesting corrections
  • ✅ Check validation results are reasonable
  • ✅ Note how many errors are reported

After Validation

  • ✅ Download validation report immediately
  • ✅ Keep both original and corrected versions
  • ✅ Review corrections before submitting to state
  • ✅ Test corrected file with state's own validation tool
  • ✅ Document any recurring error patterns

When Using AI Corrections

  • ✅ Review AI-corrected records before submission
  • ✅ Verify critical fields (SSN, names, addresses)
  • ✅ Check that corrections make sense in context
  • ✅ Keep original file for comparison
  • ✅ Monitor AWS costs in production

Data Privacy

What Data is Sent to AWS Bedrock?

When you request corrections:

  • Only records with errors are sent
  • Only the specific fields needing correction
  • Employer context data (for inference)
  • NO complete file is uploaded to AWS
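A sketch of what that payload minimization might look like (function and field names here are hypothetical, not the system's actual request format):

```python
def build_payload(records, errors_by_line, employer_context):
    """Include only the records that have errors, plus employer context;
    the full file is never part of the payload."""
    return {
        "employer": employer_context,
        "records": [
            {"line": n, "record": records[n - 1], "errors": errs}
            for n, errs in sorted(errors_by_line.items())
        ],
    }
```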

AWS Bedrock Data Handling

  • ✅ AWS does not store or train on your data
  • ✅ Data stays in us-east-1 region
  • ✅ Encrypted in transit (HTTPS)
  • ✅ SOC 2 compliant; HIPAA-eligible
  • ❌ Data is not retained after API response

See: AWS Bedrock Data Privacy


Supported State Formats

| State | Format Name | Record Length | Status |
|-------|-------------|---------------|--------|
| Louisiana | LA-1132 | 1132 chars | ✅ Fully supported |
| Texas | ENHR-E | 801 chars | ✅ Fully supported |
| Colorado | CO-2.00 | 860 chars | ✅ Fully supported |
| Ohio | OH | Variable | ✅ Fully supported |

Need another state? Contact your development team to add support.


Next Steps