File Validation¶
Learn how to validate employer reporting files using the New Hires Reporting System.
Workflow Overview¶
```mermaid
graph LR
A[Upload File] --> B[Select State]
B --> C[Validate]
C --> D{Errors Found?}
D -->|Yes| E[Review Errors]
D -->|No| F[Download Valid File]
E --> G[Request Corrections]
G --> H[Download Corrected File]
```
Step-by-Step Guide¶
1. Access the Application¶
Open your browser and navigate to:
- Development: http://localhost:8080
- Production: https://your-domain.com
2. Upload File¶
- Click "Browse files" or drag-and-drop
- Select your employer reporting file
- Supported formats: .txt (fixed-width text)
    - Louisiana: 1132 characters per record
    - Texas: 801 characters per record
    - Colorado: 860 characters per record
    - Ohio: Variable-length format
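If you want to sanity-check record lengths before uploading, here is a minimal sketch assuming a Unix-like shell:

```bash
# Flag lines that are not exactly 1132 characters (Louisiana).
# Use 801 for Texas or 860 for Colorado; Ohio records are variable-length.
# Note: Windows (CRLF) line endings add one character per line here,
# although the validator itself normalizes them.
awk 'length($0) != 1132 { print "line " NR ": " length($0) " chars" }' yourfile.txt
```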
3. Select State Format¶
Choose the appropriate state from the dropdown:
- Louisiana - LA-1132 format
- Texas - ENHR-E format
- Colorado - CO-2.00 format
- Ohio - OH format
4. Start Validation¶
Click "Validate File" button
Processing Time:
- Small files (< 100 records): 2-5 seconds
- Medium files (100-500 records): 5-15 seconds
- Large files (500+ records): 15-60 seconds
AI-Powered Corrections
After validation, you can request AI-powered corrections using AWS Bedrock. The system analyzes errors and suggests intelligent fixes. This adds processing time but makes many errors auto-correctable.
5. Review Results¶
After validation completes, you'll see:
- Error Summary
- Error Details:
    - Line number
    - Field name
    - Error type
    - Current value
    - Error description
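For example, a single entry might read: line 42, field SSN, type MISSING_REQUIRED_FIELD, current value blank, description "Required field is empty" (illustrative values only, not output from a real file).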
6. Request AI Corrections (Optional)¶
If errors are found, you can:
- Request Corrections: Click to submit correction job to AI workers
- Monitor Progress: Job status shows in queue (pending → processing → completed)
- Download Corrected File: Once complete, download the corrected version
Processing Time for Corrections:
- Depends on queue depth and the number of errors
- Typically 30 seconds to 2 minutes
- Status updates in real time
Validation Error Types¶
MISSING_REQUIRED_FIELD¶
Cause: Required field is empty
Example:
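For instance, an employee record whose last-name field contains only spaces, or a blank date of hire. (Illustrative only; exact field positions depend on the selected state format.)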
AI Correction: May infer from other employee data or employer information
INVALID_FORMAT¶
Cause: Field doesn't match expected format
Example:
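For instance, a date of hire entered as 07/15/2025 when the specification expects CCYYMMDD (20250715), or an SSN containing dashes instead of nine digits. (Illustrative only; see the state specification for the exact rules.)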
AI Correction: Suggests proper format based on context and validation rules
INVALID_LENGTH¶
Cause: Record isn't the correct length for the state format
Example:
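For instance, a Louisiana record that is 1120 characters instead of the required 1132, typically because trailing spaces were trimmed or a field was omitted.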
AI Correction: Identifies missing or extra data and adjusts spacing
INVALID_VALUE¶
Cause: Value is outside allowed range or invalid code
Example:
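For instance, a state code of XX or a month value of 13, neither of which is an accepted value. (Illustrative only.)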
AI Correction: Suggests nearest valid value based on context
Understanding AI Corrections¶
How AWS Bedrock Analyzes Errors¶
- Context Analysis: Reads entire record for context
- Pattern Recognition: Identifies data patterns across the file
- Intelligent Inference: Infers missing values from available data
- Validation: Verifies suggestions against format rules
- Structured Output: Returns corrected records in proper format
AI Models Available¶
The system uses AWS Bedrock with:
| Model | Provider | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|---|
| Claude Sonnet 4.5 | Anthropic | Fast | Excellent | Medium | Production (default) |
| Llama 4 Scout | Meta | Very Fast | Good | Low | Budget deployments |
To change models, update BEDROCK_MODEL_ID in .env:
```
# Claude Sonnet 4.5 (default)
BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0

# Llama 4 Scout (budget option)
BEDROCK_MODEL_ID=us.meta.llama4-scout-17b-instruct-v1:0
```
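Depending on how the services load .env, you may need to recreate the containers (for example, docker-compose -f docker-compose.prod.yml up -d) rather than just restart them for the new model ID to take effect.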
When AI Can Help Most¶
- ✅ Missing employee address (infer from employer)
- ✅ Invalid formats (SSN, dates, ZIP codes)
- ✅ Missing employer data
- ✅ Obvious typos and formatting issues
- ✅ Consistent data patterns across records
- ✅ Space/padding corrections
When AI May Struggle¶
- ❌ Missing critical unique data (SSN, names) with no context
- ❌ Ambiguous field relationships
- ❌ Complex business rule violations
- ❌ Data requiring external verification
- ❌ Completely corrupted records
File Size Limits¶
| Environment | Max File Size | Max Records | Notes |
|---|---|---|---|
| Development | 50 MB | ~40,000 | Configurable |
| Production | 50 MB | ~40,000 | Adjust in reverse proxy |
To increase limits, update reverse proxy configuration:
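For example, if Nginx is the reverse proxy (an assumption; use the equivalent setting for your proxy), raise the upload limit in the site configuration and reload the proxy:

```nginx
# Hypothetical Nginx snippet: allow uploads up to 100 MB.
# Place inside the relevant server (or http) block.
client_max_body_size 100M;
```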
Performance Tips¶
Faster Validation¶
- Validate without corrections first: See if errors exist before requesting corrections
- Process during off-peak hours: The AWS Bedrock API may respond faster outside peak usage periods
- Split large files: Process in smaller batches for faster turnaround (see the example after this list)
- Scale workers: Increase worker count for higher throughput
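A minimal splitting sketch, assuming a Unix-like shell and that each record is a single newline-delimited line:

```bash
# Split the report into batches of 500 records: batch_aa, batch_ab, ...
split -l 500 yourfile.txt batch_
```

Each batch can then be validated and corrected independently.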
Reduce AWS Costs¶
- Fix recurring errors: Prevent same errors in future files
- Use cheaper model: Switch to Llama 4 Scout (70% cheaper than Claude)
- Reduce retry attempts: Lower MAX_AI_ATTEMPTS in .env
- Batch processing: Process multiple files in one session
- Set concurrency limits: Adjust MAX_CONCURRENT_BEDROCK_CALLS based on budget

Cost Estimates:
- Claude Sonnet 4.5: ~$0.045 per correction job
- Llama 4 Scout: ~$0.015 per correction job
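As a rough worked example, 1,000 correction jobs in a month comes to about $45 with Claude Sonnet 4.5 versus about $15 with Llama 4 Scout; actual spend varies with the number of errors per job and retry attempts.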
Troubleshooting¶
Validation Stuck or Spinning¶
Cause: Network issue or backend problem
Solutions:
1. Wait 60 seconds (timeout period)
2. Refresh browser page (Ctrl+Shift+R)
3. Check backend logs: docker-compose -f docker-compose.prod.yml logs -f backend
4. Restart backend: docker-compose -f docker-compose.prod.yml restart backend
"Invalid File Format"¶
Cause: File doesn't match selected state format
Solutions:
1. Verify correct state selected
2. Check file is plain text (not Excel, PDF)
3. Verify record length matches state specification
4. Check for hidden characters: cat -A yourfile.txt | head
5. Ensure file uses UTF-8 encoding (see the commands below)
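For steps 4 and 5, the following commands (assuming a Unix-like shell; WINDOWS-1252 is only an example source encoding) report the detected encoding and convert a copy to UTF-8:

```bash
# Report MIME type and character encoding
file -i yourfile.txt

# Convert a copy to UTF-8 (replace WINDOWS-1252 with the detected encoding)
iconv -f WINDOWS-1252 -t UTF-8 yourfile.txt > yourfile_utf8.txt
```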
No Errors Found (But Expected)¶
Cause: File is actually valid OR wrong validation rules
Solutions:
1. Verify correct state format selected
2. Check if file has correct record length
3. Review state specifications for expected format
4. Contact support if rules seem incorrect
Too Many Errors (Unexpected)¶
Cause: Wrong state format OR line ending issues
Solutions:
1. Double-check state selection
2. The system auto-handles line endings (CRLF → LF)
3. Try downloading file again from source
4. Verify file wasn't modified (encoding changes)
5. Check if file is corrupted
Corrections Not Processing¶
Cause: Workers not running or AWS Bedrock issues
Solutions:
```bash
# Check worker status
docker-compose -f docker-compose.prod.yml ps workers

# Check worker logs
docker logs newhires-workers --tail=50

# Look for AWS errors
docker logs newhires-workers | grep -i "error\|bedrock"

# Verify AWS credentials
docker exec newhires-workers env | grep AWS

# Restart workers
docker-compose -f docker-compose.prod.yml restart workers
```
See AWS Bedrock Troubleshooting for detailed error solutions.
Job Stuck in "Processing"¶
Cause: Worker crashed or Bedrock timeout
Solutions:
1. Wait 2-3 minutes (workers have retry logic)
2. Check worker logs for errors
3. Verify worker is running: docker ps | grep newhires-workers
4. Check job queue status: docker exec newhires-db psql -U newhires -d newhires -c "SELECT id, status, created_at FROM correction_jobs ORDER BY created_at DESC LIMIT 10;"
Best Practices¶
Before Uploading¶
- ✅ Verify file is the correct state format
- ✅ Check file hasn't been modified (view it in Notepad, not Excel, which can alter the data)
- ✅ Ensure file uses UTF-8 encoding
- ✅ Keep backup of original file
- ✅ Remove any header/footer rows not part of data
During Validation¶
- ✅ Wait for full completion (don't refresh prematurely)
- ✅ Review error summary before requesting corrections
- ✅ Check validation results are reasonable
- ✅ Note how many errors are reported
After Validation¶
- ✅ Download validation report immediately
- ✅ Keep both original and corrected versions
- ✅ Review corrections before submitting to state
- ✅ Test corrected file with state's own validation tool
- ✅ Document any recurring error patterns
When Using AI Corrections¶
- ✅ Review AI-corrected records before submission
- ✅ Verify critical fields (SSN, names, addresses)
- ✅ Check that corrections make sense in context
- ✅ Keep original file for comparison
- ✅ Monitor AWS costs in production
Data Privacy¶
What Data is Sent to AWS Bedrock?¶
When you request corrections:
- Only records with errors are sent
- Only the specific fields needing correction
- Employer context data (for inference)
- NO complete file is uploaded to AWS
AWS Bedrock Data Handling¶
- ✅ AWS does not store or train on your data
- ✅ Data stays in us-east-1 region
- ✅ Encrypted in transit (HTTPS)
- ✅ SOC 2 compliant and HIPAA-eligible
- ❌ Data is not retained after API response
Supported State Formats¶
| State | Format Name | Record Length | Status |
|---|---|---|---|
| Louisiana | LA-1132 | 1132 chars | ✅ Fully supported |
| Texas | ENHR-E | 801 chars | ✅ Fully supported |
| Colorado | CO-2.00 | 860 chars | ✅ Fully supported |
| Ohio | OH | Variable | ✅ Fully supported |
Need another state? Contact your development team to add support.
Next Steps¶
- Learn about Correction Modes (if available)
- Understand Downloading Results
- Troubleshoot issues in Common Issues
- Review AWS Bedrock Error Troubleshooting