Maintenance and updates for your On-Premise AI infrastructure are planned and executed to minimise downtime and keep the system stable. Monthly maintenance covers regular software updates, security patches and proactive measures such as monitoring, optimisation and backups.
What’s included in the monthly maintenance?
Regular software updates
AI models:
- Updates to newer model versions
- Performance improvements
- Bug fixes
Platform software:
- Operating system updates
- Docker/Kubernetes updates
- API gateway updates (LiteLLM)
- Monitoring tool updates
Security updates:
- Security patches (critical patches are installed immediately)
- Security-related bug fixes
- Fixes for known vulnerabilities
Proactive maintenance
Monitoring and surveillance:
- 24/7 system monitoring
- Performance metrics tracking
- Early problem detection
Optimisations:
- Performance optimisations
- Configuration adjustments
- Resource optimisations
Backup and disaster recovery:
- Regular backups of configurations and system states
- Backups of locally stored data
- Disaster recovery tests
Update process
1. Planning and coordination
Before each update:
- Analysis of update requirements
- Risk assessment
- Coordination with your team
- Maintenance window planning
Communication:
- Advance notice (usually 1–2 weeks before the update)
- Clear information on changes
- Expected downtime (usually minimal)
2. Update execution
Standard updates:
- Usually during maintenance windows
- Coordinated with your team
- Minimal downtime
Critical updates:
- Critical security patches are installed immediately
- Still coordinated with your team, but prioritised over planned work
Zero-downtime updates:
- Possible with Kubernetes clusters
- Rolling updates without downtime
- Automatic rollback on problems
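The split between standard and critical updates described above can be sketched as a simple triage rule. This is a minimal illustration, not the actual tooling used; the `Update` class, its fields and the update names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Update:
    """Hypothetical description of a pending update."""
    name: str
    is_security_patch: bool = False
    is_critical: bool = False


def plan(updates: List[Update]) -> Tuple[List[str], List[str]]:
    """Split updates into (install immediately, next maintenance window)."""
    immediate, windowed = [], []
    for update in updates:
        # Only critical security patches skip the regular maintenance window.
        if update.is_security_patch and update.is_critical:
            immediate.append(update.name)
        else:
            windowed.append(update.name)
    return immediate, windowed
```

Everything that is not a critical security patch lands in the second list and waits for the next coordinated window.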
3. Testing and validation
After each update:
- Functional tests
- Performance tests
- Integration tests
- Final validation of overall functionality
Maintenance windows
Planned maintenance windows
Typical maintenance windows:
- Weekly: Small updates (usually without downtime)
- Monthly: Larger updates (coordinated)
- Quarterly: Major updates (planned)
Scheduling:
- Usually outside business hours
- Coordinated with your team
- Minimal downtime
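As an illustration of the scheduling rule above, the following sketch checks whether a proposed maintenance slot falls outside business hours. The business hours used here (Monday to Friday, 08:00 to 18:00) are an assumption for the example; the actual hours are agreed with your team.

```python
from datetime import datetime, time

# Assumed business hours for this example only.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def is_outside_business_hours(moment: datetime) -> bool:
    """Return True if `moment` is on a weekend or outside 08:00-18:00."""
    if moment.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return not (BUSINESS_START <= moment.time() < BUSINESS_END)
```

For example, `is_outside_business_hours(datetime(2024, 1, 3, 22, 0))` is `True` for a Wednesday evening slot, so that slot would be a candidate maintenance window.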
Emergency updates
Critical security patches:
- Immediate installation required
- Coordinated but prioritised
- Minimal downtime
Critical bug fixes:
- Quick resolution required
- Coordinated with your team
Update strategies
1. Rolling updates (Kubernetes)
For Kubernetes clusters:
- Updates without downtime
- Instances are updated incrementally, one after another
- Automatic rollback on problems
Advantage: Zero-downtime updates possible
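The idea behind a rolling update with rollback can be shown with a small simulation. This is a simplified sketch in plain Python, not the Kubernetes API: the `rolling_update` function and the `is_healthy` callback are hypothetical stand-ins for what the cluster and its health checks do.

```python
from typing import Callable, List


def rolling_update(
    replicas: List[str],
    new_version: str,
    is_healthy: Callable[[int, str], bool],
) -> List[str]:
    """Update replicas one at a time; restore the old versions on failure."""
    original = list(replicas)
    current = list(replicas)
    for index in range(len(current)):
        # Replace a single replica while the others keep serving traffic.
        current[index] = new_version
        if not is_healthy(index, new_version):
            return original  # roll back to the state before the update
    return current
```

Because only one replica changes at a time, the remaining replicas continue to serve requests, which is what makes updates without downtime possible.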
2. Blue-green deployment
For critical systems:
- Parallel systems during updates
- Seamless switchover
- Immediate rollback possible
Advantage: Maximum availability
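A blue-green switchover can be sketched as a router that points at one of two parallel environments. The `BlueGreenRouter` class below is a hypothetical illustration of the principle, not a description of the deployed setup.

```python
class BlueGreenRouter:
    """Keeps two environments and routes all traffic to the live one."""

    def __init__(self, blue_version: str, green_version: str) -> None:
        self.environments = {"blue": blue_version, "green": green_version}
        self.live = "blue"

    @property
    def idle(self) -> str:
        """Name of the environment that currently receives no traffic."""
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        """Install the new version on the idle environment only."""
        self.environments[self.idle] = version

    def switch(self) -> None:
        """Send traffic to the other environment; calling it again rolls back."""
        self.live = self.idle

    def live_version(self) -> str:
        return self.environments[self.live]
```

The update is installed and tested on the idle side first. A single `switch()` makes it live, and a second `switch()` returns to the previous version immediately.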
3. Canary deployments
For larger updates:
- Gradual rollout
- Testing with small user group
- Full rollout after validation
Advantage: Risk minimisation
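One common way to pick the small user group for a canary rollout is to assign each user to a stable bucket and send only the lowest buckets to the new version. The sketch below assumes users are identified by a string ID; the `in_canary` function is a hypothetical example, not part of any specific tool.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Return True if this user should receive the new version.

    The bucket (0-99) is derived from a hash of the user ID, so the same
    user always gets the same answer for a given percentage.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Raising `percent` step by step (for example 5, 25, 100) widens the rollout gradually, and users who were already in the canary group stay in it.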
Backup strategy
Regular backups
What is backed up:
- Configurations
- Models (if custom)
- Data (if stored locally)
- System states
Backup frequency:
- Daily: Automatic backups
- Before updates: Additional backups
- Monthly: Full backups
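The backup frequency above can be expressed as a small rule. In this sketch the full backup is assumed to run on the first day of each month; that day, the function name `backups_due` and the labels are assumptions made for illustration.

```python
from datetime import date
from typing import List


def backups_due(day: date, update_planned: bool = False) -> List[str]:
    """Return the backup types that should run on the given day."""
    due = ["daily"]  # automatic backup every day
    if day.day == 1:  # assumed: monthly full backup on the 1st
        due.append("full")
    if update_planned:  # additional backup before any update
        due.append("pre-update")
    return due
```

For a day in the middle of the month with an update scheduled, the function returns the daily backup plus the additional pre-update backup.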
Disaster recovery
Recovery tests:
- Regular tests of backup restoration
- Validation of recovery times
- Documentation of recovery processes
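A recovery test typically checks two things: that the restored data matches what was backed up, and how long the restore took. The sketch below illustrates this with a file copy standing in for the real restore step; `verify_restore`, the file names and the checksum approach are assumptions for the example.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path) -> str:
    """Checksum of a file, recorded at backup time and compared after restore."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_restore(
    backup_file: Path, restore_dir: Path, expected_sha256: str
) -> Tuple[bool, float]:
    """Restore a backup and return (data intact?, seconds taken)."""
    start = time.perf_counter()
    restored = restore_dir / backup_file.name
    shutil.copy2(backup_file, restored)  # stand-in for the real restore step
    intact = sha256_of(restored) == expected_sha256
    return intact, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        backup = root / "config.bak"
        backup.write_text("gateway: litellm\n")
        target = root / "restore"
        target.mkdir()
        ok, seconds = verify_restore(backup, target, sha256_of(backup))
        print(f"restore intact: {ok}, took {seconds:.3f}s")
```

The measured duration can be compared with the agreed recovery time and recorded in the recovery documentation.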
Monitoring during updates
Real-time monitoring
During updates:
- Live monitoring of system performance
- Automatic alerts on problems
- Immediate response to problems
After updates:
- Validation of functionality
- Performance comparison
- Error detection
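The performance comparison after an update can be sketched as a check of current metrics against a baseline recorded before the update. The metric names and the 10% tolerance below are assumptions, and the sketch only covers metrics where lower values are better, such as latency or error rate.

```python
from typing import Dict, List


def regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return metrics that got worse by more than `tolerance` (lower is better)."""
    flagged = []
    for name, before in baseline.items():
        after = current.get(name)
        if after is None:
            continue  # metric not measured after the update
        if after > before * (1 + tolerance):
            flagged.append(name)
    return flagged
```

An empty result means no tracked metric degraded beyond the tolerance; any flagged metric is a reason to investigate or roll back.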
Frequently asked questions
How often are updates carried out?
Standard updates:
- Weekly: Small updates (usually automatic)
- Monthly: Larger updates (coordinated)
- As needed: Security patches (immediate)
Can updates be rolled back?
Yes:
- Automatic rollback on problems (Kubernetes)
- Manual rollback possible
- Backup restoration as fallback
Is downtime communicated?
Yes:
- Advance notice (usually 1–2 weeks before the update)
- Clear information on expected downtime
- Usually minimal or no downtime
Can updates be postponed?
Yes:
- Non-critical updates can be postponed
- Coordination with your team possible
- Critical security updates have priority
Next steps
Would you like to know more about maintenance and updates?
- Contact us – Get advice on maintenance processes
Sources and further information:
- On-Premise AI for SMEs – Maintenance and remote maintenance