Ceph Storage Engineer (English)
Company
VEXXHOST
Date Posted
29-11-2025
Location
Remote
About VEXXHOST
VEXXHOST is a leading provider of cloud infrastructure solutions, offering four core services: OpenStack, Kubernetes, Zuul, and MigrateKit. We're committed to delivering high-performance, reliable, and secure cloud services to businesses worldwide. Our team is passionate about open-source technology and building cutting-edge infrastructure solutions.
Position Overview
We are looking for a Ceph Storage Engineer to design, deploy, operate, and optimize large-scale distributed storage clusters that power our global cloud platform. The ideal candidate has deep hands-on experience with Ceph, a strong understanding of distributed systems, and a passion for building reliable and scalable infrastructure. You will play a key role in shaping the performance, resiliency, and growth of our storage services.
Responsibilities
- Design, deploy, and maintain Ceph clusters that support large-scale production environments
- Monitor and optimize cluster performance, capacity, and reliability
- Implement best practices for data durability, replication, and recovery
- Troubleshoot storage issues at scale
- Manage lifecycle operations including upgrades, expansions, and migrations
- Collaborate with engineering teams to integrate Ceph with OpenStack and Kubernetes platforms
- Automate operational workflows to ensure efficient and repeatable processes
- Participate in on-call rotations to support critical production systems
- Work directly with clients to diagnose and resolve their technical issues
- Contribute to documentation
Qualifications
- Strong hands-on experience operating Ceph in production environments
- Solid understanding of distributed storage concepts and architecture
- Proficiency with Linux systems administration and networking fundamentals
- Experience with OpenStack Cinder, Glance, or Nova storage backends is a strong plus
- Familiarity with Kubernetes persistent storage concepts is an asset
- Scripting experience with languages such as Python or Go
- Knowledge of monitoring tools such as Prometheus, Grafana, or the ELK stack
- Ability to diagnose performance bottlenecks and conduct root cause analysis
- Strong problem-solving skills and attention to detail
- Excellent communication and collaboration skills
Nice to Have
- Contributions to Ceph or other open-source projects
- Experience with automation tools such as Ansible or Terraform
- Background in large-scale cloud infrastructure environments
- Understanding of object storage protocols such as S3 or Swift
What We Offer
- Remote-first work environment
- Professional development opportunities and conference attendance
- Access to cutting-edge technology and infrastructure
- Collaborative and inclusive team culture
- Opportunity to work with open-source technologies
- Opportunity to help build the foundation of a growing company