Senior Site Reliability Engineer (SRE)
Geidea · Cairo
Job description
About the role
Geidea is seeking a Senior Site Reliability Engineer to ensure the reliability, availability, scalability, and performance of its critical production systems. The role blends software and systems engineering to build resilient platforms, drive automation, and improve observability in a 24/7 environment.
Key responsibilities
- Maintain high availability and performance of production services; define and manage SLAs, SLOs, and SLIs.
- Lead incident management, root‑cause analysis, and post‑incident reviews.
- Design, implement, and maintain monitoring, alerting, and dashboard solutions using tools such as CloudWatch, Grafana, Prometheus, ELK, and Zabbix.
- Automate operational tasks with PowerShell, Bash, and Python; build CI/CD pipelines and apply Infrastructure as Code (Terraform, Ansible).
- Participate in a 24/7 on‑call rotation and handle major incidents.
- Conduct performance tuning, capacity planning, and resource optimization.
- Support security hardening and ensure compliance with IT governance.
Required profile
- Senior‑level experience in site reliability or DevOps engineering.
- Proven ability to work in a fast‑paced, 24/7 production environment.
- Strong problem‑solving skills and experience leading incident response.
Required skills
- Linux and/or Windows Server administration.
- Cloud platforms: AWS, Azure, or GCP.
- Monitoring tools: Grafana, Prometheus, ELK, CloudWatch, Zabbix.
- Containerization: Docker.
- Scripting: PowerShell, Bash, Python.
- Infrastructure as Code: Terraform, Ansible.
- CI/CD pipeline creation and deployment automation.