About The Position:
Our customer is a leading provider of security and intelligence for unmanaged networks.
Responsibilities:
- Ensure the reliability, availability, and performance of critical systems;
- Develop and maintain automation scripts and tools to streamline operations;
- Develop and maintain monitoring dashboards & alerts;
- Lead incident response efforts and post-mortem analysis to prevent future occurrences;
- Optimize system performance and scalability;
- Implement and maintain security best practices;
- Create and maintain comprehensive documentation for systems and processes;
- Participate in on-call rotations to provide support for critical systems.
Requirements:
- At least 5 years of experience as a Site Reliability Engineer;
- Experience in software engineering and systems administration;
- Proficiency in one or more programming languages, such as Python or Go;
- Experience with the AWS cloud platform;
- Hands-on experience with tools like Terraform, Ansible, or CloudFormation;
- Expertise in Docker and Kubernetes;
- Proficiency with monitoring tools like Prometheus and Grafana;
- Proficiency with logging tools like the ELK stack or the Loki stack;
- Experience with continuous integration and continuous deployment tools such as Jenkins, GitLab CI, or CircleCI;
- Strong understanding of networking concepts, protocols, and security;
- Bachelor’s degree in Computer Science, Engineering, or a related field (an advanced degree is a plus);
- English: Upper-Intermediate or higher.
We Offer:
- 5-day working week, 8-hour working day, flexible schedule;
- All public holidays are days off;
- Vacation and sick leave are covered by the company;
- Remote work.