
Site Reliability Engineering Manager at Toshiba

Warsaw, Wroclaw, Krakow, Katowice, Poznan
B2B
DevOps
remote

About the Role:

We are looking for a seasoned and strategic Site Reliability Engineering (SRE) Manager to lead and grow our SRE team. You will be responsible for building and managing a team of engineers who ensure the reliability, scalability, and performance of our mission-critical systems and services. As the SRE Manager, you will play a key role in the design and execution of operational best practices while promoting a culture of automation and continuous improvement across the engineering teams.

Key Responsibilities:

  1. Team Leadership and Mentorship: Lead, mentor, and grow a team of Site Reliability Engineers by providing guidance on best practices, technical decisions, and career development.
  2. Operational Excellence: Own the overall reliability, uptime, and performance of the systems and services, ensuring they meet business SLAs and customer expectations.
  3. Incident Management: Oversee the incident response process, including monitoring, alerting, incident resolution, and root cause analysis, with a focus on improving response times and minimizing impact.
  4. Automation and Tooling: Drive the adoption of automation and self-service tools to reduce manual intervention, improve system reliability, and enhance engineering productivity.
  5. Collaboration with Engineering Teams: Work closely with software developers, QA teams, and other stakeholders to embed reliability into the design and development of applications and services.
  6. Capacity Planning and Performance Optimization: Manage capacity planning, performance monitoring, and optimization to ensure the infrastructure can scale to meet business needs.
  7. Infrastructure Management: Collaborate with the DevOps and cloud infrastructure teams to manage, maintain, and optimize cloud infrastructure using modern IaC (Infrastructure as Code) tools and methodologies.
  8. Budget and Resource Management: Manage budgets, vendor relationships, and resource allocations to ensure efficient use of infrastructure and technology investments.
  9. Drive SRE Culture: Promote a culture of continuous improvement, emphasizing learning from failure, monitoring, and proactive problem-solving.
  10. Security and Compliance: Work closely with security teams to implement best practices for secure infrastructure and ensure compliance with internal and external regulations.

Required Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, Information Technology, or equivalent work experience.
  • 5+ years of experience in site reliability engineering, DevOps, or infrastructure roles, including at least 2 years in a leadership or management role.
  • Deep knowledge of cloud platforms (AWS, Google Cloud Platform, Azure) and the ability to manage highly available and scalable infrastructure.
  • Hands-on experience with monitoring, alerting, and observability tools (Datadog, Prometheus, Grafana, ELK, etc.).
  • Strong expertise in automation tools and practices (CI/CD pipelines, IaC tools such as Terraform).
  • Solid understanding of containers and orchestration tools (Docker, Kubernetes).
  • Proven experience with incident management, root cause analysis, and post-mortem processes.
  • Deep knowledge of Linux/Unix systems administration and networking concepts (DNS, TCP/IP, load balancing).
  • Strong communication and leadership skills, with the ability to collaborate across teams and functions.

Preferred Qualifications:

  • Experience with large-scale distributed systems and high-availability architectures.
  • Familiarity with security best practices for cloud environments.
  • Experience managing multi-region, multi-cloud deployments.
  • Prior experience working in Agile or Scrum environments.
  • Knowledge of cost optimization strategies for cloud infrastructure.

Soft Skills:

  • Strong organizational and time management skills.
  • Ability to influence and inspire teams to adopt best practices.
  • Excellent verbal and written communication skills for both technical and non-technical stakeholders.
  • Ability to think strategically while staying hands-on when necessary.
  • Demonstrated problem-solving skills and a proactive approach to identifying risks and finding solutions.

About the Company:

  • Company: Xenoss
  • Type: Outsource
  • Size: 100 - 300
  • Industry: Adtech/Advertising
  • Founded: 2013
