We’re looking for a highly technical, independent, and visionary Big Data Engineer to take ownership of our next-generation distributed training pipelines and infrastructure. This is a hands-on, high-impact role at the core of our algorithmic decision-making systems, shaping how models are trained and deployed at scale across billions of data points in real-time AdTech environments.
You’ll be responsible for designing and building scalable ML systems from the ground up, from data ingestion to model training and evaluation. You’ll work closely with algorithm researchers, data engineers, and production teams to drive innovation and performance improvements throughout the ML lifecycle.
- Design and build large-scale data processing pipelines.
- Build scalable infrastructure for data preprocessing, feature engineering, and model evaluation.
- Lead the technical design and development of new backend systems, from architecture to production.
- Collaborate cross-functionally with Data Science, Infrastructure, Product, BA, and Engineering teams to define and deliver impactful solutions.
- Own the full lifecycle of services: tooling, versioning, monitoring, automation, measuring results, and responding quickly to critical issues.
- Continuously research and adopt best-in-class practices in MLOps, performance tuning, and distributed systems.
- B.Sc. or M.Sc. in Computer Science, Software Engineering, or an equivalent field.
- 5+ years of hands-on experience in backend or data engineering.
- Strong Python skills and experience with distributed systems and parallel data processing frameworks such as Spark (via PySpark or Scala), Dask, or similar technologies. Familiarity with Scala is a strong advantage, especially for performance-critical workloads.
- Experience in cloud environments (AWS, GCP, OCI) and containerized deployment (Kubernetes).
- Solid understanding of databases and SQL for data retrieval.
- Strong communication skills and ability to drive initiatives independently.
- A passion for clean code, elegant architecture, and measurable impact.
- Experience with monitoring and alerting tools (e.g. Grafana, Kibana).
- Experience working with in-memory and NoSQL databases (e.g. Aerospike, Redis, Bigtable) to support ultra-fast data access in production-grade ML services.
- Proven track record in designing and scaling ML infrastructure.
- Deep understanding of ML workflows and lifecycle management.
- Polish public holidays.
- 20 working days per year of fully paid Non-Operational Allowance, intended for personal recreation. These days must be used within the calendar year; there is no rollover.
- Health Insurance.
- Gym Subscription (Multisport).