Infrastructure Talent Pool (Storage Engineer, Site Reliability Engineer, MLOps)
Intro
Cohere is seeking talented individuals to join its Infrastructure Team in roles such as Storage Engineer, Site Reliability Engineer, and MLOps. The team builds world-class infrastructure that is critical to the success of Cohere's AI systems, and it values diversity, inclusivity, and a collaborative work environment. If you have engineering experience supporting machine learning engineers (MLEs) or data scientists, designing distributed systems with Kubernetes, and working in complex Linux-based environments, this could be the place for you. Cohere offers a range of perks and benefits to support employees' well-being and personal development.
Tasks
- Designing, deploying, supporting, and troubleshooting in complex Linux-based distributed computing environments
- Running production infrastructure at a large scale
- Working with and supporting MLEs or data scientists
- Synchronizing data between different cloud providers
- Building internal tooling that helps data engineers manage costs and optimize the data lifecycle
Requirements
- 5+ years of engineering experience running production infrastructure at a large scale
- Experience with Kubernetes and GPU workloads
- Experience with distributed filesystems like Lustre, Weka, or Vast
- Experience designing, deploying, supporting, and troubleshooting complex Linux-based distributed computing environments
Benefits
- Open and inclusive culture
- Work on cutting-edge AI research
- Weekly lunch stipend
- Full health and dental benefits
- Parental leave top-up
- Personal enrichment benefits
- Remote-flexible with offices in multiple locations
- 6 weeks of vacation