ML-Ops Engineer

Toronto, ON

Position Overview:

ShyftLabs is searching for an ML-Ops Engineer to join our expanding team. In this role, you will own our end-to-end Machine Learning platform and core infrastructure, from research through production Machine Learning pipelines. As an ML-Ops Engineer, you'll work as part of a team of problem solvers, tackling complex business issues from strategy to execution. You should be self-directed and comfortable implementing training and inference pipelines in collaboration with a wide range of stakeholders and cross-functional teams. The ideal candidate has a strong background in both Machine Learning and Software Development.

ShyftLabs is a growing data product company, founded in early 2020, that works primarily with Fortune 500 companies. We deliver digital solutions that help businesses across a range of industries accelerate their growth by creating value through innovation.

Job Responsibilities:

Design and implement large-scale ML systems to support training and serving workloads.
Collaborate and share knowledge with our cloud ops team to shorten time-to-production for Machine Learning.
Build tooling and pipeline abstractions that let Data Scientists focus on experimentation while enabling self-service workflows to deploy and serve models reliably and consistently.
Help Data Scientists produce clean, reproducible, and highly performant machine learning systems through rigorous code review with a lens on software quality.
Advocate for automation and monitoring at every step of ML system construction, and help define best practices across the Machine Learning team, drawing on your own industry experience and research.
Support life cycle management of deployed ML apps (e.g., new releases, change management, monitoring and troubleshooting).
Participate in sprint planning, estimations, and reviews.

Qualifications:

3+ years of software development experience, preferably in Python.
3+ years of hands-on Machine Learning experience.
Knowledge of AI-Ops concepts and practices. Experience with, or knowledge of, implementing LLMs, multimodal models, and Generative AI solutions.
Experience maintaining functional, production-grade reference architectures for end-to-end Machine Learning in the cloud.
Experience with Vertex AI is a must-have.
Must have experience with other ML-Ops tools and platforms such as MLflow, Airflow, and DVC.
Strong Linux system administration skills.
Experience with declarative infrastructure and Kubernetes (GKE) for model serving and scalable inference.
Exposure to automated testing and CI/CD in the ML context.
Knowledge of SQL and relational databases, including query authoring and designing a variety of databases (e.g., PostgreSQL).
Understanding of fundamental ML concepts.
Interest in continually learning and trying new tools.
Strong cross-team communication and collaboration skills.

We are proud to offer a competitive salary alongside a strong healthcare insurance and benefits package. This role can be fully remote from anywhere within the United States. We are committed to our employees' growth and offer extensive learning and development resources.

ShyftLabs is an equal-opportunity employer committed to creating a safe, diverse and inclusive environment. We encourage qualified applicants of all backgrounds including ethnicity, religion, disability status, gender identity, sexual orientation, family status, age, nationality, and education levels to apply. If you are contacted for an interview and require accommodation during the interviewing process, please let us know.
