Ctrl IQ is a post-Series A company focused on modernizing High Performance Computing (HPC) infrastructure and capabilities, not only for traditional HPC (e.g. simulation, pharmaceuticals/medicine, energy, aerospace, financial services/trading) but also for enterprise-focused computing needs such as AI/ML training and inferencing, compute, and data analytics.


Ctrl IQ develops software infrastructure for the enterprise. We are the founding company behind Rocky Linux and have brought to market major advances within the traditional High Performance Computing (HPC) ecosystem (e.g. simulation, pharmaceuticals/medicine, energy, aerospace, financial services/trading) and for enterprise-focused computing needs such as AI/ML training and inferencing, compute, and data analytics.


For this position, we are seeking a talented and experienced software/site reliability engineer to build and maintain the infrastructure for our cloud 2.0 solution.


Successful candidates will have interest and experience in some of the following areas: containers (Singularity, Docker, OCI, etc.), orchestration (Kubernetes/Nomad/Mesosphere), distributed workloads, data movement, AI/ML training, DevOps, container registries, security, PKI, and encryption.


The SRE serves as the operational/reliability side of the team, providing highly available, hands-off deployment of our services. Both on-prem and cloud deployments are part of the game plan. For someone who wants to become a Lead or a Manager, there are future growth opportunities as Ctrl IQ grows.


Responsibilities:

  • Work closely with the development team.

  • Take part in architecture-level discussions, planning, and implementation (lines of Go & Terraform code)

  • Research to ensure what we are building is always the best path forward

  • Document each project to facilitate integration for users

  • Drive proofs of concept and minimum viable products for demonstration

  • Maintain a release-fast, release-often software development mentality

  • Deliver Infrastructure as Code


Skills that will help in general:

  • Friendly, collaborative, humble, honest, and always striving to be better

  • Excellent communication skills

  • Ability to work independently as well as collaboratively in a remote team environment

  • Identify, analyze, and resolve complex software design problems

  • Contributions to open source software projects

  • Experience with Kubernetes

  • Experience with Go

  • Experience with Terraform


Required for the SRE:

  • Excellent communication skills

  • Cloud Experience (AWS/Azure/GCP)

  • Linux fluency

  • 3+ years of SRE/related experience: this shouldn’t be your first rodeo.

  • 2+ years of programming experience (a prior role as a SWE would be ideal)


We currently offer full benefits (medical, dental, and vision; medical coverage for both employees and their dependents is 80% employer/20% employee) to all of our regular full-time U.S.-based employees, along with bonuses, stock options, and a flexible hours/time-off policy.


Remote work, no required travel for most positions.


This position has been filled.