DevOps / SRE Engineer (Remote)

Job summary:

 

The Internet Archive is looking for an expert DevOps / SRE engineer to join the UX Team, working remotely.

 

You will be one of the primary engineers responsible for the Archive.org website (a Top 250 website) and related services. You will be in charge of maintaining and developing the mostly Ansible-managed production cluster, provisioning and configuring servers, maintaining applications, setting up monitoring and alerts, and generally helping keep things running smoothly. There is also the possibility of contributing to front-end development and participating in other UX-related activities. This is a rare opportunity to become a critical member of a small team making a huge impact in the world.

 

Responsibilities:

 

  • Operationally maintaining Archive.org servers and services

  • Maintaining and evolving the Ansible-based provisioning and configuration infrastructure

  • Collaboratively managing the deployment architecture of our staging and production apps

  • Setting up and maintaining monitoring and alerts

  • Identifying and triaging problems when they arise; researching, building consensus around, and implementing solutions

  • Responding to external stakeholders who have apps hosted in our server cluster

  • Working with other DevOps engineers, both on the UX Team and on other teams

  • Communicating effectively with stakeholders

  • Reducing technical debt

  • Being a role model for effective and collaborative engineering practices

  • Maintaining the blog and other WordPress sites

 

Requirements:

 

  • 3+ years of relevant work experience in a collaborative software development environment

  • Strong Linux system administration skills

  • Expertise in maintaining and optimizing a server cluster over time

  • Experience setting up monitoring and alerting at all levels within a system

  • Excellent problem-solving and debugging skills

  • Excellent verbal and written communication skills

  • Familiarity with website and server security

  • Comfort working in a loosely structured environment requiring individual autonomy and initiative within one's scope of responsibilities

  • Willingness to learn, adapt, and reach compromise with others

  • Ability to work remotely, with occasional optional on-site visits

 

Preferred Skills:

 

  • Automated server provisioning with Ansible (or similar tooling)

  • Web servers, load-balancing, and caching (e.g. nginx, HAProxy)

  • Network & DNS configuration

  • Containerization and clustering (e.g. Docker, Nomad, Consul)

  • Monitoring and observability (e.g. Grafana, Prometheus, Loki, Sentry)

  • Git, GitLab

  • Jira, Agile-ish software development

 

About Us:

 

At the Internet Archive, we believe that access to knowledge is a fundamental human right. We are building a digital library of everything, to which anyone can upload for free. We provide free access to researchers, historians, scholars, and the general public. In the Wayback Machine, we've saved over 866 billion web pages. We protect our users' privacy and provide special access to books for the print-disabled. Two million people visit Archive.org every day.

 

Our headquarters are located in San Francisco, and there we host public forums, art exhibitions, performances, film screenings, and other community events. However, our 150+ employees span the globe.

 

Benefits & Perks:

 

The Internet Archive provides a comprehensive benefits package that includes PTO, paid holidays, medical, dental, and vision coverage, FSA, commuter benefits, short- and long-term disability insurance (STD, LTD), and 401(k)/Roth accounts. Work-life balance is important to us. For engineers located near HQ, we offer catered Friday lunches.

 

Internet Archive is an Equal Opportunity Employer M/F/D/V/L/G/B/T and will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the Fair Chance Ordinance.