Interested in a mission-driven job ensuring perpetual open access to information for a global audience? Enjoy helping scale the use of services and products critical to hundreds of national and international non-profits, libraries, universities, cultural heritage institutions, and mission-driven organizations? If so, the Internet Archive is seeking a Software Engineer for our Archiving & Data Services team.
Internet Archive (IA) is a non-profit digital library, a top-200 website at archive.org, and an archive of over 99 petabytes of digital information running in many self-owned and self-operated data centers. Internet Archive also provides mission-aligned services to thousands of organizations working collaboratively to advance our shared goal of “Universal Access to All Knowledge.” The Archiving & Data Services group provides a suite of paid, SaaS, and free products, as well as community programs, focused on the archiving, management, analysis, and accessibility of digital information. Its services are used by over 1,500 organizations around the world.
We are looking for a motivated, detail-oriented Software Engineer to join our team. The role will focus on Archive-It (archive-it.org), our platform for building, sharing, and preserving web archive collections. This position offers the opportunity to work with a range of technologies and gain deep knowledge about web crawling, archival replay, and large-scale distributed systems. Our services work with petabytes of archived data and facilitate the discovery and use of large-scale digital collections. The Software Engineer will have the unique opportunity to build things that further open access to information and advance the public good.
Key Responsibilities:
Collaborate with team members to understand user needs, design new features, support web crawling and preservation, and improve the performance and reliability of Archive-It and other department products.
Implement, test, and maintain software across our stack (Python, Elasticsearch, Postgres, Temporal, HTML/CSS/JS/TS).
Develop, monitor, and maintain the Archive-It partner application, where web crawls are configured, scheduled, and reported.
Improve a distributed system that orchestrates web crawls and post-processes them for long-term preservation, indexing for retrieval, deduplication, and reporting.
Participate in code reviews to ensure the quality and stability of our software and diffusion of knowledge across the team.
Document architecture, software, and features for internal and external users.
Qualifications and Skills:
Degree in Computer Science or a related field, or equivalent experience, strongly preferred.
Proficiency in Python, with familiarity with Postgres, Elasticsearch, and HTML/CSS/JS preferred.
A strong understanding of web services and distributed systems.
Excellent problem-solving skills, attention to detail, and ability to work both independently and collaboratively.
Experience with web crawling, Django, workflow systems (e.g. Temporal, Airflow), distributed databases (e.g. Cassandra, Scylla), Hadoop, and Ansible is a plus.
Other tools we use include GitLab, GitHub, Sentry, Grafana, and JIRA.
Our independently operated data centers run Ubuntu Linux VMs and our department runs everything from the VM up, so Linux experience is preferred.
An interest in the Internet Archive’s mission to provide Universal Access to All Knowledge is expected.
Job Details:
Remote applicants preferred. We have headquarters in San Francisco and Vancouver, and candidates in those locations will have the option of hybrid remote/in-office arrangements. Candidates will need some overlap in working hours with colleagues who are based primarily in North America, largely in Pacific Time. Compensation and title will be commensurate with experience. The role is open to candidates of varying seniority, with a general, but negotiable, salary range of $90,000 to $115,000 based on living in the San Francisco, CA region. Compensation may be adjusted based on the geographic location of the finalist.
Benefits & Perks:
The Internet Archive is a remote-first workplace and provides a comprehensive benefits package including PTO, paid holidays, and medical benefits. Depending on where you live, we also provide these additional benefits: dental, vision, health savings accounts, flex spending accounts, commuter benefits, short-term disability, long-term disability, and retirement programs.
At the Internet Archive, we believe we do our best work when our employees bring together diverse ideas. Members of all groups underrepresented in the tech industry and library world are strongly encouraged to apply. We are proud to be an equal opportunity workplace and are committed to equal employment opportunity regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, genetic information, veteran status, gender identity or expression, sexual orientation, or any other characteristic protected by applicable federal, state, or local law.