Software Engineer (Distributed Systems and Platforms)

Full time, onsite or remote, multiple positions

Shoreline provides real-time automation and control for cloud operations. Operators use Shoreline to orchestrate real-time debugging and automated repair across their fleets and services, reducing tickets and improving availability. As a Software Engineer at Shoreline, you’ll get to work on tools that give them leverage in their work.

We are a small yet highly effective engineering team made up of people who truly care about the product and each other.

As part of this team, you’ll have the opportunity to:

  • Build products that genuinely improve operational and SRE work, improve availability, and dramatically reduce MTTR

  • Use modern development practices

  • Have organizational visibility and influence over the product and roadmap

What you'll be doing

  • Implementing a distributed runtime for Shoreline Op, a purpose-built, operations-oriented language designed to let operators interactively debug operational events and automate remediations

  • Integrating with platforms such as Kubernetes and cloud services to provide fully automated discovery, monitoring and management of resources across a customer’s environment

  • Building systems with arbitrary scale in mind, e.g., millions of nodes

  • Developing the Shoreline Agent, which runs on customers’ machines to monitor and execute corrective actions with a minimal footprint, and continually pushing the envelope on performance


Who you are

  • Experienced building high-scale, high-performance distributed systems, including implementation, protocol definition, and proving/verifying correctness.

  • Knowledgeable about distributed algorithms such as consensus algorithms, quorum protocols, clocks, and replication.

  • Experienced designing and implementing stateful distributed systems such as distributed databases.

  • Fluent in verbal and written English.


Bonus Points

  • Experience with Elixir / Erlang & OTP or another functional programming language.

  • Experience working with one or more platforms including AWS, GCP, Azure, VMware, and/or Kubernetes.

  • Experience with DevOps, including debugging production systems.

    Get to know us

    At Shoreline, we believe work matters when it is useful – that the time we spend building is recovered a thousandfold in the time saved by our customers no longer doing unproductive manual tasks. We believe, to do our best work, we need to bring our whole selves to work – and be both accepted and respected for doing so. We believe doing something significant requires taking risks and accepting failures. We work at Shoreline because we want to participate in creating the environment in which we thrive.

    We operationalize this by establishing core values by which we measure ourselves. In our performance reviews, we assign equal worth to how well employees exemplify our values as we do to the work they’ve produced. We are growing quickly. When interviewing candidates, we care as much about their fit to our culture as we do their ability to do the work. We strive to hire people better than ourselves, both on ability and on culture.


    • Empathy: We listen to our customers, our colleagues, and ourselves. We strive to understand what is important to others and to help them in meaningful ways. We focus outwards, not upon ourselves.

    • Curiosity, Creativity, and Courage: We use our curiosity to understand what is known. We use our creativity to envision what could be. We use our courage to realize this future, even when uncomfortable.

    • Perseverance: We trust each other to do their work, and we believe in ourselves to do our own. We do what is necessary to succeed, carrying others along as required. We never give up, knowing we will find a way.

    • Humility: We don’t focus on our past accomplishments or failures. We treat each day as an opportunity to do better and become better. We seek out feedback and consider it a gift when provided.

    • Alacrity: We believe in making decisions quickly. Decisions are usually reversible while time cannot be recovered.

    • Delivery: We are builders. We are aggressive when setting goals, understanding that setting small goals leads to mediocrity. We commit fully to meeting the goals we set.

    How To Apply

    If you are interested, please design and implement a solution to the following problem. Please use source control for your solution and include a link to the repository. If you do not want to apply using this web system, please email the solution (attached, or as a link to a repo/gist) and a resume to cristina@shoreline.io. Please include "Software Engineer - Distributed Systems" in the subject of your email.

    Below, you will find types and interfaces specified in Elixir, Shoreline's main programming language. However, we ask that you please complete your solution in the language of your choice. 

    Problem

    Imagine you are building a system to assign a unique number to each resource that you manage. You want the ids to be guaranteed unique, i.e., no UUIDs. Because these ids are globally unique, each id can be given out at most once. The ids are 64 bits long.

    Your service is composed of a set of nodes, each running one process serving ids. A caller will connect to one of the nodes and ask it for a globally unique id. There is a fixed number of nodes in the system, up to 1024. Each node has a numeric id, 0 <= id <= 1023. Each node knows its id at startup, and that id never changes for the node.

    Your task is to implement get_id.  When a caller requests a new id, the node it connects to calls its internal get_id function to get a new, globally unique id.    

    defmodule GlobalId do
      @moduledoc """
      GlobalId module contains an implementation of a guaranteed globally unique id system.
      """

      @doc """
      Please implement the following function.
      64 bit non negative integer output
      """
      @spec get_id(???) :: non_neg_integer
      def get_id(???) do

      end

      #
      # You are given the following helper functions
      # Presume they are implemented - there is no need to implement them.
      #

      @doc """
      Returns your node id as an integer.
      It will be greater than or equal to 0 and less than or equal to 1023.
      It is guaranteed to be globally unique.
      """
      @spec node_id() :: non_neg_integer
      def node_id

      @doc """
      Returns timestamp since the epoch in milliseconds.
      """
      @spec timestamp() :: non_neg_integer
      def timestamp
    end


    You may add other functions to the implementation in order to complete your solution.  

    Please determine the interface to get_id and explain in comments what parameters it takes.

    Assume that no node will receive more than 100,000 requests per second.
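
    The limits stated above (64-bit ids, at most 1024 nodes, at most 100,000 requests per second per node) can be checked against each other with some simple arithmetic. The sketch below is one possible way to do that, not the expected solution. The module name BitBudget, the functions bits_for, layout, and pack, and the field layout itself (timestamp, then node id, then a per-millisecond sequence number) are all assumptions made for illustration.

```elixir
# Hypothetical sketch: checking that the stated limits fit in 64 bits.
# The layout (timestamp | node id | per-millisecond sequence) is an
# assumption for illustration; the problem statement does not prescribe it.
defmodule BitBudget do
  import Bitwise

  @total_bits 64
  @max_nodes 1024                    # node ids 0..1023
  @max_requests_per_second 100_000   # per node, from the problem statement

  # Smallest number of bits that can represent `count` distinct values.
  def bits_for(count) when is_integer(count) and count > 1 do
    (count - 1) |> Integer.digits(2) |> length()
  end

  # Splits the 64 bits between the three assumed fields.
  def layout do
    node_bits = bits_for(@max_nodes)                               # 10 bits
    sequence_bits = bits_for(div(@max_requests_per_second, 1000))  # 7 bits for 100 ids/ms
    %{
      node: node_bits,
      sequence: sequence_bits,
      timestamp: @total_bits - node_bits - sequence_bits           # 47 bits remain
    }
  end

  # Packs the three fields into one integer without overlapping bits.
  def pack(timestamp_ms, node_id, sequence) do
    %{node: node_bits, sequence: sequence_bits} = layout()
    (timestamp_ms <<< (node_bits + sequence_bits)) ||| (node_id <<< sequence_bits) ||| sequence
  end
end

IO.inspect(BitBudget.layout())
```

    Under these assumptions, 47 bits of milliseconds cover more than 4,000 years. Note that 100 ids per millisecond is only an average: requests that arrive in a burst within one millisecond, a clock that moves backwards, and restarts are not addressed by this arithmetic and still need to be handled by an actual design.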

    Please write answers to the following discussion questions and include them in your solution as comments:

    1. Please describe your solution to get_id and why it is correct, i.e., guaranteed globally unique.

    2. Please explain how your solution achieves the desired performance, i.e., 100,000 or more requests per second per node. How did you verify this?

    3. Please enumerate possible failure cases and describe how your solution correctly handles each case. How did you verify correctness? Some example cases:

      • How do you manage uniqueness after a node crashes and restarts?

      • How do you manage uniqueness after the entire system fails and restarts?

      • How do you handle software defects?

    We will evaluate your solution for correctness, simplicity, clarity, and robustness. Solutions that minimize coordination and persistent storage are best, as are solutions that include performance benchmarks and tests that verify correctness.

    If you have any clarifying questions, please email: cristina@shoreline.io