4 steps to fix security issues of SSH access to production environments

Fast access to the right engineers in production is critical for product speed.

But controlling access to production with SSH creates many problems.

Troubleshooting, bug fixes, and incident resolutions depend on fast data access.

Unfortunately, many teams grant access with poor solutions, which creates significant security risks for the business or inefficient workflows.


Building infrastructure for production access using SSH is painful.

The missing components in your access management become hidden vulnerabilities. Nobody talks about them, but they are huge attack vectors. SSH-based setups typically lack:

  • Single Sign-on & MFA
  • Audit trails and PII protection
  • Compliance (GDPR, PCI, SOC2, and HIPAA)
  • Developer Experience

But you can use the 80/20 rule and gradually get these features in place with these four steps.

1. Centralize access with systems you already manage

You don't need an LDAP directory if you already use Google Workspace.

Adding SSO to SSH, auditing the Kubernetes syslog, and recording Rails Console sessions are all challenging. Look for tools that can help. One option is the Cloud Shell solutions from AWS or Google Cloud. Another is Runops. Don't make SSO a big project that needs many new tools. Instead, start by integrating what you can with Google OAuth. It is one less tool to set up and manage. LDAP has many extra features, but it's better to have the SSO and MFA that come with Google OAuth now than to wait 6-8 more months before you can start the LDAP project.
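To make "integrate what you can with Google OAuth" concrete, here is a minimal sketch of an access check driven by Google identity. It assumes the ID token has already been verified elsewhere and that MFA is enforced on the Google side. The claim names hd, email, and email_verified are standard Google ID token claims; the domain, the ACCESS table, and the can_access function are hypothetical names made up for illustration.

```python
# Hypothetical Workspace domain and access table, for illustration only.
ALLOWED_DOMAIN = "example.com"
ACCESS = {
    "ana@example.com": {"rails-console", "postgres"},
    "bob@example.com": {"postgres"},
}


def can_access(claims: dict, resource: str) -> bool:
    """Decide access from the payload of an already-verified Google ID token.

    Assumption: signature, audience, and expiry checks happened before
    this function is called. This sketch only covers the authorization step.
    """
    # "hd" is the hosted (Workspace) domain of the account.
    if claims.get("hd") != ALLOWED_DOMAIN:
        return False
    # Reject accounts whose email address Google has not verified.
    if not claims.get("email_verified"):
        return False
    # Look up what this person may reach; unknown users get nothing.
    return resource in ACCESS.get(claims.get("email", ""), set())
```

The point of the sketch is that one identity source you already manage can gate every resource, so there is no second directory to keep in sync.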

One tool solving 80% of the problem is better than five tools solving 20% of the problem each.

2. Prioritize features relevant to your industry

Some companies need better developer experience and faster access. Conversely, highly regulated businesses give access to fewer people and need robust security and compliance.

How many steps does a developer need, from opening the Terminal to getting inside the Rails Console? An SSH-based workflow requires at least ten. Suppose you are in an industry that doesn't have to comply with many regulations and doesn't deal with sensitive data. In that case, do not spend time on audit features until you nail Developer Experience, SSO, and MFA.

Can you make it two steps instead?

The same holds on the other side.

Fintechs, for example, have PCI as a requirement for doing business. Depending on their position in the payments stack, the requirements can be even more extensive. You may need 20 steps from Terminal to Console, which is ok (to start).

No audit for Fintechs means no business. You won't start with Developer Experience.

3. Use solutions that solve more than one use case

Reduce complexity by adding Rails Console, AWS/GCP, databases, Kubernetes, Servers, and anything you need to manage in a single tool.

You may have a shiny tool or open-source project that nails the Rails Console use case. But what if you also have to support databases, cloud providers, and other types of access? The shiny tool becomes a problem. You now have two, three, or five different tools to set up and manage. As an example, many companies use Runops to manage cloud provider access. The experience isn't the best, as it is limited to the CLI. But they benefit from having a single tool that covers every one of their access use cases, from Kubernetes to databases.
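As a rough illustration of the single-tool idea, the sketch below routes every kind of session through one entry point that also produces an audit record. It is not how Runops or any particular product works. The TARGETS table, the build_session function, and the exact commands are hypothetical, and the code only builds the command and the audit line without running anything.

```python
import json
import shlex
from datetime import datetime, timezone

# Hypothetical mapping from target type to the command that opens a session.
TARGETS = {
    "rails": ["bundle", "exec", "rails", "console"],
    "postgres": ["psql"],
    "kubernetes": ["kubectl", "exec", "-it"],
}


def build_session(user: str, target: str, extra_args=()):
    """Return the command to run and a JSON audit line for one session."""
    if target not in TARGETS:
        raise ValueError(f"unsupported target: {target}")
    command = TARGETS[target] + list(extra_args)
    # One audit format for every use case, instead of one per tool.
    audit = {
        "user": user,
        "target": target,
        "command": shlex.join(command),
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return command, json.dumps(audit)
```

Supporting a new use case in this shape means adding one entry to the table, while the audit trail and the access flow stay the same for everyone.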

Delivering a slightly worse UX for everything with a single tool is better than managing different tools for each use case.

4. Add friction to easy but unwanted methods

Many teams do this, but they aren't proud of it. I wasn't when I had to.

Say the way engineers access the production Rails Console today has security issues. It is the fastest route, but it lacks audit trails or falls short of other compliance needs. You could add a form submission to that current process to push people toward the safer method. Now the unsafe route is not the fastest anymore. People hate forms, so they will move to whichever approach is now the quickest, which is the one you want.

I can't say I like this approach.

But it does the job when you lack time and resources.

A typical scenario is people changing things in the AWS web console instead of through an automated IaC pipeline. You can make console access harder to get by putting it behind a Jira request. You aren't revoking access or stopping the current process, but teams will prefer the self-service approach. Over time you can fix and improve the pipeline experience until it is better than the console.
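The friction rule itself is simple enough to sketch. In the example below, the unwanted method stays available but needs a ticket, while the preferred method is granted right away. All names here (AccessRequest, FRICTION_METHODS, decide, the method labels, and the ticket id) are hypothetical, and the ticket check is a stand-in for whatever approval system you use.

```python
from dataclasses import dataclass
from typing import Optional

# Methods you want to discourage without removing them (hypothetical labels).
FRICTION_METHODS = {"web_console"}


@dataclass
class AccessRequest:
    user: str
    method: str  # e.g. "iac_pipeline" or "web_console"
    ticket_id: Optional[str] = None  # e.g. the id of an approved request


def decide(request: AccessRequest) -> str:
    """Grant the preferred path immediately; ask for a ticket on the other."""
    if request.method in FRICTION_METHODS and not request.ticket_id:
        return "needs_ticket"
    return "granted"
```

Nothing is revoked in this model. The unwanted path simply costs one extra step, which is usually enough to change the default behavior.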

You can make the right way the easiest by adding complexity to the easiest but unwanted way.