Toyota Motor Corporation recently suffered a data breach due to an access key mistakenly exposed on GitHub. That hardcoded access key evaded detection for five years! The news joined a long line of headlines about the damage caused by hardcoding secrets in code and how it can lead to a full-blown software supply chain attack. When attackers manage to steal source code, the first thing they do is scan it for secrets to extend the impact of their breach. The Samsung and Nvidia breaches carried out by Lapsus$ last year, in which thousands of sensitive credentials were exposed through leaked source code, are clear examples. To combat this pervasive risk, security teams are turning to code scanners that look for secrets, but they soon realize that their visibility into all the places where hardcoded secrets can lurk is incomplete and outdated.
In this article, we’ll review the areas of the software development lifecycle (SDLC) through which secrets can get exposed and discuss practical prevention and remediation methods. After reading this, you should understand:
- Which techniques attackers use to steal your hardcoded secrets
- Why accurate visibility into your development pipelines, beyond just source code, is paramount to the success of secret scanning programs
- How to scale secret scanning initiatives to effectively support thousands of developers
Secrets Within the SDLC
When we talk about secrets in software, we generally mean any sensitive data used in the software development and CI/CD processes that an organization should keep confidential, such as passwords, access keys, API tokens, cryptographic keys, or personally identifiable information (PII).
Modern applications rely on many services, such as external APIs and cloud resources, which require credentials. CI/CD actions perform deployment operations (store the artifact, provision cloud resources, run the application, etc.), which need high-privileged access to the runtime environment. Consequently, a vast number of credentials are hardcoded throughout the SDLC.
The most common place to find secrets is in the source code. Storing and accessing passwords securely takes extra effort and time, so the more straightforward solution is to hardcode them in source code and scripts. The problem gets worse because git history is stored indefinitely: most developers don’t realize that a password is still detectable after it has been deleted from the current version of a file. Eventually, your source code management (SCM) system, be it GitHub, GitLab, or Bitbucket, becomes a source of a large number of hidden secrets that attackers can easily find using one of the many open-source secret scanners.
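To make the git history point concrete, here is a minimal sketch, not a production scanner. It assumes the history has been exported as unified-diff text (for example with `git log -p`) and looks for a single pattern, the well-known shape of an AWS access key ID. The function name `secrets_in_history` and the sample patch are hypothetical, made up for illustration.

```python
import re

# AWS access key IDs have a well-known shape: "AKIA" followed by 16
# uppercase letters or digits. A real scanner would carry many more patterns.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def secrets_in_history(patch_text: str) -> dict[str, set[str]]:
    """Map each detected key to the diff sides ("+" or "-") it appears on."""
    found: dict[str, set[str]] = {}
    for line in patch_text.splitlines():
        # Skip file headers ("+++ b/x", "--- a/x") and anything that is not
        # an added or removed line.
        if line.startswith(("+++", "---")) or line[:1] not in ("+", "-"):
            continue
        for key in AWS_KEY_ID.findall(line):
            found.setdefault(key, set()).add(line[0])
    return found


# Hypothetical history: the key is added in one commit and "removed" in a
# later one, yet both patches remain in the repository forever.
sample_patch = """\
commit 2222222
--- a/config.py
+++ b/config.py
-AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
+AWS_KEY = os.environ["AWS_KEY"]
commit 1111111
--- /dev/null
+++ b/config.py
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
"""

history_findings = secrets_in_history(sample_patch)
```

A key that shows up on both the "+" and "-" sides was committed and later deleted. It is absent from the current files but still recoverable from history, so it should be rotated rather than merely removed.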
You can read more about the dangers of secrets getting exposed through source code here.
Beyond Just Source Code
Even though source code is the most common place, SCMs are not the only services from which secrets can get leaked. Essentially, any service you’re using as part of your SDLC in which data is stored may be the source of secrets leakage. It’s important to be aware of this because once you make sure your source code is clean of secrets, attackers will look for them in other locations. Let’s go over five common examples.
1. Secrets in Build Logs
There are many CI services that help you build your software, such as Jenkins, Bamboo, Travis CI, CircleCI, and TeamCity. Moreover, most SCMs provide their own built-in CI solution, like GitHub Actions, GitLab CI, and Bitbucket Pipelines. All these tools produce build output, a.k.a. the build log, which tells the story of everything that happened during the CI job. These build logs often contain sensitive secrets, and bug bounty hunters have already exploited this. A recent example is the Travis CI case, where research demonstrated how secrets can get exposed through build logs.
A common misconception is thinking that only DevOps administrators have access to these logs, but all too often, they are accessible to all the users of the CI service, which sometimes means the entire organization. Furthermore, there are many cases where these logs are accessible to the entire world:
- CI services that are configured to be publicly accessible and don’t restrict permissions (i.e., allow anonymous access). E.g., on a publicly exposed Jenkins server where jobs can be accessed without special permissions, each job’s output is publicly accessible.
- Public repositories that use the SCM’s built-in CI service. E.g., a public GitHub repository running a GitHub Actions workflow produces output that is visible to everyone.
When credentials get accidentally exposed via build logs, running a secret scanner on your source code will not suffice. You need to scan your build logs as well.
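As a rough illustration of scanning build logs, the sketch below walks a log line by line and reports which kind of secret appears on which line, without echoing the secret itself. The pattern names, the regexes, and the sample log are assumptions made for this example; a real scanner would need a far larger and better tuned rule set.

```python
import re

# Illustrative rules only; names and regexes are assumptions for this sketch.
PATTERNS = {
    "credential_assignment": re.compile(
        r"(?:password|passwd|secret|token)\w*\s*[=:]\s*\S+", re.IGNORECASE
    ),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_build_log(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) pairs; the secret value is not returned."""
    findings = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical CI output in which shell tracing echoed credentials.
sample_log = """\
[12:00:01] Cloning repository
[12:00:05] + export DEPLOY_TOKEN=abc123def456
[12:00:06] + aws configure set aws_access_key_id AKIAIOSFODNN7EXAMPLE
[12:00:09] Build succeeded
"""

log_findings = scan_build_log(sample_log)
```

Reporting only the line number and rule name is a deliberate choice here: the scanner's own output is yet another log, and it should not become a new place where the secret is stored.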
2. Secrets in Artifacts
In the previous section, we used a CI service to build our software, which produced our software artifacts (binary, library package, Docker container, image, mobile app, etc.). The next phase is to store that artifact inside our artifact registry: JFrog Artifactory, DockerHub, Nexus Repository, Amazon ECR, Google GCR, etc. And again, most SCMs provide their own internal registries, like GitHub Container Registry, GitLab Container Registry, and Azure Container Registry. Artifacts also get stored in other locations, such as package managers (e.g., npm, PyPI, RubyGems, NuGet) and mobile application stores (Google Play, App Store). Bottom line - artifacts can find their way to various storage services, and each of them is a potential pool of secrets that might get leaked.
When creating a final application artifact, sensitive data can be packed in by mistake. This is especially dangerous if the artifact is delivered to customers. Developers who work inside a private network might think that mistakenly putting secrets in their source code is not a big deal, since the secrets stay inside their safe environment. But they may forget that they will eventually deploy their artifacts to the outside world. If secrets are mistakenly added to those artifacts, they no longer stay inside the safe perimeter, and threat actors can detect them easily. This is exactly what happened in the Codecov incident, where the attackers found credentials (that allowed them to initiate their supply chain attack) inside the Codecov Docker image. Other examples of secrets getting exposed through software artifacts are the cases where AWS credentials were leaked through iOS and Android applications and through PyPI packages.
When it comes to Docker containers, the risk of accidentally exposing secrets is higher because each step in building the image is stored as a layer. For example, when a sensitive file is used in a Dockerfile, a common misconception is that removing the file in a later step will prevent exposure.
But that’s wrong: the secret file remains retrievable from one of the layers of the final image. Note that this also shows that running a secrets scanner on the source code won’t necessarily help, as there are cases where secrets aren’t hardcoded in the Dockerfile. Rather, files that contain secrets can get copied or downloaded into the container. In such cases, the only way to detect these secrets is by scanning the artifact itself.
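The layer behavior can be sketched with in-memory tar archives standing in for image layers. This is a simplified model under stated assumptions: real layers would be unpacked from an exported image (for example the output of `docker save`), and the helper names `make_layer` and `scan_layers`, the file paths, and the single AWS key pattern are all hypothetical. The third layer models a later `rm` step: in the OCI layer format a deletion is recorded as a "whiteout" entry prefixed with `.wh.`, which hides the file in the final filesystem but leaves the earlier layer untouched.

```python
import io
import re
import tarfile

AWS_KEY_ID = re.compile(rb"AKIA[0-9A-Z]{16}")


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tar archive standing in for one image layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def scan_layers(layers: list[bytes]) -> list[tuple[int, str]]:
    """Return (layer_index, path) for every regular file containing a key."""
    hits = []
    for index, layer in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                data = tar.extractfile(member).read()
                if AWS_KEY_ID.search(data):
                    hits.append((index, member.name))
    return hits


image_layers = [
    # Layer 0: application code.
    make_layer({"app/main.py": b"print('hello')\n"}),
    # Layer 1: a credentials file copied in for a build step.
    make_layer({"root/.aws/credentials": b"aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"}),
    # Layer 2: the file is "removed"; only a whiteout marker is written.
    make_layer({"root/.aws/.wh.credentials": b""}),
]

layer_hits = scan_layers(image_layers)
```

The scan still finds the credentials file in layer 1 even though the final filesystem no longer shows it, which is why the artifact itself has to be scanned layer by layer.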
3. Secrets in Documentation Services
Even though documentation services, such as wikis, Confluence, SharePoint, and so on, are not exactly SDLC assets, they are frequently used to store technical data, and people share secrets through them without realizing the potential damage. Often, too many people have access to this data. It takes just one adversary to scan it for secrets and use the findings maliciously.
Just as the entire code activity is stored in git history, so that removing a secret from source code doesn’t undo its exposure, documentation services suffer from a similar problem: the full edit history is usually stored by the service. Take Atlassian Confluence as an example. The current version of a page may show no secrets at all, while the page history still holds an earlier version in which they are visible.
It’s quite straightforward to write a script that traverses an entire space, across all pages and their full history, and scans it for secrets.
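The shape of such a script might look like the sketch below. It deliberately avoids any real wiki API: `fetch_versions` is a hypothetical callable that you would implement against your documentation service, returning `(version_number, body_text)` pairs for a page. The in-memory `FAKE_WIKI` data and the single AWS key pattern are likewise assumptions for illustration.

```python
import re
from typing import Callable, Iterable

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_space_history(
    page_ids: Iterable[str],
    fetch_versions: Callable[[str], Iterable[tuple[int, str]]],
) -> list[tuple[str, int]]:
    """Return (page_id, version) for every historical version with a key."""
    findings = []
    for page_id in page_ids:
        for version, body in fetch_versions(page_id):
            if AWS_KEY_ID.search(body):
                findings.append((page_id, version))
    return findings


# Hypothetical stand-in for a documentation space: the secret was pasted
# into version 1 of a page and edited out in version 2.
FAKE_WIKI = {
    "deploy-guide": [
        (1, "Use key AKIAIOSFODNN7EXAMPLE to deploy."),
        (2, "Use the key stored in the vault to deploy."),
    ],
    "onboarding": [(1, "Welcome to the team.")],
}

wiki_findings = scan_space_history(FAKE_WIKI, FAKE_WIKI.__getitem__)
```

Scanning only the latest version of each page would report nothing here. The finding comes entirely from the stored history.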
It can get much worse if the service is publicly accessible. As a case study, we scanned various public Confluence spaces and discovered dozens of exposed secrets. For example, a public Confluence space belonging to a large enterprise was found to contain credentials to its GitHub organization, which could have allowed attackers to log into that organization, steal its source code, modify it, plant a backdoor, etc.
4. Secrets in AppSec Services
As ironic as it sounds, an application security service might serve as another source of secrets leakage. AppSec tools usually scan your organization’s assets (source code, artifacts, etc.), and hence, when poorly configured, they might give unauthorized parties access to your most sensitive resources. We recently published an article about the research we conducted on publicly exposed SonarQube servers, which exposed source code that included multiple credentials and confidential information. It is imperative to be aware of this potential attack surface and to make sure the tools you are using are well configured and don’t give malicious actors an easy gateway to your secrets.
5. Secrets in Cloud Assets
Modern software ends up being deployed to a cloud-based production environment, most commonly running in Kubernetes, possibly managed on AWS, GCP, Azure, and so on. These providers offer managed storage and database services, such as Amazon’s S3 buckets, where secrets are often mistakenly placed. Attackers who manage to get their hands on critical cloud assets can harvest secrets from them. For example, misconfigured S3 buckets caused various data leaks, such as SEGA's recent assets exposure and the Twilio incident. Another example that stresses the importance of this attack path is recent research demonstrating how attackers could leverage secrets found in Kubernetes assets to take over cloud accounts.
Prevention and Remediation
There are many more systems and services we could talk about, but the message is clear: secrets can get exposed through various unexpected parts of the SDLC. Let’s talk about mitigation. What can we do to keep our secrets safe?
- Increase awareness. Make sure your entire software development organization is fully aware of the different locations from which sensitive data can get exposed and the dangerous implications of such exposure.
- Harden your systems. Strengthen the security posture of all your SDLC assets and continuously verify that no misconfigurations are introduced. Keep your services private and prevent anonymous access.
- Employ strict permissions policies. Wherever data is stored, strong policies must be employed to make sure the data is only accessible to trusted entities. Adhere to the principle of least privilege. Enforce multi-factor authentication.
- Prevent secrets in source code by placing checks at three locations:
  - On endpoints, before commit and push (e.g., a pre-commit or pre-push hook)
  - As a code-review check, before merge (e.g., a PR check)
  - As a continuous online scan of the codebase
- Continuously scan resources for hardcoded secrets. Make sure to employ automated secret scanning tools to cover the different assets presented in the previous section: source code, build logs, artifacts, documentation pages, etc.
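The endpoint check from the list above could be sketched as a git pre-commit hook along the following lines. This is an illustration under assumptions, not a drop-in tool: the function names are hypothetical, only one secret pattern is checked, and the script is assumed to be installed as the repository's pre-commit hook and to finish with `sys.exit(run_hook())`.

```python
import re
import subprocess

AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def added_lines_with_secrets(diff_text: str) -> list[str]:
    """Return the lines added by a unified diff that match the pattern."""
    flagged = []
    for line in diff_text.splitlines():
        # Only newly added lines matter here; "+++" is a file header.
        if line.startswith("+") and not line.startswith("+++"):
            if AWS_KEY_ID.search(line):
                flagged.append(line[1:].strip())
    return flagged


def run_hook() -> int:
    """Return a hook exit status: 0 allows the commit, 1 blocks it."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    flagged = added_lines_with_secrets(staged)
    if flagged:
        print(f"Commit blocked: {len(flagged)} staged line(s) look like secrets.")
        return 1
    return 0
```

Because client-side hooks can be skipped or left uninstalled, a check like this complements the PR check and the continuous scan rather than replacing them.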
Secrets are a data security problem, and they don’t hide only inside source code. Any service used as part of your software factory can end up leaking your organization’s secrets. To increase the safety of your resources, gain visibility into all relevant assets and analyze them for secrets. Place preventive controls. For each finding, assess the chance that the secret is live and the business impact of its exposure, then prioritize by validity and service type.
As part of the goal to secure software supply chains end to end, the Legit Security platform provides an enhanced secrets scanner that will help you cover all the aspects described in this article: visibility, control coverage, prevention, detection, and remediation. It supports dozens of secret types, scanned across all the resources that comprise your CI/CD. If you're interested in learning more, request a free Rapid Threat Assessment.