What You Should Never Commit to a Git Repository

As a developer, you likely use Git for version control in your projects. Git is an incredibly useful tool for tracking changes, collaborating with others, and maintaining a record of your codebase over time. However, not everything belongs in a Git repository. In fact, there are several types of files you should never commit to Git.

Committing the wrong files to a repository can lead to a host of issues, including:

  • Repository bloat – Committing large, unnecessary files will make your repository larger than it needs to be, slowing down Git operations and consuming excess storage.
  • Security vulnerabilities – Committing sensitive information like credentials or private keys can expose you and your organization to security risks if your repository is compromised.
  • Unnecessary conflicts – Committing automatically generated files or personal configs can cause merge conflicts for you and other team members.
  • Slower Git operations – A bloated repository with a complex history will make common Git operations like cloning, fetching, and pushing slower.

To avoid these problems, it's important to be disciplined about what you commit to source control. Let's walk through the main categories of files that don't belong in a Git repository.

1. Files That Don't Belong to the Project

Your Git repository should only contain files that are directly relevant to your project. Accidentally committing unrelated files is a common mistake, especially if you aren't carefully controlling what gets staged.

Common examples of unrelated files include:

  • Operating system files like .DS_Store on macOS or Thumbs.db on Windows
  • IDE or text editor project config files like .vscode/ or .idea/
  • Log files
  • Temporary files created during local development and testing

None of these files are directly related to the actual project code, so they don't belong in source control. Most of the time these files are already covered by common .gitignore templates. But it's still important to double-check what you're committing and avoid staging unrelated files.

2. Automatically Generated Files

Many projects involve some kind of build process to prepare code for production. This often involves transpiling, compiling, minifying, or otherwise transforming source files into generated files. In most cases, you should only commit your original source files to Git, not the generated output.

Some examples of generated files include:

  • Compiled CSS from a preprocessor like Sass, Less, or Stylus
  • Bundled, transformed, or minified JavaScript from a tool like webpack, Rollup, or Parcel
  • Compiled binaries from languages like Java, C++, or Go
  • Files generated by a static site generator like Hugo or Jekyll

The generated files can always be rebuilt from the source files, so they don't need to be tracked in Git. Committing them will just clutter the repository and may cause unnecessary merge conflicts if the generated files change.

One exception is if you're building an artifact that needs to be directly consumed somewhere else and you want to track the generated files. For example, you may want to track the built CSS and JS for a library that gets published to npm. But these are rare exceptions to the general rule.

3. Library Dependencies

Most modern projects have dependencies on open-source libraries and frameworks. These are usually installed through a package manager like npm, Composer, or Maven and defined in a configuration file like package.json.

In general, you shouldn't commit your installed dependencies to source control. It bloats the repository, makes cloning slower, and can even cause hard-to-troubleshoot issues if different environments have different package versions installed.

Instead, you typically just commit the manifest file that defines your dependencies (like package.json), along with the lockfile your package manager generates (like package-lock.json), and let each environment install the dependencies it needs based on those. If you're working on an application, this is almost always the right approach.

However, there are some cases where you may want to check in dependencies:

  • A library or framework that isn't available through a package manager
  • Specific, pinned versions of dependencies required for an application
  • Cases where you want a guaranteed offline install or to control the exact dependency code

If you do check in dependencies, it's usually best to put them in a separate, isolated directory like a lib/ folder. That makes the repository structure cleaner and dependencies easier to identify. But in general, avoid committing dependencies unless you have a compelling reason to.

4. Credentials and Sensitive Information

One of the most dangerous things you can do is commit private credentials or keys to a Git repository, especially if that repository is public. Exposing sensitive information in source control can easily compromise your application and infrastructure.

Examples of information that should never be committed include:

  • Usernames and passwords
  • SSH keys
  • API keys or tokens
  • Encryption keys
  • Other private credentials

If an attacker gains access to valid credentials, they can wreak all kinds of havoc – stealing data, impersonating users, deleting resources, and more. There are bots that scan GitHub and other code hosting platforms for accidentally leaked credentials. So this is a real risk.
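
The same kind of pattern matching those bots rely on can be run locally before you commit. The Python sketch below is only an illustration: the function name find_secrets and the three patterns are assumptions chosen for this example, and dedicated secret scanners use far larger rule sets.

```python
import re

# Illustrative patterns only; real scanners ship many more rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "hard-coded password": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]+['\"]"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that looks like a secret."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

Running find_secrets over the contents of each staged file and refusing to commit when the result is non-empty is one way to wire this into a pre-commit check. Expect both false positives and misses from patterns this simple.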

Instead of committing credentials directly, use environment variables or a secrets management system to provide credentials to your application. Tools like AWS Secrets Manager or HashiCorp Vault make it easy to securely store and access sensitive information. You can also use local environment variable files like .env, as long as you don't commit those files.

If you do accidentally commit something sensitive, treat the credential as compromised: revoke or rotate it first, then remove the file from the repository's history and force push the rewritten history. GitHub also has guides for removing sensitive data if you need more detailed instructions.

5. Large Files and Binaries

Git isn't well-suited for storing large files, especially large binaries like images, videos, or datasets. Git is designed to track text-based source code, not big blobs of data.

Committing large files has several downsides:

  • Significantly increases repository size
  • Makes common operations like cloning and fetching very slow
  • May hit storage quotas or bandwidth caps on a code hosting platform
  • Consumes a lot of local disk space for every team member

While Git can technically handle binary files, it's not optimized for that use case. Versioning large binaries with Git leads to repository bloat because Git keeps every version of the file in its history, and binary formats, especially compressed ones, rarely delta-compress well, so each new version adds close to its full size. For large, frequently updated binaries, this can quickly balloon the repository size.

Instead of committing large files directly to Git, consider an alternative:

  • Store assets externally and reference them with a URL or file path (e.g. images or videos on a CDN)
  • Use Git Large File Storage (LFS) to store large files outside the repository while still tracking them with Git
  • Break up large files or datasets into smaller chunks that can be stored separately

There are some cases where it may make sense to store larger files or binaries in Git, like if you need to tightly couple an asset with your code. But in general, find an alternative if you can.
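
To see whether a working tree already contains files worth moving elsewhere, a small script can list everything above a size threshold. This is a rough Python sketch: the name find_large_files and the 5 MB default are assumptions for illustration, not a limit that Git itself enforces.

```python
import os


def find_large_files(root: str, max_bytes: int = 5 * 1024 * 1024) -> list[tuple[str, int]]:
    """Return (relative_path, size_in_bytes) for files over max_bytes, largest first."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks, which may be broken
            size = os.path.getsize(path)
            if size > max_bytes:
                large.append((os.path.relpath(path, root), size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling find_large_files(".") from the project root before a first commit gives a quick list of candidates for Git LFS, external storage, or .gitignore.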

6. IDE, Editor, and Shell Configuration Files

Most developers customize their local development environment with personal configuration files and settings. While this improves the local development experience, these personal configurations rarely need to be shared with the entire team.

Examples of personal configs that shouldn't be committed include:

  • .bashrc, .bash_profile, or .zshrc customizing your shell
  • .vimrc or .emacs customizing your text editor
  • .idea/ or .vscode/ storing project settings for an IDE
  • Any files starting with ._, which macOS creates to hold file metadata on file systems that can't store it natively

There's no need to commit these files to source control since they're specific to your local environment. Committing them is unnecessary and may cause confusion if other team members have different configurations.

Instead, just add relevant patterns to your .gitignore to avoid accidentally staging personal configs. If there are shared configurations that are required to work on the project, like a common .editorconfig, then those can be committed. But the default should be leaving out personal configs.

Managing Secrets with Environment Variables

Applications often need access to credentials or other sensitive information at runtime, but we've seen that committing secrets to source control is a big no-no. So how do you bridge that gap?

The most common approach is to use environment variables to store secrets outside of code. Environment variables allow you to pass in configuration to a process from the shell or deployment environment. Your application code can then read those values at runtime.
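
As a minimal sketch of the reading side in Python, the helper below looks up a variable and fails early with a clear message when it is missing. The name require_env is made up for this example.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        # Name the variable in the error, but never print secret values.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

At startup you would call something like require_env("API_KEY") once and pass the result to the code that needs it. Failing at startup tends to be easier to debug than a later authentication error.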

To use environment variables locally, you can add them to a file like .env in your project root:

API_KEY=abc123
DB_PASSWORD=s3cr3t

Then add .env to your .gitignore to avoid committing it. Use a library like dotenv to read the variables into your application code.
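
To show roughly what such a library does, here is a standard-library-only Python sketch that handles simple KEY=VALUE lines like the ones above. The function name load_env_file is an assumption for this example, and the parser skips features a real dotenv library supports, such as multi-line values and variable expansion.

```python
import os


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines and add them to os.environ."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")  # drop surrounding quotes
            loaded[key] = value
            # Variables already set in the real environment take precedence.
            os.environ.setdefault(key, value)
    return loaded
```

Using setdefault means a value set by the shell or deployment platform wins over the file, so the same code can run unchanged in production where no .env file exists (guard the call with a file-existence check in that case).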

For production, you can set the environment variables on your deployment target. Most hosting platforms and deployment tools have a way to securely set environment variables for your production environment.

Using environment variables keeps secrets out of source control while still providing them to your application. Just be sure never to commit your .env file or log the values of secrets at runtime.

Keeping Files Out of Git with .gitignore

We've covered a lot of different types of files that should stay out of source control. Remembering all those rules would be a big challenge, especially since many of these files can be generated frequently as part of local development.

Fortunately, you don't have to keep a running list in your head. Git supports a .gitignore file that allows you to specify files and patterns that should always be ignored by Git.

Here's an example .gitignore covering many common patterns:

# Dependency directories
node_modules/

# Compiled code and build outputs
/dist
/tmp
/out-tsc

# IDE/editor files
.idea/
.project
.classpath
.c9/
*.launch
.settings/
*.sublime-workspace

# System files 
.DS_Store
Thumbs.db

# Env files
.env
.env.local
.env.development.local
.env.test.local
.env.production.local

# Logs
logs
*.log
npm-debug.log*

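Each line specifies a file, directory, or wildcard pattern to ignore. When you run commands like git add ., Git skips any untracked files that match and does not stage them. Files that are already tracked are not affected, so adding a pattern will not remove something you committed earlier.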
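
To make the three pattern shapes in that example concrete (a trailing slash for directories, a leading slash for the repository root, and * wildcards), here is a deliberately simplified Python sketch. It is not Git's actual matching algorithm: the name is_ignored is made up, and negation with !, the ** wildcard, and patterns with a slash in the middle are all left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Approximate .gitignore matching for a file path relative to the repo root."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        dir_only = pattern.endswith("/")
        pattern = pattern.rstrip("/")
        if pattern.startswith("/"):
            # A leading slash anchors the pattern to the repository root.
            if fnmatchcase(parts[0], pattern[1:]) and (not dir_only or len(parts) > 1):
                return True
            continue
        # Unanchored patterns can match at any depth; "name/" matches directories only.
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, pattern) for part in candidates):
            return True
    return False
```

For an authoritative answer about a real repository, git check-ignore -v followed by a path reports which pattern, if any, causes that path to be ignored.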

You should commit .gitignore to your repository so that everyone on the team shares the same set of ignored files. You can also find many example .gitignore files for common project types on GitHub to use as a starting point.

Cleaning Up a Repository

What if you've already committed files that shouldn't be in source control? It's never too late to clean things up. Removing files from a Git repository is a bit trickier than adding them, but it can be done.

If you've committed a file containing secrets or credentials, you should:

  1. Rotate the compromised credentials so they're no longer valid
  2. Remove the file from the entire Git history, not just the current branch
  3. Force push the updated history to remote repositories
  4. Communicate with your team about what happened and any required local cleanup

Rewriting Git history is tricky business, so if you aren't comfortable with the process, find someone who can help.

For less urgent cleanup, like removing a large file from the HEAD revision, you can use git rm --cached followed by the file path to stop tracking it without deleting the actual file from disk. Then add the file to .gitignore and commit the change. The file still exists in earlier commits, so this does not shrink the repository's history.

When to Break the Rules

We've covered a lot of rules and best practices in this post. But as with most things in software development, there are exceptions to every rule. Sometimes it actually does make sense to commit files that normally wouldn't belong in source control.

Some examples of when you may want to bend the rules:

  • Committing a minified JavaScript library that isn't available through a package manager in order to simplify installation
  • Tracking a small, stable binary file directly in the repository because it's a core part of the application
  • Storing credentials for a local development database to streamline onboarding

The key is to think critically about whether committing the file provides more value than it costs. Committing a 5 MB binary is very different from committing a 5 GB binary. Use your best judgment and get feedback from your team.

Conclusion

Git is an essential tool for modern software development. But like any tool, it can be misused. Committing the wrong types of files to a Git repository leads to bloated codebases, slow performance, and even security breaches.

Avoid committing:

  1. Files unrelated to the project
  2. Automatically generated files
  3. Library dependencies installed through package managers
  4. Credentials, secrets, and other sensitive information
  5. Large files and binaries
  6. Local developer configs and settings

Follow Git best practices by:

  • Using .gitignore to automatically prevent committing ignored files
  • Leveraging tools like environment variables and secret managers for credentials
  • Removing secrets and other problematic files from the entire Git history
  • Only making exceptions to these rules when the benefits outweigh the costs

By keeping your Git repositories clean and focused, you'll have a much easier time managing and scaling your codebase over time.

Do you have any other Git best practices to share? Let me know in the comments!
