Git with Unreal Engine 5
75 GB in one commit: how well is Git suited to team projects with Unreal Engine 5?
To make teamwork possible with Unreal Engine, you need version control (also called source control). Many know Git, the most widely used version control system among programmers. Git is a distributed system and works very well with text files. But a game project consists mainly of binary data such as textures, 3D models, and maps.
Git LFS allows versioning of large 3D models, textures, and maps. It downloads data on demand, so you can also work on large projects. In this test we use the demo content "Valley of the Ancient", which was released with the Early Access release of Unreal Engine 5. We upload the complete content in a single commit to a Git repository and test how well Git LFS scales.
Unreal Engine 5 for collaboration
Unreal Engine 5 comes with special features for collaboration. Firstly, there is World Partition, where maps are automatically divided into a grid. And most importantly, changes are tracked at the actor level. So if I place a tree, its coordinates are not stored in the level file but in a separate, very small file. This avoids conflicts and avoids blocking your teammates through file locking. In addition, committed changes stay small, since the level itself does not have to be re-uploaded.
Choice of hosting provider
GitHub is known to everyone; it is the typical platform for hosting source code. Since we use Git LFS for binary data, LFS support is the key criterion. GitHub, GitLab, and Bitbucket all support Git LFS, but at additional cost. On GitHub, data packs must be purchased: $5 per month buys 50 GB of storage and 50 GB of bandwidth. Uploading Valley of the Ancient (over 75 GB) to GitHub therefore costs $10 for a single commit, which makes no sense for our case. That's why we chose Azure DevOps for hosting: there is no extra charge for LFS data, and there are no limits on storage or file size. Azure DevOps is completely free for teams of up to 5 people and, like GitHub and GitLab, offers the possibility to set up complete build pipelines.
If you want to dive deeper, we have put together an article about the pros and cons of each Git hosting provider.
Git LFS stands for Git Large File Storage. It is an extension to Git for handling large binary data, and every major hosting provider, such as GitHub, supports it. With Git LFS, a file is not stored directly in the Git repository but in a separate storage container; the repository contains only a pointer that tells Git where the file lives. You can think of it as two hard drives. On the internal (smaller) drive sits the Git repository (i.e. the database), which knows the complete version history and stores the text files. On the external (large) drive sit the 3D models, textures, and maps. The internal drive holds only the links pointing to the external one.
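Concretely, what Git keeps in place of a large file is just a tiny pointer text file. A sketch of what such a pointer contains (the hash and size below are made-up placeholders, not from the Valley of the Ancient project):

```
version https://git-lfs.github.com/spec/v1
oid sha256:98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
size 132845056
```

The oid identifies the real file in the LFS storage container and size is its byte count; the pointer itself is only a few hundred bytes, no matter how large the asset is.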
Configure the .gitignore and .gitattributes file
To use Git LFS, the relevant file types must be marked in the .gitattributes file, Git's attributes config file. If this is not done, all binary data is loaded into the Git repository without LFS. The .gitignore file filters out file types and folders that should not be uploaded to the repository; in most cases these are cache files created by Unreal Engine or code editors. Here you can download a preconfigured .gitattributes and .gitignore file. Place the .gitattributes file in the root folder of your repository and the .gitignore file in the folder where your Unreal Engine data resides.
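As a rough sketch of what such preconfigured files contain (the exact patterns in the downloadable files may differ):

```
# .gitattributes — route Unreal's binary formats through LFS
*.uasset filter=lfs diff=lfs merge=lfs -text
*.umap   filter=lfs diff=lfs merge=lfs -text

# .gitignore — typical Unreal cache folders that should not be versioned
Binaries/
DerivedDataCache/
Intermediate/
Saved/
```

The filter=lfs attribute is what reroutes a file type into LFS storage on commit; the -text flag stops Git from attempting line-ending conversion on binary files.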
Our main 5 takeaways
How fast is file tracking in Git? Is it fast enough with 20k files and 100 GB?
Git tracks each file for changes: it checks whether files have been added, deleted, renamed, moved, or overwritten. The more files a project contains and the larger they are, the longer this check takes. With 20k files, however, the slowdown is not a deal breaker: a "git status" command takes approximately 2 seconds.
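You can check this on your own machine. The snippet below times the scan in a throwaway repo and shows two optional accelerators Git offers for large working trees (the built-in fsmonitor watcher assumes Git 2.37+ and is only bundled in Windows and macOS builds):

```shell
# Demo in a scratch repo; run the same commands inside your project.
cd "$(mktemp -d)" && git init -q .

# Time Git's change scan. Here it is instant; in a ~20k-file
# UE5 project expect a couple of seconds.
time git status

# Optional speed-ups for large working trees:
git config core.untrackedCache true   # cache results of untracked-file scans
# git config core.fsmonitor true      # file-system watcher (Git 2.37+, Windows/macOS)
```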
Can this data be uploaded to a Git repository? Are there packet-loss problems?
In fact, the complete content was uploaded to the Git repository without any problems. Due to lost packets, some data had to be uploaded again, but this happens completely automatically. The whole upload took about 6 hours on a 40 Mbit line.
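If your own upload keeps stumbling, git-lfs exposes config knobs for transfer parallelism and retries. These are real git-lfs config keys; the values below are illustrative, not recommendations:

```shell
# Demo in a scratch repo; run the two config lines inside your own project.
cd "$(mktemp -d)" && git init -q .

git config lfs.concurrenttransfers 8   # parallel LFS object uploads/downloads
git config lfs.transfer.maxretries 10  # retries per dropped transfer before failing
```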
What is the most affordable and easiest way to host a remote Git repository that can handle this amount of data?
Here the answer is quite clear: Azure DevOps, as there are no limitations or costs regarding LFS data.
Are there limitations in data size and repository size?
Git itself has no limitations in terms of file size. In various tests, we loaded single files of about 32 GB into a Git repository without hitting a limit. Git's performance suffers far more from binary data that is not managed via LFS and ends up directly in the repository. You should always use LFS when working with binary data in Git.
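To verify that no binary file slipped past LFS into the repository itself, you can list the blobs stored directly in Git's object database. A sketch using only plain Git; the 10 MB threshold and the demo file name are arbitrary choices:

```shell
# Build a demo repo containing a 12 MB file committed WITHOUT LFS.
cd "$(mktemp -d)" && git init -q .
dd if=/dev/zero of=big.bin bs=1M count=12 2>/dev/null
git add big.bin
git -c user.name=demo -c user.email=demo@example.com commit -qm "binary without LFS"

# List every blob over 10 MB stored directly in the repository.
# LFS-managed files would only show up here as tiny pointer blobs.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" && $3 > 10*1024*1024 { printf "%.1f MB  %s\n", $3/1048576, $4 }'
# → 12.0 MB  big.bin
```

Any asset that appears in this list is bloating the repository itself and should be migrated to LFS.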
How big is the cache overhead? How much space do I need on the hard disk to work with the project?
This is actually a weakness of a distributed version control system like Git. After a commit, the file is copied into the .git folder, Git's internal database, which is not human-readable. This means you always have the current version twice on your hard disk. Older versions can be removed via Git LFS prune and re-downloaded from the server when needed. A solution would be data deduplication. Unfortunately, on Windows that feature is only available on the ReFS file system (an alternative to NTFS), which in turn is only available on Windows 10 Enterprise. If you have formatted your hard disk with ReFS, you can use deduplication and the problem of duplicated versions is solved.
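The prune step mentioned above can be previewed before anything is deleted. A sketch, assuming the git-lfs client is installed and you run it inside your project repository:

```shell
git lfs prune --dry-run        # only report which local LFS objects would be removed
git lfs prune --verify-remote  # delete only objects confirmed to exist on the server
```

The --verify-remote flag is the safer default for teams, since a pruned object can only be restored if the server still has a copy.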