I believe git submodules are easy to use, and by the end of this blog post I think you will too.

Before we get to building and explaining the example of using submodules, you might like to install the pre-built example to play along with it.

I strongly recommend that you install the example and test it, so that you can prove to yourself that a working package with submodules is possible.

Play Along Code

This example is precoded and ready to be played along with.

It is available on pypi.org and also github.

Available To Be Directly Installed From Pypi.org

(submodules) [email protected]:~$ python3 -m pip install pkgexamplesubmodules
Collecting pkgexamplesubmodules
  Using cached pkgexamplesubmodules-0.0.1-py3-none-any.whl (4.3 kB)
Installing collected packages: pkgexamplesubmodules
Successfully installed pkgexamplesubmodules-0.0.1
(submodules) [email protected]:~$ 

Source Code Available From Github


https://github.com/RexBytes/pkgexamplesubmodules

What Does This Code Do

Before continuing, note that, as usual, we will be using the source layout for our Python package.
This includes the example submodules too. More information on the source layout is available in our tutorial and in the Python documentation.

This packaged code demonstrates the usage of subpackages/submodules inside a parent package. Subpackaging is useful when you have chunks of code that you want to reuse across all of your projects. You can maintain your subpackages as separate git repositories and include them in any new project.

In our example the parent package has the name “pkgexamplesubmodules” and has its own github repository.

The parent package uses components available in TWO other git repositories. To enable access to these components, the parent package includes them in its package structure as TWO separate submodules, “rexsubmodule1” and “rexsubmodule2”.

The Visual Studio Code file view below shows the submodules automatically identified by the letter ‘s’ next to the submodule directories.

The submodules contain very simple code just to demonstrate the workings of submodules as part of a larger parent package structure. Each submodule has classes that say hello, and identify which module they are being called from.

Placing Submodules

The submodule “rexsubmodule1” is placed directly in the “/src/” directory. It has the following class code inside its sayhello.py module.

class SayHello:
    def __init__(self):
        self.a_simple_message = "Submodule 1 says hello!"

    def sayhello(self):
        print(self.a_simple_message)

The submodule “rexsubmodule2” is placed directly in the “/src/pkgexamplesubmodules/” directory. It has the following class code inside its sayhello.py module.

class SayHello:
    def __init__(self):
        self.a_simple_message = "Submodule 2 says hello!"

    def sayhello(self):
        print(self.a_simple_message)

I’m placing the submodules in different places to give you two examples of where you can place your own submodules in the future. I’ve not found a solid example elsewhere showing that you can do this, or how to do it.

Working With Multiple Source Directories & Linting

You can place your submodule anywhere in your ‘src’ directory, in its root, or in another sub-directory. You can also define more than one ‘src’ directory in your ‘pyproject.toml‘ config file using the following table entry,

[tool.setuptools.packages.find]
where = ["src","my_other_src"]

If you wanted, you could rename your src directory entirely, but I recommend sticking with the conventional ‘src’.
If you don’t stick to conventions (or want to add multiple source directories), you will need to set paths for your linter as described below.

!WARNING! You must make these additions to your ‘pyproject.toml’ file before you install your package for editing (an editable install), otherwise any attempts to import Python modules in your code will fail. If you add new source directories mid-project, you must first edit your ‘pyproject.toml’, then uninstall and reinstall your package for editing.

If you want linting in Visual Studio Code to continue to find your modules in your new source directories (which in this case are submodules), you must add a ‘.vscode’ directory at the root of your package, at the same level as the ‘pyproject.toml’ file.

Inside your ‘.vscode’ directory, create a ‘settings.json’ file and add the paths to your other source directories/submodules.
Structure your package as you wish, but make sure you tell the above file where your extra source directories are.

I like to create a source directory called ‘submodules’, and check out all my submodules there.

My settings look like this,

Visual Studio Code should now be able to lint your submodule/ other source directories.

If you are lucky, Visual Studio Code will sometimes notice that linting has failed and offer to create the above file for you.
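
As a rough sketch of what such a settings file could contain, the Python snippet below writes a ‘.vscode/settings.json’ that lists extra source directories. It assumes you are using the Pylance language server, whose ‘python.analysis.extraPaths’ setting adds import search paths. The function name ‘write_vscode_settings’ is hypothetical, and your own settings may well need different keys.

```python
import json
from pathlib import Path


def write_vscode_settings(package_root, extra_paths):
    """Create or update .vscode/settings.json with extra import search paths."""
    settings_dir = Path(package_root) / ".vscode"
    settings_dir.mkdir(parents=True, exist_ok=True)
    settings_file = settings_dir / "settings.json"

    settings = {}
    if settings_file.exists():
        # Assumption: the existing file is plain JSON without comments.
        settings = json.loads(settings_file.read_text())

    # Assumption: Pylance is the language server reading this setting.
    settings["python.analysis.extraPaths"] = list(extra_paths)
    settings_file.write_text(json.dumps(settings, indent=4) + "\n")
    return settings_file
```

Calling write_vscode_settings(".", ["./src", "./my_other_src"]) from the directory that holds your ‘pyproject.toml’ would register both source directories from the table entry shown earlier.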

Accessing Classes In Your Submodules

The parent package has a module named ‘my_submoduleexample_module.py’ which imports the classes from the submodules and then, using our argparse knowledge, calls them.

import argparse
from rexsubmodule1.src.rexsubmodule1.sayhello import SayHello as SM1_SayHello
from .rexsubmodule2.src.rexsubmodule2.sayhello import SayHello as SM2_SayHello


def my_submodule():
    sub1_sayhello = SM1_SayHello()
    sub2_sayhello = SM2_SayHello()
    main_group_parser = argparse.ArgumentParser(
        description="A submodule package usage example."
    )

    main_group_parser.add_argument(
        "-o",
        "--subone",
        action="store_true",
        help="Submodule one will say hello.",
    )

    main_group_parser.add_argument(
        "-t",
        "--subtwo",
        action="store_true",
        help="Submodule two will say hello.",
    )

    my_args = main_group_parser.parse_args()

    if my_args.subone:
        sub1_sayhello.sayhello()

    if my_args.subtwo:
        sub2_sayhello.sayhello()

Pay close attention to the import statements, and the comments describing where they are importing from.

#[submodule1] Import from a submodule in the root of the /src/ directory
from rexsubmodule1.src.rexsubmodule1.sayhello import SayHello as SM1_SayHello

#[submodule2] Import from a submodule residing inside your package directory /src/pkgexamplesubmodules/ 
from .rexsubmodule2.src.rexsubmodule2.sayhello import SayHello as SM2_SayHello

You might have noticed I’m also using Python import aliases, ‘SM1_SayHello’ and ‘SM2_SayHello’, to distinguish the classes of the same name, ‘SayHello’, coming from the different submodules.
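
To see why such a long dotted path resolves, here is a minimal sketch that rebuilds the rexsubmodule1 directory tree in a folder of your choice and imports ‘SayHello’ with the same path. It assumes the outer ‘rexsubmodule1’ directory and its inner ‘src’ directory contain no ‘__init__.py’, so Python treats them as implicit namespace packages. The helper names ‘build_demo_tree’ and ‘import_sayhello’ are hypothetical.

```python
import importlib
import sys
from pathlib import Path

SAYHELLO_SOURCE = '''
class SayHello:
    def __init__(self):
        self.a_simple_message = "Submodule 1 says hello!"

    def sayhello(self):
        print(self.a_simple_message)
'''


def build_demo_tree(root):
    """Recreate src/rexsubmodule1/src/rexsubmodule1/sayhello.py under root."""
    package_dir = Path(root) / "src" / "rexsubmodule1" / "src" / "rexsubmodule1"
    package_dir.mkdir(parents=True)
    # Only the innermost directory is a regular package with an __init__.py.
    (package_dir / "__init__.py").write_text("")
    (package_dir / "sayhello.py").write_text(SAYHELLO_SOURCE)
    return Path(root) / "src"


def import_sayhello(src_dir):
    """Import SayHello using the same dotted path as the parent package."""
    sys.path.insert(0, str(src_dir))
    importlib.invalidate_caches()  # the files were only just created
    try:
        module = importlib.import_module("rexsubmodule1.src.rexsubmodule1.sayhello")
    finally:
        sys.path.remove(str(src_dir))
    return module.SayHello
```

Each dot in the import corresponds to one directory level below the ‘src’ directory that is on the import path.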

Running The Example Code

Installing the package ‘pkgexamplesubmodules’, will install the console command ‘rexsubmodule‘.

Let’s look at the argparse help.

(submodules) [email protected]:~$ rexsubmodule --help
usage: rexsubmodule [-h] [-o] [-t]

A submodule package usage example.

options:
  -h, --help    show this help message and exit
  -o, --subone  Submodule one will say hello.
  -t, --subtwo  Submodule two will say hello.
(submodules) [email protected]:~$ 

There are two switches: ‘--subone’, which calls the code from rexsubmodule1, and ‘--subtwo’, which calls the code from rexsubmodule2.

(submodules) [email protected]:~$ rexsubmodule --subone
Submodule 1 says hello!
(submodules) [email protected]:~$ rexsubmodule --subtwo
Submodule 2 says hello!

This running example shows you that it is possible to,

  • include submodules in your package.
  • place and access a submodule at the /src/ level.
  • place and access a submodule at the package level /src/pkgexamplesubmodules/.

Building A Python Package With Submodules

Now we’ve had a play with working code, let’s run through building a python package with submodules, and all the gotchas.

Starting Point

First we create a src layout parent package. It’s the ‘pkgexamplesubmodules‘ package we’re familiar with.

pkgexamplesubmodules/
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    └── pkgexamplesubmodules
        ├── __init__.py
        └── my_submoduleexample_module.py

Adding Submodules RexSubmodule1 & RexSubmodule2

We need to use the ‘git submodule add‘ command from inside the directory where we want to add a submodule.

We’re going to add the first submodule to our src directory,

cd ./pkgexamplesubmodules/src

git submodule add [email protected]:RexBytes/rexsubmodule1.git

Your new package layout should look like this,

pkgexamplesubmodules
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    ├── pkgexamplesubmodules
    │   ├── __init__.py
    │   └── my_submoduleexample_module.py
    │ 
    └── rexsubmodule1                       <--- HERE IS YOUR FIRST SUBMODULE
  

Let’s add the second submodule to our package directory,

cd ./pkgexamplesubmodules/src/pkgexamplesubmodules
git submodule add [email protected]:RexBytes/rexsubmodule2.git

Here is the new layout,

pkgexamplesubmodules
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    ├── pkgexamplesubmodules
    │   ├── __init__.py
    │   ├── my_submoduleexample_module.py
    │   └── rexsubmodule2                 <--- HERE IS YOUR SECOND SUBMODULE
    │
    └── rexsubmodule1

The submodules are packages in their own right.

The ‘git submodule‘ command ensures that the submodules remain independent of your parent package, and only changes in your parent package are tracked.

If you check the .git/config file of your parent package you will see that git makes a note of all of your subpackages.

cat .git/config

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[remote "origin"]
        url = [email protected]:RexBytes/pkgexamplesubmodules.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
        remote = origin
        merge = refs/heads/main
[user]
        email = [email protected]
        name = goodboy
[submodule "src/rexsubmodule1"]
        url = [email protected]:RexBytes/rexsubmodule1.git
        active = true
[submodule "src/pkgexamplesubmodules/rexsubmodule2"]
        url = [email protected]:RexBytes/rexsubmodule2.git
        active = true
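
If you ever want to read these entries from a script, the submodule sections can be picked out with Python’s standard configparser module. This is only a sketch: ‘list_submodules’ is a hypothetical helper, it takes the config text as a string, and it assumes the simple "key = value" layout shown in the listing above.

```python
import configparser


def list_submodules(config_text):
    """Return {submodule name: url} for every [submodule "..."] section."""
    # interpolation=None stops configparser treating '%' in a value as special.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    submodules = {}
    prefix = 'submodule "'
    for section in parser.sections():
        # git writes the section header as: submodule "src/rexsubmodule1"
        if section.startswith(prefix) and section.endswith('"'):
            name = section[len(prefix):-1]
            submodules[name] = parser[section].get("url")
    return submodules
```

Passing it the text of the parent package’s config file would return one entry per submodule, keyed by the path it was added at.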

If you run a git status on your parent package you will see,

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   .gitmodules
	new file:   src/pkgexamplesubmodules/rexsubmodule2
	new file:   src/rexsubmodule1

Notice that the submodule directories are actually tracked as files.

From the parent package’s point of view, a submodule is a file that records the commit point of the submodule’s git repository.
Every time you change the version of your submodule, your parent package will detect this as a change in a file; the change will be a different commit point string.

If you check the git diff, you will also see that the parent package only really cares about the commit point of each submodule.

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git diff --cached ./src/pkgexamplesubmodules/rexsubmodule2
diff --git a/src/pkgexamplesubmodules/rexsubmodule2 b/src/pkgexamplesubmodules/rexsubmodule2
new file mode 160000
index 0000000..66afd9c
--- /dev/null
+++ b/src/pkgexamplesubmodules/rexsubmodule2
@@ -0,0 +1 @@
+Subproject commit 66afd9cf293261fdaf44c7eead0c39dd9b89f67e

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git diff --cached ./src/rexsubmodule1
diff --git a/src/rexsubmodule1 b/src/rexsubmodule1
new file mode 160000
index 0000000..2cd3609
--- /dev/null
+++ b/src/rexsubmodule1
@@ -0,0 +1 @@
+Subproject commit 2cd3609ed534b8dbfe26f6ab8c82ee6fb80a567a

In this case,

‘+Subproject commit 66afd9cf293261fdaf44c7eead0c39dd9b89f67e’ for rexsubmodule2, and
‘+Subproject commit 2cd3609ed534b8dbfe26f6ab8c82ee6fb80a567a’ for rexsubmodule1.

While we are at it, let’s commit and push these submodules, which are changes on the parent package.

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git add .gitmodules ./src/pkgexamplesubmodules/rexsubmodule2/ ./src/rexsubmodule1/
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git commit .gitmodules ./src/pkgexamplesubmodules/rexsubmodule2/ ./src/rexsubmodule1/ -m "commit submodules"
[main 5c6664b] commit submodules
 3 files changed, 8 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 src/pkgexamplesubmodules/rexsubmodule2
 create mode 160000 src/rexsubmodule1

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git push
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 550 bytes | 550.00 KiB/s, done.
Total 5 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To rb.github.com:RexBytes/pkgexamplesubmodules.git
   6e6ed93..5c6664b  main -> main
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ 

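You will see ‘create mode 160000’ next to any new submodules you commit to a project. Mode 160000 is the special file mode git uses for a submodule entry, as opposed to 100644 for an ordinary file such as .gitmodules.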

You can treat subpackages/submodules as you would treat any of your other git repos.

If you change the commit point in a submodule, by checking out a previous version, or advancing the version by committing, the parent package will pick up on this and give you the same message as above that the commit point has changed.

Commit and push your parent package to sync your package versions.

Just To Prove Active Repo Changes

Let’s prove that the active repo changes when you run the git command in the different directories.

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git remote show origin
* remote origin
  Fetch URL: [email protected]:RexBytes/pkgexamplesubmodules.git
  Push  URL: [email protected]:RexBytes/pkgexamplesubmodules.git
  HEAD branch: main
  Remote branch:
    main tracked
  Local branch configured for 'git pull':
    main merges with remote main
  Local ref configured for 'git push':
    main pushes to main (up to date)


(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ cd ./src/rexsubmodule1/
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules/src/rexsubmodule1$ git remote show origin
* remote origin
  Fetch URL: [email protected]:RexBytes/rexsubmodule1.git
  Push  URL: [email protected]:RexBytes/rexsubmodule1.git
  HEAD branch: main
  Remote branch:
    main tracked
  Local branch configured for 'git pull':
    main merges with remote main
  Local ref configured for 'git push':
    main pushes to main (up to date)



(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules/src/rexsubmodule1$ cd ../pkgexamplesubmodules/rexsubmodule2/
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules/src/pkgexamplesubmodules/rexsubmodule2$ git remote show origin
* remote origin
  Fetch URL: [email protected]:RexBytes/rexsubmodule2.git
  Push  URL: [email protected]:RexBytes/rexsubmodule2.git
  HEAD branch: main
  Remote branch:
    main tracked
  Local branch configured for 'git pull':
    main merges with remote main
  Local ref configured for 'git push':
    main pushes to main (up to date)

You can see that running ‘git remote show origin’ in the parent package and in each of the submodules gives the name of a different repo.

This should give you the confidence to go ahead and hack away, knowing all 3 package repos are separated.

Working On Submodule Code

Empty Submodules

Unfortunately, once you have added submodules to a project, you will find that the submodule directories are empty when you check out the parent repository for the first time.

You need to run the following two commands inside your parent package structure to populate them. Git will populate them at the commit point you set, committed and pushed for your parent package,

git submodule init

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git submodule init
Submodule 'src/pkgexamplesubmodules/rexsubmodule2' ([email protected]:RexBytes/rexsubmodule2.git) registered for path 'src/pkgexamplesubmodules/rexsubmodule2'
Submodule 'src/rexsubmodule1' ([email protected]:RexBytes/rexsubmodule1.git) registered for path 'src/rexsubmodule1'


and,

git submodule update

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git submodule update
Cloning into '/home/ubuntu/myrepos/rex/pkgexamplesubmodules/src/pkgexamplesubmodules/rexsubmodule2'...
Cloning into '/home/ubuntu/myrepos/rex/pkgexamplesubmodules/src/rexsubmodule1'...
Submodule path 'src/pkgexamplesubmodules/rexsubmodule2': checked out '66afd9cf293261fdaf44c7eead0c39dd9b89f67e'
Submodule path 'src/rexsubmodule1': checked out '2cd3609ed534b8dbfe26f6ab8c82ee6fb80a567a'

If you look inside your submodules, you will find that they are now populated at the commit point you started using them in your project.
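
If you would like a script to tell you whether this step is still needed, the sketch below checks which submodule directories are missing or empty, and can then run the init and update steps in a single command. The helper names are hypothetical, and you supply the submodule paths yourself.

```python
import subprocess
from pathlib import Path


def unpopulated_submodules(repo_root, submodule_paths):
    """Return the submodule paths that are missing or still empty."""
    empty = []
    for relative_path in submodule_paths:
        directory = Path(repo_root) / relative_path
        # A freshly checked out parent has the directory but nothing inside it.
        if not directory.is_dir() or not any(directory.iterdir()):
            empty.append(relative_path)
    return empty


def populate_submodules(repo_root):
    """Run 'git submodule update --init' inside the parent repository."""
    # --init registers any submodule that has not been initialised yet,
    # so this covers both commands from this section.
    subprocess.run(
        ["git", "submodule", "update", "--init"], cwd=repo_root, check=True
    )
```

For this package you would pass ["src/rexsubmodule1", "src/pkgexamplesubmodules/rexsubmodule2"] as the paths and call populate_submodules only if the returned list is not empty.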

Did you notice the following two lines in the output from the ‘git submodule update‘ command above?

Submodule path 'src/pkgexamplesubmodules/rexsubmodule2': checked out '66afd9cf293261fdaf44c7eead0c39dd9b89f67e'
#Equivalent to git checkout '66afd9cf293261fdaf44c7eead0c39dd9b89f67e'

Submodule path 'src/rexsubmodule1': checked out '2cd3609ed534b8dbfe26f6ab8c82ee6fb80a567a'
#Equivalent to git checkout '2cd3609ed534b8dbfe26f6ab8c82ee6fb80a567a'

You can see that git will run the familiar commands to check out a commit point of a repository,

git checkout <commit hash>

You can see the commit hashes for different commit points if you run ‘git log‘ from your local repo directory.

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git log
commit 87f1c1f0592aed98175a5e4f80da5b57ab7c920f (HEAD -> main, tag: v0.0.2, origin/main, origin/HEAD)
Author: goodboy <[email protected]>
Date:   Fri Nov 25 16:25:29 2022 +0100

    save

commit 01610fc5e5490343ae97b00cb233f3fc7e7d9bb6
Author: RexBytes <[email protected]>
Date:   Fri Nov 25 16:17:53 2022 +0100

    Initial commit
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ 

These commit hashes don’t match the above as I had to re-create my repos half way through this article, but the concept remains.

Developing Submodule Code

A common reason for including submodules in your parent package is to develop your parent module and your submodules in tandem.

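To actually work on the code, and not merely refer to it from your parent package, you must check out a branch. After ‘git submodule update’ the submodule sits at a specific commit rather than on a branch (a detached HEAD).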

/pkgexamplesubmodules/src/rexsubmodule1$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.

You can now work on your submodule package, and git version control it just as you would if it were checked out in its own repo directory.

Make sure you commit any changes.

Committing Changes To The Parent Module

I have made some basic changes to ‘rexsubmodule1’, and have committed them to the submodule repository.

Let’s take a look at how the parent package detects the changes I made in the submodule.

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git status
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   src/rexsubmodule1 (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

It correctly identifies that there are no changes in the parent package, but does say that ‘rexsubmodule1‘ has changed.

Let’s take a look at the diff changes,

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git diff src/rexsubmodule1
diff --git a/src/rexsubmodule1 b/src/rexsubmodule1
index 2cd3609..3a6cec9 160000
--- a/src/rexsubmodule1
+++ b/src/rexsubmodule1
@@ -1 +1 @@
-Subproject commit 2cd3609ed534b8dbfe26f6ab8c82ee6fb80a567a
+Subproject commit 3a6cec9581d2a598bc764e18c9c7a3003a613eea
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ 

Remember, I am still in the parent package directory, and the active repo is the parent repository.


The diff only shows that the submodule ‘rexsubmodule1’ has a new commit point, ‘+Subproject commit 3a6cec9581d2a598bc764e18c9c7a3003a613eea’. It does not care about, or track at all, what the changes are; it will only record the commit point.

Let’s commit this.

(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git commit ./src/rexsubmodule1 -m "test commit"
[main ed9b4a1] test commit
 1 file changed, 1 insertion(+), 1 deletion(-)


(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 2 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 322 bytes | 322.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To rb.github.com:RexBytes/pkgexamplesubmodules.git
   692c484..ed9b4a1  main -> main
(submodules) [email protected]:~/myrepos/rex/pkgexamplesubmodules$ 

In this way, you can change the version of your submodules by checking out the required submodule versions, and then committing the parent package.

The parent package will commit only the commit point of the submodule.

I hope that’s clear.

If not, I hope this makes it clearer… If you look at your repository on GitHub and navigate to the src directory,

you will see that there is NO CODE for your submodule listed there.

Instead, there is a pointer to the commit point of your submodule’s repository.
Click it… it will take you to the submodule’s own repository.

Make it a habit to push changes to your submodules first before your parent package, otherwise other people working on your project won’t have access to your latest changes.

The documentation says you should use the following push command on your parent packages.

git push --recurse-submodules=check

It should prevent you from pushing if your submodules are not up to date remotely, but it didn’t work for me.
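
If the built-in check does not behave as expected, one manual alternative is to ask each submodule whether its current commit is already on a remote-tracking branch. The sketch below is not what ‘--recurse-submodules=check’ does internally; it is a swapped-in approach built on ‘git branch -r --contains HEAD’, and ‘unpushed_submodules’ is a hypothetical helper. It only knows what your local remote-tracking branches know, so run a fetch or push in the submodule first.

```python
import subprocess


def unpushed_submodules(submodule_paths, run=subprocess.run):
    """Return the submodule paths whose HEAD is on no remote-tracking branch."""
    unpushed = []
    for path in submodule_paths:
        # Lists remote-tracking branches that already contain the current commit.
        result = run(
            ["git", "-C", path, "branch", "-r", "--contains", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        if not result.stdout.strip():
            unpushed.append(path)
    return unpushed
```

An empty list means every submodule commit you are about to reference from the parent package is already available remotely, as far as your local checkout can tell.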

Full Package Structure

Just so we have a note of it, here is the full package structure once you have checked out all of the submodules too.

pkgexamplesubmodules
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    ├── pkgexamplesubmodules                      <-- Your main parent package
    │   ├── __init__.py
    │   ├── my_submoduleexample_module.py
    │   └── rexsubmodule2                          <-- Your second submodule.
    │       ├── LICENCE
    │       ├── pyproject.toml
    │       ├── README.md
    │       └── src
    │           └── rexsubmodule2
    │               ├── __init__.py
    │               └── sayhello.py
    └── rexsubmodule1                              <-- Your first submodule.
        ├── LICENCE
        ├── pyproject.toml
        ├── README.md
        └── src
            └── rexsubmodule1
                ├── __init__.py
                └── sayhello.py

Updating Or Committing Submodule Code With Multiple Git Accounts

At some point you might find yourself checking out multiple submodules from multiple git accounts.

How can this happen?

You may have a work git account, a client git account, and a personal git account, and you might want to share git repositories across projects.

You can manage git users in the exact same way we covered here.

The usage of alias commands is described in the link above. Changing users for a submodule is as easy as running your alias from inside the submodule directory.

[email protected]:~/myrepos/rex/pkgexamplesubmodules/src/rexsubmodule1$ git setrb

If you then check the submodule git config file you will see that a git user has been set for that submodule.

You can find the location of your submodule config file by looking at the .git file in your submodule’s root.

[email protected]:~/myrepos/rex/pkgexamplesubmodules/src/rexsubmodule1$ cat .git
gitdir: ../../.git/modules/src/rexsubmodule1


Notice it is a file, and not the usual .git directory you would expect.

This is because the parent git repo keeps the submodule’s .git directory in a subdirectory of its own .git directory. It then makes a note of that location in a .git file in your submodule’s root.

This is probably too much information, but you can now confidently commit your submodule as one user, and your parent module as another (Don’t forget to set an alias for your parent module).

[email protected]:~/myrepos/rex/pkgexamplesubmodules/src/rexsubmodule1$ cat ../../.git/modules/src/rexsubmodule1/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	worktree = ../../../../src/rexsubmodule1
[remote "origin"]
	url = [email protected]:RexBytes/rexsubmodule1.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
	remote = origin
	merge = refs/heads/main
[user]
	email = [email protected]    <----- Here are the submodule user details.
	name = goodboy

Conclusions

Submodules are just so very useful. They are only a tiny bit complicated, and once you wrap your head around them they can help you write reusable code that you can share across all of your projects. They have saved me from re-inventing the wheel many times.
