Storing Settings & Data Files At The Package Level
There may be occasions where you want to keep application data and configuration files out of your users' reach by storing them directly inside the package itself.
Example Package
For example, the following package structure has a ‘data’ subdirectory that contains a ‘readme.json’ configuration file.
All of the package level file handling code is in the ‘my_datafiles_module.py’.
Package Layout
Remember, even if your directory only contains data files it needs an “__init__.py” file.
pkgexampledatafiles/
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    └── pkgexampledatafiles
        ├── data
        │   ├── __init__.py
        │   └── readme.json <---Your data file
        ├── __init__.py
        └── my_datafiles_module.py
Available On Github
The solution you are looking for is already coded and available on github.
Available On Pypi.org
And available on pypi.org https://pypi.org/project/pkgexampledatafiles/ for you to install and play along with.

(pkgexampledatafiles) ubuntu@goodboy:~$ python3 -m pip install pkgexampledatafiles
Collecting pkgexampledatafiles
Downloading pkgexampledatafiles-0.0.1-py3-none-any.whl (3.9 kB)
Installing collected packages: pkgexampledatafiles
Successfully installed pkgexampledatafiles-0.0.1
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --help
usage: rexdatafile [-h] [-l] [-r] [-c] [-w] [-d]
A package datafiles example
options:
-h, --help show this help message and exit
-l, --list List package data files
-r, --readme Outputs data/readme.json
-c, --copy Copies data/readme.json to data/writetome.json
-w, --writetome Outputs data/writetome.json
-d, --delete Delete data/writetome.json
(pkgexampledatafiles) ubuntu@goodboy:~$
Code Solution
This is quite straightforward when you know it.
Pyproject.toml entry
You need to make the following entry in your pyproject.toml file. The directory containing your data files is referenced in namespace format as “pkgexampledatafiles.data”, and on the right you define your config file pattern, or no pattern.
[tool.setuptools.package-data]
"pkgexampledatafiles.data"=["*.json"]
File access using ‘importlib.resources’
First import this package in your code.
import importlib.resources
Create a traversable resource container joining your datafile name. You can see the namespace reference to the directory in the package “pkgexampledatafiles.data” and also your data filename “readme.json”.
my_traversable_resource_container = importlib.resources.files(
    "pkgexampledatafiles.data").joinpath("readme.json")
You then create a context manager that will give you a path to your datafile. Here is an example of ‘importlib.resources.as_file’ in use.
my_pathlib_context_manager = importlib.resources.as_file(
    my_traversable_resource_container)
Finally, you can now use this to access the file to read, or to write to.
with my_pathlib_context_manager as fullfilepath:
    with open(fullfilepath, "r") as myfile:
        text = myfile.read()
        print(text)
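To try the three steps above end to end without installing anything, here is a minimal sketch. It assumes Python 3.9 or newer, and it builds a throwaway package in a temporary directory instead of using the real one, so the names ‘demo_pkg’, ‘make_demo_package’ and ‘read_package_json’ are made up for illustration and are not part of pkgexampledatafiles.

```python
import importlib
import importlib.resources
import json
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Create a throwaway package 'demo_pkg' with a 'data' subpackage (hypothetical names)."""
    data_dir = root / "demo_pkg" / "data"
    data_dir.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    (data_dir / "readme.json").write_text(
        json.dumps({"name": "data1", "datapoint": "A"}), encoding="utf-8"
    )


def read_package_json(package: str, filename: str) -> dict:
    """Read and parse a JSON data file that lives inside an importable package."""
    # Step 1: traversable resource for the data file.
    resource = importlib.resources.files(package).joinpath(filename)
    # Step 2: context manager that yields a real filesystem path.
    with importlib.resources.as_file(resource) as path:
        # Step 3: open the path like any other file.
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)          # make the throwaway package importable
    importlib.invalidate_caches()
    try:
        config = read_package_json("demo_pkg.data", "readme.json")
    finally:
        sys.path.remove(tmp)

print(config)
```

In your own package you would skip the temporary directory and pass your real namespace, for example "pkgexampledatafiles.data", to the helper.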
More clear as mud documentation on importlib.resources available here.
Playalong Example
Please inspect the code at github for tips and tricks on integrating the above into one of your package distributions. I hope by now you have read the other posts on creating a python package that runs with a console command.
Here we play along. First let’s check the available help file. The console command for our package is “rexdatafile”.
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --help
usage: rexdatafile [-h] [-l] [-r] [-c] [-w] [-d]
A package datafiles example
options:
-h, --help show this help message and exit
-l, --list List package data files
-r, --readme Outputs data/readme.json
-c, --copy Copies data/readme.json to data/writetome.json
-w, --writetome Outputs data/writetome.json
-d, --delete Delete data/writetome.json
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --list
/home/ubuntu/myenvs/pkgexampledatafiles/lib/python3.10/site-packages/pkgexampledatafiles/data/readme.json
/home/ubuntu/myenvs/pkgexampledatafiles/lib/python3.10/site-packages/pkgexampledatafiles/data/writetome.json
List Our Config Files
You might like to list the config files in your package, and their full location after install.
-l, --list List package data files
Let’s see this in action,
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --list
/home/ubuntu/myenvs/pkgexampledatafiles/lib/python3.10/site-packages/pkgexampledatafiles/data/readme.json
/home/ubuntu/myenvs/pkgexampledatafiles/lib/python3.10/site-packages/pkgexampledatafiles/data/writetome.json
I’ve actually defined two datafiles to help with this demonstration. “readme.json” is a permanent datafile, and “writetome.json” is a file we are going to write to, delete, and write to again.
So this is what the package will look like wherever it is installed.
pkgexampledatafiles/
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    └── pkgexampledatafiles
        ├── data
        │   ├── __init__.py
        │   ├── readme.json <---Permanent data file
        │   └── writetome.json <---Temporary data file
        ├── __init__.py
        └── my_datafiles_module.py
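A listing like the one printed by the --list switch can be produced by iterating over the data directory with ‘importlib.resources.files’. The sketch below is my own guess at one way to do it, not the code from the published package. It assumes Python 3.9 or newer and a package installed as a plain directory, and the names ‘demo_list_pkg’, ‘make_demo_package’ and ‘list_data_files’ are hypothetical.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Build a throwaway 'demo_list_pkg.data' package holding two JSON files."""
    data_dir = root / "demo_list_pkg" / "data"
    data_dir.mkdir(parents=True)
    (root / "demo_list_pkg" / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    (data_dir / "readme.json").write_text("{}")
    (data_dir / "writetome.json").write_text("{}")


def list_data_files(package: str, suffix: str = ".json") -> list[str]:
    """Return the sorted locations of data files found directly inside 'package'."""
    data_root = importlib.resources.files(package)
    return sorted(
        str(entry)
        for entry in data_root.iterdir()
        # Skip subdirectories such as __pycache__ and non-data files such as __init__.py.
        if entry.is_file() and entry.name.endswith(suffix)
    )


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        listed = list_data_files("demo_list_pkg.data")
    finally:
        sys.path.remove(tmp)

for location in listed:
    print(location)
```

The printed locations point into the temporary directory here. With an installed package they would point into site-packages, as in the console output earlier.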
Let’s list the contents of the “readme.json” datafile, and “writetome.json” using the following switches.
-r, --readme Outputs data/readme.json
-w, --writetome Outputs data/writetome.json
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --readme
{
"supersecretdata": [
{
"name": "data1",
"datapoint": "A"
},
{
"name": "data2",
"datapoint": "B"
}
]
}
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --writetome
File does not exist
You can see that our “readme.json” file has contents, and that “writetome.json” does not exist yet.
Let’s copy the config file data across using the following switch.
-c, --copy Copies data/readme.json to data/writetome.json
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --copy
{
"supersecretdata": [
{
"name": "data1",
"datapoint": "A"
},
{
"name": "data2",
"datapoint": "B"
}
]
}
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --writetome
{
"supersecretdata": [
{
"name": "data1",
"datapoint": "A"
},
{
"name": "data2",
"datapoint": "B"
}
]
}
(pkgexampledatafiles) ubuntu@goodboy:~$
The copy switch outputs the first config file, as well as copying it to the second. I’ve used the --writetome switch to output the contents and verify that they have been copied across.
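Writing to a data file works along the same lines as reading one. Here is a minimal sketch of a copy step, under the assumption that the package is installed as a plain, writable directory (not a zip archive or a read-only site-packages) and that you are on Python 3.9 or newer. The names ‘demo_copy_pkg’, ‘make_demo_package’ and ‘copy_data_file’ are hypothetical and are not taken from the published package.

```python
import importlib
import importlib.resources
import json
import sys
import tempfile
from pathlib import Path

SAMPLE = {"supersecretdata": [{"name": "data1", "datapoint": "A"}]}


def make_demo_package(root: Path) -> None:
    """Build a throwaway 'demo_copy_pkg.data' package with a single readme.json."""
    data_dir = root / "demo_copy_pkg" / "data"
    data_dir.mkdir(parents=True)
    (root / "demo_copy_pkg" / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    (data_dir / "readme.json").write_text(json.dumps(SAMPLE), encoding="utf-8")


def copy_data_file(package: str, source_name: str, target_name: str) -> str:
    """Copy one data file to another name inside the same package; return its text."""
    data_root = importlib.resources.files(package)
    text = data_root.joinpath(source_name).read_text(encoding="utf-8")
    # For a package installed as a plain directory, as_file() yields the real path,
    # even if the target does not exist yet; writing to it creates the file.
    with importlib.resources.as_file(data_root.joinpath(target_name)) as target:
        target.write_text(text, encoding="utf-8")
    return text


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        original = copy_data_file("demo_copy_pkg.data", "readme.json", "writetome.json")
        copied = (
            importlib.resources.files("demo_copy_pkg.data")
            .joinpath("writetome.json")
            .read_text(encoding="utf-8")
        )
    finally:
        sys.path.remove(tmp)

print(copied)
```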
Let’s now delete the file using the following switch,
-d, --delete Delete data/writetome.json
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --delete
(pkgexampledatafiles) ubuntu@goodboy:~$ rexdatafile --writetome
File does not exist
Checking if the file has been deleted confirms it has.
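The delete step, and the “File does not exist” message, can be sketched in the same style. This is an illustration under the same assumptions as before (Python 3.9 or newer, package installed as a plain writable directory), and the names ‘demo_delete_pkg’, ‘show_data_file’ and ‘delete_data_file’ are hypothetical.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def make_demo_package(root: Path) -> None:
    """Build a throwaway 'demo_delete_pkg.data' package with a writetome.json file."""
    data_dir = root / "demo_delete_pkg" / "data"
    data_dir.mkdir(parents=True)
    (root / "demo_delete_pkg" / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    (data_dir / "writetome.json").write_text('{"temporary": true}', encoding="utf-8")


def show_data_file(package: str, filename: str) -> str:
    """Return the file's text, or a message if the file is not there."""
    resource = importlib.resources.files(package).joinpath(filename)
    if not resource.is_file():
        return "File does not exist"
    return resource.read_text(encoding="utf-8")


def delete_data_file(package: str, filename: str) -> bool:
    """Delete a data file from the package directory; return True if one was removed."""
    resource = importlib.resources.files(package).joinpath(filename)
    if not resource.is_file():
        return False
    # Assumes a plain directory install, where as_file() yields the real path.
    with importlib.resources.as_file(resource) as path:
        path.unlink()
    return True


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        before = show_data_file("demo_delete_pkg.data", "writetome.json")
        deleted = delete_data_file("demo_delete_pkg.data", "writetome.json")
        after = show_data_file("demo_delete_pkg.data", "writetome.json")
        deleted_again = delete_data_file("demo_delete_pkg.data", "writetome.json")
    finally:
        sys.path.remove(tmp)

print(before)
print(after)
```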
This should cover everything you need to handle files inside your package.
My implementations, including the argparse calls are detailed at the following location.
Additional Notes
This is an extra note and not included in the above pypi package but I do use it in this blog post.
You may want to group your data and config in a separate directory under ‘src’. In the layout below you can see that I have created a directory in the root of ‘src’ called ‘moredata‘ which contains a data file named ‘moreconfig.json’.
Do not forget to add a ‘__init__.py‘ file in your data directory. This is needed.
pkgexampledatafiles/
├── LICENCE
├── pyproject.toml
├── README.md
├── setup.py
└── src
    ├── pkgexampledatafiles
    │   ├── data
    │   │   ├── __init__.py
    │   │   └── readme.json
    │   ├── __init__.py
    │   └── my_datafiles_module.py
    └── moredata
        ├── __init__.py
        └── moreconfig.json <---Datafile in another src subdir
You would access this datafile in a similar way as above, but your toml entry would look like this with the added new entry,
[tool.setuptools.package-data]
# TOML entry for the /src/pkgexampledatafiles/data directory.
"pkgexampledatafiles.data"=["*.json"]
# TOML entry for the /src/moredata/ directory.
"moredata"=["*.json"]
You should have one entry for each directory, and for each subdirectory, that you would like to access data files from.
Your resource container would look like this,
my_traversable_resource_container = importlib.resources.files("moredata").joinpath("moreconfig.json")
Note that the top level namespace ‘moredata’ is being used to reference the ‘/src/moredata’ directory in the python package.
The .joinpath() method works in the same way as it did before, but here we tell it to use the new datafile with the filename ‘moreconfig.json‘.
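If you want to check the top level namespace lookup without rebuilding your distribution, here is a small sketch that fakes the ‘moredata’ directory in a temporary location. It assumes Python 3.9 or newer. The contents written to ‘moreconfig.json’ are placeholder values I made up, and ‘make_moredata_package’ is a hypothetical helper.

```python
import importlib
import importlib.resources
import json
import sys
import tempfile
from pathlib import Path


def make_moredata_package(root: Path) -> None:
    """Build a throwaway top level 'moredata' package with a placeholder config file."""
    pkg_dir = root / "moredata"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    # Placeholder contents, invented for this sketch.
    (pkg_dir / "moreconfig.json").write_text(
        json.dumps({"example_setting": "placeholder"}), encoding="utf-8"
    )


with tempfile.TemporaryDirectory() as tmp:
    make_moredata_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Top level namespace "moredata", no parent package in front of it.
        resource = importlib.resources.files("moredata").joinpath("moreconfig.json")
        # read_text() is a shortcut when you only need the contents, not a path.
        more_config = json.loads(resource.read_text(encoding="utf-8"))
    finally:
        sys.path.remove(tmp)

print(more_config)
```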
I hope that makes it all easier to understand.