xlfilldown: The Complete Guide to Excel Fill Down
Summary: Everything you need to know about
xlfilldown— the CLI tool and Python library that streams Excel data into SQLite or a new Excel workbook, with hierarchical fill-down, row hashing, and constant-memory processing.
| Key | Value |
|---|---|
| Tool | xlfilldown |
| Python version | 3.9+ |
| Dependencies | openpyxl |
| Install | pip install xlfilldown |
| Output targets | SQLite, Excel (.xlsx) |
| Fill modes | hierarchical, independent |
0. Prerequisites
- Python 3.9 or later installed on your system
- Basic familiarity with the command line
- An Excel
.xlsxfile you want to process
Install xlfilldown with pip or pipx:
pip install xlfilldown
Or if you prefer an isolated install:
pipx install xlfilldown
1. Why Fill Down Is Tricky
Typical spreadsheets use tiered headers: a high-level “group” column appears once, and lower-level detail rows sit beneath it. The hierarchy is obvious to a human eye but catastrophic for a database — every row needs its full context.
The naive fix is pandas ffill(), but that carries each column forward independently. When a new Region appears before Country is set, ffill() drags the old Country into the new Region. Now Osaka is in Europe and nobody notices until the quarterly report is wrong.
xlfilldown solves this with two fill modes:
- Hierarchical (default) — higher tiers reset lower tiers when they change. When Region changes, Country and City both reset. This prevents stale context from leaking across groups.
- Independent — each listed column carries forward on its own, like pandas
ffill(). Use this when your columns are genuinely unrelated.
Choose the mode with --fill-mode hierarchical or --fill-mode independent. Column order matters only in hierarchical mode, where left equals highest tier.
2. Your First Command
Say Sheet1 in data.xlsx looks like this:
| Region | Country | City | Value |
|---|---|---|---|
| Europe | |||
| Germany | |||
| Bonn | 10 | ||
| Berlin | 12 | ||
| Asia | |||
| Japan | |||
| Osaka | 7 | ||
| Kyoto | 8 |
The goal: fill Region, Country, and City so every row carries its full context, then write the result to SQLite.
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--require-non-null '["Value"]' \
--db out.db \
--table fact_locations \
--row-hash \
--excel-row-numbers \
--drop-blank-rows
Code language: JavaScript (javascript)
After running that command, the fact_locations table in out.db contains:
| row_hash | excel_row | Region | Country | City | Value |
|---|---|---|---|---|---|
| aa93f736… | 4 | Europe | Germany | Bonn | 10 |
| 2ef01629… | 5 | Europe | Germany | Berlin | 12 |
| 9c9f43c6… | 9 | Asia | Japan | Osaka | 7 |
| 55d85bf2… | 10 | Asia | Japan | Kyoto | 8 |
Key details about the output:
- All columns are stored as TEXT, including numeric values like
Value. - Numbers and dates are written as their string representation (e.g.,
10becomes"10"). row_hashis a SHA-256 computed over all data columns in that row.excel_rowis the 1-based row number from the original spreadsheet.- Blank spacer rows and group-header rows (missing
Value) were dropped.
3. Using Column Letters Instead of Names
If your header names are long or contain special characters, point to columns by their Excel letter instead:
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols-letters A C \
--db out.db
Code language: CSS (css)
- Letters are resolved to header names using the header row you specify.
- If a letter points to an empty header cell,
xlfilldownraises a clear error. --fill-colsand--fill-cols-lettersare mutually exclusive — pick one.
4. Hierarchical vs Independent Fill
4.1 Hierarchical (default)
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--fill-mode hierarchical \
--db out.db
Code language: JavaScript (javascript)
When Region changes, both Country and City reset on that row. When Country changes, City resets. This prevents stale values from leaking into a new group.
Use hierarchical mode when your data has natural tiers: Region, Country, City or Department, Team, Employee.
4.2 Independent
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--fill-mode independent \
--db out.db
Code language: JavaScript (javascript)
Each column carries forward on its own, regardless of what happens in other columns. This behaves like pandas ffill().
Use independent mode when your columns are unrelated — for example, Last Seen Date and Owner have no hierarchical relationship.
Tip: If you see old City names leaking into a new Country, switch to hierarchical mode.
5. Dropping Unwanted Rows
Two filters apply after fill-down:
Drop blank spacer rows — removes rows that are blank across all your fill columns:
--drop-blank-rows
This is useful when your source sheet has empty rows used as visual separators.
Require non-null — drops any row where a listed column is still blank after filling:
--require-non-null '["Region","City","Value"]'
Code language: JavaScript (javascript)
You can also specify this by letter:
--require-non-null-letters A D
Code language: JavaScript (javascript)
Name-based and letter-based lists can be combined. They are merged and de-duplicated internally.
6. Writing to Excel Instead of SQLite
Replace the db subcommand with xlsx:
xlfilldown xlsx \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--outfile processed.xlsx \
--outsheet "Filled" \
--if-exists replace \
--row-hash \
--excel-row-numbers
Code language: JavaScript (javascript)
The --if-exists flag controls what happens when the output sheet already exists:
| Value | Behavior |
|---|---|
error | Raise an error if the sheet exists (default) |
replace | Delete and recreate the sheet |
append | Append rows after validating that headers match exactly, including optional columns like row_hash and excel_row |
Tip: When creating a new workbook,
xlfilldownuses openpyxl’swrite_only=Truemode for streaming speed.
7. Append Semantics
Appending only works if the destination columns match exactly and in order:
- SQLite — column names and order must match the existing table schema.
- Excel — the header row in the destination sheet must equal the expected column list, including any optional columns like
row_hashandexcel_rowif enabled.
This strictness prevents silent data drift. If your schema has changed (for example, you added --row-hash after the initial load), use --if-exists replace or adjust the destination to match.
8. Raw Ingest (No Fill Down)
Sometimes you want to load a sheet exactly as-is, without applying any fill-down logic. This is useful when you just need audit columns (row_hash, excel_row) and plan to handle nulls later.
Pass --ingest-mode raw to skip fill-down entirely:
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--db out.db \
--table raw_ingest \
--ingest-mode raw \
--row-hash \
--excel-row-numbers \
--if-exists replace
Code language: CSS (css)
The same flag works with the xlsx subcommand:
xlfilldown xlsx \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--outfile out.xlsx \
--outsheet "RawSheet" \
--ingest-mode raw \
--row-hash \
--excel-row-numbers \
--if-exists replace
Code language: JavaScript (javascript)
9. Python API
Both output targets are available as Python functions with the same options as the CLI.
9.1 Write to SQLite
from xlfilldown.api import ingest_excel_to_sqlite
summary = ingest_excel_to_sqlite(
file="data.xlsx",
sheet="Sheet1",
header_row=1,
fill_cols=["Region", "Country", "City"],
db="out.db",
table="fact_locations",
drop_blank_rows=True,
require_non_null=["Value"],
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
batch_size=1000,
fill_mode="hierarchical",
)
print(summary)
Code language: PHP (php)
9.2 Write to Excel
from xlfilldown.api import ingest_excel_to_excel
summary = ingest_excel_to_excel(
file="data.xlsx",
sheet="Sheet1",
header_row=1,
fill_cols=["Region", "Country", "City"],
outfile="processed.xlsx",
outsheet="Filled",
drop_blank_rows=True,
require_non_null=["Value"],
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
fill_mode="hierarchical",
)
print(summary)
Code language: PHP (php)
9.3 Raw ingest via Python
from xlfilldown.api import ingest_excel_to_sqlite
summary = ingest_excel_to_sqlite(
file="data.xlsx",
sheet="Sheet1",
header_row=1,
ingest_mode="raw",
db="out.db",
table="raw_ingest",
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
)
print(summary)
Code language: PHP (php)
Use fill_cols_letters=["A", "B", "C"] instead of fill_cols to specify columns by letter. Use require_non_null_letters=["A", "D"] for the same with the non-null filter. Name-based and letter-based arguments can be combined — they are merged and de-duplicated.
The return value is a summary object with row counts and flags indicating whether hash and row-number columns were written.
10. Verifying Results
10.1 SQLite
-- Check first 5 rows
SELECT excel_row, Region, Country, City, Value
FROM fact_locations LIMIT 5;
-- Find potentially duplicated logical rows
SELECT row_hash, COUNT(*) c
FROM fact_locations GROUP BY 1 HAVING c > 1;
-- Spot rows missing required context
SELECT * FROM fact_locations
WHERE Region IS NULL OR Country IS NULL;
Code language: PHP (php)
10.2 Excel
Open the output workbook and inspect the header row. If you enabled --row-hash or --excel-row-numbers, those columns appear after the data headers.
11. Recipes
11.1 Group headers with detail rows — keep only detail
Your sheet has “group header” rows that contain just a Region value and detail rows that contain a Value. Drop the group headers:
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--require-non-null '["Value"]' \
--drop-blank-rows \
--db out.db \
--table facts
Code language: JavaScript (javascript)
The --require-non-null '["Value"]' flag ensures only rows with an actual Value survive.
11.2 Messy header names — use letters
Headers like "anothercolumn,3" are awkward to type in JSON. Use letters instead:
xlfilldown xlsx \
--infile messy.xlsx \
--insheet "Raw" \
--header-row 3 \
--fill-cols-letters A C AE \
--outfile clean.xlsx \
--outsheet "Filled"
Code language: JavaScript (javascript)
11.3 Strict filtering after fill
Drop any row where Region or City is still missing after fill-down:
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--require-non-null '["Region","City"]' \
--db out.db
Code language: JavaScript (javascript)
11.4 Subset columns in the output
xlfilldown ingests all non-empty headers by design (this keeps hashing stable and predictable). To narrow the output, create a view in SQLite:
CREATE VIEW slim_facts AS
SELECT Region, City, Value FROM fact_locations;
Code language: PHP (php)
Or do a second-pass filter in your downstream tool.
12. Performance Tips
--batch-size 1000is a good default. Increase it for very wide tables or slow disks.- When
--row-hashor--excel-row-numbersare enabled,xlfilldowncreates helpful indexes automatically. - The source workbook is opened with openpyxl’s
read_only=True. New output workbooks usewrite_only=True. - Processing is row-streamed and runs in constant memory, so it handles large sheets without issue.
13. Troubleshooting
Common error messages and what to do about them:
- “Use only one of –fill-cols or –fill-cols-letters” — You passed both. Pick one input style per invocation.
- “Column letter X refers to an empty header cell” — The cell at that letter in your header row is blank, whitespace, or
nan. Either populate it in Excel or switch to--fill-colswith explicit names. - “Duplicate header names found” — Rename the duplicates in your sheet.
xlfilldownrequires unique headers after trimming whitespace and normalizingnanvalues. - “Header row N exceeds sheet max row M” — You pointed at the wrong header row. Check your
--header-rowvalue. - Append failed with schema mismatch — The destination columns no longer match. You may have added
--row-hashor changed header order since the original load. Use--if-exists replaceor adjust the destination schema to match.
14. FAQ
Q: Does fill-down apply to completely empty rows?
A: No. Rows that are empty across all headers are preserved as-is (or dropped if you pass --drop-blank-rows). The carry state persists past them, so the next non-empty row still gets filled correctly.
Q: Why is everything stored as TEXT?
A: Determinism. Values are canonicalized to stable text for consistent storage and hashing. Cast to your desired types in downstream tools.
Q: Does hashing treat 1, 1.0, and Decimal('1.00') the same?
A: Yes. They all canonicalize to "1", so they produce the same hash.
Q: Can I keep Excel’s original row numbers?
A: Yes. Pass --excel-row-numbers to add an excel_row TEXT column containing the 1-based row number from the source sheet.
Q: Can I preview data before writing?
A: Not built-in yet. A common pattern is writing to a temporary SQLite database and inspecting it with sqlite3 or DB Browser for SQLite.
15. Cheat Sheets
15.1 CLI — SQLite
# Basic fill-down to SQLite
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
--fill-cols '["A","B"]' \
--db out.db
# Hierarchical tiers + filters + audit columns
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
--fill-cols '["Region","Country","City"]' \
--drop-blank-rows \
--require-non-null '["City","Value"]' \
--row-hash --excel-row-numbers \
--db out.db --table facts --if-exists replace
# Raw ingest (no fill-down)
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
--ingest-mode raw --row-hash --excel-row-numbers \
--db out.db --table raw --if-exists replace
Code language: PHP (php)
15.2 CLI — Excel
xlfilldown xlsx --infile in.xlsx --insheet S --header-row 1 \
--fill-cols-letters A C AE \
--outfile processed.xlsx --outsheet Filled \
--if-exists replace --row-hash --excel-row-numbers
Code language: CSS (css)
15.3 Python
from xlfilldown.api import ingest_excel_to_sqlite
summary = ingest_excel_to_sqlite(
file="in.xlsx",
sheet="S",
header_row=1,
fill_cols=["Tier1", "Tier2", "Tier3"],
db="out.db",
drop_blank_rows=True,
require_non_null=["Tier3"],
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
fill_mode="hierarchical",
)
print(summary)
Code language: PHP (php)
Summary
xlfilldown takes messy, tiered Excel spreadsheets and turns them into clean, flat data — either in SQLite or a new Excel workbook. The key concepts covered in this guide:
- Hierarchical fill resets lower tiers when higher tiers change, preventing stale context from leaking across groups.
- Independent fill carries each column forward separately, like pandas
ffill(). - Row hashing and Excel row numbers provide audit trails back to the source data.
- Raw ingest loads sheets as-is when you want audit columns without fill-down logic.
- The CLI and Python API offer identical functionality — pick whichever fits your workflow.
For quick reference, see the cheat sheets in section 15.

One Comment