xlfilldown streams an Excel sheet into SQLite or a new Excel sheet, forward-filling selected columns by header name (or letters), optionally keeping original Excel row numbers and computing a stable SHA-256 row hash. It works in constant memory, so it’s great for big workbooks.
0) Install
pip install xlfilldown
# or
pipx install xlfilldown
Python ≥ 3.9. Depends on openpyxl.
1) Why “fill-down” is tricky (and how xlfilldown solves it)
Typical spreadsheets use tiered headers: a high-level “group” column appears once, and lower tiers appear below it. Naïve tutorials copy values down independently, which can accidentally leak stale context.
xlfilldown supports two modes:
- hierarchical (default): higher tiers reset lower tiers when they change
(great for “Region > Country > City” style data) - independent: pandas-style
ffill; each listed column carries on its own
You choose the mode with --fill-mode hierarchical|independent. Order matters only in hierarchical mode (left = highest tier).
2) Your first 3-minute success
Say Sheet1 looks like this:
| Region | Country | City | Value |
|---|---|---|---|
| Europe | |||
| Germany | |||
| Bonn | 10 | ||
| Berlin | 12 | ||
| Asia | |||
| Japan | |||
| Osaka | 7 | ||
| Kyoto | 8 |
Goal: fill Region/Country/City so every row has context, then write to SQLite.
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--require-non-null '["Value"]' \
--db out.db \
--table fact_locations \
--row-hash \
--excel-row-numbers \
--drop-blank-rows
What you get (conceptually):
row_hash excel_row Region Country City Value
... "4" Europe Germany Bonn "10"
... "5" Europe Germany Berlin "12"
... "9" Asia Japan Osaka "7"
... "10" Asia Japan Kyoto "8"
| row_hash | excel_row | Region | Country | City | Value |
|---|---|---|---|---|---|
| aa93f736faca44b3734f2784f5beac08eb2a8c0049cda70c6ec5a736159848e6 | 4 | Europe | Germany | Bonn | 10 |
| 2ef0162949e29c587b1071c6435be56e1cb338986edc541974f353e1d7b69f56 | 5 | Europe | Germany | Berlin | 12 |
| 9c9f43c630a1a3ecf50ff26a8b07dc2cd005adfeb2d179bd2c32ee8e0e232fd7 | 9 | Asia | Japan | Osaka | 7 |
| 55d85bf23152d147c61fef8349a386c4e62edc847797e072909d3d2313164044 | 10 | Asia | Japan | Kyoto | 8 |
Notes:
- All destination columns are stored as TEXT (including
excel_row). - Numbers and dates are written as canonical strings (e.g.,
1.0 → "1"). row_hashis a SHA-256 over the data columns (in header order) after fill.
3) Using column letters (A, C, AE…)
Don’t want to type headers? Point to the header row and pass letters:
xlfilldown db \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols-letters A C \
--db out.db
- Letters are mapped to header names in the header row you provide.
- If a letter points to an empty header, xlfilldown will error clearly.
--fill-colsand--fill-cols-lettersare mutually exclusive.
4) Hierarchical vs Independent (with examples)
Hierarchical (default)
--fill-cols '["Region","Country","City"]'
--fill-mode hierarchical # default; this line can be omitted
Behavior:
- When Region changes, both Country and City carries reset on that row.
- When Country changes, City resets on that row.
Independent
--fill-cols '["Region","Country","City"]'
--fill-mode independent
Behavior:
- Each of the three columns carries forward independently (like pandas
ffill). - Order does not matter.
Tip: If you saw “leaking” of old City names into a new Country, you probably wanted hierarchical.
5) Dropping rows you don’t want
Two powerful filters apply after padding:
Spacer rows (drop rows blank across your fill columns):
--drop-blank-rows
Useful when you have visual separators in your source sheet.
Required non-null (drop if any listed headers are blank after fill):
--require-non-null '["Region","City","Value"]'
# or by letters, resolved to names via header row:
--require-non-null-letters A D
You can combine name- and letter-based lists; they’re merged and de-duplicated.
6) Writing to Excel instead of SQLite
xlfilldown xlsx \
--infile data.xlsx \
--insheet "Sheet1" \
--header-row 1 \
--fill-cols '["Region","Country","City"]' \
--outfile processed.xlsx \
--outsheet "Filled" \
--if-exists replace \
--row-hash \
--excel-row-numbers
Sheet-level --if-exists:
fail(default) – error if the sheet existsreplace– recreate the sheet freshappend– append after validating headers match exactly (including optional columns likeexcel_rowandrow_hash).
Pro move: When creating a new workbook, xlfilldown uses write_only=True for streaming speed.
7) Append semantics (exact-schema match)
Appending only works if destination columns match exactly and in order:
- SQLite:
[row_hash?] [excel_row?] + headers…as TEXT, exact order. - Excel:
Header row must equal the expected list (including optional columns if enabled).
This rigor prevents silent data drift.
8) Verifying results (quick queries)
SQLite:
-- Check first 5 rows
SELECT excel_row, Region, Country, City, Value FROM fact_locations LIMIT 5;
-- Find potentially duplicated logical rows
SELECT row_hash, COUNT(*) c FROM fact_locations GROUP BY 1 HAVING c > 1;
-- Spot rows missing required context (should be none if you used --require-non-null)
SELECT * FROM fact_locations WHERE Region IS NULL OR Country IS NULL;
Excel:
- Open
processed.xlsx, inspect the header row. - If you used
--row-hash/--excel-row-numbers, they appear before data headers.
9) Python API (same power, full control)
from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel
## Python API
```python
from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel
# → SQLite
summary = ingest_excel_to_sqlite(
file="data.xlsx",
sheet="Sheet1",
header_row=1,
# choose one:
fill_cols=["columnname1", "columnname2", "anothercolumn,3"], # by header names
# fill_cols_letters=["A", "B", "C"], # or by Excel letters
db="out.db",
table=None,
drop_blank_rows=True,
# choose one (or both, merged & de-duped):
require_non_null=["columnname1", "columnname2"], # by header names
# require_non_null_letters=["A", "B"], # or by Excel letters
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
batch_size=1000,
fill_mode="hierarchical", # default hierarchical fill
# fill_mode="independent", # independent (pandas-style) fill
)
# → Excel
summary = ingest_excel_to_excel(
file="data.xlsx",
sheet="Sheet1",
header_row=1,
fill_cols=["columnname1", "columnname2", "anothercolumn,3"],
# or: fill_cols_letters=["A", "B", "C"],
outfile="out.xlsx",
outsheet=None,
drop_blank_rows=True,
require_non_null=["columnname1", "columnname2"],
# or: require_non_null_letters=["A", "B"],
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
fill_mode="independent", # independent (pandas-style) fill
# fill_mode="hierarchical", # hierarchical (default)
)
Return values include row counts and flags indicating whether hash/row-numbers were written.
10) Recipes you’ll actually use
A. Group header rows + detail rows → keep only detail
Your sheet uses “group header” rows with just Region, then detail rows with Value.
xlfilldown db \
--infile data.xlsx --insheet "Sheet1" --header-row 1 \
--fill-cols '["Region","Country","City"]' \
--require-non-null '["Value"]' \ # drop group headers lacking Value
--drop-blank-rows \
--db out.db --table facts
B. Only certain columns matter
You have many headers but only want a subset in the destination?
Create a view (SQLite) or a second pass to slim the data. (xlfilldown ingests all non-empty headers by design; this keeps hashing stable and predictable.)
C. Use letters when header names are messy
Headers like "anothercolumn,3" are awkward to type:
xlfilldown xlsx \
--infile messy.xlsx --insheet "Raw" --header-row 3 \
--fill-cols-letters A C AE \
--outfile clean.xlsx --outsheet "Filled"
D. Strict filtering after fill
Drop any row where Region or City is missing after padding:
--require-non-null '["Region","City"]'
11) Performance tips
- Batching (SQLite):
--batch-size 1000is a good default. Increase for very wide tables or slow disks. - Indexes: When
--row-hashor--excel-row-numbersare enabled, helpful indexes are created automatically. - Streaming I/O: Source is opened with
read_only=True, data_only=True. New Excel workbooks usewrite_only=True. - Memory: Processing is row-streamed; works well on large sheets.
12) Troubleshooting (messages you’ll see and what to do)
- “Use only one of –fill-cols or –fill-cols-letters”
Pick one input style. - “Column letter X refers to an empty header cell”
Your header cell is blank (or whitespace/“nan”). Either populate it in Excel or switch to--fill-colswith explicit names. - “Duplicate header names found”
Rename duplicates in your sheet; xlfilldown requires unique headers (after trim/“nan” normalization). - “Header row N exceeds sheet max row M”
You pointed at the wrong header row. - Append failed with schema mismatch
You changed optional columns (e.g., addedrow_hash) or header order. Use--if-exists replaceor make the destination match.
13) When to use hierarchical vs independent
Hierarchical
- Data has tiers of context: Region → Country → City
- When higher tier changes, you want lower tiers to reset (avoid stale carry)
Independent
- Columns are unrelated (e.g., “Last Seen Date” and “Owner”)
- You explicitly want each column to “carry forward” separately
14) FAQ
Q: Do you fill completely empty rows?
A: No. Rows that are empty across all headers are preserved (or dropped if --drop-blank-rows). The carry persists past them.
Q: Why is everything TEXT in SQLite/Excel output?
A: Determinism. Values are canonicalized to stable text for storage and hashing. You can cast in downstream tools.
Q: Does hashing treat 1, 1.0, and Decimal('1.00') the same?
A: Yes. They canonicalize to "1".
Q: Can I keep Excel’s original row numbers?
A: Yes: --excel-row-numbers adds an excel_row TEXT column (1-based).
Q: Can I preview before writing?
A: Not built-in yet. A common pattern is writing to a temp SQLite and inspecting with sqlite3/DB Browser.
15) Copy-paste cheat sheets
CLI (SQLite)
# basic
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
--fill-cols '["A","B"]' \
--db out.db
# hierarchical tiers + filters + extras
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
--fill-cols '["Region","Country","City"]' \
--drop-blank-rows \
--require-non-null '["City","Value"]' \
--row-hash --excel-row-numbers \
--db out.db --table facts --if-exists replace
CLI (Excel)
xlfilldown xlsx --infile in.xlsx --insheet S --header-row 1 \
--fill-cols-letters A C AE \
--outfile processed.xlsx --outsheet Filled \
--if-exists replace --row-hash --excel-row-numbers
Python
from xlfilldown.api import ingest_excel_to_sqlite
summary = ingest_excel_to_sqlite(
file="in.xlsx",
sheet="S",
header_row=1,
# choose one:
fill_cols=["Tier1","Tier2","Tier3"],
# fill_cols_letters=["A","B","C"],
db="out.db",
drop_blank_rows=True,
require_non_null=["Tier3"],
row_hash=True,
excel_row_numbers=True,
if_exists="replace",
fill_mode="hierarchical", # or "independent"
)
print(summary)
Raw Ingest Examples
Sometimes you may want to ingest a sheet exactly as-is, without applying any fill-down logic.
This is useful if you just want to load the data into SQLite or Excel with audit columns (row_hash, excel_row) and handle nulls/blanks later.
CLI: Raw ingest to SQLite
xlfilldown db \
--infile data.xlsx --insheet Sheet1 --header-row 1 \
--db out.db --table raw_ingest \
--ingest-mode raw \
--row-hash --excel-row-numbers --if-exists replace
CLI: Raw ingest to Excel
xlfilldown xlsx \
--infile data.xlsx --insheet Sheet1 --header-row 1 \
--outfile out.xlsx --outsheet RawSheet \
--ingest-mode raw \
--row-hash --excel-row-numbers --if-exists replace
Python API: Raw ingest to SQLite
from xlfilldown.api import ingest_excel_to_sqlite
summary = ingest_excel_to_sqlite(
file="data.xlsx", sheet="Sheet1", header_row=1,
ingest_mode="raw", # skip fill-down
db="out.db", table="raw_ingest",
row_hash=True, excel_row_numbers=True,
if_exists="replace",
)
print(summary)
Python API: Raw ingest to Excel
from xlfilldown.api import ingest_excel_to_excel
summary = ingest_excel_to_excel(
file="data.xlsx", sheet="Sheet1", header_row=1,
ingest_mode="raw", # skip fill-down
outfile="out.xlsx", outsheet="RawSheet",
row_hash=True, excel_row_numbers=True,
if_exists="replace",
)
print(summary)

[…] xlfilldown: the complete guide ( Excel Fill Down ) One Million Rows: Testing xlfilldown Performance and Constant-Memory Design […]