xlfilldown: The Complete Guide to Excel Fill Down

Summary: Everything you need to know about xlfilldown — the CLI tool and Python library that streams Excel data into SQLite or a new Excel workbook, with hierarchical fill-down, row hashing, and constant-memory processing.

Key	Value
Tool	`xlfilldown`
Python version	3.9+
Dependencies	`openpyxl`
Install	`pip install xlfilldown`
Output targets	SQLite, Excel (.xlsx)
Fill modes	`hierarchical`, `independent`

0. Prerequisites

Python 3.9 or later installed on your system
Basic familiarity with the command line
An Excel .xlsx file you want to process

Install xlfilldown with pip or pipx:

pip install xlfilldown

Or if you prefer an isolated install:

pipx install xlfilldown

1. Why Fill Down Is Tricky

Typical spreadsheets use tiered headers: a high-level “group” column appears once, and lower-level detail rows sit beneath it. The hierarchy is obvious to a human eye but catastrophic for a database — every row needs its full context.

The naive fix is pandas ffill(), but that carries each column forward independently. When a new Region appears before Country is set, ffill() drags the old Country into the new Region. Now Osaka is in Europe and nobody notices until the quarterly report is wrong.

xlfilldown solves this with two fill modes:

Hierarchical (default) — higher tiers reset lower tiers when they change. When Region changes, Country and City both reset. This prevents stale context from leaking across groups.
Independent — each listed column carries forward on its own, like pandas ffill(). Use this when your columns are genuinely unrelated.

Choose the mode with --fill-mode hierarchical or --fill-mode independent. Column order matters only in hierarchical mode, where left equals highest tier.

2. Your First Command

Say Sheet1 in data.xlsx looks like this:

Region	Country	City	Value
Europe
	Germany
		Bonn	10
		Berlin	12

Asia
	Japan
		Osaka	7
		Kyoto	8

The goal: fill Region, Country, and City so every row carries its full context, then write the result to SQLite.

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --require-non-null '["Value"]' \
  --db out.db \
  --table fact_locations \
  --row-hash \
  --excel-row-numbers \
  --drop-blank-rows


Code language: JavaScript (javascript)

After running that command, the fact_locations table in out.db contains:

row_hash	excel_row	Region	Country	City	Value
aa93f736…	4	Europe	Germany	Bonn	10
2ef01629…	5	Europe	Germany	Berlin	12
9c9f43c6…	9	Asia	Japan	Osaka	7
55d85bf2…	10	Asia	Japan	Kyoto	8

Key details about the output:

All columns are stored as TEXT, including numeric values like Value.
Numbers and dates are written as their string representation (e.g., 10 becomes "10").
row_hash is a SHA-256 computed over all data columns in that row.
excel_row is the 1-based row number from the original spreadsheet.
Blank spacer rows and group-header rows (missing Value) were dropped.

3. Using Column Letters Instead of Names

If your header names are long or contain special characters, point to columns by their Excel letter instead:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols-letters A C \
  --db out.db


Code language: CSS (css)

Letters are resolved to header names using the header row you specify.
If a letter points to an empty header cell, xlfilldown raises a clear error.
--fill-cols and --fill-cols-letters are mutually exclusive — pick one.

4. Hierarchical vs Independent Fill

4.1 Hierarchical (default)

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --fill-mode hierarchical \
  --db out.db


Code language: JavaScript (javascript)

When Region changes, both Country and City reset on that row. When Country changes, City resets. This prevents stale values from leaking into a new group.

Use hierarchical mode when your data has natural tiers: Region, Country, City or Department, Team, Employee.

4.2 Independent

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --fill-mode independent \
  --db out.db


Code language: JavaScript (javascript)

Each column carries forward on its own, regardless of what happens in other columns. This behaves like pandas ffill().

Use independent mode when your columns are unrelated — for example, Last Seen Date and Owner have no hierarchical relationship.

Tip: If you see old City names leaking into a new Country, switch to hierarchical mode.

5. Dropping Unwanted Rows

Two filters apply after fill-down:

Drop blank spacer rows — removes rows that are blank across all your fill columns:

--drop-blank-rows

This is useful when your source sheet has empty rows used as visual separators.

Require non-null — drops any row where a listed column is still blank after filling:

--require-non-null '["Region","City","Value"]'


Code language: JavaScript (javascript)

You can also specify this by letter:

--require-non-null-letters A D


Code language: JavaScript (javascript)

Name-based and letter-based lists can be combined. They are merged and de-duplicated internally.

6. Writing to Excel Instead of SQLite

Replace the db subcommand with xlsx:

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --outfile processed.xlsx \
  --outsheet "Filled" \
  --if-exists replace \
  --row-hash \
  --excel-row-numbers


Code language: JavaScript (javascript)

The --if-exists flag controls what happens when the output sheet already exists:

Value	Behavior
`error`	Raise an error if the sheet exists (default)
`replace`	Delete and recreate the sheet
`append`	Append rows after validating that headers match exactly, including optional columns like `row_hash` and `excel_row`

Tip: When creating a new workbook, xlfilldown uses openpyxl’s write_only=True mode for streaming speed.

7. Append Semantics

Appending only works if the destination columns match exactly and in order:

SQLite — column names and order must match the existing table schema.
Excel — the header row in the destination sheet must equal the expected column list, including any optional columns like row_hash and excel_row if enabled.

This strictness prevents silent data drift. If your schema has changed (for example, you added --row-hash after the initial load), use --if-exists replace or adjust the destination to match.

8. Raw Ingest (No Fill Down)

Sometimes you want to load a sheet exactly as-is, without applying any fill-down logic. This is useful when you just need audit columns (row_hash, excel_row) and plan to handle nulls later.

Pass --ingest-mode raw to skip fill-down entirely:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --db out.db \
  --table raw_ingest \
  --ingest-mode raw \
  --row-hash \
  --excel-row-numbers \
  --if-exists replace


Code language: CSS (css)

The same flag works with the xlsx subcommand:

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --outfile out.xlsx \
  --outsheet "RawSheet" \
  --ingest-mode raw \
  --row-hash \
  --excel-row-numbers \
  --if-exists replace


Code language: JavaScript (javascript)

9. Python API

Both output targets are available as Python functions with the same options as the CLI.

9.1 Write to SQLite

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country", "City"],
    db="out.db",
    table="fact_locations",
    drop_blank_rows=True,
    require_non_null=["Value"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    batch_size=1000,
    fill_mode="hierarchical",
)
print(summary)


Code language: PHP (php)

9.2 Write to Excel

from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country", "City"],
    outfile="processed.xlsx",
    outsheet="Filled",
    drop_blank_rows=True,
    require_non_null=["Value"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    fill_mode="hierarchical",
)
print(summary)


Code language: PHP (php)

9.3 Raw ingest via Python

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    ingest_mode="raw",
    db="out.db",
    table="raw_ingest",
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
)
print(summary)


Code language: PHP (php)

Use fill_cols_letters=["A", "B", "C"] instead of fill_cols to specify columns by letter. Use require_non_null_letters=["A", "D"] for the same with the non-null filter. Name-based and letter-based arguments can be combined — they are merged and de-duplicated.

The return value is a summary object with row counts and flags indicating whether hash and row-number columns were written.

10. Verifying Results

10.1 SQLite

-- Check first 5 rows
SELECT excel_row, Region, Country, City, Value
FROM fact_locations LIMIT 5;

-- Find potentially duplicated logical rows
SELECT row_hash, COUNT(*) c
FROM fact_locations GROUP BY 1 HAVING c > 1;

-- Spot rows missing required context
SELECT * FROM fact_locations
WHERE Region IS NULL OR Country IS NULL;


Code language: PHP (php)

10.2 Excel

Open the output workbook and inspect the header row. If you enabled --row-hash or --excel-row-numbers, those columns appear after the data headers.

11. Recipes

11.1 Group headers with detail rows — keep only detail

Your sheet has “group header” rows that contain just a Region value and detail rows that contain a Value. Drop the group headers:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --require-non-null '["Value"]' \
  --drop-blank-rows \
  --db out.db \
  --table facts


Code language: JavaScript (javascript)

The --require-non-null '["Value"]' flag ensures only rows with an actual Value survive.

11.2 Messy header names — use letters

Headers like "anothercolumn,3" are awkward to type in JSON. Use letters instead:

xlfilldown xlsx \
  --infile messy.xlsx \
  --insheet "Raw" \
  --header-row 3 \
  --fill-cols-letters A C AE \
  --outfile clean.xlsx \
  --outsheet "Filled"


Code language: JavaScript (javascript)

11.3 Strict filtering after fill

Drop any row where Region or City is still missing after fill-down:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --require-non-null '["Region","City"]' \
  --db out.db


Code language: JavaScript (javascript)

11.4 Subset columns in the output

xlfilldown ingests all non-empty headers by design (this keeps hashing stable and predictable). To narrow the output, create a view in SQLite:

CREATE VIEW slim_facts AS
SELECT Region, City, Value FROM fact_locations;


Code language: PHP (php)

Or do a second-pass filter in your downstream tool.

12. Performance Tips

--batch-size 1000 is a good default. Increase it for very wide tables or slow disks.
When --row-hash or --excel-row-numbers are enabled, xlfilldown creates helpful indexes automatically.
The source workbook is opened with openpyxl’s read_only=True. New output workbooks use write_only=True.
Processing is row-streamed and runs in constant memory, so it handles large sheets without issue.

13. Troubleshooting

Common error messages and what to do about them:

“Use only one of –fill-cols or –fill-cols-letters” — You passed both. Pick one input style per invocation.
“Column letter X refers to an empty header cell” — The cell at that letter in your header row is blank, whitespace, or nan. Either populate it in Excel or switch to --fill-cols with explicit names.
“Duplicate header names found” — Rename the duplicates in your sheet. xlfilldown requires unique headers after trimming whitespace and normalizing nan values.
“Header row N exceeds sheet max row M” — You pointed at the wrong header row. Check your --header-row value.
Append failed with schema mismatch — The destination columns no longer match. You may have added --row-hash or changed header order since the original load. Use --if-exists replace or adjust the destination schema to match.

14. FAQ

Q: Does fill-down apply to completely empty rows?

A: No. Rows that are empty across all headers are preserved as-is (or dropped if you pass --drop-blank-rows). The carry state persists past them, so the next non-empty row still gets filled correctly.

Q: Why is everything stored as TEXT?

A: Determinism. Values are canonicalized to stable text for consistent storage and hashing. Cast to your desired types in downstream tools.

Q: Does hashing treat 1, 1.0, and Decimal('1.00') the same?

A: Yes. They all canonicalize to "1", so they produce the same hash.

Q: Can I keep Excel’s original row numbers?

A: Yes. Pass --excel-row-numbers to add an excel_row TEXT column containing the 1-based row number from the source sheet.

Q: Can I preview data before writing?

A: Not built-in yet. A common pattern is writing to a temporary SQLite database and inspecting it with sqlite3 or DB Browser for SQLite.

15. Cheat Sheets

15.1 CLI — SQLite

# Basic fill-down to SQLite
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
  --fill-cols '["A","B"]' \
  --db out.db

# Hierarchical tiers + filters + audit columns
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --drop-blank-rows \
  --require-non-null '["City","Value"]' \
  --row-hash --excel-row-numbers \
  --db out.db --table facts --if-exists replace

# Raw ingest (no fill-down)
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
  --ingest-mode raw --row-hash --excel-row-numbers \
  --db out.db --table raw --if-exists replace


Code language: PHP (php)

15.2 CLI — Excel

xlfilldown xlsx --infile in.xlsx --insheet S --header-row 1 \
  --fill-cols-letters A C AE \
  --outfile processed.xlsx --outsheet Filled \
  --if-exists replace --row-hash --excel-row-numbers


Code language: CSS (css)

15.3 Python

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="in.xlsx",
    sheet="S",
    header_row=1,
    fill_cols=["Tier1", "Tier2", "Tier3"],
    db="out.db",
    drop_blank_rows=True,
    require_non_null=["Tier3"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    fill_mode="hierarchical",
)
print(summary)


Code language: PHP (php)

Summary

xlfilldown takes messy, tiered Excel spreadsheets and turns them into clean, flat data — either in SQLite or a new Excel workbook. The key concepts covered in this guide:

Hierarchical fill resets lower tiers when higher tiers change, preventing stale context from leaking across groups.
Independent fill carries each column forward separately, like pandas ffill().
Row hashing and Excel row numbers provide audit trails back to the source data.
Raw ingest loads sheets as-is when you want audit columns without fill-down logic.
The CLI and Python API offer identical functionality — pick whichever fits your workflow.

For quick reference, see the cheat sheets in section 15.

0. Prerequisites

1. Why Fill Down Is Tricky

2. Your First Command

3. Using Column Letters Instead of Names

4. Hierarchical vs Independent Fill

4.1 Hierarchical (default)

4.2 Independent

5. Dropping Unwanted Rows

6. Writing to Excel Instead of SQLite

7. Append Semantics

8. Raw Ingest (No Fill Down)

9. Python API

9.1 Write to SQLite

9.2 Write to Excel

9.3 Raw ingest via Python

10. Verifying Results

10.1 SQLite

10.2 Excel

11. Recipes

11.1 Group headers with detail rows — keep only detail

11.2 Messy header names — use letters

11.3 Strict filtering after fill

11.4 Subset columns in the output

12. Performance Tips

13. Troubleshooting

14. FAQ

15. Cheat Sheets

15.1 CLI — SQLite

15.2 CLI — Excel

15.3 Python

Summary

Share this:

Like this:

Similar Posts

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

One Comment

Leave a Reply Cancel reply