xlfilldown streams an Excel sheet into SQLite or a new Excel sheet, forward-filling selected columns by header name (or letters), optionally keeping original Excel row numbers and computing a stable SHA-256 row hash. It works in constant memory, so it’s great for big workbooks.

0) Install

pip install xlfilldown
# or
pipx install xlfilldown

Python ≥ 3.9. Depends on openpyxl.

1) Why “fill-down” is tricky (and how xlfilldown solves it)

Typical spreadsheets use tiered headers: a high-level “group” column appears once, and lower tiers appear below it. Naïve tutorials copy values down independently, which can accidentally leak stale context.

xlfilldown supports two modes:

  • hierarchical (default): higher tiers reset lower tiers when they change
    (great for “Region > Country > City” style data)
  • independent: pandas-style ffill; each listed column carries on its own

You choose the mode with --fill-mode hierarchical|independent. Order matters only in hierarchical mode (left = highest tier).

2) Your first 3-minute success

Say Sheet1 looks like this:

RegionCountryCityValue
Europe
Germany
Bonn10
Berlin12
Asia
Japan
Osaka7
Kyoto8

Goal: fill Region/Country/City so every row has context, then write to SQLite.

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --require-non-null '["Value"]' \
  --db out.db \
  --table fact_locations \
  --row-hash \
  --excel-row-numbers \
  --drop-blank-rows

What you get (conceptually):

row_hash  excel_row  Region  Country  City   Value
...       "4"        Europe  Germany  Bonn    "10"
...       "5"        Europe  Germany  Berlin  "12"
...       "9"        Asia    Japan    Osaka   "7"
...       "10"       Asia    Japan    Kyoto   "8"
row_hashexcel_rowRegionCountryCityValue
aa93f736faca44b3734f2784f5beac08eb2a8c0049cda70c6ec5a736159848e64EuropeGermanyBonn10
2ef0162949e29c587b1071c6435be56e1cb338986edc541974f353e1d7b69f565EuropeGermanyBerlin12
9c9f43c630a1a3ecf50ff26a8b07dc2cd005adfeb2d179bd2c32ee8e0e232fd79AsiaJapanOsaka7
55d85bf23152d147c61fef8349a386c4e62edc847797e072909d3d231316404410AsiaJapanKyoto8

Notes:

  • All destination columns are stored as TEXT (including excel_row).
  • Numbers and dates are written as canonical strings (e.g., 1.0 → "1").
  • row_hash is a SHA-256 over the data columns (in header order) after fill.

3) Using column letters (A, C, AE…)

Don’t want to type headers? Point to the header row and pass letters:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols-letters A C \
  --db out.db
  • Letters are mapped to header names in the header row you provide.
  • If a letter points to an empty header, xlfilldown will error clearly.
  • --fill-cols and --fill-cols-letters are mutually exclusive.

4) Hierarchical vs Independent (with examples)

Hierarchical (default)

--fill-cols '["Region","Country","City"]'
--fill-mode hierarchical   # default; this line can be omitted

Behavior:

  • When Region changes, both Country and City carries reset on that row.
  • When Country changes, City resets on that row.

Independent

--fill-cols '["Region","Country","City"]'
--fill-mode independent

Behavior:

  • Each of the three columns carries forward independently (like pandas ffill).
  • Order does not matter.

Tip: If you saw “leaking” of old City names into a new Country, you probably wanted hierarchical.

5) Dropping rows you don’t want

Two powerful filters apply after padding:

Spacer rows (drop rows blank across your fill columns):

--drop-blank-rows

Useful when you have visual separators in your source sheet.

Required non-null (drop if any listed headers are blank after fill):

--require-non-null '["Region","City","Value"]'
# or by letters, resolved to names via header row:
--require-non-null-letters A D

You can combine name- and letter-based lists; they’re merged and de-duplicated.

6) Writing to Excel instead of SQLite

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --outfile processed.xlsx \
  --outsheet "Filled" \
  --if-exists replace \
  --row-hash \
  --excel-row-numbers

Sheet-level --if-exists:

  • fail (default) – error if the sheet exists
  • replace – recreate the sheet fresh
  • append – append after validating headers match exactly (including optional columns like excel_row and row_hash).

Pro move: When creating a new workbook, xlfilldown uses write_only=True for streaming speed.

7) Append semantics (exact-schema match)

Appending only works if destination columns match exactly and in order:

  • SQLite:
    [row_hash?] [excel_row?] + headers… as TEXT, exact order.
  • Excel:
    Header row must equal the expected list (including optional columns if enabled).

This rigor prevents silent data drift.

8) Verifying results (quick queries)

SQLite:

-- Check first 5 rows
SELECT excel_row, Region, Country, City, Value FROM fact_locations LIMIT 5;

-- Find potentially duplicated logical rows
SELECT row_hash, COUNT(*) c FROM fact_locations GROUP BY 1 HAVING c > 1;

-- Spot rows missing required context (should be none if you used --require-non-null)
SELECT * FROM fact_locations WHERE Region IS NULL OR Country IS NULL;

Excel:

  • Open processed.xlsx, inspect the header row.
  • If you used --row-hash/--excel-row-numbers, they appear before data headers.

9) Python API (same power, full control)

from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel

## Python API

```python
from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel

# → SQLite
summary = ingest_excel_to_sqlite(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    # choose one:
    fill_cols=["columnname1", "columnname2", "anothercolumn,3"],   # by header names
    # fill_cols_letters=["A", "B", "C"],                           # or by Excel letters
    db="out.db",
    table=None,
    drop_blank_rows=True,
    # choose one (or both, merged & de-duped):
    require_non_null=["columnname1", "columnname2"],               # by header names
    # require_non_null_letters=["A", "B"],                         # or by Excel letters
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    batch_size=1000,
    fill_mode="hierarchical",    # default hierarchical fill
    # fill_mode="independent",   # independent (pandas-style) fill
)

# → Excel
summary = ingest_excel_to_excel(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["columnname1", "columnname2", "anothercolumn,3"],
    # or: fill_cols_letters=["A", "B", "C"],
    outfile="out.xlsx",
    outsheet=None,
    drop_blank_rows=True,
    require_non_null=["columnname1", "columnname2"],
    # or: require_non_null_letters=["A", "B"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    fill_mode="independent",     # independent (pandas-style) fill
    # fill_mode="hierarchical",  # hierarchical (default)
)

Return values include row counts and flags indicating whether hash/row-numbers were written.

10) Recipes you’ll actually use

A. Group header rows + detail rows → keep only detail

Your sheet uses “group header” rows with just Region, then detail rows with Value.

xlfilldown db \
  --infile data.xlsx --insheet "Sheet1" --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --require-non-null '["Value"]' \  # drop group headers lacking Value
  --drop-blank-rows \
  --db out.db --table facts

B. Only certain columns matter

You have many headers but only want a subset in the destination?
Create a view (SQLite) or a second pass to slim the data. (xlfilldown ingests all non-empty headers by design; this keeps hashing stable and predictable.)

C. Use letters when header names are messy

Headers like "anothercolumn,3" are awkward to type:

xlfilldown xlsx \
  --infile messy.xlsx --insheet "Raw" --header-row 3 \
  --fill-cols-letters A C AE \
  --outfile clean.xlsx --outsheet "Filled"

D. Strict filtering after fill

Drop any row where Region or City is missing after padding:

--require-non-null '["Region","City"]'

11) Performance tips

  • Batching (SQLite): --batch-size 1000 is a good default. Increase for very wide tables or slow disks.
  • Indexes: When --row-hash or --excel-row-numbers are enabled, helpful indexes are created automatically.
  • Streaming I/O: Source is opened with read_only=True, data_only=True. New Excel workbooks use write_only=True.
  • Memory: Processing is row-streamed; works well on large sheets.

12) Troubleshooting (messages you’ll see and what to do)

  • “Use only one of –fill-cols or –fill-cols-letters”
    Pick one input style.
  • “Column letter X refers to an empty header cell”
    Your header cell is blank (or whitespace/“nan”). Either populate it in Excel or switch to --fill-cols with explicit names.
  • “Duplicate header names found”
    Rename duplicates in your sheet; xlfilldown requires unique headers (after trim/“nan” normalization).
  • “Header row N exceeds sheet max row M”
    You pointed at the wrong header row.
  • Append failed with schema mismatch
    You changed optional columns (e.g., added row_hash) or header order. Use --if-exists replace or make the destination match.

13) When to use hierarchical vs independent

Hierarchical

  • Data has tiers of context: Region → Country → City
  • When higher tier changes, you want lower tiers to reset (avoid stale carry)

Independent

  • Columns are unrelated (e.g., “Last Seen Date” and “Owner”)
  • You explicitly want each column to “carry forward” separately

14) FAQ

Q: Do you fill completely empty rows?
A: No. Rows that are empty across all headers are preserved (or dropped if --drop-blank-rows). The carry persists past them.

Q: Why is everything TEXT in SQLite/Excel output?
A: Determinism. Values are canonicalized to stable text for storage and hashing. You can cast in downstream tools.

Q: Does hashing treat 1, 1.0, and Decimal('1.00') the same?
A: Yes. They canonicalize to "1".

Q: Can I keep Excel’s original row numbers?
A: Yes: --excel-row-numbers adds an excel_row TEXT column (1-based).

Q: Can I preview before writing?
A: Not built-in yet. A common pattern is writing to a temp SQLite and inspecting with sqlite3/DB Browser.

15) Copy-paste cheat sheets

CLI (SQLite)

# basic
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
  --fill-cols '["A","B"]' \
  --db out.db

# hierarchical tiers + filters + extras
xlfilldown db --infile in.xlsx --insheet S --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --drop-blank-rows \
  --require-non-null '["City","Value"]' \
  --row-hash --excel-row-numbers \
  --db out.db --table facts --if-exists replace

CLI (Excel)

xlfilldown xlsx --infile in.xlsx --insheet S --header-row 1 \
  --fill-cols-letters A C AE \
  --outfile processed.xlsx --outsheet Filled \
  --if-exists replace --row-hash --excel-row-numbers

Python

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="in.xlsx",
    sheet="S",
    header_row=1,
    # choose one:
    fill_cols=["Tier1","Tier2","Tier3"],
    # fill_cols_letters=["A","B","C"],
    db="out.db",
    drop_blank_rows=True,
    require_non_null=["Tier3"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    fill_mode="hierarchical",    # or "independent"
)
print(summary)

Raw Ingest Examples

Sometimes you may want to ingest a sheet exactly as-is, without applying any fill-down logic.
This is useful if you just want to load the data into SQLite or Excel with audit columns (row_hash, excel_row) and handle nulls/blanks later.

CLI: Raw ingest to SQLite

xlfilldown db \
  --infile data.xlsx --insheet Sheet1 --header-row 1 \
  --db out.db --table raw_ingest \
  --ingest-mode raw \
  --row-hash --excel-row-numbers --if-exists replace

CLI: Raw ingest to Excel

xlfilldown xlsx \
  --infile data.xlsx --insheet Sheet1 --header-row 1 \
  --outfile out.xlsx --outsheet RawSheet \
  --ingest-mode raw \
  --row-hash --excel-row-numbers --if-exists replace

Python API: Raw ingest to SQLite

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    ingest_mode="raw",                      # skip fill-down
    db="out.db", table="raw_ingest",
    row_hash=True, excel_row_numbers=True,
    if_exists="replace",
)
print(summary)

Python API: Raw ingest to Excel

from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    ingest_mode="raw",                      # skip fill-down
    outfile="out.xlsx", outsheet="RawSheet",
    row_hash=True, excel_row_numbers=True,
    if_exists="replace",
)
print(summary)

One thought on “xlfilldown: the complete guide ( Excel Fill Down )”

Leave a Reply