0 MPG Spark

From Fuel Reports to Full Ledger Transparency

This didn’t start with the general ledger.

It started with fuel.

The Spark

While reviewing fuel usage reports in agenda packets, I noticed something visually strange — repeated 0 MPG calculations.

Not one or two.

Many.

I brought it up. It was noted. Then it resurfaced months later.

But I couldn’t shake the question:

Was this just a formatting issue… or a data issue?

So I tried extracting data from the PDFs embedded in the Board of Supervisors’ agenda packets.

That’s when I ran into the first wall.

Every month, the report format changed.

Column order shifted. Headings moved. Totals appeared in different positions. It wasn’t structured data — it was a formatted report.

If I wanted answers, I needed the source data.

My First FOIA

So I filed my first FOIA request with the county (25-849) to obtain a raw export.

When I received it, I started with Google Sheets.

Even at that level, patterns emerged quickly — inconsistencies, edge cases, structural anomalies. I compiled a report and submitted it.

That’s when I realized:

If structured data can reveal signals in one report, what might be visible in the full Accounts Payable ledger?

So I filed another FOIA request (25-899) — this time for the AP report export.

That request involved significant back-and-forth communication. The initial response was a PDF version of the report. PDFs are readable, but they are not analyzable at scale. I asked whether a data export was available instead.

Eventually, I received the data in CSV format.

And almost immediately, I understood something important:

This wasn’t a clean database export.

It was a print-formatted report exported into CSV.

Headers repeated every page. Totals were embedded between transaction rows. Multi-line column titles were flattened awkwardly. In other words, it was structured like a human-readable report, not like a machine-readable dataset.

Still — the underlying data was there.

It just required parsing, validation, and reconstruction.

And that’s a challenge I was more than willing to take on.

What the CSV Actually Looked Like

Each file:

About 2MB
Thousands of rows
Structured like a printed register
Included page headers repeated every page
Included multi-row column headers
Included row totals (“CHECK TOTAL”) embedded throughout
Included spacing and alignment artifacts from the AS/400 report layout

A header wasn’t one row.

It was three.

Every page repeated:

A page banner row
A county name row
Two rows of wrapped column titles
Cells were delimited with a comma-space

The data wasn’t tabular in the modern sense — it was visually formatted.

If I wanted usable data, I had to strip all of that away.

Building the Import Pipeline

At first, I overengineered it.

My original goal wasn’t just to import data — it was to detect changes between different exports covering the same date range. I started designing a system that would hash rows, compare historical snapshots, and flag deltas between versions.

Technically, it worked.

Practically, it wasn’t necessary.

I later realized I already had access to a journal export that reflects transactional changes. That meant I didn’t need a complex diff engine layered on top of the Accounts Payable report. I was solving a problem that another dataset already addressed.

So I stepped back.

Instead of building something intricate and brittle, I rebuilt the upload process from the ground up — simpler, safer, and purpose-built.

Initially, I had been running scripts manually.
Wiping tables.
Re-importing data.

It worked — but it wasn’t safe.

Now, the ingestion process is structured and controlled:

Upload a CSV file
Automatically validate its structure
Parse and normalize each row (excluding page headers, embedded totals, and report artifacts)
Stage the cleaned data in a temporary table
Insert any new vendor lookups
Compute the min/max date range represented in the file
Delete existing ledger entries within that same range
Insert staged data into the production ledger
Commit everything inside a single database transaction

If anything fails at any step, the entire transaction rolls back.

No partial imports.
No destructive table wipes.
No uncertainty about what was replaced.

Just controlled, deterministic replacement of the exact date range being updated.

The end result isn’t flashy.

But it’s reliable.

And when you’re dealing with financial data, reliability is the foundation.

Data Integrity Safeguards

The upload script:

Verifies the file begins with AP308,
Skips repeated page headers
Skips empty purchase order rows
Ignores embedded “CHECK TOTAL” rows
Validates numeric fields strictly
Validates date formats
Validates account number structure
Validates money format (including negative values)
Tracks min/max check dates
Tracks min/max account paid months

If something doesn’t look right, it:

Displays the row number
Shows what failed validation
Continues scanning up to 10 errors
Refuses to complete the import

This gives me a chance to examine patterns instead of blindly accepting flawed data.

What I Found

1. `/0000` Invoice Dates

Some invoice date fields contained: /0000

That’s not random corruption.

It’s legacy behavior.

The AS/400 doesn’t support nullable date fields. So unknown dates are represented as placeholder values.

It makes sense in that environment.

But in a modern schema, that field should be nullable.

I currently apply a temporary workaround during import, but I plan to update the schema properly.

2. Merged Fields: Account Paid + Net Amount

In some rows, I encountered this:

			
Acct Paid,Net
2026/011141730.00 (corrupt)
2026/01,1141730.00 (fixed)

That’s two fields smashed together.

When I inserted the missing delimiter manually, the row parsed correctly.

But that raises a deeper question:

Why did the delimiter disappear?

That’s not user error.
That’s not formatting.
That’s export behavior.

And because one affected row also introduced a blank field that shifted subsequent columns, I’m cautious about auto-repairing it until I fully understand the pattern.

3. Net Amount Attached to Check Number

Example:

			
Net,Check No
541.41-3002090 (corrupt)
541.41,3002090 (fixed)

Instead of a comma delimiter separating the net amount from the check number, the two values were joined with a dash.

At first glance, that dash raises questions. Is it a negative indicator? Is it part of the amount? Or is it simply acting as a misplaced separator?

In context, it is not a credit. It is not a negative value. It is two separate fields that lost their proper delimiter during export.

Once the fields were separated correctly, the row parsed as expected and aligned with surrounding transactions.

That distinction matters.

This isn’t a data-entry mistake by a department.
It isn’t a bookkeeping adjustment.
It’s a structural formatting defect in the exported report.

And structural defects are important to identify — not because they imply wrongdoing, but because they affect confidence in downstream analysis. When a delimiter disappears, columns shift. When columns shift, numbers land in the wrong fields. And when that happens silently, it can distort summaries, totals, and trends.

That’s exactly why the import script validates each column strictly and refuses to proceed when something doesn’t align.

Data integrity isn’t just about totals adding up.

It’s about fields meaning what they are supposed to mean.

Why This Matters

If financial oversight means scanning individual transactions one at a time, we will always be reactive.

Real oversight means revealing signals:

Trend anomalies
Vendor concentration shifts
Outlier payments
Date inconsistencies
Batch irregularities
Duplicate check behaviors
Structural export defects

Signals require clean, queryable data.

The CSV exports themselves are formatted like printed reports, complete with row totals embedded throughout the file. Those totals must be stripped away to access actual transactional data.

Once the noise is removed, the signal becomes visible.

Why I Went Back to 2014

In February, I began requesting exports back to 2014.

That year matters.

It’s the earliest point tied to several issues with unclaimed checks.

If you want to understand a system, you don’t start at last month.

You start at the beginning of the pattern.

Now I can:

Filter by vendor
Filter by account structure
Chart spending over time
Compare account paid dates vs check dates
Audit individual transactions
Reconcile across exports
Replace data ranges cleanly when updated exports arrive

All without wiping tables or manually reshaping files.

This Isn’t Just for Me

If supervisors genuinely want stronger financial oversight, we need better transparency tooling.

Not more PDFs.

Not more static reports.

Structured exports.
Queryable data.
Signal visibility.

This platform isn’t about accusation.

It’s about structure.

When you can see patterns clearly, conversations change.

What’s Next

There are still schema improvements to make:

Make invoice date nullable
Refine automated repair logic for consistent export defects
Track quarantined rows separately for review
Cross-reference with other exports

But the core is now stable.

No more manual SQL wipes.
No more hand-editing CSV files.
No more one-off scripts.

Just upload → validate → stage → replace range → commit.

What Would Make This Beautiful

Right now, the system works — but it isn’t elegant.

Each update requires a formal records request.
Each month requires communication.
Each export must be manually delivered.

Over time, that becomes slow. And costly. And unnecessary.

What would make this truly powerful — truly beautiful — is simple:

A monthly machine-readable export of the general ledger.

If a CSV were published (or even made available through a predictable endpoint), I could:

Automatically fetch the latest month
Validate it
Import it
Recompute signals
Update public-facing dashboards

All without human intervention.

That’s not about convenience.

That’s about transparency infrastructure.

When financial data is published in structured form on a regular cadence, oversight shifts from reactive to continuous. Instead of waiting for questions, the system itself surfaces trends — vendor concentration, unusual check sequences, timing shifts, fund movements.

Right now, the data exists.
It just isn’t operationalized for public analysis.

So for now, I file the request.
I receive the export.
I upload it.
I run the pipeline.

But the future version of this — the one aligned with real transparency — is automated, predictable, and routine.

Not because I need it.

Because a community that values oversight shouldn’t depend on manual extraction to see its own financial signals.

Closing Thought

This started with 0 MPG fuel reports.

It led to my first FOIA with the county.

It led to structured exports.

It led to deeper pattern recognition.

Oversight isn’t about suspicion.

It’s about clarity.

And clarity starts with clean data.