From Fuel Reports to Full Ledger Transparency
This didn’t start with the general ledger.
It started with fuel.
The Spark
While reviewing fuel usage reports in agenda packets, I noticed something visually strange — repeated 0 MPG calculations.
Not one or two.
Many.
I brought it up. It was noted. Then it resurfaced months later.
But I couldn’t shake the question:
Was this just a formatting issue… or a data issue?
So I tried extracting data from the PDFs embedded in the Board of Supervisors’ agenda packets.
That’s when I ran into the first wall.
Every month, the report format changed.
Column order shifted. Headings moved. Totals appeared in different positions. It wasn’t structured data — it was a formatted report.
If I wanted answers, I needed the source data.
My First FOIA
So I filed my first FOIA request with the county (25-849) to obtain a raw export.
When I received it, I started with Google Sheets.
Even at that level, patterns emerged quickly — inconsistencies, edge cases, structural anomalies. I compiled a report and submitted it.
That’s when I realized:
If structured data can reveal signals in one report, what might be visible in the full Accounts Payable ledger?
So I filed another FOIA request (25-899) — this time for the AP report export.
That request involved significant back-and-forth communication. The initial response was a PDF version of the report. PDFs are readable, but they are not analyzable at scale. I asked whether a data export was available instead.
Eventually, I received the data in CSV format.
And almost immediately, I understood something important:
This wasn’t a clean database export.
It was a print-formatted report exported into CSV.
Headers repeated every page. Totals were embedded between transaction rows. Multi-line column titles were flattened awkwardly. In other words, it was structured like a human-readable report, not like a machine-readable dataset.
Still — the underlying data was there.
It just required parsing, validation, and reconstruction.
And that’s a challenge I was more than willing to take on.
What the CSV Actually Looked Like
Each file:
- About 2MB
- Thousands of rows
- Structured like a printed register
- Included page headers repeated every page
- Included multi-row column headers
- Included row totals (“CHECK TOTAL”) embedded throughout
- Included spacing and alignment artifacts from the AS/400 report layout
A header wasn’t one row.
It was three.
Every page repeated:
- A page banner row
- A county name row
- Two rows of wrapped column titles
- Cells were delimited with a comma-space
The data wasn’t tabular in the modern sense — it was visually formatted.
If I wanted usable data, I had to strip all of that away.
Building the Import Pipeline
At first, I overengineered it.
My original goal wasn’t just to import data — it was to detect changes between different exports covering the same date range. I started designing a system that would hash rows, compare historical snapshots, and flag deltas between versions.
Technically, it worked.
Practically, it wasn’t necessary.
I later realized I already had access to a journal export that reflects transactional changes. That meant I didn’t need a complex diff engine layered on top of the Accounts Payable report. I was solving a problem that another dataset already addressed.
So I stepped back.
Instead of building something intricate and brittle, I rebuilt the upload process from the ground up — simpler, safer, and purpose-built.
Initially, I had been running scripts manually.
Wiping tables.
Re-importing data.
It worked — but it wasn’t safe.
Now, the ingestion process is structured and controlled:
- Upload a CSV file
- Automatically validate its structure
- Parse and normalize each row (excluding page headers, embedded totals, and report artifacts)
- Stage the cleaned data in a temporary table
- Insert any new vendor lookups
- Compute the min/max date range represented in the file
- Delete existing ledger entries within that same range
- Insert staged data into the production ledger
- Commit everything inside a single database transaction
If anything fails at any step, the entire transaction rolls back.
No partial imports.
No destructive table wipes.
No uncertainty about what was replaced.
Just controlled, deterministic replacement of the exact date range being updated.
The end result isn’t flashy.
But it’s reliable.
And when you’re dealing with financial data, reliability is the foundation.
Data Integrity Safeguards
The upload script:
- Verifies the file begins with
AP308, - Skips repeated page headers
- Skips empty purchase order rows
- Ignores embedded “CHECK TOTAL” rows
- Validates numeric fields strictly
- Validates date formats
- Validates account number structure
- Validates money format (including negative values)
- Tracks min/max check dates
- Tracks min/max account paid months
If something doesn’t look right, it:
- Displays the row number
- Shows what failed validation
- Continues scanning up to 10 errors
- Refuses to complete the import
This gives me a chance to examine patterns instead of blindly accepting flawed data.
What I Found
1. /0000 Invoice Dates
Some invoice date fields contained: /0000
That’s not random corruption.
It’s legacy behavior.
The AS/400 doesn’t support nullable date fields. So unknown dates are represented as placeholder values.
It makes sense in that environment.
But in a modern schema, that field should be nullable.
I currently apply a temporary workaround during import, but I plan to update the schema properly.
2. Merged Fields: Account Paid + Net Amount
In some rows, I encountered this:
Acct Paid,Net2026/011141730.00 (corrupt)2026/01,1141730.00 (fixed)
That’s two fields smashed together.
When I inserted the missing delimiter manually, the row parsed correctly.
But that raises a deeper question:
Why did the delimiter disappear?
That’s not user error.
That’s not formatting.
That’s export behavior.
And because one affected row also introduced a blank field that shifted subsequent columns, I’m cautious about auto-repairing it until I fully understand the pattern.
3. Net Amount Attached to Check Number
Example:
Net,Check No541.41-3002090 (corrupt)541.41,3002090 (fixed)
Instead of a comma delimiter separating the net amount from the check number, the two values were joined with a dash.
At first glance, that dash raises questions. Is it a negative indicator? Is it part of the amount? Or is it simply acting as a misplaced separator?
In context, it is not a credit. It is not a negative value. It is two separate fields that lost their proper delimiter during export.
Once the fields were separated correctly, the row parsed as expected and aligned with surrounding transactions.
That distinction matters.
This isn’t a data-entry mistake by a department.
It isn’t a bookkeeping adjustment.
It’s a structural formatting defect in the exported report.
And structural defects are important to identify — not because they imply wrongdoing, but because they affect confidence in downstream analysis. When a delimiter disappears, columns shift. When columns shift, numbers land in the wrong fields. And when that happens silently, it can distort summaries, totals, and trends.
That’s exactly why the import script validates each column strictly and refuses to proceed when something doesn’t align.
Data integrity isn’t just about totals adding up.
It’s about fields meaning what they are supposed to mean.
Why This Matters
If financial oversight means scanning individual transactions one at a time, we will always be reactive.
Real oversight means revealing signals:
- Trend anomalies
- Vendor concentration shifts
- Outlier payments
- Date inconsistencies
- Batch irregularities
- Duplicate check behaviors
- Structural export defects
Signals require clean, queryable data.
The CSV exports themselves are formatted like printed reports, complete with row totals embedded throughout the file. Those totals must be stripped away to access actual transactional data.
Once the noise is removed, the signal becomes visible.
Why I Went Back to 2014
In February, I began requesting exports back to 2014.
That year matters.
It’s the earliest point tied to several issues with unclaimed checks.
If you want to understand a system, you don’t start at last month.
You start at the beginning of the pattern.
Now I can:
- Filter by vendor
- Filter by account structure
- Chart spending over time
- Compare account paid dates vs check dates
- Audit individual transactions
- Reconcile across exports
- Replace data ranges cleanly when updated exports arrive
All without wiping tables or manually reshaping files.
This Isn’t Just for Me
If supervisors genuinely want stronger financial oversight, we need better transparency tooling.
Not more PDFs.
Not more static reports.
Structured exports.
Queryable data.
Signal visibility.
This platform isn’t about accusation.
It’s about structure.
When you can see patterns clearly, conversations change.
What’s Next
There are still schema improvements to make:
- Make invoice date nullable
- Refine automated repair logic for consistent export defects
- Track quarantined rows separately for review
- Cross-reference with other exports
But the core is now stable.
No more manual SQL wipes.
No more hand-editing CSV files.
No more one-off scripts.
Just upload → validate → stage → replace range → commit.
What Would Make This Beautiful
Right now, the system works — but it isn’t elegant.
Each update requires a formal records request.
Each month requires communication.
Each export must be manually delivered.
Over time, that becomes slow. And costly. And unnecessary.
What would make this truly powerful — truly beautiful — is simple:
A monthly machine-readable export of the general ledger.
If a CSV were published (or even made available through a predictable endpoint), I could:
- Automatically fetch the latest month
- Validate it
- Import it
- Recompute signals
- Update public-facing dashboards
All without human intervention.
That’s not about convenience.
That’s about transparency infrastructure.
When financial data is published in structured form on a regular cadence, oversight shifts from reactive to continuous. Instead of waiting for questions, the system itself surfaces trends — vendor concentration, unusual check sequences, timing shifts, fund movements.
Right now, the data exists.
It just isn’t operationalized for public analysis.
So for now, I file the request.
I receive the export.
I upload it.
I run the pipeline.
But the future version of this — the one aligned with real transparency — is automated, predictable, and routine.
Not because I need it.
Because a community that values oversight shouldn’t depend on manual extraction to see its own financial signals.
Closing Thought
This started with 0 MPG fuel reports.
It led to my first FOIA with the county.
It led to structured exports.
It led to deeper pattern recognition.
Oversight isn’t about suspicion.
It’s about clarity.
And clarity starts with clean data.
