Referential integrity
A GTFS feed is a small relational database with ~20 foreign-key relations. Editing it naively — drop a stop, rename a route — is a fast way to produce a feed that validates structurally but fails in Google Maps because stop_times now reference nothing. Referential integrity is the property that rescues you from that.
gapline enforces integrity on every write. No edit reaches disk until the integrity model has confirmed that the result is consistent.
The model
Internally, gapline maintains a reverse-index for every primary key used as a foreign key elsewhere. When you load a feed, the index maps:
- stops.stop_id → rows in stop_times, transfers, pathways that reference it
- routes.route_id → rows in trips, fare_rules that reference it
- trips.trip_id → rows in stop_times, frequencies
- calendar.service_id → rows in trips, calendar_dates
- …

Queries against the index are O(1) per PK, so building the cascade plan for a delete on a mid-sized feed takes milliseconds.
The index is deliberately plain hash maps: no graph library, no path-finding, no cycle detection. GTFS foreign-key chains are shallow (at most 2–3 hops), so this is enough.
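To make the reverse index concrete, here is a minimal Python sketch, assuming each file is loaded as a list of row dicts. FK_RELATIONS and build_reverse_index are hypothetical names that cover only a few of the relations listed above; this illustrates the idea of a hash-map reverse index, not gapline's actual implementation.

```python
from collections import defaultdict

# Hypothetical subset of GTFS foreign-key relations:
# (child_file, child_column, parent_file). Each parent file's primary key
# is implied (stop_id for stops.txt, trip_id for trips.txt, ...).
FK_RELATIONS = [
    ("stop_times.txt", "stop_id", "stops.txt"),
    ("transfers.txt", "from_stop_id", "stops.txt"),
    ("transfers.txt", "to_stop_id", "stops.txt"),
    ("trips.txt", "route_id", "routes.txt"),
    ("stop_times.txt", "trip_id", "trips.txt"),
    ("frequencies.txt", "trip_id", "trips.txt"),
]


def build_reverse_index(feed):
    """Map (parent_file, key) -> [(child_file, row_number, child_column), ...]."""
    index = defaultdict(list)
    for child_file, child_col, parent_file in FK_RELATIONS:
        for row_no, row in enumerate(feed.get(child_file, [])):
            value = row.get(child_col)
            if value:  # an empty optional FK field references nothing
                index[(parent_file, value)].append((child_file, row_no, child_col))
    return dict(index)  # plain dict, so lookups never insert empty entries


feed = {
    "stops.txt": [{"stop_id": "S01"}, {"stop_id": "S02"}],
    "stop_times.txt": [
        {"trip_id": "T1", "stop_id": "S01"},
        {"trip_id": "T1", "stop_id": "S02"},
    ],
    "transfers.txt": [{"from_stop_id": "S01", "to_stop_id": "S02"}],
}
index = build_reverse_index(feed)
# One hash lookup answers "which rows reference stop S01?"
print(index.get(("stops.txt", "S01"), []))
# [('stop_times.txt', 0, 'stop_id'), ('transfers.txt', 0, 'from_stop_id')]
```

Keeping the child column in each entry costs little and lets the same index drive both deletes (which only need the row) and PK rewrites (which need to know which field to change).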
Delete: automatic cascade preview
delete cannot orphan dependents. When you run:
gapline delete stops --where "stop_id=S01"

gapline:

- Computes the set of stop_id = S01 matches in stops.txt.
- For each match, walks the reverse index to find every row in every dependent file that references the match transitively.
- Prints a preview:

    Records to delete from stops.txt:
      S01
    Deleting would also delete:
    - 83 records in stop_times.txt
    - 2 records in transfers.txt
    Proceed with cascade delete? [y/N]

- Applies the plan only after you confirm (or if you passed --confirm).
There is no --cascade flag on delete because cascade is the only safe default. If the target has no dependents (for example, calendar_dates.txt is a leaf), the prompt simply lists the matched rows.
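The planning steps can be sketched as a breadth-first walk over a reverse index. This is a minimal Python sketch under the same row-dict assumption, not gapline's code: PRIMARY_KEYS, plan_cascade_delete and the sample feed are hypothetical, only a handful of relations are modeled, and the function computes the plan without modifying anything, leaving the actual deletion for after confirmation.

```python
from collections import Counter, defaultdict, deque

# Hypothetical FK subset: (child_file, child_column, parent_file).
FK_RELATIONS = [
    ("stop_times.txt", "stop_id", "stops.txt"),
    ("transfers.txt", "from_stop_id", "stops.txt"),
    ("transfers.txt", "to_stop_id", "stops.txt"),
    ("trips.txt", "route_id", "routes.txt"),
    ("stop_times.txt", "trip_id", "trips.txt"),
    ("frequencies.txt", "trip_id", "trips.txt"),
]
# Files whose primary key is referenced elsewhere; any other file is a leaf.
PRIMARY_KEYS = {"stops.txt": "stop_id", "routes.txt": "route_id", "trips.txt": "trip_id"}


def build_reverse_index(feed):
    index = defaultdict(list)
    for child_file, child_col, parent_file in FK_RELATIONS:
        for row_no, row in enumerate(feed.get(child_file, [])):
            if row.get(child_col):
                index[(parent_file, row[child_col])].append((child_file, row_no))
    return dict(index)


def plan_cascade_delete(feed, target_file, predicate):
    """Return (matched rows, Counter of dependent rows per file); changes nothing."""
    index = build_reverse_index(feed)
    matched = {(target_file, i) for i, row in enumerate(feed[target_file]) if predicate(row)}
    to_delete = set(matched)
    queue = deque(matched)
    while queue:  # breadth-first walk over the reverse index
        file, row_no = queue.popleft()
        pk_col = PRIMARY_KEYS.get(file)
        if pk_col is None:
            continue  # leaf file: nothing can reference this row
        for child in index.get((file, feed[file][row_no][pk_col]), []):
            if child not in to_delete:  # a row reached twice is counted once
                to_delete.add(child)
                queue.append(child)
    return matched, Counter(file for file, _ in to_delete - matched)


feed = {
    "routes.txt": [{"route_id": "R1"}],
    "trips.txt": [{"trip_id": "T1", "route_id": "R1"}, {"trip_id": "T2", "route_id": "R1"}],
    "stops.txt": [{"stop_id": "S01"}, {"stop_id": "S02"}],
    "stop_times.txt": [
        {"trip_id": "T1", "stop_id": "S01"},
        {"trip_id": "T1", "stop_id": "S02"},
        {"trip_id": "T2", "stop_id": "S01"},
    ],
    "transfers.txt": [{"from_stop_id": "S01", "to_stop_id": "S02"}],
}

matched, dependents = plan_cascade_delete(feed, "stops.txt", lambda r: r["stop_id"] == "S01")
print("Deleting would also delete:")
for file, count in sorted(dependents.items()):
    print(f"- {count} records in {file}")
# Deleting would also delete:
# - 2 records in stop_times.txt
# - 1 records in transfers.txt
```

Planning a delete of route R1 in the same feed reaches stop_times.txt through trips.txt, which is the transitive case; planning a delete from a leaf such as transfers.txt returns an empty dependent count, matching the leaf behavior described above.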
Update: PK rewrites need --cascade
A non-PK update (say, changing a stop_name) touches only the target file. No cascade is needed, no cascade is computed.
A PK update (say, renaming stop_id=S01 to stop_id=STOP_MAIN) is different: every row in every dependent file that references the old PK needs to be rewritten to reference the new one. --cascade opts into this rewrite:
gapline update stops \
  --where "stop_id=S01" \
  --set stop_id=STOP_MAIN \
  --cascade --confirm

Without --cascade, the PK rewrite is refused before it starts, since it would otherwise orphan every stop_times row that references S01.
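A PK rewrite can be sketched with the same kind of index: look up every reference to the old key, refuse unless the caller opted into the cascade, and only then rewrite the target row and its dependents. In this minimal Python sketch, rename_pk, IntegrityError and the cascade=True argument (standing in for --cascade) are hypothetical. The duplicate-key check is an assumption not stated above, and the index is rebuilt on each call for brevity.

```python
from collections import defaultdict

# Hypothetical FK subset: (child_file, child_column, parent_file).
FK_RELATIONS = [
    ("stop_times.txt", "stop_id", "stops.txt"),
    ("transfers.txt", "from_stop_id", "stops.txt"),
    ("transfers.txt", "to_stop_id", "stops.txt"),
]
PRIMARY_KEYS = {"stops.txt": "stop_id"}


class IntegrityError(Exception):
    pass


def build_reverse_index(feed):
    index = defaultdict(list)
    for child_file, child_col, parent_file in FK_RELATIONS:
        for row_no, row in enumerate(feed.get(child_file, [])):
            if row.get(child_col):
                index[(parent_file, row[child_col])].append((child_file, row_no, child_col))
    return dict(index)


def rename_pk(feed, file, old, new, cascade=False):
    """Rewrite primary key `old` to `new` and every FK that references it."""
    pk_col = PRIMARY_KEYS[file]
    # Every check runs before any row is touched, so a refusal leaves the feed intact.
    if not cascade:
        raise IntegrityError(f"rewriting {pk_col} requires cascade=True")
    if any(row[pk_col] == new for row in feed[file]):  # assumed extra check
        raise IntegrityError(f"{pk_col}={new} already exists in {file}")
    refs = build_reverse_index(feed).get((file, old), [])
    for row in feed[file]:
        if row[pk_col] == old:
            row[pk_col] = new
    for child_file, row_no, child_col in refs:
        feed[child_file][row_no][child_col] = new
    return len(refs)  # number of dependent fields rewritten


feed = {
    "stops.txt": [{"stop_id": "S01"}, {"stop_id": "S02"}],
    "stop_times.txt": [{"trip_id": "T1", "stop_id": "S01"}, {"trip_id": "T1", "stop_id": "S02"}],
    "transfers.txt": [{"from_stop_id": "S02", "to_stop_id": "S01"}],
}

try:
    rename_pk(feed, "stops.txt", "S01", "STOP_MAIN")
    refused = False
except IntegrityError as err:
    refused = True
    print("refused:", err)

rewritten = rename_pk(feed, "stops.txt", "S01", "STOP_MAIN", cascade=True)
print(rewritten)  # 2
```

Storing the child column in the index is what makes the rewrite precise: the transfer row above references S01 only through to_stop_id, so from_stop_id is left untouched.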
Create: FKs must resolve
create refuses to insert a record whose foreign-key fields do not point to existing rows. For example:
gapline create stop-times --set trip_id=UNKNOWN stop_id=S01 ...

fails immediately with an fk_violation error: trip_id=UNKNOWN is not in trips.txt.
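The create-time check reduces to a membership test per FK column, done before anything is appended. A minimal Python sketch, assuming the same row-dict representation; create_record, FkViolation and the exact error text are hypothetical, chosen only to echo the fk_violation error named above.

```python
# Hypothetical FK declarations: file -> [(column, parent_file, parent_column)].
FK_RELATIONS = {
    "stop_times.txt": [("trip_id", "trips.txt", "trip_id"), ("stop_id", "stops.txt", "stop_id")],
    "trips.txt": [("route_id", "routes.txt", "route_id")],
}


class FkViolation(Exception):
    pass


def create_record(feed, file, record):
    """Append `record` to `file` only if each non-empty FK value resolves."""
    for column, parent_file, parent_col in FK_RELATIONS.get(file, []):
        value = record.get(column)
        if not value:
            continue  # missing required fields are a separate check, not an FK one
        # A real tool would keep these key sets cached; rebuilt here for brevity.
        known = {row.get(parent_col) for row in feed.get(parent_file, [])}
        if value not in known:
            raise FkViolation(f"fk_violation: {column}={value} is not in {parent_file}")
    feed.setdefault(file, []).append(record)  # reached only if every FK resolved


feed = {"trips.txt": [{"trip_id": "T1"}], "stops.txt": [{"stop_id": "S01"}], "stop_times.txt": []}
create_record(feed, "stop_times.txt", {"trip_id": "T1", "stop_id": "S01"})

violation = None
try:
    create_record(feed, "stop_times.txt", {"trip_id": "UNKNOWN", "stop_id": "S01"})
except FkViolation as err:
    violation = str(err)
print(violation)  # fk_violation: trip_id=UNKNOWN is not in trips.txt
```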
Why it matters
A feed that passes structural validation but has orphaned references is the worst kind of broken: it looks fine in a quick check but fails silently in production. Consumers handle orphans inconsistently. Some skip the affected rows, some reject the feed entirely, and some render partial data without ever surfacing the error.
By enforcing integrity at write time, gapline makes this class of bug impossible to create through the CLI. The trade-off is that delete and update --cascade need to plan the full cascade before applying it — usually a few milliseconds, occasionally a few seconds on very large feeds.
See also
- gapline update: --cascade semantics.
- gapline delete: automatic cascade preview.
- gapline create: FK resolution on insert.
- Guides / Editing with CRUD: walkthrough with a real feed.