GTFS feeds

A GTFS Schedule feed is a collection of CSV files, packaged together, that describes a transit network: routes, stops, schedules, fares, and shapes. gapline works with the 17 GTFS Schedule files listed below.

A feed can be passed either as a ZIP archive or as an unpacked directory. Both forms are accepted anywhere a feed is consumed:

gapline validate -f gtfs.zip
gapline validate -f ./gtfs/

The ZIP form is usually what agencies publish. The directory form is convenient when you are iterating locally: run unzip gtfs.zip -d ./gtfs/ once, then work on the files directly. gapline does not require the directory to be re-packed into a ZIP between iterations.

When writing back (via --output, update, delete, or a .gl save), the output format mirrors the input: read from a ZIP, write a ZIP; read from a directory, write a directory.
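The "read either form, write the same form" behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not gapline's implementation: `read_feed` and `write_feed` are hypothetical helpers, and only root-level `.txt` files are treated as part of the feed.

```python
import zipfile
from pathlib import Path


def read_feed(path):
    """Return ({file name: raw bytes}, is_zip) for a ZIP archive or a directory."""
    src = Path(path)
    if src.is_dir():
        files = {p.name: p.read_bytes() for p in sorted(src.glob("*.txt"))}
        return files, False
    with zipfile.ZipFile(src) as zf:
        # Only root-level .txt members are treated as feed files (assumption).
        files = {
            name: zf.read(name)
            for name in zf.namelist()
            if name.endswith(".txt") and "/" not in name
        }
    return files, True


def write_feed(files, path, as_zip):
    """Write the files back in the same form they were read from."""
    dst = Path(path)
    if as_zip:
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
            for name, data in files.items():
                zf.writestr(name, data)
    else:
        dst.mkdir(parents=True, exist_ok=True)
        for name, data in files.items():
            (dst / name).write_bytes(data)
```

Carrying the `is_zip` flag from the read step to the write step is what keeps the output form identical to the input form.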

Every file below lives at the root of the archive (or the directory). gapline parses all of them and enforces validation rules on all of them.

File                 Required by the spec?  CRUD target name
agency.txt           Required               agency
stops.txt            Required               stops
routes.txt           Required               routes
trips.txt            Required               trips
stop_times.txt       Required               stop-times / stop_times
calendar.txt         Conditionally          calendar
calendar_dates.txt   Conditionally          calendar-dates / calendar_dates
shapes.txt           Optional               shapes
frequencies.txt      Optional               frequencies
transfers.txt        Optional               transfers
pathways.txt         Optional               pathways
levels.txt           Optional               levels
feed_info.txt        Optional               feed-info / feed_info
fare_attributes.txt  Optional               fare-attributes / fare_attributes
fare_rules.txt       Optional               fare-rules / fare_rules
translations.txt     Optional               translations
attributions.txt     Optional               attributions

“Conditionally required” means at least one of calendar.txt or calendar_dates.txt must be present.
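As a rough illustration of the presence rules in the table, the following Python sketch reports which files a feed is missing. The function name `missing_required` and the message wording are assumptions made for this example, not gapline output.

```python
REQUIRED = ("agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt")
CALENDAR_FILES = ("calendar.txt", "calendar_dates.txt")


def missing_required(present):
    """Return a list of problems for the set of file names found in a feed."""
    present = set(present)
    problems = [
        f"missing required file: {name}" for name in REQUIRED if name not in present
    ]
    # Conditionally required: at least one of the two calendar files must exist.
    if not any(name in present for name in CALENDAR_FILES):
        problems.append("missing calendar.txt or calendar_dates.txt")
    return problems
```

A feed with the five required files plus only calendar_dates.txt yields an empty list, since either calendar file satisfies the condition.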

All 17 targets are supported by the CRUD commands (read, create, update, delete). Both the kebab-case and underscore spellings resolve to the same target.
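One way to picture how the two spellings collapse to a single target is the Python sketch below. `resolve_target` is a hypothetical helper written for this page; how gapline resolves names internally is not described here.

```python
TARGETS = {
    "agency", "stops", "routes", "trips", "stop_times", "calendar",
    "calendar_dates", "shapes", "frequencies", "transfers", "pathways",
    "levels", "feed_info", "fare_attributes", "fare_rules",
    "translations", "attributions",
}


def resolve_target(name):
    """Map a CRUD target name (kebab-case or underscore) to its GTFS file name."""
    canonical = name.replace("-", "_")
    if canonical not in TARGETS:
        raise ValueError(f"unknown target: {name}")
    return canonical + ".txt"
```

With this sketch, `resolve_target("stop-times")` and `resolve_target("stop_times")` both return `"stop_times.txt"`.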

  • File structure. Missing required files, empty files, or CSV rows that do not match their declared header are ERROR-level findings. Later checks cannot run meaningfully on a structurally broken feed, so fix these issues first.
  • Primary keys. A duplicate stop_id, route_id, or similar is always an error. gapline relies on PK uniqueness for the reverse-index used by referential integrity.
  • Foreign keys. stop_times.stop_id → stops.stop_id, trips.route_id → routes.route_id, and the other FK relations defined by the spec are enforced. Orphans are errors (except calendar_dates.service_id referencing a service that is not in calendar.txt, which is a warning — the spec allows date-only service definitions).
  • Required field values. Missing values in spec-required columns are errors.
  • Unknown columns. Extra columns not defined by the spec are informational findings (unknown_column), not errors. Agencies often ship proprietary extensions; gapline preserves them when writing back.
  • BOM. A UTF-8 byte-order mark at the start of a file is accepted silently. The parser strips it before CSV parsing.
  • Column order. The spec does not mandate a column order; gapline accepts any order as long as all required columns are present.
  • Whitespace in values. Leading/trailing whitespace inside a quoted value is preserved. Unquoted values are not trimmed.
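The primary-key, foreign-key, and BOM rules above can be approximated in a few lines of Python. This is a minimal sketch, not gapline's validator: `read_rows`, `duplicate_keys`, and `orphans` are hypothetical names, and the sketch assumes each file fits in memory.

```python
import csv
import io


def read_rows(raw):
    """Decode UTF-8 bytes (dropping a leading BOM, if any) and parse CSV rows."""
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


def duplicate_keys(rows, key):
    """Return primary-key values that appear more than once, in first-seen order."""
    seen, dupes = set(), []
    for row in rows:
        value = row[key]
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes


def orphans(child_rows, child_key, parent_rows, parent_key):
    """Return child values that have no matching parent row."""
    parents = {row[parent_key] for row in parent_rows}
    return [row[child_key] for row in child_rows if row[child_key] not in parents]


# Usage: stops.txt starts with a BOM and repeats S1; stop_times.txt references S9.
stops = read_rows(b"\xef\xbb\xbfstop_id,stop_name\nS1,Main\nS1,Main again\n")
stop_times = read_rows(b"trip_id,stop_id\nT1,S1\nT1,S9\n")
dupes = duplicate_keys(stops, "stop_id")
missing = orphans(stop_times, "stop_id", stops, "stop_id")
```

Here `dupes` is `["S1"]` and `missing` is `["S9"]`. Because the `utf-8-sig` codec drops the BOM, the first header is read as `stop_id` rather than a BOM-prefixed name. The set of parent keys plays the role of the index that referential-integrity checks rely on, which is why duplicate primary keys need to be caught first.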

GTFS is defined as UTF-8. gapline rejects non-UTF-8 files with a clear message (which file, which byte offset). If an agency ships Latin-1 data, transcode it before handing the feed to gapline:

iconv -f latin1 -t utf8 stops.txt > stops.utf8.txt
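To find the offending byte before transcoding, a check along these lines may help. It is a sketch in Python, not gapline's code: `check_utf8` and `latin1_to_utf8` are hypothetical helpers, and the message format is invented for the example.

```python
def check_utf8(name, raw):
    """Return None if raw is valid UTF-8, else a message with file and byte offset."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return f"{name}: invalid UTF-8 at byte offset {err.start}"
    return None


def latin1_to_utf8(raw):
    """Transcode Latin-1 bytes to UTF-8, mirroring the iconv call above."""
    return raw.decode("latin-1").encode("utf-8")
```

Note that every byte sequence decodes as Latin-1 without error, so `latin1_to_utf8` only gives correct text when the source encoding really is Latin-1.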
  • GTFS-Realtime. Not supported. GTFS-RT is a separate protobuf-based spec; only Schedule is covered.
  • GTFS-Flex and Fares v2. Not supported at this time.