Introducing GTFS Best Practices

Earlier this year, a group of 17 organizations pooled their separate experience, expertise, and interests in a collaborative project to develop and publish the GTFS Best Practices. The effort was convened by Rocky Mountain Institute, an independent non-profit organization working to transform global energy use. Before the Best Practices, we already had the GTFS reference, of course, but it did not fully answer many important questions about how to describe transit services and ensure compatibility with all GTFS-consuming applications. In all, there are nearly 80 numbered practice recommendations for cases that are not addressed or are ambiguous in the GTFS reference.

Why GTFS Best Practices?

A clear guide to GTFS Best Practices provides several benefits for all of us using GTFS data:

  • Transit agencies and their vendors can publish data more confidently and with less frustration. The improved clarity in GTFS Best Practices means there is less guesswork and experimentation involved to get the desired results in GTFS-consuming applications. Transit agencies can reference the GTFS reference and Best Practices in requests for proposals for GTFS-producing software to ensure data interoperability and better customer experiences.
  • Developers of GTFS-consuming software can reference the GTFS Best Practices to help GTFS publishers improve their data. We hope and believe that established, shared best practices will lead to more consistent datasets, which will enable applications to scale more quickly.
  • Everyone who uses transit can benefit from an abundant choice of transit applications and accurate, clear transit information. The introduction of GTFS Best Practices will allow validation software such as FeedValidator and gtfs-validator to check new aspects of data quality. Already, Mapzen has built quality checks into Transitland!
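
To give a sense of what such a data-quality check might look like, here is a minimal sketch in Python. It is not taken from FeedValidator, gtfs-validator, or Transitland. The function name and sample data are hypothetical, and the rule it checks (customer-facing names should use mixed case rather than ALL CAPS) is used here only as an illustration of the kind of recommendation a validator could test.

```python
import csv
import io


def find_all_caps_stop_names(stops_txt: str) -> list[str]:
    """Return the stop_id of every stop whose stop_name is entirely upper-case.

    Hypothetical helper for illustration; stops_txt is the text of a stops.txt file.
    """
    reader = csv.DictReader(io.StringIO(stops_txt))
    flagged = []
    for row in reader:
        name = (row.get("stop_name") or "").strip()
        # str.isupper() is True only when the string has at least one cased
        # character and every cased character is upper-case.
        if name.isupper():
            flagged.append(row["stop_id"])
    return flagged


# Made-up sample data, not from any real feed.
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
S1,MAIN ST & 3RD AVE,45.5231,-122.6765
S2,Main St & 5th Ave,45.5240,-122.6750
"""

print(find_all_caps_stop_names(SAMPLE_STOPS))
```

A real validator would read the files from the zipped feed and report many such findings together; this sketch only shows how one written recommendation can become an automated check.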

What’s covered and how are the GTFS Best Practices structured?

The Best Practices are organized by data file and by cases. The organization by file follows the Spec reference, with best practice recommendations tied to files and fields.

For example, here are some recommendations concerning shapes.txt.

Other parts of the Best Practices answer important questions for any transit provider who wants to publish high-quality GTFS that works with a variety of applications:

The GTFS Best Practices section that is organized by cases addresses some transit service features that need to be handled across various files and fields:

With the many types of applications that use GTFS data, we designated which practice recommendations are useful for certain categories of applications:

  • Trip planners
  • Arrival predictions: software to generate real-time arrival estimates.
  • Timetables: these practices support the creation of timetables based on GTFS.
  • Human readability: to unzip and examine GTFS files easily.

The process

The process of developing the GTFS Best Practices itself was valuable. With many of the principal players in the GTFS application landscape at the table, we were able to find and resolve disagreements and different expectations about GTFS practices, and to test out collaboration models to work on interoperable transit data formats. The working group followed this process to create the GTFS Best Practices:

  1. Collecting knowledge and needs: Participants brought issues they had encountered to the table and proposed how they might be addressed through practice recommendations.
  2. The first draft of GTFS Best Practices was prepared by a team of Trillium consultants.
  3. The first draft was circulated to collect (abundant) comments and changes.
  4. After some further refinements to the draft GTFS Best Practices, the group voted on all of the practice recommendations.
  5. We discussed “controversial” practices (around which there was no clear consensus) in person as well as on the #GTFS Slack channel to revise them so that more working group members would become amenable to the recommended practice. Practices around which no consensus developed were discarded.
  6. Finally, we published the GTFS Best Practices at gtfs.org/best-practices.

As part of this process, the group also proposed several changes to the GTFS reference itself (see closed and open pull requests). Some GTFS Best Practices are still making it into the GTFS reference.

We designed the GTFS Best Practices process around the needs of the working group. Later, we discovered that we had followed a process that aligned with aspects of the W3C process for Working Groups. That is encouraging, because it leads us to think we are on the right track towards developing a reusable launching pad to continue improving GTFS and working on new transportation data specifications.

What’s next?

We need a consortium to continue collaboration on, and development of, transportation data specifications. Ad-hoc collaboration between a handful of interested parties is not enough to keep up momentum, confidently make decisions, and facilitate broad industry support. We need a permanent venue and active facilitation to continue the work.

In December of 2015 (“GTFS Today and Tomorrow”), I wrote that we need a central home and neutral space that allows companies and organizations to collaborate on transportation data specifications. This might include facilitating data specification development and refinement, hosting reference and training materials, and providing a global directory of transit data. GTFS Best Practices represents a move towards that, and Mapzen’s participation in and support of the GTFS Best Practices effort made this vision more complete. (Mapzen is responsible for Transitland, a global directory and API for transit data.)

We need to continue to improve the GTFS and GTFS-realtime formats, and, further, to work on data formats for complementary non-transit modes.

  • GTFS needs a full overhaul of the fare model (see GTFS Fare Working Group). GTFS also needs to support integration of trip planners with ticketing and reservations systems.
  • GTFS-realtime also needs a clear and widely accepted set of best practices. Transit needs a “GTFS sometimes” for service changes that occur on a timeframe between what is typically put in a static GTFS feed and GTFS-realtime.
  • Demand-responsive transportation (DRT) needs data specifications to describe service availability and support interoperable applications for its use and management. GTFS-flex is emerging as a way of describing demand-responsive service availability. TCRP G-16 (Development of Transactional Data Standards for Demand-Responsive Transportation) aims to support linking DRT applications (I currently sit on an advisory panel for the project).
  • Other data standards need to be developed and improved: for rideshare/carpool, ridehail, carshare, bikeshare, vanpool, road incidents, and for recording ridership (GTFS-ride) and planned transportation network changes (proposal for a General Network Feed Specification). For an earlier discussion, see the “Multimodal transportation data formats (& gaps) roundup”, July 2016.

How do we keep up the momentum of GTFS Best Practices and move new fronts forward? Rocky Mountain Institute’s paper “A Consortium Approach to Transit Data Interoperability” lays out the case for a consortium or multiple consortia to manage the interlinked data specifications that support our multimodal transportation network. No single organization has the incentive, mandate, and resources to take on the task of developing, popularizing, and improving data specifications. Instead, we need a consortium structure that facilitates collaboration between interested players and encourages the continued adoption and improvement of data standards.

Aaron is the founding principal of Trillium Solutions, Inc. He brings experience that includes 12 years of web-development with 8 years in public transportation, with knowledge of fixed-route transportation, paratransit, rural transportation, and active transportation modes. Aaron is a recognized expert in developing data standards, web-application design, digital communications, and online marketing strategy. He originally developed Trillium’s GTFS Manager, and has played a key role in the development of the GTFS data specification since 2007.