GTFS Today, and Tomorrow

Trillium started because of GTFS (the General Transit Feed Specification). Since then, a primary mission of this company has been to facilitate the development and adoption of open GTFS and other transportation data. Today, GTFS adoption has far exceeded critical mass. The Specification has transformed the transit software marketplace. In conversations with GTFS stakeholders, I hear variants on the question, “Where do we go from here?” I have attempted to articulate a vision that answers this question and contributes to the conversation.

First, a bit of background. In 2005, TriMet sent selected elements of its schedule database to Google, which then used that database schema as the basis for the Google Transit Feed Specification (GTFS). Today, GTFS datasets are publicly available for at least 1,000 transit agencies. The GTFS datasets that are public may just be the tip of the iceberg. Google shows about 6,200 cities around the world covered with transit information, though routing is not offered for all these cities, nor is it publicly known what data formats sit behind this transit coverage.

In 2010, the Google Transit Feed Specification was renamed the General Transit Feed Specification to accurately represent its use in many different applications outside of Google products. In 2011, a companion specification, GTFS-realtime, was introduced.

After its start with a single consuming application, GTFS is now used in hundreds of applications. As use of GTFS has spread and needs for transportation information have grown, the collaborative structure and tools around the specification have not kept pace with its phenomenal success. These are some of the needs that come up:

  1. A directory to help to disseminate, discuss, and annotate GTFS/-realtime datasets
  2. Best practices or style guide for GTFS/-realtime
  3. A neutral and inclusive home for GTFS & GTFS-realtime, including reference, documentation, training, and tools

A response to each of these needs is outlined below.

A directory to help to disseminate, discuss, and annotate GTFS datasets

3rd party application developers need tools to find GTFS datasets efficiently (say, based on region), and to determine if the dataset is available under a license that is compatible with their intended use. Making it easy to find GTFS and GTFS-realtime data is crucial to support abundance in the application market.
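To make the lookup described above concrete, here is a minimal sketch of what a directory query by region and license might look like. Everything in it is an assumption for illustration: the `FeedEntry` record, its field names, the license identifiers, the example agencies, and the URLs are hypothetical and do not describe any existing directory.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class FeedEntry:
    """One hypothetical directory record for a published GTFS feed."""
    agency: str
    url: str
    license_id: str  # illustrative identifier, e.g. "CC-BY-4.0"
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    has_realtime: bool = False

    def covers(self, lat: float, lon: float) -> bool:
        """True if the point falls inside this feed's bounding box."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def find_feeds(entries: Iterable[FeedEntry], lat: float, lon: float,
               allowed_licenses: Set[str]) -> List[FeedEntry]:
    """Return feeds that cover the point under an acceptable license."""
    return [e for e in entries
            if e.covers(lat, lon) and e.license_id in allowed_licenses]


# Made-up sample records; a real directory would be far larger.
DIRECTORY = [
    FeedEntry("Example Metro", "https://example.org/metro/gtfs.zip",
              "CC-BY-4.0", 45.3, -123.1, 45.7, -122.3, has_realtime=True),
    FeedEntry("Example Rural Transit", "https://example.org/rural/gtfs.zip",
              "custom-terms", 45.0, -123.5, 46.0, -122.0),
    FeedEntry("Example Coastal", "https://example.org/coastal/gtfs.zip",
              "CC0-1.0", 44.0, -124.5, 44.9, -123.8),
]

matches = find_feeds(DIRECTORY, 45.52, -122.68, {"CC-BY-4.0", "CC0-1.0"})
print([e.agency for e in matches])
```

In this sketch, two feeds cover the requested point, but only one is published under a license the developer has said they can accept. A real directory would also need richer license metadata than a single identifier.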

Also, a catalog of GTFS and GTFS-realtime data will make it easy to observe practices and support high-quality communication around Spec use. Early on in the life of GTFS, GTFS Data Exchange, developed and managed by volunteers, emerged as the world’s directory of transit data. Today, the site has become unreliable, in part because it depends on busy volunteers and lacks a dedicated funding stream.

There seem to be a few options emerging that might replace GTFS Data Exchange. One such tool is Transitfeeds, created by Quentin Zervaas. Its data inspection features are useful for quickly pointing to an element in a public GTFS dataset in conversation with other developers. The site also includes features for GTFS-realtime, and links to App Centers. Transitland, created by Mapzen, provides a feed registry, which is in its early stages. Its datastore API promises a powerful way of accessing parts of GTFS data feeds via machine-readable JSON.

A data directory is a crucial part of the GTFS ecosystem. We need a directory that is geographically comprehensive, covers both static and realtime data, and includes tools to quickly inspect and share data elements. GTFS Data Exchange shows us that volunteers make amazing contributions, but the community can only lean on their largesse for so long. It’s important for the GTFS community to have a site with institutional funding or a revenue model that can sustain operations over the long term.

Best practices guide for GTFS (and GTFS-realtime)

The Specification provides a lot of freedom: many fields are optional, and it often allows multiple approaches for describing a given transit service. Mere conformance with the Specification does not equate to good data and a high-quality experience in transit applications. A comprehensive best practices guide would help data publishers to improve the quality of their feeds and the outcomes in 3rd party consuming applications. Armed with best practices, transit agencies can be more confident in the data they publish, and hold their vendors accountable for producing quality data.

Google published the first such guide, Google Transit Data Provider Best Practices, which is organized around the particular needs of trip planning in Google Maps. Arrival estimation systems and timetable generation systems, for example, also consume GTFS and have particular needs for that GTFS data (e.g. vehicle blocks, trip directions). A central compendium of requirements and recommendations, both universal and application-specific, would be very useful for the feed publisher that is trying to publish versatile, useful data. A preliminary documentation effort for best practices has begun in a Google Doc (see GTFS-changes).
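As a rough illustration of the gap between conformance and quality, the sketch below measures how often a feed populates some of the optional fields in trips.txt, such as `block_id` and `direction_id`. The function name, the choice of fields to check, and the sample rows are my own assumptions, not part of any published best practices guide.

```python
import csv
import io

# Optional trips.txt fields that some consuming applications rely on.
# This particular selection is an assumption made for illustration.
OPTIONAL_TRIP_FIELDS = ("trip_headsign", "direction_id", "block_id", "shape_id")


def optional_field_coverage(trips_csv_text, fields=OPTIONAL_TRIP_FIELDS):
    """Return {field: fraction of trips with a non-empty value}.

    A column that is missing entirely counts as 0.0 coverage.
    """
    rows = list(csv.DictReader(io.StringIO(trips_csv_text)))
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if (row.get(field) or "").strip()) / len(rows)
        for field in fields
    }


# Made-up sample: valid against the Specification, but uneven in quality.
SAMPLE_TRIPS = """route_id,service_id,trip_id,trip_headsign,direction_id,block_id
10,WKDY,t1,Downtown,0,b1
10,WKDY,t2,Downtown,0,
10,WKDY,t3,,1,
10,WKDY,t4,Airport,1,b2
"""

coverage = optional_field_coverage(SAMPLE_TRIPS)
for field, share in coverage.items():
    print(f"{field}: {share:.0%} of trips populated")
```

A feed like this sample would pass validation, yet an arrival estimation system that depends on vehicle blocks would find `block_id` on only half the trips. A best practices guide could state which of these fields each class of application needs.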

Agreement on universal practices would help to eliminate an inefficient workaround that Trillium has observed: some agencies publish several variations of their GTFS datasets in order to satisfy the requirements of different systems.

A neutral and inclusive home for GTFS & GTFS-realtime, including reference, documentation, training, and tools

Currently, the GTFS reference, best practices, feed directory, and training materials are distributed across many websites. A certain amount of decentralization is inevitable and even desirable in a vibrant and diverse environment. However, the current highly fragmented information turns away confusion-averse newcomers. Consolidating resources in one online home, managed by the same organization that governs the Spec, will facilitate wider adoption and more rapid evolution of GTFS and GTFS-realtime.

This central home needs to be a neutral space that allows competing companies to collaborate on an equal level. Google has done a fantastic job as a manager and steward of GTFS from its beginning. Today, there is an abundance of other applications that use GTFS, and rival companies Apple, Microsoft, and Nokia also utilize GTFS. I have begun to wonder if Google as the home for official GTFS documentation poses an impediment to some other stakeholders feeling welcome and empowered in the group. It may be largely an issue of optics; nonetheless, those perceptions do matter.

What neutral group could offer a home for GTFS if not Google? Some options include the W3C, the International Organization for Standardization (ISO), the European Committee for Standardization (CEN), the Open Geospatial Consortium (OGC), or a different or entirely new organization. This route would shift GTFS away from Google’s concentrated (and benevolent) leadership, which opens the question of governance. We do not need a whole new governance approach, because the current one has worked well on the whole. Instead, I advocate amending it, not massively overhauling it, to preserve the practical elements that have made it work. The organizational structure must continue to allow major stakeholders (organizations and individuals actively working with GTFS) to maintain control over the Specification.

The community of people and organizations using GTFS and GTFS-realtime needs to come to agreement about an organizational structure to support these outlined needs, how to pay for and support its activities, and how to govern the specifications.

Aaron is the founding principal of Trillium Solutions, Inc. He brings experience that includes 12 years of web development and 8 years in public transportation, with knowledge of fixed-route transportation, paratransit, rural transportation, and active transportation modes. Aaron is a recognized expert in developing data standards, web application design, digital communications, and online marketing strategy. He originally developed Trillium’s GTFS Manager, and has played a key role in the development of the GTFS data specification since 2007.