Life and Growth of Transport Data Standards – Lessons from 12 Years of GTFS

Introduction

The General Transit Feed Specification (GTFS), which describes public transit routes, stops, schedules, and fares, recently celebrated its 12th birthday! Over the course of its life, the GTFS has shown the transportation industry how a data specification can gain widespread adoption to enable new applications.

Of course, there are many other important parts of the transportation network besides public transit. How can we nurture and accelerate emerging specifications so that they can fill in data gaps and perform a similarly transformative function as GTFS?

The Interoperable Transit Data Consortium, incubated by the Rocky Mountain Institute, is learning from the experience of GTFS and other data specs to create a reusable launch vehicle for filling gaps in the transportation network's data fabric.

To provide lessons for other specifications, here are some of the phases that GTFS has progressed (and is progressing) through on its way to maturity:

  1. Pre-specification – describing the use case
  2. Development of the specification
  3. Publish specification and provide training resources
  4. Development of validation tools
  5. Encouraging adoption through education and advocacy
  6. Experimentation and practice observation
  7. Ongoing specification development and governance

Phases

1. Pre-specification – describing the use case

First, what problem(s) need to be solved? And what outcomes will a data pipeline or application interface need to support? Before working on the General Transit Feed Specification (GTFS), Google and TriMet had a clear understanding of a real use case, and a concrete use case like this is necessary for developing any practical specification. The first use case for GTFS involved describing transit routes, stops, schedules, and fares so that Google Maps could both plan trips and provide the information travelers need to identify transit stops, routes, and vehicles.

In the first phase, only one organization needs to be involved. In the case of GTFS, Google was the single organization that developed the first use case. But the specification was designed to be extensible to serve other applications, showing a lot of generosity and strategic forethought on the part of the Google Transit team. This sets GTFS apart from data specifications created by a single organization that did not anticipate varied use cases and stakeholder organizations. Some of those specifications have been labeled “standards” through adoption by standards bodies, but specifications that stay closely tied to a single project or company have not spurred industry transformations in the way widely adopted specifications have.

If many organizations have varied use cases, they should all be involved. But if the collaboration involves too many parties, or if the parties are uninterested in standardizing common practice, the result is inertia and decision-making friction. A specification designed to be extensible can grow to accommodate additional use cases later.

2. Development of the specification

Based on the application use case(s), the data specification needs to be developed, taking into account technical considerations:

  1. Transmittal performance, security, and data privacy
  2. Technical choices that will make the data easy to check and validate
  3. Extensibility to support extra-spec experimentation
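To make the extensibility point concrete, here is a minimal Python sketch of a consumer that tolerates columns it does not recognize. It assumes a convention in which consumers set aside unknown fields rather than rejecting the feed. The function name read_stops and the extension column x_shelter_type are hypothetical, and KNOWN_STOP_FIELDS lists only a subset of the stops.txt fields in the GTFS reference.

```python
import csv
import io

# A subset of the fields defined for stops.txt in the GTFS reference.
KNOWN_STOP_FIELDS = {
    "stop_id", "stop_code", "stop_name", "stop_desc", "stop_lat", "stop_lon",
    "zone_id", "stop_url", "location_type", "parent_station",
    "stop_timezone", "wheelchair_boarding",
}


def read_stops(text):
    """Parse stops.txt content, separating core fields from extension columns.

    Hypothetical helper: unrecognized columns are kept in an 'extensions'
    dict instead of raising an error, so producers can experiment with new
    fields without breaking this consumer.
    """
    stops = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        core = {k: v for k, v in row.items() if k in KNOWN_STOP_FIELDS}
        extensions = {k: v for k, v in row.items() if k not in KNOWN_STOP_FIELDS}
        stops.append({"core": core, "extensions": extensions})
    return stops


# x_shelter_type is a made-up extension column used for illustration.
sample = (
    "stop_id,stop_name,stop_lat,stop_lon,x_shelter_type\n"
    "S1,Main St,45.5231,-122.6765,covered\n"
)
stops = read_stops(sample)
print(stops[0]["extensions"])  # {'x_shelter_type': 'covered'}
```

Keeping unknown fields instead of discarding them lets a consumer observe how producers are experimenting, which is the raw material for later folding popular extensions back into the core specification.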

It is important for software development to occur alongside the development of a specification, to make sure that the specification is practical to implement. This is also an opportunity to catch needs that were not collected in the initial description of use cases. We see many examples of this process in action:

  • The W3C process requires reference implementations before a specification is adopted as a final Recommendation (in W3C parlance, a “Recommendation” is a finalized, released specification). Prior to the Recommendation phase, a W3C working group issues a “Call for Implementations.”
  • GTFS was developed by Google, based on real data provided by TriMet. The GTFS data was immediately put into use. By contrast, the Transit Communications Interface Profiles (TCIP) process did not require working reference implementations. As a result, this standard proved impractical and is barely used.

3. Publish specification and provide training resources

A common data specification needs to be published at a centralized location. Ideally, the site with the data spec reference should also offer links to an introductory guide, training materials, example data, validation tools (see below), data directories, governance process, and applications and organizations that use the data format. If the format is intended to be used internationally, then it is useful to provide language translations. The “source documents” for a specification reference should also be available in a shared version-controlled repository space such as a GitHub repo so that users can propose and discuss modifications.

When GTFS was released it was called the “Google Transit Feed Specification” and hosted on a Google site for developers (see Spec as published in 2006). The Google brand, as well as the carrot of a free trip planner (Google Maps), helped the specification achieve broad reach. The specification was released under a Creative Commons license and designed to be easily extensible to serve other use cases and applications. In 2009, some Google-specific language was removed and the data spec was renamed the “General Transit Feed Specification” to acknowledge its wider use beyond Google.

Today, the GTFS reference continues to be hosted on the Google Developer site. Some organizations have reported that the lack of neutrally branded, shared territory creates a barrier to their full participation in the Specification community. To remove this barrier, the Interoperable Transit Data Consortium is beginning to move essential documentation and resources to GTFS.org.

Google’s GTFS documentation pages launched with a validator, an example feed, and an overview. Over time, relevant resources have become more abundant and fragmented. GTFS.org aims to once again offer a more centralized information resource for people using the Specification.

The GTFS (static) and GTFS-realtime reference documentation is also available on GitHub, which makes it possible for people to branch the reference documentation to propose and discuss changes.

4. Development of validation tools

Validation tools help to enforce and guide conformance to the Specification. Validation tools that are current with revisions to a core specification and recommended practices will help to promote consistent practices and make specification users’ lives easier.

GTFS has had a validator, feed_validator.py, since 2007, almost from its beginning. Today there are other GTFS validators, including Conveyal’s gtfs-validator (based on the onebusaway-GTFS library) and Conveyal’s gtfs-lib.
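The kind of check a validator performs can be sketched in a few lines of Python. This is not how feed_validator.py or the Conveyal tools are implemented: validate_stops is a hypothetical function, and as a simplification it treats stop_id, stop_name, stop_lat, and stop_lon as always required (the current GTFS reference makes some of these conditional on location_type).

```python
import csv
import io

# Simplifying assumption: treat these columns as unconditionally required.
REQUIRED_STOP_FIELDS = ("stop_id", "stop_name", "stop_lat", "stop_lon")


def validate_stops(text):
    """Return a list of human-readable problems found in stops.txt content."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [f for f in REQUIRED_STOP_FIELDS if f not in header]
    if missing:
        # Without the required columns, row-level checks are meaningless.
        return [f"missing required column(s): {', '.join(missing)}"]

    seen_ids = set()
    # Data rows start on line 2 because line 1 is the header
    # (this assumes no quoted fields span multiple lines).
    for line_no, row in enumerate(reader, start=2):
        stop_id = (row["stop_id"] or "").strip()
        if not stop_id:
            problems.append(f"line {line_no}: empty stop_id")
        elif stop_id in seen_ids:
            problems.append(f"line {line_no}: duplicate stop_id {stop_id!r}")
        seen_ids.add(stop_id)
        # Coordinates must parse as numbers and fall within valid ranges.
        for field, limit in (("stop_lat", 90.0), ("stop_lon", 180.0)):
            try:
                value = float(row[field])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {field} is not a number")
                continue
            if not -limit <= value <= limit:
                problems.append(f"line {line_no}: {field} {value} out of range")
    return problems


feed = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Main St,45.5231,-122.6765\n"
    "S1,Oak St,95.0,-122.6\n"
    "S3,Elm St,abc,-122.6\n"
)
for problem in validate_stops(feed):
    print(problem)
```

Reporting every problem with its line number, rather than stopping at the first error, gives a data producer a complete to-do list from a single run.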

5. Encouraging adoption through education and advocacy 

The best way to promote adoption of a specification is through compelling applications: “If you want x, then provide data in this format.” It is therefore useful to lead with compelling use cases and applications, and then to make sure that the specification is easy to find and use (see “Publish specification and provide training resources”).

GTFS adoption was originally spurred by a very compelling and free GTFS-consuming application, Google Maps. From the beginning of GTFS, there was a concerted effort to highlight GTFS applications and encourage agencies to publish GTFS, which continues today.

In 2008, Joe Hughes (at the time an engineer at Google, now at Citymapper) presented the case for data sharing by inventorying GTFS-consuming applications and projects (see presentation “better faster cheaper”). Joe shared notes on emerging GTFS applications on his Headway blog. In 2009, the non-profit Front Seat, with a group of partners, built City-Go-Round to help transit agencies and end-users see the value of GTFS data (data = useful applications) at a time when there was active resistance to the idea of open transit data sharing. Today, the most complete and current inventory of GTFS-consuming applications is on TransitWiki.

TriMet was an active evangelist in the open transit data movement, with champions sharing their thinking and experience publishing GTFS data (see Tim McHugh’s presentation “Leveraging Resources for Customer Information by Exposing Transit Data” and Bibiana McHugh’s “Open Data and Open Source Software”). TriMet shared their experience and practices broadly, including in an interview I conducted with them in 2008 (“Open source and open data make for transit innovation”). At this phase, voices beyond Google’s were crucial to advocate for adoption of GTFS. Industry adoption of GTFS was accelerated by a major global corporation that offered a compelling use case (Google), combined with the supportive voices of forward-thinking transit agencies, hobbyists, and independent professionals.

6. Experimentation and practice observation

Once a specification is in widespread use, a body of practice will form along with an accumulation of data. Systematically monitoring data practices and actively facilitating collaboration opportunities will support a specification’s continued relevance and success.

GTFS was released in 2006 (see original spec) as the Google Transit Feed Specification, without transfers.txt, frequencies.txt, parent stations, and many other fields. Over about two years, from 2006 to 2008 (see spec revision history), GTFS was rapidly amended: new fields were added, some required fields were made optional, and some problematic aspects were removed (such as the stop address fields, which had served as an alternative to specifying latitude and longitude).
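One simple form of systematic practice monitoring is tallying which optional files appear across a collection of feeds, for example to see how widely transfers.txt or frequencies.txt has been adopted. The Python sketch below assumes feeds are available as zip archives with their files at the archive root; tally_optional_files, make_feed, and the choice of tracked files are illustrative and not part of any existing tool.

```python
import io
import zipfile
from collections import Counter

# Optional files whose adoption we want to track (illustrative selection).
TRACKED_FILES = ("transfers.txt", "frequencies.txt", "shapes.txt", "feed_info.txt")


def tally_optional_files(feed_archives):
    """Count how many feeds include each tracked file.

    feed_archives is an iterable of GTFS zip archives given as bytes.
    Assumes files sit at the archive root, not inside a subfolder.
    """
    counts = Counter()
    for data in feed_archives:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
        for filename in TRACKED_FILES:
            if filename in names:
                counts[filename] += 1
    return counts


def make_feed(filenames):
    """Build a tiny in-memory zip of empty files, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in filenames:
            archive.writestr(name, "")
    return buffer.getvalue()


feeds = [
    make_feed(["stops.txt", "transfers.txt"]),
    make_feed(["stops.txt", "frequencies.txt", "transfers.txt"]),
]
counts = tally_optional_files(feeds)
print(counts["transfers.txt"], counts["frequencies.txt"], counts["shapes.txt"])
```

Run periodically over a directory of published feeds, a tally like this shows which optional parts of the spec are actually used, and the same approach extends to counting unrecognized columns as a signal of emerging extensions.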

Over the life of GTFS, the community of users has used a variety of methods to track who is publishing data, how the Spec is being interpreted and extended, and what new needs and opportunities are emerging. Among these methods are:

7. Ongoing specification development and governance

Successful specifications need a process for adapting to new use cases and changing industry needs. The community of users needs a way to propose and discuss changes. This community also needs a clear, trusted process to agree on the changes to be implemented so that common interests will continue to be served.

Today’s change guidelines for GTFS are very similar to the original guidelines. A crucial feature is that the practicality of any future change to GTFS must be demonstrated by a working application implementation before it can be accepted into the Specification. Change proposals that are led by strong champions with active technical leadership are most likely to be successful. These change proposals are most often led by someone with a strong stake in the outcome (i.e., they represent the interests of a GTFS producer or consumer).

GTFS has benefited from the involvement of large and small organizations, as well as hobbyists. Hobbyists have built GTFS Data Exchange and TransitFeeds. Hobbyists are also somewhat more likely to submit spec pull requests (proposed amendments), since it’s easier for them to sign Google’s Contributor License Agreement. But there have been times when the two groups clash: for example, passionate maintainers of a single feed have objected to changes that would be beneficial on the whole but don’t fit their one feed. In most cases, the rule requiring supporting consuming and producing applications has served to maintain a practical approach and guide amendments to the spec.

Members of the GTFS Best Practices Working Group had the insight that many important discussions and useful proposals have languished on gtfs-changes for years because it is difficult to get the attention of all the necessary decision-makers at the same time. We theorized that organizing once- or twice-per-year sprints would help focus the attention of key individuals and lead to more decisions.

Observations from the GTFS experience

With 12 years of GTFS history, we can draw lessons about what works and what does not, both for GTFS and for other data specifications. [See: Multimodal transportation data formats (& gaps) roundup, December 2017]

Tracing back over the life of GTFS, we can see the milestones and phases through which it has progressed on the way to widespread adoption. Other specifications will need to pass through many of the same milestones. Here are some top lessons from the GTFS experience:

  1. A good data specification will not become widely-adopted on its own, just because it works. An active advocacy and education effort is necessary to encourage and support adoption with practitioners and policy-makers.
  2. To enforce practicality in specifications, require reference implementations.
  3. To allow a data specification to evolve, design for extensibility and then track how users adapt the specification, bringing lessons back into the core (shared) specification.
  4. Strong facilitation and focal points for collaboration are necessary throughout the life of the data specification. This can come from many different angles, but a purely ad-hoc approach leads to slow change and isolated islands, rather than shared understanding, vision, and collaborative spaces.
  5. Validation tools are essential to enforce data quality.

There are many needs for interoperable transit data standards: to build better apps and tools that help travelers use the transportation network, and to help our cities make planning decisions. Accelerating the pace of data specification development and adoption is key. An Interoperable Transit Data Consortium can accelerate the adoption of new transportation data specifications by developing and operating processes, tools, and programs informed by the lessons of GTFS and other successful data specifications.