Open data

Does any law require D.C. agencies to open their data?

Not yet. Many cities, including New York City and San Francisco, have such laws. Federal agencies also took steps to open data under Office of Management and Budget direction during the Obama administration. Open federal data is now a legal mandate under the Open, Public, Electronic, and Necessary (OPEN) Government Data Act, included as Title II of the Foundations for Evidence-Based Policymaking Act (H.R. 4174), which took effect in January 2020.

Assuring openness by law in the District has been under discussion among officials and advocates. The D.C. Open Government Coalition worked with others to develop open data provisions included in the Strengthening Government Transparency Amendment Act introduced in the D.C. Council in March 2017 (Bill B22-0188) but never considered.

Meanwhile, D.C. government data are open as a matter of policy, set in two directives of the D.C. mayor.

So what is the open data policy of the D.C. government?

Mayor Muriel Bowser issued Order 2017-115 in April 2017, for the first time declaring that:

“the greatest value from the District’s investment in data can only be realized when enterprise datasets are freely shared among District agencies, with federal and regional governments, and with the public to the fullest extent consistent with safety, privacy, and security. ‘Shared’ means that enterprise datasets shall be:

1. Open by default, meaning their existence will be publicly acknowledged, and further, if enterprise datasets are not shared, an explanation for restricting access will be publicly provided;
2. Published online and made available to all at no cost;
3. Discoverable and accessible;
4. Documented;
5. As complete as can be shared;
6. Timely;
7. Unencumbered by license restrictions; and
8. Available in common, non-proprietary, machine-readable formats that promote analysis and reuse.”

What progress has been made locating and opening data?

Covered D.C. government agencies must inventory their data and make much of it public. The citywide list is to be updated annually and published in March to coincide with Sunshine Week. The annual report of the Chief Data Officer, also issued in March as directed by Mayor’s Order 2018-050, gives further details. The Draft Technology Strategic Plan for DC: Unleashing the Possible, issued in November 2019, also discusses the role of data, chiefly in terms of “mission use cases,” or how agencies can use big data and artificial intelligence to support their own work. We return to that theme at the end of this page: the lack of data and evaluation shedding light on service to public users.

The most recent annual report, issued in March 2020, says that as of December 10, 2019, 82 D.C. government agencies recorded 1,915 enterprise datasets (a net increase of 136, with some datasets retired). Education, transportation and campaign finance agencies added the most datasets. The 82 agencies (seven more than a year earlier) include almost all mayoral agencies but only 16 of over 60 independent agencies. (The nonresponding agencies are not identified.) Unlike mayoral agencies, independent agencies governed by separate boards and commissions, such as housing and transit authorities, are not bound by mayoral orders but were “strongly encouraged to voluntarily comply.”

Details on all 1,915 datasets are available in various formats.

To decide what to make public, agencies must classify each dataset by sensitivity, using a five-level scale from zero (least sensitive) to four (most sensitive). Agencies reported holding 859 datasets (up from 794 last year) assessed as Level 0. Such datasets are considered “open,” meaning they can be proactively released to the public. (The report gives the Level 0 total as 859 on p. 10 but as 873 on p. 12. We have asked the agency to clarify.)

Release is still a work in progress: 231, or over a quarter, are not yet posted. The 628 datasets now released (51 more than the 577 released at this time last year) can be accessed at the D.C. Open Data Portal.
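The release arithmetic can be checked with a few lines of Python. This sketch uses only the figures quoted on this page. Because the report gives two different Level 0 totals, it runs both. Either way, more than a quarter of the non-sensitive datasets remain unposted.

```python
def release_gap(level0_total: int, released: int) -> tuple[int, float]:
    """Return the number and share of Level 0 datasets not yet released."""
    unreleased = level0_total - released
    return unreleased, unreleased / level0_total


RELEASED = 628            # Level 0 datasets on the portal as of Dec. 10, 2019
RELEASED_LAST_YEAR = 577  # released at the same point a year earlier

# The report gives two Level 0 totals: 859 (p. 10) and 873 (p. 12).
for total in (859, 873):
    gap, share = release_gap(total, RELEASED)
    print(f"Level 0 total {total}: {gap} unreleased ({share:.1%})")

print(f"Newly released during the year: {RELEASED - RELEASED_LAST_YEAR}")
```

Run as written, it reports 231 unreleased (26.9%) for the 859 figure, 245 unreleased (28.1%) for the 873 figure, and 51 datasets newly released during the year.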

The central staff responsible for the open data program maintains a GitHub site for discussing requests and problems.

What non-sensitive data are not yet public?

For the 231 Level 0 datasets not yet released (27 percent of the 859), all non-sensitive by definition, this annual report, like its predecessor, offers no explanation. It again promises that the Office of the Chief Technology Officer (OCTO) will continue to work with agencies, the Open Government Advisory Group (a mayorally appointed panel made up chiefly of agency representatives, which stalled for a year with unfilled positions but was revived with new appointees early in 2020), and the community “to prioritize posting these remaining Level 0 datasets.”

Unreleased datasets are listed on the open data website, but the last-update dates shown on the landing page suggest the data are stale (not updated since last year). The 2020 report says nothing about the results of a review, planned for 2019 by the Office of the City Administrator, of agencies’ shortfall in publishing Level 0 datasets.

And what data are too sensitive to release?

Agencies classified the remaining 1,056 datasets as too sensitive to be released. They fall into four levels of increasing concern:

Sensitivity level	Definition	Number of datasets (as of December 10, 2019)
1	Public but not proactively released because of concerns over safety, privacy, security or legal issues	179
2	Not highly sensitive but subject to a FOIA exemption and nonpublic, for internal government use only	219
3	Confidential, protected by law — especially education and health records	562
4	Restricted confidential, release could cause significant damage or injury to persons or impair agency ability to do its work	97

Source: D.C. Chief Data Officer 2020 Annual Report.
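Adding up the table is a useful cross-check against the totals cited in the text. The Python sketch below uses only the counts quoted on this page. It shows that Levels 1 through 4 sum to 1,057. That is one more than the 1,056 obtained by subtracting the 859 Level 0 datasets from the 1,915-dataset inventory. The report, as quoted here, does not explain the difference.

```python
# Counts by sensitivity level as of December 10, 2019, as quoted in the table above.
restricted_counts = {
    1: 179,  # public but not proactively released
    2: 219,  # subject to a FOIA exemption, internal use only
    3: 562,  # confidential, protected by law
    4: 97,   # restricted confidential
}
TOTAL_INVENTORY = 1_915  # all enterprise datasets reported
LEVEL0_REPORTED = 859    # Level 0 total given on p. 10 of the report

restricted_sum = sum(restricted_counts.values())
implied_restricted = TOTAL_INVENTORY - LEVEL0_REPORTED

print(f"Levels 1-4 as tabulated: {restricted_sum}")          # 1057
print(f"Inventory minus Level 0: {implied_restricted}")      # 1056
print(f"Difference: {restricted_sum - implied_restricted}")  # 1
```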

Agencies decide the classifications, but the extent to which those decisions are reviewed remains obscure. A handbook gives a blizzard of detail on data elements but very limited elaboration on the four sensitivity categories. The annual report does not discuss accountability or quality control. As with any secrecy scheme, over-classification by cautious officials is a constant threat.

Agencies’ hesitation may also stem from the D.C. Auditor’s 2017 finding, following major security incidents and a review of a few agencies’ policies and procedures, that personally identifiable information needed far better protection.

The report offers no further information on whether the Chief Data Officer reviews agency classification decisions for validity and reliability: that is, whether the assigned sensitivity levels accurately reflect the datasets and correctly apply the level definitions.

The 2019 report offered evidence of misclassification drawn from cases in which agencies that denied FOIA requests for nonpublic datasets were eventually reversed on appeal. A brief passage in the March 2020 report (p. 17) refers to ten examples of misclassification reported last year. The OCTO authors reviewed another year of FOIA requests and found that “of the datasets described in last year’s report, we didn’t make much progress.” Does that mean errors continued to surface as FOIA requests succeeded? And were there successful FOIA requests (or appeals) for other unreleased datasets beyond the ten already known?

This brief passage suggests that the validity of the system’s key feature, its sensitivity ratings, remains legally questionable. The new report lacks detail on the scope of this fundamental problem and on any plans for corrective action.

FOIA data showing agency error matter because they allow scarce corrective resources to be targeted at the agencies that most often get dataset classification wrong. For example, based on its own FOIA experience with non-data requests, the Coalition believes the Metropolitan Police Department, more than other agencies, incorrectly withholds all types of records on the basis of an overbroad definition of invasion of privacy.

If data have been overlooked or incorrectly classified, what can the public do?

The mayor’s 2017 order warns that it creates no new rights and that the order may not be enforced in court.

It provides no administrative procedure (within the government) for users to appeal decisions affecting open data.

It does say the Chief Data Officer is responsible for “receiving and responding to public input regarding the District’s data policy and activities.” But agency data officials responsible for their own inventory and classification have no assigned responsibility to the public. Accountability is only through the normal chain of command from elected officials to appointed executives and career staff.

The D.C. Open Government Coalition would like to hear from data users who find problems with the inventory or classification, or other issues in accessing datasets, and have not been able to resolve them with government officials.

For example, users seeking details on exactly what a dataset contains have told the Coalition that D.C. agencies often cannot provide a data dictionary when asked under FOIA. Often the dictionary is not being withheld; it simply does not exist.

The government’s DC Open Data Handbook (relatively new guidance to agencies from the central data office) states, “Each dataset should include a data dictionary as a separate document. The data dictionary lists the table structure, with each column defined in easy to understand terms. This includes providing the values and descriptors for any domains.”
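The handbook’s description of a data dictionary maps naturally onto a simple data structure. The Python sketch below is illustrative only. The class, the helper functions, and the vacant-property columns and codes are all hypothetical, not taken from any D.C. dataset or from the handbook itself. The structure holds one entry per column, with a plain-language definition and, for coded columns, a domain of values and their descriptors. The sketch also includes two checks a data user might run: which columns lack an entry, and which codes appear in the data without a descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnEntry:
    """One data dictionary entry: a column, its type and a plain-language definition."""
    name: str
    data_type: str
    definition: str
    # For coded ("domain") columns: each allowed value mapped to its descriptor.
    domain: dict[str, str] = field(default_factory=dict)


def undocumented_columns(columns: list[str], dictionary: list[ColumnEntry]) -> list[str]:
    """Return the dataset columns that have no dictionary entry."""
    documented = {entry.name for entry in dictionary}
    return [col for col in columns if col not in documented]


def undescribed_codes(entry: ColumnEntry, observed_values: list[str]) -> set[str]:
    """Return values found in a coded column that the dictionary does not describe."""
    return set(observed_values) - set(entry.domain)


# Hypothetical dictionary for a hypothetical vacant-property dataset.
status = ColumnEntry(
    name="STATUS",
    data_type="text",
    definition="Outcome of the most recent property inspection",
    domain={"V": "Vacant", "B": "Blighted", "O": "Occupied"},
)
dictionary = [
    ColumnEntry("PROPERTY_ID", "text", "Identifier for the inspected property"),
    status,
    ColumnEntry("DETERMINATION_DATE", "date", "Date the status was determined"),
]

print(undocumented_columns(["PROPERTY_ID", "STATUS", "DETERMINATION_DATE", "WARD"], dictionary))  # ['WARD']
print(undescribed_codes(status, ["V", "B", "X"]))  # {'X'}
```

Keeping the domain as a mapping from code to descriptor mirrors the handbook’s call for “the values and descriptors for any domains.” It also makes gaps in the documentation easy to detect mechanically.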

According to information technology experts, non-technical documentation of computer data is generally poor, and data dictionaries in particular are rare. When they do exist, they are often neither up to date nor widely accessible. If you have asked for a dictionary of the data in a D.C. agency dataset and encountered problems, please let us know at info@dcogc.org.

Can I still use D.C. open records law to get D.C. data?

Yes. Data have always been considered electronic records available by request under the D.C. Freedom of Information Act. A dataset request is therefore subject to the act’s deadlines, exemptions, fees, appeal process and ultimately the right to sue to challenge agency errors. The open data policy does not change that.

The mayor’s 2017 order recognized a useful interplay between FOIA and the new system of proactive data publication being established:

“On the one hand, FOIA request-tracking data should inform public bodies about the demand for and priority of publishing certain datasets or derivatives of those datasets as Level 0, Open. Similarly, successful appeals for datasets previously denied under FOIA exemptions can inform public bodies about potential errors in dataset classification. On the other hand, publication of FOIA request-tracking data can help residents hold public bodies accountable for the timely and consistent processing of requests.”

For data not published on the portal, the order thus recognized that a FOIA request is in essence a test of the mayor’s classification system, which has now placed over 1,000 datasets off-limits in Levels 1-4.
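The order’s reasoning can be expressed as a simple cross-check of FOIA outcomes against the dataset inventory. The Python sketch below is a hypothetical illustration. The record fields, the function name, and the assumption that each request can be matched to an inventory dataset ID are ours, not features of the D.C. FOIA portal or the open data program. The sketch produces three results:

- a count of requests per dataset, as a signal of demand;
- a per-agency tally of Level 1-4 datasets that were nonetheless released under FOIA (possible misclassification);
- a list of requested datasets missing from the inventory altogether.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FoiaOutcome:
    """One FOIA request for a dataset and its final result (hypothetical fields)."""
    agency: str
    dataset_id: str
    released: bool  # True if any part was released, including after appeal


def review_classifications(inventory: dict[str, int], outcomes: list[FoiaOutcome]):
    """Compare FOIA results with the sensitivity levels in the inventory.

    inventory maps dataset_id -> sensitivity level (0 to 4).
    Returns (demand, suspect, overlooked):
      demand     - FOIA requests per dataset, a signal of publication priority
      suspect    - per agency, distinct Level 1-4 datasets released anyway
      overlooked - requested dataset IDs missing from the inventory entirely
    """
    demand = Counter(o.dataset_id for o in outcomes)
    suspect_pairs: set[tuple[str, str]] = set()
    overlooked: set[str] = set()
    for o in outcomes:
        level = inventory.get(o.dataset_id)
        if level is None:
            overlooked.add(o.dataset_id)
        elif level > 0 and o.released:
            # A restricted dataset that FOIA forced open is a candidate for reclassification.
            suspect_pairs.add((o.agency, o.dataset_id))
    suspect = Counter(agency for agency, _ in suspect_pairs)
    return demand, suspect, overlooked
```

Calling `suspect.most_common()` on the second result ranks agencies by how many restricted datasets were later released. That is the kind of targeting of corrective effort discussed above.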

Last year’s report noted ten examples in 2018 in which data had to be released under D.C. FOIA after requesters could not find them on the opendata.dc.gov portal. These included data on police arrests, dockless bikes and scooters, surveys of charter school facilities, health inspections, health professional licenses, moving violations by city vehicles, vacant property determinations, and more. As discussed earlier, the 2020 report says another year’s review of FOIA requests and appeals showed little progress in curing the problem.

Analysis of FOIA requests is hindered because, six years after the public online request portal launched, it is still not fully used. More agencies use it (64 in 2019, up from 55), but only 80 percent of the 10,836 FOIA requests in fiscal year 2019 were captured in the system database, a small uptick from 76 percent in 2018.
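A quick calculation puts the capture rate in absolute terms. It works out to about 8,669 requests in the database and about 2,167 outside it. Because the 80 percent figure is itself rounded, treat both numbers as approximate. The shortfall appears broadly in line with the report’s reference to over 2,000 inconvenienced requesters, although requests and requesters are not the same count.

```python
TOTAL_REQUESTS_FY2019 = 10_836  # FOIA requests in fiscal year 2019
CAPTURED_PERCENT = 80           # share captured in the portal database (rounded)

captured = round(TOTAL_REQUESTS_FY2019 * CAPTURED_PERCENT / 100)
outside_portal = TOTAL_REQUESTS_FY2019 - captured

print(f"Captured in portal database: about {captured:,}")  # about 8,669
print(f"Not captured: about {outside_portal:,}")           # about 2,167
```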

The portal is proprietary software known by the trade name FOIAXpress, developed for federal agencies and purchased years ago from its developer, AINS. The D.C. Open Government Coalition called for a review of the system in testimony on behalf of dissatisfied public users at an oversight hearing of the D.C. Council in February 2019.

The Coalition met repeatedly with D.C. officials in 2018 and 2019 to press for consideration of alternatives and, in the meantime, to propose improvements to the current user experience. The portal managers at OCTO were responsive, and many improvements followed. (Slides from an OCTO presentation in July 2019 describe the Coalition’s suggestions and the upgrades made in response.) A major remaining goal is for the software to work equally well on mobile and laptop/desktop platforms. The report says the vendor has made “significant progress on a mobile-friendly version” and officials “hope to test it this spring and release it soon after.” The District is not considering a new system, though many states and some individual cities improve the requester experience by adopting next-generation software. Artificial intelligence tools built into some vendors’ portal software can suggest alternative available sources in real time, even as a requester types a description of the records wanted. D.C. agencies log thousands of FOIA requests each year that receive no processing beyond being redirected elsewhere.

The FOIA statute requires several reports each year on request processing, appeals and litigation, which the mayor and attorney general submit to the D.C. Council each February with data on the prior fiscal year. Until this year, public use of the reported data was limited to downloading them as pages in PDF files archived on the website of the Office of the Secretary.

As the policy called for, and as promised in the 2019 report, the District’s annual FOIA reports have this year been published as open data in addition to PDFs. (Metadata are here.) This makes analysis easier for the technically adept. The Open Data Report begins to use the data, for example to describe the problem of some agencies not using the central FOIAXpress portal, which inconveniences over 2,000 requesters.

The FOIA processing report is still just a year’s worth of raw data: a giant table of 80 rows (agencies) and 36 columns (data elements on FOIA request processing such as record releases and denials, timeliness, and exemptions used in denials). The D.C. Open Government Coalition has for years called for more analysis of how the District’s open records system performs. The new, more accessible data may encourage it.
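For readers who want to start that analysis, the Python sketch below ranks agencies by the share of requests denied in full. It assumes a local CSV export of the report with one row per agency. The column names are hypothetical placeholders and must be matched to the dataset’s actual metadata.

```python
import csv
from typing import Iterable

# Hypothetical column names; check them against the published dataset's metadata.
AGENCY_COL = "agency"
RECEIVED_COL = "requests_received"
DENIED_COL = "requests_denied_in_full"


def to_int(value: str) -> int:
    """Parse a count that may contain thousands separators, e.g. '1,234'."""
    return int(value.replace(",", "").strip() or 0)


def load_report(path: str) -> list[dict[str, str]]:
    """Read a CSV export of the report, one row per agency."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def denial_rates(rows: Iterable[dict[str, str]]) -> list[tuple[str, float]]:
    """Return (agency, share of requests denied in full), highest share first.

    Agencies reporting no requests are skipped to avoid dividing by zero.
    """
    rates = []
    for row in rows:
        received = to_int(row[RECEIVED_COL])
        if received == 0:
            continue
        rates.append((row[AGENCY_COL], to_int(row[DENIED_COL]) / received))
    return sorted(rates, key=lambda pair: pair[1], reverse=True)
```

For example, `denial_rates(load_report(path))[:10]` would list the ten agencies with the highest full-denial rates, where `path` is a hypothetical location for the downloaded file. Comparable ratios could be built the same way from the timeliness and exemption columns.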

Are D.C. data available in any other way?

Yes. Beyond the D.C. government Open Data Portal and FOIA requests to individual agencies, other organizations make data available.

The local brigade of volunteers who are part of the national nonprofit Code for America has used datasets from many District sources in projects in recent years. See their “ANC Finder” here. They have also tested data availability by submitting FOIA requests to D.C. agencies. The Code for D.C. website includes links to over 500 archived datasets from past projects. They hold monthly meetings featuring work sessions and new project pitches. Their data portal is here.

For an example of creative use of open data from Washington Metropolitan Area Transit Authority sources, explained in detailed methodology notes, see MetroHero’s May 2019 report, Metrobus Report Card: Grading the Performance of Buses in the D.C. Priority Corridor.

So, how is D.C. doing with opening data?

The Coalition welcomes the candid report and joins OCTO in hoping for continued work on the basics: improving inventory participation (especially by independent agencies), improving classification decisions (which show a suspicious degree of error when subjected to FOIA legal review), and helping agencies that lag in publishing non-sensitive datasets.

The report describes a year of accomplishments, concluding with a dozen or more useful examples of complex technical work in which government-wide improvements (for example, a common directory of all D.C. addresses for all agencies to use) and the exchange of datasets helped D.C. government agencies do their work and communicate with the public. The efforts brought outside recognition. For example, D.C. won Gold Certification, with six other cities, from What Works Cities, which judged cities’ use of data in decision-making. D.C. agencies’ use of mapping software also won an industry award.

But what about the big picture? So far the reporting focuses chiefly inward, on demonstrating how big data can help the government. (After all, that is OCTO’s mission as assigned in its statute: “to help District departments and agencies provide services more efficiently and effectively.”)

Yet enhanced data access is equally justified by its public use: adding efficiency to business and economic life and powering broader research and analysis of all kinds, including for accountability. Transport for London, for example, hired Deloitte to do the math and found $150 million in annual economic benefits and savings for travelers and the agency from open data. Progress on that external use case cannot be gleaned from the D.C. report. The kind, quantity and quality of public access and use remain a story waiting to be told.

The lack of reported data suggests that systematic engagement with public users remains a work in progress. (Compare, for example, the New York City data portal, which lists seven reports on user experience from the last two years.) The FOIA portal languished for years as an unfriendly place to begin a FOIA request because it was no one’s job to gather user views and act on them. It is unknown whether user feedback is collected on the much newer D.C. data portal. Many cities and states have such portals, allowing instructive comparisons. The open data movement has also begun to develop an evaluation literature.

Bottom line: it’s time for the District to visibly embrace public users as a vital constituency for open data, including by addressing them in future reporting.