Open data

Does any law require D.C. agencies to open their data?

Not yet. Many cities have such laws (New York City, San Francisco, etc.). Open federal data is now a legal mandate through the Open, Public, Electronic, and Necessary (OPEN) Government Data Act, included as Title II of the Foundations for Evidence-Based Policymaking Act (H.R. 4174) that took effect in January 2020. (See Congressional Research Service explainer here.)

Assuring openness by law in the District has been discussed among officials and advocates but never enacted. The D.C. Open Government Coalition worked with others to develop open data provisions included in the Strengthening Government Transparency Amendment Act (Bill B22-0188), which was introduced in the D.C. Council in March 2017 but never considered.

Meanwhile, D.C. government data are open by policy set in two directives of the D.C. mayor that lack the force of law or regulation.

What is the open data policy of the D.C. government?

Mayor Muriel Bowser issued Order 2017-115 in April 2017, for the first time declaring that:

“[T]he greatest value from the District’s investment in data can only be realized when enterprise datasets are freely shared among District agencies, with federal and regional governments, and with the public to the fullest extent consistent with safety, privacy, and security. “Shared” means that enterprise datasets shall be:

1. Open by default, meaning their existence will be publicly acknowledged, and further, if enterprise datasets are not shared, an explanation for restricting access will be publicly provided;
2. Published online and made available to all at no cost;
3. Discoverable and accessible;
4. Documented;
5. As complete as can be shared;
6. Timely;
7. Unencumbered by license restrictions; and
8. Available in common, non-proprietary, machine-readable formats that promote analysis and reuse.”

What progress has been made locating and opening data?

D.C. government agencies must inventory their data, evaluate its sensitivity, and make as much of it public as possible consistent with privacy and security. More exactly, agencies under the mayor must take part in the inventory and publication process. Unlike mayoral agencies, independent agencies such as the public library, housing authority, and charter schools, which are governed by separate boards and commissions, are not bound by mayoral orders but are “strongly encouraged to voluntarily comply.”

That work has gone on for seven years under plans developed by the Chief Data Officer (CDO), an official within the Office of the Chief Technology Officer (OCTO). A citywide list of datasets is updated annually and published in March to coincide with Sunshine Week. The CDO also issues an annual report each March, as directed by Mayor’s Order 2018-050, with further details. Coverage of the data inventory effort in that annual report has diminished over the years as OCTO’s data activities have expanded and as more details have become available online.

Open Data DC is the project name at OCTO for the work of curating agencies’ datasets issued to the public on the OpenDataDC.gov website in compliance with the mayor’s orders. The overall effort is described on a web page that links to the different ways users can draw on the data resource.

A private nonprofit called Data Community DC hosted a YouTube webinar in September 2023 where OCTO officials introduced the DC data access process, explained some technical access tools and illustrated the potential of government data with examples of nongovernment users including a homeowner, a business owner, and a graduate student. The group, known as “DC2,” offers community events on diverse topics related to data and uses.

The most recent annual CDO report, issued in March 2024, says that in 2023, 78 D.C. government agencies recorded 2,330 enterprise datasets. The 78 agencies include the mayoral agencies but only some of the 60 independent agencies. (The nonresponding agencies are not identified.)

Details on all 2,330 datasets are available in various formats.

To decide what to make public, agencies must evaluate the sensitivity of the information in each dataset by assigning a score on a five-level scale from zero to four (zero being least sensitive and thus open, the other four sensitive and thus not published).

Agencies reported in 2023 that they hold 1,061 datasets assessed as Level 0 in sensitivity, the ones eligible to be proactively released to the public without request.

Actual release, to be done by D.C. agencies under OCTO guidance, has been incomplete from the outset. The datasets now released and posted can be accessed, along with a wide variety of tools for manipulating and displaying data, at the D.C. Open Data Portal.  

What non-sensitive data are not yet public?

The mayor’s order requiring publication of Level 0 datasets is apparently not strongly enforced across agencies. The CDO reported this March that 298 datasets, or 28 percent of the 1,061 classified as open, are not yet published for public access. They are listed here. The CDO annual report offers no explanation for why these datasets, all non-sensitive, have not been released. It even lacks the OCTO promise to work on the problem that some early CDO annual reports included. The Coalition has yet to hear the results of a review planned for 2019 by the Office of the City Administrator to look at the shortfall in agency performance in publishing the Level 0 datasets.
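The arithmetic behind the 28 percent figure is easy to reproduce. The short Python sketch below uses only the two counts quoted above; the variable and function names are ours, chosen for illustration, and are not drawn from any OCTO tool.

```python
# Figures as quoted from the CDO's March 2024 annual report.
LEVEL_0_TOTAL = 1_061       # datasets classified Level 0 (open)
LEVEL_0_UNPUBLISHED = 298   # Level 0 datasets not yet published


def unpublished_share(unpublished: int, total: int) -> float:
    """Return the fraction of datasets that remain unpublished."""
    if total <= 0:
        raise ValueError("total must be positive")
    return unpublished / total


published = LEVEL_0_TOTAL - LEVEL_0_UNPUBLISHED
share = unpublished_share(LEVEL_0_UNPUBLISHED, LEVEL_0_TOTAL)

print(f"Published Level 0 datasets: {published}")
print(f"Unpublished share of Level 0: {share:.1%}")
```

Put the other way around, roughly seven in ten of the datasets that agencies themselves rated non-sensitive have reached the public.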

And why are some data too sensitive to release?

Agencies classified the remaining 1,270 datasets as too sensitive to be just openly released. They fall into four levels of increasing concern:

Sensitivity level | Definition | Number of datasets (as of December 2023)
1 | Public but not proactively released because of concerns over safety, privacy, security or legal issues | 187
2 | Not highly sensitive but subject to a FOIA exemption and nonpublic, for internal government use only | 213
3 | Confidential, protected by law, especially education and health records | 735
4 | Restricted confidential, release could cause significant damage or injury to persons or impair agency ability to do its work | 135

Source: D.C. Chief Data Officer 2024 Annual Report.
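For readers who want to check the tallies, here is a minimal Python sketch of the five-level scale and the counts quoted in this post. The level names are shorthand we made up for the sketch, not official OCTO labels. The four sensitive levels sum to the 1,270 cited above; added to the 1,061 Level 0 datasets, that comes to 2,331, one more than the 2,330 total cited earlier, a small difference the figures available to us do not explain.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Five-level scale; member names are shorthand invented for this sketch."""
    OPEN = 0
    PUBLIC_NOT_PROACTIVE = 1
    INTERNAL_USE = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Dataset counts as quoted in the text (Level 0) and in the table (Levels 1-4).
COUNTS = {
    Sensitivity.OPEN: 1_061,
    Sensitivity.PUBLIC_NOT_PROACTIVE: 187,
    Sensitivity.INTERNAL_USE: 213,
    Sensitivity.CONFIDENTIAL: 735,
    Sensitivity.RESTRICTED: 135,
}


def proactively_published(level: Sensitivity) -> bool:
    """Only Level 0 datasets are slated for release without a request."""
    return level == Sensitivity.OPEN


withheld_total = sum(
    n for level, n in COUNTS.items() if not proactively_published(level)
)
inventory_total = sum(COUNTS.values())

for level, n in COUNTS.items():
    print(f"Level {level.value} ({level.name}): {n} datasets, "
          f"{n / inventory_total:.1%} of the inventory")
print(f"Withheld (Levels 1-4): {withheld_total}")
print(f"All levels: {inventory_total}")
```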

Agencies decide the classifications, but with modest guidance and no further review. OCTO guidance to agencies is in an Open Data DC Handbook (2022 ed.). It includes a blizzard of details on how agencies must describe their data but has a scant two pages on the critical evaluation of sensitivity that can result in a body of data being hidden from view as an official secret. The CDO reports in the 2024 annual report that sensitive information includes:

  • Critical Infrastructure Information
  • Criminal Justice Information
  • Beneficiaries Tax Information
  • Payment Card Information
  • Other Financial (personal and business financial information, non-tax)
  • Protected Health Information
  • Student Education Records

Hesitation may stem from D.C. Auditor criticism in 2017, following major security incidents and review of a few agencies’ policies and procedures, that far better protection was needed for personally identifiable information in government databases.

As with any secrecy scheme, over-classification by cautious officials is a constant threat. The 2024 report has no further information on whether the CDO has reviewed agency dataset classification decisions to check validity and reliability — that is, whether the assigned sensitivity levels accurately reflect the dataset contents and correctly apply the level definitions. The OCTO guidance leaves it to agency general counsels to decide, apparently without review.

If data have been overlooked, incorrectly classified, or not described thoroughly, what can the public do?

The mayor’s 2017 order warns that it creates no new rights and that the order may not be enforced in court. Nor does it provide any administrative procedure (within the government) for users to appeal decisions such as mis-applying secrecy rules.

The order does say the Chief Data Officer is responsible for “receiving and responding to public input regarding the District’s data policy and activities.” But agency data officials (and their lawyers) responsible for their own inventory and classification have no assigned responsibility to the public. Accountability is only through the normal chain of command from elected officials to appointed executives and career staff.

The D.C. Open Government Coalition is interested to hear from data users who find problems with the inventory or classification, or other issues in accessing datasets, and have not been able to resolve them with government officials.

For example, users seeking details on exactly what’s in different datasets told the Coalition in earlier years that D.C. agencies could not provide a data dictionary to go with data obtained under FOIA. OCTO management of the dataset publication effort has led to much more consistent pressure that public data must be accompanied by such details.

The DC Open Data Handbook requires such a roadmap (pp. 5-6): “Each dataset should include a data dictionary as a separate document… The data dictionary lists the table structure, with each column defined in easy-to-understand terms. This includes providing the values and descriptors for any codes, categories, or domains… It ultimately informs the end user community of the dataset’s attributes in plain language.”
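To make the idea concrete, here is a minimal Python sketch of what such a data dictionary might contain and how a user could check a dataset against it. Everything in it is hypothetical: the column names, the status codes, and the helper function are invented for illustration and do not describe any actual D.C. dataset or OCTO format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ColumnEntry:
    """One data dictionary entry: a column, its plain-language definition,
    and descriptors for any coded values."""
    name: str
    definition: str
    codes: dict[str, str] = field(default_factory=dict)


def undocumented_columns(columns: list[str],
                         dictionary: list[ColumnEntry]) -> list[str]:
    """Return dataset columns that lack a non-empty dictionary definition."""
    documented = {e.name for e in dictionary if e.definition.strip()}
    return [c for c in columns if c not in documented]


# Hypothetical service-request dataset (column names are made up).
dataset_columns = ["request_id", "status_code", "ward"]

data_dictionary = [
    ColumnEntry("request_id", "Unique identifier assigned to each request."),
    ColumnEntry(
        "status_code",
        "Current processing status of the request.",
        codes={"O": "Open", "C": "Closed"},
    ),
]

print(undocumented_columns(dataset_columns, data_dictionary))
```

In this made-up case the check flags "ward" as a column with no definition, the kind of gap a requester might reasonably raise with the agency.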

Continued OCTO pressure on agencies to help users with such guides to databases remains important. If you have asked for a dictionary of the data in a D.C. agency dataset and encountered problems, please let us know at info@dcogc.org.

Can I still use D.C. open records law to get D.C. data?   

Yes. Data have always been considered records available by request under the D.C. Freedom of Information Act (FOIA). An agency must therefore answer a dataset request just like any other, subject to the same reply deadlines, exemptions, fees, appeal process, and ultimately the right to sue to challenge agency errors. The open data policy does not change that; it simply adds the mayor’s policy preference that D.C. agencies should make some data available routinely without request.

The mayor’s 2017 order recognized useful interplay between FOIA and the new proactive data publication system being established:

“On the one hand, FOIA request-tracking data should inform public bodies about the demand for and priority of publishing certain datasets or derivatives of those datasets as Level 0, Open. Similarly, successful appeals for datasets previously denied under FOIA exemptions can inform public bodies about potential errors in dataset classification. On the other hand, publication of FOIA request-tracking data can help residents hold public bodies accountable for the timely and consistent processing of requests.”

For data not published on the data portal, the order thus recognized that a FOIA request is in essence a test of the mayor’s classification system that empowered agency officials to place over 1,200 datasets off-limits because of sensitivity ratings in Levels 1-4.

The system failed early tests. CDO reports in 2019-20 cited ten different examples where data had to be released under D.C. FOIA after requesters couldn’t find the data on the data portal. These included data on police arrests, dockless bikes and scooters, surveys of charter school facilities, health inspections, health professional licenses, moving violations by city vehicles, vacant property determinations, and more. These were just the kind of errors the mayor’s original order warned about, and they turned up in a close review of how FOIA requests for data were handled. Even with evidence showing no legal barrier to release, it evidently proved hard for OCTO to spur agency publication. The CDO report for 2020 (p. 17) noted “of the datasets described in last year’s report [that should have been public], we didn’t make much progress.”

The FOIA request tracking database mentioned in the mayor’s order has also proved a disappointment. Coalition efforts failed when we tried to use the database just as the mayor suggested, to hold government accountable for delays in FOIA processing during and after the pandemic, from 2020 forward. We found that the backlog of incomplete requests can’t be evaluated because the published dataset includes only closed requests, not the full data on the status of all requests captured in the online request portal. Coalition efforts to request better data from OCTO under FOIA proved unsuccessful for three years. That record of denial is the focus of a pending case in the D.C. Superior Court.

Are there other published D.C. government data sources?

Some agencies publish data separately. For example, the Metropolitan Police Department publishes data on police stops; the Office of the State Superintendent of Education publishes data on schools, students, and staff; and the Department of Human Resources publishes data on government employees’ positions and salaries.

The OCTO “DC Compass” system, a chatbot that answers user queries about the thousands of datasets on file (discussed in more detail below), does not yet appear from our tests to be trained to advise requesters where else to go if there is no responsive data. If you have a question about locating a source for data, feel free to write us at info@dcogc.org.

So, how is D.C. doing with opening data?

The CDO reports note many accomplishments each year, with examples of successful complex technical work in which government-wide improvements (for example, a common directory of all D.C. addresses for all agencies to use) and the combining of datasets helped D.C. government agencies do their work and communicate with the public. A recent example is a new business licensing portal that merges tax and licensing data so businesses and nonprofits can do their government compliance work in one stop.

The efforts bring outside accolades for good work. For example, D.C. won Gold Certification, with six other cities, in the What Works Cities competition that judged the use of data in decisions. D.C. agencies’ use of digital maps has repeatedly been cited as exemplary.  

And now the ultimate access tool has just been added to the data portal. Called DC Compass, it comes from Esri, the District’s geospatial mapping software vendor, and offers a new way for anyone to simply ask questions and get answers if the open datasets have responsive details. “We are excited to be chosen as the first jurisdiction to use this groundbreaking technology and make our nearly 2,000 open data sets accessible to users with a simple AI-driven chatbot,” said Interim Chief Technology Officer Stephen Miller in a press release in March. “You no longer need to be a data scientist or a spreadsheet wizard to analyze DC’s vast open data catalog.” The vision is powerful. (See developer presentation of details here.)

The new functionality offers “quick and convenient access to information about the District without having to search through multiple datasets related to your question or know how to write a query or use a filter or perform any kind of data functions. This chat interface can answer a range of questions related to statistics, summaries, and other location specific information. In addition, DC Compass can create maps and other visualizations with the associated answers and provide suggestions for additional datasets and initiatives related to your questions.”

OCTO CDO Annual Report 2024

Obviously such an interface can put data in the hands of a new universe of users. The story up to now had been chiefly about ways Big Data can help the D.C. government, benefiting individual residents mostly indirectly. (After all, that is the OCTO mission, assigned in its statute, “to help District departments and agencies provide services more efficiently and effectively.”) OCTO’s DC Tech Plan FY23-25: Unleashing the Possible includes “User-Centric Digital Experiences” and “Data and Analytics” as two of five core capability pillars, but government is the data user targeted in the goal to “bolster the use of data to strengthen DC Gov decision making and service delivery.”

“Empowering residents” is occasionally mentioned in CDO report discussions, but the effort to do so in the open data context remains modest. (In the long run, OCTO efforts to expand broadband access will help many connect with web tools for every sort of activity.) Still, enhanced data access without need of a FOIA request is equally justified by its value to the public: it adds efficiency to business and economic life and powers broader research, analysis, and application development of all kinds, including for accountability. Transport for London, for example, hired Deloitte to do the math and found $150 million in annual economic benefits and savings for travelers and the agency from open data that in turn powered dozens of tools for time-saving travel planning.

But progress on the broader external use case, especially use by non-technical residents, can’t be gleaned from the D.C. reports. (A section on civic engagement is about agency projects on topics of public interest.) The kind, quantity, and quality of public access and use are mostly unknown. An audience member at the OCTO presentation to the Data Community DC webinar last fall asked about users of the data, and staff admitted they had little information beyond Google Analytics; one slide showed the average stay on the site was three minutes. Another asked about training, and OCTO said its training was only for government staff. A community project (using 311 call data to compare D.C. responses to street and sidewalk repair needs) languished until a university data science student could be enlisted for analysis. (See separate blog post on this interesting use case.)

Compare, for example, the New York City data portal, which lists seven reports about user experience. Other cities and states have data portals, allowing interesting comparisons. The open data movement has begun to develop a cross-national evaluation literature. The foundational issues of data quality and converting legacy data to digital loom large in that literature; though these may profoundly affect users’ interest, the subjects are not formally addressed in the CDO reports.

Bottom line: it could benefit the District to visibly embrace public users at multiple levels of technical skill as a vital constituency. What individuals and organizations do and don’t use, and why, among the open data on offer, should be addressed in future reporting to guide plans for meeting that demand.