When I worked at the Department of Veterans Affairs I was approached by a number of folks, external to the federal government, who wanted to help clean up, work with, and improve the public datasets coming out of federal open data efforts. As I worked on datasets about veteran facilities, organizations, programs, services, and other data that could make an impact on veterans’ lives, I would often suggest publishing CSVs to Github and soliciting the help of the public to validate and manage the data out in the open–something that was almost always shut down when I brought the topic up with anyone in leadership.
The common stance regarding the public participating in acquiring, managing, and cleaning up data using Github was–NO! The federal government was the authority when it came to providing data. It would own the entire process, and would be the only gatekeeper for accessing it. A couple of datasets that came up were the information on suicide assistance and substance abuse clinic support, where I had local folks on the ground at clinics and veteran support groups who wanted to help. I was told there was no way I could get approval to crowdsource the evolution of these datasets, and that all data would be stored, maintained, and made available via VA servers.
As I waded through a significant number of links that returned 404s while preparing my talk on the state of APIs in the federal government last week, I was reminded once again how unreliable federal government datasets can be. I am finding that a significant number of APIs, datasets, and supporting documentation have gone missing. This has me looking for existing examples of how the federal government can better publish, share, syndicate, and manage data in an interoperable way–efforts like the National Information Exchange Model (NIEM), which “is a common vocabulary that enables efficient information exchange across diverse public and private organizations. NIEM can save time and money by providing consistent, reusable data terms and definitions, and repeatable processes.”
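To give a sense of how easy this kind of link rot is to measure, here is a minimal sketch of the sort of script I run to check dataset links. The URLs in the list are hypothetical placeholders, not real VA endpoints–in practice you would feed it the URLs harvested from an agency’s data catalog.

```python
# Minimal link checker: report which dataset URLs still resolve.
# The URLs below are hypothetical placeholders, not real agency endpoints.
import requests

dataset_urls = [
    "https://www.example.gov/data/facilities.csv",
    "https://www.example.gov/data/programs.json",
]

for url in dataset_urls:
    try:
        # HEAD keeps the check lightweight; follow redirects to the final target.
        response = requests.head(url, allow_redirects=True, timeout=10)
        print(f"{response.status_code}  {url}")
    except requests.RequestException as error:
        print(f"FAILED {url} ({error})")
```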
Another aspect of this conversation I’ll be exploring further is the role Github plays in all of this. There are 130+ federal agency users and organizations on the platform, and I’d like to see how this usage might contribute to federal agencies being more engaged in managing the uptime, availability, and reliability of the data, code, APIs, and other resources coming out of the federal government. I am looking for any positive examples of federal agencies leveraging external cloud services and private sector partnerships to make data, content, and other resources more available and reliable for public consumption. Let me know any other angles you’d like to see highlighted as part of my federal government data and API research.
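As a starting point for that Github research, the platform’s public REST API makes it easy to inventory what any agency organization has published. A minimal sketch using the real /orgs/{org}/repos endpoint–the GSA organization name is just an assumed example, and any of the 130+ federal orgs could be swapped in:

```python
# List public repositories for a federal agency Github organization,
# using the public GitHub REST API (no authentication needed for public data).
import requests

org = "GSA"  # example agency organization; swap in any federal org name
url = f"https://api.github.com/orgs/{org}/repos"

response = requests.get(url, params={"per_page": 100}, timeout=10)
response.raise_for_status()

for repo in response.json():
    print(repo["full_name"], "-", repo.get("description") or "(no description)")
```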