This is the briefing document I've provided for prospective lead legislative sponsors (see questions and comments below):
Along with a number of other open government advocates, I've launched a campaign to put a definition of "open data online" into California and San Francisco law. The issue is that often when documents and data are published online, they cannot be accessed or used in a meaningful fashion because they cannot be searched, indexed by Google, or combined in a meaningful way with other documents for analysis. I want to tackle this not by mandating that certain documents and data be published online, but simply by creating a reference standard so that when new mandates pass, or new documents are published online as a matter of course under existing law or regular business, they are in accessible formats.
This has the benefit of making things easier for people who use screen readers, for developers who want to use public data to build applications, and for transparency advocates, and it is simply good policy. Publishing data in formats that can't be searched, compared to other documents, or reused in a meaningful way is as useless as keeping it tucked away in an obscure file cabinet. Publishing in accessible formats online is as simple as educating employees in how to properly save and store documents for online publication using the same software they already have on their computers. In an ironic demonstration of the current problem, San Francisco's current open data law was published by the Board of Supervisors as an unsearchable PDF.
Proposal: San Francisco/California Open Data Standard
Draft Text: Henceforth, any documents or data published online by the State of California/City and County of San Francisco and its employees, departments and agencies must be published in a structured format that can be retrieved, downloaded, indexed, sorted, searched, and reused by commonly used Web search applications and commonly used software.
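To make the draft's "sorted, searched, and reused" criteria concrete, here is a minimal Python sketch using only the standard library's `csv` module. The dataset, column names, and figures are hypothetical, invented purely for illustration. The point is that data published in a structured format like CSV can be queried with a few lines of widely available software, which is not possible with a scanned, image-only PDF.

```python
import csv
import io

# Hypothetical sample of a dataset a department might publish as CSV.
# Department names and budget figures are invented for illustration only.
PUBLISHED_CSV = """department,fiscal_year,budget_usd
Public Works,2011,1200000
Library,2011,450000
Recreation and Parks,2011,800000
"""


def load_records(text):
    """Parse published CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def search(records, term):
    """Return records whose department name contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r["department"].lower()]


def sort_by_budget(records):
    """Return records ordered from largest to smallest budget."""
    return sorted(records, key=lambda r: int(r["budget_usd"]), reverse=True)


records = load_records(PUBLISHED_CSV)
print([r["department"] for r in search(records, "park")])
print([r["department"] for r in sort_by_budget(records)])
```

Any spreadsheet program could perform the same search and sort on this file, which is the kind of reuse by "commonly used software" the draft text has in mind.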
Background: California/San Francisco would further cement its position as one of the global leaders in open government and accessibility by adopting this standard. It is derived from model open government legislation proposed by the global CityCamp movement. Much of the existing open data legislation from around the world lacks simple and clear standards definitions such as this (http://wiki.civiccommons.org/Open_Data_Policy). Creating this standard would be the foundation for ensuring that future laws around publication of State/CCSF documents are meaningful. See also background on open data standards around the world: http://wiki.civiccommons.org/Open_Standards_Policy.
Associated costs: None, with the possibility of savings. This standards legislation would not create a new mandate for publication; rather, it would give clear guidance on how data is to be published, using commonly accessible formats without requiring a specific format that could be outdated by technological developments. Passage of this law would reduce the burden of reformatting documents to comply with records requests, as documents published under this standard would already be easily accessible. It also has the benefit of opening government data to innovators from around the world who want to build useful applications using public data.
Early support: Since we publicly launched a campaign to enshrine this standard into law in SF and California on Nov. 16, 2011, we have seen significant support across social media channels, and endorsements from open government leaders from San Francisco, California and around the world, including:
- Javier Muniz, CTO and co-founder, Granicus (based in SoMa and one of the greatest open gov tech company success stories in the U.S.)
- Steve Ressler, founder, GovLoop
- Rep. Jason Murphey, Chairman of the House Government Modernization Committee, Oklahoma
- Scott Primeau, OpenColorado
- Luke Frewell, founder and publisher, GovFresh
- and many more who can be viewed online - https://wiredtoshare.com/structured_open_data_campaign
The legislative proposal is also supported by CityCampSF, Gov 2.0 Radio, GovFresh and the SF Tech Dems.
• Consider changing title to: Proposal: San Francisco/California Guidelines for Publishing Open Data or something similar.
• Change “commonly used Web search applications and commonly used software” to “Web search applications and software that have achieved dominance in the marketplace and are publicly recognized as widely used.”
• I interpret the term “Open Data Standard” in the context of this proposal to mean any data or documents that are, can, may, or should be available to the public without restriction.
The term “Open Data Standard” does not have a single definition. Commonly agreed definitions include:
• Published without restriction; the standard is available at no charge or a charge that is reasonable in cost and can be reasonably administered by parties in the implicated industry.
• Made freely available for adoption by the industry
• Controlled by a non-profit open industry organization with a well-defined inclusive process for evolution of the standard.
My suggestions:
• Change “open data online” to “open data exchanged electronically”
• Change “published online” to “distributed electronically”
• Change “new documents are published online” to “new documents are distributed electronically”
• Change “education employees” to “educating employees”
The term “open data standard” has two very different meanings. I will cover this in a later post.
These are the systems that many governments have in place, and there would be costs associated with replacing this enterprise software. This doesn’t mean that governments are against replacing these systems, just that there would be costs involved.
In cases where there are no proprietary software hang-ups, there is also the issue of “simply educating employees in how to properly save and store documents for online publication.” Training in government is never a simple task, especially for large governments with decentralized web management systems. Even the tiny City of Reno has 80 staff members posting to the web. If an internal expert is not on staff, a funding source for a training consultant would need to be identified.
I think the heart of the proposal is in the right place, just be cautious of encouraging an unfunded mandate.
Here are the guidelines from the W3C – http://www.w3.org/TR/2009/WD-gov-data-20090908/ – including:
“Make the data both human- and machine-readable:
• enrich your existing (X)HTML resources with semantics, metadata, and identifiers;
• encode the data using open and industry standards – especially XML – or create your own standards based on your vocabulary;
• make your data human-readable by either converting to (X)HTML, or by using real-time transformations through CSS or XSLT. Remember to follow accessibility requirements;
• use permanent patterned and/or discoverable “Cool URIs”;
• allow for electronic citations in the form of standardized (anchor/id links or XLINKs/XPointers) hyperlinks.”
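The W3C advice to encode data in XML for machines and render it as (X)HTML for people can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration under assumptions, not a W3C reference implementation: the element names (`dataset`, `record`) and the sample records are hypothetical. The `scope="col"` attribute on header cells is included because it helps screen readers associate each cell with its column heading, in line with the accessibility reminder above.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; field names and values are invented for illustration.
RECORDS = [
    {"department": "Library", "budget_usd": "450000"},
    {"department": "Public Works", "budget_usd": "1200000"},
]


def to_xml(records):
    """Encode records as a simple machine-readable XML document.

    The <dataset>/<record> element names are hypothetical, not a standard vocabulary.
    """
    root = ET.Element("dataset")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML back into dicts, showing that software can reuse it."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root.findall("record")]


def to_html_table(records):
    """Render the same records as an (X)HTML table fragment for human readers."""
    columns = list(records[0].keys())
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for col in columns:
        # scope="col" marks these cells as column headers for assistive technology.
        ET.SubElement(header, "th", {"scope": "col"}).text = col
    for rec in records:
        row = ET.SubElement(table, "tr")
        for col in columns:
            ET.SubElement(row, "td").text = rec[col]
    return ET.tostring(table, encoding="unicode")


print(to_xml(RECORDS))
print(to_html_table(RECORDS))
```

Both outputs come from one set of records, so the human-readable and machine-readable versions cannot drift apart. The round trip through `from_xml` is a simple check that the XML version really is reusable by software.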
Since I am not a lawyer (IANAL), I can only guess that the law requires a logical test, an example that passes the test, and an example that fails the test: Open data is… Data that embody openness are… Data that do not embody openness are…
Is that specifically what you are after?
If so, does that not boil down to defining the word “open?”
Under the Open Data Campaign tab you assert that we are hobbled by lack of a legal definition. Do legislators in California understand the assertion, i.e., what “hobbled” means? I’m not sure if this site is meant to stand alone or if it accompanies other material.
Background leads with “cementing leadership” for CA & SF. That seems vain and not at all tied to any practical motivation for undertaking the advocacy. You have a lot more background from which to draw for this to be the lead statement.
Sidebar: you mentioned screen readers, above – are you aware of http://www.section508.gov/ ?
Cost savings are just one reason to implement machine-readable formats using data standards, and it doesn’t take a rocket scientist to outline the savings. As a taxpayer, I find it disturbing that an initiative with long-term savings would be DOA if costs are assigned to it from the beginning.
Also, they are asking for further definition of “commonly used Web search applications and commonly used software.”
Any help from open data advocates responding to these two concerns is appreciated! Thanks.