Can Open Source Transform Democracy?
Stanford panelists say free and unfettered access to government documents would jump-start justice for all Americans.
Mark Michels, Law Technology News
May 6, 2014
“What are the benefits of a more open data ecosystem for the law?” The Codex FutureLaw Conference 2014 at Stanford Law School grappled with this question during a panel discussion entitled “Forging an Open Legal Document Ecosystem.” Moderated by Law Technology News’ Editor-in-Chief Monica Bay, the panel included Brian Carver, assistant professor at the U.C. Berkeley School of Information (Free Law Project); Thomas Bruce, director and co-founder of the Legal Information Institute at the Cornell University Law School; attorney Julio Avalos, general counsel at GitHub; and Paul Sawaya, creator of Restatement, an open source software toolkit for legal technologists.
The four panelists contended that 1) open legal data can produce innumerable societal and economic benefits, and 2) there are still significant roadblocks impeding that vision.
One major theme that emerged from the panel: While a growing number of government entities publish statutes, codes, regulations and judicial decisions online, the format in which these documents are published limits their utility. For example, documents in PDF are easy for humans to read, but are almost impossible for computers to digest. PDF documents (and many other file types) must go through a conversion procedure to make them “machine readable” so they can be processed by computers. Only when documents are in a format that computers can read will legal technologists be able to use them systematically and efficiently, panelists observed.
Carver described the problem this way: As an intellectual property professor, he wanted to know straightaway when courts published decisions related to issues of interest to him. One approach was to scan the Supreme Court and circuit courts’ opinions himself every day, but that would be inefficient and time-consuming. Furthermore, there was no effective way to have a computer scan the court websites, because computers could not process files that were formatted only for human readers.
To solve this problem, he and his colleague, Michael Lissner, developed a process using a tool called a “scraper” to collect the published cases from the U.S. Supreme Court and circuit courts on a daily basis. After the cases were collected, the documents were converted (“parsed”) into a format that a computer could read, in this case XML. Once the data was in this machine-readable format, a computer query could be run against it to identify and report cases of interest.
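The collect, parse and query steps described above can be illustrated with a minimal Python sketch. It is not the Free Law Project's actual code: the sample records, the case titles, the XML element names and the helper functions `to_xml` and `find_cases` are all hypothetical, and the scraping step is replaced by hard-coded text so the example runs without network access.

```python
import xml.etree.ElementTree as ET

# Hypothetical records standing in for text a scraper has already collected.
# The case titles and contents are invented for illustration.
SCRAPED_OPINIONS = [
    {"court": "scotus", "date": "2014-04-29", "title": "Example v. Sample",
     "text": "The patent claims at issue are indefinite."},
    {"court": "ca9", "date": "2014-05-01", "title": "Doe v. Roe",
     "text": "The contract dispute turns on state law."},
]


def to_xml(records):
    """Parse step: convert scraped records into one XML tree."""
    root = ET.Element("opinions")
    for rec in records:
        opinion = ET.SubElement(root, "opinion",
                                court=rec["court"], date=rec["date"])
        ET.SubElement(opinion, "title").text = rec["title"]
        ET.SubElement(opinion, "text").text = rec["text"]
    return root


def find_cases(root, keywords):
    """Query step: return titles of opinions mentioning any keyword."""
    hits = []
    for opinion in root.iter("opinion"):
        body = (opinion.findtext("text") or "").lower()
        if any(keyword.lower() in body for keyword in keywords):
            hits.append(opinion.findtext("title"))
    return hits


if __name__ == "__main__":
    tree = to_xml(SCRAPED_OPINIONS)
    # Report cases of interest to an intellectual property professor.
    print(find_cases(tree, ["patent", "copyright"]))
```

Once the opinions are held in a structured form like this, the daily check Carver wanted becomes a simple keyword query rather than a manual read of every court website.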
The Free Law Project emerged from Carver and Lissner’s work and now provides free access to its CourtListener repository, which, according to Carver, contains more than 2.5 million decisions from about 350 court websites. The project uses its “Juriscraper” tool to collect decisions from the U.S. Supreme Court, federal courts of appeal, selected federal district courts and numerous state courts. The Free Law Project also created an application programming interface that makes the data on CourtListener available at no cost, and allows others to access the data and download decisions in bulk.
Carver provided one example of how this open legal data ecosystem operates. State Decoded is an organization that created an open platform to display (among other things) state legal codes, to make them more accessible and understandable. State Decoded linked some of its state code pages to CourtListener cases that cite the specific state codes. This user-friendly data set of statutes and case law is available to the public at no cost.
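The kind of linking described above, from a code section to the cases that cite it, can be sketched in a few lines of Python. This is an illustration only and does not reflect how State Decoded or CourtListener actually build their links: the citation pattern, the section numbers, the case titles and the `build_index` helper are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed citation style: a section sign followed by a number such as
# "18.2-32". Real statutory citations vary widely by jurisdiction.
SECTION_RE = re.compile(r"§\s*(\d+(?:\.\d+)?-\d+(?:\.\d+)?)")

# Invented opinion snippets keyed by invented case titles.
OPINIONS = {
    "Case A": "The defendant was charged under § 18.2-32 and § 18.2-10.",
    "Case B": "Whatever the earlier history, § 18.2-32 controls here.",
}


def build_index(opinions):
    """Map each cited code section to the titles of cases that cite it."""
    index = defaultdict(list)
    for title, text in opinions.items():
        # De-duplicate so a case is listed once per section it cites.
        for section in sorted(set(SECTION_RE.findall(text))):
            index[section].append(title)
    return dict(index)


if __name__ == "__main__":
    for section, cases in build_index(OPINIONS).items():
        print(section, "is cited by", cases)
```

A statute page could then look up its own section number in such an index and display the citing cases alongside the text of the code.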
In true Silicon Valley fashion, panelist and software developer Paul Sawaya said he was inspired by last year’s Stanford CodeX program and started a project to “computerize the law” by developing tools to “parse legal documents into a machine-readable format that is more semantic.”
Many legal technologists face the same challenges Carver encountered when he had to transform court opinions into a machine-readable format, said Sawaya. He developed an open source toolkit called “Restatement,” which provides structured parsing and web publishing for legal text. Developers like Sawaya play an important role by providing legal technologists the tools they need to operate in an open document ecosystem.
Bruce has been a leader in the open legal document movement since he co-founded the Legal Information Institute at Cornell Law School in 1992. LII’s website is the “most trafficked open access legal website,” he noted. The United States Code and Supreme Court decisions have long been posted on the LII website, and his team recently undertook a challenging effort to make the Code of Federal Regulations accessible on it as well.