Investigation (Information Access) analyzes the FBI and Secret Service files on Aaron Swartz that have been released under the Freedom of Information Act, and gives visual form to aspects of data collection and information framing that often remain background assumptions when such material is presented as evidence. The piece exists in two parts: an interactive data visualization displaying analytics gathered using natural language processing (NLP) algorithms, and illustrations of law enforcement’s methods of data collection. The interactive data visualizations allow the viewer to compare textual analysis of documents from the FBI, the Secret Service, court proceedings, media coverage, and documents written by Swartz himself. The visualizations make similarities and differences in the choice of language immediately apprehensible, while the words are displayed in their original context below the visualizations, providing a direct comparison of the texts. For example, a visualization of frequently occurring words reveals that ‘information’ tops the list in the FBI files and in Swartz’s Guerilla Open Access Manifesto, but rarely appears in Swartz’s blog posts. The viewer can quickly compare the linguistic context of each use of the term across corpora. The accompanying illustrations provide a visual representation of the data collection process, with the text describing that process incorporated into the image itself.
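The kind of comparison described above, frequent words per corpus plus each occurrence of a term shown in its surrounding context, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the stop-word list, and the two short sample texts are all hypothetical placeholders, not excerpts from the FBI files or from Swartz's writing.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis
# would likely use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was",
              "for", "on", "that", "i", "some"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=5):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(n)


def keyword_in_context(text, keyword, window=3):
    """Return (left, keyword, right) triples for every use of keyword,
    with up to `window` tokens of context on each side."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Placeholder texts standing in for two corpora (invented sentences).
corpora = {
    "placeholder_a": ("The report notes that information was requested "
                      "and that information was copied."),
    "placeholder_b": "I went for a walk today and wrote some code.",
}

if __name__ == "__main__":
    for name, text in corpora.items():
        print(name, top_words(text, n=3))
        for left, word, right in keyword_in_context(text, "information"):
            print(f"    ...{left} [{word}] {right}...")
```

Running the same two functions over each corpus gives both views the piece pairs together: a ranked word list for comparison across sources, and the original phrasing around each hit.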
The project is currently in development. Although FOIA documents are released to the public in PDF format, the quality of those documents is typically too poor for standard optical character recognition (OCR) to make all of their text readable. Upon the project’s completion, the finished corpus, along with the enhanced OCR programs created for the process, will be open-sourced. This work is a collaboration with Jennifer Gradecki.