Text Analysis

Introduction

The censorship statute that created the Ohio Board of Film Censorship declared that the censors should approve any film that was "of a moral, educational, or amusing and harmless character" (Carmen 11). Because that statement is intentionally vague and phrased in positive terms, it does not tell us what the censorship board routinely refused to approve. Rather than looking at the rejection certificates, which generally stated only "on account of being harmful," this research focuses on the Bulletins. The Bulletins are summaries of the eliminations requested from films that were deemed "approved with eliminations." A Bulletin was created each week, collecting all of the eliminations requested along with the title and maker of each film (not the distributor). The corpus for this research was gathered from the archive at Ohio History Connection and consists of the Bulletins from September 1915 through February 1916. These are useful primary sources because they contain detailed information about what a distributor needed to remove from a film in order for it to be approved for showing in Ohio. Thus, by collecting six months' worth of Bulletins and uploading them into text analysis software such as Voyant, the Bulletins can be analyzed through distant reading to get an overview of the main topics of censorship. Beyond that, using the Voyant tools to navigate the corpus allows for an analysis that focuses on specific questions, such as how gender played a role in the censorship topics identified.

To get the Bulletins into the right condition for the Voyant research, the documents had to go through several stages. First, the documents were photographed in the archive at Ohio History Connection in Columbus, OH. Those images were then run through Optical Character Recognition (OCR) in a program called ABBYY FineReader, which allows the researcher to correct the OCR output as it is produced, so that the program learns to read the documents more accurately. This was effective, and the program made short work of the Bulletins. However, the Bulletins are form documents, meaning they contain a lot of information that was not necessary for the text analysis of eliminations. The OCR'ed documents were therefore stripped down to plain text containing only the eliminations requested. The Bulletins were weekly documents, but to keep the corpus manageable they were grouped together by month and uploaded into Voyant as the corpus for the Bulletins.
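
The month-grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for this research: the function name group_by_month, the issue dates, and the placeholder strings are all hypothetical, and each weekly Bulletin is assumed to have already been reduced to a plain-text string of eliminations.

```python
from collections import defaultdict
from datetime import date


def group_by_month(weekly_texts):
    """Merge weekly plain-text Bulletins into one text per month.

    weekly_texts maps a (hypothetical) issue date to that week's
    eliminations as a plain-text string.
    """
    monthly = defaultdict(list)
    # Sort by date so each month's text keeps the weekly order.
    for issue_date in sorted(weekly_texts):
        key = f"{issue_date.year:04d}-{issue_date.month:02d}"
        monthly[key].append(weekly_texts[issue_date])
    # Join each month's weeks into a single document ready for upload.
    return {month: "\n".join(parts) for month, parts in monthly.items()}


# Placeholder data: the dates and strings are invented for illustration.
weekly = {
    date(1915, 9, 11): "week two text",
    date(1915, 9, 4): "week one text",
    date(1915, 10, 2): "week five text",
}
monthly_corpus = group_by_month(weekly)
```

Each value in monthly_corpus could then be saved as its own plain-text file, giving one document per month in the corpus.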

Voyant Tools is an online program that allows a researcher to upload a corpus of text for analysis. The program reads the texts and feeds the information into various tools for analysis. It is important to note that Voyant Tools does not do any analysis of its own; it merely organizes and presents the data in various ways for the researcher. As can be seen in the Censorship Topics section, these tools are an effective way to explore a large corpus of information. To maximize the usefulness of the tools, the researcher must enter "stop words" into the program. Stop words tell Voyant to ignore those words whenever they appear in the text, so they will not appear in any of the tools. Voyant already has a substantial list of stop words, but for the Bulletin corpus several terms had to be added to it: terms that appear often in the corpus but do not relate to the content of the eliminations. The stop words added are: cut, scene, scenes, sub, title, part, showing, close, ft, feet.
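
The effect of a stop word list on a word-frequency count can be illustrated with a minimal Python sketch. It does not reproduce how Voyant works internally. The corpus-specific stop words are the ones listed above; the short general list standing in for Voyant's built-in stop words, the function name top_terms, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Corpus-specific stop words added for the Bulletins (from the list above).
BULLETIN_STOP_WORDS = {
    "cut", "scene", "scenes", "sub", "title",
    "part", "showing", "close", "ft", "feet",
}

# Assumption: a tiny stand-in for a general English stop word list.
GENERAL_STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "out", "on"}


def top_terms(text, n=5):
    """Return the n most frequent words that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    stop_words = BULLETIN_STOP_WORDS | GENERAL_STOP_WORDS
    kept = [word for word in words if word not in stop_words]
    return Counter(kept).most_common(n)


# Invented sample sentence, not a quotation from the Bulletins.
sample = "Cut out scene showing woman smoking. Cut sub title. Cut scene of woman drinking."
print(top_terms(sample))
```

Without the added stop words, "cut" and "scene" would dominate the count; with them removed, the remaining words describe what was actually eliminated.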

The first section of the text analysis focuses on the Censorship Topics, using Voyant to analyze the Board's Bulletins to determine the main topics of censorship and how gender played a role. The second section of the text analysis focuses on the nomenclature of film history research. When the film industry developed, there was no standard way of describing the new entertainment medium, and many variations of the word picture were used. Eventually the terms film and movie became the most common, but that was not always the case. When researching books and newspapers for information on early theaters and films, it can be difficult to know which terms to search for. For this second section of text analysis, two resources were used: Google Ngram and Voyant. Google Ngram allows the researcher to enter words and phrases and search the extensive Google Books library for those terms, and it then plots the results in a convenient chart. This resource was used to search for the appearance of a select number of terms related to film. Then a selection of Ohio newspapers from 1912-1917, particularly ones discussing film censorship, was gathered, converted into plain text files, and uploaded into Voyant. This newspaper corpus was then used with the Trends tool to search for the same film-related terms that had been searched in Ngram. The results can be found in the section titled Nomenclature.
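
The kind of comparison a trends-style view supports can be approximated with a small Python sketch that computes how often each term appears relative to the length of each document. This is a simplified illustration rather than Voyant's or Ngram's actual computation: the function name relative_frequencies and the placeholder documents are hypothetical, and only single-word terms are handled (a phrase such as "moving picture" would need separate treatment).

```python
import re


def relative_frequencies(documents, terms):
    """Return, per document, each term's count divided by the word count."""
    results = {}
    for name, text in documents.items():
        words = re.findall(r"[a-z]+", text.lower())
        total = len(words) or 1  # avoid dividing by zero on an empty document
        results[name] = {term: words.count(term) / total for term in terms}
    return results


# Placeholder documents: invented strings, not text from the newspaper corpus.
documents = {
    "doc_a": "picture picture film show",
    "doc_b": "movie film film film",
}
trends = relative_frequencies(documents, ["picture", "film", "movie"])
```

Dividing by document length keeps a long newspaper issue from appearing to favor a term simply because it contains more words overall.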

The Obscene Moving Image