DH@Guelph – Visualizing My Data

I have been working on my relational database for over six months now. I am still inputting new data as I continue to review testimony on racialized violence, but the database is taking shape with hundreds of unique records describing the experiences of African Americans in the postemancipation South. The next step, then, is to consider how I can best showcase my database and the data contained within. As a methodological tool, my database has been very useful for helping me understand the relationships between specific types of violence and the methods of resistance employed in response. But I have found that my database is less useful for elucidating those relationships to those not already familiar with structured query language. For this reason, I have been ruminating on how to produce a visual representation of my research. This is what brought me to the DH@Guelph Summer Workshops.

The DH@Guelph Summer Workshops are a series of four-day workshops on topics relating to digital humanities research and pedagogy. This year I registered for Tell Stories With Data, a hands-on workshop that explores the entire data visualization lifecycle. By learning to gather, create, clean, process, visualize, and share complex data, we found new ways to explore our research and make it more useful, engaging, and accessible.

Over four intense days, we learned how to:

  • Find and Gather Data – We learned how to find existing datasets, but also how to create our own. We learned to create a sustainable workflow, and we talked at length about the ethics of data collection.
  • Clean and Process Data – We learned how to clean messy data using OpenRefine. Inconsistencies, missing data, and human error can be easily corrected using this powerful tool. This was particularly useful for datasets compiled by multiple people, where the likelihood of human error or inconsistent data entry is higher.
  • Visualize Data – We learned best practices for designing, creating, and refining data visualizations. This portion of the workshop took up the majority of our time as we explored a variety of software tools like Excel, R, Tableau, Voyant, and Gephi. We also talked extensively about what makes a good visualization (it should be clear, concise, and accessible).
  • Preserve and Share Data – We learned about strategies for preparing our data for public consumption. Creating the visualization is only one part of the process (albeit a very important part). It is also necessary to consider where your visualization will live and how it will engage audiences. We also learned the importance of preserving the original data in an ethical way.

Before this workshop, my experience creating data visualizations was basically limited to Microsoft Excel. This turned out to be a good thing because I was open to experimenting with new tools. I honestly did not know what tools would be most useful or, at a more basic level, would even be an option for visualizing my data. Ultimately, this was the biggest takeaway from the workshop: not every visualization tool works for every dataset. Long before you start playing with visualization tools, it is important to think about the story you want to tell. What is your main argument? How many variables do you want to visualize? How can you present your data in a clear and concise way? It may even help to sketch out your idea on paper. Then you can find the tool that will best showcase your data.

To learn how to use tools like Tableau, Voyant, and Gephi, we worked with pre-existing datasets that the instructors had cleaned and checked for suitability with each tool. Because these datasets had been preselected for our tools, the resulting visualizations were always fruitful. Yet when I returned to those same tools with my own data, I found the results were not always helpful. Not every tool was useful for telling my story. But once I sat down and thought about what kind of visualization I wanted to make, it became much easier to select the right tool. In the end, I chose Gephi to visualize the relationships between types of violence and methods of resistance.

Gephi, an open-source network analysis and visualization software package, can be used to map the relationships between people, places, and ideas as a force-directed diagram. The approach is rooted in social network analysis: each individual object is represented as a node, which is linked to other nodes according to the relationships and interactions that connect them. I was able to use Gephi to produce a simple visualization using a small subsection of my dataset.

A test network visualization displaying the types of violence used against African Americans and the methods of resistance deployed in response. This visualization was created in Gephi using a small subset of data from the Slave Narrative Collection.

To produce this visualization, I used only a small subset of my data (approximately 100 interviews from the Slave Narrative Collection). As a result, this visualization cannot be used to draw any concrete conclusions about the relationships between violence and resistance. But it serves as a proof of concept. This test visualization shows the types of violence (in orange) that were being inflicted on African Americans in the postemancipation South. It also shows the methods of resistance (in blue) that were deployed in response. This somewhat rudimentary visualization is, to me, a striking way of representing the data contained in my relational database. Because the nodes are weighted (the bigger the node, the more prevalent the type of violence or resistance), it provides a unique way of understanding how African Americans experienced racialized violence in the postemancipation South. Again, this visualization cannot be used to draw conclusions, but I think it shows promise.
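For readers curious about how data like this might be prepared for a network tool, here is a minimal Python sketch. It is not the workflow used for the visualization above. It assumes each record can be reduced to a pair (type of violence, method of resistance), and the records, the labels other than nightriding, and the `Kind` and `Count` column names are all placeholders invented for illustration. The `Id`, `Label`, `Source`, `Target`, and `Weight` headers follow the column names Gephi's spreadsheet import looks for.

```python
import csv
import io
from collections import Counter

# Placeholder records, invented for illustration (not from the actual database).
# Each pair is (type of violence, method of resistance).
records = [
    ("nightriding", "resistance method A"),
    ("nightriding", "resistance method B"),
    ("violence type B", "resistance method A"),
    ("nightriding", "resistance method A"),
]


def build_tables(pairs):
    """Count how often each node and each violence-resistance link appears."""
    node_counts = Counter()  # label -> number of records mentioning it
    edge_counts = Counter()  # (violence, resistance) -> number of records
    kinds = {}               # label -> "violence" or "resistance"
    for violence, resistance in pairs:
        node_counts[violence] += 1
        node_counts[resistance] += 1
        edge_counts[(violence, resistance)] += 1
        kinds[violence] = "violence"
        kinds[resistance] = "resistance"
    return node_counts, edge_counts, kinds


def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


node_counts, edge_counts, kinds = build_tables(records)

# Node table: one row per type of violence or method of resistance.
nodes_csv = to_csv(
    ["Id", "Label", "Kind", "Count"],
    [[label, label, kinds[label], count] for label, count in node_counts.items()],
)

# Edge table: one row per violence-resistance pairing, weighted by frequency.
edges_csv = to_csv(
    ["Source", "Target", "Weight"],
    [[source, target, weight] for (source, target), weight in edge_counts.items()],
)

print(nodes_csv)
print(edges_csv)
```

Saved to two files, tables like these could be loaded as a node table and an edge table. The `Count` column could then drive node size and the `Kind` column node colour, which mirrors the weighting and the orange and blue colouring described above.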

When my database is complete, it is my intention to create a similar visualization using my entire dataset. Will nightriding still present as the most common type of violence? Will a dominant resistance tactic emerge? Will new types of violence or methods of resistance emerge from the periphery? These are questions I am very excited to answer in the near future.

Sarah Whitwell is a PhD candidate in the Department of History at McMaster University. Her research explores the efforts of black women to resist racialized violence in the postemancipation South. You can find her on Twitter: @whitwese
