Read an interview with Professor Aarthi Vadde about how technology impacts the study of literature and new methodologies in humanities research.
Aarthi Vadde, Ph.D. is an Associate Professor of English at the Trinity College of Arts & Sciences, specializing in 20th- and 21st-century Global Anglophone literature. Dr. Vadde conducts interdisciplinary research exploring the relationship between literary history, computational technologies, and internet culture. She is a member of the editorial board of the Norton Anthology of English Literature and is co-editing Volume F: The Twentieth and Twenty-First Centuries. She is also the co-founder of Novel Dialogue, a podcast about how novels are made – and what to make of them.
I know you’re an English professor and that you are studying 21st century contemporary writing. What are your current scholarly projects?
Aarthi Vadde: I would like to start with my second book project, which is exactly on contemporary literature and 21st century digital culture. In some ways, this project is a traditional humanist approach to the study of Web 2.0, or the “social web”. Around 2004, there was a technical and social movement in which entrepreneurs and programmers rebranded the web, rethinking it as a technical infrastructure that could bring people together. I am interested in how the rebranding of the web as a social, participatory space created a new way of thinking about contemporary literature as both an artistic practice and a popular practice. This new space for creating and responding to art is what I’m interested in. I am looking at some traditional and some not-so-traditional writers and artists who have used the resources of the web or engaged social media platforms to create new and interesting works of literature.
Traditional humanities work uses interpretive methods like close reading and the sociology of culture, which refers to the study of the institutions (like the university or publishing companies) that contribute to the making of literature as a cultural category (like the “canon” or the “bestseller”). I use these methods, but have become equally interested in the way the web plays into the making of literature and popular forms of storytelling. Studying the internet requires approaches that go beyond reading singular texts, artifacts, or images. We need to understand cultural phenomena at scale. For that, we need computational methods like statistical analysis for studying text and images as data as opposed to individual works.
If I understand correctly, you are developing methods to look at the larger social structure of the web and its impact on literature, but also at the computational model of the web. It seems fascinating and novel to think of literature, and of the observations that lead to its interpretation, as ‘data’.
Aarthi Vadde: The word data was not a part of a humanist’s vocabulary until the idea of ‘big data’ spread across disciplines. Of course we use data, we just refer to it differently. We use words like ‘archives’, ‘primary texts’ or ‘secondary sources’. If you were to ask us if we are evidence-based, I would respond, “of course we are evidence-based, even though our data sets are smaller; sometimes they might be as small as a passage of literature.” There was a bit of pushback in the humanities against the use of ‘imported vocabularies’. When I think about close reading, or any kind of traditionally interpretative work, I don't tend to use the word ‘data’. When I look at large data sets and work with collaborators who are employing a method that we call ‘distant reading’, which involves using statistical methods and natural language processing, I feel more comfortable using the word ‘data’.
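To make the contrast concrete, here is a minimal sketch of the corpus-level counting that underlies distant reading. The two-line corpus and the character names are invented for illustration; real projects work with thousands of texts and full natural language processing pipelines rather than a regular expression.

```python
from collections import Counter
import re

def term_frequencies(texts):
    """Count word occurrences across an entire corpus, not one passage."""
    counts = Counter()
    for text in texts:
        # Crude tokenization: lowercase words and contractions only.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Toy corpus standing in for thousands of scraped stories.
corpus = [
    "Harry looked at Draco. Draco smiled.",
    "Hermione read while Harry waited.",
]
freqs = term_frequencies(corpus)
print(freqs.most_common(3))
```

The point of the sketch is the shift in unit of analysis: close reading interprets one passage, while this kind of counting summarizes patterns across the whole corpus at once.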
Could you give us some concrete examples of the methods and tools that you use for your interpretation and analysis of the web and of the literary text in light of the web movement?
Aarthi Vadde: I have several articles in the works. One is about the fanfiction platform Archive of Our Own (AO3), which is interesting for a number of reasons. It is a fan-driven website for archiving fanfiction, run entirely on volunteer labor and mostly by women programmers. Because they are not commercially driven, they're very open to being scraped by researchers and to building scholarship around fan communities. In some ways they are exceptional to the Web 2.0 model because they are not profiting from user-generated content. Instead, they are motivated to be a space in which their users can form community around their mutual interests in storytelling. The name of the site alludes to a famous feminist essay by Virginia Woolf, A Room of One’s Own, which is about the material resources a woman needs to become a writer – most obviously privacy and freedom from domestic duties.
AO3 allows fans to co-create this communal space, intended to be open and affirming, where they can respond to each other’s stories and reframe the narratives circulating within popular culture. We are looking at how fandoms, which have grown massively in size since going online, help us understand fictionality in this web-based social media context. We have these traditional definitions of fictionality in the literary space, as a third way between truth and lies. Fiction offers forms of symbolic truth, usually produced by a single author. Fanfiction creates a collective communal response to a source text.
We want to know - does fanfiction have a narrative form distinct from its source text? Can we study that at scale using a quantitative approach to formalism? We took the Harry Potter fandom, which is one of the most popular fandoms on AO3, and compared the source text with the most popular fanfiction. And it turned out that there were definite shifts in narrative form. For example, minor characters fell away, while a strong emphasis was placed on two central characters. The fantasy genre morphed into romance.
Then we looked at characters and narrative space: how much attention do characters take up, and what kinds of stories get the most attention? We found slash fiction, which features two male characters in a romantic relationship, to be by far the most popular. People sometimes think that it's very progressive by virtue of putting two men into a relationship, but it actually reaffirms the centrality of characters who are already centralized in the source text. Indeed, the majority of fanfiction keeps the central characters and sheds more and more minor characters.
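The idea of measuring how much narrative space characters occupy can be illustrated with a simple mention-share sketch. The texts and character names below are hypothetical stand-ins; actual studies rely on far more robust character recognition (coreference resolution, nicknames, pronouns) than raw word matching.

```python
from collections import Counter

def character_attention(text, characters):
    """Share of total character mentions each character receives in a text."""
    words = text.lower().split()
    counts = Counter(w.strip(".,!?") for w in words)  # drop trailing punctuation
    mentions = {c: counts[c.lower()] for c in characters}
    total = sum(mentions.values()) or 1  # avoid division by zero
    return {c: mentions[c] / total for c in characters}

characters = ["Harry", "Draco", "Neville"]
source = "Harry and Neville walked. Draco sneered at Harry."
fanfic = "Draco watched Harry. Harry blushed. Draco smiled. Harry laughed."
print(character_attention(source, characters))
print(character_attention(fanfic, characters))
```

Run across many stories, this kind of measure makes the pattern described above visible: central pairings absorb a growing share of attention while minor characters drop toward zero.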
Do you think that for amateurs writing fanfiction it is simpler to create stories around central characters and themes like romance than to build a world around minor characters?
Aarthi Vadde: My instinct is to say “no”. There are amateur writers who are creating vast sprawling worlds in other areas of the Internet. The emphasis on romance and pairing of characters that wouldn't ordinarily be paired could happen because it's a largely female and queer-driven platform. When you look at the data and at the total Harry Potter fandom on AO3, stories that explore gender fluidity are a very small percentage of the whole. But the outliers should also get attention, because they are the stories pushing the fandoms forward. So we can combine close reading of individual stories with a mapping of thousands of stories. This view is something we wouldn't have had without using these data-driven methods.
Are the research methods that you use formalized or codified?
Aarthi Vadde: I think the field of cultural analytics, which brings data-driven approaches to the study of literature and culture, is evolving and developing formal procedures. There are journals devoted to the topic and clear peer review guidelines, but the field is still relatively young. There have been movements to codify how one presents data. The trend has been to publish data sets once the accompanying article comes out.
In terms of my own research methods, I'm working on the very early stages of another data-driven cultural studies project. My collaborators and I would like to do a project on the international reception of Squid Game - a very popular Netflix show from Korea. We want to understand it across Japanese, English and Korean-speaking audiences, which means that we're now trying to pull data across platforms and languages.
We are thinking about the kinds of questions we want to ask and what information we would need to gather to answer them. We have some early views of the data, we talk about what we're seeing, and then we try to arrive at a question that feels both theoretically interesting and empirically tractable. We have hypotheses, we test them, and we form new hypotheses based on the results.
What is the size of these data scrapes?
Aarthi Vadde: We used the academic Twitter API to scrape Squid Game-related tweets and retweets. For original tweets, we have about 2.1 million across the three languages (1.6M US, 400K Japan, 120K Korea). For retweets, we have a total of ~9.8 million (8.7M US, 400K Japan, 800K Korea). We’re glad we got these before Elon Musk bought Twitter, because the future of that API is very much unknown.
We also partnered with Parrot Analytics for data on the roughly 1,000 TV shows with the highest global demand/audience attention over the last 5 years. We can also see monthly audience demand data in the US, Japan, and Korea for the last 5 years, which helps us situate the success of Squid Game in the context of other popular television shows across three national markets. We are the first researchers to work with Parrot, and they are excited about having an academic partnership. You can read a bit more about our project on their website.
It is interesting to compare this work with that of medical scientists, who collect a lot of patient data but focus on only a small part of it. When a different research question comes up, they go back and look at the data again.
Aarthi Vadde: This is similar to what we do. We over-scrape, use portions of it, and then ideally, when we release a data set, more stories, more narratives, and more questions will come out, because it's really impossible to have a critical argument that accounts for everything the data could reveal. However, you have to pick one road. In our Harry Potter fandom study we focused on power users, the people writing the most stories for the platform and getting the most responses.
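Identifying power users from scraped metadata can be as simple as ranking authors by story count and responses. A toy sketch, with entirely hypothetical author names and kudos figures standing in for AO3 records:

```python
from collections import Counter

# Hypothetical (author, kudos) records standing in for scraped story metadata.
stories = [
    ("ana", 120), ("ana", 90), ("ana", 40),
    ("ben", 300),
    ("cy", 10), ("cy", 5),
]

# Stories written per author.
story_counts = Counter(author for author, _ in stories)

# Total responses (kudos) received per author.
kudos_totals = Counter()
for author, kudos in stories:
    kudos_totals[author] += kudos

# "Power users" here means the most prolific authors; one could equally
# rank by kudos_totals to capture who gets the most responses.
power_users = [author for author, _ in story_counts.most_common(2)]
print(power_users, dict(kudos_totals))
```

The two rankings can diverge (a single viral story versus steady output), which is one reason picking a definition of "power user" is itself an interpretive choice.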
In the sciences there is something called a data management plan: a document very similar to your methodologies, in which the project team describes how the data will be managed, collected, stored, and shared; what types of files are expected; how often they will be shared with collaborators; and so on. Years ago, data management plans were mostly informal. A lot of federal funders, including the National Endowment for the Humanities, are moving in the direction of asking researchers to have these plans.
Aarthi Vadde: I have to admit that when you mentioned it, that was the first time I had heard the term. I'm not surprised that we're moving towards a more formal accounting of how we label, store and distribute data. How can we not move in that direction, especially given the ethical challenges that come with using large data sets: ‘are you anonymizing?’; ‘how are you thinking about who else might be able to use and view your data?’; ‘how are you going to license it – with a Creative Commons license?’. I have never had to write a data management plan myself, but if funders are now asking to see that form of documentation, it makes sense.
The DM plans are written so that if you were to come back to a research project several years later, these plans would allow you to navigate the project structure. They present in great detail how you will store and manage data and your expectations around how long you plan to retain it and where. It is a tool to help you or any other person who comes and looks at the project, to navigate the data.
Aarthi Vadde: This sounds incredibly helpful. I would say that these were all things we talked about, but we did not write them down in one place and designate a DM plan. So, instead of spending ages tracking down your past work, you can look at this document.
Yes. Some see writing and updating these documents over time as a burden, one that can impede the research process and inspiration.
Aarthi Vadde: It reminds me of when you're writing and following a line of inspiration, and you don't want to stop to footnote your quotes, or you leave an incomplete citation without a page number. You come back three months later and ask yourself, where did I get this from? Now I have to track the page down. I totally empathize with wanting to avoid burdensome labor, but it can be even more burdensome to document something retrospectively.
I would like to go back to the fanfiction platform study and ask you how the traditional scholarly methods used in the humanities can translate into novel research approaches like yours.
Aarthi Vadde: We can use the tools we've traditionally used to extend our traditional questions into new spheres. One example is the novel way in which we look at readers in this new study. For a very long time in literary studies we talked about ‘the ideal reader’, the person who reads a work of literature and sees all its complexity. And yet we've had limited ability to study ‘actually existing readers’, which involves ethnographic methods, such as surveying readers to self-report on how they are responding to a work of literature. With online platforms you have readers responding “in the wild”, as opposed to laboratory conditions where they respond to a survey or participate in a study. It's more spontaneous, and it is truly fascinating to watch reading communities become interpretive communities, where people collectively decide what something means. You can see how they arrive at their insights by looking at the fiction they write and the comments they leave on each other's fiction.
What are the major roadblocks or hesitations for going into this new direction?
Aarthi Vadde: The major hesitation comes from answering the question “What is the core of our disciplinary identity?” For the longest time the answer was “close reading”, and I don't think that we're moving away from this response. The question is how we supplement close reading with other methods. In the past, when people wanted to study existing readers, they used ethnographic methods. Maybe they would do a study of twenty readers at a bookstore and get some information. But now we can study thousands of readers on Goodreads or The StoryGraph or AO3 – at least while they keep their APIs open to researchers, and that landscape is always changing.
Does this have implications for rethinking humanities research training?
Aarthi Vadde: We need to train graduate students interested in internet culture to be able to work with statistical methods. These were never things that were considered a requirement of a PhD program in English. But if we want to study 21st century culture, we need to rethink that approach. This is not about questioning interpretive methodologies or close reading, it's about supplementing them and bringing them into new arenas, where they mix with quantitative methods.
In biomedical sciences, mentors are also asking for statistics courses for researchers. So, this is another similarity between the two worlds. How do you navigate the need for statistical expertise now?
Aarthi Vadde: The Duke Libraries have very useful resources. I've worked with Liz Millewicz, Arianne Hartsell-Gundy, and Will Shaw, and they're all excellent at providing specific tools that might help with specific needs or with matching a tool to your research question. The next step is to think of what our curriculum should look like in order to train undergrad and graduate students in data-driven methodologies while also maintaining a critical approach to quantification. What questions do data help us answer, and what questions should we be asking about how data is constructed? I would like to develop a series of courses for English students where we can say: these are the classes to take for a good understanding of computational literary study.
Research is becoming more and more interdisciplinary. How do we bridge the language and the two cultures, the sciences and the humanities?
Aarthi Vadde: Many humanists would like to see our interpretative ways of thinking recognized and respected. We appreciate discussions about data and methods, but scientists need to accept that humanists might have a different view of what data is. We might even have words that we would use in lieu of data; words that we feel are more intrinsic to how we think. It is important when we talk about research across disciplines to recognize that our methodologies and our language are each our own, and they derive from very specific ways of thinking and learning. I do not believe, like C.P. Snow, the author of “The Two Cultures”, that the sciences and the humanities can never converge. I think most scholars would say that Snow’s declaration is outdated. But I believe that they should converge in ways that retain the strengths of both – the theoretical and historical precision of the humanities combined with the quantitative reach of computational methods is powerful.
Aarthi Vadde in Duke News:
“ChatGPT can be used to teach students how to make critical judgments”: