I would have really liked to call this series “Adventures in Textual Analysis” but so many things were 100 times harder than they ought to have been…so I’m calling it “Challenges in Textual Analysis.”
Many libraries and librarians have found that a new aspect of their job includes Digital Humanities, in all its strange variations. And part of that is textual analysis. Since DH and textual analysis are growing in popularity for libraries and librarians, my colleague decided to start a co-working group on textual analysis. This once-a-week meeting serves as a space to talk about our projects, as well as to learn about and do textual analysis. I’m in the group and slowly getting my bearings.
A long time ago (almost 5 years ago), I did little textual analysis projects with Python. Well, 5 years of Python dormancy and a switchover in the community from Python 2 to Python 3 made using Python for textual analysis significantly more challenging. So, for my second foray into textual analysis, I decided I would try out freely available software.
The hardest part for me, in terms of textual analysis, is finding a large enough corpus to work on. Originally, I had used Shakespeare because his works are freely available in .txt online. Python ate Shakespeare up without a problem. Now, I am no longer a literature major. (And I did not even study Shakespeare in depth, but instead American modernist and naturalist works still in copyright; there was no finding a free .txt or even .pdf copy of those pieces. Short of scanning and OCR-ing my own copies, it wasn’t going to work.) So…I decided to scrape horse-selling websites, as an experiment.
On the suggestion of my colleague the DH librarian, I started with OutWit Hub Light and tested it against Equine.com. It took me two hours to set up a working scrape, but it happened. And then, because I chose to use only the free version, I had to do a lot more clicking myself in order to collect all the information. But I managed to get my corpus!
Now, I have the price, location, gender, age, and breed of all horses for sale on Equine.com on July 6th.
Check back later to see more textual analysis challenges.