Challenges in Textual Analysis, the Corpus

I would have really liked to call this series “Adventures in Textual Analysis” but so many things were 100 times harder than they ought to have been…so I’m calling it “Challenges in Textual Analysis.”

Many libraries and librarians have found that a new aspect of their job includes Digital Humanities, in all its strange variations.  And part of that is textual analysis.  My colleague decided – since DH and textual analysis are growing in popularity for libraries and librarians – that she would start a co-working group related to textual analysis.  This once-a-week meeting serves as a space to talk about our projects, as well as learn about and do textual analysis.  I’m in the group and slowly getting my bearings.

A long time ago (almost 5 years ago), I did little textual analysis projects with Python.  Well, 5 years of Python dormancy and a switchover in the community from Python 2 to Python 3 made using Python for textual analysis significantly more challenging.  So, for my second foray into textual analysis, I decided I would try out freely available software.

The hardest part for me, in terms of textual analysis, is finding a large enough corpus to work on.  Originally, I had used Shakespeare because his works are freely available in .txt online.  Python ate Shakespeare up without a problem.  Now, I am no longer a literature major. (And I did not even study Shakespeare in depth, but instead American modernist and natural works still in copyright; there was no finding a free .txt or even .pdf copy of those pieces.  Short of me scanning and OCR-ing my own copy, it wasn’t going to work.)  So…I decided to webscrape horse selling websites, as an experiment.

On the suggestion of my colleague the DH librarian, I started with OutWit Hub Light and tested it against Equine.com.  It took me two hours to create an appropriate webscrape, but it happened.  And then, because I chose to only use the free version, I had to do a lot more clicks myself in order to collect all the information.  But I managed to get my corpus!

Now, I have the price, location, gender, age, and breed of all horses for sale on Equine.com on July 6th.
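
For anyone curious what a first pass over a corpus like this might look like in Python 3, here is a minimal sketch.  It assumes the scrape was exported as a CSV with columns named price, location, gender, age, and breed; the sample rows and the helper names (parse_price, summarize_listings) are made up for illustration and are not the actual Equine.com data.

```python
import csv
import io
from collections import Counter
from statistics import median

# Made-up sample rows standing in for the scraped export; the real column
# names and values depend on how the scraper was configured.
SAMPLE_CSV = """price,location,gender,age,breed
"$2,500",TX,Mare,7,Quarter Horse
"$8,000",KY,Gelding,10,Thoroughbred
"$4,000",TX,Gelding,5,Quarter Horse
"""


def parse_price(text):
    """Turn a string like '$2,500' into the integer 2500."""
    return int(text.replace("$", "").replace(",", "").strip())


def summarize_listings(csv_file):
    """Count listings per breed and find the median asking price."""
    rows = list(csv.DictReader(csv_file))
    breeds = Counter(row["breed"] for row in rows)
    prices = [parse_price(row["price"]) for row in rows]
    return breeds, median(prices)


# io.StringIO lets the sample text be read like an open file.
breed_counts, median_price = summarize_listings(io.StringIO(SAMPLE_CSV))
print(breed_counts.most_common(1))
print(median_price)
```

With a real export, the StringIO object would be swapped for an open file handle, and the price parsing would likely need to cope with blank or non-numeric entries.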

Check back later to see more textual analysis challenges.

Money (That’s What I Want)

In a previous post, I mentioned that so many people say they didn’t become a librarian for the money.  And I said that I did.  Because if the institution I’m working at wasn’t paying me, I wouldn’t be working there any more.   And librarianship is more steady and realistic and lucrative than my “dream jobs.”

That post was almost two years ago, when I had just started off as a professional librarian.  At that point, I didn’t want people to keep saying that I wasn’t working “for the money” because I very certainly was and am.

Now, it’s been two years.  I have more years as a working professional “under my belt,” more years as an administrator, and more years as a supervisor.  And I’ve come to realize that yes, librarians themselves tone down the financial aspects of their jobs.  Their institutions, however, seem completely incapable of understanding what library labor looks like – and how such labor should be compensated.

Let’s start with myself.  When looking at early career librarian salaries, many would say I have little to complain about.  And I do complain very little about my salary.  But if one looks at my job duties…well…I’m not making what other people in similar positions are making.  I am the systems librarian, which means I am in charge of the integrated library system (ILS).  The ILS is an enterprise system.  As the manager of the ILS, I am doing work similar to that of an Enterprise Systems Manager.  According to Glassdoor, the average salary of an Enterprise Systems Manager is $94,668 annually (as of October 2017).  I am making nowhere near that salary, more than $25,000 less.

Then, I have staff members who do IT support as well as significant data analysis.  Their duties fall less neatly into a single category, but here are similar positions.  The lowest-paid equivalent position would be IT Support, which Glassdoor notes has a salary of $52,369 annually.  For the data analysis pieces of their jobs, they could either be considered equivalent to a Data Analyst, which on average makes $65,470 annually, or a Business Intelligence Analyst, which makes a whopping $79,613 annually.  Just as I am making significantly less than my IT equivalent, my staff are not making the same as their IT and corporate counterparts.

Now, I am in no way happy that my staff is making so much less than their labor is worth.  But I am in a particular funk about what our institution expects our circulation staff to do, and for how much money.  When I worked circulation at a public library, my $12.00 an hour made sense: I was one of the least trained, with the least amount of responsibility, and I was seasonal (spending the rest of my time out of state at college).  The circulation staff at my current institution are expected to manage the institution’s electronic course reserves (and perhaps to a higher standard than most of our peer institutions hold).  The work they do is very much equivalent to what an Electronic Resources Librarian ($63,506/yr) does, with a healthy smattering of legal understanding (a paralegal gets on average $52,351 per year).  Now, our institution keeps salaries locked down, but we all talk – and I am quite sure that they aren’t close to the paralegal salary.

So, what does this rant mean?

  1. Academia’s administrations and HR Departments do not know what library workers do, and use their ignorance to press down wages and salaries.
  2. Academia as a whole does not pay market rate for their employees.  I understand fully that they aren’t (necessarily) money-makers, but academia needs to understand what can be expected of their workforce (and therefore the whole institution) on the money they have.
  3. Librarians and library staff are in the business of information.  Find and share salary information and job descriptions; and then go to your administration and show the equivalencies between your jobs and corporate jobs: “This is the work I’m doing, and this is what it’s worth in the market.”

Amber by Kevin Seeber

First-year Teaching and Learning Librarian at Auraria University in Denver, Colorado (jealous!), Kevin Seeber wrote a blog post on his own professional site about researchers who seem suspended in time – they think their preferred way of researching is the best, and this never seems to change even as libraries and technologies do.  It’s called “Amber” and you should read it.

I am currently implementing a new ILS, as well as sitting on committees concerning the potential for a new library.  And I have to say, for both researchers and workflows, that people truly seem suspended in time when all of a sudden you say “Everything is new.”  (I’m sure this is completely normal, but I’m just so involved with the implementation that I kind of blink at them and go “It’s easy.”)  Hopefully, we can Jurassic Park this thing and bring everyone into the present day and how our library technologies work.

Wish me luck!

ALAMW18 Haul: The Poppy War by RF Kuang

Public services librarians get all the fun.  One of those fun things is reader’s advisory.  As an academic technical services librarian, I don’t get to do reader’s advisory.  But I’ve got this blog, so here I will divulge some good reads.

At every ALA Annual or ALA Midwinter, I raid the exhibit hall for ARCs (advance reader copies).  I get a bunch of free entertainment, and I get to see what’s going on in the publishing world.

Since I’m coming back home with a big haul of books and I have at least one reader devoted to my reader’s advisory, I figure I should start a new series here on my blog.  An ALAMW18 haul series.

The Poppy War by R.F. Kuang follows an orphan girl Fang Runin, “Rin,” who tests her way into an elite military academy; and from there begins learning how to be a shaman – even if everyone else thinks that is not a particularly fruitful path of study.  But when war comes to her home, Rin becomes crucial to ending it.

I read this book (544 pages) in just a few days, though it was on my to-be-read-next pile for a while longer than that.  This quick read means that I did enjoy the book.  The first part is excellent – the parts where Rin masters academics.  (Clearly, I am a bit biased towards academia!)  The war part of the story is equally compelling, but as I mentioned, I’m biased towards academia.

The ending, however, was not as satisfying as it could have been.  Genocide is a theme in this book; and it ends with a genocide.  And not a prolonged one but an instantaneous one…which makes the genocide less poignant and more like a shortcut to end a book that got a bit too long.  It’s not exactly a good look.

Would I recommend this book?  Yes.  But I would expect other readers to also find the ending somewhat lacking.

ALAMW18 Haul: Sweet Black Waves

Sweet Black Waves by Kristina Perez is a retelling of the Tristan and Isolde story that you’ve probably heard/read a thousand times.  Fortunately (or unfortunately) Arthurian legends never get old – and so I keep going back to them.  And so do authors, and readers.

Perez’s version looks at Branwen, who in most tales gives Tristan and Isolde the love potion that starts the love story.  Perez’s version has a great twist: it’s Branwen Tristan falls in love with, not Isolde (or Eseult in Sweet Black Waves).  And Branwen gets a streak of magic that, perhaps in the sequels, will make her quite the sorceress.

This book had me hooked from the beginning.  Tristan, the love interest, is a typical YA Arthurian hero – gentlemanly, perfect, and utterly utterly boring.  That doesn’t matter a whit because Branwen carries the entire story.  She is constantly torn between love and duty, family and romance, power and kindness, in the most honest and readable way.

I read this book in two days – while working 8-hour days and doing horse chores.  It’s definitely a worthwhile read, and I will be watching for the sequel.

ALAMW18 Haul: Period: Twelve Voices Tell the Bloody Truth

Period: Twelve Voices Tell the Bloody Truth edited by Kate Farrell was the third book in my Midwinter haul.  This was a series of 12 essays about menstruation.  Pretty simple.  It’s got great lines and fun insight into menstruation for a wide variety of women.  One does, however, wonder if anyone has a “normal” period.  What is “standard”? What do the “majority” of people experience?  Do we all have endometriosis, PCOS, a 3rd chromosome? Do we all free bleed or struggle with society’s and life’s various whims, from homelessness to lack of education?

I’d read it – and it even has a 2 pager on Grinnell College – particularly if you’re into menstruation and other “girly” things.

A Certain Omission, Part I

I was mentioning the 5 Laws of Library Science, stipulated by S.R. Ranganathan, for something (I can’t remember what any more) – when I realized I didn’t really know much at all about a theory I was mentioning.  So, I did a quick Wikipedia search of the Laws and their creator.  I ended up reading the entirety of Ranganathan’s Wikipedia page and was excited by the idea of faceted classification schema.

Interlibrary loan to the rescue!

Anyway, I started delving a bit deeper into Ranganathan, learning a bit more about his theories and biography.  And I began to think about my first exposure to him.  It was in the two 500-level introductory classes in my LIS program.  Both times, the 5 Laws and Ranganathan’s name were projected on a slide and something to the effect of “These are the 5 Laws of Library Science and they were written by Ranganathan – who’s a big deal – and you should know about them” was said.  And nothing more was said of Ranganathan in the rest of my LIS program.

With the help of inter-library loan and other library searching, I came to find out that the 5 Laws weren’t some short blurb thrown out into the world.  There is an entire monograph on them, The Five Laws of Library Science, with explanation, discussion, and theory.  He had “sequels” (of a sort): Colon Classification and Prolegomena to Library Classification.  And Ranganathan wrote other books about library science, dealing with selection, cataloging, communication, and reference service.

If Ranganathan is such a “big deal” – and I’m agreeing that he is – why did none of my LIS courses delve deeper into his work?  My program was much more theoretical than many other LIS programs. (Many programs are more practical.)  Uh…so why wasn’t Ranganathan a bigger highlight?

The first thing that comes to mind would be that in a 2-year program, you can’t get to everything.  (Agreed.)  But classes had the time to read and discuss Foucault at length.  The second thing that comes to mind is that Ranganathan’s Colon Classification system was never adopted in North America (where I obtained my MLIS degree).  But one of my classes spent 2 weeks on Paul Otlet and his Universal Decimal Classification system, and U.S. libraries never adopted the UDC either.

My next ideas are: 1) our department didn’t think the students “capable” or “interested” in such theory (where Foucault and all sorts of other theories were…), OR 2) a certain racism that exists in LIS reared its head.  I do think some of our professors did not want to teach, and probably then decided we weren’t interested enough to engage with Ranganathan.

And I also strongly suspect that my program decided to name drop but not examine Ranganathan in more depth because he was Indian.

There is a certain understanding that one needs to have when reading a theoretician.  Ranganathan came from a non-Western country, and therefore his influences were not all Western.  (He spent enough time in the UK and reading American and British library scientists to have plenty of Western influence.)  He was a brahmin, versed in Indian religion; the Ramayana influenced him greatly.  He also worked during times of strong nationalist sentiment, and anti-brahmin sentiment.  It was a whirlwind for me to learn about – and I’m sure Ranganathan’s context influenced his work.  But for arguably the most recent/modern Library Science Theorist, I think the MLIS students could have been expected to get more familiar with Ranganathan.

Alas, LIS education in North America tries to slide past non-white, non-Western libraries and librarians.  I’ve gone to so many “book history” classes, talks, etc. that only mention book-making in Europe!  I have to go to a qualified “East Asian book history” class, talk, etc. to get to see Japanese, Chinese, or Korean book history.  Vietnam is squarely avoided.  India is avoided.  South America and its book history might as well not exist.  Any sort of text that is not “written” in the most traditional sense (quipu, winter counts, etc.) vanishes from library science courses.

In library science and LIS education, we need to stop omitting so much – in order to become a global and modern profession.

 


I’m reading S.R. Ranganathan: An Intellectual Biography (1992) by Girja Kumar.  That is where I’m getting much of my information about Ranganathan.

Review: Algorithms of Oppression

I took time off from reading my ALAMW haul to read Safiya U. Noble’s new title Algorithms of Oppression.  I had read her Bitch article back in grad school and was fascinated. When I saw the full-blown monograph, you know I had to select it for the Libraries.

With Algorithms of Oppression, Noble wants to put pressure on tech companies to not include pre-existing biases in their algorithms and to actively combat any emergent biases that their algorithms develop.  She also encourages consumers of these technologies (due to her book’s tone and style, primarily academic consumers) to critically engage with the technologies, and to also demand the companies combat biases in their products.

She primarily looks at Google (though her conclusion has an excellent interview concerning Yelp).  She starts with her initial 2010 search of “black girls” – which she had done in hopes of finding topics to discuss with her stepdaughter and nieces.  Google answered her search with pornographic representations of black women.  From then, she spent hours and hours testing Google, only to find that the search engine consistently failed to provide credible information about women of color (3).  Google Images supplies images of Black people when “gorillas” is searched.  Google Maps had the White House labeled as “N*gga House” during the Obama Administration (7).  When searching Michelle Obama, Google offers a related search that includes the word “ape” (9).

Then, Noble discusses the “historical and social conditions” that led to Google’s search results (17).  She notes that the search results allow us to see a representation of how Google, through its algorithm, conceptualizes the search term (24); and often these search results show that Google’s concept of everything is biased towards Google’s own market interests (28).  (See page 39 for a breakdown of Google’s search results page, and how many advertisements there are.)  This allows those with power and money to more strongly dictate search results.

She also discusses in depth the search results surrounding various groups.  As her article in Bitch would suggest, she spends many pages on the search “black girls.”  She also examines “Hispanic” and “Latina” girls, “Indian girls” (which brings up commentary on both Indian men and women), “white girls,” and “Native American girls.”  Different professions and careers are also examined: for example, searches for “doctor” lead to images of white men (82).

Noble’s third chapter is a fascinating look at what happens when an individual searches for certain communities.  For example, Dylann Roof, who murdered worshipers at Mother Emanuel African Methodist Episcopal Church in 2015, wanted to better understand the killing of Trayvon Martin and the legal proceedings that followed.  Roof’s search led him to White Nationalist groups, rather than information on how homicide is most often intraracial (110-115).

Next, Noble discusses legal restrictions on search engines.  They are few and far between in the United States, though the European Union has managed a few protections, including the Right to Be Forgotten.  She also notes the links between library classifications and their problematic issues and search engines.  Finally, Noble pushes readers to demand public policy around technology products as well as to not rely exclusively on technology for social justice.


As for my recommendation: if you are ready for a very academic text about Google and technology, then by all means, I recommend it.  It is however, as I mentioned, very academic so it is not a light read.  However, for those of us who are systems librarians hiding in the basement, this is worth reading.

ALAMW18 Haul: Brightly Burning

The second book I grabbed was Brightly Burning by Alexa Donne.  It’s supposed to be “Jane Eyre in SPACE!”  I read Fonda Lee’s Zeroboxer, which was “Rocky (Balboa) in SPACE!”, and loved it, so I figured I’d give the “in SPACE!” thing another go.

Now, I find the storyline of Jane Eyre suspect as is.  I wondered if one could really make it all that appealing as a YA romance.  I was right: you can’t.  I liked the space and the world-building and the little tinge of mystery the book had.  But you can’t re-make a Bronte sister.  There is nothing less romantic than a Bronte sister’s male lead.  Trying to make him romantic…doesn’t work. No one wants a moody, alcoholic, lying romantic lead, thank you.

So, if you want to read Brightly Burning, enjoy the space, the sci-fi, the world-building, and the little mystery plot line (that you can kinda see right through because you know Rochester keeps a woman with mental health issues in his attic).  And try to ignore the attempted romance.

ALAMW18 Haul: Bone’s Gift

The first book on my list, I read in less than 24 hours.  Bone’s Gift by Angie Smibert is about a girl (age 12) who can touch objects and see past events related to that object.  Worried that this ability – this “Gift” – might be dangerous, the protagonist Bone investigates her town, searching for answers about her family and herself.  It is set in WWII-era Appalachia, in a coal-mining town.  And we see the Virginia Writers’ Project, a part of the WPA (cue Rebecca geeking out about the New Deal), collecting folklore and other stories from the colliers.  It is a middle-grade book, but it has much wider appeal than that.

I loved this book.  When it comes out on March 20th, 2018, I highly recommend grabbing it from your library (or bookstore, as you prefer).  The setting is beautiful, the characters rich, and the plot just windy enough to keep you interested (but not confused).

Great ALAMW18 haul book.

We’ll have to see if the next one is as good.