Robots and Your Download Statistics

When I worked in acquisitions, download statistics were quite useful.  They let us know that our massive digital collections were being used, and they are among the most useful tools acquisitions can provide to help bibliographers choose which subscriptions to keep and which to drop.  They also gave us a glimpse into how people were using the digital collections, and how they were accessing them.  For example, in a pilot program, we wanted to see whether evidence-based acquisitions (EBA) or demand-driven acquisitions (DDA) were sustainable and useful to our library.  The download statistics were essential there, because they told us what we had purchased (or would purchase) on those plans.  We could also see how users were discovering our offerings; some vendors can report how users were referred to a source (via Google Scholar, via our homegrown federated search, via the catalog, etc.).  Download stats are cool.

The University College Dublin Library (Leabharlann UCD) hosted guest blogger Joseph Greene (Research Repository UCD), who asked, “How accurate are our download statistics?”  I knew how valuable download statistics were in acquisitions, so I wanted to know if they were actually accurate.

Greene focuses on UCD Library’s institutional repository – rather than a vendor platform like, let’s say, JSTOR – but he provides interesting insight into where your stats might actually be coming from.

Robots.

Many organizations use bots to crawl the web.  Think, for instance, of Google.  Google uses bots to crawl the internet, which allows it to index information and make it searchable.  The Internet Archive does something similar, as do link checkers.  So do scammers, phishers, and the like.  From what Greene says, it can be hard to distinguish actual human users from robots.

Greene and his colleagues found that 85% of their downloads were from robots.  (Wow.)  UCD Library was able to distinguish most of the robots from human users.  Greene and his colleagues will present further findings at Open Repositories 2016 in Dublin, Ireland.
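I don’t know exactly how UCD Library did its filtering, but a naive first pass at separating robots from humans can be sketched by matching download log user-agent strings against known crawler patterns.  Everything below — the log format, the item IDs, and the pattern list — is invented for illustration; production repositories typically rely on much larger shared lists:

```python
import re

# A few substrings commonly found in crawler user-agents.  Real filtering
# uses much longer community-maintained lists; this handful is illustrative.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE)
                  for p in [r"bot", r"crawler", r"spider", r"archive\.org"]]

def is_robot(user_agent):
    """Return True if the user-agent string matches a known robot pattern."""
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)

# Hypothetical download log entries: (item_id, user_agent).
downloads = [
    ("item1", "Mozilla/5.0 (Windows NT 10.0) Firefox/45.0"),
    ("item1", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("item2", "Mozilla/5.0 (compatible; bingbot/2.0)"),
    ("item2", "ia_archiver (+http://www.archive.org/details/archive.org_bot)"),
]

# Keep only the downloads that look human, then compute the robot share.
human = [d for d in downloads if not is_robot(d[1])]
robot_share = 1 - len(human) / len(downloads)
```

Of course, user-agent matching only catches robots that identify themselves honestly; the harder part of the problem — which is presumably what makes findings like UCD’s interesting — is the bots that pretend to be browsers.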

Although I’m unfortunately stuck in the United States until my institutional funding kicks in, it would be cool 1) to go to Ireland and 2) to find out how DSpace and EPrints filter robots out of their statistics.  (I wonder how WordPress or JSTOR or Project MUSE or any other platform does it, too.)
