Saturday, March 12, 2011

1,000 Monkeys Sitting At 1,000 Typewriters...(Session 5)

As I was reading this session’s articles I could not help but be reminded of a one-act play I was in during college. The play is called Words, Words, Words by David Ives. In this performance I played a monkey named Milton, whose character was loosely based on the personality and writings of John Milton. The other two monkeys were (Jonathan) Swift and (Franz) Kafka. The plot is structured around the infinite monkey theorem, which suggests that “a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a given text, such as the complete works of William Shakespeare.” (Quoted from Wikipedia)

The Wikipedia experiment asks nearly the same question but shifts the action onto the user. Wikipedia has an advantage: its “monkeys” have access to source material, while the monkeys in the theorem above are just randomly pushing buttons. The problem with both experiments is whether “close enough” is sufficient.

Several of our readings addressed this “close enough” question. Leibenluft’s article A Librarian’s Worst Nightmare directly exposes Yahoo! Answers’ inaccuracy. Duguid’s article Limits of Self-Organization exposes inaccuracies in Gracenote, Project Gutenberg and Wikipedia. And Gazan’s article Social Annotations in Digital Library Collections exposes the trend for users of a FAQ website to submit more social answers as opposed to strictly factual ones.

Now before I go any further with this post, I must stop and honestly ask: would the general public be more likely to read/buy The Complete Works of William Shakespeare if they knew it was produced by a monkey sitting at a typewriter? And (I honestly intend for this to be a completely separate question) would we be more likely to read/use a FAQ website or an online encyclopedia if we knew it was being written/created by our peers?

This session’s topic is the exact reason I chose to take this course, and I found this session’s readings fascinating when I think about them through the lens of librarianship. I am tempted to write individual posts about all four of our prompts but will restrain myself and focus only on Social Tagging vs. Professional Cataloging and Classification.

I first began thinking about Web 2.0 and librarianship early last fall. I was curious about my current online practices and how these practices could be useful for OPACs. Around this same time I received an e-mail from the Seattle Public Library (SPL) announcing the release of their new catalog, which would be full of Library 2.0 tools (social tagging, history tracking, following users, user-generated lists, comments …). For this post I will focus specifically on the use of social tagging in their OPAC and compare it to their traditional cataloging system.

Perhaps my favorite aspect of Library 2.0 tools is that the user becomes an engaged, active participant in information retrieval (IR). What this means is that a portion of the catalog is reserved for those who have actually read or interacted with the resource. Catalogers, if we are honest with ourselves, do not read all the books they catalog, but patrons do. This makes patrons more knowledgeable about the books in a collection than the catalogers, so why not give them access to the cataloging process?

The activity on SPL’s OPAC is best described by Haythornthwaite’s term “lightweight peer production.” To contribute to the OPAC, users do not need to make long-term commitments, build social capital, or maintain and groom their contributions. Social tagging on SPL’s OPAC is as simple as clicking a link and entering the content.

This new aspect of SPL’s OPAC utilizes an online concept that many (if not most) of their users are already familiar with: social filtering. Kristina Lerman stated, “Rather than actively searching for new interesting context, or subscribing to a set of predefined topics, users can now put other people to task of finding and filtering information for them” (2). Due to intellectual freedom issues, SPL’s OPAC works a bit differently than Digg or Facebook, where users are more or less open to full disclosure. Since a library serves a finite community (limited to those with a library card), its collection is already filtered toward a particular slant (i.e., the community’s). So it may be safe to assume that social filtering in Library 2.0-enhanced OPACs works the same way.

I am increasingly curious whether social filtering would be more useful in academic/research OPACs and digital collections. Gazan stated, “While used textbooks are obviously less costly, they often carry another benefit new textbooks don’t: highlights, underscores and other annotations by their previous owners” (1). While thinking about this statement I began considering a (perhaps?) new concept that utilizes a common practice in social computing: Social Highlighting. Many of us already geotag ourselves when we are out by checking into restaurants, coffee shops and so on. This, in effect, catalogs our routines. Just this week both Foursquare and Whrrl introduced algorithms to compare users’ activities with those of their friends and the greater community. To put this more plainly, on an SNS like Whrrl all of our digital selves get together and talk about what we have done; then, based on that digital discussion, my digital self comes back and tells me what I should do next. This takes our self-documenting digital selves and finds patterns within the larger community. Lorcan Dempsey calls this “A ‘signed’ network presence … People have become entry points on the network, and signature is important” (13).

Below is an example of a Whrrl suggestion:



If patterns in our daily lives can be recognized, which I believe they can, then digital representations can help promote productivity. Both Netflix and Amazon do this with amazing accuracy as they make suggestions based on what we have viewed. Think about the impact this could have on library OPACs. An app could be integrated into an OPAC that says, “students who have previously taken the courses you are enrolled in have viewed the following resources.” Social Highlighting, as I introduced above, would work in a similar way. Now that digital resources can be digitally annotated, those annotations can be preserved and should be viewable by future patrons. The digital catalog could create a map of previous annotations and even rank those annotations by frequency. The question comes up as to whether this would take away from the learning process, and I would suggest it would not. Essentially, all it does is let our digital representations get together for a conversation and, based on that conversation, make suggestions based on previous patterns.
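The course-based suggestion idea above boils down to a frequency count. Here is a minimal Python sketch of the concept, not any actual OPAC feature; the course code, student IDs and titles are all invented for illustration:

```python
from collections import Counter

# Hypothetical view log: course -> list of (student, resource) pairs.
# All names and titles here are invented for illustration.
view_log = {
    "LIS670": [
        ("s1", "Everything Is Miscellaneous"),
        ("s1", "The Wisdom of Crowds"),
        ("s2", "The Wisdom of Crowds"),
        ("s3", "Here Comes Everybody"),
        ("s3", "The Wisdom of Crowds"),
    ],
}

def suggest_resources(course, log, top_n=3):
    """Rank resources by how often past students of `course` viewed them."""
    counts = Counter(resource for _student, resource in log.get(course, []))
    return [title for title, _count in counts.most_common(top_n)]

# "Students who previously took LIS670 viewed the following resources:"
print(suggest_resources("LIS670", view_log))
```

The same frequency idea would drive the annotation map: passages highlighted by the most previous readers would simply surface first.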

One aspect that has been silent in this entire conversation is the role of the library at a much larger scale. Libraries are collections of resources, not individual resources. Cataloging, in one sense, is collocation. Subject headings allow collections to be brought together in a way that is useful for their users. The controlled vocabulary works best at the collection level, not at the individual resource level. Library 2.0 works the other way around. Social tags are more about describing the work in hand than about figuring out where in the collection the book belongs.

In the SPL OPAC I found the following movies to be good examples of where Library 2.0 falls apart: Never Let Me Go and Casablanca. I noticed that one of the tags for the movie Never Let Me Go is “Bad Hair”.



I question whether this is a useful tag for this movie. As I have not seen it, it is possible that a character has to deal with a bad haircut, but my guess is that whoever tagged this movie simply did not like one of the characters’ haircuts. When I followed this tag I found that there are two movies with this tag, and my guess is that the same patron created both. For Casablanca there are two tags for the same concept spelled differently: world war two and wwii. This makes me wonder whether other resources about World War II would be hidden because of this duplication.



These tags are more or less creating tag clouds. If the descriptions are vague or wrong, the clouds become less useful for future patrons. A more useful help screen explaining the goals of the program would improve this aspect of SPL’s OPAC. If collocation is important, it should be stated.
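The world war two / wwii duplication could be eased on the back end by normalizing tag variants before the cloud is built. A minimal sketch, assuming a hand-maintained synonym table; the table and the sample tags are invented for illustration:

```python
# Map known variants to one preferred form (hypothetically maintained by catalogers).
SYNONYMS = {
    "wwii": "world war ii",
    "ww2": "world war ii",
    "world war two": "world war ii",
}

def normalize(tag):
    """Lowercase and trim a tag, then map known variants to the preferred form."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)

# Raw patron tags collapse into a single entry with a combined count.
raw_tags = ["WWII", "world war two", "wwii", "classic film"]
counts = {}
for tag in raw_tags:
    key = normalize(tag)
    counts[key] = counts.get(key, 0) + 1

print(counts)  # {'world war ii': 3, 'classic film': 1}
```

Patrons could keep typing whatever variant they like; only the displayed cloud and the collocation behind it would be merged.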

I began this post by suggesting that patrons using Library 2.0 tools are similar to monkeys pushing buttons at a typewriter. It would be better to say that non-professionals, or patrons, can be helpful in the cataloging process. They can also be excellent resources when it comes to describing items in our catalogs. But I think more important still is the way patrons use resources, and how that process is digitally cataloged and then socially filtered.

7 comments:

  1. Amazon's Kindle does something similar to Social Highlighting and Social Annotation. I have a Kindle but don't really tap into the social features of it so I can't comment on how well they work. I feel like when reading for fun I might find them distracting, but I can definitely see their use in scholarly reading.
    For some reason, I have been hesitant to accept the tag revolution. I guess I have a hard time trusting the general population to accurately describe content in a meaningful manner. Like you pointed out, "bad hair" is really not helpful to anybody and to me seems like vandalism of the public space. Or at the very least a serious misinterpretation about the point of tags. The second example you showed demonstrating multiple ways to represent a single concept is another issue I have. If some of the users write "world war two" and some others write "wwii" and others still write "world war ii" or "ww2" or "world war too" or any variety of spellings, the actual concept won't be aggregated properly and might not show up as a popular tag for that content. Eventually I will probably become a convert, but for now I am happy relying on searching and traditional browsing methods.

    ReplyDelete
  2. Great insights! It is interesting that you discussed the act of highlighting (whether digital or physical) as a type of tagging (and in a sense peer production). I think another area you should consider is errors in tagging or highlighting by lay users. I am not saying that all users are malicious, but I do remember a student (when I was an undergraduate) who would spend time putting incorrect highlights into textbooks before selling them to the bookstore at the end of the semester. This could cause a user to pick up the text, think it had useful information, and purchase it. Therefore, a certain level of accuracy will exist, along with a certain level of inaccuracy. In this sense, I wonder what thresholds end users have for incorrect tagging of information. At what point will the user move to another source of information if too much incorrect information is available? This makes me think of the game show Who Wants to Be a Millionaire, where the US participants would ask the audience for help and the audience would typically give the correct answer. However, in some other countries that had the show, audiences would purposely pick incorrect answers to fool the contestant. Overall, I think you made very insightful comments that made me think about many aspects. However, how can we weigh thresholds and the reliability of information for our users?

    ReplyDelete
  3. Nice blog! I think anything that will actually get students/patrons to use their nearest OPAC is a step in the right direction. We know that many already decide to find resources elsewhere. I do wonder if potential contributors would balk at the idea of using established descriptors so as not to duplicate tags. When the freedom to contribute new tags, such as "bad hair", goes away and users are instead directed to a set of pre-determined tags such as "World War I" instead of "WWI", the excitement of feeling like a contributor kind of goes away. I can't imagine students/patrons wanting to select drop-down boxes or browse a controlled vocabulary.

    In academic libraries, I agree there is a value in knowing which other texts were referenced by previous students and perhaps with a tiered system where monitors were allowed to assess the quality of tags, the peer-enhanced OPAC could thrive and still allow new descriptors to flourish.

    ReplyDelete
  4. Great post! I agree with your findings about tags: they can be non-standard and confusing when they are user-generated, as in the case of world war two and wwii; I wonder if there is some kind of connection between these two tags within the system. However, it seems like hard work to standardise them because there is too much "personality" in them; seriously, what does "bad hair" mean? But at the same time, this free-form social tagging system is a good way to present items that are not that "normal" and could be attractive to users. I guess it's just a dilemma.

    ReplyDelete
  5. Why do I feel like the 1,001st monkey as I sit in front of my nearly-permanently-attached laptop? ;)

    You've done a great job of summarizing a pretty broad range of issues related to Web 2.0 tools in libraries, and I wasn't aware SPL had gone so far in this direction. One of the core differences is that the professional cataloger, like a good waiter or sports official, is supposed to be essentially invisible if they are doing their jobs well. No one thinks about *which* cataloger has described a library item; it's assumed they all adhere to the same set of standards. But most Web 2.0 tools emphasize who writes the review, adds the tags, etc. You can trust or follow that person and not others based on their contributions. That model can't be applied directly in a library that's concerned with preserving patron anonymity. Also, aggregate tagging only works if you have enough monkeys, and generating sufficient traffic around each collection item, let alone encouraging people to add tags as they visit, is an entirely different problem. But as you've done here, staying abreast of what different libraries are doing to integrate the two models is kind of a natural laboratory for what works and what doesn't, and where the two models might help each other. Excellent post.

    ReplyDelete
  6. @ Andrea - I'll have to check out that feature on Kindle, thanks for the heads up. As for social tagging I would hope that the concept would help with "traditional" browsing. The problem, like you pointed out, is when the public space gets vandalized or when multiple terms get used for a single concept.

    @ MBCO - Going back to my example of Foursquare: there are users who try to become the Mayor of as many places as they can by checking into places they might not actually have been. As for your threshold question, I believe it would be a benefit to filter these users out of the system so that people who are trying to be authentic have a chance. We can do this with social filters. An article or e-book could state at the top how many people have viewed/read that resource and how many have annotated it. A layer could then be stripped away, leaving just the people who have taken the class you are taking, or who attend your school, or students with high GPAs. Maybe there could also be a way to search social tags by thesis, topic, or terms pre-approved by professors. You are right: if there is not at least a slight sense of control, the trust level would be very low.

    @ PL - Thanks for the comments. I wonder if Library 2.0 enhanced OPACs would be better served by librarians correcting tags on the back end instead of guiding input on the front end. I have no idea how easy this would be or if duplicate entries like the one I listed could simply be auto corrected in the system.

    @ Nan - I am curious about the actual use of social tags in the SPL OPAC. Do patrons trust it enough to search tags? Or do patrons still use traditional cataloging to find books but then evaluate them using social features? I do not have access to any of that info, but it would be interesting to look into.

    @ Dr. Gazan - I did notice that most of the resources that were tagged were new releases and the older items did not have much social traffic. This really did not surprise me. One thing I struggled with is understanding the difference between user-generated lists and social tags. They both seem to be doing the same thing. I guess the difference is that tag clouds can be improved by multiple users while lists are much more of a once-and-done tool.

    ReplyDelete
  7. Yay for coverage of SPL. I've done a couple of papers and presentations analyzing SPL's services and features. I think they are pretty up there in terms of adopting technology for the purpose of betterment (service, access, etc.).
    I think your conclusion is in line with what '2.0' means, that is, the collaborative production of knowledge and ideas. Instead of just taking in and consuming, users can now provide feedback and input, which creates an infinite loop of beta improvement.

    ReplyDelete