Universal Classification of the Internet

Note: I wrote this post as background research for the current chapter (“Applied Classification”) of HumanCrafted Metadata. It is not an excerpt from the textbook itself (in case you’re wondering if my goal is to bore students to death).

In developing the background chapter on classification I went on a quest to look for vestiges of “universal internet classifications” in order to find the main classes of organized knowledge in the online universe.

Those of us who traffic in Dewey Decimal or Library of Congress Classification have Baconian systems of knowledge branded onto the brain. Even a brief, elementary-school exposure to the DDC frying pan has seared the main classes into LIS student brains to such an extent that they can’t easily consider alternatives. I’ve routinely asked students over the years, “If not using DDC, what would you use as the basis for main classes in a hierarchical system?” It’s not an easy question for anyone to answer because Dewey remains a cultural touchstone. During the birthing years of the Internet I happily browsed nascent efforts to use DDC or LCC to create collections of links. And I was there in the 1990s at San José’s SLIS when we all cheered the Bay Area grad who found herself on the Yahoo! classification team. Classifying all the things! on the Internet was sizzling back then. Not any more.

My quest was discouraging from the first, because Google’s abandonment of its Directory project in 2011 pointed to the complete collapse of internet classification efforts: “With search now dominating our web navigation, directories are seen primarily for their link juice value.” I was hard-pressed to find the venerable Yahoo! Directory from the Yahoo! home page filled with aggregated headlines. (Try and find it! I dare you.) (If you can’t, here’s the direct link.) When I finally discovered its location, I was pleased to see it still was an active site with new links, but there is no denying that it relies on revenue from paid links. It is more of a “yellow pages” for advertisers than a map of the open Internet. The high annual price Yahoo! commands highlights its position as the oldest and most well-known of the many directories that are now mostly vehicles for search engine optimization.

Not only is Yahoo! the most successful of the paid directories, its classification system is the model followed by every Internet directory. When you click through to see these other directories, you will almost always see 14 to 16 main classes presented in alphabetical order. The class names will be either single words or two concepts joined with &. While the classes are nearly identical to Yahoo!’s, some directories choose a few different ones that elevate commercial concepts to the top level: travel, automotive, real estate, free stuff, law. The two major not-for-profit directory projects still being curated are built on the Yahoo! model. Dmoz, the “largest human-edited directory of the web,” has mostly one-word main classes that separate out “Home,” “Sports,” and “Shopping.” (The founders of its predecessor NewHoo decided to derive their classes from Usenet newsgroups, which would suggest there was a similar inspiration in the Yahoo! founders’ minds, though it is hard to see the Usenet hierarchy as a clear predecessor.) IPL2, “information you can trust,” has fewer classes (& they are full of ampersands). It echoes Dewey’s academic focus ever so subtly: “political science,” “technology,” “economics.” I think the answer to my question about the basis for main classes outside of DDC is clear: Yahoo!’s categories, stable since its launch in 1995, show the contours of knowledge (things worth classifying) in the Internet Age.

| Yahoo! | dmoz (ODP) | IPL2 |
| --- | --- | --- |
| Arts & Humanities | Arts | Arts & Humanities |
| Business & Economy | Business | Business & Economics |
| Computer & Internet | Computers | Computers & Internet |
| Education | Games | Education |
| Entertainment | Health | Entertainment & Leisure |
| Government | Home | Health & Medical Sciences |
| Health | Kids and Teens | Law, Government & Political Science |
| News & Media | News | Reference |
| Recreation & Sports | Recreation | Regional & Country Information |
| Reference | Reference | Science & Technology |
| Regional | Regional | Social Sciences |
| Science | Science | Special Collections |
| Social Science | Shopping | |
| Society & Culture | Society | |
| | Sports | |
| | World | |
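For readers who like to check a claim such as "nearly identical" mechanically, here is a minimal sketch in Python. It assumes the three lists of main classes exactly as transcribed above. The helper names (`keywords`, `unmatched`) and the matching rule (split each label on "&", "," or "and", lowercase it, and drop one trailing "s") are my own illustrative choices, not anything the directories publish.

```python
import re

# Main classes as they appear in the three lists above (transcribed as-is).
YAHOO = [
    "Arts & Humanities", "Business & Economy", "Computer & Internet",
    "Education", "Entertainment", "Government", "Health", "News & Media",
    "Recreation & Sports", "Reference", "Regional", "Science",
    "Social Science", "Society & Culture",
]
DMOZ = [
    "Arts", "Business", "Computers", "Games", "Health", "Home",
    "Kids and Teens", "News", "Recreation", "Reference", "Regional",
    "Science", "Shopping", "Society", "Sports", "World",
]
IPL2 = [
    "Arts & Humanities", "Business & Economics", "Computers & Internet",
    "Education", "Entertainment & Leisure", "Health & Medical Sciences",
    "Law, Government & Political Science", "Reference",
    "Regional & Country Information", "Science & Technology",
    "Social Sciences", "Special Collections",
]


def keywords(class_name):
    """Split a label on '&', ',' or 'and'; lowercase and drop one trailing 's'."""
    parts = re.split(r"\s*(?:&|,|\band\b)\s*", class_name.lower())
    return {p[:-1] if p.endswith("s") else p for p in parts if p}


def unmatched(classes, reference):
    """Classes that share no keyword with any class in the reference scheme."""
    reference_keywords = set().union(*(keywords(c) for c in reference))
    return [c for c in classes if not keywords(c) & reference_keywords]


print("dmoz classes with no Yahoo! counterpart:", unmatched(DMOZ, YAHOO))
print("IPL2 classes with no Yahoo! counterpart:", unmatched(IPL2, YAHOO))
```

With the lists as transcribed here, this loose matching flags Games, Home, Kids and Teens, Shopping, and World for dmoz, and only Special Collections for IPL2. The rule is deliberately crude (it treats "Sports" as covered by "Recreation & Sports," for example), so take it as a rough check on the overlap rather than a real concordance between the schemes.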

While the top-level categories in Internet directories are stable, the subcategories churn around unpredictably and proliferate madly. Clicking on “Arts” in any of the three major directories is an adventure because the concept is defined differently in each. In Yahoo! we find Photography, History, Literature … (in 1999 the subcategories were Literature, Photography …); in dmoz we are led to Movies, Television, Music …; and in IPL2 we choose amongst Fine Arts, History, Literature, Philosophy, more>>. The “Business” class shows more clearly the different emphases of the directories. B2B, Finance, Shopping, Jobs for Yahoo! looks like the Yellow Pages again. Jobs, Real Estate, Investing in dmoz beckon to people who want to improve their economic standing, but really the bulk of the links in the class are for company sites, with nearly 20,000 in “Industrial Goods and Services” (aka B2B). For IPL2, the reference function is obvious: Accounting, Economics, Employment, Tax. IPL2 has good information about job seeking, but no links to the major job listing sites like monster.com. There are thousands of LIS research papers waiting to be written about these subcategories, the types of links chosen, and what that means for users of these three directories.

Because the Yahoo-like classifications are not published outside of the directory sites themselves, it is hard to fully understand the hierarchy below the class level. Understanding the hierarchy is made more difficult because of cross-reference links (preceded by the @ sign) mixed into the list of subcategories (always alphabetical). One thing is for sure: the editors of these directories do not shy away from creating close classification. On the dmoz home page, the project brags—“5,114,083 sites … over 1,014,849 categories.” One million categories for only five million sites! A lesser paid directory with fourteen main classes boasts 7832 subcategories, but only 1945 links! This suggests that the editors of these schemes had a grand time developing a close, universal classification scheme, but do not have the resources (either human or content) with which to fill it. In Yahoo! a significant percentage of its innumerable subcategories are those that refer to a single person or television character. Hierarchies such as Entertainment > Actors > Forbes, Michelle and Entertainment > Television Shows > Science Fiction and Fantasy > Star Trek > Star Trek: The Next Generation > Characters > Ro Laren are not based on the model of generic topics that have governed classification of library collections with DDC. Again, bring on the researchers to provide some understanding of the conceptual underpinnings of these systems!
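The arithmetic behind those exclamation points is easy to spell out. The following Python sketch uses only the figures and the two hierarchy paths quoted in the paragraph above. The function names are hypothetical, and because dmoz says "over" 1,014,849 categories, its ratio should be read as an upper bound.

```python
# Figures quoted in the paragraph above; the labels are illustrative only.
directories = {
    "dmoz": {"sites": 5_114_083, "categories": 1_014_849},
    "lesser paid directory": {"sites": 1_945, "categories": 7_832},
}


def sites_per_category(sites, categories):
    """Average number of listed sites (links) per category."""
    return sites / categories


def depth(path, separator=" > "):
    """Number of levels in a breadcrumb-style hierarchy path."""
    return len(path.split(separator))


for name, counts in directories.items():
    ratio = sites_per_category(counts["sites"], counts["categories"])
    print(f"{name}: {ratio:.2f} sites per category")

paths = [
    "Entertainment > Actors > Forbes, Michelle",
    "Entertainment > Television Shows > Science Fiction and Fantasy > "
    "Star Trek > Star Trek: The Next Generation > Characters > Ro Laren",
]
for path in paths:
    print(depth(path), "levels:", path)
```

That works out to roughly five sites per category for dmoz and about a quarter of a link per subcategory for the lesser paid directory, which means most of its subcategories must be empty. The two Yahoo! paths are three and seven levels deep, respectively.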

As a final point, Internet classification systems for kids deserve a mention, though I won’t compare them in detail. I would rate the dmoz “Kids and Teens” all-text subcategories (14 of them) as not very in tune with children’s browsing needs: would a child know to look for “Pets” under “Your Family”? The dmoz categories seem to be set up for adults browsing for resources to share with children; a search for “dinosaurs” leads to forty-five deep (six- or seven-level) hierarchies that are pretty intimidating chunks of text. So I don’t think that dmoz has made a kid-centered Internet knowledge organization that could be transferred to other web applications. Yahoo! Kids Directory has only six main classes (compacted from the eight in the original Yahooligans!), and it seems welcoming to a browsing kid, even though the subcategories are as numerous and unpredictable as those in the adult hierarchy. The resources themselves, mostly interesting, high quality, and non-commercial, are not being scrupulously maintained (about half of the dinosaur links were broken or redirects), which suggests that the subcategories also date from long ago; their history could be traced in the Wayback Machine. The nine classes of kids’ knowledge at IPL2 are enhanced by a sidebar with direct links to homework-helping categories about Presidents, States, and Science Fair, with “Resources for Parents and Teachers” broken out so they don’t interfere with kids’ browsing. What is most interesting about it is that the subcategories displayed on the main page seem chosen because of their importance to kids: “Football” and “Dance” under “Sports & Recreation,” or “Animals” as the first under “Math & Science.” (The dinosaur links all work.) The only other currently maintained kids’ classification that I found in my research was Awesome Library, which emphasizes school subjects and homework help.

While Internet directories and their classification systems are mature tools, they are not flourishing in 2013. Except for the IPL2 directory, which is curated by LIS students, the directories are a marginal enterprise in the organization of knowledge on today’s Internet. That has implications for those of you who are evaluating them for use in library settings, but for me the continuing question is how well the Yahoo!-like classification structures reflect knowledge as we experience it online. These few survivors of the Internet directory era continue to provide rich food for thought with regard to applied classification.

About Cheryl Boettcher Tarsala

author, researcher, educator in the realms of cataloging, bibliography, and authorship