The PhilArchive Categorization Project

A central feature of PhilArchive is its categorization system, which files papers under a hierarchy of categories. For this purpose, we have developed an extensive, if preliminary, taxonomy of philosophical areas, along with a number of tools for working with the categorization system. Details follow below.

The PhilArchive Taxonomy

DESCRIPTION OF THE TAXONOMY USED

The levels are as follows:

Because papers can be categorized under more than one category, there is a certain amount of cross-classification among these categories. For example, papers in the history of philosophy will often fall under both a historical category and a topical category. SOME MORE DESCRIPTIONS

Categorization Tools

Categorization of papers and books within PhilArchive is an ongoing project. ...

The fine-grained categorization tool. This tool enables fine-grained categorization of any entry. It is available by pressing "categorize" under an entry, if you are signed in. Using this tool you can classify an entry in up to three fine-grained categories. You can find a category either by using the search box or by proceeding through the hierarchy by opening folders in turn. You can repeat this process for up to three categories, clicking on a category to add it to a paper, and clicking the red mark next to a category to remove it. You can also categorize multiple entries simultaneously by choosing multiple-entry mode. This mode is especially useful for populating categories quickly by using search tools.

The iterative categorization tool. This tool enables quick categorization of entries into immediate subcategories, allowing further subcategorization by people with expertise in those areas. It is available in the area pages under the "Browse by area" menu. Here, the "Uncategorized Material" page contains entries that have not yet been categorized at all, while the area page for each nonleaf category lists the entries in that category that have not yet been categorized under a leaf category. Each entry is followed by a set of links for classifying the entry under a lower-level category (two levels lower for uncategorized material, one level lower for other nonleaf areas). Clicking on a link will place the entry under the relevant subcategory. You can repeat this for up to three subcategories, then click "remove" (on a category page) or "done with this one" (on the uncategorized material page).
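The choice of links described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy taxonomy, the "ROOT" key standing in for the uncategorized material page, and the names subcategories, link_targets, and classify are all hypothetical and are not PhilArchive's actual data or code.

```python
# Hypothetical toy taxonomy: each key maps a category to its immediate subcategories.
# "ROOT" stands in for the top of the hierarchy (the uncategorized material page).
TAXONOMY = {
    "ROOT": ["Cluster A", "Cluster B"],
    "Cluster A": ["Area A1", "Area A2"],
    "Cluster B": ["Area B1"],
    "Area A1": ["Leaf A1x", "Leaf A1y"],
}

MAX_SUBCATEGORIES = 3  # limit stated in the text


def subcategories(category):
    """Immediate subcategories of a category (empty for a leaf)."""
    return TAXONOMY.get(category, [])


def link_targets(page):
    """Categories offered as links next to each entry on a given page."""
    if page == "ROOT":
        # Uncategorized material: offer categories two levels down.
        return [area for cluster in subcategories(page) for area in subcategories(cluster)]
    # Any other nonleaf page: offer categories one level down.
    return subcategories(page)


def classify(chosen, page, target):
    """Return the subcategories chosen so far on this page, with target added."""
    if target not in link_targets(page):
        raise ValueError(f"{target!r} is not offered on the page for {page!r}")
    if target in chosen:
        return list(chosen)
    if len(chosen) >= MAX_SUBCATEGORIES:
        raise ValueError("an entry can be placed in at most three subcategories")
    return list(chosen) + [target]


print(link_targets("ROOT"))     # ['Area A1', 'Area A2', 'Area B1']
print(link_targets("Area A1"))  # ['Leaf A1x', 'Leaf A1y']
```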

The direct categorization tool. This tool enables users to add papers to a category directly, whether or not an entry for that paper is currently displayed. It is available in a box at the top of every category page. Simply enter the author's surname and the first few words of the title into the box. If the paper is in the PhilArchive database, it will appear, and you can select it to add it directly to the category.

The three tools are complementary. The fine-grained tool is the most powerful but the slowest to use. The iterative tool is less powerful, because it performs only coarse-grained categorization, but it is quicker and is easy to use for repeated categorization. The direct categorization tool provides more flexible coverage of papers. We hope that the presence of all three tools will enable faster progress on the categorization project than would be possible with any of them alone.

We encourage users to use these tools. Please use them only if you have relevant expertise: typically a Ph.D. in Philosophy or graduate work in a relevant area. If you do have this expertise, categorization of as many papers as possible, especially within your areas of expertise, will be much appreciated! This process will make the category system much more useful and comprehensive.

Of course, users will sometimes have different ideas about categorization. If you see what you think is a mistake in categorization, feel free to undo it (though you should examine the paper in question first) and replace it with a more appropriate category. Cases like this will be flagged for the editors' attention and we will eventually adjudicate.

Automatic Categorization

At the moment, PhilArchive uses a limited amount of automatic categorization. First, many journals are associated with a specific area, and every paper in that journal is filed under that area. Second, books are frequently filed under a category corresponding to their Library of Congress call number. Third, we have some automatic filters for classifying entries under areas according to the occurrence of certain words in their titles. All of these processes are imperfect. Entries are most frequently assigned to nonleaf categories, so that they will need to be further assigned to leaf categories. Often an entry will be assigned a single category automatically but will also belong under further categories that need to be assigned manually. In some cases, entries will be miscategorized entirely. Users are encouraged to look out for these imperfections and to correct them by manual categorization. We plan to eventually add more sophisticated automatic categorization tools. These tools will probably require a database of already-classified items to serve as a training set, however, so manual categorization will play a vital role in any case.
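The three rule types described above (journal, call number, title words) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the rule tables, the journal name, the category names, and the names Entry and auto_categorize are hypothetical, and the actual rules used by PhilArchive are not given here. The two call-number prefixes follow the Library of Congress Classification, in which BC covers logic and BJ covers ethics.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule tables, for illustration only.
JOURNAL_AREAS = {"Journal of Example Ethics": "Ethics"}
CALL_NUMBER_AREAS = {"BC": "Logic", "BJ": "Ethics"}  # Library of Congress class prefixes
TITLE_KEYWORDS = {"consciousness": "Philosophy of Mind"}


@dataclass
class Entry:
    title: str
    journal: Optional[str] = None
    call_number: Optional[str] = None


def auto_categorize(entry):
    """Return the set of areas suggested by the three kinds of rule."""
    suggested = set()
    # Rule 1: a journal tied to a specific area files all its papers there.
    if entry.journal in JOURNAL_AREAS:
        suggested.add(JOURNAL_AREAS[entry.journal])
    # Rule 2: books are filed by the prefix of their call number.
    if entry.call_number:
        for prefix, area in CALL_NUMBER_AREAS.items():
            if entry.call_number.upper().startswith(prefix):
                suggested.add(area)
    # Rule 3: certain words occurring in the title trigger an area.
    title = entry.title.lower()
    for word, area in TITLE_KEYWORDS.items():
        if word in title:
            suggested.add(area)
    return suggested


book = Entry(title="Consciousness and Moral Worth", call_number="BJ1012 .E9")
print(sorted(auto_categorize(book)))  # ['Ethics', 'Philosophy of Mind']
```

A sketch like this also shows why the results need manual follow-up: the rules only ever suggest broad areas, and a title word can easily match a paper that belongs elsewhere.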

The Use of Categories

Categories are used at a number of places on PhilArchive.

First, users have the option to automatically display the categories currently associated with a given entry, by checking the "Display categories" box in the right column of most pages containing entries.

Second, users can browse categories by using the "Browse by area" menu. The menu itself leads to pages for clusters or areas (for now, putting the full category system in the menu is impractical due to memory usage and speed). The page for a nonleaf category displays the subcategories of that category in the left column, with an item count for each (either [n] or [n/m], where n is the number of items under that category, and m is the number of items in that category that await further subcategorization). Deeper subcategories can be opened by pressing "+". Clicking on a subcategory will take you to the page for that subcategory.
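As a rough illustration of the item counts, the following Python sketch computes an [n] or [n/m] label for a small category tree. It rests on assumptions that the text does not confirm: that n counts each item once across a category and all of its subcategories, that an item awaits subcategorization when it is filed directly in a nonleaf category and in none of that category's subcategories, and that leaf categories show [n] while nonleaf categories show [n/m]. The Category class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    children: list = field(default_factory=list)
    entries: set = field(default_factory=set)  # ids of entries filed directly here

    def all_entries(self):
        """Entries filed here or anywhere below, each counted once."""
        found = set(self.entries)
        for child in self.children:
            found |= child.all_entries()
        return found

    def awaiting(self):
        """Entries filed directly here that appear in no subcategory."""
        below = set()
        for child in self.children:
            below |= child.all_entries()
        return self.entries - below

    def count_label(self):
        n = len(self.all_entries())
        if not self.children:  # leaf category
            return f"[{n}]"
        return f"[{n}/{len(self.awaiting())}]"


leaf_one = Category("Example Leaf 1", entries={"p1", "p2"})
leaf_two = Category("Example Leaf 2", entries={"p2", "p3"})
area = Category("Example Area", children=[leaf_one, leaf_two], entries={"p3", "p4"})

print(area.count_label())      # [4/1]
print(leaf_one.count_label())  # [2]
```

Sets are used so that a cross-classified entry such as p2, which sits in both leaves, is counted only once in the total for the area.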

For every category, the right-hand column will contain a list of papers under that category. For nonleaf categories, these will be papers awaiting further subcategorization. For leaf categories, these will be all the papers falling under that category. Our hope is that these lists will eventually constitute comprehensive bibliographies for all sorts of areas of Philosophy.

In addition, the page for every category contains links to the discussion forum for the area associated with that category, to user-contributed bibliographies in that area, and to a list of users (with publicly available profiles) who have listed that area as an area of interest.

Third, every area (such as XXXXXXXXXXXXXXXXXXXXXX and so on) has an associated discussion forum, available via the "Forums" page. This discussion forum contains discussions of papers that fall under that category, initiated via the "Discuss" link under a paper (note that when the areas associated with a paper change, the associated discussion forums will change correspondingly). The forum also contains other discussions relevant to that area, initiated via the "Forums" page. There are also aggregated forums for each cluster (produced by aggregating the area forums), and for all clusters at once.

Fourth, every user can choose up to ten areas as their areas of interest. At the moment, users who choose such areas can (i) optionally filter any list of papers using those areas, (ii) optionally receive e-mail alerts for new items in those areas, (iii) be listed on the page of users associated with that area, and (iv) receive information about forums in those areas on their profile page.
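The filtering option in (i) can be sketched as follows. This is a minimal illustration, not PhilArchive's implementation: the function names choose_areas and filter_by_interest, the dictionary layout of a paper, and the area names are all hypothetical, and the sketch assumes that a paper passes the filter when it is filed under at least one of the user's areas.

```python
MAX_AREAS_OF_INTEREST = 10  # limit stated in the text


def choose_areas(areas):
    """Validate a user's areas of interest, dropping duplicates but keeping order."""
    unique = list(dict.fromkeys(areas))
    if len(unique) > MAX_AREAS_OF_INTEREST:
        raise ValueError(f"at most {MAX_AREAS_OF_INTEREST} areas of interest are allowed")
    return unique


def filter_by_interest(papers, areas_of_interest):
    """Keep only papers filed under at least one of the user's areas."""
    wanted = set(areas_of_interest)
    return [paper for paper in papers if wanted & set(paper["areas"])]


papers = [
    {"title": "Paper one", "areas": ["Area A1"]},
    {"title": "Paper two", "areas": ["Area B1", "Area A2"]},
    {"title": "Paper three", "areas": ["Area C1"]},
]
interests = choose_areas(["Area A2", "Area C1", "Area A2"])
matching = filter_by_interest(papers, interests)
print([paper["title"] for paper in matching])  # ['Paper two', 'Paper three']
```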

Once again, all feedback regarding the category system is welcome at the PhilArchive Categorization Project discussion forum.