By Alison Balmat
Kitchen appliances.
Type words in little white box. Click "Search." Wait. See results
pop up. Click on first link. Skim webpage. Log off.
"That's
how searching the World Wide Web seems to work for most people,"
explains Purvi Shah. "They log on, enter a query or two, scan a
web page, and log off."
Shah is one of four undergraduate students working with Amanda
Spink, an associate professor in Penn State's School of Information
Sciences and Technology. Using data from Excite!,
they are studying people's behavior on the web in order to develop
more effective search engines.
Say you really wanted an electric can opener, the kind that
is cordless, lightweight, and opens 20 cans on a single charge.
Says Spink, "Can openers might not even pop up on that first page
of search results." You'll only get convection ovens, waffle irons,
and stainless steel refrigerators unless you search further, but
most people give up after "kitchen appliances."
Very few queries incorporate the search engine's advanced features,
options that allow for a more specific and accurate search,
adds Michelle Sollenberger, another undergraduate working
with Spink. For example, fewer than five percent of all queries use
Boolean operators (words such as "AND" and "OR," which narrow
or broaden a search), and mistakes, such as failing to capitalize
the operator, are common. The "+" and "-" modifiers, which specify
a certain term to include or exclude in a search, are rarely seen,
and using quotation marks to create phrases is a technique absent
in most queries.
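To make those features concrete, here is a minimal Python sketch of how a log analyst might flag them in a single query. The function name `query_features` and the matching rules are assumptions made for illustration; they are not Excite's actual query syntax or the team's method.

```python
import re

# Hypothetical rules for spotting advanced features in a query string.
# These patterns are illustrative assumptions, not Excite's real parser.
BOOLEAN_UPPER = re.compile(r"\b(AND|OR)\b")     # correctly capitalized operators
BOOLEAN_LOWER = re.compile(r"\b(and|or)\b")     # possibly a mis-capitalized operator
PLUS_MINUS = re.compile(r"(?:^|\s)[+-]\w")      # +term or -term at the start of a word
PHRASE = re.compile(r'"[^"]+"')                 # "quoted phrase"


def query_features(query: str) -> dict:
    """Report which advanced features a single query appears to use."""
    return {
        "boolean": bool(BOOLEAN_UPPER.search(query)),
        "lowercase_operator": bool(BOOLEAN_LOWER.search(query)),
        "plus_minus": bool(PLUS_MINUS.search(query)),
        "phrase": bool(PHRASE.search(query)),
    }
```

A lowercase "and" may simply be an ordinary word, so the `lowercase_operator` flag marks a query worth a closer look and does not prove a mistake.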
"When you search," explains Sollenberger, "using the advanced features
will break the query up into smaller pieces and the results will
be more specific." But when users try to take advantage of these
features, which is rare, they tend to use them incorrectly.
People use symbols such as ":" or "&" to separate terms, as you
might in everyday writing, yet the Excite! search engine
cannot recognize them. Stephanie Milchak, another of Spink's students,
explains that finding patterns of mistakes like these is vital to
improving the search engines. "Engine designers want to know exactly
what users do," she says. "Our results could lead to a new generation
of web-searching tools that work with people."
But first, the data must be scrutinized. Excite! recently
compiled 30 billion queries to analyze and "happily gave me a chunk
of that data to play around with," laughs Spink.
Undergraduate Darcy Comstock, for instance, has a stack of papers
several inches high, each page filled with 12-digit numbers
(anonymous user-identification numbers) and a list of every
word for which that user searched. She is looking at the number
of queries each user enters, the number of words per query, and
how the queries change (if the user adds or subtracts words) during
the session.
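The three measures Comstock describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log is taken to be a list of (user ID, query) pairs in time order, and the function name `session_stats`, the sample user ID, and the sample queries are all invented here.

```python
from collections import defaultdict


def session_stats(log):
    """Summarize each user's session from (user_id, query) pairs in time order."""
    sessions = defaultdict(list)
    for user_id, query in log:
        sessions[user_id].append(query.lower().split())

    stats = {}
    for user_id, queries in sessions.items():
        changes = []
        # Compare each query with the one before it to see which words changed.
        for prev, curr in zip(queries, queries[1:]):
            changes.append({
                "added": sorted(set(curr) - set(prev)),
                "removed": sorted(set(prev) - set(curr)),
            })
        stats[user_id] = {
            "queries": len(queries),
            "words_per_query": sum(len(q) for q in queries) / len(queries),
            "changes": changes,
        }
    return stats


# Made-up sample log; the user ID and queries are invented for illustration.
log = [
    ("000000000001", "kitchen appliances"),
    ("000000000001", "kitchen appliances can opener"),
    ("000000000001", "cordless can opener"),
]
```

Calling `session_stats(log)` on the sample shows one user who entered three queries, first adding "can" and "opener," then swapping "kitchen" and "appliances" for "cordless."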
"The search engine actually records each individual letter that
is entered and stores it all." Adds Comstock: "The web companies
are processing more than 30 million queries per day; that's a whole
lot of data to tabulate."
So far, Spink and her students have found that, on average, people
type 2.5 words into the little white box. More than half of these
words are proper names or slang terms for which the search engines
often cannot find exact matches. A small number of words,
about 75, are repeated frequently. "There's lots of 'ands,'
'ofs,' and 'thes,' but we also see 'sex,' 'free,' 'nude,' 'university,'
and 'music' a lot," Shah says.
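Figures like the average query length and the list of most-repeated words come from simple counting. A rough Python sketch follows, assuming queries are plain strings split on spaces; the helper names `term_counts` and `mean_query_length` are hypothetical.

```python
from collections import Counter


def term_counts(queries):
    """Count how often each word appears across all queries."""
    counts = Counter()
    for query in queries:
        counts.update(query.lower().split())
    return counts


def mean_query_length(queries):
    """Average number of words typed per query."""
    return sum(len(query.split()) for query in queries) / len(queries)
```

From there, `term_counts(queries).most_common(75)` would list the 75 most frequent words in whatever sample is supplied.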
Shah, Comstock, and Milchak are categorizing each word (as
entertainment, sex, or travel, for example) and will then
look within each category for patterns showing how users search
for information.
Meanwhile, Sollenberger is tallying spelling errors, work
that will eventually, Spink hopes, lead to the development of a
dictionary that can automatically correct common mistakes.
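One way such a dictionary might be assembled from a tally of errors is sketched below in Python. It is a guess at the general idea, not the team's method: the tiny `VOCABULARY`, the similarity cutoff, and the function names are all assumptions, and the matching relies on the standard library's `difflib.get_close_matches`.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical list of correctly spelled words; a real one would be far larger.
VOCABULARY = ["appliances", "kitchen", "opener", "refrigerator", "university"]


def tally_corrections(queries, vocabulary=VOCABULARY, cutoff=0.8):
    """Count (misspelling, likely intended word) pairs seen in the queries."""
    tally = Counter()
    for query in queries:
        for word in query.lower().split():
            if word in vocabulary:
                continue  # spelled correctly, nothing to tally
            matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
            if matches:
                tally[(word, matches[0])] += 1
    return tally


def correction_table(tally, min_count=2):
    """Keep only mistakes seen at least min_count times, as wrong -> right."""
    return {wrong: right
            for (wrong, right), count in tally.items()
            if count >= min_count}
```

Requiring a mistake to recur before it enters the table is a design choice made here so that the dictionary covers common errors only and skips one-off typos.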
"The number of queries posed on the web is huge, but searching
isn't giving people the results that it potentially can," Spink
says. Search engines can be tricky. Says Spink, "We want users to
persevere and find the best answers to their problems." Even if
that "problem" is just finding a fancy can opener.
Amanda Spink, Ph.D., is associate professor in the School of
Information Sciences and Technology, 511 Rider Bldg., University
Park, PA 16802; 814-865-4454; spink@ist.psu.edu.
Her research is funded by the National Science Foundation, NEC,
IBM, and Excite!. Darcy Comstock, Michelle Sollenberger,
and Purvi Shah are information sciences and technology majors. Stephanie
Milchak is a computer engineering major in the College of Engineering.
All four are participating in the Women in Science and Engineering
Research (WISER) program, for which first-year women students receive
credit their first semester in the lab and payment the second semester.
WISER is administered by the Pennsylvania Space Grant Consortium
and funded by the College of Agricultural Sciences, the College
of Earth and Mineral Sciences, the College of Engineering, the Eberly
College of Science, EOPC, Lockheed Martin Services Group, NASA,
and the National Science Foundation. Visit WISER at www.psu.edu/spacegrant/wiser.
Writer Alison Balmat will graduate in May 2002 with a B.A. in French
and geography, with honors in geography. Illustrator Livio Ramondelli
is an undergraduate majoring in visual arts.