Weighting: Inverse Host Frequency (IHF)
Observation
Not all URLs are equally useful
e.g., URLs of aggregator services
Desired weighting scheme
Low weights for aggregator web sites
High weights for personal and group publication pages
Inverse Host Frequency (IHF)
Similar to Inverse Document Frequency (IDF) in information retrieval
Consider the citations of the top 100 authors in DBLP (ranked by number of citations)
For each such citation, query a search engine with its title to obtain URLs, then truncate the URLs to their hostnames
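A minimal Python sketch of the hostname-collection step, assuming the search-engine results are already available as a mapping from citation title to the URLs returned for it (the querying itself is not shown). The function names `hostname_of` and `host_frequencies` and all example titles and URLs are hypothetical. The sketch also assumes that f(h) counts how many citations have host h among their results, at most once per citation, by analogy with document frequency in IDF.

```python
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def hostname_of(url: str) -> Optional[str]:
    """Truncate a URL to its hostname (lowercased, port dropped); None if absent."""
    return urlsplit(url).hostname


def host_frequencies(results_per_citation: Dict[str, List[str]]) -> Counter:
    """Compute f(h) for every hostname h seen in the search results.

    Assumption of this sketch: each host is counted at most once per citation,
    like document frequency in IDF.
    """
    freq: Counter = Counter()
    for urls in results_per_citation.values():
        # Deduplicate hosts within one citation's results; skip URLs with no host.
        hosts = {h for h in (hostname_of(u) for u in urls) if h}
        freq.update(hosts)
    return freq


# Hypothetical search results for two citation titles.
results = {
    "Paper A": [
        "http://aggregator.example.org/rec/1",
        "http://www.cs.example.edu/~alice/pubs.html",
        "http://aggregator.example.org/rec/2",
    ],
    "Paper B": [
        "http://aggregator.example.org/rec/3",
        "http://mirror.example.net/b.pdf",
    ],
}
freq = host_frequencies(results)
print(freq["aggregator.example.org"], freq["www.cs.example.edu"])
```

Here the aggregator host appears in the results of both citations, so it gets the highest frequency, which is exactly the kind of host the weighting scheme is meant to down-weight.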
If a hostname h has frequency f(h), then its IHF is