Weighting: Inverse Host Frequency (IHF)
Not all URLs are equally informative
e.g., URLs from aggregator services
Desired weighting scheme
Low weights to aggregator web sites
High weights to personal and group publication pages
Inverse Host Frequency (IHF)
Similar to Inverse Document Frequency (IDF) in information retrieval
Consider citations of the top 100 authors in DBLP (by number of citations)
For each such citation, query a search engine with its title to obtain URLs, then truncate them to their hostnames
If a hostname h has frequency f(h), then its IHF is