Keynote
Challenges for Dataset Search
As shared scientific archives grow in size and variety, ranked search over their datasets has become a
pressing need. Our own investigations have shown that IR-style, feature-based relevance scoring can be an
effective tool for data discovery in scientific archives. However, maintaining interactive response
times as archives scale will be a challenge. We report here on our exploration of performance
techniques for Data Near Here, a dataset search service. We present a sample of results evaluating
filter-restart techniques in our system, including two variations: adaptive relaxation
and contraction. We then outline further directions for research in this domain.
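The abstract does not describe how filter-restart works inside Data Near Here. The following is a generic illustration only: a minimal Python sketch of filter-restart top-k search built on assumptions of our own. Each dataset is reduced to one hypothetical numeric feature (`center`), relevance is `1 / (1 + distance)` to a query target, "relaxation" is read as widening the filter radius and restarting when too few candidates pass, and "contraction" is read as shrinking the starting radius for later queries when the filter admitted too many. All names (`Dataset`, `FilterRestartSearch`, `start_radius`, `grow`, `shrink`, `max_candidates`) are hypothetical, and this is not the system's actual code.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    center: float  # hypothetical single feature summarizing the dataset


def score(ds: Dataset, target: float) -> float:
    # Relevance falls as the dataset's feature moves away from the query target.
    return 1.0 / (1.0 + abs(ds.center - target))


class FilterRestartSearch:
    """Sketch of filter-restart top-k search (assumed design, not Data Near Here's)."""

    def __init__(self, datasets, start_radius=1.0, grow=2.0, shrink=0.5,
                 max_candidates=50):
        assert start_radius > 0 and grow > 1.0 and 0.0 < shrink < 1.0
        self.datasets = list(datasets)
        self.start_radius = start_radius
        self.grow = grow
        self.shrink = shrink
        self.max_candidates = max_candidates

    def top_k(self, target: float, k: int):
        k = min(k, len(self.datasets))
        radius = self.start_radius
        restarts = 0
        while True:
            # Filter: keep only datasets within `radius` of the target. In a real
            # system this would be a cheap index range query, not a full scan.
            candidates = [d for d in self.datasets
                          if abs(d.center - target) <= radius]
            if len(candidates) >= k:
                break
            # Relaxation: too few survivors, so widen the filter and restart.
            radius *= self.grow
            restarts += 1
        # Exactness: every candidate scores >= 1/(1+radius) and every excluded
        # dataset scores strictly less, so the candidates' top-k is the true top-k.
        best = heapq.nlargest(k, candidates, key=lambda d: score(d, target))
        # Contraction: if the filter let through too many datasets (wasted scoring
        # work), start the next query from a tighter radius; otherwise reuse the
        # radius that just succeeded.
        if len(candidates) > self.max_candidates:
            self.start_radius = radius * self.shrink
        else:
            self.start_radius = radius
        return best, restarts
```

For example, with ten datasets whose centers are 0 through 9, a query for the top 3 near 4.2 starting from radius 0.5 needs two restarts (radius 0.5, then 1.0, then 2.0) before at least three datasets pass the filter, and then returns `d4`, `d5`, `d3`. The design trade-off is that a small starting radius keeps scoring cheap but risks restarts, while a large one avoids restarts but scores many irrelevant datasets; the adaptive start radius is one simple way to balance the two.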