Extracting real content from the glut of online information
Filtering information from the Internet and proprietary networks can be
difficult and time-consuming. This is increasingly so for those working with the large
body of published scientific and technical information that supports research and
development. The growing online news and information service market adds to the
confusion because of the sheer number of online newspapers, news agencies and search engines.
To counter this problem, the TREVI project has developed a next-generation technology
that helps bridge the gap between the quantity of information available and the time users have to read it.
Current approaches to harnessing the information surplus do little more than index
keywords. TREVI, in contrast, extracts and presents the concepts and relationships contained
within any collection of documents, thus raising productivity without increasing
The TREVI system includes a new text extraction and classification processor based on
robust linguistic capabilities for syntactic, morphological and semantic processing of raw
text. The resulting analysis detects relevant entities (e.g. names of people, locations and
companies) and events in texts, which are then used for intelligent text classification.
The system is open and object-oriented, and follows an agent-based paradigm for text enrichment.
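To make the agent-based enrichment idea concrete, here is a minimal sketch, assuming that each "agent" is simply an object that reads a document and attaches annotations to it, and that agents run one after another. A small name lookup table (gazetteer) and a list of event trigger words stand in for TREVI's syntactic, morphological and semantic analysis, which this sketch does not attempt. The names `Document`, `EntityAgent`, `EventAgent` and `run_pipeline`, and all sample data, are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Raw text plus the annotations that enrichment agents attach to it."""
    text: str
    # Each annotation is (type, value, character offset of the match).
    annotations: list = field(default_factory=list)


class EntityAgent:
    """Hypothetical agent: tags known names from a small lookup table.

    A dictionary lookup stands in for real linguistic analysis.
    """

    def __init__(self, gazetteer):
        self.gazetteer = gazetteer  # surface form -> entity type

    def enrich(self, doc):
        for name, entity_type in self.gazetteer.items():
            # Word boundaries stop "Acme" from matching inside "Acmeville".
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", doc.text):
                doc.annotations.append((entity_type, name, m.start()))
        return doc


class EventAgent:
    """Hypothetical agent: tags event trigger words, ignoring case."""

    def __init__(self, triggers):
        self.triggers = {word.lower(): label for word, label in triggers.items()}

    def enrich(self, doc):
        for m in re.finditer(r"\w+", doc.text):
            label = self.triggers.get(m.group().lower())
            if label is not None:
                doc.annotations.append(("EVENT", label, m.start()))
        return doc


def run_pipeline(text, agents):
    """Pass a document through each agent in turn; return annotations in text order."""
    doc = Document(text)
    for agent in agents:
        doc = agent.enrich(doc)
    return sorted(doc.annotations, key=lambda a: a[2])


if __name__ == "__main__":
    agents = [
        EntityAgent({"Maria Lopez": "PERSON", "Madrid": "LOCATION",
                     "Acme Shipping": "COMPANY"}),
        EventAgent({"acquires": "ACQUISITION", "merger": "MERGER"}),
    ]
    text = "Acme Shipping acquires a rival in Madrid, says Maria Lopez."
    for annotation in run_pipeline(text, agents):
        print(annotation)
```

Because every agent shares the same `enrich` interface, new analysis steps can be added to the list without changing the others, which is one plausible reading of an "open" agent-based design.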
TREVI focuses on the converging information market (information providers and brokers,
information services based on databases), especially news agencies. It also
addresses the growing meta-information market on the Internet (search engines, portals,
community sites, etc.).
The demonstration shows the stand-alone version of the TREVI prototype, which
categorises information with respect to predefined classes, stores natural language texts
(English and Spanish) and then publishes them in a suitable framework. The input texts
come from a press agency, a financial information provider, a shipping company and an
online medical information provider.
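The categorisation step can be illustrated with a deliberately simple stand-in: a weighted cue-term scorer with one profile per language, assigning each text to the best-scoring predefined class or leaving it unclassified below a threshold. This is not TREVI's classifier. The class names (loosely echoing the financial, shipping and medical sources above), the cue terms, their weights, the `classify` function and the `min_score` threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical predefined classes with weighted cue terms per language.
# Class names, terms and weights are illustrative only.
PROFILES = {
    "en": {
        "finance": {"shares": 2.0, "profit": 2.0, "market": 1.0},
        "shipping": {"vessel": 2.0, "port": 2.0, "cargo": 1.5},
        "health": {"patients": 2.0, "treatment": 2.0, "clinical": 1.5},
    },
    "es": {
        "finance": {"acciones": 2.0, "beneficio": 2.0, "mercado": 1.0},
        "shipping": {"buque": 2.0, "puerto": 2.0, "carga": 1.5},
        "health": {"pacientes": 2.0, "tratamiento": 2.0, "clínico": 1.5},
    },
}


def classify(text, lang, min_score=2.0):
    """Return (best class, score), or ("unclassified", score) below the threshold."""
    # Lower-cased word counts; \w+ also matches accented Spanish letters.
    counts = Counter(w.lower() for w in re.findall(r"\w+", text))
    scores = {
        label: sum(weight * counts[term] for term, weight in terms.items())
        for label, terms in PROFILES[lang].items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return "unclassified", scores[best]
    return best, scores[best]


if __name__ == "__main__":
    print(classify("The vessel left port with its cargo.", "en"))
    print(classify("El buque llegó al puerto.", "es"))
    print(classify("Nothing relevant here.", "en"))
```

Keeping a separate profile per language lets the same scoring code serve both English and Spanish input; a real system would derive its class evidence from the linguistic analysis (entities, events) rather than from raw word counts.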