Text Mining - extracting knowledge from text

Extracting Knowledge from the Chaos of Information - an article from PC WEEK
Anyone reading publications on document processing will notice that the term "knowledge management" (KM), popular two or three years ago, now appears much less often.
ThoughtTreasure - the coolest site, with an excellent description and source code
ThoughtTreasure is a commonsense knowledge base and architecture for natural language processing that uses multiple representations including logic, finite automata, grids, and scripts. The ThoughtTreasure knowledge base consists of:

35,000 English words and phrases,
21,000 French words and phrases,
27,000 concepts, and
51,000 commonsense assertions about those concepts.
The ThoughtTreasure architecture consists of:
a text agency for tagging words, phrases, and named entities in text,
a syntactic component for producing syntactic parse trees,
a semantic component for producing surface-level semantic parses and resolving anaphora,
a generator for converting assertions into English and French,
a planning agency for achieving goals in a simulated world, and
an understanding agency for producing an in-depth understanding of a text.
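The text agency's job of tagging words and multi-word phrases can be illustrated with greedy longest-match lookup against a lexicon. The sketch below is hypothetical: the lexicon entries and concept labels are invented, and ThoughtTreasure's real tagger is considerably richer (it also handles inflection and named entities). Tokens are split on whitespace only, so punctuation is not handled.

```python
# Hypothetical mini-lexicon: phrase (tuple of lowercase tokens) -> concept label.
# Labels are invented for illustration and are not ThoughtTreasure identifiers.
LEXICON = {
    ("soda",): "drink",
    ("phone", "call"): "phone-call",
    ("hang", "up"): "hang-up",
    ("new", "york", "city"): "city",
}
MAX_LEN = max(len(k) for k in LEXICON)


def tag(text):
    """Greedy longest-match tagging: at each position, try the longest phrase first."""
    tokens = text.lower().split()
    tags = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in LEXICON:
                tags.append((" ".join(phrase), LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token; skip it
    return tags


print(tag("She drank soda in New York City before the phone call"))
```

Trying longer phrases first ensures that "new york city" is tagged as one unit instead of three separate words.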
Some pieces of common sense in ThoughtTreasure include:

Soda is a drink.
People have necks.
Excellent food is called quality food.
A play lasts about two hours.
One hangs up at the end of a phone call.
Applications of ThoughtTreasure include question answering, commonsense-enabled agents, and story understanding. See the list of applications and research that use ThoughtTreasure.
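To illustrate how commonsense assertions such as "Soda is a drink" can support simple question answering, here is a minimal sketch that stores them as (subject, relation, object) triples and answers "is X a Y?" by following isa links transitively. The triple format and the relation names (`isa`, `duration-hours`) are assumptions made for illustration. ThoughtTreasure uses its own, richer representations (logic, grids, scripts).

```python
# Hypothetical triple store; relation names are invented for illustration
# and are not ThoughtTreasure's own notation.
ASSERTIONS = {
    ("soda", "isa", "drink"),
    ("drink", "isa", "consumable"),
    ("play", "duration-hours", "2"),
}


def isa(concept, category):
    """Answer 'Is <concept> a <category>?' by following isa links transitively.

    A concept trivially counts as an instance of itself.
    """
    seen = set()
    frontier = [concept]
    while frontier:
        c = frontier.pop()
        if c == category:
            return True
        if c in seen:
            continue  # guard against cycles in the isa graph
        seen.add(c)
        frontier.extend(o for (s, r, o) in ASSERTIONS if s == c and r == "isa")
    return False


def lookup(subject, relation):
    """Return every object asserted for (subject, relation)."""
    return [o for (s, r, o) in ASSERTIONS if s == subject and r == relation]


print(isa("soda", "consumable"))          # True, via soda -> drink -> consumable
print(lookup("play", "duration-hours"))   # ['2']
```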


Speech acts - agents - semiotics
"In general, communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs."

