Text mining is the process of using software to sift through text documents for research and analysis. It is closely related to data mining, but instead of analyzing pre-categorized database information, it relies on specialized programs to find meaning or patterns in uncategorized text. Text mining has many applications in areas like science, marketing, and data organization.
Human language is too complex for computers to interpret fully, but scientists have worked hard to improve this kind of programming. Many methods have been developed that let scientists identify phrases and extract facts from text. This is generally not the same as fully deciphering the meaning, but it allows for shortcuts that achieve many of the same goals. Text mining takes advantage of some of these techniques, and as the technology improves, text mining is generally expected to improve as well.
Experts use text mining primarily to research written documents. Large amounts of written material can be hard to analyze because of the tremendous amount of time required to read it. Computers can go through this text much more quickly, but they cannot understand it the way a human reader does. Text mining techniques allow computers to find useful trends in text, presenting the data in a way that may reveal new facts or allow experts to make discoveries.
One use for this technology is market research. Experts could analyze search results for a product name and have the program look for phrases that express user sentiment. This may show them, in considerable detail, how people really feel about their product. They could also simply search for the product name and see which phrases appear most often, which might help them develop new ideas about how to please their customers.
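The market research idea above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the sample comments, the tiny word lists `POSITIVE` and `NEGATIVE`, and the function names are all made up for the example. It counts adjacent word pairs to show which phrases come up most often, and scores each comment by counting positive and negative words.

```python
import re
from collections import Counter

# Hypothetical sample of customer comments; real input would come from
# search results or reviews that mention the product.
COMMENTS = [
    "I love the battery life, but the screen is too dim.",
    "Battery life is great. I love it!",
    "The screen is too dim and the case feels cheap.",
]

# Tiny illustrative sentiment word lists (an assumption, not a standard resource).
POSITIVE = {"love", "great"}
NEGATIVE = {"dim", "cheap"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(comments):
    """Count each pair of adjacent words within every comment."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def sentiment_score(comment):
    """Return the number of positive words minus the number of negative words."""
    tokens = tokenize(comment)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


if __name__ == "__main__":
    # Show the most frequent two-word phrases, then a score per comment.
    for phrase, count in bigram_counts(COMMENTS).most_common(5):
        print(" ".join(phrase), count)
    for comment in COMMENTS:
        print(sentiment_score(comment), comment)
```

A word-list score like this misses negation and sarcasm ("not great" would still count as positive), which is one reason this kind of analysis is a shortcut and not true understanding of the text.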
Another use for text mining is analyzing scientific papers on similar subjects to look for new trends or points of agreement. This has allowed some scientists to form predictive hypotheses that have proven useful in fields like protein analysis. Some experts think these sorts of applications may eventually lead to unexpected discoveries.
Data mining is quite similar to text mining, but it is generally less complex because it relies on data that has already been organized into categories. For example, the software could go through all the information for job applicants in a database, looking for trends. Text mining is more difficult for computers because pure text is harder to analyze than categorized data.
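To illustrate why categorized data is easier to work with, here is a small sketch of the job applicant example. The records, field names, and the function `average_experience_by_role` are hypothetical and stand in for rows in a real database. Because every value already sits under a named field, finding a trend is a matter of grouping and averaging, with no need to interpret language.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical applicant records; the field names and values are made up.
APPLICANTS = [
    {"role": "engineer", "years_experience": 5, "hired": True},
    {"role": "engineer", "years_experience": 2, "hired": False},
    {"role": "analyst", "years_experience": 3, "hired": True},
    {"role": "analyst", "years_experience": 7, "hired": True},
]


def average_experience_by_role(records):
    """Group records by role and average the years of experience in each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["role"]].append(record["years_experience"])
    return {role: mean(years) for role, years in grouped.items()}


if __name__ == "__main__":
    for role, average in average_experience_by_role(APPLICANTS).items():
        print(role, average)
```

Getting the same figures from free-form cover letters would first require extracting the role and the years of experience from the text itself, which is the harder problem that text mining tries to address.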