Data mining methods exist both as software tools and as theoretical concepts. They allow individuals and companies to extract information from the data they collect, using a variety of tools. Analyzing large amounts of data can reveal many factors about a single subject or across many subjects. Data mining methods are most commonly used in the fields of fraud detection, marketing and surveillance.
People extracted information from data by hand long before computers existed. Modern techniques, however, automate the process with computerized resources and can handle far larger volumes of data. As computer science emerged during the 20th century, data mining methods developed as a way to uncover hidden patterns in large swaths of collected data. For example, an advertising firm can analyze the shopping patterns of an online customer and then market products that the customer may be interested in purchasing.
One data mining approach commonly used in industry is called Knowledge Discovery in Databases (KDD). The term was introduced in 1989 by Gregory Piatetsky-Shapiro. KDD lets users process raw data, analyze it to extract the relevant information and interpret the results. This lets users find patterns in the data; however, the patterns found are not always accurate. A model can end up fitting random noise or quirks of the particular sample rather than relationships that hold in general. This is known as overfitting.
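To make overfitting concrete, here is a minimal Python sketch on hypothetical toy data. The data set, the cutoff of 5 and both model functions are assumptions for illustration, not part of KDD itself. A 1-nearest-neighbour model reproduces every training record, including one mislabelled point, so it scores perfectly on the training data but does worse on new records than a simpler one-cutoff rule.

```python
# Hypothetical toy data: the "true" rule is label = 1 when x >= 5,
# but one training record (x = 3.5) is mislabelled noise.
TRAIN = [(1, 0), (2, 0), (3, 0), (4, 0), (3.5, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
TEST = [(1.4, 0), (3.4, 0), (3.6, 0), (6.4, 1), (8.4, 1)]


def nearest_neighbour(x, train=TRAIN):
    """Flexible model: copy the label of the closest training record."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def threshold_rule(x, cutoff=5):
    """Simple model: a single cutoff that ignores individual records."""
    return int(x >= cutoff)


def accuracy(model, data):
    """Fraction of (x, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    for name, model in [("1-nearest-neighbour", nearest_neighbour),
                        ("threshold", threshold_rule)]:
        print(f"{name}: train={accuracy(model, TRAIN):.2f} "
              f"test={accuracy(model, TEST):.2f}")
```

Running it reports perfect training accuracy but only 60% test accuracy for the nearest-neighbour model, versus about 89% training and 100% test accuracy for the cutoff rule. Checking a model against held-out data in this way is a common guard against overfitting.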
Basic data mining involves four types of tasks: classification, clustering, regression, and association. Classification assigns each item to one of a set of predefined groups. Clustering uses no predefined groups; instead, it lets the data organize itself into groups of similar items. Regression looks for a function that models the data, for example to predict a numeric value from other variables. The final task, association, looks for relationships between variables, such as products that are frequently bought together.
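The four tasks can be sketched in a few lines of Python each. The text does not name specific algorithms, so this sketch picks one simple, common technique per task, assuming one-dimensional toy data: a nearest-centroid classifier, k-means clustering, a least-squares line fit for regression, and support/confidence counting for association rules. All function names and data here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def classify(point, centroids):
    """Classification: pick the predefined label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - point))


def kmeans_1d(values, centroids, rounds=10):
    """Clustering: group unlabelled values with a few rounds of k-means."""
    centroids = list(centroids)
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest current centroid.
            i = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            groups[i].append(v)
        # Move each centroid to its group's mean (keep it if the group is empty).
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return groups


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept of y = slope * x + b."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def association_rules(baskets, min_support=0.5):
    """Association: rules a -> b for item pairs that appear together often."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(p for b in baskets
                          for p in combinations(sorted(set(b)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support >= min_support:
            # Confidence: share of baskets with the first item that also have the second.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


if __name__ == "__main__":
    print(classify(3.1, {"low": 2.0, "high": 8.0}))     # low
    print(kmeans_1d([1, 2, 3, 10, 11, 12], [1, 12]))    # [[1, 2, 3], [10, 11, 12]]
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))         # (2.0, 1.0)
    baskets = [["bread", "milk"], ["bread", "butter"],
               ["bread", "butter", "milk"], ["milk"]]
    for a, b, s, c in association_rules(baskets):
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Note the contrast between the first two functions: `classify` needs the labels up front, while `kmeans_1d` only receives starting guesses and discovers the groups from the data.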
When data mining methods are used, certain standards guide which parameters may be used in the process. The Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) holds an annual conference at which appropriate processes are discussed. Ethical considerations are weighed alongside practical applications when deciding how information about individuals and companies should be gathered and used. SIGKDD publishes this work in an industry journal called SIGKDD Explorations.