
Data Mining

Data mining is the automated extraction of hidden predictive information from databases. Relational and OLAP technologies are highly capable at navigating massive data warehouses, but navigation alone does not meet today's market needs. A further technological leap is needed to structure and prioritize information for specific end-user problems, and data mining tools can make this leap. Integrating data mining with existing information systems has produced quantifiable business benefits. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities:

  • Automated prediction of trends and behaviors.
  • Automated discovery of previously unknown patterns.

Data mining needs to be supported by three technologies:

  • Massive data collection
  • Powerful multiprocessor computers
  • Data mining algorithms

All three areas are now mature enough for real-world data mining applications. The core components of data mining technology are statistics, artificial intelligence, and machine learning. The most commonly used techniques in data mining are:

  • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
  • Decision trees: Tree-shaped structures that represent sets of decisions.
  • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
  • Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset.
  • Rule induction: The extraction of useful if-then rules from data based on statistical significance.
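As an illustration of the nearest neighbor method described above, here is a minimal sketch in Python. It assumes numeric features, Euclidean distance, and a simple majority vote as the way of combining the classes of the k most similar records. The function name `knn_classify` and the toy records are hypothetical and exist only for this example.

```python
import math
from collections import Counter


def knn_classify(historical, query, k=3):
    """Classify `query` from the k most similar records in `historical`.

    `historical` is a list of (features, label) pairs, where `features`
    is a tuple of numbers. Similarity is Euclidean distance and the
    classes are combined by majority vote (both are assumptions made
    for this sketch; other distances and voting schemes are possible).
    """
    # Keep the k historical records closest to the query.
    neighbors = sorted(historical, key=lambda rec: math.dist(rec[0], query))[:k]
    # Count the labels among those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (feature vector, known class).
historical = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.8), "high"),
    ((4.9, 5.1), "high"),
]

print(knn_classify(historical, (1.1, 1.0), k=3))
print(knn_classify(historical, (5.1, 5.0), k=3))
```

An odd k is used here so that a two-class vote cannot tie. In practice the features would usually be scaled first so that no single attribute dominates the distance.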

These techniques, coupled with high-performance relational database engines and broad data integration efforts, make data mining practical for current data warehouse environments. When data mining tools run on high-performance parallel processing systems, they can analyze massive databases in minutes. Faster processing lets users experiment with more models to understand complex data, and it makes analyzing huge quantities of data practical. Larger databases, in turn, tend to yield improved predictions.

Many companies have deployed successful applications of data mining. While early adopters of this technology have tended to be in information-intensive industries such as financial services and direct mail marketing, the technology is applicable to any company looking to leverage a large data warehouse to better manage its customer relationships.

Integrated Data Mining Architecture


Related Terms: Relational Database, OLAP, Data Warehouse