Wednesday, October 31, 2007

Q. What is data mining?

Ans. Data mining can be defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data". Data mining may also be defined as "the science of extracting useful information from large data sets or databases".

Data mining is the practice of sorting through large amounts of data and picking out relevant information. It is most often used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods.
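
To make "picking out relevant information" concrete, here is a sketch of one classic data mining task, frequent-itemset (market-basket) analysis, chosen purely as an illustration. It counts how often each pair of items appears together across shopping baskets and keeps the pairs whose count (the "support") meets a threshold. The basket data and the `frequent_pairs` function are hypothetical. Real algorithms such as Apriori prune candidate itemsets so they scale to large data sets; this brute-force count does not attempt that.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    # Keep only the pairs that meet the support threshold.
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical shopping baskets, one list of items per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

print(frequent_pairs(baskets, min_support=3))
```

Raising `min_support` returns fewer, stronger associations; lowering it surfaces more pairs at the cost of more noise.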

Data mining has been cited as the method by which the U.S. Army unit Able Danger supposedly identified Mohamed Atta, the leader of the September 11, 2001 attacks, and three other 9/11 hijackers as possible members of an al Qaeda cell operating in the U.S. more than a year before the attack.

Q. What is data warehousing?

Ans. A data warehouse is the main repository of an organization's historical data: its corporate memory. It holds the raw material for management's decision support systems. The key reason for using a data warehouse is that a data analyst can run complex queries and analyses, such as data mining, on its contents without slowing down the operational systems.

Bill Inmon, an early and influential practitioner, has formally defined a data warehouse in the following terms:

Subject-oriented
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
Time-variant
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
Non-volatile
Data in the database is never overwritten or deleted: once committed, the data is static and read-only, and it is retained for future reporting; and
Integrated
The database contains data from most or all of an organization's operational applications, and this data is made consistent.
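
Two of Inmon's properties, time-variant and non-volatile, can be illustrated with a small append-only table. This is a sketch under stated assumptions, not Inmon's own design: it uses Python's built-in `sqlite3` module, and the `customer_snapshot` table, the `load_snapshot` and `state_as_of` helpers, and the sample rows are all hypothetical. Each load appends rows stamped with a snapshot date and never updates or deletes earlier rows, so the state of the data as of any past load can still be reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: each load appends rows stamped with a snapshot date.
conn.execute("""
    CREATE TABLE customer_snapshot (
        snapshot_date TEXT NOT NULL,     -- ISO date of the load (YYYY-MM-DD)
        customer_id   INTEGER NOT NULL,
        state         TEXT NOT NULL
    )
""")


def load_snapshot(conn, snapshot_date, rows):
    """Append one load of (customer_id, state) rows; earlier rows are never touched."""
    conn.executemany(
        "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, cid, state) for cid, state in rows],
    )


def state_as_of(conn, customer_id, as_of):
    """Return the customer's state from the latest snapshot on or before as_of, or None."""
    row = conn.execute(
        """SELECT state FROM customer_snapshot
           WHERE customer_id = ? AND snapshot_date <= ?
           ORDER BY snapshot_date DESC
           LIMIT 1""",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None


load_snapshot(conn, "2007-01-01", [(1, "CA"), (2, "NY")])
load_snapshot(conn, "2007-06-01", [(1, "NY"), (2, "NY")])  # customer 1 has moved

print(state_as_of(conn, 1, "2007-03-15"))  # answered from the January load
print(state_as_of(conn, 1, "2007-07-01"))  # answered from the June load
```

Here non-volatility is only a convention the code follows (it issues no `UPDATE` or `DELETE`); SQLite itself does not enforce it. ISO-format date strings are used because they sort correctly as plain text.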

A data warehouse might be used to find the day of the week on which a company sold the most widgets in May 1992, or how employee sick leave in the week before the winter break differed between California and New York from 2001 to 2005.
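
The first example question above might be answered with a SQL query like the one below. This sketch assumes a single hypothetical `sales` table with ISO-format date strings, run through Python's built-in `sqlite3` module; the table, the `best_weekday` function, and the rows are invented for illustration, and a real warehouse schema would differ. SQLite's `strftime('%w', ...)` returns the weekday as 0 (Sunday) through 6 (Saturday), which the code maps to a name.

```python
import sqlite3

# Index matches SQLite's strftime('%w') numbering: 0 = Sunday ... 6 = Saturday.
DAY_NAMES = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("1992-05-01", "widget", 10),  # Friday
    ("1992-05-04", "widget", 7),   # Monday
    ("1992-05-08", "widget", 12),  # Friday
    ("1992-05-11", "widget", 3),   # Monday
    ("1992-05-08", "gadget", 50),  # different product, filtered out
    ("1992-06-05", "widget", 99),  # outside May 1992, filtered out
])


def best_weekday(conn, product, period_start, period_end):
    """Return (day name, total units) for the weekday with the most units sold, or None."""
    row = conn.execute(
        """SELECT CAST(strftime('%w', sale_date) AS INTEGER) AS dow,
                  SUM(units) AS total
           FROM sales
           WHERE product = ? AND sale_date >= ? AND sale_date < ?
           GROUP BY dow
           ORDER BY total DESC
           LIMIT 1""",
        (product, period_start, period_end),
    ).fetchone()
    return (DAY_NAMES[row[0]], row[1]) if row else None


print(best_weekday(conn, "widget", "1992-05-01", "1992-06-01"))
```

The half-open range (`>= '1992-05-01'` and `< '1992-06-01'`) selects exactly the month of May without having to know how many days it has.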

While operational systems are optimized for simplicity and speed of modification (online transaction processing, or OLTP) through heavy use of database normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Data in a data warehouse is frequently heavily denormalized, summarized, or stored in a dimension-based model, although this is not always required to achieve acceptable query response times.
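
One common form of the dimension-based model mentioned above is the star schema: a central fact table of numeric measures whose rows point to surrounding dimension tables (date, product, store, and so on). The sketch below builds a tiny, hypothetical star schema in SQLite through Python's `sqlite3` module. All table names, column names, and rows are made up for illustration. It then runs a typical OLAP-style aggregation that joins the fact table to its dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per member.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, state TEXT);

    -- Fact table: one row per sale, with a key into each dimension plus numeric measures.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        units       INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date VALUES (19920501, '1992-05-01', 1992, 5), (19920601, '1992-06-01', 1992, 6);
    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'gadget', 'hardware');
    INSERT INTO dim_store VALUES (1, 'Los Angeles', 'CA'), (2, 'Albany', 'NY');
    INSERT INTO fact_sales VALUES
        (19920501, 1, 1, 10, 100.0),
        (19920501, 1, 2,  4,  40.0),
        (19920601, 2, 1,  6,  90.0),
        (19920601, 1, 2,  5,  50.0);
""")

# Widget units by state and month: one join per dimension, then aggregate.
rows = conn.execute("""
    SELECT s.state, d.month, SUM(f.units) AS units
    FROM fact_sales f
    JOIN dim_store   s ON f.store_key   = s.store_key
    JOIN dim_date    d ON f.date_key    = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE p.name = 'widget' AND d.year = 1992
    GROUP BY s.state, d.month
    ORDER BY s.state, d.month
""").fetchall()

for row in rows:
    print(row)
```

Because every descriptive attribute sits one join away from the fact table, reports like this stay short and predictable, which is one reason dimensional models are popular for OLAP work.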

Using a data warehouse has many advantages, including the following:

Data warehouses enhance end-user access to a wide variety of data.
Decision support system users can obtain specified trend reports, e.g. the item with the most sales in a particular area within the last two years.
Data warehouses can be a significant enabler of commercial business applications, particularly customer relationship management (CRM) systems.
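
The trend report in the second item can be sketched in a few lines of plain Python. The record layout `(date, region, item, units)`, the `top_item` function, and the sample data are all hypothetical. "The last two years" is interpreted here as the period from the same calendar day two years earlier up to the report date, which is only one reasonable reading.

```python
from collections import Counter
from datetime import date


def top_item(sales, region, as_of, years=2):
    """Return (item, units) for the best-selling item in `region` over the `years` before `as_of`."""
    # Assumption: the window starts on the same calendar day `years` years earlier.
    try:
        start = as_of.replace(year=as_of.year - years)
    except ValueError:  # as_of is February 29 and the start year is not a leap year
        start = as_of.replace(year=as_of.year - years, day=28)

    totals = Counter()
    for sale_date, sale_region, item, units in sales:
        if sale_region == region and start <= sale_date <= as_of:
            totals[item] += units
    return totals.most_common(1)[0] if totals else None


# Hypothetical sales records: (sale date, region, item, units sold).
sales = [
    (date(2006, 3, 1), "West", "widget", 40),
    (date(2007, 2, 10), "West", "gadget", 55),
    (date(2007, 9, 1), "West", "widget", 30),
    (date(2004, 5, 5), "West", "gizmo", 500),  # older than two years, excluded
    (date(2007, 8, 1), "East", "gizmo", 80),   # different region, excluded
]

print(top_item(sales, "West", date(2007, 10, 31)))
```

In a real warehouse the same report would usually be a single `GROUP BY` query over the sales fact table; the loop here just makes the filtering and summing explicit.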
