Framework to Find Representative Instances in Data Streams Using Random Sampling
Keywords:
Data streams, Random Sampling, Very fast decision tree
Abstract
Data volumes are growing explosively in today's dynamic environment. Many applications continuously generate data and require an immediate response. The term "data stream" refers to a continuous sequence of data arriving over time. Mining data streams is challenging because of their characteristics: size, speed, and the diversity of data arriving over time. This paper addresses the volume characteristic: the data cannot be stored entirely in main memory for processing because computational resources are limited. We use a probabilistic technique, simple random sampling without replacement, to extract samples from the given population of data over a period of time. An incremental classifier, the very fast decision tree (VFDT), is used to classify the data streams using the extracted samples. The accuracy, response time, and memory consumption obtained with the samples are recorded and compared with those obtained with the population data.
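As an illustration of how a simple random sample without replacement can be drawn from a stream that does not fit in main memory, the following minimal sketch uses reservoir sampling (Algorithm R). This is an assumption made for illustration: the abstract does not state which sampling procedure the framework uses, and the function name `reservoir_sample`, the sample size `k`, and the `seed` parameter are hypothetical. Reservoir sampling keeps only `k` instances in memory and gives every instance seen so far the same probability of being in the sample.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to k items, without replacement,
    in a single pass over the stream (Algorithm R).

    Hypothetical helper: a sketch only, not the procedure used in the paper.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k instances.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i (inclusive); the new instance
            # replaces an old one only if the slot falls inside the reservoir,
            # which happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 1,000 instances from a simulated stream of 100,000 instances.
sample = reservoir_sample(range(100_000), k=1_000, seed=42)
small = reservoir_sample(range(5), k=10, seed=0)  # stream shorter than k
```

Because each stream position is stored at most once, the result contains no repeated instances, and a stream shorter than `k` is returned in full. The extracted sample could then be passed to an incremental classifier in place of the full population.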