The aim of data condensation is to create a set of observations that has significant smaller count than the observation set of the source feature but is still a reasonable representation of the real world phenomenon. Condensing data might be a good choice to get a reasonable sized set of observations that can be manually examined, presented graphically, or that can be further processed with reasonable performance.
The data condensation: simple calculation type provides very simple condensation algorithms that do not find a reasonable representation of the real world phenomenon in all cases, but are fast in terms of processing and prove to be useful if you can make certain assumptions on the characteristics of your source data.
Characteristic |
Description |
Supports incremental execution |
Yes |
Output typing |
Implicit by pin, derived from feature connected to input pin 1 |
Locking of source features |
Observation modification, observation deletion |
Copy |
Table 1: Calculation brief
No |
Name |
Type, Constraint |
Multiplicity (Min,Max) |
1 |
Feature to process |
Features or calculations |
1,1 |
Table 2: Input pins
Configuration |
Type |
Notes |
Default value |
Include erroneous |
Boolean |
If set to true, erroneous observations of the source feature or calculation will be processed. |
False |
Condensation algorithm |
Enumeration |
Options are: •Each n-th •One per interval |
100 |
Include last observation |
Boolean |
Setting for the Each n-th condensation algorithm. If true the last observation of the source feature or calculation will always be added to the output |
false |
n-th |
Numeric |
Setting for the Each n-th condensation algorithm. Defines which observations of the source feature are picked. Must be a integer number greater or equal 1. |
10 |
Start timestamp |
Date & time |
Setting for the One per interval condensation algorithm. Defines the start time from which intervals are calculated. Can be left undefined. |
|
Time interval |
Time span |
Setting for the One per interval condensation algorithm. Defines the time span. Must be a time span greater than 0. |
24:00:00 |
Table 3: Configuration settings
If the calculation is the final calculation of the algorithm the used classifications of the source features or calculations has to be the used classification in the domain of the calculated feature. If this is not the case the calculation will fail.
Each n-th Algorithm
This algorithm simply takes each n-th observation of the source feature or calculation. Figure 1 shows a formal specification of the algorithm.
The configuration setting n-th influences the condensation ratio defined as NumberOfGeneratedObservations / NumberOfOriginalObservations. Higher values for n-th produce smaller condensation ratios. If set to true the configuration setting include last observation will always add the last observation of the source feature or calculation to the output.
Figure 2 shows an example output of the Each n-th algorithm.
The each n-th algorithm is useful when dealing with observations recorded at a high and regular sampling rate but with little changes in the observed property values and if the exact time when a property value changed is not important.
Figure 1: Selection of output observations by the Each n-th Algorithm.
Figure 2: Example of Each n-th algorithm
One per interval
This algorithm will choose only one observation per interval. The intervals are defined by the Start timestamp and Time interval configuration setting by using the algorithm shown in figure 3. Figure 4 shows which observation is picked from all the observations found in an interval.
This algorithm is useful if you have observations at a high and irregular sampling rate with little changes in the observed property values and if the exact time when a property value changed is not important.
Figure 3: Specification of the intervals for the on per interval algorithm
Figure 3: Picking one observation per interval
Examples for picking an observation from an interval:
•If the interval contains observations with sampling timestamps {t1, t2, t3, t4, t5}, t3 will be chosen.
•If the interval contains observations with sampling timestamps {t1, t2, t3, t4}, t2 will be chosen.
•If the interval contains observations with sampling timestamps {t1, t2}, t1 will be chosen.
•If the interval contains observations with sampling timestamps {t1}, t1 will be chosen.