
The aim of data condensation is to create a set of observations that is significantly smaller than the observation set of the source feature but is still a reasonable representation of the real-world phenomenon. Condensing data can be a good choice to get a reasonably sized set of observations that can be manually examined, presented graphically, or processed further with reasonable performance.

The data condensation: simple calculation type provides very simple condensation algorithms. They do not find a reasonable representation of the real-world phenomenon in all cases, but they are fast to process and prove useful if you can make certain assumptions about the characteristics of your source data.

Characteristic | Description
Supports incremental execution | Yes
Output typing | Implicit by pin, derived from feature connected to input pin 1
Locking of source features | Observation modification, observation deletion
Spatial data handling | Copy

Table 1: Calculation brief

No | Name | Type, Constraint | Multiplicity (Min,Max)
1 | Feature to process | Features or calculations | 1,1

Table 2: Input pins

Configuration | Type | Notes | Default value
Include erroneous | Boolean | If set to true, erroneous observations of the source feature or calculation will be processed. | false
Condensation algorithm | Enumeration | Options are: Each n-th, One per interval. | 100
Include last observation | Boolean | Setting for the Each n-th condensation algorithm. If true, the last observation of the source feature or calculation will always be added to the output. | false
n-th | Numeric | Setting for the Each n-th condensation algorithm. Defines which observations of the source feature are picked. Must be an integer greater than or equal to 1. | 10
Start timestamp | Date & time | Setting for the One per interval condensation algorithm. Defines the start time from which intervals are calculated. Can be left undefined. | (none)
Time interval | Time span | Setting for the One per interval condensation algorithm. Defines the length of each interval. Must be a time span greater than 0. | 24:00:00

Table 3: Configuration settings

If this calculation is the final calculation of the algorithm, the classifications used by the source features or calculations must be the classifications used in the domain of the calculated feature. Otherwise the calculation will fail.

Each n-th Algorithm

This algorithm simply takes each n-th observation of the source feature or calculation. Figure 1 shows a formal specification of the algorithm.

The configuration setting n-th determines the condensation ratio, defined as NumberOfGeneratedObservations / NumberOfOriginalObservations. Higher values for n-th produce smaller condensation ratios. If the configuration setting Include last observation is set to true, the last observation of the source feature or calculation is always added to the output.

Figure 2 shows an example output of the Each n-th algorithm.

The Each n-th algorithm is useful when dealing with observations recorded at a high and regular sampling rate, when the observed property values change little, and when the exact time at which a property value changed is not important.
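The selection described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name `each_nth` and its parameters are hypothetical, and the sketch assumes that counting starts at the first observation, which the text does not state (the formal specification is given in Figure 1).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def each_nth(observations: Sequence[T], n: int, include_last: bool = False) -> List[T]:
    """Keep every n-th observation (hypothetical helper, not the product API)."""
    if n < 1:
        raise ValueError("n-th must be an integer greater than or equal to 1")
    # Assumption: counting starts at the first observation (index 0).
    picked = list(observations[::n])
    # Append the last source observation only if the stride did not already pick it.
    if include_last and observations and (len(observations) - 1) % n != 0:
        picked.append(observations[-1])
    return picked


source = list(range(25))  # 25 stand-in observations
condensed = each_nth(source, n=10, include_last=True)
# Condensation ratio as defined above: generated / original observations.
ratio = len(condensed) / len(source)
```

With 25 source observations and n-th set to 10, this sketch keeps three observations plus the last one, so the condensation ratio is 4 / 25.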

Figure 1: Selection of output observations by the Each n-th Algorithm.

Figure 2: Example of Each n-th algorithm

One per interval Algorithm

This algorithm chooses only one observation per interval. The intervals are defined by the Start timestamp and Time interval configuration settings, using the algorithm shown in Figure 3. Figure 4 shows which observation is picked from all the observations found in an interval.

This algorithm is useful if you have observations at a high and irregular sampling rate, if the observed property values change little, and if the exact time at which a property value changed is not important.
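One plausible way to assign an observation to an interval is shown below. It is a sketch under assumptions, since the exact interval specification is only given in Figure 3: the helper name `interval_index` is hypothetical, intervals are assumed to be half-open and to run from Start timestamp in steps of Time interval, and the case of an undefined Start timestamp is not covered.

```python
from datetime import datetime, timedelta


def interval_index(timestamp: datetime, start: datetime, interval: timedelta) -> int:
    """Return k such that start + k*interval <= timestamp < start + (k+1)*interval."""
    if interval <= timedelta(0):
        raise ValueError("Time interval must be greater than 0")
    # Floor division of two timedeltas yields an int; a negative k means
    # the timestamp lies before the start timestamp.
    return (timestamp - start) // interval


start = datetime(2021, 1, 1)
one_day = timedelta(hours=24)  # corresponds to the default 24:00:00
k = interval_index(datetime(2021, 1, 2, 6, 0), start, one_day)
```

Observations that share the same index would then fall into the same interval, and one of them would be picked for the output.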

Figure 3: Specification of the intervals for the One per interval algorithm

Figure 4: Picking one observation per interval

Examples for picking an observation from an interval:

If the interval contains observations with sampling timestamps {t1, t2, t3, t4, t5}, t3 will be chosen.

If the interval contains observations with sampling timestamps {t1, t2, t3, t4}, t2 will be chosen.

If the interval contains observations with sampling timestamps {t1, t2}, t1 will be chosen.

If the interval contains observations with sampling timestamps {t1}, t1 will be chosen.
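The four examples are consistent with picking the middle observation of the interval and, for an even count, the earlier of the two middle observations. A minimal sketch of that rule follows. The function name `pick_from_interval` is hypothetical, and the index formula is inferred from the examples rather than taken from the product.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def pick_from_interval(observations: Sequence[T]) -> T:
    """Pick the middle observation; for an even count, the earlier middle one."""
    if not observations:
        raise ValueError("interval contains no observations")
    # Assumes the observations are sorted by sampling timestamp.
    # 0-based index (count - 1) // 2: 5 -> 2, 4 -> 1, 2 -> 0, 1 -> 0.
    return observations[(len(observations) - 1) // 2]


picked_of_five = pick_from_interval(["t1", "t2", "t3", "t4", "t5"])
picked_of_four = pick_from_interval(["t1", "t2", "t3", "t4"])
```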

© 2021 AFRY Austria GmbH, www.redbex.com