Parent Rows Grouping

Overview

The SourceDbQuery Actor is used in the Broadway populations to query the source data. The Actor executes the query defined in the population flow, appended with a dynamically generated WHERE clause.

The server generates this WHERE clause automatically before execution. It incorporates the keys and values passed from the parent_rows output object of the PopulationArgs Actor.

image

Starting from Fabric V8.4, the structure of this generated WHERE clause is determined by the Parent Rows Grouping parameter located in the Fabric properties section of the Interface Type definition for JDBC.

Parent Rows Grouping Settings

The following table explains how each Parent Rows Grouping setting impacts the generated SQL query structure. The resulting query depends on the number of input keys and whether those keys contain multiple values.

Parameter Value Keys, Values Generated Query Example Comments
OR 1 key, multiple values SELECT * FROM PATIENT WHERE PATIENT_ID = ? OR PATIENT_ID = ? Default value for all DB types except Cassandra.
> 1 key SELECT * FROM PATIENT WHERE PATIENT_ID = ? AND VISIT_ID = ? OR PATIENT_ID = ? AND VISIT_ID = ?
IN 1 key, 1 value SELECT * FROM PATIENT WHERE VISIT_ID = ? Used for a single key with a single value. 
1 key, multiple values SELECT * FROM PATIENT WHERE VISIT_ID IN (?,?) Used for a single key with multiple values. 
> 1 key SELECT * FROM PATIENT WHERE (PATIENT_ID, VISIT_ID) IN ((?,?), (?,?)) Used for composite keys. Tupling is utilized within the JDBC SQL string.
IN_WITHOUT_TUPLING
1 key, 1 value SELECT * FROM PATIENT WHERE VISIT_ID = ? Used for a single key with a single value. 
1 key, multiple values SELECT * FROM PATIENT WHERE VISIT_ID IN (?, ?) Used for a single key with multiple values. 
> 1 key SELECT * FROM PATIENT WHERE PATIENT_ID = ? AND VISIT_ID = ? OR PATIENT_ID = ? AND VISIT_ID = ? Used for composite keys. Recommended for DBs that do not support Tupling, such as Aerospike.
NONE 1 or many SELECT * FROM PATIENT WHERE PATIENT_ID = ? AND VISIT_ID = ? Default for Cassandra; requires separate server calls for each value.

Performance Impact: OR vs. IN

The choice between OR and IN significantly impacts how the database engine parses and executes a query.

Generally, the IN operator is more efficient for large datasets because many database optimizers treat it as a single operation that can be sorted or evaluated against an index more quickly than multiple OR conditions.

While OR is the default for most database types, it can lead to slower performance as the number of values increases, as the engine must evaluate each condition sequentially.

Using IN with "Tupling" allows for more concise filtering on composite keys, though it requires support from the underlying database.

Parent Rows Grouping

Overview

The SourceDbQuery Actor is used in the Broadway populations to query the source data. The Actor executes the query defined in the population flow, appended with a dynamically generated WHERE clause.

The server generates this WHERE clause automatically before execution. It incorporates the keys and values passed from the parent_rows output object of the PopulationArgs Actor.

image

Starting from Fabric V8.4, the structure of this generated WHERE clause is determined by the Parent Rows Grouping parameter located in the Fabric properties section of the Interface Type definition for JDBC.

Parent Rows Grouping Settings

The following table explains how each Parent Rows Grouping setting impacts the generated SQL query structure. The resulting query depends on the number of input keys and whether those keys contain multiple values.

Parameter Value Keys, Values Generated Query Example Comments
OR 1 key, multiple values SELECT * FROM PATIENT WHERE PATIENT_ID = ? OR PATIENT_ID = ? Default value for all DB types except Cassandra.
> 1 key SELECT * FROM PATIENT WHERE PATIENT_ID = ? AND VISIT_ID = ? OR PATIENT_ID = ? AND VISIT_ID = ?
IN 1 key, 1 value SELECT * FROM PATIENT WHERE VISIT_ID = ? Used for a single key with a single value. 
1 key, multiple values SELECT * FROM PATIENT WHERE VISIT_ID IN (?,?) Used for a single key with multiple values. 
> 1 key SELECT * FROM PATIENT WHERE (PATIENT_ID, VISIT_ID) IN ((?,?), (?,?)) Used for composite keys. Tupling is utilized within the JDBC SQL string.
IN_WITHOUT_TUPLING
1 key, 1 value SELECT * FROM PATIENT WHERE VISIT_ID = ? Used for a single key with a single value. 
1 key, multiple values SELECT * FROM PATIENT WHERE VISIT_ID IN (?, ?) Used for a single key with multiple values. 
> 1 key SELECT * FROM PATIENT WHERE PATIENT_ID = ? AND VISIT_ID = ? OR PATIENT_ID = ? AND VISIT_ID = ? Used for composite keys. Recommended for DBs that do not support Tupling, such as Aerospike.
NONE 1 or many SELECT * FROM PATIENT WHERE PATIENT_ID = ? AND VISIT_ID = ? Default for Cassandra; requires separate server calls for each value.

Performance Impact: OR vs. IN

The choice between OR and IN significantly impacts how the database engine parses and executes a query.

Generally, the IN operator is more efficient for large datasets because many database optimizers treat it as a single operation that can be sorted or evaluated against an index more quickly than multiple OR conditions.

While OR is the default for most database types, it can lead to slower performance as the number of values increases, as the engine must evaluate each condition sequentially.

Using IN with "Tupling" allows for more concise filtering on composite keys, though it requires support from the underlying database.