Java VTL: Java implementation of VTL
May 21, 2019
The Java VTL project is an open-source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store.
Visit the interactive reference manual for more information.
Modules
The project is divided into the following modules:
- java-vtl-parent
- java-vtl-parser, contains the lexer and parser for VTL.
- java-vtl-model, VTL data model.
- java-vtl-script, JSR-223 (ScriptEngine) implementation.
- java-vtl-connector, connector API.
- java-vtl-tools, various tools.
Usage
Add a dependency to your Maven project:
<dependency>
  <groupId>no.ssb.vtl</groupId>
  <artifactId>java-vtl-script</artifactId>
  <version>0.1.13-SNAPSHOT</version>
</dependency>
Evaluate VTL expressions
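ScriptEngine engine = new VTLScriptEngine(connector);
Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
// Separate the VTL statements with newlines so they are not concatenated into a single line.
engine.eval("ds1 := get(\"foo\")\n" +
        "ds2 := get(\"bar\")\n" +
        "ds3 := [ds1, ds2] {\n" +
        "    filter ds1.id = \"string\",\n" +
        "    total := ds1.measure + ds2.measure\n" +
        "}");
// The result of each assignment is available in the engine-scope bindings.
System.out.println(bindings.get("ds3"));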
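The snippet above reads its result back from the engine-scope bindings. The following is a minimal sketch of that mechanism using only the standard `javax.script` classes, assuming the engine stores each assigned variable in `ScriptContext.ENGINE_SCOPE`, as the usage example suggests. `BindingsSketch`, `assign` and `lookup` are hypothetical names for illustration and are not part of Java VTL.

```java
import javax.script.ScriptContext;
import javax.script.SimpleScriptContext;

// Hypothetical helper: mimics what a JSR-223 engine might do for "name := value".
final class BindingsSketch {

    // Store a value under the given name in the engine scope of the context.
    static ScriptContext assign(ScriptContext context, String name, Object value) {
        context.getBindings(ScriptContext.ENGINE_SCOPE).put(name, value);
        return context;
    }

    // Read a value back from the engine scope; returns null when the name is unbound.
    static Object lookup(ScriptContext context, String name) {
        return context.getBindings(ScriptContext.ENGINE_SCOPE).get(name);
    }

    public static void main(String[] args) {
        ScriptContext context = new SimpleScriptContext();
        assign(context, "ds3", "a dataset would be stored here");
        System.out.println(lookup(context, "ds3"));
    }
}
```

Because the bindings are a plain `Map<String, Object>`, callers can also pre-populate them before calling `eval`, which is the usual JSR-223 way of passing values into a script.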
Connect to external systems
Java VTL uses the no.ssb.vtl.connector.Connector interface to import data from
and export data to external systems.
The Connector interface defines three methods:
public interface Connector {
    boolean canHandle(String identifier);
    Dataset getDataset(String identifier) throws ConnectorException;
    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;
}
The method canHandle(String identifier) is used by the engine to find
which connector is able to provide a Dataset for a given identifier.
The method getDataset(String identifier) is then called to get the dataset.
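The lookup described above can be sketched as follows. To keep the example self-contained, it does not use the real Java VTL types: `StubDataset`, `SimpleConnector`, `PrefixConnector` and `ConnectorLookup` are hypothetical stand-ins, and picking the first connector that answers `true` is an assumption about how the engine resolves an identifier, not a description of its actual code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a dataset, so the sketch compiles without the VTL model.
final class StubDataset {
    final String source;
    final String identifier;
    StubDataset(String source, String identifier) {
        this.source = source;
        this.identifier = identifier;
    }
}

// Simplified stand-in for the Connector interface (read side only, no checked exception).
interface SimpleConnector {
    boolean canHandle(String identifier);
    StubDataset getDataset(String identifier);
}

// Hypothetical connector that claims every identifier starting with a given prefix.
final class PrefixConnector implements SimpleConnector {
    private final String prefix;
    PrefixConnector(String prefix) { this.prefix = prefix; }

    @Override
    public boolean canHandle(String identifier) {
        return identifier.startsWith(prefix);
    }

    @Override
    public StubDataset getDataset(String identifier) {
        return new StubDataset(prefix, identifier);
    }
}

final class ConnectorLookup {
    // Ask each connector in order; the first that can handle the identifier provides the dataset.
    static Optional<StubDataset> resolve(List<SimpleConnector> connectors, String identifier) {
        for (SimpleConnector connector : connectors) {
            if (connector.canHandle(identifier)) {
                return Optional.of(connector.getDataset(identifier));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<SimpleConnector> connectors =
                Arrays.asList(new PrefixConnector("file:"), new PrefixConnector("http:"));
        System.out.println(resolve(connectors, "http://example/foo").isPresent());
    }
}
```

Keeping `canHandle` cheap (a prefix or pattern check rather than a remote call) means the lookup stays fast even with several connectors registered.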
Example implementations can be found in the java-vtl-ssb-api-connector module,
but a very crude Dataset implementation could look like this:
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

// Dataset, DataStructure, DataPoint and Role are expected to come from the java-vtl-model module.
class StaticDataset implements Dataset {

    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {
        List<Map<String, Object>> data = new ArrayList<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            // Create a new map per row; reusing one map would make all rows identical.
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }
        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}
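When rows are built in a loop, as in getData(), each row needs its own map instance. The sketch below uses plain Java collections only (no Java VTL types) to show the difference; `RowBuilding`, `sharedRow` and `freshRows` are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RowBuilding {

    // Reuses a single map: every list element is the same object,
    // so all rows end up showing the values written last.
    static List<Map<String, Object>> sharedRow(int n) {
        List<Map<String, Object>> data = new ArrayList<>();
        Map<String, Object> row = new HashMap<>();
        for (int i = 0; i < n; i++) {
            row.put("id", "id #" + i);
            data.add(row);
        }
        return data;
    }

    // Allocates one map per iteration: each element keeps its own values.
    static List<Map<String, Object>> freshRows(int n) {
        List<Map<String, Object>> data = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            data.add(row);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(sharedRow(3).get(0).get("id")); // id #2
        System.out.println(freshRows(3).get(0).get("id")); // id #0
    }
}
```

Because a stream built from the list is only consumed after the loop has finished, a shared map would be read in its final state for every row.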
Implementation roadmap
This is an overview of the implementation progress.
| Group | Operators | Progress | Comment |
|---|---|---|---|
| General purpose | round parenthesis | | |
| General purpose | := (assignment) | | |
| General purpose | membership | | |
| General purpose | get | | The keep, filter and aggregate options are not implemented. |
| General purpose | put | | Defined in the grammar but not implemented. |
| Join expression | []{} | | |
| Join clause | filter | | |
| Join clause | keep | | |
| Join clause | drop | | |
| Join clause | fold | | |
| Join clause | unfold | | |
| Join clause | rename | | |
| Join clause | := (assignment) | | |
| Join clause | . (membership) | | |
| Clauses | rename | | |
| Clauses | filter | | |
| Clauses | keep | | |
| Clauses | calc | | |
| Clauses | attrcalc | | |
| Clauses | aggregate | | |
| Conditional | if-then-else | | |
| Conditional | nvl | | |
| Validation | Comparisons (>, <, >=, <=, =, <>) | | |
| Validation | in, not in, between | | |
| Validation | isnull | | Implemented syntaxes are isnull(value), value is null and value is not null. |
| Validation | exist_in, not_exist_in | | |
| Validation | exist_in_all, not_exist_in_all | | |
| Validation | check | | The boolean dataset must be built manually (no lifting). |
| Validation | match_characters | | |
| Validation | match_values | | |
| Statistical | min, max | | |
| Statistical | hierarchy | | The inline definition is not supported. A dataset that has a correct structure can be used instead. |
| Statistical | aggregate | | |
| Relational | union | | |
| Relational | intersect | | |
| Relational | symdiff | | |
| Relational | setdiff | | |
| Relational | merge | | |
| Boolean | and | | Only inside join expression (no lifting). |
| Boolean | or | | Only inside join expression (no lifting). |
| Boolean | xor | | Only inside join expression (no lifting). |
| Boolean | not | | Only inside join expression (no lifting). |
| Mathematical | unary plus and minus | | |
| Mathematical | addition, subtraction | | |
| Mathematical | multiplication, division | | |
| Mathematical | round, ceil, floor | | |
| Mathematical | abs | | |
| Mathematical | trunc | | |
| Mathematical | power, exp, nroot | | |
| Mathematical | ln, log | | |
| Mathematical | mod | | |
| String | length | | |
| String | concatenation | | |
| String | trim | | |
| String | upper/lower case | | |
| String | substr | | No lifting. |
| String | indexof | | |
| String | date_from_string | | Dataset as input not implemented. Only YYYY date format accepted. |
| Outside specification | integer_from_string | | |
| Outside specification | float_from_string | | |
| Outside specification | string_from_number | | |