Java VTL: Java implementation of VTL

May 21, 2019

The Java VTL project is an open source Java implementation of the VTL 1.1 draft specification. It follows the JSR-223 Java Scripting API and exposes a simple connector interface that can be implemented to integrate with any data store.

Visit the interactive reference manual for more information.

Modules

The project is divided into modules:

  • java-vtl-parent
    • java-vtl-parser, contains the lexer and parser for VTL.
    • java-vtl-model, VTL data model.
    • java-vtl-script, JSR-223 (ScriptEngine) implementation.
    • java-vtl-connector, connector API.
    • java-vtl-tools, various tools.

Usage

Add the dependency to your Maven project:

<dependency>
    <groupId>no.ssb.vtl</groupId>
    <artifactId>java-vtl-script</artifactId>
    <version>0.1.13-SNAPSHOT</version>
</dependency>

Evaluate VTL expressions

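import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;

// "connector" is a no.ssb.vtl.connector.Connector instance (see below).
ScriptEngine engine = new VTLScriptEngine(connector);

Bindings bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
// Each line of the script ends with a line break so the statements stay separate.
engine.eval("ds1 := get(\"foo\")\n" +
            "ds2 := get(\"bar\")\n" +
            "ds3 := [ds1, ds2] {\n" +
            "   filter ds1.id = \"string\",\n" +
            "   total := ds1.measure + ds2.measure\n" +
            "}");

// Assigned datasets are available in the engine bindings.
System.out.println(bindings.get("ds3"));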

Connect to external systems

Java VTL uses the no.ssb.vtl.connector.Connector interface to read data from and export data to external systems.

The Connector interface defines three methods:

public interface Connector {

    boolean canHandle(String identifier);

    Dataset getDataset(String identifier) throws ConnectorException;

    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;

}

The method canHandle(String identifier) is used by the engine to find which connector is able to provide a Dataset for a given identifier.

The method getDataset(String identifier) is then called to get the dataset. Example implementations can be found in the java-vtl-ssb-api-connector module. A very crude Dataset that a connector could return might look like this:

class StaticDataset implements Dataset {

    private final DataStructure structure = DataStructure.builder()
            .put("id", Role.IDENTIFIER, String.class)
            .put("period", Role.IDENTIFIER, Instant.class)
            .put("measure", Role.MEASURE, Long.class)
            .put("attribute", Role.ATTRIBUTE, String.class)
            .build();

    @Override
    public Stream<DataPoint> getData() {

        List<Map<String, Object>> data = new ArrayList<>();
        Instant period = Instant.now();
        for (int i = 0; i < 100; i++) {
            // Create a new map per row; reusing one map would make every row identical.
            Map<String, Object> row = new HashMap<>();
            row.put("id", "id #" + i);
            row.put("period", period);
            row.put("measure", Long.valueOf(i));
            row.put("attribute", "attribute #" + i);
            data.add(row);
        }

        return data.stream().map(structure::wrap);
    }

    @Override
    public Optional<Map<String, Integer>> getDistinctValuesCount() {
        return Optional.empty();
    }

    @Override
    public Optional<Long> getSize() {
        return Optional.of(100L);
    }

    @Override
    public DataStructure getDataStructure() {
        return structure;
    }
}
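
The lookup described above (ask each connector whether it can handle an identifier, then fetch the dataset from the first one that can) can be sketched as follows. This is only an illustration, not the engine's actual code: the Dataset, ConnectorException and Connector types below are simplified stand-ins so that the sketch compiles on its own, and InMemoryConnector, ConnectorLookup and the "mem:" identifier prefix are hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-ins for the library types so that the sketch compiles on its own.
// In a real project these come from the java-vtl modules instead.
interface Dataset { }

class ConnectorException extends Exception {
    ConnectorException(String message) { super(message); }
}

interface Connector {
    boolean canHandle(String identifier);
    Dataset getDataset(String identifier) throws ConnectorException;
    Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException;
}

// Hypothetical connector that keeps datasets in memory under "mem:" identifiers.
class InMemoryConnector implements Connector {
    private static final String PREFIX = "mem:";
    private final Map<String, Dataset> store = new HashMap<>();

    @Override
    public boolean canHandle(String identifier) {
        return identifier != null && identifier.startsWith(PREFIX);
    }

    @Override
    public Dataset getDataset(String identifier) throws ConnectorException {
        Dataset dataset = store.get(identifier);
        if (dataset == null) {
            throw new ConnectorException("No dataset stored for " + identifier);
        }
        return dataset;
    }

    @Override
    public Dataset putDataset(String identifier, Dataset dataset) throws ConnectorException {
        if (!canHandle(identifier)) {
            throw new ConnectorException("Cannot handle " + identifier);
        }
        store.put(identifier, dataset);
        return dataset;
    }
}

// Hypothetical helper: the first connector whose canHandle returns true provides the dataset.
class ConnectorLookup {
    static Dataset resolve(List<Connector> connectors, String identifier) throws ConnectorException {
        for (Connector connector : connectors) {
            if (connector.canHandle(identifier)) {
                return connector.getDataset(identifier);
            }
        }
        throw new ConnectorException("No connector can handle " + identifier);
    }
}
```

Using an identifier prefix keeps canHandle cheap, which matters if it is called for every connector on every lookup. A real connector would typically inspect a URL scheme or a catalogue instead.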

Implementation roadmap

This is an overview of the implementation progress.

| Group | Operators | Progress | Comment |
|---|---|---|---|
| General purpose | round parentheses | done | |
| General purpose | := (assignment) | done | |
| General purpose | membership | done | |
| General purpose | get | usable | The keep, filter and aggregate options are not implemented. |
| General purpose | put | usable | Defined in the grammar but not implemented. |
| Join expression | []{} | done | |
| Join clause | filter | done | |
| Join clause | keep | done | |
| Join clause | drop | done | |
| Join clause | fold | done | |
| Join clause | unfold | done | |
| Join clause | rename | done | |
| Join clause | := (assignment) | done | |
| Join clause | . (membership) | done | |
| Clauses | rename | done | |
| Clauses | filter | done | |
| Clauses | keep | done | |
| Clauses | calc | todo | |
| Clauses | attrcalc | todo | |
| Clauses | aggregate | todo | |
| Conditional | if-then-else | todo | |
| Conditional | nvl | done | |
| Validation | Comparisons (>, <, >=, <=, =, <>) | done | |
| Validation | in, not in, between | todo | |
| Validation | isnull | done | The implemented syntaxes are isnull(value), value is null and value is not null. |
| Validation | exist_in, not_exist_in | todo | |
| Validation | exist_in_all, not_exist_in_all | todo | |
| Validation | check | usable | The boolean dataset must be built manually (no lifting). |
| Validation | match_characters | todo | |
| Validation | match_values | todo | |
| Statistical | min, max | todo | |
| Statistical | hierarchy | usable | The inline definition is not supported. A dataset that has a correct structure can be used instead. |
| Statistical | aggregate | todo | |
| Relational | union | done | |
| Relational | intersect | todo | |
| Relational | symdiff | todo | |
| Relational | setdiff | done | |
| Relational | merge | todo | |
| Boolean | and | usable | Only inside join expression (no lifting). |
| Boolean | or | usable | Only inside join expression (no lifting). |
| Boolean | xor | usable | Only inside join expression (no lifting). |
| Boolean | not | usable | Only inside join expression (no lifting). |
| Mathematical | unary plus and minus | done | |
| Mathematical | addition, subtraction | done | |
| Mathematical | multiplication, division | done | |
| Mathematical | round, ceil, floor | done | |
| Mathematical | abs | done | |
| Mathematical | trunc | done | |
| Mathematical | power, exp, nroot | done | |
| Mathematical | ln, log | done | |
| Mathematical | mod | done | |
| String | length | todo | |
| String | concatenation | done | |
| String | trim | todo | |
| String | upper/lower case | todo | |
| String | substr | usable | No lifting. |
| String | indexof | todo | |
| String | date_from_string | usable | Dataset as input not implemented. Only YYYY date format accepted. |
| Outside specification | integer_from_string | done | |
| Outside specification | float_from_string | done | |
| Outside specification | string_from_number | done | |