Package simpleml.dataset

April 22, 2022

Class AttributeTransformer

No description available.

Constructor: Class has no constructor.


Class Dataset

A dataset with its data instances (e.g., rows and columns)

Constructor: Class has no constructor.

addAttribute (Instance Method)

Add a new attribute to the dataset with values according to a transformation function

Parameters:

  • newAttributeId: String - The ID of the new attribute
  • transformer: AttributeTransformer - The attribute transformer to be used.
  • newAttributeLabel: String? = null - The name of the new attribute.

Results:

  • dataset: Dataset - The updated dataset
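The idea behind addAttribute can be illustrated outside the DSL. The following is a minimal Python sketch, not the Simple-ML implementation: rows are modeled as plain dictionaries, and the helper name add_attribute and the use of a callable in place of an AttributeTransformer are assumptions made for illustration.

```python
from typing import Any, Callable

Row = dict[str, Any]


def add_attribute(rows: list[Row], new_attribute_id: str,
                  transformer: Callable[[Row], Any]) -> list[Row]:
    """Return new rows that carry one extra attribute computed per row."""
    # Hypothetical stand-in: the transformer is any function of a row.
    return [{**row, new_attribute_id: transformer(row)} for row in rows]


rows = [{"speed": 30, "limit": 50}, {"speed": 70, "limit": 50}]
with_flag = add_attribute(rows, "speeding", lambda r: r["speed"] > r["limit"])
```

The sketch returns new rows instead of mutating the input, mirroring the documented result "The updated dataset".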

dropAllMissingValues (Instance Method)

Drop all instances that contain missing values

Parameters: None expected.

Results:

  • dataset: Dataset - The updated dataset

dropAttribute (Instance Method)

Drop a single attribute from a dataset

Parameters:

  • attribute: String - The attribute to drop from the dataset

Results:

  • dataset: Dataset - The updated dataset

dropAttributes (Instance Method)

Drop attributes from a dataset

Parameters:

  • vararg attributes: String - The list of attributes to drop from the dataset

Results:

  • dataset: Dataset - The updated dataset

dropMissingValues (Instance Method)

Drop instances with a missing value in the specified attribute

Parameters:

  • attribute: String - The attribute whose missing values cause an instance to be dropped

Results:

  • dataset: Dataset - The updated dataset
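The difference between dropAllMissingValues and dropMissingValues can be sketched in plain Python. This is an illustration under assumptions, not the library's code: rows are dictionaries, None stands in for a missing value, and the helper name drop_missing is hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def drop_missing(rows: list[Row], attribute: Optional[str] = None) -> list[Row]:
    """Keep rows without missing values; None models 'missing' in this sketch."""
    if attribute is None:
        # Analogue of dropAllMissingValues: every attribute must be present.
        return [r for r in rows if all(v is not None for v in r.values())]
    # Analogue of dropMissingValues: only the named attribute is checked.
    return [r for r in rows if r.get(attribute) is not None]


rows = [
    {"id": 1, "speed": 42.0, "street": "A"},
    {"id": 2, "speed": None, "street": "B"},
    {"id": 3, "speed": 38.5, "street": None},
]
complete = drop_missing(rows)             # checks all attributes
with_speed = drop_missing(rows, "speed")  # checks only "speed"
```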

exportDataAsFile (Instance Method)

Export the dataset to a CSV file

Parameters:

  • filePath: String - No description available.

Results: None returned.

filterInstances (Instance Method)

Remove instances in a dataset according to a filter function

Parameters:

  • filterFunc: (instance: Instance) -> shouldKeep: Boolean - The filter function that returns either True (keep) or False (remove) for each instance

Results:

  • dataset: Dataset - The updated dataset
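The keep/remove contract of the filter function can be shown with a short Python sketch. It assumes rows are dictionaries and uses a hypothetical helper named filter_instances; it is not the Simple-ML implementation.

```python
from typing import Any, Callable

Row = dict[str, Any]


def filter_instances(rows: list[Row],
                     filter_func: Callable[[Row], bool]) -> list[Row]:
    """Keep exactly the rows for which filter_func returns True."""
    return [row for row in rows if filter_func(row)]


rows = [{"street": "A", "speed": 28}, {"street": "B", "speed": 61}]
# The predicate answers "should this instance be kept?"
slow = filter_instances(rows, lambda row: row["speed"] < 50)
```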

getRow (Instance Method)

Get a specific row of a dataset

Parameters:

  • rowNumber: Int - The number of the row to be retrieved

Results:

  • instance: Instance - The specified row

keepAttribute (Instance Method)

Retain a single attribute of a dataset

Parameters:

  • attribute: String - The attribute to retain in the dataset

Results:

  • dataset: Dataset - The updated dataset

keepAttributes (Instance Method)

Retain attributes of a dataset

Parameters:

  • vararg attributes: String - The list of attributes to retain in the dataset

Results:

  • dataset: Dataset - The updated dataset

sample (Instance Method)

Create a sample of a dataset

Parameters:

  • nInstances: Int - Number of instances in the sample

Results:

  • sample: Dataset - The sampled dataset

setTargetAttribute (Instance Method)

Set the specified attribute as prediction target

Parameters:

  • targetAttribute: String - The attribute to be predicted later on

Results:

  • dataset: Dataset - The updated dataset

splitIntoTrainAndTest (Instance Method)

Split a dataset into a training and a test dataset

Parameters:

  • trainRatio: Float - The proportion of instances to keep in the training dataset
  • randomState: Int? = null - A random seed to use for splitting

Results:

  • train: Dataset - The training dataset
  • test: Dataset - The test dataset
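The roles of trainRatio and randomState can be illustrated with a small Python sketch. It assumes that the split shuffles the rows and that trainRatio is a fraction between 0 and 1; neither is confirmed by this page, and the helper name is hypothetical.

```python
import random
from typing import Any, Optional

Row = dict[str, Any]


def split_into_train_and_test(
    rows: list[Row], train_ratio: float, random_state: Optional[int] = None
) -> tuple[list[Row], list[Row]]:
    """Shuffle a copy of the rows, then cut it at the train ratio."""
    rng = random.Random(random_state)  # a fixed seed makes the split repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = [{"id": i} for i in range(8)]
train, test = split_into_train_and_test(rows, 0.75, random_state=1)
```

Passing the same seed again yields the same partition, which is the usual reason for exposing such a parameter.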

splitIntoTrainAndTestAndLabels (Instance Method)

Split a dataset into four datasets: train/test and labels/features. A target attribute must have been set beforehand via setTargetAttribute().

Parameters:

  • trainRatio: Float - The proportion of instances to keep in the training dataset
  • randomState: Int? = null - A random seed to use for splitting

Results:

  • xTrain: Dataset - Features of the training dataset
  • xTest: Dataset - Features of the test dataset
  • yTrain: Dataset - Labels of the training dataset
  • yTest: Dataset - Labels of the test dataset
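The four results and their order (xTrain, xTest, yTrain, yTest) can be sketched in Python as follows. The sketch assumes dictionary rows, a shuffled split, and a target passed explicitly rather than set via setTargetAttribute; the helper name is hypothetical and this is not the library's code.

```python
import random
from typing import Any, Optional

Row = dict[str, Any]


def split_with_labels(rows: list[Row], target: str, train_ratio: float,
                      random_state: Optional[int] = None):
    """Return (x_train, x_test, y_train, y_test) for the given target."""
    rng = random.Random(random_state)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]

    def features(part: list[Row]) -> list[Row]:
        # Everything except the target attribute.
        return [{k: v for k, v in r.items() if k != target} for r in part]

    def labels(part: list[Row]) -> list[Row]:
        # Only the target attribute.
        return [{target: r[target]} for r in part]

    return features(train), features(test), labels(train), labels(test)


rows = [{"hour": h, "speed": 30 + h} for h in range(4)]
x_train, x_test, y_train, y_test = split_with_labels(
    rows, "speed", 0.5, random_state=0
)
```

Features and labels are cut from the same shuffled order, so the i-th row of x_train still belongs to the i-th row of y_train.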

transform (Instance Method)

Update an existing attribute with values according to a transformation function

Parameters:

  • attributeId: String - The ID of the attribute to be replaced
  • transformer: AttributeTransformer - The attribute transformer to be used

Results:

  • dataset: Dataset - The updated dataset

transformDatatypes (Instance Method)

Convert all column values into numbers

Parameters: None expected.

Results:

  • dataset: Dataset - The updated dataset

Class DayOfTheYearTransformer

An attribute transformer to convert a date attribute to its day in the year.

Constructor parameters:

  • attributeId: String - The ID of the date attribute.

Class Instance

A single instance (e.g., row) of a dataset

Constructor: Class has no constructor.

getValue (Instance Method)

Return a specific value of the instance

Parameters:

  • attribute: String - The attribute whose value is returned

Results:

  • value: Any - The specified value

Class StandardNormalizer

A normalizer to normalize dataset values

Constructor parameters: None expected.

normalize (Instance Method)

Normalize all numeric values in the dataset

Parameters:

  • dataset: Dataset - Dataset to be normalized

Results:

  • normalizedDataset: Dataset - The normalized dataset

Class StandardScaler

A scaler to scale dataset values

Constructor parameters: None expected.

scale (Instance Method)

Scale all numeric values in the dataset

Parameters:

  • dataset: Dataset - Dataset to be scaled

Results:

  • scaledDataset: Dataset - The scaled dataset
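This page does not state which scaling StandardScaler applies. Assuming it follows the common "standard scaling" convention (subtract the column mean, divide by the column standard deviation), the effect on one numeric column could be sketched in Python like this. The helper name scale_column is hypothetical.

```python
from statistics import fmean, pstdev


def scale_column(values: list[float]) -> list[float]:
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column has no spread; map it to zeros in this sketch.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


scaled = scale_column([10.0, 20.0, 30.0])
```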

Class TimestampTransformer

An attribute transformer to convert a date attribute to its timestamp.

Constructor parameters:

  • attributeId: String - The ID of the date attribute.

Class WeekDayTransformer

An attribute transformer to convert a date attribute to its weekday (as a string).

Constructor parameters:

  • attributeId: String - The ID of the date attribute.

Class WeekendTransformer

An attribute transformer to convert a date attribute to whether the date is on the weekend or not.

Constructor parameters:

  • attributeId: String - The ID of the date attribute.
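The values produced by the four date transformers (DayOfTheYearTransformer, TimestampTransformer, WeekDayTransformer, WeekendTransformer) can be approximated with Python's standard library. This is a sketch under assumptions: the function names are hypothetical, the timestamp is taken as seconds since the Unix epoch at midnight UTC, weekday names are English, and Saturday and Sunday count as the weekend. The library's actual output format may differ.

```python
from datetime import date, datetime, timezone

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def day_of_the_year(d: date) -> int:
    """1 for January 1st, up to 365 or 366."""
    return d.timetuple().tm_yday


def timestamp(d: date) -> float:
    """Seconds since the Unix epoch at midnight UTC (assumed convention)."""
    return datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp()


def week_day(d: date) -> str:
    """Weekday as a string; date.weekday() counts Monday as 0."""
    return WEEKDAY_NAMES[d.weekday()]


def is_weekend(d: date) -> bool:
    """True for Saturday and Sunday (assumed definition of weekend)."""
    return d.weekday() >= 5


friday = date(2022, 4, 22)
saturday = date(2022, 4, 23)
```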

Global Functions

Global Function joinTwoDatasets

Join two datasets into one dataset

Parameters:

  • dataset1: Dataset - The first dataset
  • dataset2: Dataset - The second dataset
  • attributeId1: String - The attribute of the first dataset to use for the join
  • attributeId2: String - The attribute of the second dataset to use for the join
  • suffix1: String - The suffix to be attached to the attribute names of the first dataset
  • suffix2: String - The suffix to be attached to the attribute names of the second dataset

Results:

  • dataset: Dataset - The joined dataset
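How the two join attributes and the two suffixes interact can be sketched in Python. The page does not state the join type or which attribute names receive a suffix, so this sketch assumes an inner join and appends the suffix to every attribute name of the respective side. The helper name is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def join_two_datasets(rows1: list[Row], rows2: list[Row],
                      attribute_id1: str, attribute_id2: str,
                      suffix1: str, suffix2: str) -> list[Row]:
    """Inner join on rows1[attribute_id1] == rows2[attribute_id2]."""
    # Index the second dataset by its join attribute.
    index: dict[Any, list[Row]] = {}
    for r2 in rows2:
        index.setdefault(r2[attribute_id2], []).append(r2)

    joined = []
    for r1 in rows1:
        for r2 in index.get(r1[attribute_id1], []):
            # Suffixes keep equally named attributes apart (assumed behavior).
            row = {f"{k}{suffix1}": v for k, v in r1.items()}
            row.update({f"{k}{suffix2}": v for k, v in r2.items()})
            joined.append(row)
    return joined


speeds = [{"street_id": 1, "speed": 42}, {"street_id": 2, "speed": 30}]
streets = [{"id": 1, "name": "Main St"}]
joined = join_two_datasets(speeds, streets, "street_id", "id", "_l", "_r")
```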

Global Function loadDataset

Load a dataset via its identifier

Parameters:

  • datasetID: String - Identifier of the dataset

Results:

  • dataset: Dataset - The loaded dataset

Global Function readDataSetFromCSV

Load a dataset from a CSV file

Parameters:

  • fileName: String - Path and name of the CSV file
  • datasetId: String - Identifier of the dataset
  • separator: String - Separator used in the file
  • hasHeader: String - True, if the file has a header row
  • nullValue: String - String that should be parsed as missing value
  • datasetName: String - Name of the dataset
  • coordinateSystem: Int = 3857 - Coordinate system used in the geometry columns of the dataset

Results:

  • dataset: Dataset - The loaded dataset
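The roles of separator, hasHeader, and nullValue can be illustrated with Python's csv module. This sketch is not the Simple-ML loader: it reads from a string instead of a file, takes the header flag as a boolean, leaves all cells as strings, and invents generic column names when no header is present. The helper name read_rows is hypothetical.

```python
import csv
import io
from typing import Any, Optional

Row = dict[str, Any]


def read_rows(text: str, separator: str, has_header: bool,
              null_value: str) -> list[Row]:
    """Parse CSV text into rows, mapping null_value cells to None."""
    records = list(csv.reader(io.StringIO(text), delimiter=separator))
    if not records:
        return []
    if has_header:
        header, records = records[0], records[1:]
    else:
        # No header row: fall back to generic names (an assumption).
        header = [f"column{i}" for i in range(len(records[0]))]

    def parse(cell: str) -> Optional[str]:
        return None if cell == null_value else cell

    return [{name: parse(cell) for name, cell in zip(header, record)}
            for record in records]


text = "street;speed\nA;42\nB;NA\n"
rows = read_rows(text, separator=";", has_header=True, null_value="NA")
```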

This file was created automatically. Do not change it manually!