Feature Store & Analytic Function Tools

May 14, 2026

Dependencies

Assumes Teradata >=17.20. Requires the fs optional extra (uv sync --extra fs), which installs teradataml.
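As a quick sanity check before starting the server, the snippet below tests whether teradataml can be located in the current environment. It is a minimal sketch, not part of the project: the helper name `fs_extra_installed` is hypothetical, and it only confirms that the package is importable, not that the database version meets the requirement above.

```python
import importlib.util


def fs_extra_installed(module_name: str = "teradataml") -> bool:
    """Return True if the module can be located, without importing it."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if fs_extra_installed():
        print("teradataml is importable.")
    else:
        print("teradataml not found; try `uv sync --extra fs`.")
```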


Feature Store tools (fs_*)

  • reconnect_to_database - reestablishes a connection to the Teradata database
  • fs_setFeatureStoreConfig - sets the feature store config for the session
  • fs_getFeatureStoreConfig - gets the feature store config for the session
  • fs_isFeatureStorePresent - confirms that a feature store has been created
  • fs_featureStoreContent - returns a summary of the feature store
  • fs_getDataDomains - lists the available domains in a feature store
  • fs_getFeatures - gets the features within the feature store
  • fs_getAvailableDatasets - lists the available datasets in the feature store
  • fs_getFeatureDataModel - returns the schema of the feature store
  • fs_getAvailableEntities - returns the entities in a domain
  • fs_createDataset - creates a dataset from the feature store

teradataml Analytic Function tools (tdml_*)

These tools are registered dynamically at server startup from the teradataml library. The full list of registered functions is maintained in tools/constants.py. Each entry maps a teradataml function name to a one-line summary used as the MCP tool description.

Tools are only registered when a database connection is available and the function exists in the connected system's teradataml version. If a function is unavailable, it is skipped with a warning.
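The registration behavior described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the server's actual code: `FUNCTION_SUMMARIES` stands in for the mapping in tools/constants.py, `register_available` is a hypothetical helper, and `fake_library` replaces the real teradataml module so the example runs without a database.

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict

logger = logging.getLogger("tdml_registration_sketch")

# Hypothetical stand-in for the name-to-summary mapping in tools/constants.py.
FUNCTION_SUMMARIES: Dict[str, str] = {
    "ANOVA": "Performs one-way Analysis of Variance (ANOVA) on a dataset.",
    "KMeans": "Groups observations into k clusters.",
    "NotInThisVersion": "Placeholder for a function the library lacks.",
}


def register_available(
    library: Any, summaries: Dict[str, str], connected: bool
) -> Dict[str, Dict[str, Any]]:
    """Build a tool registry from the functions the library actually exposes."""
    registered: Dict[str, Dict[str, Any]] = {}
    if not connected:
        # No database connection: register nothing.
        return registered
    for name, summary in summaries.items():
        func = getattr(library, name, None)
        if not callable(func):
            # Function missing in this library version: skip with a warning.
            logger.warning("Skipping %s: not available in this version", name)
            continue
        registered[f"tdml_{name}"] = {"description": summary, "callable": func}
    return registered


# Fake library exposing only two of the three listed functions (assumption).
fake_library = SimpleNamespace(
    ANOVA=lambda **kwargs: "anova result",
    KMeans=lambda **kwargs: "kmeans result",
)

tools = register_available(fake_library, FUNCTION_SUMMARIES, connected=True)
print(sorted(tools))
```

Keeping the summaries in one mapping means that exposing another function only requires a new entry, while the availability check keeps the server from advertising tools that the connected system cannot run.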

| Tool | Description |
| --- | --- |
| tdml_ANOVA | Performs one-way Analysis of Variance (ANOVA) on a dataset with two or more groups. |
| tdml_Apriori | Finds association patterns and calculates statistical metrics to understand the influence of item sets on each other. |
| tdml_BincodeFit | Computes bin boundaries for numeric columns to be applied by BincodeTransform(). |
| tdml_BincodeTransform | Converts continuous numeric data to categorical data using bin boundaries from BincodeFit() output. |
| tdml_CFilter | Calculates statistical measures of how likely each pair of items is to be purchased together. |
| tdml_CategoricalSummary | Displays distinct values and their counts for each specified input column. |
| tdml_ChiSq | Performs Pearson's chi-squared test for independence between two categorical variables. |
| tdml_ClassificationEvaluator | Evaluates a classification model by computing confusion matrix metrics such as accuracy, precision, recall, and F1. |
| tdml_ColumnSummary | Provides a quick overview of column datatypes and a summary of NULL and non-NULL counts for a given table. |
| tdml_ColumnTransformer | Applies multiple transformations to input data columns in a single operation using Fit analytic function outputs. |
| tdml_ConvertTo | Converts specified input columns to specified data types. |
| tdml_DecisionForest | Trains an ensemble decision forest model for classification and regression predictive modeling. |
| tdml_FTest | Performs an F-test where the test statistic follows an F-distribution under the null hypothesis. |
| tdml_FillRowId | Adds a column of unique row identifiers to the input table. |
| tdml_Fit | Determines whether specified numeric transformations can be applied to target columns and outputs parameters for Transform(). |
| tdml_GLM | Trains a Generalized Linear Model (GLM) for regression and classification. |
| tdml_GLMPerSegment | Trains a separate GLM model for each segment of the input data. |
| tdml_GetFutileColumns | Returns names of columns that are futile: all values unique, all identical, or distinct ratio exceeds a threshold. |
| tdml_GetRowsWithMissingValues | Returns rows that contain NULL values in any of the specified input columns. |
| tdml_GetRowsWithoutMissingValues | Returns rows that have non-NULL values in all of the specified input columns. |
| tdml_Histogram | Calculates frequency distribution of a dataset using Sturges, Scott, variable-width, or equal-width binning methods. |
| tdml_KMeans | Groups observations into k clusters where each point belongs to the cluster with the nearest centroid. |
| tdml_KMeansPredict | Assigns input data points to cluster centroids produced by KMeans(). |
| tdml_KNN | Classifies data points based on proximity to training data points with known categories. |
| tdml_MovingAverage | Computes moving average values in a series using a specified moving average type. |
| tdml_NERExtractor | Performs Named Entity Recognition (NER) on input text using dictionary words or regular expression patterns. |
| tdml_NGramSplitter | Tokenizes an input stream of text and outputs n-grams based on specified delimiter and reset parameters. |
| tdml_NPath | Scans a set of rows looking for user-specified sequential patterns and returns rows that match. |
| tdml_NaiveBayesTextClassifierPredict | Predicts text categories using a model generated by NaiveBayesTextClassifierTrainer(). |
| tdml_NaiveBayesTextClassifierTrainer | Calculates conditional probabilities and prior probabilities for token-category pairs for text classification. |
| tdml_NonLinearCombineFit | Computes parameters for a non-linear combination of existing features for use by NonLinearCombineTransform(). |
| tdml_NonLinearCombineTransform | Generates a new feature by applying a non-linear combination formula using NonLinearCombineFit() output. |
| tdml_NumApply | Applies a predefined numeric operation to specified input columns. |
| tdml_OneClassSVM | Trains a linear one-class SVM model to identify outliers or novelty in a dataset. |
| tdml_OneClassSVMPredict | Predicts whether input data points are outliers using a model generated by OneClassSVM(). |
| tdml_OneHotEncodingFit | Identifies categorical values to be encoded and outputs parameters for OneHotEncodingTransform(). |
| tdml_OneHotEncodingTransform | Encodes categorical columns as one-hot numeric vectors using OneHotEncodingFit() output. |
| tdml_OrdinalEncodingFit | Identifies distinct categorical values and generates ordinal mappings for use with OrdinalEncodingTransform(). |
| tdml_OrdinalEncodingTransform | Maps categorical values to ordinal integers using OrdinalEncodingFit() output. |
| tdml_OutlierFilterFit | Calculates percentile bounds and median for target columns for use by OutlierFilterTransform(). |
| tdml_OutlierFilterTransform | Filters rows containing outlier values using bounds from OutlierFilterFit() output. |
| tdml_Pack | Packs data from multiple input columns into a single column. |
| tdml_Pivoting | Pivots data from sparse format to dense format (rows to columns). |
| tdml_PolynomialFeaturesFit | Computes polynomial combination parameters for existing features for use by PolynomialFeaturesTransform(). |
| tdml_PolynomialFeaturesTransform | Generates polynomial feature combinations from existing features using PolynomialFeaturesFit() output. |
| tdml_QQNorm | Determines whether values in input columns follow a normal distribution. |
| tdml_ROC | Computes true positive rate, false positive rate, AUC, and Gini coefficient for a binary classification model. |
| tdml_RandomProjectionFit | Generates a random projection matrix for use by RandomProjectionTransform(). |
| tdml_RandomProjectionMinComponents | Calculates the minimum number of components required for random projection given an epsilon distortion value. |
| tdml_RandomProjectionTransform | Reduces high-dimensional input data to a lower-dimensional space using RandomProjectionFit() output. |
| tdml_RegressionEvaluator | Computes metrics to evaluate regression model predictions including RMSE, MAE, and R-squared. |
| tdml_RoundColumns | Rounds values in specified input columns to a specified number of decimal places. |
| tdml_RowNormalizeFit | Computes row-wise normalization parameters for specified columns for use by RowNormalizeTransform(). |
| tdml_RowNormalizeTransform | Normalizes input columns row-wise using RowNormalizeFit() output. |
| tdml_SMOTE | Generates synthetic minority class samples using SMOTE, ADASYN, Borderline-2, or SMOTE-NC algorithms. |
| tdml_SVM | Trains a linear Support Vector Machine (SVM) for classification and regression. |
| tdml_SVMPredict | Predicts target values or class labels on new data using a model generated by SVM(). |
| tdml_ScaleFit | Computes scaling statistics for specified columns for use by ScaleTransform(). |
| tdml_ScaleTransform | Scales specified columns using statistics from ScaleFit() output. |
| tdml_Sessionize | Maps each click event in a user session to a unique session identifier. |
| tdml_SentimentExtractor | Extracts the sentiment (positive, negative, or neutral) of each input document or sentence. |
| tdml_Shap | Computes Shapley values to explain individual predictions (feature contributions) for a machine learning model. |
| tdml_Silhouette | Measures the consistency of cluster assignments by computing silhouette scores for each data point. |
| tdml_SimpleImputeFit | Computes imputation values (mean, median, or mode) for missing values in the input data. |
| tdml_SimpleImputeTransform | Substitutes missing values in the input data using imputation values from SimpleImputeFit() output. |
| tdml_StrApply | Applies a predefined string operation to specified input columns. |
| tdml_StringSimilarity | Calculates similarity between two strings using Jaro, Jaro-Winkler, N-Gram, or Levenshtein distance. |
| tdml_TFIDF | Computes Term Frequency (TF), Inverse Document Frequency (IDF), and TF-IDF scores for each term in a document set. |
| tdml_TDDecisionForestPredict | Predicts target values or class labels using a DecisionForest() model. |
| tdml_TDGLMPredict | Predicts target values or class labels for test data using a GLM() model. |
| tdml_TDNaiveBayesPredict | Predicts classification labels using a model generated by NaiveBayes(). |
| tdml_TargetEncodingFit | Computes target encoding values (expected value per category) for categorical columns. |
| tdml_TargetEncodingTransform | Encodes categorical columns using encoding values from TargetEncodingFit() output. |
| tdml_TextMorph | Generates morphological variants (morphs) of words in the input dataset. |
| tdml_TextParser | Tokenizes text, removes punctuation, converts to lowercase, removes stopwords, and applies stemming or lemmatization. |
| tdml_TrainTestSplit | Splits input data into training and test sets to simulate model performance on new data. |
| tdml_Transform | Applies numeric transformations to input columns using parameters from Fit() output. |
| tdml_UnivariateStatistics | Displays descriptive statistics (mean, min, max, stddev, etc.) for each specified numeric input column. |
| tdml_Unpack | Unpacks data from a single packed column into multiple separate columns. |
| tdml_Unpivoting | Unpivots data from dense format to sparse format (columns to rows). |
| tdml_VectorDistance | Computes distances between target vectors and reference vectors. |
| tdml_WhichMax | Returns all rows that contain the maximum value in a specified input column. |
| tdml_WhichMin | Returns all rows that contain the minimum value in a specified input column. |
| tdml_WordEmbeddings | Produces embedding vectors for text and computes similarity between texts. |
| tdml_XGBoost | Trains an XGBoost (eXtreme Gradient Boosting) model for classification or regression. |
| tdml_XGBoostPredict | Predicts target values or class labels using a model generated by XGBoost(). |
| tdml_ZTest | Tests the equality of two means under the assumption that the population variances are known. |

To add a new analytic function, see How to Add a New Function.


Return to Main README