Feature Store & Analytic Function Tools
May 14, 2026
Dependencies
Assumes Teradata >=17.20. Requires the fs optional extra (uv sync --extra fs), which installs teradataml.
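As a rough illustration of the two requirements above, the sketch below checks whether a module is importable and whether a dotted version string meets the 17.20 minimum. The helper names `has_module` and `meets_minimum` are hypothetical and not part of this project. The sketch assumes the database version is already available as a plain dotted-integer string such as "17.20.03.01"; how that string is obtained is not covered here.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(name) is not None


def meets_minimum(version: str, minimum: str = "17.20") -> bool:
    """Compare dotted-integer version strings component by component.

    Assumption: both strings contain only integers separated by dots.
    """
    def parse(text: str) -> tuple:
        return tuple(int(part) for part in text.split("."))

    return parse(version) >= parse(minimum)


if __name__ == "__main__":
    # teradataml is only present when the fs extra has been installed.
    if not has_module("teradataml"):
        print("teradataml not found; run: uv sync --extra fs")
```

A check like this could run before tool registration so that a missing extra produces a clear message instead of an import error.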
Feature Store tools (fs_*)
- reconnect_to_database - reestablishes a connection to the Teradata database
- fs_setFeatureStoreConfig - sets the feature store config for the session
- fs_getFeatureStoreConfig - gets the feature store config for the session
- fs_isFeatureStorePresent - confirms that a feature store has been created
- fs_featureStoreContent - returns a summary of the feature store
- fs_getDataDomains - lists the available domains in a feature store
- fs_getFeatures - gets the features within the feature store
- fs_getAvailableDatasets - lists the available datasets in the feature store
- fs_getFeatureDataModel - returns the schema of the feature store
- fs_getAvailableEntities - returns the entities in a domain
- fs_createDataset - creates a dataset from the feature store
teradataml Analytic Function tools (tdml_*)
These tools are registered dynamically at server startup from the teradataml library. The full list of registered functions is maintained in tools/constants.py. Each entry maps a teradataml function name to a one-line summary used as the MCP tool description.
Tools are only registered when a database connection is available and the function exists in the connected system's teradataml version. If a function is unavailable, it is skipped with a warning.
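The registration behavior described above can be sketched roughly as follows. This is not the server's actual code: the mapping excerpt, the `register_analytic_tools` helper, the `add_tool` callback, and the stand-in library object are all hypothetical, and the sketch assumes the mapping keys are bare function names to which the `tdml_` prefix is added.

```python
import logging
from types import SimpleNamespace
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical excerpt of a name -> summary mapping like the one in tools/constants.py.
FUNCTION_SUMMARIES: Dict[str, str] = {
    "ANOVA": "Performs one-way Analysis of Variance (ANOVA) on a dataset with two or more groups.",
    "KMeans": "Groups observations into k clusters where each point belongs to the cluster with the nearest centroid.",
    "NotInThisVersion": "Placeholder for a function the library does not provide.",
}


def register_analytic_tools(
    library: object,
    summaries: Dict[str, str],
    register: Callable[[str, str, Callable], None],
    connection_available: bool = True,
    prefix: str = "tdml_",
) -> List[str]:
    """Register one tool per available function and return the registered names."""
    if not connection_available:
        logger.warning("No database connection; no analytic tools registered")
        return []
    registered = []
    for func_name, summary in summaries.items():
        func = getattr(library, func_name, None)
        if not callable(func):
            # Unavailable functions are skipped with a warning, not treated as errors.
            logger.warning("Skipping %s%s: function not available", prefix, func_name)
            continue
        register(prefix + func_name, summary, func)
        registered.append(prefix + func_name)
    return registered


# Stand-in for the teradataml module so the sketch runs without a database.
fake_library = SimpleNamespace(
    ANOVA=lambda **kwargs: "anova result",
    KMeans=lambda **kwargs: "kmeans result",
)
tools: Dict[str, tuple] = {}


def add_tool(name: str, description: str, func: Callable) -> None:
    """Hypothetical registration callback; stores the tool in a plain dict."""
    tools[name] = (description, func)


registered = register_analytic_tools(fake_library, FUNCTION_SUMMARIES, add_tool)
```

Here `registered` contains only the two functions the stand-in library provides, and the missing one is logged and skipped. Passing `connection_available=False` registers nothing, which mirrors the requirement that a database connection be available.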
| Tool | Description |
|---|---|
tdml_ANOVA | Performs one-way Analysis of Variance (ANOVA) on a dataset with two or more groups. |
tdml_Apriori | Finds association patterns and calculates statistical metrics to understand the influence of item sets on each other. |
tdml_BincodeFit | Computes bin boundaries for numeric columns to be applied by BincodeTransform(). |
tdml_BincodeTransform | Converts continuous numeric data to categorical data using bin boundaries from BincodeFit() output. |
tdml_CFilter | Calculates statistical measures of how likely each pair of items is to be purchased together. |
tdml_CategoricalSummary | Displays distinct values and their counts for each specified input column. |
tdml_ChiSq | Performs Pearson's chi-squared test for independence between two categorical variables. |
tdml_ClassificationEvaluator | Evaluates a classification model by computing confusion matrix metrics such as accuracy, precision, recall, and F1. |
tdml_ColumnSummary | Provides a quick overview of column datatypes and a summary of NULL and non-NULL counts for a given table. |
tdml_ColumnTransformer | Applies multiple transformations to input data columns in a single operation using Fit analytic function outputs. |
tdml_ConvertTo | Converts specified input columns to specified data types. |
tdml_DecisionForest | Trains an ensemble decision forest model for classification and regression predictive modeling. |
tdml_FTest | Performs an F-test where the test statistic follows an F-distribution under the null hypothesis. |
tdml_FillRowId | Adds a column of unique row identifiers to the input table. |
tdml_Fit | Determines whether specified numeric transformations can be applied to target columns and outputs parameters for Transform(). |
tdml_GLM | Trains a Generalized Linear Model (GLM) for regression and classification. |
tdml_GLMPerSegment | Trains a separate GLM model for each segment of the input data. |
tdml_GetFutileColumns | Returns names of columns that are futile: all values unique, all values identical, or a distinct-value ratio that exceeds a threshold. |
tdml_GetRowsWithMissingValues | Returns rows that contain NULL values in any of the specified input columns. |
tdml_GetRowsWithoutMissingValues | Returns rows that have non-NULL values in all of the specified input columns. |
tdml_Histogram | Calculates frequency distribution of a dataset using Sturges, Scott, variable-width, or equal-width binning methods. |
tdml_KMeans | Groups observations into k clusters where each point belongs to the cluster with the nearest centroid. |
tdml_KMeansPredict | Assigns input data points to cluster centroids produced by KMeans(). |
tdml_KNN | Classifies data points based on proximity to training data points with known categories. |
tdml_MovingAverage | Computes moving average values in a series using a specified moving average type. |
tdml_NERExtractor | Performs Named Entity Recognition (NER) on input text using dictionary words or regular expression patterns. |
tdml_NGramSplitter | Tokenizes an input stream of text and outputs n-grams based on specified delimiter and reset parameters. |
tdml_NPath | Scans a set of rows looking for user-specified sequential patterns and returns rows that match. |
tdml_NaiveBayesTextClassifierPredict | Predicts text categories using a model generated by NaiveBayesTextClassifierTrainer(). |
tdml_NaiveBayesTextClassifierTrainer | Calculates conditional probabilities and prior probabilities for token-category pairs for text classification. |
tdml_NonLinearCombineFit | Computes parameters for a non-linear combination of existing features for use by NonLinearCombineTransform(). |
tdml_NonLinearCombineTransform | Generates a new feature by applying a non-linear combination formula using NonLinearCombineFit() output. |
tdml_NumApply | Applies a predefined numeric operation to specified input columns. |
tdml_OneClassSVM | Trains a linear one-class SVM model to identify outliers or novelty in a dataset. |
tdml_OneClassSVMPredict | Predicts whether input data points are outliers using a model generated by OneClassSVM(). |
tdml_OneHotEncodingFit | Identifies categorical values to be encoded and outputs parameters for OneHotEncodingTransform(). |
tdml_OneHotEncodingTransform | Encodes categorical columns as one-hot numeric vectors using OneHotEncodingFit() output. |
tdml_OrdinalEncodingFit | Identifies distinct categorical values and generates ordinal mappings for use with OrdinalEncodingTransform(). |
tdml_OrdinalEncodingTransform | Maps categorical values to ordinal integers using OrdinalEncodingFit() output. |
tdml_OutlierFilterFit | Calculates percentile bounds and median for target columns for use by OutlierFilterTransform(). |
tdml_OutlierFilterTransform | Filters rows containing outlier values using bounds from OutlierFilterFit() output. |
tdml_Pack | Packs data from multiple input columns into a single column. |
tdml_Pivoting | Pivots data from sparse format to dense format (rows to columns). |
tdml_PolynomialFeaturesFit | Computes polynomial combination parameters for existing features for use by PolynomialFeaturesTransform(). |
tdml_PolynomialFeaturesTransform | Generates polynomial feature combinations from existing features using PolynomialFeaturesFit() output. |
tdml_QQNorm | Determines whether values in input columns follow a normal distribution. |
tdml_ROC | Computes true positive rate, false positive rate, AUC, and Gini coefficient for a binary classification model. |
tdml_RandomProjectionFit | Generates a random projection matrix for use by RandomProjectionTransform(). |
tdml_RandomProjectionMinComponents | Calculates the minimum number of components required for random projection given an epsilon distortion value. |
tdml_RandomProjectionTransform | Reduces high-dimensional input data to a lower-dimensional space using RandomProjectionFit() output. |
tdml_RegressionEvaluator | Computes metrics to evaluate regression model predictions including RMSE, MAE, and R-squared. |
tdml_RoundColumns | Rounds values in specified input columns to a specified number of decimal places. |
tdml_RowNormalizeFit | Computes row-wise normalization parameters for specified columns for use by RowNormalizeTransform(). |
tdml_RowNormalizeTransform | Normalizes input columns row-wise using RowNormalizeFit() output. |
tdml_SMOTE | Generates synthetic minority class samples using SMOTE, ADASYN, Borderline-2, or SMOTE-NC algorithms. |
tdml_SVM | Trains a linear Support Vector Machine (SVM) for classification and regression. |
tdml_SVMPredict | Predicts target values or class labels on new data using a model generated by SVM(). |
tdml_ScaleFit | Computes scaling statistics for specified columns for use by ScaleTransform(). |
tdml_ScaleTransform | Scales specified columns using statistics from ScaleFit() output. |
tdml_Sessionize | Maps each click event in a user session to a unique session identifier. |
tdml_SentimentExtractor | Extracts the sentiment (positive, negative, or neutral) of each input document or sentence. |
tdml_Shap | Computes Shapley values to explain individual predictions (feature contributions) for a machine learning model. |
tdml_Silhouette | Measures the consistency of cluster assignments by computing silhouette scores for each data point. |
tdml_SimpleImputeFit | Computes imputation values (mean, median, or mode) for missing values in the input data. |
tdml_SimpleImputeTransform | Substitutes missing values in the input data using imputation values from SimpleImputeFit() output. |
tdml_StrApply | Applies a predefined string operation to specified input columns. |
tdml_StringSimilarity | Calculates similarity between two strings using Jaro, Jaro-Winkler, N-Gram, or Levenshtein distance. |
tdml_TFIDF | Computes Term Frequency (TF), Inverse Document Frequency (IDF), and TF-IDF scores for each term in a document set. |
tdml_TDDecisionForestPredict | Predicts target values or class labels using a DecisionForest() model. |
tdml_TDGLMPredict | Predicts target values or class labels for test data using a GLM() model. |
tdml_TDNaiveBayesPredict | Predicts classification labels using a model generated by NaiveBayes(). |
tdml_TargetEncodingFit | Computes target encoding values (expected value per category) for categorical columns. |
tdml_TargetEncodingTransform | Encodes categorical columns using encoding values from TargetEncodingFit() output. |
tdml_TextMorph | Generates morphological variants (morphs) of words in the input dataset. |
tdml_TextParser | Tokenizes text, removes punctuation, converts to lowercase, removes stopwords, and applies stemming or lemmatization. |
tdml_TrainTestSplit | Splits input data into training and test sets to simulate model performance on new data. |
tdml_Transform | Applies numeric transformations to input columns using parameters from Fit() output. |
tdml_UnivariateStatistics | Displays descriptive statistics (mean, min, max, stddev, etc.) for each specified numeric input column. |
tdml_Unpack | Unpacks data from a single packed column into multiple separate columns. |
tdml_Unpivoting | Unpivots data from dense format to sparse format (columns to rows). |
tdml_VectorDistance | Computes distances between target vectors and reference vectors. |
tdml_WhichMax | Returns all rows that contain the maximum value in a specified input column. |
tdml_WhichMin | Returns all rows that contain the minimum value in a specified input column. |
tdml_WordEmbeddings | Produces embedding vectors for text and computes similarity between texts. |
tdml_XGBoost | Trains an XGBoost (eXtreme Gradient Boosting) model for classification or regression. |
tdml_XGBoostPredict | Predicts target values or class labels using a model generated by XGBoost(). |
tdml_ZTest | Tests the equality of two means under the assumption that the population variances are known. |
To add a new analytic function, see How to Add a New Function.