Introduction to Develop Tensorflow Estimator Model with DLRover trainer
August 30, 2023 · View on GitHub
The document describes how to develop tensorflow estimator model with DLRover trainer.
Develop model with tensorflow estimator
Tensorflow Estimator encapsulate Training, Evaluation, Prediction and Export for serving actions. In DLrover, both custome estimators and pre-made estimators are supported.
A DLrover program with Estimator typically consists of the following four steps:
Define the features and label column in the conf
Each Column identifies a feature name, its type and whether it is label.
The following snippet defines two feature columns in the
example.
train_set = {
"reader": FileReader("test.data"),
"columns": (
Column.create( # type: ignore
name="x",
dtype="float32",
is_label=False,
),
Column.create( # type: ignore
name="y",
dtype="float32",
is_label=True,
),
),
}
The first feature is x and its type is float32.
The second feature is y and is label. Its type is float32.
dlrover.trainer helps build input_fn for train set and test set with those columns.
Add Custom Reader for TF Estimator in DLrover
In some case, the reader provided by DLrover trainer doesn't satisfy user's need. User need to develop custom reader and set it in the conf.
Add Custom Elastic Reader for TF Estimator in DLrover
One necessary arguments in the __init__ method is path.
The key funcion is read_data_by_index_range and count_data. count_data is used for
konwing how many dataset are there before training. During training, read_data_by_index_range
will be called to get train data.
from dlrover.trainer.tensorflow.reader.base_reader import ElasticReader
class FakeReader(ElasticReader):
def __init__(self, path=None):
self.count = 1
super().__init__(path=path)
def count_data(self):
self._data_nums = 10
def read_data_by_index_range(self, start_index, end_index):
data = []
for i in range(start_index, end_index):
x = np.random.randint(1, 1000)
y = 2 * x + np.random.randint(1, 5)
d = "{},{}".format(x, y)
data.append(d)
return data
you need to initial you reader and set it in the conf. Here is an example
eval_set = {"reader": FakeReader("./eval.data"), "columns": train_set["columns"]}
Add Custom Non Elastic Reader for TF Estimator in DLrover
The key funcion is iterator. During training, iterator will be called to get train data.
class Reader:
def __init__(
self,
path=None,
batch_size=None
):
pass
def get_data(self):
# you custom code
while True:
yield "1,1"
def iterator(self):
while True:
for d in self.get_data():
yield d
you need to initial you reader and set it in the conf. Here is an example
eval_set = {"reader": Reader("./eval.data"), "columns": train_set["columns"]}
Instantiate the Estimator
The heart of every Estimator—whether pre-made or custom—is its model function, model_fn,
which is a method that builds graphs for training, evaluation, and prediction.
In dlrover.trainer, we assume the Estimator is a custom estimator.
And pre-made estimators should be converted to custom estimator with little overhead.
Train a model from custome estimators
When relying on a custom Estimator, you must write the model function yourself. Refer the tutorial.
Train a model from pre-made estimators
You can convert an existing pre-made estimators by writing an Adaptor to fit with dlrover.trainer.
As we can see, the model_fn is the key part of estimator.
When training and evaluating, the model_fn is called with different mode and the graph is returned.
Thus, you can define a custom estimator in which model_fn function acts as a wrapper for pre-made estimator model_fn.
In the example of DeepFMAdaptor,
DeepFMEstimator in deepctr.estimator.models
is a pre-made estimator.
from deepctr.estimator.models.deepfm import DeepFMEstimator
class DeepFMAdaptor(tf.estimator.Estimator):
"""Adaptor"""
def model_fn(self, features, labels, mode, params):
'''
featurs: type dict, key is the feature name and value is tensor.
labels: type tensor, corresponding to the colum which `is_label` equals True.
'''
x = features["x"]
x_buckets = feature_column.bucketized_column(x, boundaries=[1, 3, 5])
linear_feature_columns = [x_buckets]
dnn_feature_columns = [x]
self.estimator = DeepFMEstimator(
linear_feature_columns,
dnn_feature_columns,
task=params["task"],
)
return self.estimator._model_fn(
features, labels, mode, self.run_config
)
Saving object-based checkpoints with Estimator
Estimators by default save checkpoints with variable names rather than the
object graph described in the Checkpoint guide.
The checkpoint hook is added by dlrover.trainer.estimator_executor.
SavedModels from Estimators
Estimators export SavedModels through tf.Estimator.export_saved_model.
The exporter hook is added by dlrover.trainer.estimator_executor.
When the job is launched, dlrover.trainer.estimator_executor parses the conf and builds input_fn,
estimator and related hooks.
Submit a Job to Train the Estimator model
Build an Image with Models
You can install dlrover in your image.
pip install dlrover[tensorflow] - U
Or you also can build your image from the dlrover base image.
FROM registry.cn-hangzhou.aliyuncs.com/intell-ai/dlrover:deeprec_criteo_v1
COPY model_zoo /home/model_zoo
docker build -t ${IMAGE_NAME} -f ${DockerFile} .
docker push ${IMAGE_NAME}
Set the Command to Train the Model
We need to set the command of ps and worker to train the model like the DeepCTR example
command:
- /bin/bash
- -c
- " cd ./examples/tensorflow/criteo_deeprec \
&& python -m dlrover.trainer.entry.local_entry \
--platform=Kubernetes --conf=train_conf.TrainConf \
--enable_auto_scaling=True"
Then, we can submit the job by kubectl.
kubectl -n dlrover apply -f ${JOB_YAML_FILE}