T5Preprocessor layer
- Original Link : https://keras.io/api/keras_hub/models/t5/t5_preprocessor/
- Last Checked at : 2024-11-26
T5Preprocessor class
keras_hub.models.T5Preprocessor(
    tokenizer, sequence_length=256, add_start_token=False, add_end_token=True, **kwargs
)
Base class for preprocessing layers.
A Preprocessor layer provides a complete preprocessing setup for a
given task. It handles tokenization, audio/image conversion, and
any other necessary preprocessing steps.
This class can be subclassed like any keras.layers.Layer by
defining the build(), call(), and get_config() methods. All subclasses
should set the tokenizer, audio_converter, or image_converter
properties during construction, as needed.
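The constructor signature lists sequence_length, add_start_token and add_end_token without describing them. As a rough illustration of what such options usually mean for a text preprocessor, the plain-Python sketch below truncates a list of token ids, optionally adds start and end ids, and pads to a fixed length. The function name pack_token_ids, the returned padding mask, and the ids 0 (pad/start) and 1 (end) are assumptions made for this example; this is not the layer's actual implementation.

```python
def pack_token_ids(
    token_ids,
    sequence_length=256,
    add_start_token=False,
    add_end_token=True,
    start_id=0,  # assumption: the start id reuses the padding id
    end_id=1,    # assumption: id of the end-of-sequence token
    pad_id=0,    # assumption: id of the padding token
):
    """Return (packed_ids, padding_mask), each of length sequence_length."""
    # Reserve room for the special tokens before truncating the input.
    reserved = int(add_start_token) + int(add_end_token)
    body = list(token_ids)[: max(sequence_length - reserved, 0)]

    packed = ([start_id] if add_start_token else []) + body
    if add_end_token:
        packed.append(end_id)
    packed = packed[:sequence_length]

    # 1 marks a real token, 0 marks padding.
    mask = [1] * len(packed)
    n_pad = sequence_length - len(packed)
    return packed + [pad_id] * n_pad, mask + [0] * n_pad


ids, mask = pack_token_ids([13, 8, 1782], sequence_length=6)
print(ids)   # [13, 8, 1782, 1, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```

With the defaults from the signature above (no start token, an end token appended), a short input ends with the end id followed by padding, and a long input is cut so that the end id still fits.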
from_preset method
T5Preprocessor.from_preset(preset, config_file="preprocessor.json", **kwargs)
Instantiate a keras_hub.models.Preprocessor from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
- a built-in preset identifier like 'bert_base_en'
- a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
- a Hugging Face handle like 'hf://user/bert_base_en'
- a path to a local preset directory like './bert_base_en'
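The four preset forms listed above can be told apart by their prefix. The sketch below is a hypothetical helper written only to make that distinction concrete; describe_preset is not part of keras_hub, and the library's own resolution logic may differ (for example, this helper does not check whether a bare name is an existing local directory).

```python
def describe_preset(preset):
    """Classify a preset string into one of the four forms listed above."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "hugging_face"
    # Assumption: only strings that look like explicit paths count as local.
    if preset.startswith(("./", "../", "/", "~")):
        return "local"
    # Anything else is treated as a built-in preset identifier.
    return "builtin"


for name in [
    "bert_base_en",
    "kaggle://user/bert/keras/bert_base_en",
    "hf://user/bert_base_en",
    "./bert_base_en",
]:
    print(name, "->", describe_preset(name))
```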
For any Preprocessor subclass, you can run cls.presets.keys() to
list all built-in presets available on the class.
As there are usually multiple preprocessing classes for a given model,
this method should be called on a specific subclass like
keras_hub.models.BertTextClassifierPreprocessor.from_preset().
Arguments
- preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
Examples
import keras_hub

# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.GemmaCausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)
# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.BertTextClassifierPreprocessor.from_preset(
    "bert_base_en",
)
tokenizer property
keras_hub.models.T5Preprocessor.tokenizer
The tokenizer used to tokenize strings.