KerasHub Tokenizers
- Original link: https://keras.io/api/keras_hub/tokenizers/
- Last checked: 2024-11-26
Tokenizers convert raw string input into integer input suitable for a Keras Embedding layer. They can also convert predicted integer sequences back into raw string output.
All tokenizers subclass keras_hub.tokenizers.Tokenizer, which in turn subclasses keras.layers.Layer. Tokenizers should generally be applied inside tf.data.Dataset.map for training, and can be included inside a keras.Model for inference.
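To make the shared interface concrete, here is a minimal, framework-free Python sketch. The class ToyWordTokenizer is hypothetical and is not keras_hub.tokenizers.Tokenizer: it only mirrors the method names listed below (tokenize, detokenize, get_vocabulary, vocabulary_size, token_to_id, id_to_token) using a whitespace split and a fixed word list. The real classes are Keras layers that operate on tensors.

```python
# Hypothetical sketch of the tokenizer interface; NOT keras_hub's implementation.


class ToyWordTokenizer:
    """Splits on whitespace and maps each word to an integer id."""

    def __init__(self, vocabulary, oov_token="[UNK]"):
        if oov_token not in vocabulary:
            raise ValueError("oov_token must be in the vocabulary")
        self._vocab = list(vocabulary)
        self._token_to_id = {tok: i for i, tok in enumerate(self._vocab)}
        self._oov_id = self._token_to_id[oov_token]

    def get_vocabulary(self):
        return list(self._vocab)

    def vocabulary_size(self):
        return len(self._vocab)

    def token_to_id(self, token):
        return self._token_to_id[token]

    def id_to_token(self, token_id):
        return self._vocab[token_id]

    def tokenize(self, text):
        # String -> list of integer ids; unknown words map to the OOV id.
        return [self._token_to_id.get(w, self._oov_id) for w in text.split()]

    def detokenize(self, ids):
        # Integer ids -> string, joining tokens with single spaces.
        return " ".join(self._vocab[i] for i in ids)

    def __call__(self, text):
        # Calling the object tokenizes, loosely mimicking calling a layer.
        return self.tokenize(text)


tokenizer = ToyWordTokenizer(["[UNK]", "the", "quick", "brown", "fox"])
ids = tokenizer("the quick red fox")
print(ids)                        # [1, 2, 0, 4]
print(tokenizer.detokenize(ids))  # the quick [UNK] fox
```

In a training pipeline the string-to-ids direction is the part that would run inside tf.data.Dataset.map, while detokenize is the step used after prediction.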
Tokenizer
WordPieceTokenizer
- WordPieceTokenizer class
- tokenize method
- detokenize method
- get_vocabulary method
- vocabulary_size method
- token_to_id method
- id_to_token method
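WordPiece splits each word into the longest vocabulary piece it can find, marking word-internal pieces with a suffix indicator such as "##". The sketch below is a simplified, pure-Python illustration of that greedy longest-match-first idea, not keras_hub's implementation. The function names, the regex pre-split on whitespace and punctuation, and the rule that an unmatchable word becomes a single [UNK] are assumptions of this sketch.

```python
import re


def wordpiece_tokenize(text, vocabulary, lowercase=True,
                       suffix_indicator="##", oov_token="[UNK]"):
    """Greedy longest-match-first WordPiece over a toy vocabulary."""
    token_to_id = {tok: i for i, tok in enumerate(vocabulary)}
    if lowercase:
        text = text.lower()
    # Simplified pre-split: runs of word characters, or single punctuation marks.
    words = re.findall(r"\w+|[^\w\s]", text)
    ids = []
    for word in words:
        pieces, start = [], 0
        while start < len(word):
            end, match = len(word), None
            # Shrink the candidate from the right until it is in the vocabulary.
            while start < end:
                candidate = word[start:end]
                if start > 0:
                    candidate = suffix_indicator + candidate
                if candidate in token_to_id:
                    match = candidate
                    break
                end -= 1
            if match is None:
                pieces = [oov_token]  # the whole word is out of vocabulary
                break
            pieces.append(match)
            start = end
        ids.extend(token_to_id[p] for p in pieces)
    return ids


def wordpiece_detokenize(ids, vocabulary, suffix_indicator="##"):
    """Join pieces with spaces, then glue suffix pieces onto the previous piece."""
    text = " ".join(vocabulary[i] for i in ids)
    return text.replace(" " + suffix_indicator, "")


vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."]
ids = wordpiece_tokenize("The quick brown fox.", vocab)
print(ids)                               # [1, 2, 3, 4, 5, 6, 7]
print(wordpiece_detokenize(ids, vocab))  # the quick brown fox .
```

Note how "quick" becomes "qu" + "##ick": the search takes the longest prefix in the vocabulary first, then continues from where that prefix ended.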
SentencePieceTokenizer
- SentencePieceTokenizer class
- tokenize method
- detokenize method
- get_vocabulary method
- vocabulary_size method
- token_to_id method
- id_to_token method
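SentencePiece tokenizes raw text without a separate word splitter by marking spaces with the "▁" (U+2581) symbol, and keras_hub's SentencePieceTokenizer loads an already-trained SentencePiece model. The sketch below only illustrates the unigram-model style of segmentation: among all ways to split the text into known pieces, a Viterbi search picks the split with the highest total log-probability. The piece scores, the unknown-character penalty, and the function names are made up for illustration.

```python
import math

# Hypothetical unigram log-probabilities; "\u2581" (▁) marks a word start.
B = "\u2581"
SCORES = {
    B: -2.0,
    B + "the": -1.0,
    B + "th": -3.0,
    "e": -2.5,
    B + "cat": -1.5,
    B + "c": -3.0,
    "at": -2.0,
    "s": -2.0,
    B + "cats": -4.0,
}
UNK_PENALTY = -10.0  # assumed score for a single out-of-vocabulary character


def unigram_segment(text, scores=SCORES, unk_piece="<unk>"):
    """Return the highest-scoring segmentation of text (Viterbi search)."""
    normalized = B + text.replace(" ", B)
    n = len(normalized)
    max_len = max(len(p) for p in scores)
    best = [-math.inf] * (n + 1)  # best[i]: best total score of normalized[:i]
    back = [None] * (n + 1)       # back[i]: (start, piece) ending at position i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == -math.inf:
                continue
            piece = normalized[start:end]
            if piece in scores:
                score = best[start] + scores[piece]
            elif end - start == 1:
                # Fall back to <unk> for a single unknown character.
                piece, score = unk_piece, best[start] + UNK_PENALTY
            else:
                continue
            if score > best[end]:
                best[end] = score
                back[end] = (start, piece)
    # Walk the back-pointers from the end to recover the pieces.
    pieces, pos = [], n
    while pos > 0:
        start, piece = back[pos]
        pieces.append(piece)
        pos = start
    return pieces[::-1]


def detokenize(pieces):
    text = "".join(pieces).replace(B, " ")
    return text[1:] if text.startswith(" ") else text


print(unigram_segment("the cats"))              # ['▁the', '▁cat', 's']
print(detokenize(unigram_segment("the cats")))  # the cats
```

Here "▁cat" + "s" (score -3.5 after "▁the") beats the single piece "▁cats" (-4.0), which shows that the segmentation depends on the learned scores rather than on the longest match.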
BytePairTokenizer
- BytePairTokenizer class
- tokenize method
- detokenize method
- get_vocabulary method
- vocabulary_size method
- token_to_id method
- id_to_token method
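Byte-pair encoding starts from individual symbols and repeatedly merges the adjacent pair with the best (lowest) merge rank until no listed merge applies. The toy below applies that rule to the characters of a single word using a hand-written vocabulary and merge list. It is a deliberate simplification: keras_hub's BytePairTokenizer operates at the byte level (GPT-2 style) and pre-splits text before merging, and the function name here is hypothetical.

```python
def bpe_tokenize(word, vocabulary, merges):
    """Apply ranked BPE merges to the characters of one word."""
    # Earlier entries in `merges` have higher priority (lower rank).
    ranks = {tuple(m.split()): rank for rank, m in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no remaining adjacent pair has a merge rule
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # In this toy, a final symbol missing from the vocabulary raises KeyError.
    return [vocabulary[s] for s in symbols]


vocab = {"butter": 1, "fly": 2}
merges = ["b u", "t t", "e r", "bu tt", "butt er", "f l", "fl y"]
print(bpe_tokenize("butterfly", vocab, merges))  # [1, 2]
```

"butterfly" ends as two tokens because the merge list never contains "butter fly"; the merges, not the vocabulary, decide where pieces stop growing.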
ByteTokenizer
- ByteTokenizer class
- tokenize method
- detokenize method
- get_vocabulary method
- vocabulary_size method
- token_to_id method
- id_to_token method
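A byte tokenizer needs no learned vocabulary: each token is one UTF-8 byte, so ids fall in the range 0 to 255. The plain-Python sketch below illustrates the idea. keras_hub's ByteTokenizer exposes options such as lowercase and sequence_length; the lowercasing default and the right-padding with 0 shown here are assumptions of this sketch, not a copy of the layer's code.

```python
def byte_tokenize(text, lowercase=True, sequence_length=None):
    """Map a string to its UTF-8 byte values (ids 0-255)."""
    if lowercase:
        text = text.lower()
    ids = list(text.encode("utf-8"))
    if sequence_length is not None:
        # Truncate, then right-pad with 0 (padding choice assumed for this sketch).
        ids = ids[:sequence_length] + [0] * (sequence_length - len(ids))
    return ids


def byte_detokenize(ids):
    """Drop 0 padding and decode; invalid byte sequences become U+FFFD."""
    return bytes(i for i in ids if i != 0).decode("utf-8", errors="replace")


ids = byte_tokenize("Héllo", sequence_length=8)
print(ids)                   # [104, 195, 169, 108, 108, 111, 0, 0]
print(byte_detokenize(ids))  # héllo
```

Because "é" takes two bytes, truncating at a fixed length can split a character; decoding with errors="replace" turns the dangling byte into U+FFFD instead of failing.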
UnicodeCodepointTokenizer
- UnicodeCodepointTokenizer class
- tokenize method
- detokenize method
- get_vocabulary method
- vocabulary_size method
- token_to_id method
- id_to_token method
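A Unicode code point tokenizer emits one id per code point instead of one per byte, so non-ASCII text produces shorter sequences than byte-level tokenization. The functions below are a hypothetical plain-Python sketch of that mapping, not the library's implementation; the lowercase default is an assumption.

```python
def codepoint_tokenize(text, lowercase=True):
    """Map a string to one integer id per Unicode code point."""
    if lowercase:
        text = text.lower()
    return [ord(ch) for ch in text]


def codepoint_detokenize(ids):
    """Map code point ids back to a string."""
    return "".join(chr(i) for i in ids)


ids = codepoint_tokenize("Héllo")
print(ids)                        # [104, 233, 108, 108, 111]
print(codepoint_detokenize(ids))  # héllo
```

For "Héllo" this gives 5 ids, whereas UTF-8 bytes would give 6, because "é" is encoded as two bytes.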