Name: python312-tokenizers
Version: 0.20.0
Release: 1.1
Group: Unspecified
Size: 7187943
Packager: http://bugs.opensuse.org
Url: https://github.com/huggingface/tokenizers
Distribution: openSUSE Tumbleweed
Vendor: openSUSE
Build date: Mon Sep 23 09:19:52 2024
Build host: reproducible
Source RPM: python-tokenizers-0.20.0-1.1.src.rpm
Summary: Provides an implementation of today's most used tokenizers |
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.

* Train new vocabularies and tokenize, using today's most used tokenizers.
* Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
* Easy to use, but also extremely versatile.
* Designed for research and production.
* Normalization comes with alignment tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
* Does all the pre-processing: truncate, pad, and add the special tokens your model needs.
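The training the description refers to is, for the BPE model, the classic byte-pair-encoding merge loop (the package implements it in Rust for speed). As an illustration only, here is a minimal pure-Python sketch of that loop; the `train_bpe` helper and the tiny sample corpus are hypothetical and not part of the package's API:

```python
from collections import Counter

def train_bpe(words, num_merges):
    # Hypothetical helper: start from character-level symbols and
    # repeatedly merge the most frequent adjacent symbol pair.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge to every word in the working vocabulary.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = train_bpe(["low", "lower", "lowest"] * 5, num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

The real library performs this training in parallel Rust code (see the `tokenizers.trainers` module shipped in this package), which is what makes the "less than 20 seconds per GB" figure possible.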
License: Apache-2.0
* Mon Sep 23 2024 Simon Lees <sflees@suse.de>
  - Don't use macros for Requires
* Fri Aug 30 2024 Simon Lees <sflees@suse.de>
  - Update package name back to "huggingface-hub" to match pypi
* Tue Aug 27 2024 Guang Yee <gyee@suse.com>
  - Update package name "huggingface-hub" to "huggingface_hub"
* Tue Aug 20 2024 Simon Lees <sflees@suse.de>
  - Fix testsuite on 15.6
* Sun Aug 18 2024 Soc Virnyl Estela <obs@uncomfyhalomacro.pl>
  - Replace vendor tarball to zstd compressed vendor tarball
  - Force gcc version on leap. Thanks @marv7000 for your zed.spec
  - Use `CARGO_*` environmental variables to force generate full debuginfo and avoid stripping.
  - Enable cargo test in %check.
  - Update to version 0.20.0:
    * remove enforcement of non special when adding tokens
    * [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder
    * Make USED_PARALLELISM atomic
    * Fixing for clippy 1.78
    * feat(ci): add trufflehog secrets detection
    * Switch from cached_download to hf_hub_download in tests
    * Fix "dictionnary" typo
    * make sure we don't warn on empty tokens
    * Enable dropout = 0.0 as an equivalent to none in BPE
    * Revert "[BREAKING CHANGE] Ignore added_tokens (both special and not) …
    * Add bytelevel normalizer to fix decode when adding tokens to BPE
    * Fix clippy + feature test management.
    * Bump spm_precompiled to 0.1.3
    * Add benchmark vs tiktoken
    * Fixing the benchmark.
    * Tiny improvement
    * Enable fancy regex
    * Fixing release CI strict (taken from safetensors).
    * Adding some serialization testing around the wrapper.
    * Add-legacy-tests
    * Adding a few tests for decoder deserialization.
    * Better serialization error
    * Add test normalizers
    * Improve decoder deserialization
    * Using serde (serde_pyo3) to get str and repr easily.
    * Merges cannot handle tokens containing spaces.
    * Fix doc about split
    * Support None to reset pre_tokenizers and normalizers, and index sequences
    * Fix strip python type
    * Tests + Deserialization improvement for normalizers.
    * add deserialize for pre tokenizers
    * Perf improvement 16% by removing offsets.
* Wed Jul 03 2024 Christian Goll <cgoll@suse.com>
  - initial commit on rust based python-tokenizers
/usr/lib/python3.12/site-packages/tokenizers
/usr/lib/python3.12/site-packages/tokenizers-0.20.0.dist-info
/usr/lib/python3.12/site-packages/tokenizers-0.20.0.dist-info/INSTALLER
/usr/lib/python3.12/site-packages/tokenizers-0.20.0.dist-info/METADATA
/usr/lib/python3.12/site-packages/tokenizers-0.20.0.dist-info/RECORD
/usr/lib/python3.12/site-packages/tokenizers-0.20.0.dist-info/REQUESTED
/usr/lib/python3.12/site-packages/tokenizers-0.20.0.dist-info/WHEEL
/usr/lib/python3.12/site-packages/tokenizers/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/decoders
/usr/lib/python3.12/site-packages/tokenizers/decoders/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/decoders/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/decoders/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/decoders/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/decoders/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations
/usr/lib/python3.12/site-packages/tokenizers/implementations/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/base_tokenizer.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/base_tokenizer.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/bert_wordpiece.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/bert_wordpiece.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/byte_level_bpe.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/byte_level_bpe.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/char_level_bpe.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/char_level_bpe.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_bpe.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_bpe.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_unigram.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/__pycache__/sentencepiece_unigram.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/implementations/base_tokenizer.py
/usr/lib/python3.12/site-packages/tokenizers/implementations/bert_wordpiece.py
/usr/lib/python3.12/site-packages/tokenizers/implementations/byte_level_bpe.py
/usr/lib/python3.12/site-packages/tokenizers/implementations/char_level_bpe.py
/usr/lib/python3.12/site-packages/tokenizers/implementations/sentencepiece_bpe.py
/usr/lib/python3.12/site-packages/tokenizers/implementations/sentencepiece_unigram.py
/usr/lib/python3.12/site-packages/tokenizers/models
/usr/lib/python3.12/site-packages/tokenizers/models/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/models/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/models/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/models/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/models/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/normalizers
/usr/lib/python3.12/site-packages/tokenizers/normalizers/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/normalizers/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/normalizers/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/normalizers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/normalizers/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/pre_tokenizers
/usr/lib/python3.12/site-packages/tokenizers/pre_tokenizers/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/pre_tokenizers/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/pre_tokenizers/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/pre_tokenizers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/pre_tokenizers/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/processors
/usr/lib/python3.12/site-packages/tokenizers/processors/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/processors/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/processors/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/processors/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/processors/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/tokenizers.cpython-312-i386-linux-gnu.so
/usr/lib/python3.12/site-packages/tokenizers/tools
/usr/lib/python3.12/site-packages/tokenizers/tools/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/tools/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/tools/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/tools/__pycache__/__init__.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/tools/__pycache__/visualizer.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/tools/__pycache__/visualizer.cpython-312.pyc
/usr/lib/python3.12/site-packages/tokenizers/tools/visualizer-styles.css
/usr/lib/python3.12/site-packages/tokenizers/tools/visualizer.py
/usr/lib/python3.12/site-packages/tokenizers/trainers
/usr/lib/python3.12/site-packages/tokenizers/trainers/__init__.py
/usr/lib/python3.12/site-packages/tokenizers/trainers/__init__.pyi
/usr/lib/python3.12/site-packages/tokenizers/trainers/__pycache__
/usr/lib/python3.12/site-packages/tokenizers/trainers/__pycache__/__init__.cpython-312.opt-1.pyc
/usr/lib/python3.12/site-packages/tokenizers/trainers/__pycache__/__init__.cpython-312.pyc
/usr/share/doc/packages/python312-tokenizers
/usr/share/doc/packages/python312-tokenizers/README.md
/usr/share/licenses/python312-tokenizers
/usr/share/licenses/python312-tokenizers/LICENSE
Generated by rpm2html 1.8.1
Fabrice Bellet, Tue Nov 12 00:56:02 2024