Name: python313-tokenizers | Distribution: openSUSE Tumbleweed |
Version: 0.21.0 | Vendor: openSUSE |
Release: 4.1 | Build date: Wed Dec 18 15:20:07 2024 |
Group: Unspecified | Build host: reproducible |
Size: 6247065 | Source RPM: python-tokenizers-0.21.0-4.1.src.rpm |
Packager: https://bugs.opensuse.org | |
Url: https://github.com/huggingface/tokenizers | |
Summary: Provides an implementation of today's most used tokenizers |
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
* Train new vocabularies and tokenize, using today's most used tokenizers.
* Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
* Easy to use, but also extremely versatile.
* Designed for research and production.
* Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
* Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.
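The alignment tracking and truncate/pad steps described above can be illustrated with a small standard-library sketch. This is not the tokenizers API and not how the Rust implementation works; the names `tokenize_with_offsets` and `fit_to_length`, the regular expression, and the `[PAD]` placeholder are all assumptions made for illustration. It only shows the idea that each normalized token keeps a (start, end) span pointing back into the original text.

```python
import re

# Hypothetical pattern: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text):
    """Return (normalized_token, (start, end)) pairs.

    The token is lowercased (a stand-in for normalization), while the
    span still refers to positions in the original, unnormalized text.
    """
    return [(m.group().lower(), m.span()) for m in TOKEN_PATTERN.finditer(text)]


def fit_to_length(tokens, length, pad=("[PAD]", (0, 0))):
    """Truncate to `length` tokens, then pad with a placeholder if short."""
    clipped = tokens[:length]
    return clipped + [pad] * (length - len(clipped))


text = "Hello, World!"
tokens = tokenize_with_offsets(text)

# Each span recovers the original casing of its token.
for token, (start, end) in tokens:
    print(token, "->", repr(text[start:end]))

print(fit_to_length(tokens, 6))
```

The spans are what make it possible to map a token back to the exact slice of the input, even after normalization has changed the token's text.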
License: Apache-2.0
* Wed Dec 18 2024 Soc Virnyl Estela <uncomfyhalomacro@opensuse.org>
- Update to version 0.21.0:
  * More cache options.
  * Disable caching for long strings.
  * Testing ABI3 wheels to reduce number of wheels
  * Adding an API for decode streaming.
  * Decode stream python
  * Fix encode_batch and encode_batch_fast to accept ndarrays again

* Thu Nov 07 2024 Soc Virnyl Estela <uncomfyhalomacro@opensuse.org>
- Select only rust tier 1 arches.
- Update registry.tar.zst dependencies
- Update version to 0.20.3:
  * fix pylist
  * [MINOR:TYP] Fix docstrings
- Updates from 0.20.2:
  * Bump cookie and express in /tokenizers/examples/unstable_wasm/www
  * Fix off-by-one error in tokenizer::normalizer::Range::len
  * Arg name correction: auth_token -> token
  * Unsound call of set_var
  * Add safety comments
  * PyO3 0.22
- Updates from 0.20.1:
  * Update README.md
  * fix benchmark file link
  * [ignore_merges] Fix offsets
  * Bump body-parser and express in /tokenizers/examples/unstable_wasm/www
  * Bump serve-static and express in /tokenizers/examples/unstable_wasm/www
  * Bump send and express in /tokenizers/examples/unstable_wasm/www
  * Bump webpack from 5.76.0 to 5.95.0 in /tokenizers/examples/unstable_wasm/www
  * Fix documentation build
  * style: simplify string formatting for readability

* Sun Nov 03 2024 Soc Virnyl Estela <uncomfyhalomacro@opensuse.org>
- Experiment with cargo vendor home registry. See documentation: https://github.com/openSUSE-Rust/obs-service-cargo/blob/master/README.md#cargo-vendor-home-registry

* Mon Sep 23 2024 Simon Lees <sflees@suse.de>
- Don't use macros for Requires

* Fri Aug 30 2024 Simon Lees <sflees@suse.de>
- Update package name back to "huggingface-hub" to match pypi

* Tue Aug 27 2024 Guang Yee <gyee@suse.com>
- Update package name "huggingface-hub" to "huggingface_hub"

* Tue Aug 20 2024 Simon Lees <sflees@suse.de>
- Fix testsuite on 15.6

* Sun Aug 18 2024 Soc Virnyl Estela <obs@uncomfyhalomacro.pl>
- Replace vendor tarball to zstd compressed vendor tarball
- Force gcc version on leap. Thanks @marv7000 for your zed.spec
- Use `CARGO_*` environmental variables to force generate full debuginfo and avoid stripping.
- Enable cargo test in %check.
- Update to version 0.20.0:
  * remove enforcement of non special when adding tokens
  * [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder
  * Make USED_PARALLELISM atomic
  * Fixing for clippy 1.78
  * feat(ci): add trufflehog secrets detection
  * Switch from cached_download to hf_hub_download in tests
  * Fix "dictionnary" typo
  * make sure we don't warn on empty tokens
  * Enable dropout = 0.0 as an equivalent to none in BPE
  * Revert "[BREAKING CHANGE] Ignore added_tokens (both special and not) …
  * Add bytelevel normalizer to fix decode when adding tokens to BPE
  * Fix clippy + feature test management.
  * Bump spm_precompiled to 0.1.3
  * Add benchmark vs tiktoken
  * Fixing the benchmark.
  * Tiny improvement
  * Enable fancy regex
  * Fixing release CI strict (taken from safetensors).
  * Adding some serialization testing around the wrapper.
  * Add-legacy-tests
  * Adding a few tests for decoder deserialization.
  * Better serialization error
  * Add test normalizers
  * Improve decoder deserialization
  * Using serde (serde_pyo3) to get str and repr easily.
  * Merges cannot handle tokens containing spaces.
  * Fix doc about split
  * Support None to reset pre_tokenizers and normalizers, and index sequences
  * Fix strip python type
  * Tests + Deserialization improvement for normalizers.
  * add deserialize for pre tokenizers
  * Perf improvement 16% by removing offsets.

* Wed Jul 03 2024 Christian Goll <cgoll@suse.com>
- initial commit on rust based python-tokenizers
/usr/lib64/python3.13/site-packages/tokenizers
/usr/lib64/python3.13/site-packages/tokenizers-0.21.0.dist-info
/usr/lib64/python3.13/site-packages/tokenizers-0.21.0.dist-info/INSTALLER
/usr/lib64/python3.13/site-packages/tokenizers-0.21.0.dist-info/METADATA
/usr/lib64/python3.13/site-packages/tokenizers-0.21.0.dist-info/RECORD
/usr/lib64/python3.13/site-packages/tokenizers-0.21.0.dist-info/REQUESTED
/usr/lib64/python3.13/site-packages/tokenizers-0.21.0.dist-info/WHEEL
/usr/lib64/python3.13/site-packages/tokenizers/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/decoders
/usr/lib64/python3.13/site-packages/tokenizers/decoders/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/decoders/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/decoders/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/decoders/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/decoders/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/base_tokenizer.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/base_tokenizer.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/bert_wordpiece.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/bert_wordpiece.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/byte_level_bpe.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/byte_level_bpe.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/char_level_bpe.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/char_level_bpe.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/sentencepiece_bpe.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/sentencepiece_bpe.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/sentencepiece_unigram.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/__pycache__/sentencepiece_unigram.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/implementations/base_tokenizer.py
/usr/lib64/python3.13/site-packages/tokenizers/implementations/bert_wordpiece.py
/usr/lib64/python3.13/site-packages/tokenizers/implementations/byte_level_bpe.py
/usr/lib64/python3.13/site-packages/tokenizers/implementations/char_level_bpe.py
/usr/lib64/python3.13/site-packages/tokenizers/implementations/sentencepiece_bpe.py
/usr/lib64/python3.13/site-packages/tokenizers/implementations/sentencepiece_unigram.py
/usr/lib64/python3.13/site-packages/tokenizers/models
/usr/lib64/python3.13/site-packages/tokenizers/models/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/models/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/models/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/models/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/models/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/normalizers
/usr/lib64/python3.13/site-packages/tokenizers/normalizers/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/normalizers/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/normalizers/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/normalizers/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/normalizers/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/pre_tokenizers
/usr/lib64/python3.13/site-packages/tokenizers/pre_tokenizers/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/pre_tokenizers/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/pre_tokenizers/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/pre_tokenizers/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/pre_tokenizers/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/processors
/usr/lib64/python3.13/site-packages/tokenizers/processors/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/processors/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/processors/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/processors/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/processors/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/tokenizers.abi3.so
/usr/lib64/python3.13/site-packages/tokenizers/tools
/usr/lib64/python3.13/site-packages/tokenizers/tools/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/tools/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/tools/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/tools/__pycache__/__init__.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/tools/__pycache__/visualizer.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/tools/__pycache__/visualizer.cpython-313.pyc
/usr/lib64/python3.13/site-packages/tokenizers/tools/visualizer-styles.css
/usr/lib64/python3.13/site-packages/tokenizers/tools/visualizer.py
/usr/lib64/python3.13/site-packages/tokenizers/trainers
/usr/lib64/python3.13/site-packages/tokenizers/trainers/__init__.py
/usr/lib64/python3.13/site-packages/tokenizers/trainers/__init__.pyi
/usr/lib64/python3.13/site-packages/tokenizers/trainers/__pycache__
/usr/lib64/python3.13/site-packages/tokenizers/trainers/__pycache__/__init__.cpython-313.opt-1.pyc
/usr/lib64/python3.13/site-packages/tokenizers/trainers/__pycache__/__init__.cpython-313.pyc
/usr/share/doc/packages/python313-tokenizers
/usr/share/doc/packages/python313-tokenizers/README.md
/usr/share/licenses/python313-tokenizers
/usr/share/licenses/python313-tokenizers/LICENSE
Generated by rpm2html 1.8.1
Fabrice Bellet, Fri Jan 10 00:13:42 2025