Name: libggml-cpu
Distribution: openSUSE Tumbleweed
Version: 4501
Vendor: openSUSE
Release: 1.1
Build date: Fri Jan 17 16:37:49 2025
Group: Unspecified
Build host: reproducible
Size: 248224
Source RPM: llamacpp-4501-1.1.src.rpm
Packager: https://bugs.opensuse.org
Url: https://github.com/ggerganov/llama.cpp
Summary: A tensor library for C++ (CPU backend)
A tensor library for C++. It was originally created to support the llama.cpp and whisper.cpp projects. This package includes the CPU backend for ggml.
License: MIT
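The library packaged here is consumed through ggml's C API rather than run directly. As a rough, hedged illustration (not taken from this package or its upstream documentation), the sketch below builds a tiny compute graph and evaluates it on the CPU backend; it assumes the context-based API from ggml.h together with ggml-cpu.h (which the changelog below notes was added to the public headers), and the exact header names, link flags (for example -lggml, -lggml-base, -lggml-cpu) and entry points may differ between ggml releases.

    /* Minimal sketch (not from this package): add two small F32 tensors with
     * the ggml context API and run the graph on the CPU backend.
     * Assumes headers from a llama.cpp/ggml release close to b4501. */
    #include <stdbool.h>
    #include <stdio.h>
    #include "ggml.h"
    #include "ggml-cpu.h"

    int main(void) {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16 * 1024 * 1024,  /* arena for tensors and the graph */
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
        struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
        for (int i = 0; i < 4; ++i) {
            ((float *) a->data)[i] = (float) i;          /* a = [0, 1, 2, 3]   */
            ((float *) b->data)[i] = 10.0f * (float) i;  /* b = [0, 10, 20, 30] */
        }

        struct ggml_tensor * c = ggml_add(ctx, a, b);    /* c = a + b */

        struct ggml_cgraph * gf = ggml_new_graph(ctx);
        ggml_build_forward_expand(gf, c);
        ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

        for (int i = 0; i < 4; ++i) {
            printf("c[%d] = %.1f\n", i, ((float *) c->data)[i]);
        }

        ggml_free(ctx);
        return 0;
    }

Built against matching ggml development headers, this would print c[i] = 11*i for i in 0..3, with the computation carried out by the CPU backend that libggml-cpu.so provides.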
Changelog:

* Fri Jan 17 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 4501:
  * Optimizations to Vulkan kernels
  * Add internlm3 support
  * Add `llama_model_load_from_splits`
  * ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot
  * cli : auto activate conversation mode if chat template is available (#11214)
  * common : support tag-based --hf-repo like on ollama
  * cli: reset color before exiting
* Sun Jan 12 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 4458
- Add 0002-build-main-cli.patch to only build necessary binaries
- Package convert_hf_to_gguf script
- Package gguf.h header file
- Remove llama-perplexity
- Remove llama-test-backend-ops
- Use pkg-config for OpenCL and Vulkan
- Do not build tests
* Fri Jan 03 2025 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 4409
* Thu Dec 19 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Disable LTO, as it was causing some issues with dynamic loading of backends
- Disable dynamic loading of backends for now
* Sat Dec 14 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 4326:
  * Introducing experimental OpenCL backend
  * Vulkan backend improvements and optimizations
  * Update documentation for server streaming mode
  * Improve -ctv -ctk CLI arguments
* Wed Dec 11 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 4304:
  * Load all backends from a user-provided search path at runtime
  * Vulkan backend improvements and optimizations
  * Server improvements and optimizations
* Sat Dec 07 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Split backends into different packages
- Added llama-server llama-perplexity and llama-bench binaries
* Sat Dec 07 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 4284:
  * Various ops optimizations
  * Various server fixes
  * Vulkan backend improvements and optimizations
  * Automatic selection of best CPU backend
* Sat Nov 30 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Removed ggml-amx.so, as it is now included in the CPU backend
- Update to version 4230:
  * ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (#10567)
  * readme : remove old badge
  * readme : refresh (#10587)
  * vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536)
  * ggml : move AMX to the CPU backend (#10570)
  * server : add more test cases (#10569)
  * imatrix : support combine-only (#10492)
  * cleanup UI link list (#10577)
  * ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
  * ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)
  * sycl : offload of get_rows set to 0 (#10432)
* Fri Nov 29 2024 eyadlorenzo@gmail.com
- Update to version 4219:
  * sycl : Reroute permuted mul_mats through oneMKL (#10408)
  * CANN: RoPE operator optimization (#10563)
  * vulkan: get the first command buffer submitted sooner (#10499)
  * llava: return false instead of exit (#10546)
  * ggml : remove redundant copyright notice + update authors
  * llama : add missing model types
  * server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
  * common: fix warning message when no GPU found (#10564)
  * docs: fix outdated usage of llama-simple (#10565)
  * ci : fix tag name in cuda and hip releases (#10566)
  * ggml : fix row condition for i8mm kernels (#10561)
  * cmake : fix ARM feature detection (#10543)
  * ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
  * kompute : improve backend to pass test_backend_ops (#10542)
  * CANN: Update cann.md to display correctly in CLion (#10538)
  * CANN: Fix SOC_TYPE compile bug (#10519)
  * CANN: ROPE operator optimization (#10540)
  * common : fix duplicated file name with hf_repo and hf_file (#10550)
  * Add some minimal optimizations for CDNA (#10498)
  * ci : faster CUDA toolkit installation method and use ccache (#10537)
  * metal : fix group_norm support condition (#0)
  * sync : ggml
  * Do not include arm_neon.h when compiling CUDA code (ggml/1028)
  * vulkan: define all quant data structures in types.comp (#10440)
* Wed Nov 27 2024 eyadlorenzo@gmail.com
- Update to version 4195:
  * vulkan: Handle GPUs with less shared memory (#10468)
  * vulkan: further optimize q5_k mul_mat_vec (#10479)
  * vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506)
  * vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459)
  * ci : fix cuda releases (#10532)
  * Add OLMo 2 model in docs (#10530)
  * ci : remove nix workflows (#10526)
  * llama : disable warnings for 3rd party sha1 dependency (#10527)
  * Fix HIP flag inconsistency & build docs (#10524)
  * mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516)
  * vulkan: fix group_norm (#10496)
  * server : replace behave with pytest (#10416)
  * restore the condistion to build & update pacakge when merge (#10507)
  * cmake : enable warnings in llama (#10474)
  * ci : publish the docker images created during scheduled runs (#10515)
  * ci : add ubuntu cuda build, build with one arch on windows (#10456)
  * ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)
  * server : fix parallel speculative decoding (#10513)
  * speculative : simplify the implementation (#10504)
  * CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454)
  * CANN: RoPE and CANCAT operator optimization (#10488)
  * vulkan: Fix a vulkan-shaders-gen arugment parsing error (#10484)
  * Introduce llama-run (#10291)
  * ci : build docker images only once daily (#10503)
  * server : add more information about error (#10455)
  * server : enable cache_prompt by default (#10501)
  * metal : enable mat-vec kernels for bs <= 4 (#10491)
  * Rename Olmo1124 to Olmo2 (#10500)
  * llama : accept a list of devices to use to offload a model (#10497)
  * Github: update issue templates [no ci] (#10489)
  * Add download chat feature to server chat (#10481)
  * server : add speculative decoding support (#10455)
  * ggml : add support for dynamic loading of backends (#10469)
  * tests : fix compile warning
  * metal : minor code formatting
  * [SYCL] Fix building Win package for oneAPI 2025.0 update (#10483)
  * speculative : refactor and add a simpler example (#10362)
  * flake.lock: Update (#10470)
  * llama : fix op mul check with command-r-plus (#10476)
  * convert : XLMRoberta Type Vocab Size (#10458)
  * fix gguf-py: Conversion error when multiple licenses are configured (#9807)
  * ggml : do not use ARM features not included in the build (#10457)
* Sat Nov 23 2024 eyadlorenzo@gmail.com
- Update to version 4153:
  * ci: Update oneAPI runtime dll packaging (#10428)
  * GitHub: ask for more info in issue templates (#10426)
  * CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216)
  * cuda : optimize argmax (#10441)
  * llama : handle KV shift for recurrent models (#10402)
  * sync : ggml
  * ggml/sched : do not skip views in pre-assignments
  * ggml-opt: fix data corruption (ggml/1022)
  * vulkan: predicate max operation in soft_max shaders/soft_max (#10437)
  * cmake: add link dependencies to cmake find pkg (#10433)
  * llama : add .clang-format file (#10415)
  * vulkan: copy iq4_nl LUT into shared memory (#10409)
  * vulkan: further optimize mul_mat_vec using larger loads (#10387)
  * update rel to 4040 (#10395)
  * Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413)
  * add cmake rvv support (#10411)
  * sync : ggml
  * metal : fox offset integer overflows in im2col (ggml/1015)
  * metal : add `GGML_UNARY_OP_ELU` kernel (ggml/1018)
  * cmake: force MSVC compiler charset to utf-8 (#9989)
  * Add required ggml-base and backend libs to cmake pkg (#10407)
  * cuda : fix CUDA_FLAGS not being applied (#10403)
  * llama : add check for KV cache shifts (#10401)
* Tue Nov 19 2024 eyadlorenzo@gmail.com
- Update to version 4130:
  * llama : add OLMo November 2024 support (#10394)
  * sycl : Add option to set the SYCL architecture for all targets (#10266)
  * vulkan: Optimize soft_max (#10301)
  * sycl: Revert MUL_MAT_OP support changes (#10385)
* Tue Nov 19 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Package test-backend-ops
* Mon Nov 18 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Lower requires CMake version to 3.14
* Mon Nov 18 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Re-enable Vulkan backend
- Update to version 4126:
  * cuda : only use native when supported by cmake (#10389)
  * Skip searching root path for cross-compile builds (#10383)
  * vulkan: remove use of null initializer (#10372)
  * flake.lock: Update (#10346)
  * Vulkan: Fix device info output format specifiers (#10366)
  * docker: use GGML_NATIVE=OFF (#10368)
* Mon Nov 18 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Disable Vulkan backend because of a bug on vnsprintf and Vulkan Backend: https://github.com/ggerganov/llama.cpp/issues/10375
- Remove libllava packaging (for now)
- Update to version 4120:
  * CUDA: fix MMV kernel being used for FP16 src1 (#10357)
  * CMake: fix typo in comment [no ci] (#10360)
  * llama : only use default buffer types for the KV cache (#10358)
  * gitignore : ignore local run scripts [no ci]
  * metal : refactor kernel args into structs (#10238)
  * ggml : fix undefined reference to 'getcpu' (#10354)
  * CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)
  * CMake: default to -arch=native for CUDA build (#10320)
  * ggml : fix possible buffer use after free in sched reserve (#9930)
  * ggml : inttypes.h -> cinttypes (#0)
  * ggml : adapt AMX to tensor->grad removal (#0)
  * make : add ggml-opt (#0)
  * tests : remove test-grad0
  * ggml : fix compile warnings (#0)
  * ggml: new optimization interface (ggml/988)
  * scripts : update sync
  * docs : vulkan build instructions to use git bash mingw64 (#10303)
  * llama/ex: remove --logdir argument (#10339)
  * llamafile : fix include path (#0)
  * make : auto-determine dependencies (#0)
* Sat Nov 16 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Split libllama into libllama and libllava
- Build with Vulkan support
- Update to version 4100:
  * server: (web UI) Add samplers sequence customization (#10255)
  * scripts : fix missing key in compare-llama-bench.py (#10332)
  * vulkan: Optimize some mat-vec mul quant shaders (#10296)
  * vulkan : add cmake preset debug/release (#10306)
  * ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)
  * llama : save number of parameters and the size in llama_model (#10286)
  * Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314)
  * scripts: update compare-llama-bench.py (#10319)
  * ggml : fix some build issues
  * cmake : fix ppc64 check (whisper/0)
  * ggml : vulkan logs (whisper/2547)
  * sync : ggml
  * AVX BF16 and single scale quant optimizations (#10212)
  * ci: build test musa with cmake (#10298)
  * sycl: Update Intel docker images to use DPC++ 2025.0 (#10305)
  * server : (web UI) add copy button for code block, fix api key (#10242)
  * cann: dockerfile and doc adjustment (#10302)
  * scripts : fix regex in sync [no ci]
  * sycl: Use syclcompat::dp4a (#10267)
  * backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921)
  * ggml : build backends as libraries (#10256)
  * CUDA: no -sm row for very small matrices (#10185)
  * speculative : fix out-of-bounds access (#10289)
  * vulkan: Optimize binary ops (#10270)
  * vulkan: Use macros to make the mat mul pipeline creation more concise (#10259)
  * llama : propagate the results of `graph_compute` (#9525)
  * sync : ggml
  * docs : update bindings list (#10261)
  * server : add missing docs (#10269)
  * server : fix incorrect res in validate_model_chat_template (#10272)
  * metadata: Detailed Dataset Authorship Metadata (#8875)
  * sycl : Fixes to broken builds and test-backend-ops (#10257)
  * vulkan: Optimize contiguous copies (#10254)
  * vulkan: Throttle the number of shader compiles during the build step. (#10222)
* Mon Nov 11 2024 eyadlorenzo@gmail.com
- Update to version 4066:
  * metal : more precise Q*K in FA vec kernel (#10247)
  * server : enable KV cache defrag by default (#10233)
  * flake.lock: Update (#10243)
  * server : (web UI) Add back sampler settings (#10239)
* Mon Nov 11 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Remove not used CLI commands from package
- Update to version 4062:
  * vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226)
  * metal : reorder write loop in mul mat kernel + style (#10231)
  * metal : fix build and some more comments (#10229)
  * metal : fix F32 accumulation in FA vec kernel (#10232)
  * llama : fix Qwen model type strings
  * metal : hide debug messages from normal log
  * ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213)
  * ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)
  * scripts : fix pattern and get n_tokens in one go (#10221)
  * metal : opt-in compile flag for BF16 (#10218)
  * metal : improve clarity (minor) (#10171)
  * metal : optimize FA kernels (#10171)
  * swift : exclude ggml-metal-embed.metal (#10211)
  * server : minor UI fix (#10207)
  * server : revamp chat UI with vuejs and daisyui (#10175)
  * scripts : add amx to sync-ggml.sh [no ci]
  * sync : ggml
  * scripts : sync update
  * ggml : add ggml-cpu.h to the public headers (#10204)
  * Remove identical wte/etw logic for jais (#10203)
  * DRY: Fixes clone functionality (#10192)
  * fix q4_0_8_8 format for corrupted tokens issue (#10198)
  * Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133)
  * metal : add BF16 support (#8439)
  * server : remove hack for extra parallel slot (#10187)
  * metal : fix from ptr buffer name (#10189)
  * ggml : adjust is_first_call init value (#10193)
  * metal : add quantized FA support (#10149)
  * llama : add <|tool_call|> formatting to Granite template (#10177)
  * ggml : fix arch check in bf16_to_fp32 (#10164)
  * Q6_K AVX improvements (#10118)
  * ggml : fix gelu tables initialization (#10172)
  * ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167)
  * server : clarify /slots endpoint, add is_processing (#10162)
  * fix build break on arm64 linux (#10166)
  * cuda : clear error after changing peer access (#10153)
  * metal : simplify f16 and f32 dequant kernels (#0)
  * metal : move dequantize templates to beginning of MSL source (#0)
  * CANN: adjust backend registry refactor. (#10158)
  * sync : ggml
  * cmake : make it possible linking ggml as external lib (ggml/1003)
  * metal : fix minor string leaks (ggml/1004)
  * ggml : move CPU backend to a separate file (#10144)
  * metal : minor fixup in FA kernel (#10143)
  * flake.lock: Update (#10146)
  * Add apple arm to presets (#10134)
  * server : fix slot selection by lru (#10126)
  * server : fix endpoint checks (#10135)
  * llama : adjust default context size + print warnings (#10136)
  * simple-chat : only add bos on first prompt (#10129)
  * convert-lora : make `--base` optional (#10110)
  * llama : add simple-chat example (#10124)
  * llama : use smart pointers for ggml resources (#10117)
  * vulkan : improve ggml_vk_create_buffer error handling (#9898)
  * readme : update hot topics
  * server : fix smart selection of available slot (#10120)
  * ggml : remove ggml_scratch (#10121)
  * sync : ggml
  * ggml : alloc ggml_contexts on the heap (whisper/2525)
  * build: fix build error in Windows env with OneAPI setup (#10107)
  * llama : improve output buffer type selection (#10098)
  * quantize : fix --keep-split (#10114)
  * llama : fix buffer checks for mamba and rwk (#10111)
  * loader: refactor tensor weights storage (#9935)
  * server : include scheme when printing URL (#10106)
  * ggml : check tensor name lengths in gguf files (#10100)
  * kompute: add mul_mat_q4_k shader (#10097)
* Thu Oct 31 2024 eyadlorenzo@gmail.com
- Update to version 3995:
  * kompute: add backend registry / device interfaces (#10045)
  * ggml : fix memory leaks when loading invalid gguf files (#10094)
  * readme : more lora detail in main example readme (#10064)
  * convert : more detailed convert lora usage docs (#10065)
  * ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029)
  * llama : refactor model loader with backend registry (#10026)
  * ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763)
  * llama : remove Tail-Free sampling (#10071)
  * llama : Add IBM granite template (#10013)
  * flake.lock: Update (#10063)
  * musa: workaround for Guilty Lockup in cleaning src0 (#10042)
  * server : don't overfill the batch during infill (#10018)
  * llama : switch KQ multiplication to F32 precision by default (#10015)
  * sync : ggml
  * increase cuda_cpy block size (ggml/996)
  * scripts : fix amx sync [no ci]
  * metal : support permuted matrix multiplicaions (#10033)
  * llama : add DRY sampler (#9702)
  * llama: string_split fix (#10022)
  * llamafile : extend sgemm.cpp support for Q5_0 models (#10010)
  * server : check that the prompt fits in the slot's context (#10030)
  * server : refactor slot input data, move tokenizer to HTTP thread (#10023)
  * ci : fix cmake flags for SYCL
* Thu Oct 24 2024 eyadlorenzo@gmail.com
- Update to version 3972:
  * CUDA: fix insufficient buffer clearing for MMQ (#10032)
  * CUDA: fix MMQ for non-contiguous src0, add tests (#10021)
  * server : samplers accept the prompt correctly (#10019)
  * sync : ggml
  * llama.vim : bump generation time limit to 3s [no ci]
  * CUDA: fix 1D im2col, add tests (ggml/993)
  * ggml : remove redundant set of contexts used field (ggml/978)
  * llama.vim : add classic vim support (#9995)
  * metal : add POOL2D and fix IM2COL (#9943)
  * flake.lock: Update
  * llama : fix empty batch causing llama_batch_allocr to crash (#9966)
  * llama : rename batch to ubatch (#9950)
  * Rwkv chat template fix (#10001)
  * lora : warn user if new token is added in the adapter (#9948)
  * llama : add chat template for RWKV-World + fix EOT (#9968)
  * [CANN] Adapt to dynamically loadable backends mechanism (#9970)
  * arg : fix typo in embeddings argument help [no ci] (#9994)
  * llama.vim : fix info text display [no ci] (#9787)
  * llama.vim : move info to the right of screen [no ci] (#9787)
  * readme : update UI list (#9972)
  * arg : fix attention non-causal arg value hint (#9985)
  * llama.vim : plugin for Neovim (#9787)
  * ggml : add asserts for type conversion in fattn kernels (#9971)
  * rpc : pack only RPC structs (#9959)
  * llama : default sampling changes + greedy update (#9897)
  * speculative : fix handling of some input params (#9963)
  * fix mul_mat_vec_q and *_vec_q error (#9939)
  * readme : update bindings list (#9951)
  * readme : update infra list (#9942)
  * llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)
  * rpc : backend refactoring (#9912)
  * [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
  * add amx kernel for gemm (#8998)
  * server : add n_indent parameter for line indentation requirement (#9929)
  * llama : rename batch_all to batch (#8881)
  * readme : remove --memory-f32 references (#9925)
  * llama : change warning to debug log
  * llama : infill sampling handle very long tokens (#9924)
  * readme : update bindings list (#9918)
  * vulkan : add backend registry / device interfaces (#9721)
  * fix: allocating CPU buffer with size `0` (#9917)
  * fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875)
* Wed Oct 16 2024 eyadlorenzo@gmail.com
- Update to version 3930:
  * llama : suppress conversion from 'size_t' to 'int' (#9046)
  * llava : fix typo in error message [no ci] (#9884)
  * grammar : fix JSON Schema for string regex with top-level alt. (#9903)
  * llama : add tensor name for "result_norm" (#9907)
  * server : fix the disappearance of the end of the text (#9867)
  * sync : ggml
  * ggml-alloc : remove buffer_id from leaf_alloc (ggml/987)
  * [CANN] Fix cann compilation error (#9891)
* Tue Oct 15 2024 eyadlorenzo@gmail.com
- Update to version 3922:
  * llama : add infill sampler (#9896)
  * server : improve infill context reuse (#9894)
  * sampling : add XTC sampler (#9742)
  * server : update preact (#9895)
  * readme : update bindings list (#9889)
* Mon Oct 14 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 3917:
  * server : handle "logprobs" field with false value (#9871)
  * Vectorize load instructions in dmmv f16 CUDA kernel (#9816)
  * server : accept extra_context for the infill endpoint (#9874)
  * server : reuse cached context chunks (#9866)
  * flake.lock: Update (#9870)
* Mon Oct 14 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Add Vulkan support
* Sat Oct 12 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 3912:
  * server : add option to time limit the generation phase (#9865)
  * server : remove self-extend features (#9860)
  * server : remove legacy system_prompt feature (#9857)
* Sat Oct 12 2024 Eyad Issa <eyadlorenzo@gmail.com>
- Initial packaging
Files:
/usr/lib64/libggml-cpu.so
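Purely as an assumption-laden sketch, not something shipped or documented by this package, a program linked against this library stack could list the devices its backends register; it assumes the ggml_backend_dev_* registry calls from ggml-backend.h of a similar ggml release (names may differ between versions). Note that, per the changelog, this build disables dynamic loading of backends, so the CPU backend is expected to be linked in rather than loaded from a search path at runtime.

    /* Hedged sketch (not from this package): enumerate the devices exposed by
     * the ggml backends linked into the program, which here should include the
     * CPU device provided by libggml-cpu.so. */
    #include <stdio.h>
    #include "ggml-backend.h"

    int main(void) {
        size_t n = ggml_backend_dev_count();
        for (size_t i = 0; i < n; ++i) {
            ggml_backend_dev_t dev = ggml_backend_dev_get(i);
            printf("device %zu: %s - %s\n", i,
                   ggml_backend_dev_name(dev),
                   ggml_backend_dev_description(dev));
        }
        return 0;
    }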