ollama/llama/compat/llama-ollama-compat-util.h
jmorganca 2a388da77b llama/compat: split shared infra into a util TU
Main translation unit (llama-ollama-compat.cpp) is now purely per-arch
dispatch: detect_* + handle_* for each arch, plus the 4 public entry
points. Dropped from 724 lines to 430.

Everything that doesn't depend on a specific arch moves to
llama-ollama-compat-util.{h,cpp}:
  - gguf KV helpers (has_key, copy_{u32,f32}_kv, inject_{u32,f32,str,bool,
    f32_arr}_if_missing, truncate_{str,data}_arr)
  - ggml tensor helpers (any_tensor_with_prefix, rename_tensor,
    rename_tensors_containing, set_tensor_{type,shape}, reclaim_slot_as,
    tensor_file_offset)
  - per-loader skip-prefix registry (add_skip_prefix, should_skip_tensor_prefix)
  - LoadOp registry (register_load_op, take_load_op, read_at)
  - common high-level transforms (promote_tensor_to_f32, register_concat_load)
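
As an illustration of the per-loader skip-prefix registry listed above, here is a minimal sketch. It assumes the registry is a table keyed by the loader's address that holds a list of name prefixes; the storage layout and the `skip_table` accessor are hypothetical, and only the two function signatures come from the header.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct llama_model_loader;  // opaque here; only its address is used as a key

// Hypothetical storage: one prefix list per loader instance.
static std::map<const llama_model_loader *, std::vector<std::string>> & skip_table() {
    static std::map<const llama_model_loader *, std::vector<std::string>> table;
    return table;
}

void add_skip_prefix(const llama_model_loader * ml, std::string prefix) {
    skip_table()[ml].push_back(std::move(prefix));
}

bool should_skip_tensor_prefix(const llama_model_loader * ml, const char * name) {
    assert(name != nullptr);
    auto it = skip_table().find(ml);
    if (it == skip_table().end()) {
        return false;  // nothing registered for this loader
    }
    for (const std::string & p : it->second) {
        // Prefix match: compare only the first p.size() characters.
        if (std::strncmp(name, p.c_str(), p.size()) == 0) {
            return true;
        }
    }
    return false;
}
```

Keying on the loader pointer keeps skip lists from leaking between models loaded in the same process, which is presumably why the registry is per-loader.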

New helpers introduced while splitting:
  - inject_{u32,f32,str,bool,f32_arr}_if_missing — replaces the
    has_key + gguf_set_val_* idiom we were using 20+ times.
  - reclaim_slot_as — extracts the "rename an orphan tensor slot as a
    synthesized one" pattern used by qwen35moe's patch_embed split. Clear
    name + comment explains the workaround.
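
The inject_*_if_missing idiom can be sketched as follows. This is a self-contained illustration, not the vendored code: `ToyMeta` is a hypothetical stand-in for gguf_context (a plain map), so the example compiles without ggml. The real helper would presumably wrap the corresponding gguf_set_val_* call in the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for gguf_context: a key -> u32 map.
struct ToyMeta {
    std::map<std::string, uint32_t> u32_kv;
};

static bool has_key(const ToyMeta & meta, const char * key) {
    return meta.u32_kv.count(key) != 0;
}

// Old idiom, repeated at every call site:
//     if (!has_key(meta, key)) { /* set value */ }
// The helper folds the check and the write into one call and never
// overwrites a value that is already present.
static void inject_u32_if_missing(ToyMeta & meta, const char * key, uint32_t v) {
    assert(key != nullptr);
    if (!has_key(meta, key)) {
        meta.u32_kv[key] = v;
    }
}
```

A call on an existing key leaves the stored value untouched; a call on a missing key adds it with the supplied default.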

CMake: target_sources now globs llama/compat/*.cpp (CONFIGURE_DEPENDS),
so new .cpp files are picked up without CMake edits.

Nothing behaviorally changed. Verified gemma3 + qwen3.5 text + vision
still work end-to-end after a clean rebuild.
2026-04-20 09:29:34 -07:00


#pragma once
// Internal helpers shared by the per-architecture handlers in
// llama-ollama-compat.cpp. Not part of the public API.
//
// Everything lives under namespace llama_ollama_compat::detail. The
// definitions live in llama-ollama-compat-util.cpp, which also owns the
// registry globals (tensor skip list, load-op table) that need a single
// translation unit.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <string>
#include <vector>
#include "ggml.h"
#include "ggml-backend.h"
#include "gguf.h"
struct llama_model_loader;
namespace llama_ollama_compat::detail {
// -- gguf_context KV helpers --
bool has_key(const gguf_context * meta, const char * key);
void copy_u32_kv(gguf_context * meta, const char * src, const char * dst);
void copy_f32_kv(gguf_context * meta, const char * src, const char * dst);
void inject_u32_if_missing (gguf_context * meta, const char * key, uint32_t v);
void inject_f32_if_missing (gguf_context * meta, const char * key, float v);
void inject_str_if_missing (gguf_context * meta, const char * key, const char * v);
void inject_bool_if_missing(gguf_context * meta, const char * key, bool v);
void inject_f32_arr_if_missing(gguf_context * meta, const char * key,
                               const float * data, size_t n);
void truncate_str_arr (gguf_context * meta, const char * key, size_t new_n);
void truncate_data_arr(gguf_context * meta, const char * key,
                       gguf_type elem_type, size_t elem_size, size_t new_n);
// -- ggml_context tensor scans --
bool any_tensor_with_prefix(const ggml_context * ctx, const char * prefix);
// -- Tensor renaming / reshaping (mutates both gguf_context and ggml_context) --
void rename_tensor(gguf_context * meta, ggml_context * ctx,
                   const char * old_name, const char * new_name);
void rename_tensors_containing(gguf_context * meta, ggml_context * ctx,
                               const char * needle, const char * replacement);
void set_tensor_type (ggml_tensor * t, ggml_type type);
void set_tensor_shape(ggml_tensor * t, std::initializer_list<int64_t> shape);
bool reclaim_slot_as (gguf_context * meta, ggml_context * ctx,
                      const char * orphan_name, const char * new_name,
                      std::initializer_list<int64_t> shape, ggml_type type);
// -- File-offset capture (before rename) --
size_t tensor_file_offset(const gguf_context * meta, const char * name);
// -- Per-loader skip-prefix registry --
void add_skip_prefix(const llama_model_loader * ml, std::string prefix);
bool should_skip_tensor_prefix(const llama_model_loader * ml, const char * name);
// -- Load-time transform registry --
struct LoadOp {
    std::function<bool(const char * src_file, void * dst, size_t dst_size)> apply;
    const char * description;
};
void register_load_op(std::string dest_name, LoadOp op);
bool take_load_op (const char * dest_name, LoadOp & out); // removes + returns
// Read `size` bytes at `offset` from `path` into `dst`. Used by LoadOps.
bool read_at(const char * path, size_t offset, void * dst, size_t size);
// -- Common high-level transforms --
// F16 -> F32 promotion. Captures the source file offset at registration
// time so later renames/reshapes of this tensor don't invalidate the read.
void promote_tensor_to_f32(gguf_context * meta, ggml_context * ctx, const char * name);
// Concatenate N source tensors into one destination. Captures each source's
// file offset + byte size at registration time. Layout assumption: sources
// concatenate cleanly along the destination's slow ggml axis, which in
// C order means the destination bytes are src[0] || src[1] || ... .
void register_concat_load(const gguf_context * meta, std::string dest_name,
                          const std::vector<std::string> & src_names);
} // namespace llama_ollama_compat::detail
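
The load-op registry and the concat transform declared above could fit together roughly as sketched below. This is a standalone illustration under stated assumptions, not the contents of llama-ollama-compat-util.cpp: the table layout, the `load_ops` accessor, the `Span` struct, and `register_concat_spans` are hypothetical. The real register_concat_load takes tensor names and resolves offsets from the gguf_context; here the offsets and sizes are passed in directly so the example does not need ggml.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct LoadOp {
    std::function<bool(const char * src_file, void * dst, size_t dst_size)> apply;
    const char * description;
};

// Hypothetical storage: one table keyed by destination tensor name.
static std::unordered_map<std::string, LoadOp> & load_ops() {
    static std::unordered_map<std::string, LoadOp> table;
    return table;
}

void register_load_op(std::string dest_name, LoadOp op) {
    load_ops()[std::move(dest_name)] = std::move(op);
}

// Removes the op from the table and returns it through `out`.
bool take_load_op(const char * dest_name, LoadOp & out) {
    assert(dest_name != nullptr);
    auto it = load_ops().find(dest_name);
    if (it == load_ops().end()) {
        return false;
    }
    out = std::move(it->second);
    load_ops().erase(it);
    return true;
}

// Read `size` bytes at `offset` from `path` into `dst`.
// Note: fseek takes a long, so very large offsets would need a 64-bit seek.
bool read_at(const char * path, size_t offset, void * dst, size_t size) {
    std::FILE * f = std::fopen(path, "rb");
    if (!f) {
        return false;
    }
    const bool ok = std::fseek(f, static_cast<long>(offset), SEEK_SET) == 0 &&
                    std::fread(dst, 1, size, f) == size;
    std::fclose(f);
    return ok;
}

// Hypothetical: a source region captured at registration time.
struct Span { size_t offset; size_t size; };

// Registers an op that fills the destination with src[0] || src[1] || ...
void register_concat_spans(std::string dest_name, std::vector<Span> spans) {
    LoadOp op;
    op.description = "concat";
    op.apply = [spans = std::move(spans)](const char * src_file, void * dst, size_t dst_size) {
        size_t total = 0;
        for (const Span & s : spans) total += s.size;
        if (total != dst_size) return false;  // sources must fill dst exactly
        char * out = static_cast<char *>(dst);
        for (const Span & s : spans) {
            if (!read_at(src_file, s.offset, out, s.size)) return false;
            out += s.size;
        }
        return true;
    };
    register_load_op(std::move(dest_name), std::move(op));
}
```

Because the spans are captured by value when the op is registered, later renames or reshapes of the source tensors cannot change what gets read, which matches the "captures ... at registration time" wording in the header comments.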