API Documentation

🌟 FG-CLIP Embedding API

This service extracts semantic features from input text or images and returns the corresponding semantic embedding vectors.

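An embedding vector is a list of n floating-point numbers, each representing one feature dimension. For the current model version n = 768. When newer models are released, check the documentation for the corresponding model version.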

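Embedding vectors can be used to build text or image classifiers and to improve text or image search and recommendation, so that results reflect the semantics of the content rather than surface features.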

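A distinctive strength of FG-CLIP embeddings is fine-grained retrieval: passing bounding boxes (the image_boxes parameter) returns local features for the corresponding regions of an image, and the text_box_flag parameter returns local features for text.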

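Global image features: a feature vector representing the image as a whole.
Local image features: feature vectors for the content of user-specified regions of an image, which are more fine-grained. A distinctive strength of FG-CLIP is that it extracts local features for image regions whose discriminative power rivals that of global image features, addressing the "nearsightedness" of traditional methods that can only handle the whole image.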

API-1 General Embedding Interface Protocol

🚀 Request Method

This service accepts POST requests: send text or image data to the server, and it returns the embedding features.

    # python
    import requests
    url = "https://api.research.360.cn/generate_embedding"
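    resp = requests.post(url=url, headers=header, json=request_body)  # header and request_body are defined in the Headers and Request Body sections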

    # curl
    curl -X 'POST' 'https://api.research.360.cn/generate_embedding'

🔐 Headers

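HTTP request headers carry details that the client sends along with a request to help the server process it. There are many header fields, describing aspects such as client information, the content type of the request, and caching policy.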

💾 accept

string Required, HTTPS request header

The list of content types (MIME types) the client can handle, telling the server which data format it expects to receive, e.g. application/json.

    header = {"accept": "application/json",...} # python
    -H 'accept: application/json'  # curl

📪 Content-Type

string Required, HTTPS request header

Tells the server the format of the data the client is sending, e.g. application/json.

    header = {"Content-Type": "application/json",...} # python
    -H 'Content-Type: application/json'  # curl

🎰 Complete Headers Example

Python script

    header = {
        "accept": "application/json",
        "Content-Type": "application/json"
    }

curl command

    curl -X 'POST' 'https://api.research.360.cn/generate_embedding' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json'
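The method, headers, and JSON body combine into a single HTTP request. A minimal sketch using Python's standard `urllib.request` (chosen instead of `requests` so it needs no third-party package) is shown below; the helper names `build_request` and `send` are illustrative, not part of the service, and `send` is not called here because it performs a real network request.

```python
import json
import urllib.request

# Endpoint taken from the examples in this document.
API_URL = "https://api.research.360.cn/generate_embedding"

# Both headers are marked Required above.
HEADERS = {
    "accept": "application/json",
    "Content-Type": "application/json",
}


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap a request body in a POST request carrying the required headers."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=HEADERS, method="POST")


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request({"model": "fg-clip", "input_type": "text",
                     "embedding_types": ["float"], "texts": ["an apple"]})
print(req.get_method(), req.full_url)
```

Note that `urllib` normalizes header names (e.g. `Content-Type` becomes `Content-type`); HTTP header names are case-insensitive, so this does not change what the server sees.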

📥 Request Body

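The body of an HTTP request is an optional part of the message, typically used to carry the actual content when the client sends data to the server. The format and content type of the body are determined by the Content-Type request header.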

In this service the body is required and must be a JSON string.

1️⃣ `request_id`

string

A unique identifier for a single request, generated with uuid4. Example: "request_id": "2691733a-d172-4b20-8aab-4ecfeb089141"
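If you supply request_id yourself (an assumption; the field is not marked Required, so the server may also generate one), Python's standard `uuid` module produces a UUID4 string in the same format as the example. `new_request_id` is an illustrative helper name.

```python
import uuid


def new_request_id() -> str:
    """Return a random UUID4 string for the optional request_id field."""
    return str(uuid.uuid4())


request_id = new_request_id()
request_body = {"request_id": request_id, "model": "fg-clip"}
print(request_body["request_id"])
```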

2️⃣ model

string Required

The model to call. Currently there is only one option: fg-clip.

    request_body = {"model": "fg-clip",...} # python

3️⃣ input_type

string Required

Specifies the type of input sent to the server.

Allowed values:

text: use text as the input for this embedding call;
image: use images as the input for this embedding call.

    request_body = {"input_type": "image",...} # python
    request_body = {"input_type": "text",...} # python

4️⃣ embedding_types

list of strings Required

Specifies the embedding types you want returned. Any one or more of the following may be given; requesting only one type is recommended.

Allowed values:

float: return FLOAT32 embedding vectors;
int8: return INT8 embedding vectors;
uint8: return UINT8 embedding vectors.

    request_body = {"embedding_types": ["float","int8","uint8"],...} # python

5️⃣ texts

list of strings Optional

A list of texts to embed.

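Up to 32 texts are accepted per request, and each text may contain at most 196 tokens (roughly 150 Chinese characters). Whether over-long text is truncated depends on the truncate parameter; see truncate below.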

Text embedding splits inputs by length: texts shorter than 64 tokens are encoded by the short-text model, and texts between 64 and 196 tokens are encoded by the long-text model.

When the input_type of a request is image, texts must be an empty list or omitted.

    text_inputs = ["an apple","a pikachu"]
    request_body = {
        "model": "fg-clip",
        "input_type": "text",
        "embedding_types": ["float"],
        "texts": text_inputs,
        # ... other optional fields
        } # python
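Because one request accepts at most 32 texts, a longer list has to be split across several requests. A minimal batching sketch follows; `batch_texts` is an illustrative helper, and it does not check the 196-token limit, which would require the model's tokenizer.

```python
from typing import Iterator, List

MAX_TEXTS_PER_REQUEST = 32  # per-request limit stated above


def batch_texts(texts: List[str], batch_size: int = MAX_TEXTS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


corpus = [f"text {i}" for i in range(70)]
bodies = [
    {"model": "fg-clip", "input_type": "text", "embedding_types": ["float"], "texts": chunk}
    for chunk in batch_texts(corpus)
]
print([len(b["texts"]) for b in bodies])  # [32, 32, 6]
```

Keeping the order intact means the returned embeddings can simply be concatenated batch by batch to line up with the original list.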

6️⃣ images

list of strings Optional

A list of images to embed; up to 32 images are accepted per request.

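Images can be given as URLs, e.g. ["https://xxx.jpg",""], or as base64 data URIs, e.g. ["data:image/png;base64,base64_str",""]. URL input currently supports only URLs that start with http/https, use a domain name rather than an IP address (URLs containing an IP address are not yet supported), and end in jpg, jpeg, png, or bmp.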

When the input_type of a request is text, images must be an empty list or omitted.

    # image URL input
    image_inputs = ["https://xxxx.png", "https://xxxx.jpg"]
    request_body = {
        "model": "fg-clip",
        "input_type": "image",
        "embedding_types": ["float"],
        "images": image_inputs,
        # ... other optional fields
        } # python

    # image base64 input
    import os
    import base64

    def batch_images_to_base64(folder_path):
        result = []
        # Map supported file extensions to MIME subtypes
        mime_types = {'.jpg': 'jpeg', '.jpeg': 'jpeg', '.png': 'png'}
        for filename in sorted(os.listdir(folder_path)):
            filepath = os.path.join(folder_path, filename)
            # Skip anything that is not a regular file
            if not os.path.isfile(filepath):
                continue
            # Get the extension in lower case and skip unsupported formats
            ext = os.path.splitext(filename)[1].lower()
            if ext not in mime_types:
                continue
            # Read the file and encode it as a data URI
            with open(filepath, 'rb') as f:
                base64_str = base64.b64encode(f.read()).decode('utf-8')
            result.append(f"data:image/{mime_types[ext]};base64,{base64_str}")
        return result

    images_path = "/data/images/"
    images_base64 = batch_images_to_base64(images_path)
    images_base64 = images_base64[:32]  # at most 32 images per request
    request_body = {
        "model": "fg-clip",
        "input_type": "image",
        "embedding_types": ["float"],
        "images": images_base64,
        # ... other optional fields
        } # python
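The URL rules above can be checked on the client before sending a request. The sketch below encodes them with the standard `urllib.parse` and `ipaddress` modules; `is_supported_image_url` and `is_data_uri` are illustrative names, and treating the file extension case-insensitively is an assumption, since the documentation does not say whether `.JPG` is accepted.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".jpg", ".jpeg", ".png", ".bmp")


def is_supported_image_url(url: str) -> bool:
    """Check a URL against the documented rules for image URL input."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        ipaddress.ip_address(parsed.hostname)
        return False  # hosts given as IP addresses are not supported
    except ValueError:
        pass  # not an IP literal, so the host is a domain name
    # Assumption: the extension check is case-insensitive
    return url.lower().endswith(ALLOWED_SUFFIXES)


def is_data_uri(item: str) -> bool:
    """Rough check for the base64 form data:image/<type>;base64,<payload>."""
    return item.startswith("data:image/") and ";base64," in item


urls = ["https://example.com/cat.jpg", "http://192.168.1.10/cat.jpg", "https://example.com/cat.gif"]
print([is_supported_image_url(u) for u in urls])  # [True, False, False]
```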

7️⃣ truncate

string Optional

Defaults to none.

The values none and start specify how the API handles input longer than the maximum text length (196 tokens). Pay attention to this parameter when input_type is text.

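start: truncate from the beginning of the input until it fits within the maximum number of tokens.
none: no truncation; input longer than the maximum number of tokens returns an error. This is the default.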

With start, input is discarded until the remaining input is exactly the model's maximum input length in tokens.

    request_body = {
        "model": "fg-clip",
        "input_type": "text",
        "embedding_types": ["float"],
        "texts": text_inputs,
        "truncate": "none",
        # ... other optional fields
        } # python

    request_body = {"truncate": "start",...} # python

8️⃣ image_boxes

three-dimensional list of float Optional

Defaults to [].

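Use this when input_type="image" and you need local image features for fine-grained retrieval. Each image may have several boxes, and each box consists of 4 float values. When image_boxes is provided, the returned embeddings correspond one-to-one to the boxes of each image. For example, with image_boxes [[[x1,y1,w1,h1],[x2,y2,w2,h2]],[[x3,y3,w3,h3]]] and images ["image1_url","image2_url"], the returned embeddings have shape 3*dim and are ordered image1_box1, image1_box2, image2_box3.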
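Because the per-box embeddings come back as one flat list, the caller has to map each vector back to its image. A minimal sketch, assuming the flat ordering is exactly the one described above (all boxes of image 1, then all boxes of image 2, and so on); `box_index` and `group_by_image` are illustrative helper names.

```python
from typing import List, Tuple


def box_index(image_boxes: List[List[List[float]]]) -> List[Tuple[int, int]]:
    """List (image_idx, box_idx) pairs in the order embeddings are returned."""
    return [
        (img_idx, box_idx)
        for img_idx, boxes in enumerate(image_boxes)
        for box_idx in range(len(boxes))
    ]


def group_by_image(embeddings: List[List[float]],
                   image_boxes: List[List[List[float]]]) -> List[List[List[float]]]:
    """Regroup the flat embedding list into one list of box vectors per image."""
    pairs = box_index(image_boxes)
    if len(pairs) != len(embeddings):
        raise ValueError(f"expected {len(pairs)} embeddings, got {len(embeddings)}")
    grouped: List[List[List[float]]] = [[] for _ in image_boxes]
    for (img_idx, _), vec in zip(pairs, embeddings):
        grouped[img_idx].append(vec)
    return grouped


image_boxes = [
    [[10.0, 20.0, 50.0, 40.0], [60.0, 5.0, 30.0, 30.0]],  # image1: two boxes
    [[0.0, 0.0, 100.0, 80.0]],                            # image2: one box
]
print(box_index(image_boxes))  # [(0, 0), (0, 1), (1, 0)]
```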

Figure: box coordinate conventions

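The figure above shows how a box inside an image is specified. Take the top-left corner of the box as the point dot; with the coordinate axes shown, dot has coordinates (x, y), and the box has width w and height h, giving the box input [x, y, w, h]. If the original image has width W and height H, every box must satisfy x + w <= W and y + h <= H. The order of the box lists must match the order of the images.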
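Many detectors and annotation tools emit boxes as two corners (x1, y1, x2, y2) rather than [x, y, w, h]. The sketch below converts between the two and checks the documented bound x + w <= W, y + h <= H. The helper names are illustrative; the extra checks for non-negative coordinates and positive size are assumptions, not documented rules, and the 1024 x 1542 image size is borrowed from the response example purely for illustration.

```python
from typing import List, Sequence


def corners_to_xywh(x1: float, y1: float, x2: float, y2: float) -> List[float]:
    """Convert top-left / bottom-right corners to the [x, y, w, h] form."""
    return [x1, y1, x2 - x1, y2 - y1]


def check_box(box: Sequence[float], width: float, height: float) -> None:
    """Raise ValueError if a box is malformed or falls outside the image."""
    x, y, w, h = box
    # Assumption: coordinates are non-negative and boxes have positive size
    if min(x, y) < 0 or w <= 0 or h <= 0:
        raise ValueError(f"invalid box {list(box)}")
    # Documented rule: x + w <= W and y + h <= H
    if x + w > width or y + h > height:
        raise ValueError(f"box {list(box)} exceeds image size {width}x{height}")


box = corners_to_xywh(52.0, 399.0, 464.0, 739.0)
check_box(box, width=1024, height=1542)
print(box)  # [52.0, 399.0, 412.0, 340.0]
```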


    boxes_info = [[[52.0,399.0,412.0,340.0]],...]
    request_body = {
        "model": "fg-clip",
        "input_type": "image",
        "embedding_types": ["float"],
        "images": image_inputs,
        "image_boxes":boxes_info,
        # ... other optional fields
        } # python

9️⃣ text_box_flag

Boolean Optional

Defaults to False.

Use this when input_type="text" and you need local text features for fine-grained retrieval.

    request_body = {
        "text_box_flag":True,
        # ... other optional fields
        } # python

🎰 Complete Request Body Example

Python script

    text_body = {
        "model": "fg-clip",
        "input_type": input_type,
        "embedding_types": embedding_types,
        "texts": texts,
        "truncate": truncate,  # optional
        "text_box_flag": text_box_flag  # optional
    }

    image_body = {
        "model": "fg-clip",
        "input_type": input_type,
        "embedding_types": embedding_types,
        "images": images,
        "image_boxes": boxes_info  # optional
    }

curl command

    curl -X 'POST' 'https://api.research.360.cn/generate_embedding' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{
            "model": "fg-clip",
            "input_type": "text",
            "embedding_types": ["float"],
            "texts": ["an apple","two apples"],
            "truncate": "start",
            "text_box_flag": false
        }'

    curl -X 'POST' 'https://api.research.360.cn/generate_embedding' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{
            "model": "fg-clip",
            "input_type": "image",
            "embedding_types": ["float"],
            "images": ["https://xxxx.png"],
            "image_boxes": [[[7.03, 16.76, 149.32, 94.87]]]
        }'
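The field rules spread across the sections above (allowed values, the 32-item limit, texts and images being mutually exclusive, box lists only for images and text_box_flag only for text) can be enforced in one payload builder. This is a sketch, not an official client: `build_body` is a hypothetical helper, defaulting embedding_types to ["float"] and requiring at least one input are choices made here, and requiring one box list per image is inferred from the condition len(image_boxes) == len(images) in the embeddings description.

```python
from typing import List, Optional

ALLOWED_EMBEDDING_TYPES = {"float", "int8", "uint8"}
ALLOWED_TRUNCATE = {"none", "start"}
MAX_ITEMS = 32  # documented limit for both texts and images


def build_body(input_type: str, items: List[str],
               embedding_types: Optional[List[str]] = None,
               truncate: str = "none",
               image_boxes: Optional[list] = None,
               text_box_flag: bool = False) -> dict:
    """Assemble a request body, enforcing the documented field rules."""
    if embedding_types is None:
        embedding_types = ["float"]  # default chosen for this sketch only
    if input_type not in ("text", "image"):
        raise ValueError("input_type must be 'text' or 'image'")
    if not embedding_types or not set(embedding_types) <= ALLOWED_EMBEDDING_TYPES:
        raise ValueError(f"embedding_types must be a non-empty subset of {sorted(ALLOWED_EMBEDDING_TYPES)}")
    if not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"between 1 and {MAX_ITEMS} inputs are allowed per request")

    body = {"model": "fg-clip", "input_type": input_type,
            "embedding_types": list(embedding_types)}
    if input_type == "text":
        if truncate not in ALLOWED_TRUNCATE:
            raise ValueError("truncate must be 'none' or 'start'")
        if image_boxes:
            raise ValueError("image_boxes only applies to image input")
        body.update(texts=list(items), truncate=truncate, text_box_flag=text_box_flag)
    else:
        if text_box_flag:
            raise ValueError("text_box_flag only applies to text input")
        body["images"] = list(items)  # texts is omitted for image input
        if image_boxes:
            # One list of boxes per image
            if len(image_boxes) != len(items):
                raise ValueError("image_boxes needs one list of boxes per image")
            body["image_boxes"] = image_boxes
    return body


text_body = build_body("text", ["an apple", "two apples"], truncate="start")
image_body = build_body("image", ["https://xxxx.png"],
                        image_boxes=[[[7.03, 16.76, 149.32, 94.87]]])
print(text_body)
```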

📤 Response

✅ context

code int: 0 indicates success
message string: "OK" indicates success
timestamp long: Unix timestamp (seconds)

{ "context": { "code": 0, "message": "OK", "timestamp": 1753086217 } }

✅ data

1️⃣ `request_id`

string

Identical to the request_id input parameter and unique. Example: "request_id": "2691733a-d172-4b20-8aab-4ecfeb089141"

2️⃣ `embeddings`

object or null

An object holding the embedding vectors in each requested data type; see embedding_types.

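Each embedding-type array has the same length as the original texts/images array. When image_boxes is provided, it instead has one entry per box, as described under image_boxes.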

When input_type="text" and text_box_flag == True, the returned embeddings["float"] contains the dense features of the long text.

When input_type="image" and len(image_boxes) == len(images), the returned embeddings["float"] contains the local features of the corresponding image boxes.

{
  "embeddings": {
    "float": [
      [
        0.01062996219843626, 0.026321588084101677, -0.0231630802154541,
        -0.0043367426842451096, 0.026321588084101677, -0.0231630802154541,
        ......
      ]
    ]
  }
}


`float`

list of lists of doubles or null

List of FLOAT32 embedding vectors.

`int8`

list of lists of integers or null

List of INT8 embedding vectors. Each value is in the range -128 to 127.

`uint8`

list of lists of integers or null

List of UINT8 embedding vectors. Each value is in the range 0 to 255.
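JSON carries all three types as plain numbers, so the client has to restore the intended dtype. A minimal NumPy sketch, mapping each type to the dtype named above and checking the documented integer ranges; `to_arrays` is an illustrative helper, and skipping null or unrequested types is a choice made here.

```python
import numpy as np

# NumPy dtype for each documented embedding type.
DTYPES = {"float": np.float32, "int8": np.int8, "uint8": np.uint8}


def to_arrays(embeddings: dict) -> dict:
    """Convert each non-null list of vectors into a 2-D array of the right dtype."""
    arrays = {}
    for name, values in embeddings.items():
        if values is None or name not in DTYPES:
            continue  # type not requested, or unknown key
        arr = np.asarray(values)
        if name != "float":
            info = np.iinfo(DTYPES[name])
            # Reject values outside the documented range instead of wrapping
            if arr.size and (arr.min() < info.min or arr.max() > info.max):
                raise ValueError(f"{name} values outside [{info.min}, {info.max}]")
        arrays[name] = arr.astype(DTYPES[name])
    return arrays


fake = {"float": [[0.1, -0.2, 0.3]], "int8": [[-128, 0, 127]], "uint8": None}
arrays = to_arrays(fake)
print({k: (v.dtype.name, v.shape) for k, v in arrays.items()})
```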

3️⃣ `texts`

list of strings or null

The length of each text in the request's input text list.

{ "texts": ["275", "52", "50", "255"] }

4️⃣ `images`

list of objects or null

Dimension information for each image in the request's input image list.

{
  "images": [
    { "width": 1024, "height": 1542, "format": "JPEG", "bit_depth": 24 }
  ]
}


`width`

long

Image width in pixels.

`height`

long

Image height in pixels.

`format`

string

The image format (e.g. JPEG).

`bit_depth`

long

The image bit depth.

5️⃣ `meta`

object or null

Billing information for this request.

{
  "meta": {
    "api_version": { "version": "2.0.1" },
    "billed_units": {
      "images": 32,
      "input_tokens": 25600,
      "output_tokens": 0
    }
  }
}


`api_version`

object or null

`billed_units`

object or null


`images`

double or null

Number of images billed.

`input_tokens`

double or null

Number of input text tokens billed.

`output_tokens` (defaults to `0`)

double or null

Number of output text tokens billed. This is always 0; billing is based on the number of billed images and billed input text tokens.

6️⃣ `created`

string or null: the time at which processing of this request started, in the format %Y-%m-%d %H:%M:%S.%f

{ "created": "2025-07-10 10:23:24.946" }
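The stated format string can be passed directly to `datetime.strptime`; `%f` accepts 1 to 6 fractional digits, so the three-digit value in the example parses as 946 milliseconds. The time zone is not documented, so the sketch returns a naive datetime; `parse_created` is an illustrative helper name.

```python
from datetime import datetime

CREATED_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format stated for the created field


def parse_created(value):
    """Parse the created field into a naive datetime; None stays None."""
    if value is None:
        return None
    return datetime.strptime(value, CREATED_FORMAT)


created = parse_created("2025-07-10 10:23:24.946")
print(created.isoformat())  # 2025-07-10T10:23:24.946000
```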

🎰 Complete Response Example

{
  "context": { "code": 0, "message": "OK", "timestamp": 1753086217 },
  "data": {
    "id": "2691733a-d172-4b20-8aab-4ecfeb089141",
    "embeddings": {
      "float": [
        [
          0.01062996219843626, 0.026321588084101677, -0.0231630802154541,
          -0.0043367426842451096, 0.026321588084101677, -0.0231630802154541,
          ......
        ]
      ]
    },
    "texts": ["275", "52", "50", "255"],
    "images": [
      { "width": 1024, "height": 1542, "format": "JPEG", "bit_depth": 24 }
    ],
    "meta": {
      "api_version": { "version": "2.0.1" },
      "billed_units": {
        "images": 32,
        "input_tokens": 25600,
        "output_tokens": 0
      },
      "latencys": []
    },
    "created": "2025-07-10 10:23:24.946"
  }
}
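A client typically checks context.code before reading data. The sketch below does that and pulls out the FLOAT32 vectors. The helper names are illustrative, and because the field list above calls the identifier request_id while this example response uses id, `response_request_id` accepts either (an assumption about which one the service actually returns).

```python
from typing import List, Optional


def extract_float_embeddings(response: dict) -> List[List[float]]:
    """Return data.embeddings.float, raising if the call did not succeed."""
    context = response.get("context") or {}
    if context.get("code") != 0:
        raise RuntimeError(
            f"embedding request failed: code={context.get('code')}, "
            f"message={context.get('message')}"
        )
    embeddings = (response.get("data") or {}).get("embeddings") or {}
    vectors = embeddings.get("float")
    if vectors is None:
        raise KeyError("no 'float' embeddings in response; was 'float' requested?")
    return vectors


def response_request_id(response: dict) -> Optional[str]:
    """Read the request identifier, accepting either request_id or id."""
    data = response.get("data") or {}
    return data.get("request_id", data.get("id"))


sample = {
    "context": {"code": 0, "message": "OK", "timestamp": 1753086217},
    "data": {
        "id": "2691733a-d172-4b20-8aab-4ecfeb089141",
        "embeddings": {"float": [[0.0106, 0.0263, -0.0232]]},  # shortened vector
    },
}
vectors = extract_float_embeddings(sample)
print(len(vectors), response_request_id(sample))
```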

API-2 OpenAI-Compatible Embedding Interface Protocol

    from openai import OpenAI

    client = OpenAI(base_url="https://api.research.360.cn/v1", api_key="your_key")

    response = client.embeddings.create(
        model="fg-clip",
        input=["xxxx"],  # texts or images, matching input_type
        encoding_format="float",
        extra_body={
            "request_id": " ",
            "input_type": "image",
            "truncate": "start",
            "image_boxes": [],
            "text_box_flag": False
        }
    )

Request

model: see above
input: equivalent to texts and images above
encoding_format: only float is supported
extra_body
- request_id: see above
- input_type: see above
- truncate: see above
- image_boxes: see above
- text_box_flag: see above

Response

See the OpenAI CreateEmbeddingResponse class.

🔧 Image-Text Similarity Calculation

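Once you have the embedding vectors, similarities between them, including

text-text similarity
text-image similarity
image-image similarity

can all be computed with simple cosine similarity.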

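    import torch
    import torch.nn.functional as F

    # image_embeddings / text_embeddings: the "embeddings" objects returned by the API
    image_features = torch.tensor(image_embeddings["float"])
    text_features  = torch.tensor(text_embeddings["float"])

    # L2-normalize so that the dot product equals cosine similarity
    image_features = F.normalize(image_features, dim=-1)
    text_features  = F.normalize(text_features, dim=-1)

    # [num_images, num_texts] matrix of cosine similarities
    sims = image_features @ text_features.T

    print(sims.shape)
    print("Cosine similarity:", sims)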

Note that cosine similarity ranges over [-1, +1]. To map it to [0, 1], the following computation is recommended:

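    import numpy as np

    def logits(image_features, text_features):
        image_features = np.asarray(image_features)
        text_features = np.asarray(text_features)

        # Matrix product of text and image features: [num_texts, num_images]
        logits_per_text = np.matmul(text_features, image_features.T)

        # Scale factor and bias
        logit_scale = np.array([4.7500])
        logit_bias  = np.array([-16.7500])

        # Apply scale and bias
        logits_per_text = logits_per_text * np.exp(logit_scale) + logit_bias

        # Transpose to get per-image logits: [num_images, num_texts]
        logits_per_image = logits_per_text.T

        # Sigmoid maps the logits into (0, 1)
        sims_matrix = 1 / (1 + np.exp(-logits_per_image))
        # Drop the text axis only for a single text; squeezing an axis
        # whose size is not 1 would raise a ValueError
        if sims_matrix.shape[-1] == 1:
            sims_matrix = np.squeeze(sims_matrix, axis=-1)

        return sims_matrix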
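The scale and bias in `logits` above make the mapping much steeper than a linear rescaling. Working it through for a single cosine value: the sigmoid output is 0.5 where the logit is zero, i.e. at cos = 16.75 / e^4.75 ≈ 0.145, and it saturates quickly on either side. The sketch reproduces the same arithmetic with the standard `math` module; `cosine_to_score` is an illustrative name, and this crossover point is a consequence of the constants, not a documented decision threshold.

```python
import math

LOGIT_SCALE = 4.75   # same constants as in logits() above
LOGIT_BIAS = -16.75


def cosine_to_score(cos_sim: float) -> float:
    """Map one cosine similarity to (0, 1) the same way as logits() above."""
    z = cos_sim * math.exp(LOGIT_SCALE) + LOGIT_BIAS
    return 1.0 / (1.0 + math.exp(-z))


# The score crosses 0.5 where the logit is zero: cos = -bias / exp(scale)
threshold = -LOGIT_BIAS / math.exp(LOGIT_SCALE)
print(round(threshold, 4))  # 0.1449
```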
