How to use upskyy/ko-reranker with sentence-transformers:
from sentence_transformers import CrossEncoder
model = CrossEncoder("upskyy/ko-reranker")
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)
How to use upskyy/ko-reranker with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("upskyy/ko-reranker")
model = AutoModelForSequenceClassification.from_pretrained("upskyy/ko-reranker")
ko-reranker is a model fine-tuned from BAAI/bge-reranker-large on Korean data.
pip install -U FlagEmbedding
Get relevance scores (higher scores indicate more relevance):
from FlagEmbedding import FlagReranker
reranker = FlagReranker('upskyy/ko-reranker', use_fp16=True)  # use_fp16=True speeds up computation at the cost of a slight drop in accuracy
score = reranker.compute_score(['query', 'passage'])
print(score) # -1.861328125
# Setting normalize=True maps the scores into the 0-1 range by applying a sigmoid function
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score) # 0.13454832326359276
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores) # [-7.37109375, 8.5390625]
# Setting normalize=True maps the scores into the 0-1 range by applying a sigmoid function
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores) # [0.0006287840192903181, 0.9998043646624727]
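The normalized values quoted above can be reproduced by hand, which helps when only raw logits are available. The following is a minimal sketch, assuming that normalize=True is nothing more than the standard logistic sigmoid as the comments above state; the helper name `sigmoid` is illustrative and not part of FlagEmbedding.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw relevance logit into the (0, 1) range."""
    # Branch on the sign so that exp() is never called on a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Raw scores copied from the FlagEmbedding examples above.
raw_scores = [-1.861328125, -7.37109375, 8.5390625]
normalized = [sigmoid(s) for s in raw_scores]
print(normalized)  # roughly 0.1345, 0.00063, 0.9998
```

Because the sigmoid is monotonic, normalizing changes the scale of the scores but not the order of the passages.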
pip install -U sentence-transformers
Get relevance scores (higher scores indicate more relevance):
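from sentence_transformers import CrossEncoder
# Korean example sentences; English glosses are given in the trailing comments
sentences_1 = [
    "경제 전문가가 금리 인하에 대한 예측을 하고 있다.",  # An economist is forecasting an interest rate cut.
    "주식 시장에서 한 투자자가 주식을 매수한다.",  # An investor buys stocks on the stock market.
]
sentences_2 = [
    "한 투자자가 비트코인을 매수한다.",  # An investor buys Bitcoin.
    "금융 거래소에서 새로운 디지털 자산이 상장된다.",  # A new digital asset is listed on a financial exchange.
]
# ko-reranker is a cross-encoder, so it scores sentence pairs directly rather than comparing embeddings
model = CrossEncoder('upskyy/ko-reranker')
pairs = [(s1, s2) for s1 in sentences_1 for s2 in sentences_2]
scores = model.predict(pairs)
print(scores.reshape(len(sentences_1), len(sentences_2)))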
Get relevance scores with Hugging Face Transformers (higher scores indicate more relevance):
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('upskyy/ko-reranker')
model = AutoModelForSequenceClassification.from_pretrained('upskyy/ko-reranker')
model.eval()
pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # One logit per (query, passage) pair, flattened into a 1-D tensor of scores
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
    print(scores)
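Since higher scores indicate more relevance, reranking comes down to sorting the candidate passages by score. The following is a minimal sketch of that step in plain Python; the helper `rerank` and its `top_k` parameter are hypothetical names introduced here for illustration, and the score values are copied from the FlagEmbedding example earlier in this card.

```python
from typing import List, Sequence, Tuple


def rerank(passages: Sequence[str], scores: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (passage, score) pairs, most relevant first."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    # float() also accepts NumPy scalars and 0-dim torch tensors.
    paired = zip(passages, (float(s) for s in scores))
    ranked = sorted(paired, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


passages = [
    "hi",
    "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.",
]
scores = [-7.37109375, 8.5390625]  # illustrative values from the example above

for passage, score in rerank(passages, scores, top_k=1):
    print(f"{score:.4f}  {passage}")
```

The same helper should work with the output of any of the scoring snippets in this card, as long as the scores are in the same order as the passages.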
@misc{bge_embedding,
title={C-Pack: Packaged Resources To Advance General Chinese Embedding},
author={Shitao Xiao and Zheng Liu and Peitian Zhang and Niklas Muennighoff},
year={2023},
eprint={2309.07597},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.