How to use potsu-potsu/bge-base-mrl-train40k with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("potsu-potsu/bge-base-mrl-train40k")

# One query followed by three candidate passages
sentences = [
    "List the deadliest viruses in the world.",
    "Mediator is a large multiprotein complex conserved in all eukaryotes, which has \na crucial coregulator function in transcription by RNA polymerase II (Pol II). \nHowever, the molecular mechanisms of its action in vivo remain to be understood. \nMed17 is an essential and central component of the Mediator head module. In this \nwork, we utilised our large collection of conditional temperature-sensitive \nmed17 mutants to investigate Mediator's role in coordinating preinitiation \ncomplex (PIC) formation in vivo at the genome level after a transfer to a \nnon-permissive temperature for 45 minutes. The effect of a yeast mutation \nproposed to be equivalent to the human Med17-L371P responsible for infantile \ncerebral atrophy was also analyzed. The ChIP-seq results demonstrate that med17 \nmutations differentially affected the global presence of several PIC components \nincluding Mediator, TBP, TFIIH modules and Pol II. Our data show that Mediator \nstabilizes TFIIK kinase and TFIIH core modules independently, suggesting that \nthe recruitment or the stability of TFIIH modules is regulated independently on \nyeast genome. We demonstrate that Mediator selectively contributes to TBP \nrecruitment or stabilization to chromatin. This study provides an extensive \ngenome-wide view of Mediator's role in PIC formation, suggesting that Mediator \ncoordinates multiple steps of a PIC assembly pathway.",
    "mTOR complex 2 (mTORC2) signaling is upregulated in multiple types of human \ncancer, but the molecular mechanisms underlying its activation and regulation \nremain elusive. Here, we show that microRNA-mediated upregulation of Rictor, an \nmTORC2-specific component, contributes to tumor progression. Rictor is \nupregulated via the repression of the miR-424/503 cluster in human prostate and \ncolon cancer cell lines that harbor c-Src upregulation and in Src-transformed \ncells. The tumorigenicity and invasive activity of these cells were suppressed \nby re-expression of miR-424/503. Rictor upregulation promotes formation of \nmTORC2 and induces activation of mTORC2, resulting in promotion of tumor growth \nand invasion. Furthermore, downregulation of miR-424/503 is associated with \nRictor upregulation in colon cancer tissues. These findings suggest that the \nmiR-424/503-Rictor pathway plays a crucial role in tumor progression.",
    "This year marks the 100th anniversary of the deadliest event in human history. \nIn 1918-1919, pandemic influenza appeared nearly simultaneously around the globe \nand caused extraordinary mortality (an estimated 50-100 million deaths) \nassociated with unexpected clinical and epidemiological features. The \ndescendants of the 1918 virus remain today; as endemic influenza viruses, they \ncause significant mortality each year. Although the ability to predict influenza \npandemics remains no better than it was a century ago, numerous scientific \nadvances provide an important head start in limiting severe disease and death \nfrom both current and future influenza viruses: identification and substantial \ncharacterization of the natural history and pathogenesis of the 1918 causative \nvirus itself, as well as hundreds of its viral descendants; development of \nmoderately effective vaccines; improved diagnosis and treatment of \ninfluenza-associated pneumonia; and effective prevention and control measures. \nRemaining challenges include development of vaccines eliciting significantly \nbroader protection (against antigenically different influenza viruses) that can \nprevent or significantly downregulate viral replication; more complete \ncharacterization of natural history and pathogenesis emphasizing the protective \nrole of mucosal immunity; and biomarkers of impending influenza-associated \npneumonia.",
]

# Encode all sentences and compute the pairwise similarity matrix
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
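The card's metadata lists `MatryoshkaLoss` and reports retrieval metrics at 768, 512, 256, 128, and 64 dimensions, which suggests the embeddings are meant to stay useful when only their leading components are kept. The sketch below illustrates that idea in plain Python, without loading the model: it keeps a prefix of each vector, renormalizes it, and compares cosine similarity before and after. The helper names `truncate_and_normalize` and `cosine_similarity` and the toy 8-dimensional vectors are assumptions made for illustration; they are not part of this model or of sentence-transformers.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # a zero vector cannot be normalized
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for real 768-dimensional embeddings
# (hypothetical values, chosen only to make the example runnable).
query = [0.9, 0.1, 0.3, 0.0, 0.2, -0.1, 0.05, 0.4]
document = [0.8, 0.2, 0.25, 0.1, -0.3, 0.0, 0.1, 0.35]

full_score = cosine_similarity(query, document)
short_score = cosine_similarity(
    truncate_and_normalize(query, 4),
    truncate_and_normalize(document, 4),
)
print(f"full: {full_score:.4f}  truncated to 4 dims: {short_score:.4f}")
```

With the real model, the same effect can likely be obtained by slicing the output of `model.encode(...)` to the first 256 (or 128, 64) columns and renormalizing; recent sentence-transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, though the installed version should be checked before relying on it.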
| language: | |
| - en | |
| license: apache-2.0 | |
| tags: | |
| - sentence-transformers | |
| - sentence-similarity | |
| - feature-extraction | |
| - generated_from_trainer | |
| - dataset_size:40482 | |
| - loss:MatryoshkaLoss | |
| - loss:MultipleNegativesRankingLoss | |
| widget: | |
| - source_sentence: List the deadliest viruses in the world. | |
| sentences: | |
| - "Mediator is a large multiprotein complex conserved in all eukaryotes, which has\ | |
| \ \na crucial coregulator function in transcription by RNA polymerase II (Pol\ | |
| \ II). \nHowever, the molecular mechanisms of its action in vivo remain to be\ | |
| \ understood. \nMed17 is an essential and central component of the Mediator head\ | |
| \ module. In this \nwork, we utilised our large collection of conditional temperature-sensitive\ | |
| \ \nmed17 mutants to investigate Mediator's role in coordinating preinitiation\ | |
| \ \ncomplex (PIC) formation in vivo at the genome level after a transfer to a\ | |
| \ \nnon-permissive temperature for 45 minutes. The effect of a yeast mutation\ | |
| \ \nproposed to be equivalent to the human Med17-L371P responsible for infantile\ | |
| \ \ncerebral atrophy was also analyzed. The ChIP-seq results demonstrate that\ | |
| \ med17 \nmutations differentially affected the global presence of several PIC\ | |
| \ components \nincluding Mediator, TBP, TFIIH modules and Pol II. Our data show\ | |
| \ that Mediator \nstabilizes TFIIK kinase and TFIIH core modules independently,\ | |
| \ suggesting that \nthe recruitment or the stability of TFIIH modules is regulated\ | |
| \ independently on \nyeast genome. We demonstrate that Mediator selectively contributes\ | |
| \ to TBP \nrecruitment or stabilization to chromatin. This study provides an extensive\ | |
| \ \ngenome-wide view of Mediator's role in PIC formation, suggesting that Mediator\ | |
| \ \ncoordinates multiple steps of a PIC assembly pathway." | |
| - "mTOR complex 2 (mTORC2) signaling is upregulated in multiple types of human \n\ | |
| cancer, but the molecular mechanisms underlying its activation and regulation\ | |
| \ \nremain elusive. Here, we show that microRNA-mediated upregulation of Rictor,\ | |
| \ an \nmTORC2-specific component, contributes to tumor progression. Rictor is\ | |
| \ \nupregulated via the repression of the miR-424/503 cluster in human prostate\ | |
| \ and \ncolon cancer cell lines that harbor c-Src upregulation and in Src-transformed\ | |
| \ \ncells. The tumorigenicity and invasive activity of these cells were suppressed\ | |
| \ \nby re-expression of miR-424/503. Rictor upregulation promotes formation of\ | |
| \ \nmTORC2 and induces activation of mTORC2, resulting in promotion of tumor growth\ | |
| \ \nand invasion. Furthermore, downregulation of miR-424/503 is associated with\ | |
| \ \nRictor upregulation in colon cancer tissues. These findings suggest that the\ | |
| \ \nmiR-424/503-Rictor pathway plays a crucial role in tumor progression." | |
| - "This year marks the 100th anniversary of the deadliest event in human history.\ | |
| \ \nIn 1918-1919, pandemic influenza appeared nearly simultaneously around the\ | |
| \ globe \nand caused extraordinary mortality (an estimated 50-100 million deaths)\ | |
| \ \nassociated with unexpected clinical and epidemiological features. The \ndescendants\ | |
| \ of the 1918 virus remain today; as endemic influenza viruses, they \ncause significant\ | |
| \ mortality each year. Although the ability to predict influenza \npandemics remains\ | |
| \ no better than it was a century ago, numerous scientific \nadvances provide\ | |
| \ an important head start in limiting severe disease and death \nfrom both current\ | |
| \ and future influenza viruses: identification and substantial \ncharacterization\ | |
| \ of the natural history and pathogenesis of the 1918 causative \nvirus itself,\ | |
| \ as well as hundreds of its viral descendants; development of \nmoderately effective\ | |
| \ vaccines; improved diagnosis and treatment of \ninfluenza-associated pneumonia;\ | |
| \ and effective prevention and control measures. \nRemaining challenges include\ | |
| \ development of vaccines eliciting significantly \nbroader protection (against\ | |
| \ antigenically different influenza viruses) that can \nprevent or significantly\ | |
| \ downregulate viral replication; more complete \ncharacterization of natural\ | |
| \ history and pathogenesis emphasizing the protective \nrole of mucosal immunity;\ | |
| \ and biomarkers of impending influenza-associated \npneumonia." | |
| - source_sentence: Where is X-ray free electron laser used? | |
| sentences: | |
| - "BACKGROUND: After tooth loss, the posterior maxilla is usually characterized\ | |
| \ by \nlimited bone height secondary to pneumatization of the maxillary sinus\ | |
| \ and/or \ncollapse of the alveolar ridge that preclude in many instances the\ | |
| \ installation \nof dental implants. In order to compensate for the lack of bone\ | |
| \ height, several \ntreatment options have been proposed. These treatment alternatives\ | |
| \ aimed at the \ninstallation of dental implants with or without the utilization\ | |
| \ of bone grafting \nmaterials avoiding the perforation of the Schneiderian membrane.\ | |
| \ Nevertheless, \nmembrane perforations represent the most common complication\ | |
| \ among these \nprocedures. Consequently, the present review aimed at the elucidation\ | |
| \ of the \nrelevance of this phenomenon on implant survival and complications.\n\ | |
| MATERIAL AND METHODS: Electronic and manual literature searches were performed\ | |
| \ \nby two independent reviewers in several databases, including MEDLINE, EMBASE,\ | |
| \ \nand Cochrane Oral Health Group Trials Register, for articles up to January\ | |
| \ 2018 \nreporting outcome of implant placement perforating the sinus floor without\ | |
| \ \nregenerative procedure (lateral sinus lift or transalveolar technique) and\ | |
| \ graft \nmaterial. The intrusion of the implants can occur during drilling or\ | |
| \ implant \nplacement, with and without punch out Schneiderian. Only studies with\ | |
| \ at least \n6 months of follow-up were included in the qualitative assessment.\n\ | |
| RESULTS: Eight studies provided information on the survival rate, with a global\ | |
| \ \nsample of 493 implants, being the weighted mean survival rate 95.6% (IC 95%),\ | |
| \ \nafter 52.7 months of follow-up. The level of implant penetration (≤ 4 mm or\ | |
| \ \n> 4 mm) did not report statistically significant differences in survival rate\ | |
| \ \n(p = 0.403). Seven studies provided information on the rate of clinical \n\ | |
| complications, being the mean complication rate 3.4% (IC 95%). The most frequent\ | |
| \ \nclinical complication was epistaxis, without finding significant differences\ | |
| \ \naccording to the level of penetration. Five studies provide information on\ | |
| \ the \nradiographic complication; the most common complication was thickening\ | |
| \ of the \nSchneiderian membrane. The weighted complication rate was 14.8% (IC\ | |
| \ 95%), and \npenetration level affects the rate of radiological complications,\ | |
| \ being these of \n5.29% in implant penetrating ≤4 mm and 29.3% in implant penetrating\ | |
| \ > 4 mm, \nwithout reaching statistical significant difference (p = 0.301).\n\ | |
| CONCLUSION: The overall survival rate of the implants into the sinus cavity was\ | |
| \ \n95.6%, without statistical differences according to the level of penetration.\ | |
| \ \nThe clinical and radiological complications were 3.4% and 14.8% respectively.\ | |
| \ \nThe most frequent clinical complication was the epistaxis, and the radiological\ | |
| \ \ncomplication was thickening of the Schneiderian membrane, without reaching\ | |
| \ \nstatistical significant difference according to the level of implant penetration\ | |
| \ \ninside the sinus." | |
| - "Ultrashort X-ray pulses from free-electron laser X-ray sources make it feasible\ | |
| \ \nto conduct small- and wide-angle scattering experiments on biomolecular samples\ | |
| \ \nin solution at sub-picosecond timescales. During these so-called fluctuation\ | |
| \ \nscattering experiments, the absence of rotational averaging, typically induced\ | |
| \ \nby Brownian motion in classic solution-scattering experiments, increases the\ | |
| \ \ninformation content of the data. In order to perform shape reconstruction\ | |
| \ or \nstructure refinement from such data, it is essential to compute the theoretical\ | |
| \ \nprofiles from three-dimensional models. Based on the three-dimensional Zernike\ | |
| \ \npolynomial expansion models, a fast method to compute the theoretical \nfluctuation\ | |
| \ scattering profiles has been derived. The theoretical profiles have \nbeen validated\ | |
| \ against simulated results obtained from 300 000 scattering \npatterns for several\ | |
| \ representative biomolecular species." | |
| - Hemophilic Pseudotumor is a rare complication of hemophilia. It is an encapsulated | |
| haematoma in patients with haemophilia which has a tendency to progress and produce | |
| clinical symptoms related to its anatomical location. The lesion most frequently | |
| occurs in the long bones, pelvis, small bones of the hands and feet, or rarely | |
| in the maxillofacial region. | |
| - source_sentence: For the constructions of which organs has 3D printing been tested? | |
| sentences: | |
| - "The ability to three-dimensionally interweave biological tissue with functional\ | |
| \ \nelectronics could enable the creation of bionic organs possessing enhanced\ | |
| \ \nfunctionalities over their human counterparts. Conventional electronic devices\ | |
| \ \nare inherently two-dimensional, preventing seamless multidimensional integration\ | |
| \ \nwith synthetic biology, as the processes and materials are very different.\ | |
| \ Here, \nwe present a novel strategy for overcoming these difficulties via additive\ | |
| \ \nmanufacturing of biological cells with structural and nanoparticle derived\ | |
| \ \nelectronic elements. As a proof of concept, we generated a bionic ear via\ | |
| \ 3D \nprinting of a cell-seeded hydrogel matrix in the anatomic geometry of a\ | |
| \ human \near, along with an intertwined conducting polymer consisting of infused\ | |
| \ silver \nnanoparticles. This allowed for in vitro culturing of cartilage tissue\ | |
| \ around an \ninductive coil antenna in the ear, which subsequently enables readout\ | |
| \ of \ninductively-coupled signals from cochlea-shaped electrodes. The printed\ | |
| \ ear \nexhibits enhanced auditory sensing for radio frequency reception, and\ | |
| \ \ncomplementary left and right ears can listen to stereo audio music. Overall,\ | |
| \ our \napproach suggests a means to intricately merge biologic and nanoelectronic\ | |
| \ \nfunctionalities via 3D printing." | |
| - "A case of heterochromia iridis and Horner's syndrome is reported in a 7-year\ | |
| \ old \ngirl with paravertebral neurilemmoma. These clinical findings can be useful\ | |
| \ in \nthe early diagnosis of mediastinal tumors in the paravertebral axis. While\ | |
| \ \ntypically associated with neuroblastoma, these findings can be due to tumors\ | |
| \ \nwhich are inately benign--in this case neurilemmoma. The mechanism for \n\ | |
| heterochromia is briefly discussed." | |
| - "The creation of complex neuronal networks relies on ligand-receptor interactions\ | |
| \ \nthat mediate attraction or repulsion towards specific targets. Roundabouts\ | |
| \ \ncomprise a family of single-pass transmembrane receptors facilitating this\ | |
| \ \nprocess upon interaction with the soluble extracellular ligand Slit protein\ | |
| \ \nfamily emanating from the midline. Due to the complexity and flexible nature\ | |
| \ of \nRobo receptors , their overall structure has remained elusive until now.\ | |
| \ Recent \nstructural studies of the Robo 1 and Robo 2 ectodomains have provided\ | |
| \ the basis \nfor a better understanding of their signalling mechanism. These\ | |
| \ structures \nreveal how Robo receptors adopt an auto-inhibited conformation\ | |
| \ on the cell \nsurface that can be further stabilised by cis and/or trans oligmerisation\ | |
| \ \narrays. Upon Slit -N binding Robo receptors must undergo a conformational\ | |
| \ change \nfor Ig4 mediated dimerisation and signaling, probably via endocytosis.\ | |
| \ \nFurthermore, it's become clear that Robo receptors do not only act alone,\ | |
| \ but as \nlarge and more complex cell surface receptor assemblies to manifest\ | |
| \ directional \nand growth effects in a concerted fashion. These context dependent\ | |
| \ assemblies \nprovide a mechanism to fine tune attractive and repulsive signals\ | |
| \ in a \ncombinatorial manner required during neuronal development. While a mechanistic\ | |
| \ \nunderstanding of Slit mediated Robo signaling has advanced significantly further\ | |
| \ \nstructural studies on larger assemblies are required for the design of new\ | |
| \ \nexperiments to elucidate their role in cell surface receptor complexes. These\ | |
| \ \nwill be necessary to understand the role of Slit -Robo signaling in \nneurogenesis,\ | |
| \ angiogenesis, organ development and cancer progression. In this \nchapter, we\ | |
| \ provide a review of the current knowledge in the field with a \nparticular focus\ | |
| \ on the Roundabout receptor family." | |
| - source_sentence: For the constructions of which organs has 3D printing been tested? | |
| sentences: | |
| - "Objective:To evaluate the value of improved Mallampati grading combined with\ | |
| \ \nNoSAS questionnaire in screening for obstructive sleep apnea (OSA). Method:A\ | |
| \ \ntotal of 344 patients admitted to our hospital for sleep disorders were studied.\ | |
| \ \nAll patients were measured for their height, weight, neck circumference and\ | |
| \ \nother parameters. NoSAS scores, improved Mallampati grading and polysomnography\ | |
| \ \n(PSG) were performed in these patients. According to AHI in PSG monitoring\ | |
| \ \nresults, patients were divided into non-osa group (AHI<5) 93 cases and OSA\ | |
| \ group \n251 cases. The OSA group were divided into mild (AHI 5-15), moderate(AHI\ | |
| \ 16-30) \nand severe OSA group(AHI>30) according to the PSG result. The ROC curve\ | |
| \ was \nplotted to evaluate the screening value of NoSAS and improved Mallampati\ | |
| \ grading \ncombined with NoSAS for OSA. Result:With the NoSAS score of 8 or 9\ | |
| \ as cutoffs \nfor analysis, the sensitivity for OSA was 0.733 and 0.701; the\ | |
| \ specificity for \nOSA was 0.538 and 0.624, respectively. The sensitivity and\ | |
| \ specificity of NoSAS \ncombined with improved Mallampati grading for screening\ | |
| \ OSA were 0.813 and \n0.710, respectively. Conclusion:As a new screening tool,\ | |
| \ NoSAS questionnaire is \nsimple and convenient, and has certain screening value\ | |
| \ to OSA. The improved \nMallampati grading combined with NoSAS questionnaire\ | |
| \ can obviously improve the \nscreening sensitivity and specificity of Osa, and\ | |
| \ has higher application value." | |
| - "The morphology and the functionality of the murid glandular complex, composed\ | |
| \ of \nthe submandibular and sublingual salivary glands (SSC), were the object\ | |
| \ of \nseveral studies conducted mainly using magnetic resonance imaging (MRI).\ | |
| \ Using a \n4.7 T scanner and a manganese-based contrast agent, we improved the\ | |
| \ \nsignal-to-noise ratio of the SSC relating to the surrounding anatomical \n\ | |
| structures allowing to obtain high-contrast 3D images of the SSC. In the last\ | |
| \ \nfew years, the large development in resin melting techniques opened the way\ | |
| \ for \nprinting 3D objects starting from a 3D stack of images. Here, we demonstrate\ | |
| \ the \nfeasibility of the 3D printing technique of soft tissues such as the SSC\ | |
| \ in the \nrat with the aim to improve the visualization of the organs. This approach\ | |
| \ is \nuseful to preserve the real in vivo morphology of the SCC in living animals\ | |
| \ \navoiding the anatomical shape changes due to the lack of relationships with\ | |
| \ the \nsurrounding organs in case of extraction. It is also harmless, repeatable\ | |
| \ and \ncan be applied to explore volumetric changes occurring during body growth,\ | |
| \ \nexcretory duct obstruction, tumorigenesis and regeneration processes. 3D \n\ | |
| printing allows to obtain a solid object with the same shape of the organ of \n\ | |
| interest, which can be observed, freely rotated and manipulated. To increase the\ | |
| \ \nvisibility of the details, it is possible to print the organs with a selected\ | |
| \ \nzoom factor, useful as in case of tiny organs in small mammalia. An immediate\ | |
| \ \napplication of this technique is represented by educational classes." | |
| - "Mobile phone use and risk of acoustic neuroma: results of the interphone \ncase-control\ | |
| \ study in five north European countries [corrected]." | |
| - source_sentence: What is known about the Digit Ratio (2D:4D) cancer? | |
| sentences: | |
| - "Proteins undergo conformational changes during their biological function. As\ | |
| \ \nsuch, a high-resolution structure of a protein's resting conformation provides\ | |
| \ a \nstarting point for elucidating its reaction mechanism, but provides no direct\ | |
| \ \ninformation concerning the protein's conformational dynamics. Several X-ray\ | |
| \ \nmethods have been developed to elucidate those conformational changes that\ | |
| \ occur \nduring a protein's reaction, including time-resolved Laue diffraction\ | |
| \ and \nintermediate trapping studies on three-dimensional protein crystals, and\ | |
| \ \ntime-resolved wide-angle X-ray scattering and X-ray absorption studies on\ | |
| \ \nproteins in the solution phase. This review emphasizes the scope and limitations\ | |
| \ \nof these complementary experimental approaches when seeking to understand\ | |
| \ \nprotein conformational dynamics. These methods are illustrated using a limited\ | |
| \ \nset of examples including myoglobin and haemoglobin in complex with carbon\ | |
| \ \nmonoxide, the simple light-driven proton pump bacteriorhodopsin, and the \n\ | |
| superoxide scavenger superoxide reductase. In conclusion, likely future \ndevelopments\ | |
| \ of these methods at synchrotron X-ray sources and the potential \nimpact of\ | |
| \ emerging X-ray free-electron laser facilities are speculated upon." | |
| - 'Extensive messenger RNA editing generates transcript and protein diversity in | |
| genes involved in neural excitability, as previously described, as well as in | |
| genes participating in a broad range of other cellular functions. ' | |
| - "BACKGROUND: The ratio of the lengths of index and ring fingers (2D:4D) is a \n\ | |
| marker of prenatal exposure to sex hormones, with low 2D:4D being indicative of\ | |
| \ \nhigh prenatal androgen action. Recent studies have reported a strong association\ | |
| \ \nbetween 2D:4D and risk of prostate cancer.\nMETHODS: A total of 6258 men participating\ | |
| \ in the Melbourne Collaborative Cohort \nStudy had 2D:4D assessed. Of these men,\ | |
| \ we identified 686 incident prostate \ncancer cases. Hazard ratios (HRs) and\ | |
| \ confidence intervals (CIs) were estimated \nfor a standard deviation increase\ | |
| \ in 2D:4D.\nRESULTS: No association was observed between 2D:4D and prostate cancer\ | |
| \ risk \noverall (HRs 1.00; 95% CIs, 0.92-1.08 for right, 0.93-1.08 for left).\ | |
| \ We \nobserved a weak inverse association between 2D:4D and risk of prostate\ | |
| \ cancer \nfor age <60, however 95% CIs included unity for all observed ages.\n\ | |
| CONCLUSION: Our results are not consistent with an association between 2D:4D and\ | |
| \ \noverall prostate cancer risk, but we cannot exclude a weak inverse association\ | |
| \ \nbetween 2D:4D and early onset prostate cancer risk." | |
| pipeline_tag: sentence-similarity | |
| library_name: sentence-transformers | |
| metrics: | |
| - cosine_accuracy@1 | |
| - cosine_accuracy@3 | |
| - cosine_accuracy@5 | |
| - cosine_accuracy@10 | |
| - cosine_precision@1 | |
| - cosine_precision@3 | |
| - cosine_precision@5 | |
| - cosine_precision@10 | |
| - cosine_recall@1 | |
| - cosine_recall@3 | |
| - cosine_recall@5 | |
| - cosine_recall@10 | |
| - cosine_ndcg@10 | |
| - cosine_mrr@10 | |
| - cosine_map@100 | |
| model-index: | |
| - name: Biomedical MRL | |
| results: | |
| - task: | |
| type: information-retrieval | |
| name: Information Retrieval | |
| dataset: | |
| name: dim 768 | |
| type: dim_768 | |
| metrics: | |
| - type: cosine_accuracy@1 | |
| value: 0.7397454031117398 | |
| name: Cosine Accuracy@1 | |
| - type: cosine_accuracy@3 | |
| value: 0.8472418670438473 | |
| name: Cosine Accuracy@3 | |
| - type: cosine_accuracy@5 | |
| value: 0.8925035360678925 | |
| name: Cosine Accuracy@5 | |
| - type: cosine_accuracy@10 | |
| value: 0.9292786421499293 | |
| name: Cosine Accuracy@10 | |
| - type: cosine_precision@1 | |
| value: 0.7397454031117398 | |
| name: Cosine Precision@1 | |
| - type: cosine_precision@3 | |
| value: 0.6058462989156059 | |
| name: Cosine Precision@3 | |
| - type: cosine_precision@5 | |
| value: 0.5295615275813296 | |
| name: Cosine Precision@5 | |
| - type: cosine_precision@10 | |
| value: 0.41103253182461097 | |
| name: Cosine Precision@10 | |
| - type: cosine_recall@1 | |
| value: 0.22757153438103173 | |
| name: Cosine Recall@1 | |
| - type: cosine_recall@3 | |
| value: 0.39389351666156774 | |
| name: Cosine Recall@3 | |
| - type: cosine_recall@5 | |
| value: 0.4953500769443452 | |
| name: Cosine Recall@5 | |
| - type: cosine_recall@10 | |
| value: 0.626185395476178 | |
| name: Cosine Recall@10 | |
| - type: cosine_ndcg@10 | |
| value: 0.7036538830306982 | |
| name: Cosine Ndcg@10 | |
| - type: cosine_mrr@10 | |
| value: 0.8041815406030398 | |
| name: Cosine Mrr@10 | |
| - type: cosine_map@100 | |
| value: 0.6499688056459438 | |
| name: Cosine Map@100 | |
| - task: | |
| type: information-retrieval | |
| name: Information Retrieval | |
| dataset: | |
| name: dim 512 | |
| type: dim_512 | |
| metrics: | |
| - type: cosine_accuracy@1 | |
| value: 0.7326732673267327 | |
| name: Cosine Accuracy@1 | |
| - type: cosine_accuracy@3 | |
| value: 0.842998585572843 | |
| name: Cosine Accuracy@3 | |
| - type: cosine_accuracy@5 | |
| value: 0.8882602545968883 | |
| name: Cosine Accuracy@5 | |
| - type: cosine_accuracy@10 | |
| value: 0.9151343705799151 | |
| name: Cosine Accuracy@10 | |
| - type: cosine_precision@1 | |
| value: 0.7326732673267327 | |
| name: Cosine Precision@1 | |
| - type: cosine_precision@3 | |
| value: 0.5964167845355963 | |
| name: Cosine Precision@3 | |
| - type: cosine_precision@5 | |
| value: 0.5278642149929279 | |
| name: Cosine Precision@5 | |
| - type: cosine_precision@10 | |
| value: 0.40990099009900993 | |
| name: Cosine Precision@10 | |
| - type: cosine_recall@1 | |
| value: 0.21918993091456265 | |
| name: Cosine Recall@1 | |
| - type: cosine_recall@3 | |
| value: 0.38673218299790596 | |
| name: Cosine Recall@3 | |
| - type: cosine_recall@5 | |
| value: 0.4915208575777972 | |
| name: Cosine Recall@5 | |
| - type: cosine_recall@10 | |
| value: 0.6229670136489501 | |
| name: Cosine Recall@10 | |
| - type: cosine_ndcg@10 | |
| value: 0.6971415938662006 | |
| name: Cosine Ndcg@10 | |
| - type: cosine_mrr@10 | |
| value: 0.7968989245863362 | |
| name: Cosine Mrr@10 | |
| - type: cosine_map@100 | |
| value: 0.6403253251933015 | |
| name: Cosine Map@100 | |
| - task: | |
| type: information-retrieval | |
| name: Information Retrieval | |
| dataset: | |
| name: dim 256 | |
| type: dim_256 | |
| metrics: | |
| - type: cosine_accuracy@1 | |
| value: 0.7227722772277227 | |
| name: Cosine Accuracy@1 | |
| - type: cosine_accuracy@3 | |
| value: 0.8373408769448374 | |
| name: Cosine Accuracy@3 | |
| - type: cosine_accuracy@5 | |
| value: 0.8769448373408769 | |
| name: Cosine Accuracy@5 | |
| - type: cosine_accuracy@10 | |
| value: 0.9108910891089109 | |
| name: Cosine Accuracy@10 | |
| - type: cosine_precision@1 | |
| value: 0.7227722772277227 | |
| name: Cosine Precision@1 | |
| - type: cosine_precision@3 | |
| value: 0.5893446487505893 | |
| name: Cosine Precision@3 | |
| - type: cosine_precision@5 | |
| value: 0.5131541725601132 | |
| name: Cosine Precision@5 | |
| - type: cosine_precision@10 | |
| value: 0.4048090523338048 | |
| name: Cosine Precision@10 | |
| - type: cosine_recall@1 | |
| value: 0.2165092120706659 | |
| name: Cosine Recall@1 | |
| - type: cosine_recall@3 | |
| value: 0.3843563311047163 | |
| name: Cosine Recall@3 | |
| - type: cosine_recall@5 | |
| value: 0.4706508437641641 | |
| name: Cosine Recall@5 | |
| - type: cosine_recall@10 | |
| value: 0.6082103871285517 | |
| name: Cosine Recall@10 | |
| - type: cosine_ndcg@10 | |
| value: 0.6857315358161504 | |
| name: Cosine Ndcg@10 | |
| - type: cosine_mrr@10 | |
| value: 0.7889281785321389 | |
| name: Cosine Mrr@10 | |
| - type: cosine_map@100 | |
| value: 0.6255397978739031 | |
| name: Cosine Map@100 | |
| - task: | |
| type: information-retrieval | |
| name: Information Retrieval | |
| dataset: | |
| name: dim 128 | |
| type: dim_128 | |
| metrics: | |
| - type: cosine_accuracy@1 | |
| value: 0.7072135785007072 | |
| name: Cosine Accuracy@1 | |
| - type: cosine_accuracy@3 | |
| value: 0.8076379066478077 | |
| name: Cosine Accuracy@3 | |
| - type: cosine_accuracy@5 | |
| value: 0.8458274398868458 | |
| name: Cosine Accuracy@5 | |
| - type: cosine_accuracy@10 | |
| value: 0.8967468175388967 | |
| name: Cosine Accuracy@10 | |
| - type: cosine_precision@1 | |
| value: 0.7072135785007072 | |
| name: Cosine Precision@1 | |
| - type: cosine_precision@3 | |
| value: 0.5605846298915607 | |
| name: Cosine Precision@3 | |
| - type: cosine_precision@5 | |
| value: 0.4876944837340877 | |
| name: Cosine Precision@5 | |
| - type: cosine_precision@10 | |
| value: 0.38189533239038187 | |
| name: Cosine Precision@10 | |
| - type: cosine_recall@1 | |
| value: 0.2131717638221153 | |
| name: Cosine Recall@1 | |
| - type: cosine_recall@3 | |
| value: 0.3571863197583239 | |
| name: Cosine Recall@3 | |
| - type: cosine_recall@5 | |
| value: 0.44275724893253604 | |
| name: Cosine Recall@5 | |
| - type: cosine_recall@10 | |
| value: 0.5763830904405497 | |
| name: Cosine Recall@10 | |
| - type: cosine_ndcg@10 | |
| value: 0.651957768079385 | |
| name: Cosine Ndcg@10 | |
| - type: cosine_mrr@10 | |
| value: 0.7681035450483825 | |
| name: Cosine Mrr@10 | |
| - type: cosine_map@100 | |
| value: 0.5861399094808066 | |
| name: Cosine Map@100 | |
| - task: | |
| type: information-retrieval | |
| name: Information Retrieval | |
| dataset: | |
| name: dim 64 | |
| type: dim_64 | |
| metrics: | |
| - type: cosine_accuracy@1 | |
| value: 0.6435643564356436 | |
| name: Cosine Accuracy@1 | |
| - type: cosine_accuracy@3 | |
| value: 0.7666195190947667 | |
| name: Cosine Accuracy@3 | |
| - type: cosine_accuracy@5 | |
| value: 0.8048090523338048 | |
| name: Cosine Accuracy@5 | |
| - type: cosine_accuracy@10 | |
| value: 0.8415841584158416 | |
| name: Cosine Accuracy@10 | |
| - type: cosine_precision@1 | |
| value: 0.6435643564356436 | |
| name: Cosine Precision@1 | |
| - type: cosine_precision@3 | |
| value: 0.5115511551155115 | |
| name: Cosine Precision@3 | |
| - type: cosine_precision@5 | |
| value: 0.45007072135785003 | |
| name: Cosine Precision@5 | |
| - type: cosine_precision@10 | |
| value: 0.3510608203677511 | |
| name: Cosine Precision@10 | |
| - type: cosine_recall@1 | |
| value: 0.18506567524592368 | |
| name: Cosine Recall@1 | |
| - type: cosine_recall@3 | |
| value: 0.3180821001225782 | |
| name: Cosine Recall@3 | |
| - type: cosine_recall@5 | |
| value: 0.3926270123067019 | |
| name: Cosine Recall@5 | |
| - type: cosine_recall@10 | |
| value: 0.5118404409971898 | |
| name: Cosine Recall@10 | |
| - type: cosine_ndcg@10 | |
| value: 0.5894018468562044 | |
| name: Cosine Ndcg@10 | |
| - type: cosine_mrr@10 | |
| value: 0.7115219685233828 | |
| name: Cosine Mrr@10 | |
| - type: cosine_map@100 | |
| value: 0.5197323616049745 | |
| name: Cosine Map@100 | |
| # Biomedical MRL | |
| This is a [sentence-transformers](https://www.SBERT.net) model trained on 40,482 anchor-positive pairs with MatryoshkaLoss wrapping MultipleNegativesRankingLoss. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. | |
| ## Model Details | |
| ### Model Description | |
| - **Model Type:** Sentence Transformer | |
| <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) --> | |
| - **Maximum Sequence Length:** 512 tokens | |
| - **Output Dimensionality:** 768 dimensions | |
| - **Similarity Function:** Cosine Similarity | |
| <!-- - **Training Dataset:** Unknown --> | |
| - **Language:** en | |
| - **License:** apache-2.0 | |
| ### Model Sources | |
| - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) | |
| - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) | |
| - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) | |
| ### Full Model Architecture | |
| ``` | |
| SentenceTransformer( | |
| (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel | |
| (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) | |
| (2): Normalize() | |
| ) | |
| ``` | |
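The architecture listing above ends in a CLS-token pooling layer followed by `Normalize()`. As a minimal sketch of what those two steps do, the snippet below uses a hand-written 3-token, 4-dimensional matrix in place of real `BertModel` output (the real model uses 768 dimensions); the helper name `cls_pool_and_normalize` is hypothetical and not part of the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Keep the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    return [x / norm for x in cls_vector]

# Toy stand-in for the transformer output: one row per token.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = cls_pool_and_normalize(token_embeddings)
print(sentence_embedding)
```

Because the output has unit length, cosine similarity between two embeddings reduces to a plain dot product.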
| ## Usage | |
| ### Direct Usage (Sentence Transformers) | |
| First install the Sentence Transformers library: | |
| ```bash | |
| pip install -U sentence-transformers | |
| ``` | |
| Then you can load this model and run inference. | |
| ```python | |
| from sentence_transformers import SentenceTransformer | |
| # Download from the 🤗 Hub | |
| model = SentenceTransformer("potsu-potsu/bge-base-mrl-train40k") | |
| # Run inference | |
| sentences = [ | |
| 'What is known about the Digit Ratio (2D:4D) cancer?', | |
| 'BACKGROUND: The ratio of the lengths of index and ring fingers (2D:4D) is a \nmarker of prenatal exposure to sex hormones, with low 2D:4D being indicative of \nhigh prenatal androgen action. Recent studies have reported a strong association \nbetween 2D:4D and risk of prostate cancer.\nMETHODS: A total of 6258 men participating in the Melbourne Collaborative Cohort \nStudy had 2D:4D assessed. Of these men, we identified 686 incident prostate \ncancer cases. Hazard ratios (HRs) and confidence intervals (CIs) were estimated \nfor a standard deviation increase in 2D:4D.\nRESULTS: No association was observed between 2D:4D and prostate cancer risk \noverall (HRs 1.00; 95% CIs, 0.92-1.08 for right, 0.93-1.08 for left). We \nobserved a weak inverse association between 2D:4D and risk of prostate cancer \nfor age <60, however 95% CIs included unity for all observed ages.\nCONCLUSION: Our results are not consistent with an association between 2D:4D and \noverall prostate cancer risk, but we cannot exclude a weak inverse association \nbetween 2D:4D and early onset prostate cancer risk.', | |
| "Proteins undergo conformational changes during their biological function. As \nsuch, a high-resolution structure of a protein's resting conformation provides a \nstarting point for elucidating its reaction mechanism, but provides no direct \ninformation concerning the protein's conformational dynamics. Several X-ray \nmethods have been developed to elucidate those conformational changes that occur \nduring a protein's reaction, including time-resolved Laue diffraction and \nintermediate trapping studies on three-dimensional protein crystals, and \ntime-resolved wide-angle X-ray scattering and X-ray absorption studies on \nproteins in the solution phase. This review emphasizes the scope and limitations \nof these complementary experimental approaches when seeking to understand \nprotein conformational dynamics. These methods are illustrated using a limited \nset of examples including myoglobin and haemoglobin in complex with carbon \nmonoxide, the simple light-driven proton pump bacteriorhodopsin, and the \nsuperoxide scavenger superoxide reductase. In conclusion, likely future \ndevelopments of these methods at synchrotron X-ray sources and the potential \nimpact of emerging X-ray free-electron laser facilities are speculated upon.", | |
| ] | |
| embeddings = model.encode(sentences) | |
| print(embeddings.shape) | |
| # (3, 768) | |
| # Get the similarity scores for the embeddings | |
| similarities = model.similarity(embeddings, embeddings) | |
| print(similarities.shape) | |
| # torch.Size([3, 3]) | |
| ``` | |
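Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, shorter embeddings can be obtained by keeping the leading components and renormalizing. The sketch below illustrates that idea on made-up 8-dimensional vectors rather than real model output; `truncate_and_normalize` and `cosine` are hypothetical helpers. In practice, Sentence Transformers accepts a `truncate_dim` argument when loading the model, which should make manual slicing unnecessary.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

# Made-up vectors standing in for a query and a document embedding.
query = [0.5, 0.5, 0.5, 0.5, 0.1, -0.1, 0.0, 0.2]
doc = [0.4, 0.6, 0.5, 0.5, -0.2, 0.1, 0.1, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Smaller dimensions reduce storage and search cost at some loss in retrieval quality; the evaluation tables in this card quantify that trade-off.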
| ## Evaluation | |
| ### Metrics | |
| #### Information Retrieval | |
| * Dataset: `dim_768` | |
| * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters: | |
| ```json | |
| { | |
| "truncate_dim": 768 | |
| } | |
| ``` | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | cosine_accuracy@1 | 0.7397 | | |
| | cosine_accuracy@3 | 0.8472 | | |
| | cosine_accuracy@5 | 0.8925 | | |
| | cosine_accuracy@10 | 0.9293 | | |
| | cosine_precision@1 | 0.7397 | | |
| | cosine_precision@3 | 0.6058 | | |
| | cosine_precision@5 | 0.5296 | | |
| | cosine_precision@10 | 0.411 | | |
| | cosine_recall@1 | 0.2276 | | |
| | cosine_recall@3 | 0.3939 | | |
| | cosine_recall@5 | 0.4954 | | |
| | cosine_recall@10 | 0.6262 | | |
| | **cosine_ndcg@10** | **0.7037** | | |
| | cosine_mrr@10 | 0.8042 | | |
| | cosine_map@100 | 0.65 | | |
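To make the metric names in these tables concrete, here is a minimal per-query sketch using the standard binary-relevance definitions, which are assumed (not confirmed by this card) to match what `InformationRetrievalEvaluator` computes before averaging over queries. The function name `ir_metrics_at_k` and the document IDs are hypothetical.

```python
import math

def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Accuracy, precision, recall, MRR and NDCG at cutoff k for one query."""
    hits = [doc_id in relevant_ids for doc_id in ranked_ids[:k]]
    num_hits = sum(hits)
    # Rank (1-based) of the first relevant document, if any.
    first_hit_rank = next((r for r, hit in enumerate(hits, start=1) if hit), None)
    # Binary-relevance DCG and the ideal DCG for the best possible ranking.
    dcg = sum(1.0 / math.log2(r + 1) for r, hit in enumerate(hits, start=1) if hit)
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, ideal_hits + 1))
    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": 1.0 / first_hit_rank if first_hit_rank else 0.0,
        "ndcg": dcg / idcg if idcg > 0 else 0.0,
    }

# Hypothetical ranking for one query with three relevant documents.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}
metrics = ir_metrics_at_k(ranked, relevant, k=5)
print(metrics)
```

Under these definitions precision@1 always equals accuracy@1, as in the tables. A recall@1 far below accuracy@1 is what one would expect when queries have several relevant passages each.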
| #### Information Retrieval | |
| * Dataset: `dim_512` | |
| * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters: | |
| ```json | |
| { | |
| "truncate_dim": 512 | |
| } | |
| ``` | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | cosine_accuracy@1 | 0.7327 | | |
| | cosine_accuracy@3 | 0.843 | | |
| | cosine_accuracy@5 | 0.8883 | | |
| | cosine_accuracy@10 | 0.9151 | | |
| | cosine_precision@1 | 0.7327 | | |
| | cosine_precision@3 | 0.5964 | | |
| | cosine_precision@5 | 0.5279 | | |
| | cosine_precision@10 | 0.4099 | | |
| | cosine_recall@1 | 0.2192 | | |
| | cosine_recall@3 | 0.3867 | | |
| | cosine_recall@5 | 0.4915 | | |
| | cosine_recall@10 | 0.623 | | |
| | **cosine_ndcg@10** | **0.6971** | | |
| | cosine_mrr@10 | 0.7969 | | |
| | cosine_map@100 | 0.6403 | | |
| #### Information Retrieval | |
| * Dataset: `dim_256` | |
| * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters: | |
| ```json | |
| { | |
| "truncate_dim": 256 | |
| } | |
| ``` | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | cosine_accuracy@1 | 0.7228 | | |
| | cosine_accuracy@3 | 0.8373 | | |
| | cosine_accuracy@5 | 0.8769 | | |
| | cosine_accuracy@10 | 0.9109 | | |
| | cosine_precision@1 | 0.7228 | | |
| | cosine_precision@3 | 0.5893 | | |
| | cosine_precision@5 | 0.5132 | | |
| | cosine_precision@10 | 0.4048 | | |
| | cosine_recall@1 | 0.2165 | | |
| | cosine_recall@3 | 0.3844 | | |
| | cosine_recall@5 | 0.4707 | | |
| | cosine_recall@10 | 0.6082 | | |
| | **cosine_ndcg@10** | **0.6857** | | |
| | cosine_mrr@10 | 0.7889 | | |
| | cosine_map@100 | 0.6255 | | |
| #### Information Retrieval | |
| * Dataset: `dim_128` | |
| * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters: | |
| ```json | |
| { | |
| "truncate_dim": 128 | |
| } | |
| ``` | |
| | Metric | Value | | |
| |:--------------------|:----------| | |
| | cosine_accuracy@1 | 0.7072 | | |
| | cosine_accuracy@3 | 0.8076 | | |
| | cosine_accuracy@5 | 0.8458 | | |
| | cosine_accuracy@10 | 0.8967 | | |
| | cosine_precision@1 | 0.7072 | | |
| | cosine_precision@3 | 0.5606 | | |
| | cosine_precision@5 | 0.4877 | | |
| | cosine_precision@10 | 0.3819 | | |
| | cosine_recall@1 | 0.2132 | | |
| | cosine_recall@3 | 0.3572 | | |
| | cosine_recall@5 | 0.4428 | | |
| | cosine_recall@10 | 0.5764 | | |
| | **cosine_ndcg@10** | **0.652** | | |
| | cosine_mrr@10 | 0.7681 | | |
| | cosine_map@100 | 0.5861 | | |
| #### Information Retrieval | |
| * Dataset: `dim_64` | |
| * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters: | |
| ```json | |
| { | |
| "truncate_dim": 64 | |
| } | |
| ``` | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | cosine_accuracy@1 | 0.6436 | | |
| | cosine_accuracy@3 | 0.7666 | | |
| | cosine_accuracy@5 | 0.8048 | | |
| | cosine_accuracy@10 | 0.8416 | | |
| | cosine_precision@1 | 0.6436 | | |
| | cosine_precision@3 | 0.5116 | | |
| | cosine_precision@5 | 0.4501 | | |
| | cosine_precision@10 | 0.3511 | | |
| | cosine_recall@1 | 0.1851 | | |
| | cosine_recall@3 | 0.3181 | | |
| | cosine_recall@5 | 0.3926 | | |
| | cosine_recall@10 | 0.5118 | | |
| | **cosine_ndcg@10** | **0.5894** | | |
| | cosine_mrr@10 | 0.7115 | | |
| | cosine_map@100 | 0.5197 | | |
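The five tables above can be summarized by how much of the full-size NDCG@10 each truncated dimension retains. The short calculation below uses only the NDCG@10 values reported in this card; the variable names are illustrative.

```python
# NDCG@10 per embedding dimension, copied from the tables above.
ndcg_at_10 = {768: 0.7037, 512: 0.6971, 256: 0.6857, 128: 0.652, 64: 0.5894}

full = ndcg_at_10[768]
retention = {dim: score / full for dim, score in ndcg_at_10.items()}

for dim, ratio in retention.items():
    print(f"dim {dim:>3}: {ratio:.1%} of full-size NDCG@10")
```

On these numbers, 256 dimensions keep roughly 97% of the full-size score and 64 dimensions roughly 84%.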
| ## Training Details | |
| ### Training Dataset | |
| #### Unnamed Dataset | |
| * Size: 40,482 training samples | |
| * Columns: <code>anchor</code> and <code>positive</code> | |
| * Approximate statistics based on the first 1000 samples: | |
| | | anchor | positive | | |
| |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| | |
| | type | string | string | | |
| | details | <ul><li>min: 6 tokens</li><li>mean: 16.0 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 287.89 tokens</li><li>max: 512 tokens</li></ul> | | |
| * Samples: | |
| | anchor | positive | | |
| |:---------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | |
| | <code>What is the implication of histone lysine methylation in medulloblastoma?</code> | <code>Aberrant patterns of H3K4, H3K9, and H3K27 histone lysine methylation were shown to result in histone code alterations, which induce changes in gene expression, and affect the proliferation rate of cells in medulloblastoma.</code> | | |
| | <code>What is the implication of histone lysine methylation in medulloblastoma?</code> | <code>Recent studies showed frequent mutations in histone H3 lysine 27 (H3K27) <br>demethylases in medulloblastomas of Group 3 and Group 4, suggesting a role for <br>H3K27 methylation in these tumors. Indeed, trimethylated H3K27 (H3K27me3) levels <br>were shown to be higher in Group 3 and 4 tumors compared to WNT and SHH <br>medulloblastomas, also in tumors without detectable mutations in demethylases. <br>Here, we report that polycomb genes, required for H3K27 methylation, are <br>consistently upregulated in Group 3 and 4 tumors. These tumors show high <br>expression of the homeobox transcription factor OTX2. Silencing of OTX2 in D425 <br>medulloblastoma cells resulted in downregulation of polycomb genes such as EZH2, <br>EED, SUZ12 and RBBP4 and upregulation of H3K27 demethylases KDM6A, KDM6B, JARID2 <br>and KDM7A. This was accompanied by decreased H3K27me3 and increased H3K27me1 <br>levels in promoter regions. Strikingly, the decrease of H3K27me3 was most <br>prominent in promoters that bind OTX2. OTX2-bound promoters showe...</code> | | |
| | <code>What is the implication of histone lysine methylation in medulloblastoma?</code> | <code>We used high-resolution SNP genotyping to identify regions of genomic gain and <br>loss in the genomes of 212 medulloblastomas, malignant pediatric brain tumors. <br>We found focal amplifications of 15 known oncogenes and focal deletions of 20 <br>known tumor suppressor genes (TSG), most not previously implicated in <br>medulloblastoma. Notably, we identified previously unknown amplifications and <br>homozygous deletions, including recurrent, mutually exclusive, highly focal <br>genetic events in genes targeting histone lysine methylation, particularly that <br>of histone 3, lysine 9 (H3K9). Post-translational modification of histone <br>proteins is critical for regulation of gene expression, can participate in <br>determination of stem cell fates and has been implicated in carcinogenesis. <br>Consistent with our genetic data, restoration of expression of genes controlling <br>H3K9 methylation greatly diminishes proliferation of medulloblastoma in vitro. <br>Copy number aberrations of genes with critical roles in writing...</code> | | |
| * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: | |
| ```json | |
| { | |
| "loss": "MultipleNegativesRankingLoss", | |
| "matryoshka_dims": [ | |
| 768, | |
| 512, | |
| 256, | |
| 128, | |
| 64 | |
| ], | |
| "matryoshka_weights": [ | |
| 1, | |
| 1, | |
| 1, | |
| 1, | |
| 1 | |
| ], | |
| "n_dims_per_step": -1 | |
| } | |
| ``` | |
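The configuration above combines two ideas: an in-batch ranking loss (each anchor should score its own positive above every other positive in the batch) and a weighted sum of that loss over truncated embedding sizes. The following is a plain-Python sketch of that combination, not the library implementation; the function names are hypothetical, the toy vectors are invented, and the similarity scale of 20 is an assumption about the default.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the target for anchors[i] and
    all other positives in the batch act as negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding sizes."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        loss += weight * in_batch_ranking_loss(
            [a[:dim] for a in anchors], [p[:dim] for p in positives]
        )
    return loss

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.2, 0.0, 0.1], [0.0, 1.0, 0.3, 0.0]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.8, 0.2, 0.1]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(round(loss, 4))
```

With equal weights, as in the configuration above, every truncation size contributes equally, so the leading dimensions are pushed to carry most of the ranking signal.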
| ### Training Hyperparameters | |
| #### Non-Default Hyperparameters | |
| - `eval_strategy`: epoch | |
| - `per_device_train_batch_size`: 32 | |
| - `per_device_eval_batch_size`: 16 | |
| - `gradient_accumulation_steps`: 16 | |
| - `learning_rate`: 2e-05 | |
| - `num_train_epochs`: 4 | |
| - `lr_scheduler_type`: cosine | |
| - `warmup_ratio`: 0.1 | |
| - `bf16`: True | |
| - `tf32`: True | |
| - `load_best_model_at_end`: True | |
| - `optim`: adamw_torch_fused | |
| - `batch_sampler`: no_duplicates | |
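The batch size and accumulation settings above determine the effective batch size and the number of optimizer steps. The arithmetic below assumes a single training device, which this card does not state; under that assumption the result agrees with the step and epoch columns in the training logs of this card.

```python
import math

num_samples = 40_482
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1
num_devices = 1  # assumption: the card does not say how many devices were used

# Samples contributing to each optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Mini-batches per epoch, then optimizer updates per epoch (a partial final group counts).
batches_per_epoch = math.ceil(num_samples / (per_device_train_batch_size * num_devices))
updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
total_updates = updates_per_epoch * num_train_epochs
warmup_updates = math.ceil(total_updates * warmup_ratio)

# Fraction of an epoch completed after 10 optimizer updates.
epoch_at_step_10 = 10 * gradient_accumulation_steps / batches_per_epoch

print(effective_batch_size, batches_per_epoch, updates_per_epoch, total_updates, warmup_updates)
print(round(epoch_at_step_10, 4))
```

A larger effective batch also means more in-batch negatives per update only if the loss sees the whole accumulated batch; with plain gradient accumulation each mini-batch of 32 is scored separately.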
| #### All Hyperparameters | |
| <details><summary>Click to expand</summary> | |
| - `overwrite_output_dir`: False | |
| - `do_predict`: False | |
| - `eval_strategy`: epoch | |
| - `prediction_loss_only`: True | |
| - `per_device_train_batch_size`: 32 | |
| - `per_device_eval_batch_size`: 16 | |
| - `per_gpu_train_batch_size`: None | |
| - `per_gpu_eval_batch_size`: None | |
| - `gradient_accumulation_steps`: 16 | |
| - `eval_accumulation_steps`: None | |
| - `torch_empty_cache_steps`: None | |
| - `learning_rate`: 2e-05 | |
| - `weight_decay`: 0.0 | |
| - `adam_beta1`: 0.9 | |
| - `adam_beta2`: 0.999 | |
| - `adam_epsilon`: 1e-08 | |
| - `max_grad_norm`: 1.0 | |
| - `num_train_epochs`: 4 | |
| - `max_steps`: -1 | |
| - `lr_scheduler_type`: cosine | |
| - `lr_scheduler_kwargs`: {} | |
| - `warmup_ratio`: 0.1 | |
| - `warmup_steps`: 0 | |
| - `log_level`: passive | |
| - `log_level_replica`: warning | |
| - `log_on_each_node`: True | |
| - `logging_nan_inf_filter`: True | |
| - `save_safetensors`: True | |
| - `save_on_each_node`: False | |
| - `save_only_model`: False | |
| - `restore_callback_states_from_checkpoint`: False | |
| - `no_cuda`: False | |
| - `use_cpu`: False | |
| - `use_mps_device`: False | |
| - `seed`: 42 | |
| - `data_seed`: None | |
| - `jit_mode_eval`: False | |
| - `use_ipex`: False | |
| - `bf16`: True | |
| - `fp16`: False | |
| - `fp16_opt_level`: O1 | |
| - `half_precision_backend`: auto | |
| - `bf16_full_eval`: False | |
| - `fp16_full_eval`: False | |
| - `tf32`: True | |
| - `local_rank`: 0 | |
| - `ddp_backend`: None | |
| - `tpu_num_cores`: None | |
| - `tpu_metrics_debug`: False | |
| - `debug`: [] | |
| - `dataloader_drop_last`: False | |
| - `dataloader_num_workers`: 0 | |
| - `dataloader_prefetch_factor`: None | |
| - `past_index`: -1 | |
| - `disable_tqdm`: False | |
| - `remove_unused_columns`: True | |
| - `label_names`: None | |
| - `load_best_model_at_end`: True | |
| - `ignore_data_skip`: False | |
| - `fsdp`: [] | |
| - `fsdp_min_num_params`: 0 | |
| - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} | |
| - `fsdp_transformer_layer_cls_to_wrap`: None | |
| - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} | |
| - `deepspeed`: None | |
| - `label_smoothing_factor`: 0.0 | |
| - `optim`: adamw_torch_fused | |
| - `optim_args`: None | |
| - `adafactor`: False | |
| - `group_by_length`: False | |
| - `length_column_name`: length | |
| - `ddp_find_unused_parameters`: None | |
| - `ddp_bucket_cap_mb`: None | |
| - `ddp_broadcast_buffers`: False | |
| - `dataloader_pin_memory`: True | |
| - `dataloader_persistent_workers`: False | |
| - `skip_memory_metrics`: True | |
| - `use_legacy_prediction_loop`: False | |
| - `push_to_hub`: False | |
| - `resume_from_checkpoint`: None | |
| - `hub_model_id`: None | |
| - `hub_strategy`: every_save | |
| - `hub_private_repo`: None | |
| - `hub_always_push`: False | |
| - `gradient_checkpointing`: False | |
| - `gradient_checkpointing_kwargs`: None | |
| - `include_inputs_for_metrics`: False | |
| - `include_for_metrics`: [] | |
| - `eval_do_concat_batches`: True | |
| - `fp16_backend`: auto | |
| - `push_to_hub_model_id`: None | |
| - `push_to_hub_organization`: None | |
| - `mp_parameters`: | |
| - `auto_find_batch_size`: False | |
| - `full_determinism`: False | |
| - `torchdynamo`: None | |
| - `ray_scope`: last | |
| - `ddp_timeout`: 1800 | |
| - `torch_compile`: False | |
| - `torch_compile_backend`: None | |
| - `torch_compile_mode`: None | |
| - `include_tokens_per_second`: False | |
| - `include_num_input_tokens_seen`: False | |
| - `neftune_noise_alpha`: None | |
| - `optim_target_modules`: None | |
| - `batch_eval_metrics`: False | |
| - `eval_on_start`: False | |
| - `use_liger_kernel`: False | |
| - `eval_use_gather_object`: False | |
| - `average_tokens_across_devices`: False | |
| - `prompts`: None | |
| - `batch_sampler`: no_duplicates | |
| - `multi_dataset_batch_sampler`: proportional | |
| </details> | |
| ### Training Logs | |
| | Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 | | |
| |:-------:|:-------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:| | |
| | 0.1264 | 10 | 65.1116 | - | - | - | - | - | | |
| | 0.2528 | 20 | 52.0541 | - | - | - | - | - | | |
| | 0.3791 | 30 | 36.0158 | - | - | - | - | - | | |
| | 0.5055 | 40 | 26.0258 | - | - | - | - | - | | |
| | 0.6319 | 50 | 24.2254 | - | - | - | - | - | | |
| | 0.7583 | 60 | 21.8763 | - | - | - | - | - | | |
| | 0.8847 | 70 | 18.0685 | - | - | - | - | - | | |
| | 1.0 | 80 | 17.7443 | 0.7094 | 0.7054 | 0.6895 | 0.6487 | 0.5783 | | |
| | 1.1264 | 90 | 14.5363 | - | - | - | - | - | | |
| | 1.2528 | 100 | 14.1097 | - | - | - | - | - | | |
| | 1.3791 | 110 | 13.5251 | - | - | - | - | - | | |
| | 1.5055 | 120 | 13.3574 | - | - | - | - | - | | |
| | 1.6319 | 130 | 13.3079 | - | - | - | - | - | | |
| | 1.7583 | 140 | 12.926 | - | - | - | - | - | | |
| | 1.8847 | 150 | 12.0388 | - | - | - | - | - | | |
| | 2.0 | 160 | 10.9161 | 0.7063 | 0.7005 | 0.6880 | 0.6514 | 0.5886 | | |
| | 2.1264 | 170 | 10.7059 | - | - | - | - | - | | |
| | 2.2528 | 180 | 10.1178 | - | - | - | - | - | | |
| | 2.3791 | 190 | 10.4664 | - | - | - | - | - | | |
| | 2.5055 | 200 | 10.4824 | - | - | - | - | - | | |
| | 2.6319 | 210 | 10.2784 | - | - | - | - | - | | |
| | 2.7583 | 220 | 9.2031 | - | - | - | - | - | | |
| | 2.8847 | 230 | 8.9788 | - | - | - | - | - | | |
| | 3.0 | 240 | 7.5905 | 0.7027 | 0.6964 | 0.6855 | 0.6515 | 0.5881 | | |
| | 3.1264 | 250 | 8.4637 | - | - | - | - | - | | |
| | 3.2528 | 260 | 9.4921 | - | - | - | - | - | | |
| | 3.3791 | 270 | 9.0615 | - | - | - | - | - | | |
| | 3.5055 | 280 | 9.0181 | - | - | - | - | - | | |
| | 3.6319 | 290 | 8.6193 | - | - | - | - | - | | |
| | 3.7583 | 300 | 8.3741 | - | - | - | - | - | | |
| | 3.8847 | 310 | 8.9504 | - | - | - | - | - | | |
| | **4.0** | **320** | **7.4761** | **0.7037** | **0.6971** | **0.6857** | **0.652** | **0.5894** | | |
| * The bold row denotes the saved checkpoint. | |
| ### Framework Versions | |
| - Python: 3.12.6 | |
| - Sentence Transformers: 4.1.0 | |
| - Transformers: 4.52.4 | |
| - PyTorch: 2.6.0+cu124 | |
| - Accelerate: 1.7.0 | |
| - Datasets: 3.6.0 | |
| - Tokenizers: 0.21.1 | |
| ## Citation | |
| ### BibTeX | |
| #### Sentence Transformers | |
| ```bibtex | |
| @inproceedings{reimers-2019-sentence-bert, | |
| title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", | |
| author = "Reimers, Nils and Gurevych, Iryna", | |
| booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", | |
| month = "11", | |
| year = "2019", | |
| publisher = "Association for Computational Linguistics", | |
| url = "https://arxiv.org/abs/1908.10084", | |
| } | |
| ``` | |
| #### MatryoshkaLoss | |
| ```bibtex | |
| @misc{kusupati2024matryoshka, | |
| title={Matryoshka Representation Learning}, | |
| author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, | |
| year={2024}, | |
| eprint={2205.13147}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.LG} | |
| } | |
| ``` | |
| #### MultipleNegativesRankingLoss | |
| ```bibtex | |
| @misc{henderson2017efficient, | |
| title={Efficient Natural Language Response Suggestion for Smart Reply}, | |
| author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, | |
| year={2017}, | |
| eprint={1705.00652}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL} | |
| } | |
| ``` | |