michaelfeil committed
Commit 34b00ef
1 Parent(s): 9172545

Upload intfloat/e5-small-v2 ctranslate fp16 weights

Files changed (3):
1. README.md +36 -6
2. modules.json +20 -0
3. sentence_bert_config.json +4 -0
README.md CHANGED
@@ -2608,17 +2608,37 @@ Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on
 
 quantized version of [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2)
 ```bash
-pip install hf-hub-ctranslate2>=3.0.0 ctranslate2>=3.16.0
+pip install hf-hub-ctranslate2>=2.11.0 ctranslate2>=3.16.0
 ```
 
 ```python
 # from transformers import AutoTokenizer
 model_name = "michaelfeil/ct2fast-e5-small-v2"
+model_name_orig = "intfloat/e5-small-v2"
+
+from hf_hub_ctranslate2 import EncoderCT2fromHfHub
+model = EncoderCT2fromHfHub(
+    # load in int8 on CUDA
+    model_name_or_path=model_name,
+    device="cuda",
+    compute_type="int8_float16",
+)
+outputs = model.generate(
+    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
+    max_length=64,
+)
+# perform downstream tasks on outputs
+outputs["pooler_output"]
+outputs["last_hidden_state"]
+outputs["attention_mask"]
+
+# alternative: use the SentenceTransformer mix-in
+# for end-to-end sentence-embedding generation
+# (does not pull from this repo)
 
 from hf_hub_ctranslate2 import CT2SentenceTransformer
 model = CT2SentenceTransformer(
-    model_name, compute_type="int8_float16", device="cuda",
-    repo_contains_ct2=True
+    model_name_orig, compute_type="int8_float16", device="cuda",
 )
 embeddings = model.encode(
     ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
@@ -2632,13 +2652,13 @@ scores = (embeddings @ embeddings.T) * 100
 ```
 
 Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
-and [hf-hub-ctranslate2>=3.0.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
+and [hf-hub-ctranslate2>=2.11.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
 - `compute_type=int8_float16` for `device="cuda"`
 - `compute_type=int8` for `device="cpu"`
 
 Converted on 2023-06-18 using
 ```
-ct2-transformers-converter --model intfloat/e5-small-v2 --output_dir ~/tmp-ct2fast-e5-small-v2 --force --copy_files tokenizer.json README.md special_tokens_map.json vocab.txt tokenizer_config.json .gitattributes --trust_remote_code
+ct2-transformers-converter --model intfloat/e5-small-v2 --output_dir ~/tmp-ct2fast-e5-small-v2 --force --copy_files tokenizer.json sentence_bert_config.json README.md modules.json special_tokens_map.json vocab.txt tokenizer_config.json .gitattributes --trust_remote_code
 ```
 
 # Licence and other remarks:
@@ -2717,4 +2737,14 @@ If you find our paper or models helpful, please consider cite as follows:
 
 ## Limitations
 
 This model only works for English texts. Long texts will be truncated to at most 512 tokens.
+
+## Sentence Transformers
+
+Below is an example for usage with sentence_transformers. `pip install sentence_transformers~=2.2.2`
+This is community contributed, and results may vary up to numerical precision.
+```python
+from sentence_transformers import SentenceTransformer
+model = SentenceTransformer('intfloat/e5-small-v2')
+embeddings = model.encode(input_texts, normalize_embeddings=True)
+```
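The `scores = (embeddings @ embeddings.T) * 100` line visible in the hunk header above works because `encode` returns unit-normalized vectors, so a dot product is a cosine similarity. A minimal sketch with toy unit vectors standing in for real model output (the values are illustrative, not actual e5 embeddings):

```python
import numpy as np

# Toy stand-ins for model.encode(...) output: three unit-normalized vectors.
embeddings = np.array([
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])

# For unit vectors, the dot product equals the cosine similarity,
# so scaling by 100 yields scores in [-100, 100]; the diagonal is 100.
scores = (embeddings @ embeddings.T) * 100
```

With real sentences, nearby rows ("I like soccer" / "I like tennis") score higher against each other than against unrelated ones.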
modules.json ADDED
@@ -0,0 +1,20 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
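The three modules above declare a sentence-transformers pipeline: a Transformer encoder, a pooling step over token embeddings, and L2 normalization. A rough NumPy sketch of the last two stages, assuming mean pooling with an attention mask (the token embeddings here are made up for illustration):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # Average token vectors, ignoring padding positions (mask == 0).
    mask = attention_mask[..., None].astype(float)       # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)       # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # avoid div by zero
    return summed / counts

def l2_normalize(x):
    # Scale each row to unit length, as the Normalize module does.
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

# Made-up "last_hidden_state": batch of 1, sequence length 3, dim 2.
tokens = np.array([[[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])  # third token is padding

pooled = mean_pool(tokens, mask)          # averages only the unmasked tokens
sentence_embedding = l2_normalize(pooled)  # unit-length sentence vector
```

Normalizing at the end is what makes the plain dot-product scoring in the README snippet a valid cosine similarity.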
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 512,
+  "do_lower_case": false
+}
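`max_seq_length: 512` is what backs the Limitations note that long texts are truncated to at most 512 tokens: inputs beyond that length are cut before encoding. A hypothetical sketch of that behavior on token-id lists (not the library's actual tokenizer code):

```python
MAX_SEQ_LENGTH = 512  # from sentence_bert_config.json

def truncate_ids(token_ids, max_seq_length=MAX_SEQ_LENGTH):
    # Keep at most max_seq_length token ids; anything beyond is dropped.
    return token_ids[:max_seq_length]

short = truncate_ids(list(range(100)))   # fits: returned unchanged
long = truncate_ids(list(range(1000)))   # cut down to 512 ids
```

Content past the cutoff never reaches the model, so it cannot influence the resulting embedding.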