Spaces: stevengrove/YOLO-World

stevengrove committed on Jan 30

Commit 186701e • 1 Parent(s): d912a42

initial commit

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitignore +127 -0
README.md +141 -11
app.py +61 -0
assets/yolo_arch.png +0 -0
assets/yolo_logo.png +0 -0
configs/deploy/detection_onnxruntime-fp16_dynamic.py +18 -0
configs/deploy/detection_onnxruntime-int8_dynamic.py +20 -0
configs/deploy/detection_onnxruntime_static.py +18 -0
configs/deploy/detection_tensorrt-fp16_static-640x640.py +38 -0
configs/deploy/detection_tensorrt-int8_static-640x640.py +30 -0
configs/finetune_coco/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_coco_finetune.py +183 -0
configs/finetune_coco/yolo_world_m_t2i_bn_2e-4_100e_4x8gpus_coco_finetune.py +183 -0
configs/finetune_coco/yolo_world_s_t2i_bn_2e-4_100e_4x8gpus_coco_finetune.py +183 -0
configs/pretrain/yolo_world_l_dual_3block_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py +173 -0
configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py +182 -0
configs/pretrain/yolo_world_m_dual_3block_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py +173 -0
configs/pretrain/yolo_world_m_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py +171 -0
configs/pretrain/yolo_world_s_dual_l2norm_3block_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py +173 -0
configs/pretrain/yolo_world_s_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py +172 -0
configs/scaleup/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1024.py +216 -0
configs/scaleup/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1280.py +216 -0
configs/scaleup/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1280_v2.py +216 -0
deploy/__init__.py +1 -0
deploy/models/__init__.py +4 -0
docs/data.md +19 -0
docs/deploy.md +0 -0
docs/install.md +0 -0
docs/training.md +0 -0
requirements.txt +1 -0
setup.py +190 -0
taiji/drun +35 -0
taiji/erun +23 -0
taiji/etorchrun +51 -0
taiji/jizhi_run_vanilla +105 -0
third_party/mmyolo/.circleci/config.yml +34 -0
third_party/mmyolo/.circleci/docker/Dockerfile +11 -0
third_party/mmyolo/.circleci/test.yml +213 -0
third_party/mmyolo/.dev_scripts/gather_models.py +312 -0
third_party/mmyolo/.dev_scripts/print_registers.py +448 -0
third_party/mmyolo/.github/CODE_OF_CONDUCT.md +76 -0
third_party/mmyolo/.github/CONTRIBUTING.md +1 -0
third_party/mmyolo/.github/ISSUE_TEMPLATE/1-bug-report.yml +67 -0
third_party/mmyolo/.github/ISSUE_TEMPLATE/2-feature-request.yml +32 -0
third_party/mmyolo/.github/ISSUE_TEMPLATE/3-new-model.yml +30 -0
third_party/mmyolo/.github/ISSUE_TEMPLATE/4-documentation.yml +22 -0
third_party/mmyolo/.github/ISSUE_TEMPLATE/5-reimplementation.yml +87 -0
third_party/mmyolo/.github/ISSUE_TEMPLATE/config.yml +9 -0
third_party/mmyolo/.github/pull_request_template.md +25 -0
third_party/mmyolo/.github/workflows/deploy.yml +28 -0
third_party/mmyolo/.gitignore +126 -0

.gitignore ADDED

	@@ -0,0 +1,127 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/en/_build/
+docs/zh_cn/_build/
+# PyBuilder
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# pyenv
+.python-version
+# celery beat schedule file
+celerybeat-schedule
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+data/
+data
+.vscode
+.idea
+.DS_Store
+# custom
+*.pkl
+*.pkl.json
+*.log.json
+docs/modelzoo_statistics.md
+mmdet/.mim
+work_dirs
+# Pytorch
+*.pth
+*.py~
+*.sh~
+# venus
+venus_run.sh

README.md CHANGED

@@ -1,11 +1,141 @@
----
-title: YOLO World
-emoji: 🔥
-colorFrom: pink
-colorTo: blue
-sdk: docker
-pinned: false
-license: apache-2.0
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+<div align="center">
+<center>
+<img width=500px src="./assets/yolo_logo.png">
+</center>
+<br>
+<a href="https://scholar.google.com/citations?hl=zh-CN&user=PH8rJHYAAAAJ">Tianheng Cheng*</a><sup><span>2,3</span></sup>,
+<a href="https://linsong.info/">Lin Song*</a><sup><span>1</span></sup>,
+<a href="">Yixiao Ge</a><sup><span>1,2</span></sup>,
+<a href="">Xinggang Wang</a><sup><span>3</span></sup>,
+<a href="http://eic.hust.edu.cn/professor/liuwenyu/"> Wenyu Liu</a><sup><span>3</span></sup>,
+<a href="">Ying Shan</a><sup><span>1,2</span></sup>
+<br>
+<sup>1</sup> Tencent AI Lab,  <sup>2</sup> ARC Lab, Tencent PCG
+<sup>3</sup> Huazhong University of Science and Technology
+<br>
+<div>
+[![arxiv paper](https://img.shields.io/badge/arXiv-Paper-red)](https://arxiv.org/abs/)
+[![video](https://img.shields.io/badge/🤗HuggingFace-Spaces-orange)](https://huggingface.co/)
+[![license](https://img.shields.io/badge/License-GPLv3.0-blue)](LICENSE)
+</div>
+</div>
+## Updates
+`[2024-1-25]:` We are excited to launch **YOLO-World**, a cutting-edge real-time open-vocabulary object detector.
+## Highlights
+This repo contains the PyTorch implementation, pre-trained weights, and pre-training/fine-tuning code for YOLO-World.
+* YOLO-World is pre-trained on large-scale datasets, including detection, grounding, and image-text datasets.
+* YOLO-World is the next-generation YOLO detector, with a strong open-vocabulary detection capability and grounding ability.
+* YOLO-World presents a *prompt-then-detect* paradigm for efficient user-vocabulary inference: it re-parameterizes the vocabulary embeddings into the model as parameters and thereby achieves superior inference speed. You can try to export your own detection model without extra training or fine-tuning in our [online demo]()!
+<center>
+<img width=800px src="./assets/yolo_arch.png">
+</center>
+## Abstract
+The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
+## Demo
+## Main Results
+We pre-trained YOLO-World-S/M/L from scratch and evaluated them on `LVIS val-1.0` and `LVIS minival`. We provide the pre-trained model weights and training logs for applications, research, and reproducing the results.
+### Zero-shot Inference on LVIS dataset
+| model | Pre-train Data | AP | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | FPS(V100) | weights | log |
+| :---- | :------------- | :-:| :------------: |:-------------: | :-------: | :-----: | :---: | :---: |
+| [YOLO-World-S](./configs/pretrain/yolo_world_s_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 17.6 | 11.9 | 14.5 | 23.2  | - | [wecom](https://drive.weixin.qq.com/s?k=AJEAIQdfAAoREsieRl) | [log]() |
+| [YOLO-World-M](./configs/pretrain/yolo_world_m_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 23.5 | 17.2 | 20.4 | 29.6  | - | [wecom](https://drive.weixin.qq.com/s?k=AJEAIQdfAAoj0byBC0) | [log]() |
+| [YOLO-World-L](./configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py) | O365+GoldG | 25.7 | 18.7 | 22.6 | 32.2 | - | [wecom](https://drive.weixin.qq.com/s?k=AJEAIQdfAAoK06oxO2) | [log]() |
+**NOTE:**
+1. The evaluation results are tested on LVIS minival in a zero-shot manner.
+## Getting started
+### 1. Installation
+YOLO-World is developed based on `torch==1.11.0`, `mmyolo==0.6.0`, and `mmdetection==3.0.0`.
+```bash
+# install key dependencies (MMDetection is published on PyPI as `mmdet`)
+pip install mmdet==3.0.0 mmengine transformers
+# clone the repo
+git clone https://xxxx.YOLO-World.git
+cd YOLO-World
+# fetch mmyolo into third_party/, then return to the repo root
+mkdir -p third_party && cd third_party
+git clone https://github.com/open-mmlab/mmyolo.git
+cd ..
+```
+### 2. Preparing Data
+We provide the details about the pre-training data in [docs/data](./docs/data.md).
+## Training & Evaluation
+We adopt the default [training](./tools/train.py) or [evaluation](./tools/test.py) scripts of [mmyolo](https://github.com/open-mmlab/mmyolo).
+We provide the configs for pre-training and fine-tuning in `configs/pretrain` and `configs/finetune_coco`.
+Training YOLO-World is easy:
+```bash
+chmod +x tools/dist_train.sh
+# sample command for pre-training, use AMP for mixed-precision training
+./tools/dist_train.sh configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp
+```
+**NOTE:** YOLO-World is pre-trained on 4 nodes with 8 GPUs per node (32 GPUs in total). For pre-training, the `node_rank` and `nnodes` for multi-node training should be specified.
+Evaluating YOLO-World is also easy:
+```bash
+chmod +x tools/dist_test.sh
+./tools/dist_test.sh path/to/config path/to/weights 8
+```
+**NOTE:** We mainly evaluate the performance on LVIS-minival for pre-training.
+## Deployment
+We provide the details about deployment for downstream applications in [docs/deployment](./docs/deploy.md).
+You can directly download the ONNX model through the online [demo]() in Hugging Face Spaces 🤗.
+## Acknowledgement
+We sincerely thank [mmyolo](https://github.com/open-mmlab/mmyolo), [mmdetection](https://github.com/open-mmlab/mmdetection), and [transformers](https://github.com/huggingface/transformers) for providing their wonderful code to the community!
+## Citations
+If you find YOLO-World useful in your research or applications, please consider giving us a star 🌟 and citing it.
+```bibtex
+@article{cheng2024yolow,
+  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
+  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
+  journal={arXiv preprint arXiv:},
+  year={2024}
+}
+```
+## Licence
+YOLO-World is released under the GPL-v3 Licence and supports commercial use.

app.py ADDED

	@@ -0,0 +1,61 @@

+import argparse
+import os.path as osp
+from mmengine.config import Config, DictAction
+from mmengine.runner import Runner
+from mmengine.dataset import Compose
+from mmyolo.registry import RUNNERS
+from tools.demo import demo
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='YOLO-World Demo')
+    parser.add_argument('--config', default='configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py')
+    parser.add_argument('--checkpoint', default='model_zoo/yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth')
+    parser.add_argument(
+        '--work-dir',
+        help='the directory to save the file containing evaluation metrics')
+    parser.add_argument(
+        '--cfg-options',
+        nargs='+',
+        action=DictAction,
+        help='override some settings in the used config, the key-value pair '
+        'in xxx=yyy format will be merged into config file. If the value to '
+        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
+        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
+        'Note that the quotation marks are necessary and that no white space '
+        'is allowed.')
+    args = parser.parse_args()
+    return args
+if __name__ == '__main__':
+    args = parse_args()
+    # load config
+    cfg = Config.fromfile(args.config)
+    if args.cfg_options is not None:
+        cfg.merge_from_dict(args.cfg_options)
+    if args.work_dir is not None:
+        cfg.work_dir = args.work_dir
+    elif cfg.get('work_dir', None) is None:
+        cfg.work_dir = osp.join('./work_dirs',
+                                osp.splitext(osp.basename(args.config))[0])
+    cfg.load_from = args.checkpoint
+    if 'runner_type' not in cfg:
+        runner = Runner.from_cfg(cfg)
+    else:
+        runner = RUNNERS.build(cfg)
+    runner.call_hook('before_run')
+    runner.load_or_resume()
+    pipeline = cfg.test_dataloader.dataset.pipeline
+    runner.pipeline = Compose(pipeline)
+    runner.model.eval()
+    demo(runner, args)
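
The work-directory logic in `app.py` follows a three-step precedence: the `--work-dir` flag, then a `work_dir` already set in the config, then a path derived from the config file name. A minimal standalone sketch of that precedence, where `resolve_work_dir` and its parameter names are hypothetical helpers and not part of the commit:

```python
import os.path as osp


def resolve_work_dir(config_path, cli_work_dir=None, cfg_work_dir=None):
    """Return the work dir using the same precedence as app.py.

    1. the CLI value, if given
    2. the value already present in the config, if any
    3. ./work_dirs/<config file name without extension>
    """
    if cli_work_dir is not None:
        return cli_work_dir
    if cfg_work_dir is not None:
        return cfg_work_dir
    stem = osp.splitext(osp.basename(config_path))[0]
    return osp.join('./work_dirs', stem)


if __name__ == '__main__':
    print(resolve_work_dir('configs/pretrain/some_config.py'))
```

Deriving the directory from the config name keeps the outputs of different configs apart without requiring an extra flag.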

assets/yolo_arch.png ADDED

assets/yolo_logo.png ADDED

configs/deploy/detection_onnxruntime-fp16_dynamic.py ADDED

	@@ -0,0 +1,18 @@

+_base_ = (
+    '../../third_party/mmdeploy/configs/mmdet/detection/'
+    'detection_onnxruntime-fp16_dynamic.py')
+codebase_config = dict(
+    type='mmyolo',
+    task='ObjectDetection',
+    model_type='end2end',
+    post_processing=dict(
+        score_threshold=0.1,
+        confidence_threshold=0.005,
+        iou_threshold=0.3,
+        max_output_boxes_per_class=100,
+        pre_top_k=1000,
+        keep_top_k=100,
+        background_label_id=-1),
+    module=['mmyolo.deploy'])
+backend_config = dict(
+    type='onnxruntime')
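
To illustrate what the `post_processing` values roughly control, here is a simplified single-class sketch of score filtering followed by greedy NMS. It is an assumption about the general technique, not the exported mmdeploy operator: `filter_detections` is a hypothetical helper, and `confidence_threshold`, `max_output_boxes_per_class` and `background_label_id` are left out.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_threshold=0.1, iou_threshold=0.3,
                      pre_top_k=1000, keep_top_k=100):
    """Return indices of the boxes that survive thresholding and greedy NMS."""
    # Drop low-scoring candidates, then keep the best pre_top_k of them.
    order = [i for i, s in enumerate(scores) if s > score_threshold]
    order.sort(key=lambda i: scores[i], reverse=True)
    order = order[:pre_top_k]

    kept = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept[:keep_top_k]
```

With the defaults above (taken from this config), a lower `iou_threshold` suppresses overlapping boxes more aggressively, while `keep_top_k` caps the number of detections returned.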

configs/deploy/detection_onnxruntime-int8_dynamic.py ADDED

	@@ -0,0 +1,20 @@

+_base_ = (
+    '../../third_party/mmdeploy/configs/mmdet/detection/'
+    'detection_onnxruntime-fp16_dynamic.py')
+backend_config = dict(
+    precision='int8')
+codebase_config = dict(
+    type='mmyolo',
+    task='ObjectDetection',
+    model_type='end2end',
+    post_processing=dict(
+        score_threshold=0.1,
+        confidence_threshold=0.005,
+        iou_threshold=0.3,
+        max_output_boxes_per_class=100,
+        pre_top_k=1000,
+        keep_top_k=100,
+        background_label_id=-1),
+    module=['mmyolo.deploy'])
+backend_config = dict(
+    type='onnxruntime', precision='int8')
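
A config file like this one is plain Python, so when the same name is assigned twice, the later dict replaces the earlier one instead of merging with it. The sketch below demonstrates that behaviour with the standard library only. It assumes the config loader reads the final module-level values, and it does not confirm whether the ONNX Runtime backend actually honours a `precision` key.

```python
# Two assignments to the same name, as they would appear in a config file.
cfg_source = """
backend_config = dict(precision='int8')
backend_config = dict(type='onnxruntime')
"""

namespace = {}
exec(cfg_source, namespace)

# Only the last assignment survives; the 'precision' key is gone.
overwritten = namespace['backend_config']

# Putting both keys in a single dict keeps them together.
merged = dict(type='onnxruntime', precision='int8')

if __name__ == '__main__':
    print(overwritten)
    print(merged)
```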

configs/deploy/detection_onnxruntime_static.py ADDED

	@@ -0,0 +1,18 @@

+_base_ = (
+    '../../third_party/mmyolo/configs/deploy/'
+    'detection_onnxruntime_static.py')
+codebase_config = dict(
+    type='mmyolo',
+    task='ObjectDetection',
+    model_type='end2end',
+    post_processing=dict(
+        score_threshold=0.25,
+        confidence_threshold=0.005,
+        iou_threshold=0.65,
+        max_output_boxes_per_class=200,
+        pre_top_k=1000,
+        keep_top_k=100,
+        background_label_id=-1),
+    module=['mmyolo.deploy'])
+backend_config = dict(
+    type='onnxruntime')

configs/deploy/detection_tensorrt-fp16_static-640x640.py ADDED

	@@ -0,0 +1,38 @@

+_base_ = (
+    '../../third_party/mmyolo/configs/deploy/'
+    'detection_tensorrt-fp16_static-640x640.py')
+onnx_config = dict(
+    type='onnx',
+    export_params=True,
+    keep_initializers_as_inputs=False,
+    opset_version=11,
+    save_file='end2end.onnx',
+    input_names=['input'],
+    output_names=['dets', 'labels'],
+    input_shape=(640, 640),
+    optimize=True)
+backend_config = dict(
+    type='tensorrt',
+    common_config=dict(fp16_mode=True, max_workspace_size=1 << 34),
+    model_inputs=[
+        dict(
+            input_shapes=dict(
+                input=dict(
+                    min_shape=[1, 3, 640, 640],
+                    opt_shape=[1, 3, 640, 640],
+                    max_shape=[1, 3, 640, 640])))
+    ])
+use_efficientnms = False  # whether to replace TRTBatchedNMS plugin with EfficientNMS plugin # noqa E501
+codebase_config = dict(
+    type='mmyolo',
+    task='ObjectDetection',
+    model_type='end2end',
+    post_processing=dict(
+        score_threshold=0.25,
+        confidence_threshold=0.005,
+        iou_threshold=0.65,
+        max_output_boxes_per_class=100,
+        pre_top_k=1,
+        keep_top_k=1,
+        background_label_id=-1),
+    module=['mmyolo.deploy'])
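
The `max_workspace_size` values in the TensorRT configs are written as bit shifts and are given in bytes. A quick conversion makes the sizes easier to compare; `workspace_gib` is a hypothetical helper used only for this illustration.

```python
def workspace_gib(shift):
    """Convert a `1 << shift` byte count into GiB."""
    return (1 << shift) / 2**30


if __name__ == '__main__':
    # fp16 config in this commit: 1 << 34
    print(workspace_gib(34))
    # int8 config in this commit: 1 << 30
    print(workspace_gib(30))
```

So the fp16 config allows a 16 GiB workspace and the int8 config a 1 GiB workspace.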

configs/deploy/detection_tensorrt-int8_static-640x640.py ADDED

	@@ -0,0 +1,30 @@

+_base_ = [
+    '../../third_party/mmdeploy/configs/mmdet/_base_/base_static.py',
+    '../../third_party/mmdeploy/configs/_base_/backends/tensorrt-int8.py']
+onnx_config = dict(input_shape=(640, 640))
+backend_config = dict(
+    common_config=dict(max_workspace_size=1 << 30),
+    model_inputs=[
+        dict(
+            input_shapes=dict(
+                input=dict(
+                    min_shape=[1, 3, 640, 640],
+                    opt_shape=[1, 3, 640, 640],
+                    max_shape=[1, 3, 640, 640])))
+    ])
+codebase_config = dict(
+    type='mmyolo',
+    task='ObjectDetection',
+    model_type='end2end',
+    post_processing=dict(
+        score_threshold=0.1,
+        confidence_threshold=0.005,
+        iou_threshold=0.3,
+        max_output_boxes_per_class=100,
+        pre_top_k=1000,
+        keep_top_k=100,
+        background_label_id=-1),
+    module=['mmyolo.deploy'])

configs/finetune_coco/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_coco_finetune.py ADDED

	@@ -0,0 +1,183 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_l_mask-refine_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 80
+num_training_classes = 80
+max_epochs = 80  # Maximum training epochs
+close_mosaic_epochs = 10
+save_epoch_intervals = 5
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-4
+weight_decay = 0.05
+train_batch_size_per_gpu = 16
+load_from = 'weights/yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth'
+persistent_workers = False
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+mosaic_affine_transform = [
+    dict(
+        type='MultiModalMosaic',
+        img_scale=_base_.img_scale,
+        pad_val=114.0,
+        pre_transform=_base_.pre_transform),
+    dict(type='YOLOv5CopyPaste', prob=_base_.copypaste_prob),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        max_aspect_ratio=100.,
+        scaling_ratio_range=(1 - _base_.affine_scale,
+                             1 + _base_.affine_scale),
+        # img_scale is (width, height)
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114),
+        min_area_ratio=_base_.min_area_ratio,
+        use_mask_refine=_base_.use_mask2refine)
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    *mosaic_affine_transform,
+    dict(
+        type='YOLOv5MultiModalMixUp',
+        prob=_base_.mixup_prob,
+        pre_transform=[*_base_.pre_transform,
+                       *mosaic_affine_transform]),
+    *_base_.last_transform[:-1],
+    *text_transform
+]
+train_pipeline_stage2 = [
+    *_base_.train_pipeline_stage2[:-1],
+    *text_transform
+]
+coco_train_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5CocoDataset',
+        data_root='data/coco',
+        ann_file='annotations/instances_train2017.json',
+        data_prefix=dict(img='train2017/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/coco_class_captions.json',
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    persistent_workers=persistent_workers,
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=coco_train_dataset)
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadTextFixed'),
+    dict(
+        type='mmdet.PackDetInputs',
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5CocoDataset',
+        data_root='data/coco',
+        ann_file='annotations/instances_val2017.json',
+        data_prefix=dict(img='val2017/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/coco_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(
+        scheduler_type='linear',
+        lr_factor=0.01,
+        max_epochs=max_epochs),
+    checkpoint=dict(
+        max_keep_ckpts=-1,
+        save_best=None,
+        interval=save_epoch_intervals))
+custom_hooks = [
+    dict(
+        type='EMAHook',
+        ema_type='ExpMomentumEMA',
+        momentum=0.0001,
+        update_buffers=True,
+        strict_load=False,
+        priority=49),
+    dict(
+        type='mmdet.PipelineSwitchHook',
+        switch_epoch=max_epochs - close_mosaic_epochs,
+        switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=5,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(
+    optimizer=dict(
+        _delete_=True,
+        type='AdamW',
+        lr=base_lr,
+        weight_decay=weight_decay,
+        batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={'backbone.text_model': dict(lr_mult=0.01),
+                     'logit_scale': dict(weight_decay=0.0)}),
+    constructor='YOLOWv5OptimizerConstructor')
+# evaluation settings
+val_evaluator = dict(
+    _delete_=True,
+    type='mmdet.CocoMetric',
+    proposal_nums=(100, 1, 10),
+    ann_file='data/coco/annotations/instances_val2017.json',
+    metric='bbox')
+test_evaluator = val_evaluator
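
Two schedule values follow from the hyper-parameters in this config: the epoch at which `PipelineSwitchHook` swaps in `train_pipeline_stage2`, and the learning rate under `scheduler_type='linear'`. The sketch below assumes the linear factor decays from 1.0 to `lr_factor` over `max_epochs`, which is how I understand mmyolo's YOLOv5-style scheduler hook. Warmup is ignored, and `linear_lr` is a hypothetical helper.

```python
base_lr = 2e-4
lr_factor = 0.01
max_epochs = 80
close_mosaic_epochs = 10

# Epoch at which the mosaic pipeline is replaced by train_pipeline_stage2.
switch_epoch = max_epochs - close_mosaic_epochs


def linear_lr(epoch):
    """Learning rate at `epoch` under the assumed linear decay (no warmup)."""
    scale = (1 - epoch / max_epochs) * (1.0 - lr_factor) + lr_factor
    return base_lr * scale


if __name__ == '__main__':
    print(switch_epoch)
    for epoch in (0, 40, 80):
        print(epoch, linear_lr(epoch))
```

Under these assumptions the learning rate starts at `base_lr` and ends at `base_lr * lr_factor`, with the last 10 epochs running without mosaic augmentation.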

configs/finetune_coco/yolo_world_m_t2i_bn_2e-4_100e_4x8gpus_coco_finetune.py ADDED

	@@ -0,0 +1,183 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_m_mask-refine_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 80
+num_training_classes = 80
+max_epochs = 80  # Maximum training epochs
+close_mosaic_epochs = 10
+save_epoch_intervals = 5
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-4
+weight_decay = 0.05
+train_batch_size_per_gpu = 16
+load_from = 'weights/yolow-v8_m_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth'
+persistent_workers = False
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+mosaic_affine_transform = [
+    dict(
+        type='MultiModalMosaic',
+        img_scale=_base_.img_scale,
+        pad_val=114.0,
+        pre_transform=_base_.pre_transform),
+    dict(type='YOLOv5CopyPaste', prob=_base_.copypaste_prob),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        max_aspect_ratio=100.,
+        scaling_ratio_range=(1 - _base_.affine_scale,
+                             1 + _base_.affine_scale),
+        # img_scale is (width, height)
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114),
+        min_area_ratio=_base_.min_area_ratio,
+        use_mask_refine=_base_.use_mask2refine)
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    *mosaic_affine_transform,
+    dict(
+        type='YOLOv5MultiModalMixUp',
+        prob=_base_.mixup_prob,
+        pre_transform=[*_base_.pre_transform,
+                       *mosaic_affine_transform]),
+    *_base_.last_transform[:-1],
+    *text_transform
+]
+train_pipeline_stage2 = [
+    *_base_.train_pipeline_stage2[:-1],
+    *text_transform
+]
+coco_train_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5CocoDataset',
+        data_root='data/coco',
+        ann_file='annotations/instances_train2017.json',
+        data_prefix=dict(img='train2017/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/coco_class_captions.json',
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    persistent_workers=persistent_workers,
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=coco_train_dataset)
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadTextFixed'),
+    dict(
+        type='mmdet.PackDetInputs',
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5CocoDataset',
+        data_root='data/coco',
+        ann_file='annotations/instances_val2017.json',
+        data_prefix=dict(img='val2017/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/coco_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(
+        scheduler_type='linear',
+        lr_factor=0.01,
+        max_epochs=max_epochs),
+    checkpoint=dict(
+        max_keep_ckpts=-1,
+        save_best=None,
+        interval=save_epoch_intervals))
+custom_hooks = [
+    dict(
+        type='EMAHook',
+        ema_type='ExpMomentumEMA',
+        momentum=0.0001,
+        update_buffers=True,
+        strict_load=False,
+        priority=49),
+    dict(
+        type='mmdet.PipelineSwitchHook',
+        switch_epoch=max_epochs - close_mosaic_epochs,
+        switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=5,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(
+    optimizer=dict(
+        _delete_=True,
+        type='AdamW',
+        lr=base_lr,
+        weight_decay=weight_decay,
+        batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={'backbone.text_model': dict(lr_mult=0.01),
+                     'logit_scale': dict(weight_decay=0.0)}),
+    constructor='YOLOWv5OptimizerConstructor')
+# evaluation settings
+val_evaluator = dict(
+    _delete_=True,
+    type='mmdet.CocoMetric',
+    proposal_nums=(100, 1, 10),
+    ann_file='data/coco/annotations/instances_val2017.json',
+    metric='bbox')
+test_evaluator = val_evaluator

configs/finetune_coco/yolo_world_s_t2i_bn_2e-4_100e_4x8gpus_coco_finetune.py ADDED

	@@ -0,0 +1,183 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_s_mask-refine_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 80
+num_training_classes = 80
+max_epochs = 80  # Maximum training epochs
+close_mosaic_epochs = 10
+save_epoch_intervals = 5
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-4
+weight_decay = 0.05
+train_batch_size_per_gpu = 16
+load_from = 'weights/yolow-v8_s_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth'
+persistent_workers = False
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+mosaic_affine_transform = [
+    dict(
+        type='MultiModalMosaic',
+        img_scale=_base_.img_scale,
+        pad_val=114.0,
+        pre_transform=_base_.pre_transform),
+    dict(type='YOLOv5CopyPaste', prob=_base_.copypaste_prob),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        max_aspect_ratio=100.,
+        scaling_ratio_range=(1 - _base_.affine_scale,
+                             1 + _base_.affine_scale),
+        # img_scale is (width, height)
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114),
+        min_area_ratio=_base_.min_area_ratio,
+        use_mask_refine=_base_.use_mask2refine)
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    *mosaic_affine_transform,
+    dict(
+        type='YOLOv5MultiModalMixUp',
+        prob=_base_.mixup_prob,
+        pre_transform=[*_base_.pre_transform,
+                       *mosaic_affine_transform]),
+    *_base_.last_transform[:-1],
+    *text_transform
+]
+train_pipeline_stage2 = [
+    *_base_.train_pipeline_stage2[:-1],
+    *text_transform
+]
+coco_train_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5CocoDataset',
+        data_root='data/coco',
+        ann_file='annotations/instances_train2017.json',
+        data_prefix=dict(img='train2017/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/coco_class_captions.json',
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    persistent_workers=persistent_workers,
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=coco_train_dataset)
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadTextFixed'),
+    dict(
+        type='mmdet.PackDetInputs',
+        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                   'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5CocoDataset',
+        data_root='data/coco',
+        ann_file='annotations/instances_val2017.json',
+        data_prefix=dict(img='val2017/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/coco_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(
+        scheduler_type='linear',
+        lr_factor=0.01,
+        max_epochs=max_epochs),
+    checkpoint=dict(
+        max_keep_ckpts=-1,
+        save_best=None,
+        interval=save_epoch_intervals))
+custom_hooks = [
+    dict(
+        type='EMAHook',
+        ema_type='ExpMomentumEMA',
+        momentum=0.0001,
+        update_buffers=True,
+        strict_load=False,
+        priority=49),
+    dict(
+        type='mmdet.PipelineSwitchHook',
+        switch_epoch=max_epochs - close_mosaic_epochs,
+        switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=5,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(
+    optimizer=dict(
+        _delete_=True,
+        type='AdamW',
+        lr=base_lr,
+        weight_decay=weight_decay,
+        batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={'backbone.text_model': dict(lr_mult=0.01),
+                     'logit_scale': dict(weight_decay=0.0)}),
+    constructor='YOLOWv5OptimizerConstructor')
+# evaluation settings
+val_evaluator = dict(
+    _delete_=True,
+    type='mmdet.CocoMetric',
+    proposal_nums=(100, 1, 10),
+    ann_file='data/coco/annotations/instances_val2017.json',
+    metric='bbox')
+test_evaluator = val_evaluator
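
Note on the `param_scheduler` entry above (`scheduler_type='linear'`, `lr_factor=0.01`): a minimal sketch of the decay this appears to request, assuming the hook interpolates the learning-rate multiplier linearly from 1.0 at epoch 0 down to `lr_factor` at `max_epochs`. The function name `linear_lr_multiplier` is hypothetical, warmup is ignored, and the actual hook in the base MMYOLO config may differ in detail.

```python
def linear_lr_multiplier(epoch: int, max_epochs: int = 80,
                         lr_factor: float = 0.01) -> float:
    """Multiplier applied to base_lr at a given epoch (assumed linear decay)."""
    # Starts at 1.0 when epoch == 0 and ends at lr_factor when epoch == max_epochs.
    return (1 - epoch / max_epochs) * (1.0 - lr_factor) + lr_factor


if __name__ == '__main__':
    base_lr = 2e-4  # value from the finetune config above
    for epoch in (0, 40, 70, 80):
        print(epoch, base_lr * linear_lr_multiplier(epoch))
```

Under this assumption the learning rate at the pipeline switch (epoch 70 in this config) is already close to its final value of `base_lr * lr_factor`.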

configs/pretrain/yolo_world_l_dual_3block_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ADDED

	@@ -0,0 +1,173 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 100  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-3
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 16
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWolrdDualPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              text_enhancder=dict(type='ImagePoolingAttentionModule',
+                                  embed_channels=256,
+                                  num_heads=8)),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    dict(type='MultiModalMosaic',
+         img_scale=_base_.img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(type='YOLOv5MixedGroundingDataset',
+                        data_root='data/mixed_grounding/',
+                        ann_file='annotations/final_mixed_train_no_coco.json',
+                        data_prefix=dict(img='gqa/images/'),
+                        filter_cfg=dict(filter_empty_gt=False, min_size=32),
+                        pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='full_images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(batch_size=train_batch_size_per_gpu,
+                        collate_fn=dict(type='yolow_collate'),
+                        dataset=dict(_delete_=True,
+                                     type='ConcatDataset',
+                                     datasets=[
+                                         obj365v1_train_dataset,
+                                         flickr_train_dataset, mg_train_dataset
+                                     ],
+                                     ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(type='YOLOv5LVISV1Dataset',
+                 data_root='data/coco/',
+                 test_mode=True,
+                 ann_file='lvis/lvis_v1_minival_inserted_image_name.json',
+                 data_prefix=dict(img=''),
+                 batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(type='mmdet.LVISMetric',
+                     ann_file='data/coco/lvis/'
+                     'lvis_v1_minival_inserted_image_name.json',
+                     metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(param_scheduler=dict(max_epochs=max_epochs),
+                     checkpoint=dict(interval=save_epoch_intervals,
+                                     rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(max_epochs=max_epochs,
+                 val_interval=10,
+                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                                     _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+                     paramwise_cfg=dict(bias_decay_mult=0.0,
+                                        norm_decay_mult=0.0,
+                                        custom_keys={
+                                            'backbone.text_model':
+                                            dict(lr_mult=0.01),
+                                            'logit_scale':
+                                            dict(weight_decay=0.0)
+                                        }),
+                     constructor='YOLOWv5OptimizerConstructor')
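
Note on splitting long annotation paths across lines in configs like this one: a small sketch of why implicit concatenation of adjacent string literals is the safer choice. A backslash continuation inside a string literal keeps the indentation of the following line as part of the string, so the resulting path contains spaces. The file name `lvis_v1_minival.json` is a shortened, hypothetical stand-in used only for illustration.

```python
# Backslash continuation inside the literal: the leading spaces of the
# next line become part of the string.
indented = 'data/coco/lvis/\
    lvis_v1_minival.json'

# Adjacent literals are concatenated at compile time with nothing in between.
joined = ('data/coco/lvis/'
          'lvis_v1_minival.json')

if __name__ == '__main__':
    print(repr(indented))
    print(repr(joined))
```

A path built the first way will generally not resolve on disk, while the second form yields the intended path.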

configs/pretrain/yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ADDED

	@@ -0,0 +1,182 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 100  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-3
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 16
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='openai/clip-vit-base-patch32',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    dict(type='MultiModalMosaic',
+         img_scale=_base_.img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/mixed_grounding/',
+    ann_file='annotations/final_mixed_train_no_coco.json',
+    data_prefix=dict(img='gqa/images/'),
+    filter_cfg=dict(filter_empty_gt=False, min_size=32),
+    pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=dict(
+        _delete_=True,
+        type='ConcatDataset',
+        datasets=[
+            obj365v1_train_dataset,
+            flickr_train_dataset,
+            mg_train_dataset
+        ],
+        ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5LVISV1Dataset',
+        data_root='data/lvis/',
+        test_mode=True,
+        ann_file='annotations/'
+                 'lvis_v1_minival_inserted_image_name.json',
+        data_prefix=dict(img=''),
+        batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(
+    type='mmdet.LVISMetric',
+    ann_file='data/lvis/annotations/'
+             'lvis_v1_minival_inserted_image_name.json',
+    metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(max_epochs=max_epochs),
+    checkpoint=dict(interval=save_epoch_intervals,
+                    rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=10,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={
+            'backbone.text_model':
+            dict(lr_mult=0.01),
+            'logit_scale':
+            dict(weight_decay=0.0)
+        }),
+    constructor='YOLOWv5OptimizerConstructor')
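
Note on `train_cfg` above: `val_interval=10` combined with `dynamic_intervals=[(max_epochs - close_mosaic_epochs, _base_.val_interval_stage2)]` changes the validation frequency once the mosaic-free stage begins. The sketch below is a simplified model of that behaviour, not MMEngine's actual code: it assumes the interval in effect for a 1-based epoch is the one attached to the last milestone at or before it, that the final epoch is always validated, and that `_base_.val_interval_stage2` is 1 (an assumption about the base config, not shown in this diff). The helper `validation_epochs` is hypothetical.

```python
import bisect


def validation_epochs(max_epochs, val_interval, dynamic_intervals):
    """Return the 1-based epochs after which validation would run."""
    # Milestone 0 carries the default interval; later milestones override it.
    milestones = [0] + [m for m, _ in dynamic_intervals]
    intervals = [val_interval] + [i for _, i in dynamic_intervals]
    epochs = []
    for epoch in range(1, max_epochs + 1):
        # Pick the interval of the last milestone <= epoch.
        interval = intervals[bisect.bisect(milestones, epoch) - 1]
        if epoch % interval == 0 or epoch == max_epochs:
            epochs.append(epoch)
    return epochs


if __name__ == '__main__':
    max_epochs, close_mosaic_epochs = 100, 2  # values from the config above
    val_interval_stage2 = 1                   # assumed base-config value
    print(validation_epochs(
        max_epochs, 10,
        [(max_epochs - close_mosaic_epochs, val_interval_stage2)]))
```

Under these assumptions validation runs every 10 epochs up to epoch 90 and then after each of the last epochs from 98 onward, which matches the point where `PipelineSwitchHook` swaps in `train_pipeline_stage2`.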

configs/pretrain/yolo_world_m_dual_3block_l2norm_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ADDED

	@@ -0,0 +1,173 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_m_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 100  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-3
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 16
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWolrdDualPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              text_enhancder=dict(type='ImagePoolingAttentionModule',
+                                  embed_channels=256,
+                                  num_heads=8)),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    dict(type='MultiModalMosaic',
+         img_scale=_base_.img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(type='YOLOv5MixedGroundingDataset',
+                        data_root='data/mixed_grounding/',
+                        ann_file='annotations/final_mixed_train_no_coco.json',
+                        data_prefix=dict(img='gqa/images/'),
+                        filter_cfg=dict(filter_empty_gt=False, min_size=32),
+                        pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='full_images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(batch_size=train_batch_size_per_gpu,
+                        collate_fn=dict(type='yolow_collate'),
+                        dataset=dict(_delete_=True,
+                                     type='ConcatDataset',
+                                     datasets=[
+                                         obj365v1_train_dataset,
+                                         flickr_train_dataset, mg_train_dataset
+                                     ],
+                                     ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(type='YOLOv5LVISV1Dataset',
+                 data_root='data/coco/',
+                 test_mode=True,
+                 ann_file='lvis/lvis_v1_minival_inserted_image_name.json',
+                 data_prefix=dict(img=''),
+                 batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(type='mmdet.LVISMetric',
+                     ann_file='data/coco/lvis/'
+                     'lvis_v1_minival_inserted_image_name.json',
+                     metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(param_scheduler=dict(max_epochs=max_epochs),
+                     checkpoint=dict(interval=save_epoch_intervals,
+                                     rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(max_epochs=max_epochs,
+                 val_interval=10,
+                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                                     _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+                     paramwise_cfg=dict(bias_decay_mult=0.0,
+                                        norm_decay_mult=0.0,
+                                        custom_keys={
+                                            'backbone.text_model':
+                                            dict(lr_mult=0.01),
+                                            'logit_scale':
+                                            dict(weight_decay=0.0)
+                                        }),
+                     constructor='YOLOWv5OptimizerConstructor')

configs/pretrain/yolo_world_m_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ADDED

	@@ -0,0 +1,171 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_m_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 100  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-3
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 16
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    dict(type='MultiModalMosaic',
+         img_scale=_base_.img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(type='YOLOv5MixedGroundingDataset',
+                        data_root='data/mixed_grounding/',
+                        ann_file='annotations/final_mixed_train_no_coco.json',
+                        data_prefix=dict(img='gqa/images/'),
+                        filter_cfg=dict(filter_empty_gt=False, min_size=32),
+                        pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='full_images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(batch_size=train_batch_size_per_gpu,
+                        collate_fn=dict(type='yolow_collate'),
+                        dataset=dict(_delete_=True,
+                                     type='ConcatDataset',
+                                     datasets=[
+                                         obj365v1_train_dataset,
+                                         flickr_train_dataset, mg_train_dataset
+                                     ],
+                                     ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(type='YOLOv5LVISV1Dataset',
+                 data_root='data/coco/',
+                 test_mode=True,
+                 ann_file='lvis/lvis_v1_minival_inserted_image_name.json',
+                 data_prefix=dict(img=''),
+                 batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(type='mmdet.LVISMetric',
+                     ann_file='data/coco/lvis/lvis_v1_minival_inserted_image_name.json',
+                     metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(param_scheduler=dict(max_epochs=max_epochs),
+                     checkpoint=dict(interval=save_epoch_intervals,
+                                     rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(max_epochs=max_epochs,
+                 val_interval=10,
+                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                                     _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+                     paramwise_cfg=dict(bias_decay_mult=0.0,
+                                        norm_decay_mult=0.0,
+                                        custom_keys={
+                                            'backbone.text_model':
+                                            dict(lr_mult=0.01),
+                                            'logit_scale':
+                                            dict(weight_decay=0.0)
+                                        }),
+                     constructor='YOLOWv5OptimizerConstructor')

configs/pretrain/yolo_world_s_dual_l2norm_3block_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ADDED

	@@ -0,0 +1,173 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_s_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 100  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-3
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 16
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWolrdDualPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              text_enhancder=dict(type='ImagePoolingAttentionModule',
+                                  embed_channels=256,
+                                  num_heads=8)),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    dict(type='MultiModalMosaic',
+         img_scale=_base_.img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(type='YOLOv5MixedGroundingDataset',
+                        data_root='data/mixed_grounding/',
+                        ann_file='annotations/final_mixed_train_no_coco.json',
+                        data_prefix=dict(img='gqa/images/'),
+                        filter_cfg=dict(filter_empty_gt=False, min_size=32),
+                        pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='full_images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(batch_size=train_batch_size_per_gpu,
+                        collate_fn=dict(type='yolow_collate'),
+                        dataset=dict(_delete_=True,
+                                     type='ConcatDataset',
+                                     datasets=[
+                                         obj365v1_train_dataset,
+                                         flickr_train_dataset, mg_train_dataset
+                                     ],
+                                     ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(type='YOLOv5LVISV1Dataset',
+                 data_root='data/coco/',
+                 test_mode=True,
+                 ann_file='lvis/lvis_v1_minival_inserted_image_name.json',
+                 data_prefix=dict(img=''),
+                 batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(type='mmdet.LVISMetric',
+                     ann_file='data/coco/lvis/'
+                     'lvis_v1_minival_inserted_image_name.json',
+                     metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(param_scheduler=dict(max_epochs=max_epochs),
+                     checkpoint=dict(interval=save_epoch_intervals,
+                                     rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(max_epochs=max_epochs,
+                 val_interval=10,
+                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                                     _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+                     paramwise_cfg=dict(bias_decay_mult=0.0,
+                                        norm_decay_mult=0.0,
+                                        custom_keys={
+                                            'backbone.text_model':
+                                            dict(lr_mult=0.01),
+                                            'logit_scale':
+                                            dict(weight_decay=0.0)
+                                        }),
+                     constructor='YOLOWv5OptimizerConstructor')
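
Review note on wrapping long path strings in configs like this one. The sketch below is illustrative only and is not part of the commit; it reuses the LVIS minival path from the config as sample data. It shows that a backslash continuation inside a string literal keeps the next line's indentation in the value, while adjacent string literals join with no extra characters.

```python
# Backslash continuation inside the literal: the newline is dropped,
# but the leading spaces of the next line become part of the string.
broken = 'data/coco/lvis/\
    lvis_v1_minival_inserted_image_name.json'

# Adjacent string literals are concatenated by the parser, so the
# indentation of the second line is not part of the value.
joined = ('data/coco/lvis/'
          'lvis_v1_minival_inserted_image_name.json')

print(repr(broken))
print(repr(joined))
```

The second form is the one the `configs/scaleup/` files in this commit use for the same annotation path.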

configs/pretrain/yolo_world_s_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py ADDED

	@@ -0,0 +1,172 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_s_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 100  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-3
+# for 4 nodes, 8 gpus per node, 32 total gpus
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 16
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='pretrained_models/clip-vit-base-patch32-projection',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    dict(type='MultiModalMosaic',
+         img_scale=_base_.img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(type='YOLOv5MixedGroundingDataset',
+                        data_root='data/mixed_grounding/',
+                        ann_file='annotations/final_mixed_train_no_coco.json',
+                        data_prefix=dict(img='gqa/images/'),
+                        filter_cfg=dict(filter_empty_gt=False, min_size=32),
+                        pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='full_images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(batch_size=train_batch_size_per_gpu,
+                        collate_fn=dict(type='yolow_collate'),
+                        dataset=dict(_delete_=True,
+                                     type='ConcatDataset',
+                                     datasets=[
+                                         obj365v1_train_dataset,
+                                         flickr_train_dataset, mg_train_dataset
+                                     ],
+                                     ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    *_base_.test_pipeline[:-1],
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(type='YOLOv5LVISV1Dataset',
+                 data_root='data/coco/',
+                 test_mode=True,
+                 ann_file='lvis/lvis_v1_minival_inserted_image_name.json',
+                 data_prefix=dict(img=''),
+                 batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(type='mmdet.LVISMetric',
+                     ann_file='data/coco/lvis/lvis_v1_minival_inserted_image_name.json',
+                     metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(param_scheduler=dict(max_epochs=max_epochs),
+                     checkpoint=dict(interval=save_epoch_intervals,
+                                     rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(max_epochs=max_epochs,
+                 val_interval=10,
+                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                                     _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+                     paramwise_cfg=dict(bias_decay_mult=0.0,
+                                        norm_decay_mult=0.0,
+                                        custom_keys={
+                                            'backbone.text_model':
+                                            dict(lr_mult=0.01),
+                                            'logit_scale':
+                                            dict(weight_decay=0.0)
+                                        }),
+                     constructor='YOLOWv5OptimizerConstructor')
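
Review note on the derived training quantities in this config. The sketch below is a worked calculation, not code from the commit. It takes the node and GPU counts from the `4x8gpus` file name and the in-file comment, and assumes the job really runs on all 32 GPUs without gradient accumulation; that is not verifiable from the diff.

```python
# Values copied from the config above.
nodes, gpus_per_node = 4, 8
train_batch_size_per_gpu = 16
max_epochs = 100
close_mosaic_epochs = 2

# Assumption: one process per GPU, no gradient accumulation.
total_gpus = nodes * gpus_per_node
effective_batch_size = total_gpus * train_batch_size_per_gpu

# Same expression the config passes to PipelineSwitchHook as switch_epoch.
switch_epoch = max_epochs - close_mosaic_epochs

print(total_gpus, effective_batch_size, switch_epoch)
```

Under those assumptions the mosaic pipeline is replaced by `train_pipeline_stage2` for the last `close_mosaic_epochs` epochs of training.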

configs/scaleup/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1024.py ADDED

	@@ -0,0 +1,216 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 20  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-4
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 8
+img_scale = (1024, 1024)
+load_from = 'work_dirs/model_zoo/yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth'  # noqa
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='openai/clip-vit-base-patch32',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+mosaic_affine_transform = [
+    dict(type='MultiModalMosaic',
+         img_scale=img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-img_scale[0] // 2, -img_scale[1] // 2),
+        border_val=(114, 114, 114))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    *mosaic_affine_transform,
+    dict(
+        type='YOLOv5MultiModalMixUp',
+        prob=_base_.mixup_prob,
+        pre_transform=[*_base_.pre_transform,
+                       *mosaic_affine_transform]),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [
+    *_base_.pre_transform,
+    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+    dict(
+        type='LetterResize',
+        scale=img_scale,
+        allow_scale_up=True,
+        pad_val=dict(img=114.0)),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/mixed_grounding/',
+    ann_file='annotations/final_mixed_train_no_coco.json',
+    data_prefix=dict(img='gqa/images/'),
+    filter_cfg=dict(filter_empty_gt=False, min_size=32),
+    pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=dict(
+        _delete_=True,
+        type='ConcatDataset',
+        datasets=[
+            obj365v1_train_dataset,
+            flickr_train_dataset,
+            mg_train_dataset
+        ],
+        ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
+    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+    dict(
+        type='LetterResize',
+        scale=img_scale,
+        allow_scale_up=False,
+        pad_val=dict(img=114)),
+    dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5LVISV1Dataset',
+        data_root='data/lvis/',
+        test_mode=True,
+        ann_file='annotations/'
+                 'lvis_v1_minival_inserted_image_name.json',
+        data_prefix=dict(img=''),
+        batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(
+    type='mmdet.LVISMetric',
+    ann_file='data/lvis/annotations/'
+             'lvis_v1_minival_inserted_image_name.json',
+    metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(max_epochs=max_epochs),
+    checkpoint=dict(interval=save_epoch_intervals,
+                    rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=10,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={
+            'backbone.text_model':
+            dict(lr_mult=0.0),
+            'logit_scale':
+            dict(weight_decay=0.0)
+        }),
+    constructor='YOLOWv5OptimizerConstructor')
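
Review note on the `backbone.text_model` entry in `custom_keys`. The sketch below assumes that `lr_mult` scales the base learning rate multiplicatively, as it does in MMEngine's `paramwise_cfg`; whether `YOLOWv5OptimizerConstructor` applies it in exactly the same way is not shown in this diff. The helper `effective_lr` is hypothetical and exists only for this illustration.

```python
def effective_lr(base_lr: float, lr_mult: float) -> float:
    """Assumed semantics: per-group learning rate = base_lr * lr_mult."""
    return base_lr * lr_mult


# Pretrain S configs in this commit: base_lr = 2e-3, lr_mult = 0.01.
pretrain_text_lr = effective_lr(2e-3, 0.01)

# Scale-up L configs in this commit: base_lr = 2e-4, lr_mult = 0.0.
scaleup_text_lr = effective_lr(2e-4, 0.0)

print(pretrain_text_lr, scaleup_text_lr)
```

If the assumption holds, the scale-up configs give the text model a learning rate of zero, which is consistent with `frozen_modules=['all']` on the CLIP text backbone.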

configs/scaleup/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1280.py ADDED

	@@ -0,0 +1,216 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 20  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-4
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 4
+img_scale = (1280, 1280)
+load_from = 'work_dirs/model_zoo/yolow-v8_l_clipv2_frozen_t2iv2_bn_o365_goldg_pretrain.pth'  # noqa
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='openai/clip-vit-base-patch32',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+mosaic_affine_transform = [
+    dict(type='MultiModalMosaic',
+         img_scale=img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-img_scale[0] // 2, -img_scale[1] // 2),
+        border_val=(114, 114, 114))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    *mosaic_affine_transform,
+    dict(
+        type='YOLOv5MultiModalMixUp',
+        prob=_base_.mixup_prob,
+        pre_transform=[*_base_.pre_transform,
+                       *mosaic_affine_transform]),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [
+    *_base_.pre_transform,
+    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+    dict(
+        type='LetterResize',
+        scale=img_scale,
+        allow_scale_up=True,
+        pad_val=dict(img=114.0)),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/mixed_grounding/',
+    ann_file='annotations/final_mixed_train_no_coco.json',
+    data_prefix=dict(img='gqa/images/'),
+    filter_cfg=dict(filter_empty_gt=False, min_size=32),
+    pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=dict(
+        _delete_=True,
+        type='ConcatDataset',
+        datasets=[
+            obj365v1_train_dataset,
+            flickr_train_dataset,
+            mg_train_dataset
+        ],
+        ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
+    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+    dict(
+        type='LetterResize',
+        scale=img_scale,
+        allow_scale_up=False,
+        pad_val=dict(img=114)),
+    dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5LVISV1Dataset',
+        data_root='data/lvis/',
+        test_mode=True,
+        ann_file='annotations/'
+                 'lvis_v1_minival_inserted_image_name.json',
+        data_prefix=dict(img=''),
+        batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(
+    type='mmdet.LVISMetric',
+    ann_file='data/lvis/annotations/'
+             'lvis_v1_minival_inserted_image_name.json',
+    metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(max_epochs=max_epochs),
+    checkpoint=dict(interval=save_epoch_intervals,
+                    rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=10,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={
+            'backbone.text_model':
+            dict(lr_mult=0.0),
+            'logit_scale':
+            dict(weight_decay=0.0)
+        }),
+    constructor='YOLOWv5OptimizerConstructor')

configs/scaleup/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1280_v2.py ADDED

	@@ -0,0 +1,216 @@

+_base_ = ('../../third_party/mmyolo/configs/yolov8/'
+          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
+custom_imports = dict(imports=['yolo_world'],
+                      allow_failed_imports=False)
+# hyper-parameters
+num_classes = 1203
+num_training_classes = 80
+max_epochs = 20  # Maximum training epochs
+close_mosaic_epochs = 2
+save_epoch_intervals = 2
+text_channels = 512
+neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
+neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
+base_lr = 2e-4
+weight_decay = 0.05 / 2
+train_batch_size_per_gpu = 6
+img_scale = (1280, 1280)
+load_from = 'work_dirs/yolo_world_l_t2i_bn_2e-4_20e_4x8gpus_obj365v1_goldg_train_lvis_minival_s1280/epoch_20.pth'  # noqa
+# model settings
+model = dict(
+    type='YOLOWorldDetector',
+    mm_neck=True,
+    num_train_classes=num_training_classes,
+    num_test_classes=num_classes,
+    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
+    backbone=dict(
+        _delete_=True,
+        type='MultiModalYOLOBackbone',
+        image_model={{_base_.model.backbone}},
+        text_model=dict(
+            type='HuggingCLIPLanguageBackbone',
+            model_name='openai/clip-vit-base-patch32',
+            frozen_modules=['all'])),
+    neck=dict(type='YOLOWorldPAFPN',
+              guide_channels=text_channels,
+              embed_channels=neck_embed_channels,
+              num_heads=neck_num_heads,
+              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
+              num_csp_blocks=2),
+    bbox_head=dict(type='YOLOWorldHead',
+                   head_module=dict(type='YOLOWorldHeadModule',
+                                    embed_dims=text_channels,
+                                    use_bn_head=True,
+                                    num_classes=num_training_classes)),
+    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
+# dataset settings
+text_transform = [
+    dict(type='RandomLoadText',
+         num_neg_samples=(num_classes, num_classes),
+         max_num_samples=num_training_classes,
+         padding_to_max=True,
+         padding_value=''),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
+                    'flip_direction', 'texts'))
+]
+mosaic_affine_transform = [
+    dict(type='MultiModalMosaic',
+         img_scale=img_scale,
+         pad_val=114.0,
+         pre_transform=_base_.pre_transform),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border=(-img_scale[0] // 2, -img_scale[1] // 2),
+        border_val=(114, 114, 114))
+]
+train_pipeline = [
+    *_base_.pre_transform,
+    *mosaic_affine_transform,
+    dict(
+        type='YOLOv5MultiModalMixUp',
+        prob=_base_.mixup_prob,
+        pre_transform=[*_base_.pre_transform,
+                       *mosaic_affine_transform]),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+train_pipeline_stage2 = [
+    *_base_.pre_transform,
+    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+    dict(
+        type='LetterResize',
+        scale=img_scale,
+        allow_scale_up=True,
+        pad_val=dict(img=114.0)),
+    dict(
+        type='YOLOv5RandomAffine',
+        max_rotate_degree=0.0,
+        max_shear_degree=0.0,
+        scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
+        max_aspect_ratio=_base_.max_aspect_ratio,
+        border_val=(114, 114, 114)),
+    *_base_.last_transform[:-1],
+    *text_transform,
+]
+obj365v1_train_dataset = dict(
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5Objects365V1Dataset',
+        data_root='data/objects365v1/',
+        ann_file='annotations/objects365_train.json',
+        data_prefix=dict(img='train/'),
+        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
+    class_text_path='data/captions/obj365v1_class_captions.json',
+    pipeline=train_pipeline)
+mg_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/mixed_grounding/',
+    ann_file='annotations/final_mixed_train_no_coco.json',
+    data_prefix=dict(img='gqa/images/'),
+    filter_cfg=dict(filter_empty_gt=False, min_size=32),
+    pipeline=train_pipeline)
+flickr_train_dataset = dict(
+    type='YOLOv5MixedGroundingDataset',
+    data_root='data/flickr/',
+    ann_file='annotations/final_flickr_separateGT_train.json',
+    data_prefix=dict(img='images/'),
+    filter_cfg=dict(filter_empty_gt=True, min_size=32),
+    pipeline=train_pipeline)
+train_dataloader = dict(
+    batch_size=train_batch_size_per_gpu,
+    collate_fn=dict(type='yolow_collate'),
+    dataset=dict(
+        _delete_=True,
+        type='ConcatDataset',
+        datasets=[
+            obj365v1_train_dataset,
+            flickr_train_dataset,
+            mg_train_dataset
+        ],
+        ignore_keys=['classes', 'palette']))
+test_pipeline = [
+    dict(type='LoadImageFromFile', backend_args=_base_.backend_args),
+    dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+    dict(
+        type='LetterResize',
+        scale=img_scale,
+        allow_scale_up=False,
+        pad_val=dict(img=114)),
+    dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
+    dict(type='LoadText'),
+    dict(type='mmdet.PackDetInputs',
+         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+                    'scale_factor', 'pad_param', 'texts'))
+]
+coco_val_dataset = dict(
+    _delete_=True,
+    type='MultiModalDataset',
+    dataset=dict(
+        type='YOLOv5LVISV1Dataset',
+        data_root='data/lvis/',
+        test_mode=True,
+        ann_file='annotations/'
+                 'lvis_v1_minival_inserted_image_name.json',
+        data_prefix=dict(img=''),
+        batch_shapes_cfg=None),
+    class_text_path='data/captions/lvis_v1_class_captions.json',
+    pipeline=test_pipeline)
+val_dataloader = dict(dataset=coco_val_dataset)
+test_dataloader = val_dataloader
+val_evaluator = dict(
+    type='mmdet.LVISMetric',
+    ann_file='data/lvis/annotations/'
+             'lvis_v1_minival_inserted_image_name.json',
+    metric='bbox')
+test_evaluator = val_evaluator
+# training settings
+default_hooks = dict(
+    param_scheduler=dict(max_epochs=max_epochs),
+    checkpoint=dict(interval=save_epoch_intervals,
+                    rule='greater'))
+custom_hooks = [
+    dict(type='EMAHook',
+         ema_type='ExpMomentumEMA',
+         momentum=0.0001,
+         update_buffers=True,
+         strict_load=False,
+         priority=49),
+    dict(type='mmdet.PipelineSwitchHook',
+         switch_epoch=max_epochs - close_mosaic_epochs,
+         switch_pipeline=train_pipeline_stage2)
+]
+train_cfg = dict(
+    max_epochs=max_epochs,
+    val_interval=10,
+    dynamic_intervals=[((max_epochs - close_mosaic_epochs),
+                        _base_.val_interval_stage2)])
+optim_wrapper = dict(optimizer=dict(
+    _delete_=True,
+    type='AdamW',
+    lr=base_lr,
+    weight_decay=weight_decay,
+    batch_size_per_gpu=train_batch_size_per_gpu),
+    paramwise_cfg=dict(
+        bias_decay_mult=0.0,
+        norm_decay_mult=0.0,
+        custom_keys={
+            'backbone.text_model':
+            dict(lr_mult=0.0),
+            'logit_scale':
+            dict(weight_decay=0.0)
+        }),
+    constructor='YOLOWv5OptimizerConstructor')
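
For review purposes, the derived values in this config can be recomputed by hand. The sketch below does so in plain Python. Two inputs are assumptions that this diff does not confirm: `last_stage_out_channels = 512` for the YOLOv8-L base config, and reading `4x8gpus` in the file name as 4 nodes with 8 GPUs each.

```python
# Values copied from the config above.
max_epochs = 20
close_mosaic_epochs = 2
train_batch_size_per_gpu = 6

# Assumption: value inherited from the YOLOv8-L base config (not shown here).
last_stage_out_channels = 512

neck_embed_channels = [128, 256, last_stage_out_channels // 2]
neck_num_heads = [4, 8, last_stage_out_channels // 2 // 32]
weight_decay = 0.05 / 2

# Epoch at which PipelineSwitchHook swaps in train_pipeline_stage2.
switch_epoch = max_epochs - close_mosaic_epochs

# Assumption: "4x8gpus" in the file name means 4 nodes with 8 GPUs each.
total_batch_size = 4 * 8 * train_batch_size_per_gpu

print(neck_embed_channels)  # [128, 256, 256]
print(neck_num_heads)       # [4, 8, 8]
print(weight_decay)         # 0.025
print(switch_epoch)         # 18
print(total_batch_size)     # 192
```

Under these assumptions, mosaic and mixup are active for the first 18 epochs and the simpler stage-2 pipeline runs for the last 2.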

deploy/__init__.py ADDED

	@@ -0,0 +1 @@


+from .models import *  # noqa

deploy/models/__init__.py ADDED

	@@ -0,0 +1,4 @@

+from .detectors import *  # noqa
+from .dense_heads import *  # noqa
+from .layers import *  # noqa
+from .necks import *  # noqa

docs/data.md ADDED

	@@ -0,0 +1,19 @@

+## Preparing Data for YOLO-World
+### Overview
+### Pre-training Data
+| Data | Samples | Type | Boxes | Annotations |
+| :-- | :-----: | :---:| :---: | :---------: |
+| Objects365v1 |  | detection | | |
+| GQA | | ground | | |
+| Flickr | | ground | | |

docs/deploy.md ADDED

File without changes

docs/install.md ADDED

File without changes

docs/training.md ADDED

File without changes

requirements.txt CHANGED

@@ -15,3 +15,4 @@ regex
 pot
 sentencepiece
 tokenizers


setup.py ADDED

	@@ -0,0 +1,190 @@

+# Copyright (c) Tencent Inc. All rights reserved.
+import os
+import os.path as osp
+import shutil
+import sys
+import warnings
+from setuptools import find_packages, setup
+def readme():
+    with open('README.md', encoding='utf-8') as f:
+        content = f.read()
+    return content
+def get_version():
+    version_file = 'yolo_world/version.py'
+    with open(version_file, 'r', encoding='utf-8') as f:
+        exec(compile(f.read(), version_file, 'exec'))
+    return locals()['__version__']
+def parse_requirements(fname='requirements.txt', with_version=True):
+    """Parse the package dependencies listed in a requirements file,
+    optionally stripping specific versioning information.
+    Args:
+        fname (str): path to requirements file
+        with_version (bool, default=True): if True include version specs
+    Returns:
+        List[str]: list of requirements items
+    CommandLine:
+        python -c "import setup; print(setup.parse_requirements())"
+    """
+    import re
+    import sys
+    from os.path import exists
+    require_fpath = fname
+    def parse_line(line):
+        """Parse information from a line in a requirements text file."""
+        if line.startswith('-r '):
+            # Allow specifying requirements in other files
+            target = line.split(' ')[1]
+            for info in parse_require_file(target):
+                yield info
+        else:
+            info = {'line': line}
+            if line.startswith('-e '):
+                info['package'] = line.split('#egg=')[1]
+            else:
+                # Remove versioning from the package
+                pat = '(' + '|'.join(['>=', '==', '>']) + ')'
+                parts = re.split(pat, line, maxsplit=1)
+                parts = [p.strip() for p in parts]
+                info['package'] = parts[0]
+                if len(parts) > 1:
+                    op, rest = parts[1:]
+                    if ';' in rest:
+                        # Handle platform specific dependencies
+                        # http://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-platform-specific-dependencies
+                        version, platform_deps = map(str.strip,
+                                                     rest.split(';'))
+                        info['platform_deps'] = platform_deps
+                    else:
+                        version = rest  # NOQA
+                    if '--' in version:
+                        # the `extras_require` doesn't accept options.
+                        version = version.split('--')[0].strip()
+                    info['version'] = (op, version)
+            yield info
+    def parse_require_file(fpath):
+        with open(fpath, 'r') as f:
+            for line in f.readlines():
+                line = line.strip()
+                if line and not line.startswith('#'):
+                    for info in parse_line(line):
+                        yield info
+    def gen_packages_items():
+        if exists(require_fpath):
+            for info in parse_require_file(require_fpath):
+                parts = [info['package']]
+                if with_version and 'version' in info:
+                    parts.extend(info['version'])
+                if not sys.version.startswith('3.4'):
+                    # apparently package_deps are broken in 3.4
+                    platform_deps = info.get('platform_deps')
+                    if platform_deps is not None:
+                        parts.append(';' + platform_deps)
+                item = ''.join(parts)
+                yield item
+    packages = list(gen_packages_items())
+    return packages
+def add_mim_extension():
+    """Add extra files that are required to support MIM into the package.
+    These files will be added by creating a symlink to the originals if the
+    package is installed in `editable` mode (e.g. pip install -e .), or by
+    copying from the originals otherwise.
+    """
+    # parse installation mode
+    if 'develop' in sys.argv:
+        # installed by `pip install -e .`
+        mode = 'symlink'
+    elif 'sdist' in sys.argv or 'bdist_wheel' in sys.argv:
+        # installed by `pip install .`
+        # or create source distribution by `python setup.py sdist`
+        mode = 'copy'
+    else:
+        return
+    filenames = ['tools', 'configs', 'model-index.yml', 'dataset-index.yml']
+    repo_path = osp.dirname(__file__)
+    mim_path = osp.join(repo_path, 'yolo_world', '.mim')
+    os.makedirs(mim_path, exist_ok=True)
+    for filename in filenames:
+        if osp.exists(filename):
+            src_path = osp.join(repo_path, filename)
+            tar_path = osp.join(mim_path, filename)
+            if osp.isfile(tar_path) or osp.islink(tar_path):
+                os.remove(tar_path)
+            elif osp.isdir(tar_path):
+                shutil.rmtree(tar_path)
+            if mode == 'symlink':
+                src_relpath = osp.relpath(src_path, osp.dirname(tar_path))
+                try:
+                    os.symlink(src_relpath, tar_path)
+                except OSError:
+                    # Creating a symbolic link on windows may raise an
+                    # `OSError: [WinError 1314]` due to privilege. If
+                    # the error happens, the src file will be copied
+                    mode = 'copy'
+                    warnings.warn(
+                        f'Failed to create a symbolic link for {src_relpath}, '
+                        f'and it will be copied to {tar_path}')
+                else:
+                    continue
+            if mode == 'copy':
+                if osp.isfile(src_path):
+                    shutil.copyfile(src_path, tar_path)
+                elif osp.isdir(src_path):
+                    shutil.copytree(src_path, tar_path)
+                else:
+                    warnings.warn(f'Cannot copy file {src_path}.')
+            else:
+                raise ValueError(f'Invalid mode {mode}')
+if __name__ == '__main__':
+    setup(
+        name='yolo_world',
+        version=get_version(),
+        description='YOLO-World: Real-time Open Vocabulary Object Detection',
+        long_description=readme(),
+        long_description_content_type='text/markdown',
+        keywords='object detection',
+        packages=find_packages(exclude=(
+            'data', 'third_party', 'tools')),
+        include_package_data=True,
+        python_requires='>=3.7',
+        classifiers=[
+            'Development Status :: 4 - Beta',
+            'License :: OSI Approved :: Apache Software License',
+            'Operating System :: OS Independent',
+            'Programming Language :: Python :: 3',
+            'Programming Language :: Python :: 3.7',
+            'Programming Language :: Python :: 3.8',
+            'Programming Language :: Python :: 3.9',
+            'Programming Language :: Python :: 3.10',
+            'Programming Language :: Python :: 3.11',
+            'Topic :: Scientific/Engineering :: Artificial Intelligence',
+        ],
+        author='Tencent AILab',
+        author_email='[email protected]',
+        license='Apache License 2.0',
+        install_requires=parse_requirements('requirements.txt'),
+        zip_safe=False)
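
The version-splitting step inside `parse_line` is easy to misread, so here is a minimal standalone sketch of it. The helper name `split_requirement` is hypothetical and not part of `setup.py`; it reuses the same regular expression and the same `;` handling as the code above.

```python
import re


def split_requirement(line):
    """Split one requirement line into (package, op, version, platform_deps).

    Hypothetical helper mirroring the regex logic of parse_line above.
    """
    # Same pattern as setup.py: only '>=', '==' and '>' are recognised.
    pat = '(' + '|'.join(['>=', '==', '>']) + ')'
    parts = [p.strip() for p in re.split(pat, line, maxsplit=1)]
    package, op, version, platform_deps = parts[0], None, None, None
    if len(parts) > 1:
        op, rest = parts[1:]
        if ';' in rest:
            # Environment marker after ';', e.g. python_version<"3.11"
            version, platform_deps = map(str.strip, rest.split(';'))
        else:
            version = rest
    return package, op, version, platform_deps


print(split_requirement('torch>=1.8.0'))
print(split_requirement('regex'))
print(split_requirement('numpy==1.23.5; python_version<"3.11"'))
```

Because the pattern lists only `>=`, `==` and `>`, a line such as `foo<2.0` is returned whole as the package name, with no operator or version split off.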

taiji/drun ADDED

	@@ -0,0 +1,35 @@

+#!/bin/bash
+DOCKER_IMAGE="mirrors.tencent.com/ronnysong_rd/fastdet:torch2.0.1-cuda11.7"
+if [ ! -n "$DEBUG" ]; then
+    COMMAND_PREFIX="pip3 install -e ."
+else
+    COMMAND_PREFIX="pip3 install -q -e third_party/mmengine;
+                    pip3 install -q -e third_party/mmdetection;
+                    pip3 install -q -e third_party/mmcv;
+                    pip3 install -q -e third_party/mmyolo;
+                    pip3 install -q -e ."
+fi
+sudo nvidia-docker run \
+    --rm \
+    -it \
+    -e NVIDIA_VISIBLE_DEVICES=all \
+    --env="DISPLAY" \
+    --env="QT_X11_NO_MITSHM=1" \
+    --volume="$HOME/.Xauthority:/root/.Xauthority:rw" \
+    --shm-size=20gb \
+    --network=host \
+    -v /apdcephfs/:/apdcephfs/ \
+    -v /apdcephfs_cq2/:/apdcephfs_cq2/ \
+    -v /apdcephfs_cq3/:/apdcephfs_cq3/ \
+    -v /data/:/data/ \
+    -w $PWD \
+    $DOCKER_IMAGE \
+    bash -c "export TRANSFORMERS_CACHE=$PWD/work_dirs/.cache/transformers;
+             export TORCH_HOME=$PWD/work_dirs/.cache/torch;
+             export CLIP_CACHE=$PWD/work_dirs/.cache/clip;
+             export HF_HOME=$PWD/work_dirs/.cache/hf;
+             export TOKENIZERS_PARALLELISM=false;
+             $COMMAND_PREFIX
+             $*"

taiji/erun ADDED

	@@ -0,0 +1,23 @@

+#!/bin/bash
+export NCCL_IB_GID_INDEX=3
+export TRANSFORMERS_CACHE=$PWD/work_dirs/.cache/transformers
+export TORCH_HOME=$PWD/work_dirs/.cache/torch
+export CLIP_CACHE=$PWD/work_dirs/.cache/clip
+export HF_HOME=$PWD/work_dirs/.cache/hf
+export TOKENIZERS_PARALLELISM=false
+export MKL_NUM_THREADS=1
+export OMP_NUM_THREADS=1
+export TORCH_DISTRIBUTED_DEBUG=INFO
+export HF_DATASETS_OFFLINE=1
+export TRANSFORMERS_OFFLINE=1
+export http_proxy="http://star-proxy.oa.com:3128"
+export https_proxy="http://star-proxy.oa.com:3128"
+export ftp_proxy="http://star-proxy.oa.com:3128"
+export no_proxy=".woa.com,mirrors.cloud.tencent.com,tlinux-mirror.tencent-cloud.com,tlinux-mirrorlist.tencent-cloud.com,localhost,127.0.0.1,mirrors-tlinux.tencentyun.com,.oa.com,.local,.3gqq.com,.7700.org,.ad.com,.ada_sixjoy.com,.addev.com,.app.local,.apps.local,.aurora.com,.autotest123.com,.bocaiwawa.com,.boss.com,.cdc.com,.cdn.com,.cds.com,.cf.com,.cjgc.local,.cm.com,.code.com,.datamine.com,.dvas.com,.dyndns.tv,.ecc.com,.expochart.cn,.expovideo.cn,.fms.com,.great.com,.hadoop.sec,.heme.com,.home.com,.hotbar.com,.ibg.com,.ied.com,.ieg.local,.ierd.com,.imd.com,.imoss.com,.isd.com,.isoso.com,.itil.com,.kao5.com,.kf.com,.kitty.com,.lpptp.com,.m.com,.matrix.cloud,.matrix.net,.mickey.com,.mig.local,.mqq.com,.oiweb.com,.okbuy.isddev.com,.oss.com,.otaworld.com,.paipaioa.com,.qqbrowser.local,.qqinternal.com,.qqwork.com,.rtpre.com,.sc.oa.com,.sec.com,.server.com,.service.com,.sjkxinternal.com,.sllwrnm5.cn,.sng.local,.soc.com,.t.km,.tcna.com,.teg.local,.tencentvoip.com,.tenpayoa.com,.test.air.tenpay.com,.tr.com,.tr_autotest123.com,.vpn.com,.wb.local,.webdev.com,.webdev2.com,.wizard.com,.wqq.com,.wsd.com,.sng.com,.music.lan,.mnet2.com,.tencentb2.com,.tmeoa.com,.pcg.com,www.wip3.adobe.com,www-mm.wip3.adobe.com,mirrors.tencent.com,csighub.tencentyun.com"
+sed -i 's/np.float/float/g' /usr/local/python/lib/python3.8/site-packages/lvis/eval.py
+touch /tmp/.unhold
+pip3 install -e .
+$*
+rm /tmp/.unhold
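
A note on the `sed` patch above: the pattern `np.float` is unanchored, so it also matches the prefix of names such as `np.float32`, and the unescaped `.` matches any character. Whether `lvis/eval.py` contains such names is not visible in this diff. Assuming the intent is only to replace the `np.float` alias (removed in NumPy 1.24), a stricter variant could look like the sketch below; `patch_np_float` is a hypothetical helper, not part of this repository.

```python
import re


def patch_np_float(source: str) -> str:
    """Replace the bare `np.float` alias with the builtin `float`.

    The word boundaries keep `np.float32`, `np.float64` and
    `np.floating` untouched; the escaped dot matches only a literal '.'.
    """
    return re.sub(r'\bnp\.float\b', 'float', source)


sample = 'a = np.float(1); b = np.float32(2); c = x.astype(np.float)'
print(patch_np_float(sample))
# a = float(1); b = np.float32(2); c = x.astype(float)
```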

taiji/etorchrun ADDED

	@@ -0,0 +1,51 @@

+#!/bin/bash
+if [ ! -n "$SH" ]; then
+    #export NCCL_IB_GID_INDEX=3
+    export NCCL_IB_DISABLE=1
+    export NCCL_P2P_DISABLE=1
+    export NCCL_SOCKET_IFNAME=eth1
+else
+    export NCCL_IB_GID_INDEX=3
+    export NCCL_IB_SL=3
+    export NCCL_CHECKS_DISABLE=1
+    export NCCL_P2P_DISABLE=0
+    export NCCL_IB_DISABLE=0
+    export NCCL_LL_THRESHOLD=16384
+    export NCCL_IB_CUDA_SUPPORT=1
+    export NCCL_SOCKET_IFNAME=bond1
+    export UCX_NET_DEVICES=bond1
+    export NCCL_IB_HCA=mlx5_bond_1,mlx5_bond_5,mlx5_bond_3,mlx5_bond_7,mlx5_bond_4,mlx5_bond_8,mlx5_bond_2,mlx5_bond_6
+    export NCCL_COLLNET_ENABLE=0
+    export SHARP_COLL_ENABLE_SAT=0
+    export NCCL_NET_GDR_LEVEL=2
+    export NCCL_IB_QPS_PER_CONNECTION=4
+    export NCCL_IB_TC=160
+    export NCCL_PXN_DISABLE=1
+    export GLOO_SOCKET_IFNAME=bond1
+    export NCCL_DEBUG=info
+fi
+export TRANSFORMERS_CACHE=$PWD/work_dirs/.cache/transformers
+export TORCH_HOME=$PWD/work_dirs/.cache/torch
+export CLIP_CACHE=$PWD/work_dirs/.cache/clip
+export HF_HOME=$PWD/work_dirs/.cache/hf
+export TOKENIZERS_PARALLELISM=false
+export MKL_NUM_THREADS=1
+export OMP_NUM_THREADS=1
+export TORCH_DISTRIBUTED_DEBUG=INFO
+export HF_DATASETS_OFFLINE=1
+export TRANSFORMERS_OFFLINE=1
+export http_proxy="http://star-proxy.oa.com:3128"
+export https_proxy="http://star-proxy.oa.com:3128"
+export ftp_proxy="http://star-proxy.oa.com:3128"
+export no_proxy=".woa.com,mirrors.cloud.tencent.com,tlinux-mirror.tencent-cloud.com,tlinux-mirrorlist.tencent-cloud.com,localhost,127.0.0.1,mirrors-tlinux.tencentyun.com,.oa.com,.local,.3gqq.com,.7700.org,.ad.com,.ada_sixjoy.com,.addev.com,.app.local,.apps.local,.aurora.com,.autotest123.com,.bocaiwawa.com,.boss.com,.cdc.com,.cdn.com,.cds.com,.cf.com,.cjgc.local,.cm.com,.code.com,.datamine.com,.dvas.com,.dyndns.tv,.ecc.com,.expochart.cn,.expovideo.cn,.fms.com,.great.com,.hadoop.sec,.heme.com,.home.com,.hotbar.com,.ibg.com,.ied.com,.ieg.local,.ierd.com,.imd.com,.imoss.com,.isd.com,.isoso.com,.itil.com,.kao5.com,.kf.com,.kitty.com,.lpptp.com,.m.com,.matrix.cloud,.matrix.net,.mickey.com,.mig.local,.mqq.com,.oiweb.com,.okbuy.isddev.com,.oss.com,.otaworld.com,.paipaioa.com,.qqbrowser.local,.qqinternal.com,.qqwork.com,.rtpre.com,.sc.oa.com,.sec.com,.server.com,.service.com,.sjkxinternal.com,.sllwrnm5.cn,.sng.local,.soc.com,.t.km,.tcna.com,.teg.local,.tencentvoip.com,.tenpayoa.com,.test.air.tenpay.com,.tr.com,.tr_autotest123.com,.vpn.com,.wb.local,.webdev.com,.webdev2.com,.wizard.com,.wqq.com,.wsd.com,.sng.com,.music.lan,.mnet2.com,.tencentb2.com,.tmeoa.com,.pcg.com,www.wip3.adobe.com,www-mm.wip3.adobe.com,mirrors.tencent.com,csighub.tencentyun.com"
+sed -i 's/np.float/float/g' /usr/local/python/lib/python3.8/site-packages/lvis/eval.py
+touch /tmp/.unhold
+pip3 install -e .
+torchrun --nnodes=$1 --nproc_per_node=$2 --node_rank=$INDEX --master_addr=$CHIEF_IP ${@:3}
+rm /tmp/.unhold
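
The final `torchrun` line of this script takes the node count and per-node process count from the first two positional arguments, the node rank and master address from the `INDEX` and `CHIEF_IP` environment variables, and passes everything else through. A minimal Python sketch of that argument mapping follows; `build_torchrun_cmd` is a hypothetical helper, and the script path, config name and IP address in the example are placeholders.

```python
import shlex


def build_torchrun_cmd(argv, env):
    """Mirror `torchrun --nnodes=$1 --nproc_per_node=$2 ... ${@:3}`."""
    nnodes, nproc_per_node, *rest = argv
    return [
        'torchrun',
        f'--nnodes={nnodes}',
        f'--nproc_per_node={nproc_per_node}',
        f'--node_rank={env["INDEX"]}',       # set by the cluster scheduler
        f'--master_addr={env["CHIEF_IP"]}',  # set by the cluster scheduler
        *rest,                               # training script and its args
    ]


cmd = build_torchrun_cmd(
    ['4', '8', 'train.py', 'config.py'],       # placeholder script and config
    {'INDEX': '0', 'CHIEF_IP': '10.0.0.1'},    # placeholder values
)
print(shlex.join(cmd))
```

Building the command as a list avoids the word-splitting surprises that unquoted `$1` and `${@:3}` can cause when an argument contains spaces.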

taiji/jizhi_run_vanilla ADDED

	@@ -0,0 +1,105 @@

+#!/bin/bash
+if [[ $1 = "--help" ]] || [[ $1 = "-h" ]]
+then
+    echo "Usage: jizhi_run NUM_MACHINES NUM_GPUS TASK_NAME <CMDS>"
+fi
+# user configuration
+TOKEN=$TOKEN
+if [ ! -n "$IMAGE_FULL_NAME" ]; then
+    IMAGE_FULL_NAME="mirrors.tencent.com/ronnysong_rd/fastdet:torch2.0.1-cuda11.7"
+fi
+if [ ! -n "$BUSINESS_FLAG" ]; then
+    BUSINESS_FLAG="TEG_AILab_CVC_chongqing"
+fi
+if [ ! -n "$CEPH_BUSINESS_FLAG" ]; then
+    CEPH_BUSINESS_FLAG="TEG_AILab_CVC_chongqing"
+fi
+if [ ! -n "$GPU_NAME" ]; then
+    GPU_NAME="V100"
+fi
+if [ ! -n "$PRIORITY_LEVEL" ]; then
+    PRIORITY_LEVEL="HIGH"
+fi
+if [ ! -n "$ELASTIC_LEVEL" ]; then
+    ELASTIC_LEVEL=1
+fi
+if [ ! -n "$RDMA" ]; then
+    RDMA="false"
+fi
+if [ ! -n "$CUDA" ]; then
+    CUDA="11.0"
+fi
+CMD_PATH="start.sh"
+CONF_PATH="jizhi_conf.json"
+ROOT_PATH=$PWD
+UUID=$(date +%s)
+rm -f $CMD_PATH
+echo 'cd '$ROOT_PATH >> $CMD_PATH
+echo 'export HF_HOME="'$ROOT_PATH'/work_dirs/.cache/hf"' >> $CMD_PATH
+echo 'export TORCH_HOME="'$ROOT_PATH'/work_dirs/.cache/torch"' >> $CMD_PATH
+echo 'export CLIP_CACHE="'$ROOT_PATH'/work_dirs/.cache/clip"' >> $CMD_PATH
+echo 'export TRANSFORMERS_CACHE="'$ROOT_PATH'/work_dirs/.cache/transformers"' >> $CMD_PATH
+echo 'export MKL_NUM_THREADS=1' >> $CMD_PATH
+echo 'export OMP_NUM_THREADS=1' >> $CMD_PATH
+echo 'export TOKENIZERS_PARALLELISM=false' >> $CMD_PATH
+echo 'export TORCH_DISTRIBUTED_DEBUG=INFO' >> $CMD_PATH
+echo 'export NCCL_IB_GID_INDEX=3' >> $CMD_PATH
+if [ $BUSINESS_FLAG = "TaiJi_HYAide_BUFFER_SH_A800H" ]; then
+    echo 'export NCCL_IB_GID_INDEX=3' >> $CMD_PATH
+    echo 'export NCCL_IB_SL=3' >> $CMD_PATH
+    echo 'export NCCL_CHECKS_DISABLE=1' >> $CMD_PATH
+    echo 'export NCCL_P2P_DISABLE=0' >> $CMD_PATH
+    echo 'export NCCL_IB_DISABLE=0' >> $CMD_PATH
+    echo 'export NCCL_LL_THRESHOLD=16384' >> $CMD_PATH
+    echo 'export NCCL_IB_CUDA_SUPPORT=1' >> $CMD_PATH
+    echo 'export NCCL_SOCKET_IFNAME=bond1' >> $CMD_PATH
+    echo 'export UCX_NET_DEVICES=bond1' >> $CMD_PATH
+    echo 'export NCCL_IB_HCA=mlx5_bond_1,mlx5_bond_5,mlx5_bond_3,mlx5_bond_7,mlx5_bond_4,mlx5_bond_8,mlx5_bond_2,mlx5_bond_6' >> $CMD_PATH
+    echo 'export NCCL_COLLNET_ENABLE=0' >> $CMD_PATH
+    echo 'export SHARP_COLL_ENABLE_SAT=0' >> $CMD_PATH
+    echo 'export NCCL_NET_GDR_LEVEL=2' >> $CMD_PATH
+    echo 'export NCCL_IB_QPS_PER_CONNECTION=4' >> $CMD_PATH
+    echo 'export NCCL_IB_TC=160' >> $CMD_PATH
+    echo 'export NCCL_PXN_DISABLE=1' >> $CMD_PATH
+fi
+echo ${@:4} >> $CMD_PATH
+chmod +x $CMD_PATH
+rm -f $CONF_PATH
+#INIT_CMD="jizhi_client mount -bf TEG_AILab_CVC_chongqing -tk $TOKEN"
+INIT_CMD=""
+echo '{' > $CONF_PATH
+echo '"Token": "'$TOKEN'",' >> $CONF_PATH
+echo '"business_flag": "'$BUSINESS_FLAG'",' >> $CONF_PATH
+echo '"model_local_file_path": "'$ROOT_PATH'/'$CMD_PATH'",' >> $CONF_PATH
+echo '"host_num": '$1',' >> $CONF_PATH
+echo '"host_gpu_num": '$2',' >> $CONF_PATH
+echo '"task_flag": "'$3'_'$UUID'",' >> $CONF_PATH
+echo '"priority_level": "'$PRIORITY_LEVEL'",' >> $CONF_PATH
+echo '"elastic_level": '$ELASTIC_LEVEL',' >> $CONF_PATH
+echo '"cuda_version": "'$CUDA'",' >> $CONF_PATH
+echo '"image_full_name": "'$IMAGE_FULL_NAME'",' >> $CONF_PATH
+echo '"GPUName": "'$GPU_NAME'",' >> $CONF_PATH
+echo '"mount_ceph_business_flag": "'$CEPH_BUSINESS_FLAG'",' >> $CONF_PATH
+echo '"exec_start_in_all_mpi_pods": true,' >> $CONF_PATH
+echo '"enable_rdma": '$RDMA',' >> $CONF_PATH
+echo '"init_cmd": "'$INIT_CMD'",' >> $CONF_PATH
+echo '"envs": {' >> $CONF_PATH
+echo '    "HUNYUAN_TASK_CATEGORY": "LLM",' >> $CONF_PATH
+echo '    "HUNYUAN_TASK_MODEL_TYPE": "SFT",' >> $CONF_PATH
+echo '    "HUNYUAN_TASK_DOMAIN": "NLP",' >> $CONF_PATH
+echo '    "HUNYUAN_TASK_START_MODEL_TYPE": "7B冷启"}' >> $CONF_PATH
+echo '}' >> $CONF_PATH
+jizhi_client start -scfg $CONF_PATH
+rm -f $CMD_PATH
+rm -f $CONF_PATH
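
The script above assembles `jizhi_conf.json` with a series of `echo` calls, which produces invalid JSON if any value contains a double quote or backslash. As a sketch of an alternative, the same structure can be built as a dictionary and serialised with `json.dumps`. `build_jizhi_conf` is a hypothetical helper; the keys and defaults are copied from the script, only a subset of the keys is shown, and how `jizhi_client` interprets them is not something this diff documents.

```python
import json


def build_jizhi_conf(token, num_machines, num_gpus, task_name, uuid,
                     root_path, cmd_path='start.sh',
                     business_flag='TEG_AILab_CVC_chongqing',
                     gpu_name='V100', priority_level='HIGH',
                     elastic_level=1, cuda='11.0', rdma=False):
    """Return the job configuration as a JSON string (subset of keys)."""
    conf = {
        'Token': token,
        'business_flag': business_flag,
        'model_local_file_path': f'{root_path}/{cmd_path}',
        'host_num': num_machines,
        'host_gpu_num': num_gpus,
        'task_flag': f'{task_name}_{uuid}',
        'priority_level': priority_level,
        'elastic_level': elastic_level,
        'cuda_version': cuda,
        'GPUName': gpu_name,
        'exec_start_in_all_mpi_pods': True,
        'enable_rdma': rdma,
    }
    # json.dumps handles quoting and escaping of every value.
    return json.dumps(conf, indent=2, ensure_ascii=False)


text = build_jizhi_conf('my-token', 2, 8, 'demo', 1700000000, '/work')
print(text)
```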

third_party/mmyolo/.circleci/config.yml ADDED

	@@ -0,0 +1,34 @@

+version: 2.1
+# this allows you to use CircleCI's dynamic configuration feature
+setup: true
+# the path-filtering orb is required to continue a pipeline based on
+# the path of an updated fileset
+orbs:
+  path-filtering: circleci/[email protected]
+workflows:
+  # the always-run workflow is always triggered, regardless of the pipeline parameters.
+  always-run:
+    jobs:
+      # the path-filtering/filter job determines which pipeline
+      # parameters to update.
+      - path-filtering/filter:
+          name: check-updated-files
+          # 3-column, whitespace-delimited mapping. One mapping per
+          # line:
+          # <regex path-to-test> <parameter-to-set> <value-of-pipeline-parameter>
+          mapping: |
+            mmyolo/.* lint_only false
+            requirements/.* lint_only false
+            tests/.* lint_only false
+            tools/.* lint_only false
+            configs/.* lint_only false
+            .circleci/.* lint_only false
+          base-revision: main
+          # this is the path of the configuration we should trigger once
+          # path filtering and pipeline parameter value updates are
+          # complete. In this case, we are using the parent dynamic
+          # configuration itself.
+          config-path: .circleci/test.yml

third_party/mmyolo/.circleci/docker/Dockerfile ADDED

	@@ -0,0 +1,11 @@

+ARG PYTORCH="1.8.1"
+ARG CUDA="10.2"
+ARG CUDNN="7"
+FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
+# To fix GPG key error when running apt-get update
+RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
+RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
+RUN apt-get update && apt-get install -y ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 libgl1-mesa-glx

third_party/mmyolo/.circleci/test.yml ADDED

	@@ -0,0 +1,213 @@

+version: 2.1
+# the default pipeline parameters, which will be updated according to
+# the results of the path-filtering orb
+parameters:
+  lint_only:
+    type: boolean
+    default: true
+jobs:
+  lint:
+    docker:
+      - image: cimg/python:3.7.4
+    steps:
+      - checkout
+      - run:
+          name: Install pre-commit hook
+          command: |
+            pip install pre-commit
+            pre-commit install
+      - run:
+          name: Linting
+          command: pre-commit run --all-files
+      - run:
+          name: Check docstring coverage
+          command: |
+            pip install interrogate
+            interrogate -v --ignore-init-method --ignore-module --ignore-nested-functions --ignore-magic --ignore-regex "__repr__" --fail-under 90 mmyolo
+  build_cpu:
+    parameters:
+      # The python version must match available image tags in
+      # https://circleci.com/developer/images/image/cimg/python
+      python:
+        type: string
+      torch:
+        type: string
+      torchvision:
+        type: string
+    docker:
+      - image: cimg/python:<< parameters.python >>
+    resource_class: large
+    steps:
+      - checkout
+      - run:
+          name: Install Libraries
+          command: |
+            sudo apt-get update
+            sudo apt-get install -y ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 libgl1-mesa-glx libjpeg-dev zlib1g-dev libtinfo-dev libncurses5
+      - run:
+          name: Configure Python & pip
+          command: |
+            pip install --upgrade pip
+            pip install wheel
+      - run:
+          name: Install PyTorch
+          command: |
+            python -V
+            pip install torch==<< parameters.torch >>+cpu torchvision==<< parameters.torchvision >>+cpu -f https://download.pytorch.org/whl/torch_stable.html
+      - run:
+          name: Install ONNXRuntime
+          command: |
+            pip install onnxruntime==1.8.1
+            wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
+            tar xvf onnxruntime-linux-x64-1.8.1.tgz
+      - run:
+          name: Install mmyolo dependencies
+          command: |
+            pip install -U openmim
+            mim install git+https://github.com/open-mmlab/mmengine.git@main
+            mim install 'mmcv >= 2.0.0'
+            mim install git+https://github.com/open-mmlab/[email protected]
+            pip install -r requirements/albu.txt
+            pip install -r requirements/tests.txt
+      - run:
+          name: Install mmdeploy
+          command: |
+            pip install setuptools
+            git clone -b dev-1.x --depth 1 https://github.com/open-mmlab/mmdeploy.git mmdeploy --recurse-submodules
+            wget https://github.com/Kitware/CMake/releases/download/v3.20.0/cmake-3.20.0-linux-x86_64.tar.gz
+            tar -xzvf cmake-3.20.0-linux-x86_64.tar.gz
+            sudo ln -sf $(pwd)/cmake-3.20.0-linux-x86_64/bin/* /usr/bin/
+            cd mmdeploy && mkdir build && cd build && cmake .. -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=/home/circleci/project/onnxruntime-linux-x64-1.8.1 && make -j8 && make install
+            export LD_LIBRARY_PATH=/home/circleci/project/onnxruntime-linux-x64-1.8.1/lib:${LD_LIBRARY_PATH}
+            cd /home/circleci/project/mmdeploy && python -m pip install -v -e .
+      - run:
+          name: Build and install
+          command: |
+            pip install -e .
+      - run:
+          name: Run unittests
+          command: |
+            export LD_LIBRARY_PATH=/home/circleci/project/onnxruntime-linux-x64-1.8.1/lib:${LD_LIBRARY_PATH}
+            pytest tests/
+#            coverage run --branch --source mmyolo -m pytest tests/
+#            coverage xml
+#            coverage report -m
+  build_cuda:
+    parameters:
+      torch:
+        type: string
+      cuda:
+        type: enum
+        enum: ["10.1", "10.2", "11.0", "11.7"]
+      cudnn:
+        type: integer
+        default: 7
+    machine:
+      image: ubuntu-2004-cuda-11.4:202110-01
+      # docker_layer_caching: true
+    resource_class: gpu.nvidia.small
+    steps:
+      - checkout
+      - run:
+          # Cloning repos in VM since Docker doesn't have access to the private key
+          name: Clone Repos
+          command: |
+            git clone -b main --depth 1 https://github.com/open-mmlab/mmengine.git /home/circleci/mmengine
+            git clone -b dev-3.x --depth 1 https://github.com/open-mmlab/mmdetection.git /home/circleci/mmdetection
+      - run:
+          name: Build Docker image
+          command: |
+            docker build .circleci/docker -t mmyolo:gpu --build-arg PYTORCH=<< parameters.torch >> --build-arg CUDA=<< parameters.cuda >> --build-arg CUDNN=<< parameters.cudnn >>
+            docker run --gpus all -t -d -v /home/circleci/project:/mmyolo -v /home/circleci/mmengine:/mmengine -v /home/circleci/mmdetection:/mmdetection -w /mmyolo --name mmyolo mmyolo:gpu
+      - run:
+          name: Install mmyolo dependencies
+          command: |
+            docker exec mmyolo pip install -U openmim
+            docker exec mmyolo mim install -e /mmengine
+            docker exec mmyolo mim install 'mmcv >= 2.0.0'
+            docker exec mmyolo pip install -e /mmdetection
+            docker exec mmyolo pip install -r requirements/albu.txt
+            docker exec mmyolo pip install -r requirements/tests.txt
+      - run:
+          name: Build and install
+          command: |
+            docker exec mmyolo pip install -e .
+      - run:
+          name: Run unittests
+          command: |
+            docker exec mmyolo pytest tests/
+workflows:
+  pr_stage_lint:
+    when: << pipeline.parameters.lint_only >>
+    jobs:
+      - lint:
+          name: lint
+          filters:
+            branches:
+              ignore:
+                - main
+  pr_stage_test:
+    when:
+      not: << pipeline.parameters.lint_only >>
+    jobs:
+      - lint:
+          name: lint
+          filters:
+            branches:
+              ignore:
+                - main
+      - build_cpu:
+          name: minimum_version_cpu
+          torch: 1.8.0
+          torchvision: 0.9.0
+          python: 3.8.0 # The lowest python 3.8.x version available on CircleCI images
+          requires:
+            - lint
+      - build_cpu:
+          name: maximum_version_cpu
+          # mmdeploy not supported
+#          torch: 2.0.0
+#          torchvision: 0.15.1
+          torch: 1.12.1
+          torchvision: 0.13.1
+          python: 3.9.0
+          requires:
+            - minimum_version_cpu
+      - hold:
+          type: approval
+          requires:
+            - maximum_version_cpu
+      - build_cuda:
+          name: mainstream_version_gpu
+          torch: 1.8.1
+          # Use double quotation mark to explicitly specify its type
+          # as string instead of number
+          cuda: "10.2"
+          requires:
+            - hold
+      - build_cuda:
+          name: maximum_version_gpu
+          torch: 2.0.0
+          cuda: "11.7"
+          cudnn: 8
+          requires:
+            - hold
+  merge_stage_test:
+    when:
+      not: << pipeline.parameters.lint_only >>
+    jobs:
+      - build_cuda:
+          name: minimum_version_gpu
+          torch: 1.7.0
+          # Use double quotation mark to explicitly specify its type
+          # as string instead of number
+          cuda: "11.0"
+          cudnn: 8
+          filters:
+            branches:
+              only:
+                - main
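
The `requires` keys in `pr_stage_test` form a dependency chain, with `hold` acting as a manual approval gate before the GPU jobs. The following is a minimal sketch, not part of the commit, that uses the standard-library `graphlib` module (Python 3.9+) to print one valid run order. The `requires` dict is transcribed by hand from the workflow above and `job_order` is a hypothetical helper; this is not how CircleCI schedules jobs internally.

```python
from graphlib import TopologicalSorter

# Job -> set of jobs it requires, transcribed by hand from pr_stage_test.
requires = {
    "lint": set(),
    "minimum_version_cpu": {"lint"},
    "maximum_version_cpu": {"minimum_version_cpu"},
    "hold": {"maximum_version_cpu"},  # manual approval step
    "mainstream_version_gpu": {"hold"},
    "maximum_version_gpu": {"hold"},
}


def job_order(graph):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(graph).static_order())


order = job_order(requires)
print(order)
```

The first four jobs always come out in the same order because each one requires the previous one. The two GPU jobs both depend only on `hold`, so they may appear in either order.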

third_party/mmyolo/.dev_scripts/gather_models.py ADDED

	@@ -0,0 +1,312 @@

+# Copyright (c) OpenMMLab. All rights reserved.
+import argparse
+import glob
+import os
+import os.path as osp
+import shutil
+import subprocess
+import time
+from collections import OrderedDict
+import torch
+import yaml
+from mmengine.config import Config
+from mmengine.fileio import dump
+from mmengine.utils import mkdir_or_exist, scandir
+def ordered_yaml_dump(data, stream=None, Dumper=yaml.SafeDumper, **kwds):
+    class OrderedDumper(Dumper):
+        pass
+    def _dict_representer(dumper, data):
+        return dumper.represent_mapping(
+            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items())
+    OrderedDumper.add_representer(OrderedDict, _dict_representer)
+    return yaml.dump(data, stream, OrderedDumper, **kwds)
+def process_checkpoint(in_file, out_file):
+    checkpoint = torch.load(in_file, map_location='cpu')
+    # remove optimizer for smaller file size
+    if 'optimizer' in checkpoint:
+        del checkpoint['optimizer']
+    if 'message_hub' in checkpoint:
+        del checkpoint['message_hub']
+    if 'ema_state_dict' in checkpoint:
+        del checkpoint['ema_state_dict']
+    for key in list(checkpoint['state_dict']):
+        if key.startswith('data_preprocessor'):
+            checkpoint['state_dict'].pop(key)
+        elif 'priors_base_sizes' in key:
+            checkpoint['state_dict'].pop(key)
+        elif 'grid_offset' in key:
+            checkpoint['state_dict'].pop(key)
+        elif 'prior_inds' in key:
+            checkpoint['state_dict'].pop(key)
+    # if it is necessary to remove some sensitive data in checkpoint['meta'],
+    # add the code here.
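+    # parse the version so the check does not rely on string ordering
+    torch_version = tuple(int(x) for x in torch.__version__.split('.')[:2])
+    if torch_version >= (1, 6):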
+        torch.save(checkpoint, out_file, _use_new_zipfile_serialization=False)
+    else:
+        torch.save(checkpoint, out_file)
+    sha = subprocess.check_output(['sha256sum', out_file]).decode()
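+    # drop a trailing '.pth' only if present; str.rstrip would strip any
+    # trailing '.', 'p', 't' or 'h' characters instead of the suffix
+    stem = out_file[:-len('.pth')] if out_file.endswith('.pth') else out_file
+    final_file = f'{stem}-{sha[:8]}.pth'
+    # move synchronously so the file exists when the caller uses the path
+    shutil.move(out_file, final_file)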
+    return final_file
+def is_by_epoch(config):
+    cfg = Config.fromfile('./configs/' + config)
+    return cfg.train_cfg.type == 'EpochBasedTrainLoop'
+def get_final_epoch_or_iter(config):
+    cfg = Config.fromfile('./configs/' + config)
+    if cfg.train_cfg.type == 'EpochBasedTrainLoop':
+        return cfg.train_cfg.max_epochs
+    else:
+        return cfg.train_cfg.max_iters
+def get_best_epoch_or_iter(exp_dir):
+    best_epoch_iter_full_path = list(
+        sorted(glob.glob(osp.join(exp_dir, 'best_*.pth'))))[-1]
+    best_epoch_or_iter_model_path = best_epoch_iter_full_path.split('/')[-1]
+    best_epoch_or_iter = best_epoch_or_iter_model_path. \
+        split('_')[-1].split('.')[0]
+    return best_epoch_or_iter_model_path, int(best_epoch_or_iter)
+def get_real_epoch_or_iter(config):
+    cfg = Config.fromfile('./configs/' + config)
+    if cfg.train_cfg.type == 'EpochBasedTrainLoop':
+        epoch = cfg.train_cfg.max_epochs
+        return epoch
+    else:
+        return cfg.train_cfg.max_iters
+def get_final_results(log_json_path,
+                      epoch_or_iter,
+                      results_lut='coco/bbox_mAP',
+                      by_epoch=True):
+    result_dict = dict()
+    with open(log_json_path) as f:
+        r = f.readlines()[-1]
+        last_metric = r.split(',')[0].split(': ')[-1].strip()
+    result_dict[results_lut] = last_metric
+    return result_dict
+def get_dataset_name(config):
+    # If there are more datasets, add them here.
+    name_map = dict(
+        CityscapesDataset='Cityscapes',
+        CocoDataset='COCO',
+        PoseCocoDataset='COCO Person',
+        YOLOv5CocoDataset='COCO',
+        CocoPanopticDataset='COCO',
+        YOLOv5DOTADataset='DOTA 1.0',
+        DeepFashionDataset='Deep Fashion',
+        LVISV05Dataset='LVIS v0.5',
+        LVISV1Dataset='LVIS v1',
+        VOCDataset='Pascal VOC',
+        YOLOv5VOCDataset='Pascal VOC',
+        WIDERFaceDataset='WIDER Face',
+        OpenImagesDataset='OpenImagesDataset',
+        OpenImagesChallengeDataset='OpenImagesChallengeDataset')
+    cfg = Config.fromfile('./configs/' + config)
+    return name_map[cfg.dataset_type]
+def find_last_dir(model_dir):
+    dst_times = []
+    for time_stamp in os.scandir(model_dir):
+        if osp.isdir(time_stamp):
+            dst_time = time.mktime(
+                time.strptime(time_stamp.name, '%Y%m%d_%H%M%S'))
+            dst_times.append([dst_time, time_stamp.name])
+    return max(dst_times, key=lambda x: x[0])[1]
+def convert_model_info_to_pwc(model_infos):
+    pwc_files = {}
+    for model in model_infos:
+        cfg_folder_name = osp.split(model['config'])[-2]
+        pwc_model_info = OrderedDict()
+        pwc_model_info['Name'] = osp.split(model['config'])[-1].split('.')[0]
+        pwc_model_info['In Collection'] = 'Please fill in Collection name'
+        pwc_model_info['Config'] = osp.join('configs', model['config'])
+        # get metadata
+        meta_data = OrderedDict()
+        if 'epochs' in model:
+            meta_data['Epochs'] = get_real_epoch_or_iter(model['config'])
+        else:
+            meta_data['Iterations'] = get_real_epoch_or_iter(model['config'])
+        pwc_model_info['Metadata'] = meta_data
+        # get dataset name
+        dataset_name = get_dataset_name(model['config'])
+        # get results
+        results = []
+        # if there are more metrics, add here.
+        if 'bbox_mAP' in model['results']:
+            metric = round(model['results']['bbox_mAP'] * 100, 1)
+            results.append(
+                OrderedDict(
+                    Task='Object Detection',
+                    Dataset=dataset_name,
+                    Metrics={'box AP': metric}))
+        if 'segm_mAP' in model['results']:
+            metric = round(model['results']['segm_mAP'] * 100, 1)
+            results.append(
+                OrderedDict(
+                    Task='Instance Segmentation',
+                    Dataset=dataset_name,
+                    Metrics={'mask AP': metric}))
+        if 'PQ' in model['results']:
+            metric = round(model['results']['PQ'], 1)
+            results.append(
+                OrderedDict(
+                    Task='Panoptic Segmentation',
+                    Dataset=dataset_name,
+                    Metrics={'PQ': metric}))
+        pwc_model_info['Results'] = results
+        link_string = 'https://download.openmmlab.com/mmyolo/v0/'
+        link_string += '{}/{}'.format(
+            osp.splitext(model['config'])[0],
+            osp.split(model['model_path'])[-1])
+        pwc_model_info['Weights'] = link_string
+        if cfg_folder_name in pwc_files:
+            pwc_files[cfg_folder_name].append(pwc_model_info)
+        else:
+            pwc_files[cfg_folder_name] = [pwc_model_info]
+    return pwc_files
+def parse_args():
+    parser = argparse.ArgumentParser(description='Gather benchmarked models')
+    parser.add_argument(
+        'root',
+        type=str,
+        help='root path of benchmarked models to be gathered')
+    parser.add_argument(
+        'out', type=str, help='output path of gathered models to be stored')
+    parser.add_argument(
+        '--best',
+        action='store_true',
+        help='whether to gather the best model.')
+    args = parser.parse_args()
+    return args
+# TODO: Refine
+def main():
+    args = parse_args()
+    models_root = args.root
+    models_out = args.out
+    mkdir_or_exist(models_out)
+    # find all models in the root directory to be gathered
+    raw_configs = list(scandir('./configs', '.py', recursive=True))
+    # filter out configs that were not trained in the experiments dir
+    used_configs = []
+    for raw_config in raw_configs:
+        if osp.exists(osp.join(models_root, raw_config)):
+            used_configs.append(raw_config)
+    print(f'Found {len(used_configs)} models to be gathered')
+    # find the final checkpoint and log file for each trained config
+    # and parse the best performance
+    model_infos = []
+    for used_config in used_configs:
+        exp_dir = osp.join(models_root, used_config)
+        by_epoch = is_by_epoch(used_config)
+        # check whether the experiment has finished
+        if args.best is True:
+            final_model, final_epoch_or_iter = get_best_epoch_or_iter(exp_dir)
+        else:
+            final_epoch_or_iter = get_final_epoch_or_iter(used_config)
+            final_model = '{}_{}.pth'.format('epoch' if by_epoch else 'iter',
+                                             final_epoch_or_iter)
+        model_path = osp.join(exp_dir, final_model)
+        # skip if the model is still training
+        if not osp.exists(model_path):
+            continue
+        # get the latest logs
+        latest_exp_name = find_last_dir(exp_dir)
+        latest_exp_json = osp.join(exp_dir, latest_exp_name, 'vis_data',
+                                   latest_exp_name + '.json')
+        model_performance = get_final_results(
+            latest_exp_json, final_epoch_or_iter, by_epoch=by_epoch)
+        if model_performance is None:
+            continue
+        model_info = dict(
+            config=used_config,
+            results=model_performance,
+            final_model=final_model,
+            latest_exp_json=latest_exp_json,
+            latest_exp_name=latest_exp_name)
+        model_info['epochs' if by_epoch else 'iterations'] = \
+            final_epoch_or_iter
+        model_infos.append(model_info)
+    # publish model for each checkpoint
+    publish_model_infos = []
+    for model in model_infos:
+        model_publish_dir = osp.join(models_out,
+                                     osp.splitext(model['config'])[0])
+        mkdir_or_exist(model_publish_dir)
+        model_name = osp.split(model['config'])[-1].split('.')[0]
+        model_name += '_' + model['latest_exp_name']
+        publish_model_path = osp.join(model_publish_dir, model_name)
+        trained_model_path = osp.join(models_root, model['config'],
+                                      model['final_model'])
+        # convert model
+        final_model_path = process_checkpoint(trained_model_path,
+                                              publish_model_path)
+        # copy log
+        shutil.copy(model['latest_exp_json'],
+                    osp.join(model_publish_dir, f'{model_name}.log.json'))
+        # copy config to guarantee reproducibility
+        config_path = model['config']
+        config_path = osp.join(
+            'configs',
+            config_path) if 'configs' not in config_path else config_path
+        target_config_path = osp.split(config_path)[-1]
+        shutil.copy(config_path, osp.join(model_publish_dir,
+                                          target_config_path))
+        model['model_path'] = final_model_path
+        publish_model_infos.append(model)
+    models = dict(models=publish_model_infos)
+    print(f'Gathered {len(publish_model_infos)} models in total')
+    dump(models, osp.join(models_out, 'model_info.json'))
+    pwc_files = convert_model_info_to_pwc(publish_model_infos)
+    for name in pwc_files:
+        with open(osp.join(models_out, name + '_metafile.yml'), 'w') as f:
+            ordered_yaml_dump(pwc_files[name], f, encoding='utf-8')
+if __name__ == '__main__':
+    main()
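
The script builds published file names by removing an extension and appending the first eight hex digits of a SHA-256 digest. The sketch below is not part of the commit. It illustrates why `str.rstrip` is a risky way to remove an extension and shows a suffix-safe alternative. `strip_suffix` and `hashed_filename` are hypothetical helper names, and the digest is taken over an in-memory byte string rather than a checkpoint file so that the example stays self-contained.

```python
import hashlib
import os.path as osp


def strip_suffix(name, suffix):
    """Remove `suffix` only when `name` really ends with it."""
    return name[:-len(suffix)] if suffix and name.endswith(suffix) else name


def hashed_filename(path, payload):
    """Insert the first 8 hex digits of sha256(payload) before '.pth'."""
    digest = hashlib.sha256(payload).hexdigest()
    return f"{strip_suffix(path, '.pth')}-{digest[:8]}.pth"


# str.rstrip treats its argument as a set of characters, not a suffix,
# so it keeps stripping trailing 'y', 'p' and '.' characters.
print("configs/yolov5/happy.py".rstrip(".py"))         # configs/yolov5/ha
print(strip_suffix("configs/yolov5/happy.py", ".py"))  # configs/yolov5/happy
print(osp.splitext("configs/yolov5/happy.py")[0])      # configs/yolov5/happy
print(hashed_filename("model.pth", b"weights"))
```

`osp.splitext` is the shorter choice when the file is known to carry an extension. The explicit `endswith` check is safer when the name may have no extension at all.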

third_party/mmyolo/.dev_scripts/print_registers.py ADDED

	@@ -0,0 +1,448 @@

+# Copyright (c) OpenMMLab. All rights reserved.
+import argparse
+import importlib
+import os
+import os.path as osp
+import pkgutil
+import sys
+import tempfile
+from multiprocessing import Pool
+from pathlib import Path
+import numpy as np
+import pandas as pd
+# host_addr = 'https://gitee.com/open-mmlab'
+host_addr = 'https://github.com/open-mmlab'
+tools_list = ['tools', '.dev_scripts']
+proxy_names = {
+    'mmdet': 'mmdetection',
+    'mmseg': 'mmsegmentation',
+    'mmcls': 'mmclassification'
+}
+merge_module_keys = {'mmcv': ['mmengine']}
+# exclude_prefix = {'mmcv': ['<class \'mmengine.model.']}
+exclude_prefix = {}
+markdown_title = '# Registries of the MM Series Open-Source Libraries\n'
+markdown_title += ('(Note: this document is generated automatically by the '
+                   '.dev_scripts/print_registers.py script)')
+def capitalize(repo_name):
+    lower = repo_name.lower()
+    if lower == 'mmcv':
+        return repo_name.upper()
+    elif lower.startswith('mm'):
+        return 'MM' + repo_name[2:]
+    return repo_name.capitalize()
+def mkdir_or_exist(dir_name, mode=0o777):
+    if dir_name == '':
+        return
+    dir_name = osp.expanduser(dir_name)
+    os.makedirs(dir_name, mode=mode, exist_ok=True)
+def parse_repo_name(repo_name):
+    proxy_names_rev = dict(zip(proxy_names.values(), proxy_names.keys()))
+    repo_name = proxy_names.get(repo_name, repo_name)
+    module_name = proxy_names_rev.get(repo_name, repo_name)
+    return repo_name, module_name
+def git_pull_branch(repo_name, branch_name='', pulldir='.'):
+    mkdir_or_exist(pulldir)
+    exec_str = f'cd {pulldir};git init;git pull '
+    exec_str += f'{host_addr}/{repo_name}.git'
+    if branch_name:
+        exec_str += f' {branch_name}'
+    returncode = os.system(exec_str)
+    if returncode:
+        raise RuntimeError(
+            f'failed to get the remote repo, code: {returncode}')
+def load_modules_from_dir(module_name, module_root, throw_error=False):
+    print(f'loading the {module_name} modules...')
+    # # install the dependencies
+    # if osp.exists(osp.join(pkg_dir, 'requirements.txt')):
+    #     os.system('pip install -r requirements.txt')
+    # get all module list
+    module_list = []
+    error_dict = {}
+    module_root = osp.join(module_root, module_name)
+    assert osp.exists(module_root), \
+        f'cannot find the module root: {module_root}'
+    for _root, _dirs, _files in os.walk(module_root):
+        if (('__init__.py' not in _files)
+                and (osp.split(_root)[1] != '__pycache__')):
+            # add __init__.py file to the package
+            with open(osp.join(_root, '__init__.py'), 'w') as _:
+                pass
+    def _onerror(*args, **kwargs):
+        pass
+    for _finder, _name, _ispkg in pkgutil.walk_packages([module_root],
+                                                        prefix=module_name +
+                                                        '.',
+                                                        onerror=_onerror):
+        try:
+            module = importlib.import_module(_name)
+            module_list.append(module)
+        except Exception as e:
+            if throw_error:
+                raise e
+            _error_msg = f'{type(e)}: {e}.'
+            print(f'cannot import the module: {_name} ({_error_msg})')
+            assert (_name not in error_dict), \
+                f'duplicate error name was found: {_name}'
+            error_dict[_name] = _error_msg
+    for module in module_list:
+        assert module.__file__.startswith(module_root), \
+            f'the importing path of package was wrong: {module.__file__}'
+    print('modules were loaded...')
+    return module_list, error_dict
+def get_registries_from_modules(module_list):
+    registries = {}
+    objects_set = set()
+    # import the Registry class here rather than at the top of the file:
+    # a top-level import would not resolve to the temporary package
+    # that was just pulled
+    from mmengine.registry import Registry
+    # only get the specific registries in module list
+    for module in module_list:
+        for obj_name in dir(module):
+            _obj = getattr(module, obj_name)
+            if isinstance(_obj, Registry):
+                objects_set.add(_obj)
+    for _obj in objects_set:
+        if _obj.scope not in registries:
+            registries[_obj.scope] = {}
+        registries_scope = registries[_obj.scope]
+        assert _obj.name not in registries_scope, \
+            f'multiple definition of {_obj.name} in registries'
+        registries_scope[_obj.name] = {
+            key: str(val)
+            for key, val in _obj.module_dict.items()
+        }
+    print('registries got...')
+    return registries
+def merge_registries(src_dict, dst_dict):
+    assert type(src_dict) == type(dst_dict), \
+        (f'merge type is not supported: '
+         f'{type(dst_dict)} and {type(src_dict)}')
+    if isinstance(src_dict, str):
+        return
+    for _k, _v in dst_dict.items():
+        if (_k not in src_dict):
+            src_dict.update({_k: _v})
+        else:
+            assert isinstance(_v, (dict, str)) and \
+                isinstance(src_dict[_k], (dict, str)), \
+                'merge type is not supported: ' \
+                f'{type(_v)} and {type(src_dict[_k])}'
+            merge_registries(src_dict[_k], _v)
+def exclude_registries(registries, exclude_key):
+    for _k in list(registries.keys()):
+        _v = registries[_k]
+        if isinstance(_v, str) and _v.startswith(exclude_key):
+            registries.pop(_k)
+        elif isinstance(_v, dict):
+            exclude_registries(_v, exclude_key)
+def get_scripts_from_dir(root):
+    def _recurse(_dict, _chain):
+        if len(_chain) <= 1:
+            _dict[_chain[0]] = None
+            return
+        _key, *_chain = _chain
+        if _key not in _dict:
+            _dict[_key] = {}
+        _recurse(_dict[_key], _chain)
+    # find all files under the root directory (not just '.py' and '.sh').
+    # mmengine's scandir cannot be used here because mmengine must not be
+    # imported before git pull
+    scripts = {}
+    for _subroot, _dirs, _files in os.walk(root):
+        for _file in _files:
+            _script = osp.join(osp.relpath(_subroot, root), _file)
+            _recurse(scripts, Path(_script).parts)
+    return scripts
+def get_version_from_module_name(module_name, branch):
+    branch_str = str(branch) if branch is not None else ''
+    version_str = ''
+    try:
+        # import by name without exec/eval on a formatted string
+        _module = importlib.import_module(module_name)
+        if hasattr(_module, '__version__'):
+            version_str = str(_module.__version__)
+        else:
+            version_str = branch_str
+        version_str = f' ({version_str})' if version_str else version_str
+    except (ImportError, AttributeError) as e:
+        print(f'can not get the version of module {module_name}: {e}')
+    return version_str
+def print_tree(print_dict):
+    # recursively print the dict tree
+    def _recurse(_dict, _connector='', n=0):
+        assert isinstance(_dict, dict), 'recursive type must be dict'
+        tree = ''
+        for idx, (_key, _val) in enumerate(_dict.items()):
+            sub_tree = ''
+            _last = (idx == (len(_dict) - 1))
+            if isinstance(_val, str):
+                _key += f' ({_val})'
+            elif isinstance(_val, dict):
+                sub_tree = _recurse(_val,
+                                    _connector + ('   ' if _last else '│  '),
+                                    n + 1)
+            else:
+                assert (_val is None), f'unknown print type {_val}'
+            tree += '  ' + _connector + \
+                    ('└─' if _last else '├─') + f'({n}) {_key}' + '\n'
+            tree += sub_tree
+        return tree
+    for _pname, _pdict in print_dict.items():
+        print('-' * 100)
+        print(f'{_pname}\n' + _recurse(_pdict))
+def divide_list_into_groups(_array, _maxsize_per_group):
+    if not _array:
+        return _array
+    _groups = np.asarray(len(_array) / _maxsize_per_group)
+    if len(_array) % _maxsize_per_group:
+        _groups = np.floor(_groups) + 1
+    _groups = _groups.astype(int)
+    return np.array_split(_array, _groups)
+def registries_to_html(registries, title=''):
+    max_col_per_row = 5
+    max_size_per_cell = 20
+    html = ''
+    table_data = []
+    # save repository registries
+    for registry_name, registry_dict in registries.items():
+        # filter the empty registries
+        if not registry_dict:
+            continue
+        registry_strings = []
+        if isinstance(registry_dict, dict):
+            registry_dict = list(registry_dict.keys())
+        elif isinstance(registry_dict, list):
+            pass
+        else:
+            raise TypeError(
+                f'unknown type of registry_dict {type(registry_dict)}')
+        for _k in registry_dict:
+            registry_strings.append(f'<li>{_k}</li>')
+        table_data.append((registry_name, registry_strings))
+    # sort the data list
+    table_data = sorted(table_data, key=lambda x: len(x[1]))
+    # split multi parts
+    table_data_multi_parts = []
+    for (registry_name, registry_strings) in table_data:
+        multi_parts = False
+        if len(registry_strings) > max_size_per_cell:
+            multi_parts = True
+        for cell_idx, registry_cell in enumerate(
+                divide_list_into_groups(registry_strings, max_size_per_cell)):
+            registry_str = ''.join(registry_cell.tolist())
+            registry_str = f'<ul>{registry_str}</ul>'
+            table_data_multi_parts.append([
+                registry_name if not multi_parts else
+                f'{registry_name} (part {cell_idx + 1})', registry_str
+            ])
+    for table_data in divide_list_into_groups(table_data_multi_parts,
+                                              max_col_per_row):
+        table_data = list(zip(*table_data.tolist()))
+        html += dataframe_to_html(
+            pd.DataFrame([table_data[1]], columns=table_data[0]))
+    if html:
+        html = f'<div align=\'center\'><b>{title}</b></div>\n{html}'
+        html = f'<details open>{html}</details>\n'
+    return html
+def tools_to_html(tools_dict, repo_name=''):
+    def _recurse(_dict, _connector, _result):
+        assert isinstance(_dict, dict), \
+            f'unknown recurse type: {_dict} ({type(_dict)})'
+        for _k, _v in _dict.items():
+            if _v is None:
+                if _connector not in _result:
+                    _result[_connector] = []
+                _result[_connector].append(_k)
+            else:
+                _recurse(_v, osp.join(_connector, _k), _result)
+    table_data = {}
+    title = f'{capitalize(repo_name)} Tools'
+    _recurse(tools_dict, '', table_data)
+    return registries_to_html(table_data, title)
+def dataframe_to_html(dataframe):
+    styler = dataframe.style
+    styler = styler.hide(axis='index')
+    styler = styler.format(na_rep='-')
+    styler = styler.set_properties(**{
+        'text-align': 'left',
+        'align': 'center',
+        'vertical-align': 'top'
+    })
+    styler = styler.set_table_styles([{
+        'selector':
+        'thead th',
+        'props':
+        'align:center;text-align:center;vertical-align:bottom'
+    }])
+    html = styler.to_html()
+    html = f'<div align=\'center\'>\n{html}</div>'
+    return html
+def generate_markdown_by_repository(repo_name,
+                                    module_name,
+                                    branch,
+                                    pulldir,
+                                    throw_error=False):
+    # add the pull dir to the system path so that it can be found
+    if pulldir not in sys.path:
+        sys.path.insert(0, pulldir)
+    module_list, error_dict = load_modules_from_dir(
+        module_name, pulldir, throw_error=throw_error)
+    registries_tree = get_registries_from_modules(module_list)
+    if error_dict:
+        error_dict_name = 'error_modules'
+        assert (error_dict_name not in registries_tree), \
+            f'duplicate module name was found: {error_dict_name}'
+        registries_tree.update({error_dict_name: error_dict})
+    # get the tools files
+    for tools_name in tools_list:
+        assert (tools_name not in registries_tree), \
+            f'duplicate tools name was found: {tools_name}'
+        tools_tree = osp.join(pulldir, tools_name)
+        tools_tree = get_scripts_from_dir(tools_tree)
+        registries_tree.update({tools_name: tools_tree})
+    # print_tree(registries_tree)
+    # get registries markdown string
+    module_registries = registries_tree.get(module_name, {})
+    for merge_key in merge_module_keys.get(module_name, []):
+        merge_dict = registries_tree.get(merge_key, {})
+        merge_registries(module_registries, merge_dict)
+    for exclude_key in exclude_prefix.get(module_name, []):
+        exclude_registries(module_registries, exclude_key)
+    markdown_str = registries_to_html(
+        module_registries, title=f'{capitalize(repo_name)} Module Components')
+    # get tools markdown string
+    tools_registries = {}
+    for tools_name in tools_list:
+        tools_registries.update(
+            {tools_name: registries_tree.get(tools_name, {})})
+    markdown_str += tools_to_html(tools_registries, repo_name=repo_name)
+    version_str = get_version_from_module_name(module_name, branch)
+    title_str = f'\n\n## {capitalize(repo_name)}{version_str}\n'
+    # remove the pull dir from system path
+    if pulldir in sys.path:
+        sys.path.remove(pulldir)
+    return f'{title_str}{markdown_str}'
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='print registries in openmmlab repositories')
+    parser.add_argument(
+        '-r',
+        '--repositories',
+        nargs='+',
+        default=['mmdet', 'mmcls', 'mmseg', 'mmengine', 'mmcv'],
+        type=str,
+        help='git repositories name in OpenMMLab')
+    parser.add_argument(
+        '-b',
+        '--branches',
+        nargs='+',
+        default=['3.x', '1.x', '1.x', 'main', '2.x'],
+        type=str,
+        help='the branch names of git repositories, the length of branches '
+        'must be same as the length of repositories')
+    parser.add_argument(
+        '-o', '--out', type=str, default='.', help='output path of the file')
+    parser.add_argument(
+        '--throw-error',
+        action='store_true',
+        default=False,
+        help='whether to throw error when trying to import modules')
+    args = parser.parse_args()
+    return args
+# TODO: Refine
+def main():
+    args = parse_args()
+    repositories = args.repositories
+    branches = args.branches
+    assert isinstance(repositories, list), \
+        'Type of repositories must be list'
+    if branches is None:
+        branches = [None] * len(repositories)
+    assert isinstance(branches, list) and \
+           len(branches) == len(repositories), \
+           'The length of branches must be same as ' \
+           'that of repositories'
+    assert isinstance(args.out, str), \
+        'The type of output path must be string'
+    # save path of file
+    mkdir_or_exist(args.out)
+    save_path = osp.join(args.out, 'registries_info.md')
+    with tempfile.TemporaryDirectory() as tmpdir:
+        # multi process init
+        pool = Pool(processes=len(repositories))
+        multi_proc_input_list = []
+        multi_proc_output_list = []
+        # get the git repositories
+        for branch, repository in zip(branches, repositories):
+            repo_name, module_name = parse_repo_name(repository)
+            pulldir = osp.join(tmpdir, f'tmp_{repo_name}')
+            git_pull_branch(
+                repo_name=repo_name, branch_name=branch, pulldir=pulldir)
+            multi_proc_input_list.append(
+                (repo_name, module_name, branch, pulldir, args.throw_error))
+        print('starting the multi process to get the registries')
+        for multi_proc_input in multi_proc_input_list:
+            multi_proc_output_list.append(
+                pool.apply_async(generate_markdown_by_repository,
+                                 multi_proc_input))
+        pool.close()
+        pool.join()
+        with open(save_path, 'w', encoding='utf-8') as fw:
+            fw.write(f'{markdown_title}\n')
+            for multi_proc_output in multi_proc_output_list:
+                markdown_str = multi_proc_output.get()
+                fw.write(f'{markdown_str}\n')
+    print(f'saved registries to the path: {save_path}')
+if __name__ == '__main__':
+    main()
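
`divide_list_into_groups` computes the number of groups by ceiling division and then lets `np.array_split` balance the items across them, so cells are evened out rather than filled to the maximum. The sketch below is not part of the commit. It is a pure-Python approximation of that behavior under the assumption that only group sizes and order matter; `split_into_groups` is a hypothetical name, and it returns plain lists rather than NumPy arrays.

```python
import math


def split_into_groups(items, max_per_group):
    """Split `items` into the fewest balanced groups of at most
    `max_per_group` elements, mirroring np.array_split's sizing:
    the first (n % k) groups get one extra element."""
    if not items:
        return []
    n_groups = math.ceil(len(items) / max_per_group)
    base, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(items[start:start + size])
        start += size
    return groups


print([len(g) for g in split_into_groups(list(range(45)), 20)])  # [15, 15, 15]
print([len(g) for g in split_into_groups(list(range(7)), 3)])    # [3, 2, 2]
```

With 45 registry entries and a cell limit of 20, the result is three parts of 15 entries each, not 20, 20 and 5.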

third_party/mmyolo/.github/CODE_OF_CONDUCT.md ADDED

	@@ -0,0 +1,76 @@

+# Contributor Covenant Code of Conduct
+## Our Pledge
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to making participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+## Our Standards
+Examples of behavior that contributes to creating a positive environment
+include:
+- Using welcoming and inclusive language
+- Being respectful of differing viewpoints and experiences
+- Gracefully accepting constructive criticism
+- Focusing on what is best for the community
+- Showing empathy towards other community members
+Examples of unacceptable behavior by participants include:
+- The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+- Trolling, insulting/derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+- Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+## Our Responsibilities
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+## Scope
+This Code of Conduct applies both within project spaces and in public spaces
+when an individual is representing the project or its community. Examples of
+representing a project or community include using an official project e-mail
+address, posting via an official social media account, or acting as an appointed
+representative at an online or offline event. Representation of a project may be
+further defined and clarified by project maintainers.
+## Enforcement
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at [email protected]. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+## Attribution
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq
+[homepage]: https://www.contributor-covenant.org

third_party/mmyolo/.github/CONTRIBUTING.md ADDED

	@@ -0,0 +1 @@


+We appreciate all contributions to improve MMYOLO. Please refer to [CONTRIBUTING.md](https://github.com/open-mmlab/mmcv/blob/master/CONTRIBUTING.md) in MMCV for more details about the contributing guideline.

third_party/mmyolo/.github/ISSUE_TEMPLATE/1-bug-report.yml ADDED

	@@ -0,0 +1,67 @@

+name: "🐞 Bug report"
+description: "Create a report to help us reproduce and fix the bug"
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thank you for reporting this issue to help us improve!
+        If you have already identified the reason, we strongly appreciate you creating a new PR to fix it [here](https://github.com/open-mmlab/mmyolo/pulls)!
+        If this issue is about installing MMCV, please file an issue at [MMCV](https://github.com/open-mmlab/mmcv/issues/new/choose).
+        If you need our help, please fill in as much of the following form as you're able.
+  - type: checkboxes
+    attributes:
+      label: Prerequisite
+      description: Please check the following items before creating a new issue.
+      options:
+      - label: I have searched [the existing and past issues](https://github.com/open-mmlab/mmyolo/issues) but cannot get the expected help.
+        required: true
+      - label: I have read the [FAQ documentation](https://mmyolo.readthedocs.io/en/latest/faq.html) but cannot get the expected help.
+        required: true
+      - label: The bug has not been fixed in the [latest version](https://github.com/open-mmlab/mmyolo).
+        required: true
+  - type: textarea
+    attributes:
+      label: 🐞 Describe the bug
+      description: |
+        Please provide a clear and concise description of what the bug is.
+        Preferably include a simple, minimal code snippet that reproduces the error when run.
+      placeholder: |
+        A clear and concise description of what the bug is.
+        ```python
+        # Sample code to reproduce the problem
+        ```
+        ```shell
+        The command or script you run.
+        ```
+        ```
+        The error message or logs you got, with the full traceback.
+        ```
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Environment
+      description: |
+        Please run `python mmyolo/utils/collect_env.py` to collect necessary environment information and paste it here.
+        You may add additional information that may be helpful for locating the problem, such as
+          - How you installed PyTorch \[e.g., pip, conda, source\]
+          - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Additional information
+      description: Tell us anything else you think we should know.
+      placeholder: |
+        1. Did you make any modifications on the code or config? Did you understand what you have modified?
+        2. What dataset did you use?
+        3. What do you think might be the reason?

third_party/mmyolo/.github/ISSUE_TEMPLATE/2-feature-request.yml ADDED

	@@ -0,0 +1,32 @@

+name: 🚀 Feature request
+description: Suggest an idea for this project
+labels: [feature request]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thank you for suggesting an idea to make MMYOLO better.
+        We strongly appreciate you creating a PR to implement this feature [here](https://github.com/open-mmlab/mmyolo/pulls)!
+        If you need our help, please fill in as much of the following form as you're able.
+  - type: textarea
+    attributes:
+      label: What is the problem this feature will solve?
+      placeholder: |
+        E.g., It is inconvenient when \[....\].
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: What is the feature you are proposing to solve the problem?
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: What alternatives have you considered?
+      description: |
+        Add any other context or screenshots about the feature request here.

third_party/mmyolo/.github/ISSUE_TEMPLATE/3-new-model.yml ADDED

	@@ -0,0 +1,30 @@

+name: "\U0001F31F New model/dataset addition"
+description: Submit a proposal/request to implement a new model / dataset
+labels: [ "New model/dataset" ]
+body:
+  - type: textarea
+    id: description-request
+    validations:
+      required: true
+    attributes:
+      label: Model/Dataset description
+      description: |
+        Put any and all important information relative to the model/dataset
+  - type: checkboxes
+    attributes:
+      label: Open source status
+      description: |
+          Please provide the open-source status, which would be very helpful
+      options:
+        - label: "The model implementation is available"
+        - label: "The model weights are available."
+  - type: textarea
+    id: additional-info
+    attributes:
+      label: Provide useful links for the implementation
+      description: |
+        Please provide information regarding the implementation, the weights, and the authors.
+        Please mention the authors by @gh-username if you're aware of their usernames.

third_party/mmyolo/.github/ISSUE_TEMPLATE/4-documentation.yml ADDED

	@@ -0,0 +1,22 @@

+name: 📚 Documentation
+description: Report an issue related to https://mmyolo.readthedocs.io/en/latest/.
+body:
+- type: textarea
+  attributes:
+    label: 📚 The doc issue
+    description: >
+      A clear and concise description of what content in https://mmyolo.readthedocs.io/en/latest/ is an issue.
+  validations:
+    required: true
+- type: textarea
+  attributes:
+    label: Suggest a potential alternative/fix
+    description: >
+      Tell us how we could improve the documentation in this regard.
+- type: markdown
+  attributes:
+    value: >
+      Thanks for contributing 🎉!

third_party/mmyolo/.github/ISSUE_TEMPLATE/5-reimplementation.yml ADDED

	@@ -0,0 +1,87 @@

+name: "💥 Reimplementation Questions"
+description: "Ask questions about model reimplementation"
+body:
+  - type: markdown
+    attributes:
+      value: |
+        If you have already identified the reason, we strongly appreciate you creating a new PR to fix it [here](https://github.com/open-mmlab/mmyolo/pulls)!
+  - type: checkboxes
+    attributes:
+      label: Prerequisite
+      description: Please check the following items before creating a new issue.
+      options:
+      - label: I have searched [the existing and past issues](https://github.com/open-mmlab/mmyolo/issues) but cannot get the expected help.
+        required: true
+      - label: I have read the [FAQ documentation](https://mmyolo.readthedocs.io/en/latest/faq.html) but cannot get the expected help.
+        required: true
+      - label: The bug has not been fixed in the [latest version](https://github.com/open-mmlab/mmyolo).
+        required: true
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: 💬 Describe the reimplementation questions
+      description: |
+        A clear and concise description of the problem you met and what you have done.
+        There are several common situations in reimplementation issues, as listed below:
+        1. Reimplement a model in the model zoo using the provided configs
+        2. Reimplement a model in the model zoo on other dataset (e.g., custom datasets)
+        3. Reimplement a custom model but all the components are implemented in MMDetection
+        4. Reimplement a custom model with new modules implemented by yourself
+        What to do differs by case, as described below.
+        - For cases 1 & 3, please follow the steps in the following sections so that we can help identify the issue quickly.
+        - For cases 2 & 4, please understand that we cannot help much here, because we usually do not know the full code and users are responsible for the code they write.
+        - One suggestion for cases 2 & 4 is that users should first check whether the bug lies in the self-implemented code or the original code. For example, users can first make sure that the same model runs well on supported datasets. If you still need help, please describe in the issue what you have done and what you obtained, follow the steps in the following sections, and be as clear as possible so that we can better help you.
+      placeholder: |
+        A clear and concise description of what the bug is.
+        What config did you run?
+        ```none
+        A placeholder for the config.
+        ```
+        ```shell
+        The command or script you run.
+        ```
+        ```
+        The error message or logs you got, with the full traceback.
+        ```
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Environment
+      description: |
+         Please run `python mmyolo/utils/collect_env.py` to collect necessary environment information and paste it here.
+         You may add additional information that may be helpful for locating the problem, such as
+            - How you installed PyTorch \[e.g., pip, conda, source\]
+            - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)
+    validations:
+      required: true
+  - type: textarea
+    attributes:
+      label: Expected results
+      description: If applicable, paste the related results here, e.g., what you expect and what you get.
+      placeholder: |
+         ```none
+         A placeholder for results comparison
+         ```
+  - type: textarea
+    attributes:
+      label: Additional information
+      description: Tell us anything else you think we should know.
+      placeholder: |
+        1. Did you make any modifications on the code or config? Did you understand what you have modified?
+        2. What dataset did you use?
+        3. What do you think might be the reason?

third_party/mmyolo/.github/ISSUE_TEMPLATE/config.yml ADDED

	@@ -0,0 +1,9 @@

+blank_issues_enabled: true
+contact_links:
+  - name: 💬 Forum
+    url: https://github.com/open-mmlab/mmyolo/discussions
+    about: Ask general usage questions and discuss with other MMYOLO community members
+  - name: 🌐 Explore OpenMMLab
+    url: https://openmmlab.com/
+    about: Get to know more about OpenMMLab

third_party/mmyolo/.github/pull_request_template.md ADDED

	@@ -0,0 +1,25 @@

+Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
+## Motivation
+Please describe the motivation for this PR and the goal you want to achieve through this PR.
+## Modification
+Please briefly describe what modification is made in this PR.
+## BC-breaking (Optional)
+Does the modification introduce changes that break the backward compatibility of the downstream repos?
+If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
+## Use cases (Optional)
+If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
+## Checklist
+1. Pre-commit or other linting tools are used to fix potential lint issues.
+2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
+3. If the modification has a potential influence on downstream projects, this PR should be tested with downstream projects, like MMDetection or MMClassification.
+4. The documentation has been modified accordingly, like docstring or example tutorials.

third_party/mmyolo/.github/workflows/deploy.yml ADDED

	@@ -0,0 +1,28 @@

+name: deploy
+on: push
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+jobs:
+  build-n-publish:
+    runs-on: ubuntu-latest
+    if: startsWith(github.event.ref, 'refs/tags')
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up Python 3.7
+        uses: actions/setup-python@v2
+        with:
+          python-version: 3.7
+      - name: Install torch
+        run: pip install torch
+      - name: Install wheel
+        run: pip install wheel
+      - name: Build MMYOLO
+        run: python setup.py sdist bdist_wheel
+      - name: Publish distribution to PyPI
+        run: |
+          pip install twine
+          twine upload dist/* -u __token__ -p ${{ secrets.pypi_password }}

third_party/mmyolo/.gitignore ADDED

	@@ -0,0 +1,126 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/en/_build/
+docs/zh_cn/_build/
+# PyBuilder
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# pyenv
+.python-version
+# celery beat schedule file
+celerybeat-schedule
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+data/
+data
+.vscode
+.idea
+.DS_Store
+# custom
+*.pkl
+*.pkl.json
+*.log.json
+docs/modelzoo_statistics.md
+mmyolo/.mim
+output/
+work_dirs
+yolov5-6.1/
+# Pytorch
+*.pth
+*.pt
+*.py~
+*.sh~
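
Several of the patterns above use shell-style globs, including the character class in `*.py[cod]`. The sketch below checks a few made-up file names against three of them with Python's `fnmatch`. It is only an approximation: real gitignore matching adds rules for directories, slashes, and negation that `fnmatch` does not model, and `is_ignored` is a hypothetical helper rather than part of the repository.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitignore above; the file names are made-up examples.
PATTERNS = ['*.py[cod]', '*.pth', '*.py~']


def is_ignored(filename):
    # True if the bare file name matches any of the glob patterns.
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


checks = {name: is_ignored(name)
          for name in ['model.pyc', 'epoch_10.pth', 'train.py~', 'train.py']}
```

Under this approximation the compiled file, the checkpoint, and the editor backup match, while the plain `train.py` source file does not.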