October 29, 2024
InternEvo System Structure

The file structure of the system code is shown below:
├── configs                                  # Configuration module, managing model and training-related parameters
│   └── 7B_sft.py                            # Sample configuration file for the system demo
├── internlm                                 # Main directory of the system code
│   ├── apis                                 # Interface module, containing interface functions for inference and related tasks
│   ├── core                                 # Core module, managing the parallel context and the scheduling engine for training and inference
│   │   ├── communication                    # Communication module, responsible for p2p communication in pipeline parallel scheduling
│   │   ├── context                          # Context module, mainly responsible for initializing parallel process groups and managing parallel context
│   │   │   ├── parallel_context.py
│   │   │   └── process_group_initializer.py
│   │   ├── scheduler                        # Scheduling module, which manages schedulers for parallel training, including non-pipeline and pipeline parallel schedulers
│   │   │   ├── no_pipeline_scheduler.py
│   │   │   ├── pipeline_scheduler_1f1b.py
│   │   │   └── pipeline_scheduler_zb.py
│   │   ├── engine.py                        # Responsible for managing the training and evaluation process of the model
│   │   └── trainer.py                       # Responsible for managing the training engine and scheduler
│   ├── data                                 # Data module, responsible for managing dataset generation and processing
│   ├── initialize                           # Initialization module, responsible for managing distributed environment startup and trainer initialization
│   ├── model                                # Model module, responsible for managing model structure definition and implementation
│   ├── solver                               # Solver module, responsible for the optimizer and lr_scheduler implementations, etc.
│   └── utils                                # Auxiliary module, responsible for managing logs, storage, model registration, etc.
├── train.py                                 # Main function entry file for model training
├── requirements                             # Dependency lists required to run the system
├── third_party                              # Third-party modules on which the system depends, such as apex and flash-attention
├── tools                                    # Scripts for processing and converting raw datasets, converting model checkpoints, etc.
└── version.txt                              # System version number
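
The layout above can be checked against a local checkout with a short script. This is a minimal sketch and not part of InternEvo: `EXPECTED_PATHS` and `find_missing_paths` are hypothetical names introduced for illustration, and the path list is copied from the tree above, so it may not match other versions of the repository.

```python
from pathlib import Path

# Paths copied from the tree above, relative to the repository root.
# This list is an assumption based on this document, not an official manifest.
EXPECTED_PATHS = [
    "configs/7B_sft.py",
    "internlm/apis",
    "internlm/core/communication",
    "internlm/core/context/parallel_context.py",
    "internlm/core/context/process_group_initializer.py",
    "internlm/core/scheduler/no_pipeline_scheduler.py",
    "internlm/core/scheduler/pipeline_scheduler_1f1b.py",
    "internlm/core/scheduler/pipeline_scheduler_zb.py",
    "internlm/core/engine.py",
    "internlm/core/trainer.py",
    "internlm/data",
    "internlm/initialize",
    "internlm/model",
    "internlm/solver",
    "internlm/utils",
    "train.py",
    "requirements",
    "third_party",
    "tools",
    "version.txt",
]


def find_missing_paths(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Check the current working directory and report anything missing.
    missing = find_missing_paths(".")
    if missing:
        print("Missing paths:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected paths are present.")
```

Run it from the repository root. Any path it reports as missing suggests that the checkout is incomplete or that the layout differs from the one documented here.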