# Multi-GPU Setup Guide

This guide explains how to run the neural OS demo with multiple GPUs and user queue management.

## Architecture Overview

The system is split into two main components:

1. **Dispatcher** (`dispatcher.py`): Handles WebSocket connections, manages user queues, and routes requests to workers
2. **Worker** (`worker.py`): Runs the actual model inference on individual GPUs

## Files Overview

- `main.py` - Original single-GPU implementation (kept as a backup)
- `dispatcher.py` - Queue management and WebSocket handling
- `worker.py` - GPU worker for model inference
- `start_workers.py` - Helper script to start multiple workers
- `start_system.sh` - Shell script to start the entire system
- `tail_workers.py` - Script to monitor all worker logs simultaneously
- `requirements.txt` - Dependencies
- `static/index.html` - Frontend interface
## Setup Instructions

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Dispatcher

The dispatcher runs on port 7860 and manages user connections and queues:

```bash
python dispatcher.py
```

### 3. Start Workers (One per GPU)

Start one worker for each GPU you want to use. Workers automatically register with the dispatcher.

#### GPU 0:

```bash
python worker.py --gpu-id 0
```

#### GPU 1:

```bash
python worker.py --gpu-id 1
```

#### GPU 2:

```bash
python worker.py --gpu-id 2
```

And so on for additional GPUs.

Workers listen on ports 8001, 8002, 8003, etc. (8001 + GPU_ID).
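The launch steps above can be sketched in Python. This is a hypothetical version of what `start_workers.py` might do (the real script may differ); the port helper simply encodes the 8001 + GPU_ID scheme:

```python
# Hypothetical sketch of start_workers.py: launch one worker process
# per GPU; each worker listens on port 8001 + its GPU ID.
import subprocess
import sys

def worker_port(gpu_id: int) -> int:
    # Port scheme described above: 8001 + GPU_ID
    return 8001 + gpu_id

def start_workers(num_gpus: int) -> list:
    procs = []
    for gpu_id in range(num_gpus):
        print(f"Starting worker for GPU {gpu_id} on port {worker_port(gpu_id)}")
        procs.append(subprocess.Popen(
            [sys.executable, "worker.py", "--gpu-id", str(gpu_id)]
        ))
    return procs
```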
### 4. Access the Application

Open your browser and go to: `http://localhost:7860`

## System Behavior

### Queue Management

- **No Queue**: Users get normal timeout behavior (20 seconds of inactivity)
- **With Queue**: Users get a limited session time (60 seconds) with warnings and grace periods
- **Grace Period**: If the queue becomes empty during the grace period, time limits are removed
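The policy above boils down to a small decision rule. A minimal sketch, with illustrative names not taken from `dispatcher.py`:

```python
# Timeout policy described above (constants match the Configuration
# section; function names are assumptions for illustration).
IDLE_TIMEOUT = 20.0                 # no queue: disconnect after inactivity
MAX_SESSION_TIME_WITH_QUEUE = 60.0  # hard session cap when users wait
GRACE_PERIOD = 10.0

def session_time_limit(queue_length: int):
    """Return the session cap in seconds, or None when only the
    idle timeout applies (nobody is waiting)."""
    if queue_length == 0:
        return None
    return MAX_SESSION_TIME_WITH_QUEUE

def grace_period_resolves(queue_length: int) -> bool:
    """During the grace period, an empty queue lifts the time limit."""
    return queue_length == 0
```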
### User Experience

1. **Immediate Access**: If GPUs are available, users start immediately
2. **Queue Position**: Users see their position and estimated wait time
3. **Session Warnings**: Users get warnings when their time is running out
4. **Grace Period**: A 10-second countdown starts when session time expires; if the queue empties during it, users can continue
5. **Queue Updates**: Real-time updates on queue position every 5 seconds
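One plausible way to compute the estimated wait time in step 2 is an upper bound of one full capped session per "round" of users ahead of you, spread across the workers. This formula is an assumption for illustration; `dispatcher.py` may use a different estimate:

```python
import math

# Assumes queued sessions are capped at 60 s (see Configuration).
MAX_SESSION_TIME_WITH_QUEUE = 60.0

def estimated_wait_seconds(position: int, num_workers: int) -> float:
    """Upper-bound wait estimate for a user at 1-based queue position."""
    if num_workers <= 0:
        raise ValueError("need at least one worker")
    ahead = max(position - 1, 0)
    return math.ceil(ahead / num_workers) * MAX_SESSION_TIME_WITH_QUEUE
```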
### Worker Management

- Workers automatically register with the dispatcher on startup
- Workers send periodic pings (every 10 seconds) to maintain the connection
- Workers handle session cleanup when users disconnect
- Each worker handles one session at a time
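On the dispatcher side, the registration and ping traffic can feed a simple liveness table. A sketch under stated assumptions (class and threshold are illustrative, not from `dispatcher.py`):

```python
import time

PING_INTERVAL = 10.0             # workers ping every 10 seconds
STALE_AFTER = 3 * PING_INTERVAL  # miss three pings -> considered dead

class WorkerRegistry:
    """Tracks which workers are alive and which are serving a session."""

    def __init__(self):
        self.last_ping = {}  # worker_id -> last ping timestamp
        self.busy = {}       # worker_id -> session_id, or None if free

    def register(self, worker_id):
        self.last_ping[worker_id] = time.monotonic()
        self.busy[worker_id] = None  # one session per worker

    def ping(self, worker_id):
        self.last_ping[worker_id] = time.monotonic()

    def available_workers(self, now=None):
        # Alive (recent ping) and not currently serving a session.
        now = time.monotonic() if now is None else now
        return [w for w, t in self.last_ping.items()
                if now - t < STALE_AFTER and self.busy[w] is None]
```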
### Input Queue Optimization

The system filters input intelligently to maintain performance:

- **Queue Management**: Each worker maintains an input queue per session
- **Interesting Input Detection**: The system distinguishes "interesting" inputs (clicks, key presses) from uninteresting ones (mouse movements)
- **Smart Processing**: When multiple inputs are queued:
  - "Interesting" inputs are processed immediately, skipping the mouse movements queued before them
  - If no interesting inputs are found, only the latest mouse position is processed
  - This prevents the system from getting bogged down processing every mouse movement
- **Performance**: Responsiveness is maintained even during rapid mouse movements
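The filtering rule above can be sketched as a small queue-draining function. The event shape and the exact set of "interesting" types are assumptions; see `worker.py` for the real implementation:

```python
# Event types treated as "interesting" (assumed set for illustration).
INTERESTING = {"click", "mousedown", "mouseup", "keydown", "keyup"}

def next_event(queue: list):
    """Pick the next event to run inference on, dropping skipped events.

    Returns the first interesting event if one is queued; otherwise
    collapses the queue to the latest mouse position. Returns None for
    an empty queue.
    """
    if not queue:
        return None
    for i, ev in enumerate(queue):
        if ev["type"] in INTERESTING:
            del queue[:i + 1]  # skip the mouse moves queued before it
            return ev
    latest = queue[-1]         # only mouse moves: keep just the newest
    queue.clear()
    return latest
```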
## Configuration

### Dispatcher Settings (in `dispatcher.py`)

```python
self.IDLE_TIMEOUT = 20.0  # When no queue
self.QUEUE_WARNING_TIME = 10.0
self.MAX_SESSION_TIME_WITH_QUEUE = 60.0  # When there's a queue
self.QUEUE_SESSION_WARNING_TIME = 45.0  # 15 seconds before timeout
self.GRACE_PERIOD = 10.0
```

### Worker Settings (in `worker.py`)

```python
self.MODEL_NAME = "yuntian-deng/computer-model-s-newnewd-freezernn-origunet-nospatial-online-x0-joint-onlineonly-222222k7-06k"
self.SCREEN_WIDTH = 512
self.SCREEN_HEIGHT = 384
self.NUM_SAMPLING_STEPS = 32
self.USE_RNN = False
```

## Monitoring

### Health Checks

Check worker health:

```bash
curl http://localhost:8001/health  # GPU 0
curl http://localhost:8002/health  # GPU 1
```
### Logs

The system provides detailed logging for debugging and monitoring:

**Dispatcher logs:**

- `dispatcher.log` - All dispatcher activity, session management, and queue operations

**Worker logs:**

- `workers.log` - Summary output from the worker startup script
- `worker_gpu_0.log` - Detailed logs from the GPU 0 worker
- `worker_gpu_1.log` - Detailed logs from the GPU 1 worker
- `worker_gpu_N.log` - Detailed logs from the GPU N worker

**Monitor all worker logs:**

```bash
# Tail all worker logs simultaneously
python tail_workers.py --num-gpus 2

# Or monitor individual workers
tail -f worker_gpu_0.log
tail -f worker_gpu_1.log
```
## Troubleshooting

### Common Issues

1. **Worker not registering**: Check that the dispatcher is running first
2. **GPU memory issues**: Ensure each worker is assigned to a different GPU
3. **Port conflicts**: Make sure ports 7860, 8001, 8002, etc. are available
4. **Model loading errors**: Check that the model files and configurations are present

### Debug Mode

Enable debug logging by setting the log level in both files:

```python
logging.basicConfig(level=logging.DEBUG)
```

## Scaling

To add more GPUs:

1. Start additional workers with higher GPU IDs
2. Workers automatically register with the dispatcher
3. Queue processing automatically utilizes all available workers

The system scales horizontally: add as many workers as you have GPUs available.
## API Endpoints

### Dispatcher

- `GET /` - Serve the web interface
- `WebSocket /ws` - User connections
- `POST /register_worker` - Worker registration
- `POST /worker_ping` - Worker health pings

### Worker

- `POST /process_input` - Process user input
- `POST /end_session` - Clean up a session
- `GET /health` - Health check
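As a sketch, a worker's call to `POST /register_worker` might look like the following. The JSON fields here are placeholders, not the actual schema; check `dispatcher.py` and `worker.py` for the real field names:

```python
import json
from urllib import request

def registration_payload(gpu_id: int) -> dict:
    # Assumed shape; the real fields are defined in dispatcher.py/worker.py.
    return {"gpu_id": gpu_id, "port": 8001 + gpu_id}

def register_worker(gpu_id: int, dispatcher: str = "http://localhost:7860"):
    # POST /register_worker with a JSON body (requires a running dispatcher).
    req = request.Request(
        f"{dispatcher}/register_worker",
        data=json.dumps(registration_payload(gpu_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```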