Distributed Object detection training - notnamegg/tensorflow-model GitHub Wiki
Usage
Prepare Container
# docker pull notname/tensorflow-model:[tag]
# docker run --runtime=nvidia -it -p 3000:3000 -p 3001:3001 -p 3002:3002 \
    -v [your log dir]:/tmp/trainlog --name [your container name] -d \
    -v [your dataset dir]:/data/dataset notname/tensorflow-model:[tag]
Modify your config in /data/notnamegg-tensorflow-Distributed-training/models/research/object_detection/samples/configs/[your config]
Run on each host
# docker exec -it [your container name] bash
# cd /data/notnamegg-tensorflow-Distributed-training/models/research
# python object_detection/distributed_train.py --type=[master/ps/worker] --id=0 --master=192.168.1.28 --ps=192.168.1.28 --train_dir=/tmp/trainlog --pipeline_config_path=object_detection/samples/configs/[your config]
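Since the same command is repeated on every host with only the role changing, a small helper can reduce copy-paste mistakes. The following is a minimal sketch, not part of the repository: `build_cmd` is a hypothetical function that only prints the command for a given role and id, using the flags shown above. The IP addresses are the example values from this page, and treating `--id` as a per-role index is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: prints the launch command for one role; it does not run it.
# The values below are the example values from this page; replace them with your own.
MASTER_IP="192.168.1.28"
PS_IP="192.168.1.28"
TRAIN_DIR="/tmp/trainlog"   # container path mounted from [your log dir]
CONFIG="object_detection/samples/configs/[your config]"   # placeholder, fill in

build_cmd() {
    role="$1"
    id="$2"
    # Accept only the three roles the script documents.
    case "$role" in
        master|ps|worker) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "python object_detection/distributed_train.py --type=$role --id=$id --master=$MASTER_IP --ps=$PS_IP --train_dir=$TRAIN_DIR --pipeline_config_path=$CONFIG"
}

# Print one command per role so each can be pasted into the matching host.
build_cmd master 0
build_cmd ps 0
build_cmd worker 0
```

Printing rather than executing keeps the helper safe to run anywhere; paste each printed line into the container on the host that should take that role.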
Result
When all role hosts (master, ps, and worker) are connected, training will start.
The training results are written to [your log dir] on your host.
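To check progress from the host, you can look for checkpoint files in the log directory. The sketch below assumes the trainer writes TensorFlow checkpoints named `model.ckpt-<step>.index`, which is typical for TensorFlow 1.x training scripts but is not confirmed for this repository; `latest_step` is a hypothetical helper name.

```shell
#!/bin/sh
# Prints the highest checkpoint step found in a log directory, or nothing if none.
# Assumes checkpoint files are named model.ckpt-<step>.index (an assumption).
latest_step() {
    log_dir="$1"
    ls "$log_dir" 2>/dev/null \
        | sed -n 's/^model\.ckpt-\([0-9][0-9]*\)\.index$/\1/p' \
        | sort -n \
        | tail -n 1
}

# Usage: pass the host directory you mounted as [your log dir].
latest_step "${1:-.}"
```

If nothing is printed, either no checkpoint has been saved yet or the trainer uses a different file naming scheme.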