PaddlePaddle CI on TeamCity - PaddlePaddle/Paddle GitHub Wiki
- I need to restart my build
- I need to check my build log
- I need to set up the whole CI system
- I need to add an agent
- I need to back up/restore TeamCity
- I need to upgrade TeamCity
- General troubleshooting
- Add build agent license
Continuous Integration (CI) is the practice of regularly integrating new and changed code back into the master repository. Teams compile the software and run it through a series of tests in a production-identical development environment to ensure the success of the build.
TeamCity is a proprietary CI server by JetBrains that lets developers integrate code branches during development and run a series of automated tests against them.
To reduce the burden of deploying and maintaining PaddlePaddle, we build the latest PaddlePaddle image inside the TeamCity agent container, then run the unit tests against the generated image.
To log in and monitor the status of tasks, open the following dashboard.
Current CI DashBoard
You can log in as a guest by clicking "Log in as guest". For administrator login info, please ask in the Paddle Hi Group.
- Click on "Details" in the TeamCity build section.
- Log in or click "Log in as guest".
- You will land on the "Overview" tab, which shows only a truncated build log. That is not what we want.
- Click "Build Log". You can download the full log by clicking "Download full build log".
- Visit https://paddleci.ngrok.io/ and log in as admin (please ask your colleagues or the Hi Group for the password).
- Find the "PR_CI" section and click the "..." at the right end (top right in the screenshot).
- Select your PR branch (same as your PR number, "2133" in this example) and click "Run Build".
TeamCity is a distributed build system with two major roles: the master and the agents. We are going to guide you through the process of setting up both.
We are going to launch the TeamCity master from its Docker image distribution.
TeamCity stores build history, users, build results and some run time data in an SQL database. It comes with a pre-configured built-in HSQLDB. For reliability and performance, we are going to set up an external database. In our case, we are going to use PostgreSQL.
We are going to set up the PostgreSQL server with Docker.
docker run -d --name teamcity-db \
-p 5432:5432 \
-e POSTGRES_PASSWORD=XXXXXX \
postgres \
-c "synchronous_commit=off" -c 'shared_buffers=512MB' -c 'max_wal_size=1500MB' -c 'checkpoint_completion_target=0.9'
The command above starts a container named teamcity-db from the postgres image, maps container port 5432 to the host's port 5432, sets the postgres superuser password to XXXXXX, and applies the database settings recommended by TeamCity through the -c parameters. The command returns the ID of the container; we assume it is 09d34686 in this guide.
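The container may need a few seconds before PostgreSQL accepts connections. A minimal sketch for a setup script that waits for it follows. The helper name wait_for_postgres and the default of 30 attempts are illustrative assumptions, not part of Docker or TeamCity; pg_isready is the readiness check shipped with PostgreSQL.

```shell
# Hypothetical helper: poll the database container until PostgreSQL accepts
# connections, or give up after a number of one-second attempts.
wait_for_postgres() {
  # $1: container name or ID; $2: max attempts (assumed default: 30)
  attempts_left="${2:-30}"
  while [ "$attempts_left" -gt 0 ]; do
    # pg_isready exits 0 once the server is accepting connections.
    if docker exec "$1" pg_isready -U postgres >/dev/null 2>&1; then
      return 0
    fi
    attempts_left=$((attempts_left - 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for_postgres teamcity-db` or `wait_for_postgres 09d34686` returns success as soon as the server is reachable, so later steps do not race the database startup.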
Now that the database server is up, we need to create a database to hold the TeamCity data. Let's connect to the container and run some SQL.
docker exec -it 09d34686 bash
Now we are connected to the container. Next, run the PostgreSQL client psql as user postgres:
su - postgres
psql
We are now in psql. Let's create TeamCity's database and exit.
CREATE DATABASE teamcity \g
\q
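The interactive steps above can also be collapsed into one non-interactive command, which is easier to script. This is a sketch only: the function name create_teamcity_db is hypothetical, and it runs psql as the postgres user through docker exec -u instead of su.

```shell
# Hypothetical helper: create the TeamCity database without opening an
# interactive shell. Runs psql as the "postgres" OS user inside the container,
# mirroring the "su - postgres" step of the manual procedure.
create_teamcity_db() {
  # $1: container name or ID; $2: database name (defaults to "teamcity")
  docker exec -u postgres "$1" psql -c "CREATE DATABASE ${2:-teamcity};"
}
```

With the container from this guide, the call would be `create_teamcity_db 09d34686`.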
The PostgreSQL server is now ready for TeamCity.
Let's start the TeamCity Docker instance by running the following:
docker run -d --name teamcity-server-instance -v /home/teamcity_server/_data/:/data/teamcity_server/datadir -v /home/teamcity_server/logs/:/opt/teamcity/logs -p 8111:8111 jetbrains/teamcity-server
The command above starts a jetbrains/teamcity-server instance, mounts the data and log directories, and maps container port 8111 to the host's port 8111. It returns the Docker ID of the TeamCity instance; let's assume it is 9d41a5cb7c25 in this guide.
Now open http://localhost:8111 in a browser to finish the basic setup.
When the UI asks for the database type, select PostgreSQL, follow the instructions in the UI to download the JDBC driver, and fill in the rest of the form with the following database credentials: Database Host: <Machine IP>, Database Name: teamcity, User name: postgres, Password: XXXXXX. Then click the Proceed button.
Now your TeamCity master is ready to go.
Note that you will need to log in as admin to change some settings. The login window has an option to log in as admin with a token. After you click that link, run docker logs 9d41a5cb7c25 on the master machine; the admin token appears on the last line of the log.
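When the log is long, a small filter can pull the token out instead of scrolling to the last line. This is a sketch under an assumption: the pattern expects the log line to contain the words "authentication token:" followed by digits, which may differ between TeamCity versions. The function name latest_admin_token is hypothetical.

```shell
# Hypothetical filter: read TeamCity server log text on stdin and print the
# most recent admin token. The log wording matched here is an assumption;
# adjust the pattern if your TeamCity version prints the token differently.
latest_admin_token() {
  sed -n 's/.*authentication token: *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}
```

Usage on the master machine: `docker logs 9d41a5cb7c25 2>&1 | latest_admin_token`.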
TeamCity ships with a script for database maintenance, which we use for daily data backups. Create the script below and put it in the TeamCity container at /opt/teamcity/bin/daily_backup.sh
/opt/teamcity/bin/maintainDB.sh backup --all -F /data/teamcity_server/datadir/backup/daily_backup
cd /data/teamcity_server/datadir/backup/; find . -type f -mtime +30 -exec rm {} \;
Then add the following entry to the master server's crontab
1 0 * * * docker exec -d 9d41a5cb7c25 bash /opt/teamcity/bin/daily_backup.sh
The entry above runs the backup script every day at 00:01 (cron fields: minute 1, hour 0; use 0 1 * * * for 1:00 AM instead) and stores the backup file at /home/teamcity_server/_data/backup/daily_backup.zip
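The cleanup line of the backup script changes directory and then deletes files, so a failed cd would make find run in whatever directory the script started in. A slightly more defensive sketch of that step is shown here. The function name prune_old_backups is hypothetical, and the 30-day default matches the script above.

```shell
# Hypothetical helper mirroring the cleanup step of daily_backup.sh:
# delete regular files under a directory that are older than N days.
prune_old_backups() {
  # $1: backup directory; $2: retention in days (defaults to 30)
  # Refuse to run if the directory is missing, so nothing else gets deleted.
  [ -d "$1" ] || return 1
  find "$1" -type f -mtime +"${2:-30}" -exec rm -f {} +
}
```

Inside the container this would be called as `prune_old_backups /data/teamcity_server/datadir/backup 30`.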
The TeamCity server is behind a firewall, so we use ngrok as a tunnel for external access.
Run ~/ngrok http -subdomain paddleci 8111 to start ngrok.
Because TeamCity is a distributed build system, we still need to set up at least one agent before any build can start.
In our case, we recommend either Ubuntu or CentOS as the host system. If you need to build iOS or OS X distributions, you will need to set up Mac agents.
To build PaddlePaddle properly on Ubuntu or CentOS, make sure nvidia-docker is properly installed.
Run the following command:
$ nvidia-docker run --rm nvidia/cuda:8.0-cudnn5-runtime-ubuntu16.04 nvidia-smi
If you see output similar to:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:08:00.0     Off |                  N/A |
| 22%   41C    P8    27W / 250W |      0MiB / 12204MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
then nvidia-docker is working correctly and we can proceed to the next step.
If the command prints an error message or gets stuck, use systemctl to start the docker and nvidia-docker services:
$ systemctl start docker
$ systemctl start nvidia-docker
If anything goes wrong while starting the services, systemctl status docker, systemctl status nvidia-docker, and journalctl -xe are your debugging friends.
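When several agents need the same check, the nvidia-smi output can be inspected by a script rather than by eye. The sketch below extracts the driver version from the header line; empty output means the header was not found. The function name driver_version is hypothetical.

```shell
# Hypothetical filter: read nvidia-smi output on stdin and print the driver
# version from the header line; prints nothing when the header is missing.
driver_version() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

For example, `nvidia-docker run --rm nvidia/cuda:8.0-cudnn5-runtime-ubuntu16.04 nvidia-smi | driver_version` prints the driver version on a working agent and nothing on a broken one.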
We are going to push the agent binary from the master to the agent machine and then adjust a few settings.
- Push agent binary to agent machine

  Open http://<master ip>:8111/agents.html?tab=agent.push, then click the "Install agent ..." button. Fill in the host, username, and password of the agent machine to have the agent pushed.

- Change Server URL

  SSH to your agent machine, open $HOME/BuildAgent, then edit conf/buildAgent.properties and change the serverUrl value from https://paddleci.ngrok.io to the master's internal address (http://<master ip>:8111). The reason is to avoid DNS name resolution overhead.

- Restart Agent

  On the agent machine, run:

  sudo $HOME/BuildAgent/bin/agent.sh stop
  sudo $HOME/BuildAgent/bin/agent.sh start

- Authorize the Agent

  You should now see the newly set up agent at http://<master ip>:8111/agents.html?tab=unauthorizedAgents. Click the Authorize button to add it to the agent pool.

- Routine docker cleanup

  Add the following entry to the crontab on the agent machine:

  0 0 * * * docker system prune -f -a

  The entry above removes all stopped containers and all unused images and networks at 0:00 every day. Recent Docker versions prune volumes only when --volumes is also passed.
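The "Change Server URL" step above can be scripted when several agents are set up. This is a minimal sketch: the function name set_server_url is hypothetical, the address in the usage note is a made-up example, and the URL is assumed to contain no | or & characters (both are special to sed here).

```shell
# Hypothetical helper: rewrite the serverUrl entry of a buildAgent.properties
# file in place, keeping a .bak copy of the previous version.
# Assumes the new URL contains neither "|" nor "&".
set_server_url() {
  # $1: path to buildAgent.properties; $2: new server URL
  cp "$1" "$1.bak" || return 1
  sed "s|^serverUrl=.*|serverUrl=$2|" "$1.bak" > "$1"
}
```

For example, `set_server_url "$HOME/BuildAgent/conf/buildAgent.properties" http://10.0.0.5:8111` (10.0.0.5 stands in for the master's internal IP), followed by the agent restart described above.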
Currently, we have three projects on the TeamCity CI system.

- PaddlePaddle CPU and GPU Docker Images

  When new pull requests are merged into the develop branch of PaddlePaddle, TeamCity detects the change and builds new PaddlePaddle images for both CPU and GPU. The images are then pushed to Docker Hub automatically.

- PaddlePaddle GPU Unit Test

  TeamCity pulls the latest PaddlePaddle GPU image and runs the unit tests to validate the changes merged into the develop branch of PaddlePaddle. Because most pull requests do not touch GPU code, the GPU unit tests are triggered hourly rather than on every merge.

- PaddlePaddle Book CPU and GPU Docker Images

  When new pull requests are merged into the develop branch of PaddlePaddle Book, TeamCity detects the change and builds new PaddlePaddle Book images for both CPU and GPU. The images are then pushed to Docker Hub automatically.
To back up the data, there are two options:

- Log in to the master machine and run

  docker exec -d <teamcity container id> bash /opt/teamcity/bin/daily_backup.sh

  The backup file will be located at /home/teamcity_server/_data/backup/daily_backup.zip

- Open http://<master ip>:8111/admin/admin.html?item=backup, then start the backup.

To restore the data, there are also two options:

- Log in to the master machine and run

  docker exec -d <teamcity container id> bash /opt/teamcity/bin/maintainDB.sh restore -F <backup file path>

- Open http://<master ip>:8111/admin/admin.html?item=import
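Before a command-line restore, it helps to confirm which backup file is the newest. The sketch below lists the most recently modified .zip in a backup directory. The function name latest_backup is hypothetical, and file names are assumed to contain no newlines.

```shell
# Hypothetical helper: print the most recently modified .zip file in a
# backup directory, or nothing if the directory holds no .zip files.
latest_backup() {
  # $1: backup directory (host path or container path)
  ls -t "$1"/*.zip 2>/dev/null | head -n 1
}
```

On the master machine, `latest_backup /home/teamcity_server/_data/backup` prints the host path. Because of the volume mapping used when starting the server, the same file is visible inside the container under /data/teamcity_server/datadir/backup, and that container path is the one to pass to maintainDB.sh restore -F.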
Upgrading TeamCity is simple: pull the new image, remove the old container (its data and logs live in the mounted host directories), and start the server again as follows
docker pull jetbrains/teamcity-server
docker stop teamcity-server-instance
docker rm teamcity-server-instance
docker run -d --name teamcity-server-instance -v /home/teamcity_server/_data/:/data/teamcity_server/datadir -v /home/teamcity_server/logs/:/opt/teamcity/logs -p 8111:8111 jetbrains/teamcity-server
Be sure to update the daily backup entry in the crontab with the new container ID:
crontab -e
Then change the container ID in the following line:
1 0 * * * docker exec -d 9d41a5cb7c25 bash /opt/teamcity/bin/daily_backup.sh
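One way to avoid editing the crontab after every upgrade is to refer to the container by name, since docker exec accepts a container name as well as an ID. The sketch below prints such a crontab entry. The function name backup_cron_line is hypothetical, and the default name is the one passed to --name in this guide.

```shell
# Hypothetical helper: print a crontab entry that addresses the TeamCity
# container by name, so the entry survives container ID changes.
backup_cron_line() {
  # $1: container name (defaults to the --name used in this guide)
  printf '1 0 * * * docker exec -d %s bash /opt/teamcity/bin/daily_backup.sh\n' \
    "${1:-teamcity-server-instance}"
}
```

Note that /opt/teamcity/bin is not one of the mounted directories, so daily_backup.sh itself presumably has to be recreated in the new container after an upgrade.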
Go to http://172.19.32.197:8111/admin/admin.html?item=cleanup and click the "Start clean-up now" button.
Please check out TeamCity Documentation to find more details.
This is usually caused by running the agent as a non-root user. The solution is to log in to the agent machine and start the agent with sudo:
sudo <teamcity root>/bin/agent.sh start
This is usually caused by Docker images taking up too much space. Try docker system prune -f -a to remove unused Docker resources and free some space.
If your agent is running on CentOS, there is another possibility: Docker's overlay filesystem lives inside cl-root, which is limited to 50G in most cases. The solution is to point the Docker graph path to a larger disk partition. To do so, I found this and this very useful.
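To tell whether disk space is the problem before pruning, the usage of the relevant partition can be checked from a script. This is a minimal sketch: the function name disk_usage_pct is hypothetical, and it assumes the filesystem name reported by df contains no spaces.

```shell
# Hypothetical helper: print the used-space percentage (without the % sign)
# of the filesystem that holds the given path, using POSIX df output.
disk_usage_pct() {
  # $1: any path on the filesystem to inspect
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `[ "$(disk_usage_pct /var/lib/docker)" -ge 90 ] && docker system prune -f -a` prunes only when the partition is at least 90% full. The 90% threshold is an arbitrary example, and /var/lib/docker is Docker's default data directory, which may differ if the graph path has been moved.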
TeamCity offers 3 free build agents; additional agents need to be purchased. You will be given a license key after the purchase. To register the license, log in to the TeamCity web page, click "Administration" in the top right corner, then click "Licenses".
- TeamCity vs Jenkins for Continuous Integration, https://www.upguard.com/articles/teamcity-vs.-jenkins-for-continuous-integration