Week 6 - tom-howard/com2009 GitHub Wiki
You should be able to complete the exercises on this page within a two-hour lab session.
- Exercise 1: Using the `rqt_image_view` node whilst changing the robot's viewpoint
- Exercise 2: Object Detection
- Exercise 3: Locating image features using Image Moments
- Exercise 4: Line following
- The Initial Object Detection Code (for Exercise 2)
- A Complete Worked Example of the `object_detection.py` Node
- A `line_follower` Template (for Exercise 4)
This week we are finally going to make use of that camera that has been sitting on top of our robot, staring at us, for the last 5 weeks! Last week you got a bit of a sneak preview of how we work with images within ROS nodes, and this week you will explore this in more depth. You will learn how to capture images and process them, and build further intelligence into your robot by using the information obtained from its camera to inform decision making and control.
By the end of this session you will be able to:
- Use a range of ROS tools to interrogate camera image topics on a ROS Network and view the images being streamed to them.
- Use the computer vision library OpenCV with ROS, to obtain camera images and process them in real-time.
- Apply filtering processes to isolate objects of interest within an image.
- Develop object detection nodes and harness the information generated by these processes to control a robot's position.
- Use camera data as a feedback signal to implement a line following behaviour using proportional control.
- Launch your WSL-ROS environment by running the WSL-ROS shortcut in the Windows Start Menu.
- Remember to restore your work from last week by running the restore script in the Ubuntu terminal instance that should now be active (TERMINAL 1):

      [TERMINAL 1] $ wsl_ros restore
This week we will start by working with the mystery world environment from last week! This time, in TERMINAL 1, use the following roslaunch command to load this:
[TERMINAL 1] $ roslaunch com2009_simulations coloured_pillars.launch
There are a number of tools that we can use to view the live images that are being captured by a robot in ROS. As with all robot data, these streams are published to topics, so we firstly need to identify those topics.
In a new terminal instance (TERMINAL 2), run rostopic list to see the full list of topics that are currently active on our system. Conveniently, all the topics related to our robot's camera - and the data coming from it - are prefixed with camera! Filter the rostopic list output using grep, to only show those topics that are camera-specific:
[TERMINAL 2] $ rostopic list | grep /camera
This should provide the following filtered list:
/camera/depth/camera_info
/camera/depth/image_raw
/camera/depth/points
/camera/parameter_descriptions
/camera/parameter_updates
/camera/rgb/camera_info
/camera/rgb/image_raw
/camera/rgb/image_raw/compressed
/camera/rgb/image_raw/compressed/parameter_descriptions
/camera/rgb/image_raw/compressed/parameter_updates
/camera/rgb/image_raw/compressedDepth
/camera/rgb/image_raw/compressedDepth/parameter_descriptions
/camera/rgb/image_raw/compressedDepth/parameter_updates
/camera/rgb/image_raw/theora
/camera/rgb/image_raw/theora/parameter_descriptions
/camera/rgb/image_raw/theora/parameter_updates
The real TurtleBot3 Waffles that we have at the university have a slightly different camera module to the one used by the simulated robots that we are working with here. Despite this, the camera data on our real robots is published to topics using the same ROS message formats as in simulation, making it fairly straightforward to transfer the nodes that we develop in simulation onto the real robots when the time comes.
The first items in the list of camera topics above indicate that we have depth information available here. Much like the real robots, the simulated versions that we are working with here also have a camera module capable of determining depth information as well as simply capturing images. Remember from Week 3 though, that we also have a very capable LiDAR sensor to give us this type of information too, and so we won't really be using the depth capabilities of our camera for this session.
The main thing we are actually interested in here is the RGB images that are captured by the camera, and the key topic that we'll therefore be using this week is:
/camera/rgb/image_raw
Run rostopic info on this to identify the message type. What type of message is used on this topic, and which package is this message derived from?
Now, run rosmsg info on this message type to find out exactly what information is published to the topic. You should end up with a list that looks like this:
std_msgs/Header header
uint32 seq
time stamp
string frame_id
uint32 height
uint32 width
string encoding
uint8 is_bigendian
uint32 step
uint8[] data
Using rostopic echo and the information about the topic message (as shown above) determine the size of the images that our robot's camera will capture (i.e. its dimensions, in pixels). It will be quite important to know this when we start manipulating these camera images later on.
Finally, considering the list above again, which part of the message do you think contains the actual image data?
We can view the images being streamed to the above topic (in real-time) in a variety of different ways and we will explore a couple of these now.
One way is to use RViz, which can be launched using the following roslaunch command:
[TERMINAL 2] $ roslaunch turtlebot3_gazebo turtlebot3_gazebo_rviz.launch
Once RViz launches, find the camera item in the left-hand panel and tick the checkbox next to it. This should open up a camera panel with a live stream of the images being obtained from the robot's camera! The nice thing about this is that the real-time LiDAR data will also be overlaid on top of the images too!
Close down RViz by entering Ctrl+C in TERMINAL 2.
Another tool we can use to view camera data-streams is the rqt_image_view node.
- To launch this, use `rosrun` as follows:

      [TERMINAL 2] $ rosrun rqt_image_view rqt_image_view

  Note: You might see some errors when you run this, but don't worry about them; the `rqt_image_view` node should still launch with no problems.

  This is a nice tool that allows us to easily view images that are being published to any camera topic on the ROS network. Another useful feature is the ability to capture those images and save them (as `.jpg` files) to the filesystem: see the "Save as image" button highlighted in the figure above. This might be useful later on in this week's session.

- Click the drop-down box in the top left of the window to select an image topic to display. Select `/camera/rgb/image_raw` (if it's not already selected).
- Keep this window open now, and launch a new terminal instance (TERMINAL 3).
- Launch the `turtlebot3_teleop` node, either using the full command or a handy `tb3_teleop` alias! Rotate your robot on the spot, keeping an eye on the `rqt_image_view` window as you do this. Stop the robot once one of the coloured pillars in the arena is roughly in the centre of the robot's field of vision, then close the `turtlebot3_teleop` node and the `rqt_image_view` node by entering `Ctrl+C` in TERMINAL 3 and TERMINAL 2 respectively.
OpenCV is a mature and powerful computer vision library designed for performing real-time image analysis, and it is therefore extremely useful for robotic applications. The library is cross-platform and there is a Python API (cv2), which we will be using to do some computer vision tasks of our own during this lab session. While we can work with OpenCV using Python straight away (via the API), the library can't directly interpret the native image format used by the image messages published to the ROS /camera topics that we identified earlier, so there is an interface that we need to use. The interface is called CvBridge, which is a ROS package that handles the conversion between ROS and OpenCV image formats. We'll therefore need to use these two libraries (OpenCV and CvBridge) hand-in-hand when developing ROS nodes to perform computer vision related tasks.
One common job that we often want a robot to perform is object detection, and we will illustrate how this can be achieved by colour filtering to detect the coloured pillar that your robot should now be looking at. In this exercise you will learn how to use OpenCV to capture images, filter them and perform other analysis to confirm the presence and location of features that we might be interested in.
Step 1
- First, create a new package in your `catkin_ws/src` directory called `week6_vision`, with `rospy`, `cv_bridge`, `sensor_msgs` and `geometry_msgs` as dependencies.
- Then, run `catkin build` on the package and re-source your environment (as you have done in previous weeks).
- In the `src` folder of the package you have just created, create a new Python file called `object_detection.py`. What else do we need to do to this file before we can run it?! Do it now!
- Copy the code here, save the file, then review the code explainer so that you understand how this node works and what should happen when you run it.
- Run the node using `rosrun`. MAKE SURE YOU CLOSE THE POP-UP WINDOW! This node should capture an image and display it in a pop-up window. The node will wait until you close the pop-up window before doing anything else (such as saving the image to the filesystem)!
- As you should know from reading the explainer, the node has just obtained an image and saved it to a location on the filesystem. Navigate to this location and view the image using `eog`.
What you may have noticed from the terminal output when you ran the object_detection.py node is that the robot's camera captures images at a native size of 1080x1920 pixels (you should already know this from interrogating the /camera/rgb/image_raw/width and /height messages using rostopic echo earlier). That's over 2 million pixels in total in a single image (2,073,600 pixels per image, to be exact), each pixel having a blue, green and red value associated with it - so that's a lot of data in a single image file! The size of the image file (in bytes) was actually printed to the terminal when you ran the object_detection.py node... how big was it exactly?
Processing an image of this size is therefore hard work for a robot: any analysis we do will be slow and any raw images that we capture will occupy a considerable amount of storage space. The next step then is to reduce this down by cropping the image to a more manageable size.
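As a quick sanity check of the numbers above, the per-frame data volume can be computed by hand (a sketch, assuming a 3-bytes-per-pixel encoding such as `rgb8`/`bgr8`):

```python
height, width = 1080, 1920

pixels = height * width      # total pixels per frame
raw_bytes = pixels * 3       # 3 bytes (one each for blue, green, red) per pixel

print(f"{pixels:,} pixels, {raw_bytes:,} bytes of raw image data")
# 2,073,600 pixels, 6,220,800 bytes of raw image data
```

Note that the file saved to disk will typically be much smaller than this, since `.jpg` is a compressed format; the figure above is the size of the uncompressed image data in each message.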
Step 2
We're going to modify the object_detection.py node now to:
- Capture a new image in its native size
- Crop it down to focus in on a particular area of interest
- Save both of the images (the cropped one should be much smaller than the original).
- In your `object_detection.py` node, locate the line:

      show_and_save_image(cv_img, img_name = "step1_original")

- Underneath this, add the following additional lines of code:

      crop_width = width - 400
      crop_height = 400
      crop_y0 = int((width / 2) - (crop_width / 2))
      crop_z0 = int((height / 2) - (crop_height / 2))
      cropped_img = cv_img[crop_z0:crop_z0+crop_height, crop_y0:crop_y0+crop_width]
      show_and_save_image(cropped_img, img_name = "step2_cropping")

- Run the node again.
Remember: Images are only saved once you close down the corresponding pop-up window. Make sure you close all of these pop-up windows down after viewing them (i.e. don't just minimise them) to ensure that your images are saved to the filesystem and the node executes successfully.
The code that you have just added creates a new image object called `cropped_img` from a subset of the original, by specifying a desired `crop_height` and `crop_width` relative to the original image dimensions. Additionally, we have also specified where in the original image (in terms of pixel coordinates) we want this subset to start, using `crop_y0` and `crop_z0`. This process is illustrated in the figure below:

The original image (`cv_img`) is cropped using a process called "slicing":

    cropped_img = cv_img[crop_z0:crop_z0+crop_height, crop_y0:crop_y0+crop_width]
This may seem quite confusing, but hopefully the figure below illustrates what's going on here:
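Since `cv_img` behaves like a NumPy array, you can try the same slicing arithmetic on a small dummy array. This sketch scales the crop values down to fit an 8x10 "image" (a real camera frame would be height x width x 3, but the slicing works the same way):

```python
import numpy as np

# A dummy 8x10 single-channel "image" whose pixel values are just 0..79
height, width = 8, 10
cv_img = np.arange(height * width).reshape(height, width)

# Same cropping arithmetic as in object_detection.py, scaled down:
crop_width = width - 4      # keep all but 4 columns
crop_height = 4             # keep 4 rows
crop_y0 = int((width / 2) - (crop_width / 2))    # first column to keep
crop_z0 = int((height / 2) - (crop_height / 2))  # first row to keep

# Slicing order is [rows (vertical, z), columns (horizontal, y)]
cropped_img = cv_img[crop_z0:crop_z0 + crop_height, crop_y0:crop_y0 + crop_width]
print(cropped_img.shape)  # (4, 6): crop_height rows by crop_width columns
```

The key point is that the first slice selects rows (the vertical, or z, extent) and the second selects columns (the horizontal, or y, extent), each starting at the computed offset so that the crop is centred in the original image.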
Step 3
As discussed above, an image is essentially a series of pixels each with a blue, green and red value associated with it to represent the actual image colours. From the original image that we have just obtained and cropped, we then want to get rid of any colours other than those associated with the pillar that we want the robot to detect. We therefore need to apply a filter to the pixels, which we will ultimately use to discard any pixel data that isn't related to the coloured pillar, whilst retaining data that is.
This process is called masking and, to achieve this, we need to set some colour thresholds. This can be difficult to do in a standard Blue-Green-Red (BGR) or Red-Green-Blue (RGB) colour space, and you can see a good example of this in this article from RealPython.com. We will apply some of the steps discussed in this article to convert our cropped image into a Hue-Saturation-Value (HSV) colour space instead, which makes the process of colour masking a bit easier.
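To get a feel for what HSV values look like, Python's standard `colorsys` module can convert a single RGB colour by hand. This is just an illustrative sketch: `colorsys` works with all channels in the 0-1 range, whereas OpenCV's 8-bit HSV images use Hue in 0-179 and Saturation/Value in 0-255, so we rescale accordingly:

```python
import colorsys

# Pure blue in RGB (roughly the colour of the pillar the robot is looking at)
r, g, b = 0, 0, 255
h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

# Rescale from colorsys ranges (all 0-1) to OpenCV's 8-bit HSV ranges:
# Hue 0-179, Saturation 0-255, Value 0-255
H, S, V = round(h * 180), round(s * 255), round(v * 255)
print(H, S, V)  # 120 255 255
```

A pure blue hue lands at H = 120, which sits right in the middle of the sort of Hue band (roughly 115 to 130) that you will see used for thresholding below.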
- First, analyse the Hue and Saturation values of the cropped image. To do this, run the following ROS node, located in the `com2009_examples` package, supplying the path to your cropped image as an additional argument:

      [TERMINAL 2] $ rosrun com2009_examples image_colours.py /home/student/myrosdata/week6_images/step2_cropping.jpg

- The node should produce a scatter plot illustrating the Hue and Saturation values of each of the pixels in the image. Each data point in the plot represents a single image pixel, and each is coloured to match its RGB value:
- You should see from the plot that all the pixels related to the coloured pillar that we want to detect are clustered together. We can use this information to specify a range of Hue and Saturation values with which to mask our image: filtering out any colours that sit outside of this range and thus allowing us to isolate the pillar. The pixels also have a Value (or "Brightness"), which isn't shown in this plot. As a rule of thumb, a range of brightness values between 100 and 255 generally works quite well.

  In this case then, we select upper and lower HSV thresholds as follows:

      lower_threshold = (115, 225, 100)
      upper_threshold = (130, 255, 255)

  Use the plot that has been generated here to determine your own upper and lower thresholds.

  OpenCV contains a built-in function to detect which pixels of an image fall within a specified HSV range: `cv2.inRange()`. This outputs a matrix with the same pixel dimensions as the input image, but containing only `True` (1) or `False` (0) values, indicating which pixels have a value within the specified range and which don't. This is known as a Boolean Mask (essentially, a series of ones and zeroes). We can then apply this mask to the image, using a Bitwise AND operation, to discard any image pixels whose mask value is `False` and keep any flagged as `True` (i.e. in range).

- To do this, first locate the following line in your `object_detection.py` node:

      show_and_save_image(cropped_img, img_name = "step2_cropping")

- Underneath this, add the following:

      hsv_img = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2HSV)
      lower_threshold = (115, 225, 100)
      upper_threshold = (130, 255, 255)
      img_mask = cv2.inRange(hsv_img, lower_threshold, upper_threshold)
      show_and_save_image(img_mask, img_name = "step3_image_mask")

- Now, run the node again. Three images should be generated and saved. As shown in the figure below, the third image should simply be a black and white representation of the cropped image, where the white regions indicate the areas where pixel values fall within the HSV range specified earlier. Notice (from the text printed to the terminal) that the cropped image and the image mask have the same dimensions, but the image mask file has a significantly smaller file size. While the mask contains the same number of pixels, each of these pixels only holds a value of 1 or 0, whereas in the cropped image each pixel has a Red, a Green and a Blue value (each ranging between 0 and 255), which represents significantly more data.
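For intuition, the behaviour of `cv2.inRange()` can be mimicked in a few lines of NumPy. This is a sketch, not the real implementation (the actual OpenCV function also accepts the bounds as tuples directly, and marks in-range pixels with 255 rather than 1):

```python
import numpy as np

def in_range(hsv_img, lower, upper):
    """NumPy stand-in for cv2.inRange(): 255 where every channel of a
    pixel lies within [lower, upper], 0 everywhere else."""
    lo, hi = np.array(lower), np.array(upper)
    mask = np.all((hsv_img >= lo) & (hsv_img <= hi), axis=-1)
    return (mask * 255).astype(np.uint8)

# A tiny 1x2 "HSV image": the first pixel falls inside the blue band
# used above, the second (a yellow-ish hue) does not
hsv_img = np.array([[[120, 240, 200], [30, 240, 200]]])
img_mask = in_range(hsv_img, (115, 225, 100), (130, 255, 255))
print(img_mask.tolist())  # [[255, 0]]
```

Every channel of a pixel must fall inside its corresponding [lower, upper] band for that pixel to be kept, which is why a single out-of-band Hue is enough to zero the second pixel.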
Step 4
Finally, we can apply this mask to the cropped image, generating a final version of it where only pixels marked as True in the mask retain their RGB values, and the rest are simply removed. As discussed earlier, we use a Bitwise AND operation to do this and, once again, OpenCV has a built-in function to do this: cv2.bitwise_and().
- Locate the following line in your `object_detection.py` node:

      show_and_save_image(img_mask, img_name = "step3_image_mask")

- And, underneath this, add the following:

      filtered_img = cv2.bitwise_and(cropped_img, cropped_img, mask = img_mask)
      show_and_save_image(filtered_img, img_name = "step4_filtered_image")

- Run this node again, and a fourth image should also be generated, this time showing the cropped image taken from the robot's camera, but only containing data related to the coloured pillar, with all other background image data removed (and rendered black):
You have now successfully isolated an object of interest within your robot's field of vision, but perhaps we want to make our robot move towards it, or - conversely - make our robot navigate around it and avoid crashing into it! We therefore also need to know the position of the object in relation to the robot's viewpoint, and we can do this using image moments.
The work we have just done above resulted in us obtaining what is referred to as a colour blob. OpenCV also has built-in tools to allow us to calculate the centroid of a blob of colour like this, allowing us to determine where exactly within an image the object of interest is located (in terms of pixels). This is done using the principle of image moments: essentially statistical parameters related to an image, telling us how a collection of pixels (i.e. the blob of colour that we have just isolated) are distributed within it. You can read more about Image Moments here and - from this - we can learn that the central coordinates of a colour blob can be obtained by considering some key moments of the image mask that we obtained from thresholding earlier:
- M00: the sum of all non-zero pixels in the image mask (i.e. the size of the colour blob, in pixels)
- M10: the sum of all non-zero pixels, each weighted by its horizontal (y) pixel coordinate (i.e. its column number)
- M01: the sum of all non-zero pixels, each weighted by its vertical (z) pixel coordinate (i.e. its row number)
Note: We refer to the horizontal as the y axis and the vertical as the z axis here to match the terminology that we have used previously to define our robot's principal axes.
We don't really need to worry about the derivation of these moments too much though. OpenCV has a built-in moments() function that we can use to obtain this information from an image mask (such as the one that we generated earlier):
m = cv2.moments(img_mask)
So, using this we can obtain the y and z coordinates of the blob centroid quite simply:
cy = m['m10']/(m['m00']+1e-5)
cz = m['m01']/(m['m00']+1e-5)
Notice, that we are adding a very small number to the M00 moment here to make sure that the divisor in the above equations is never zero and thus ensuring that we never get caught out by any "divide-by-zero" errors. When might this be the case?
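The centroid calculation above can be checked on a tiny hand-made mask. The sketch below computes the same raw moments by hand with NumPy (rather than via `cv2.moments`, so it runs without OpenCV installed):

```python
import numpy as np

# A 5x5 "image mask" with a 2x2 blob of in-range pixels
img_mask = np.zeros((5, 5))
img_mask[1:3, 2:4] = 1  # blob occupies rows 1-2, columns 2-3

# Raw moments, as cv2.moments(img_mask) would report them:
# m00 = total blob size, m10 weights each pixel by its column number
# (horizontal, y), m01 weights each pixel by its row number (vertical, z)
rows, cols = np.indices(img_mask.shape)
m00 = img_mask.sum()
m10 = (cols * img_mask).sum()
m01 = (rows * img_mask).sum()

cy = m10 / (m00 + 1e-5)  # horizontal centroid (y axis)
cz = m01 / (m00 + 1e-5)  # vertical centroid (z axis)
print(round(cy, 2), round(cz, 2))  # 2.5 1.5: the centre of the 2x2 blob
```

An all-zero mask (i.e. no colour blob in view) gives m00 = 0, which is exactly the divide-by-zero case that the `+ 1e-5` term guards against.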
Once again, there is a built-in OpenCV tool that we can use to add a circle onto an image to illustrate the centroid location within the robot's viewpoint: cv2.circle(). This is how we produced the red circle that you can see in the figure above. You can see how this is implemented in a complete worked example of the object_detection.py node from the previous exercise.
In our case, we can't actually change the position of our robot in the z axis, so the cz centroid component here might not be that important to us for navigation purposes. We may however want to use the centroid coordinate cy to understand where a feature is located horizontally in our robot's field of vision, and use this information to turn towards it (or away from it, depending on what we are trying to achieve). We'll look at this in a bit more detail now.
Inside the com2009_examples package there is a node that has been developed to illustrate how all the OpenCV tools that you have explored so far could be used to search an environment and stop a robot when it is looking directly at an object of interest. All the tools that are used in this node should be familiar to you by now, and you are now going to make a copy of this node and modify it to enhance its functionality.
- The node is called `colour_search.py`, and it is located in the `src` folder of the `com2009_examples` package. Copy this into the `src` folder of your own `week6_vision` package by first ensuring that you are located in the desired destination folder:

      [TERMINAL 2] $ roscd week6_vision/src

- Then, copy the `colour_search.py` node using `cp` as follows:

      [TERMINAL 2] $ cp ~/catkin_ws/src/COM2009/com2009_examples/src/colour_search.py ./

- In the same way as last week, you'll also need to copy the `tb3.py` module across from the `com2009_examples` package too, as this is used by the `colour_search.py` node to make the robot move:

      [TERMINAL 2] $ cp ~/catkin_ws/src/COM2009/com2009_examples/src/tb3.py ./

- Open up the `colour_search.py` file in VS Code to view the content. Have a look through it and see if you can make sense of how it works. The overall structure should be fairly familiar to you by now: we have a Python class structure, a Subscriber with a callback function, a main loop where all the robot control takes place, and a lot of the OpenCV tools that you have explored so far in this session. Essentially, this node functions as follows:
  - The robot turns on the spot whilst obtaining images from its camera (by subscribing to the `/camera/rgb/image_raw` topic).
  - Camera images are obtained and cropped, then a threshold is applied to the cropped images so as to detect the blue pillar in the simulated environment.
  - If the robot can't see a blue pillar then it turns on the spot quickly.
  - Once detected, the centroid of the blue blob representing the pillar is calculated to obtain its current location in the robot's viewpoint.
  - As soon as the blue pillar comes into view, the robot starts to turn more slowly instead.
  - The robot stops turning as soon as it determines that the pillar is situated directly in front of it, using the `cy` component of the blue blob centroid to determine this.
  - The robot then waits for a while, starts to turn again, and the whole process repeats until it finds the blue pillar once again.
-
Run the node as it is to see this in action. Observe the messages that are printed to the terminal throughout execution.
- Run the node as it is to see this in action. Observe the messages that are printed to the terminal throughout execution.
- Your task is then to modify the node so that it stops in front of every coloured pillar in the arena (there are four in total). For this, you may need to use some of the methods that we have explored so far this week.
  - You might first want to use some of the methods that we used earlier to obtain and analyse some images from your robot's camera:
    - Use the `turtlebot3_teleop` node to manually move the robot, making it look at each coloured pillar in the arena individually.
    - Run the `object_detection.py` node that you developed in the previous exercise to capture an image, crop it and save it to the filesystem, then feed this cropped image into the `image_colours.py` node from the `com2009_examples` package (as you did earlier).
    - Alternatively, you could use the `rqt_image_view` node to obtain some images and save these to your filesystem too. Note: If you choose this option, it might be better to capture images from the `/camera/rgb/image_raw/compressed` topic, to make the files more manageable and slightly quicker to analyse.
    - From the plot that is generated by the `image_colours.py` node, determine some appropriate HSV thresholds to apply for each coloured pillar in the arena.
  - Once you have the right thresholds, you can add these to your `colour_search.py` node so that it has the ability to detect every pillar in the same way that it currently detects the blue one.
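One possible way to structure the multi-colour detection is a dictionary mapping colour names to (lower, upper) threshold pairs, iterated over in the image callback. The sketch below is illustrative only: the colour names and all threshold values except blue are placeholders (determine your own from the `image_colours.py` plots), and a pure-NumPy stand-in is used in place of `cv2.inRange()` so the sketch runs on its own:

```python
import numpy as np

# Hypothetical HSV threshold table, one (lower, upper) pair per pillar.
# The blue band matches the thresholds used earlier in this session;
# the other entry is a made-up placeholder.
thresholds = {
    "blue":      ((115, 225, 100), (130, 255, 255)),
    "turquoise": ((75, 150, 100), (100, 255, 255)),   # placeholder values
}

def detect_colour(hsv_img):
    """Return the name of the first colour whose mask contains any
    in-range pixels, or None if no pillar colour is visible."""
    for name, (lower, upper) in thresholds.items():
        lo, hi = np.array(lower), np.array(upper)
        # Pure-NumPy equivalent of cv2.inRange(hsv_img, lower, upper) > 0
        mask = np.all((hsv_img >= lo) & (hsv_img <= hi), axis=-1)
        if mask.any():
            return name
    return None

# A 1x1 "HSV image" containing a single pixel inside the blue band:
print(detect_colour(np.array([[[120, 240, 200]]])))  # blue
```

In the real node you would compute the mask's centroid for the matched colour (exactly as `colour_search.py` already does for blue) rather than just returning the name.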
OK, time for something a bit more interesting now: line following!
- Make sure that your `colour_search.py` node is no longer running, and also close down the Gazebo simulation that is currently active by entering `Ctrl+C` in TERMINAL 1.
- In TERMINAL 2 you should still be located in your `week6_vision/src` directory, but if not then go there now:

      [TERMINAL 2] $ roscd week6_vision/src

- Then, perform the necessary steps to create a new empty Python file called `line_follower.py` and prepare it for execution.
- Once that's done, open up the empty file in VS Code.
- Then, have a look at the template Python code for this exercise, and be sure to read the explainer too.
- Copy and paste this into your empty `line_follower.py` file and save it.
- Now, in TERMINAL 1, run a new simulation:

      [TERMINAL 1] $ roslaunch com2009_simulations line_following.launch

  Your robot should be launched in an environment with a coloured track printed on the floor:
- In TERMINAL 2, run the `line_follower.py` node as it is (using `rosrun`) and see what happens... Not very impressive, eh?! That's where you come in! There are a few issues here that you will need to address:
  - Firstly, you might want to have a look at the cropping that is currently being performed to hone in on a particular region of the robot's viewpoint... does this look optimal? Could it be improved?
  - Next, how about the HSV thresholds that are being applied to detect the line colour: do these look right? Determine some appropriate HSV thresholds to apply, using methods that you have used earlier in this session.
  - The node uses a proportional controller. Essentially, we specify a proportional gain (`kp`) and multiply this gain by a position error to obtain an angular velocity. What happens if this calculation outputs an angular velocity that is excessive, or larger than the maximum angular velocity that our robot can achieve? Consider, therefore, doing some error checking on the `ang_vel` calculation output to ensure that any angular velocity commands are kept within some sensible limits before being published to the `/cmd_vel` topic.
  - PID gains need to be tuned in order to be appropriate for a particular system. Make adjustments to the proportional gain parameter (`kp`) until a sensible response is achieved and the robot follows the line effectively.
  - The angular velocity is determined by considering the y coordinate of a colour blob centroid. What happens in situations where the blob of colour can't be seen in the robot's field of vision though? What influence would this have on the velocity commands that are published to the `/cmd_vel` topic? Consider a situation where the robot can't see the line to begin with... Try launching the robot in a different location in the arena, and think about how you might approach this situation:

        [TERMINAL 1] $ roslaunch com2009_simulations line_following.launch x_pos:=3 y_pos:=-3 yaw:=0

  - Finally, what happens when the robot reaches the finish line? How could you add additional functionality to ensure that the robot stops when it reaches this point? What features of the arena could you use to trigger this?
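To illustrate the clamping suggested above, here is a minimal sketch of a proportional controller with a saturated output. The `kp` value and the image-centre coordinate are illustrative assumptions for tuning, not values taken from the course template; 1.82 rad/s is the published maximum angular velocity of the TurtleBot3 Waffle Pi:

```python
def p_controller(cy, image_centre, kp=0.005, max_ang_vel=1.82):
    """Proportional steering: the further the blob centroid (cy) is from
    the image centre, the faster the turn, clamped to the robot's limits."""
    error = image_centre - cy   # positive error: blob is left of centre
    ang_vel = kp * error        # raw proportional output
    # Clamp to sensible limits before publishing to /cmd_vel:
    return max(-max_ang_vel, min(max_ang_vel, ang_vel))

print(p_controller(cy=760, image_centre=960))  # 1.0 (within limits)
print(p_controller(cy=0, image_centre=960))    # 1.82 (clamped)
```

The same clamp (a `max`/`min` pair, or `numpy.clip`) can wrap whatever `ang_vel` expression the template computes, so that an extreme centroid error can never command an impossible turn rate.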
-
In this session you have learnt how to use data from a robot's camera to extract further information about its environment. The camera allows our robot to "see", and the information that we obtain from this device allows us to build even more advanced robot control behaviours, such as searching for objects, following things or, conversely, moving away from and avoiding them. You have learnt how to do some basic tasks with OpenCV, but this is a huge and very capable library of computer vision tools, and we encourage you to explore it further yourselves to build on the basic principles that we have shown you today.
Remember to save the work you have done here today by running the wsl_ros backup script in any idle WSL-ROS terminal instance now:
$ wsl_ros backup