finding 3d coordinates of a given point which is specified in the 2d pixel coordinates
Hi,
My question is about finding the 3D coordinates of a point that is specified in 2D pixel coordinates. I have read some similar threads on this issue.
In the realsense-viewer, when we turn on the "Stereo Module" as well as the "RGB Camera" and then open the 3D view, we can read the 3D coordinates of any point in the image. This is useful for confirming the accuracy of the results reported by the function used for this purpose: rs2_deproject_pixel_to_point
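For context, rs2_deproject_pixel_to_point inverts the pinhole projection model. Assuming the RS2_DISTORTION_NONE case (no lens distortion; fx, fy, ppx, ppy are the intrinsics' focal lengths and principal point), the math reduces to a few lines. This is only an illustrative sketch of what the library does, not a replacement for the real call:

```cpp
#include <cassert>

// Minimal sketch of pinhole deprojection (what rs2_deproject_pixel_to_point
// computes for the RS2_DISTORTION_NONE model): map a pixel (u, v) with a
// known depth in meters to a 3D point in the sensor's coordinate system.
struct Intrinsics { float fx, fy, ppx, ppy; };

void deproject(float point[3], const Intrinsics& in,
               const float pixel[2], float depth)
{
    point[0] = (pixel[0] - in.ppx) / in.fx * depth; // X, meters
    point[1] = (pixel[1] - in.ppy) / in.fy * depth; // Y, meters
    point[2] = depth;                               // Z, meters
}
```

A pixel exactly at the principal point deprojects to (0, 0, depth), which is why the center of the image should report X and Y near zero when the optical axis points at the target.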
Following the discussion in the post below,
https://github.com/IntelRealSense/librealsense/issues/11031
I have tested the first approach mentioned by MartyG:
Method 1 - align depth to color, deproject pixel into point in color coordinate systems, then transform the point back to depth coordinate system.
I received the following outputs:
center of the image in color coordinates: (-0,-0,0)
center of the image in depth coordinates: (0.0590469,-0.000145956,-0.000172043)
However, when I open the 3D view in the realsense-viewer, this is the result I see:
Here, we can see the center of the image (the origin of the coordinate system) is at (0.007, -0.008, 2.393) meters.
I have not tested the other two approaches mentioned there.
Can you please explain whether my understanding of the origin of the coordinate system is correct? What could be the reason for this error? And which point can I fix on to check the accuracy of the methods mentioned in the above post?
thanks,
-
Hi Zahid Iqbal. When aligning depth to color with the align_to instruction, the center-line of the color sensor becomes the 0,0,0 origin of depth. When depth and color are not aligned, the 0,0,0 origin of depth is the center-line of the left infrared sensor.
The -0.000172043 value for the Z-depth coordinate of the script does not sound correct. I would recommend printing the value of the dist variable (which should be the 3D real-world distance from the camera of that coordinate in meters) to see what it says. I would expect it to be more like the 2.393 value that you got from the Viewer's pointcloud.
I would recommend not comparing the outputs of the Method 1 script and the Viewer directly like-for-like though as they use different alignment methods to combine depth and color. The Method 1 script uses the align_to instruction, whilst the Viewer's pointcloud maps depth and color together with the map_to instruction (demonstrated in the Method 3 script).
I would suggest trying the Method 3 script to see how its coordinate output compares to the Viewer.
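For reference, the Method 3 approach replaces align_to with rs2::pointcloud's map_to: the pointcloud is calculated from the raw depth frame and each vertex already holds a 3D coordinate in meters. Below is a minimal sketch; the stream configuration mirrors this thread's, and since it needs a connected camera it should be treated as illustrative rather than tested:

```cpp
#include <librealsense2/rs.hpp>
#include <iostream>

int main()
{
    rs2::pipeline pipe;
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 15);
    cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_RGBA8, 15);
    pipe.start(cfg);

    rs2::pointcloud pc;
    rs2::frameset frames = pipe.wait_for_frames();
    rs2::video_frame color = frames.get_color_frame();
    rs2::depth_frame depth = frames.get_depth_frame();

    pc.map_to(color);                       // texture-map the cloud to color
    rs2::points points = pc.calculate(depth);

    // Vertex of the center depth pixel (row-major layout: row * width + col).
    const rs2::vertex* verts = points.get_vertices();
    int w = depth.get_width();
    const rs2::vertex& v = verts[360 * w + 640];
    std::cout << "(" << v.x << ", " << v.y << ", " << v.z << ")\n";
    pipe.stop();
}
```

Because no alignment is applied, the vertex coordinates are in the depth (left infrared) coordinate system, which should match the Viewer's pointcloud more closely.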
-
Following up on the previous post, I have tried the second approach:
Method 2 – do not align, use the original depth and color images, rs2_project_color_pixel_to_depth_pixel, then rs2_deproject_pixel_to_point.
Here is the code that I have used,
void realSenseWrapper::getRealSense3DPoint(float result3DPoint[3], int c, int r)
{
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 15);
    cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_RGBA8, 15);
    selection = pipe.start(cfg);
    const rs2::stream_profile color_profile = selection.get_stream(RS2_STREAM_COLOR);
    const rs2::stream_profile depth_profile = selection.get_stream(RS2_STREAM_DEPTH);
    auto depth_intrin = depth_profile.as<rs2::video_stream_profile>().get_intrinsics();
    auto color_intrin = color_profile.as<rs2::video_stream_profile>().get_intrinsics();
    auto depth_extrin_to_color = depth_profile.as<rs2::video_stream_profile>().get_extrinsics_to(color_profile);
    auto color_extrin_to_depth = color_profile.as<rs2::video_stream_profile>().get_extrinsics_to(depth_profile);
    rs2::frameset frames = pipe.wait_for_frames();
    rs2::depth_frame depth = frames.get_depth_frame();
    float rgb_pixel[2], depth_pixel[2];
    rgb_pixel[0] = float(c); // column in the RGB frame
    rgb_pixel[1] = float(r); // row in the RGB frame
    float depth_scale = depth.get_units();
    rs2_project_color_pixel_to_depth_pixel(depth_pixel,
        reinterpret_cast<const uint16_t*>(depth.get_data()),
        depth_scale,
        0.20f, 2.0f, // depth search range: 0.2 m (min) to 2.0 m (max)
        &depth_intrin,
        &color_intrin,
        &color_extrin_to_depth,
        &depth_extrin_to_color, rgb_pixel);
    depth_pixel[0] = (float)round(depth_pixel[0]); // snap to the nearest integer pixel
    depth_pixel[1] = (float)round(depth_pixel[1]);
    auto dist = depth.get_distance(static_cast<int>(depth_pixel[0]), static_cast<int>(depth_pixel[1]));
    rs2_deproject_pixel_to_point(result3DPoint, &depth_intrin, depth_pixel, dist);
    pipe.stop();
}
However, this is the output I get for
getRealSense3DPoint(result3DPoint, 640, 360);
3D coordinates: (0.0987119,-0.0107843,2.464)
This does not seem to agree with the reading from the realsense-viewer.
Any comments on this would be appreciated.
thanks,
-
Hi MartyG,
Regarding your comment, I have the following outputs:
pixel[0]: 640, pixel[1]: 360
dist: 0
center of the image in color coordinates: (-0,-0,0)
center of the image in depth coordinates: (0.0590469,-0.000145956,-0.000172043)
The image is 1280 by 720, so that is the center pixel of the image.
Here is the code:
void realSenseWrapper::important_untested_function(float point_in_color_coordinates[3], float point_in_depth_coordinates[3])
{
    rs2::align align_to(RS2_STREAM_COLOR);
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 15);
    cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_RGBA8, 15);
    selection = pipe.start(cfg);
    while (true)
    {
        // Wait for the next set of frames from the camera
        auto frames = pipe.wait_for_frames();
        auto depth = frames.get_depth_frame();
        auto color = frames.get_color_frame();
        rs2_extrinsics color_extrin_to_depth = color.get_profile().as<rs2::video_stream_profile>().get_extrinsics_to(depth.get_profile().as<rs2::video_stream_profile>());
        frames = frames.apply_filter(align_to);
        auto color_aligned = frames.get_color_frame();
        auto depth_aligned = frames.get_depth_frame();
        rs2_intrinsics intr_depth = depth_aligned.get_profile().as<rs2::video_stream_profile>().get_intrinsics();
        // pixel in color coordinate system
        float u = 0.5;
        float v = 0.5;
        int w = color_aligned.get_width();
        int h = color_aligned.get_height();
        int c = w * u;
        int r = h * v;
        // assume depth and color are the same size after alignment; find the depth pixel in the color coordinate system
        float pixel[2];
        pixel[0] = c;
        pixel[1] = r;
        cout << "pixel[0]: " << c << ", pixel[1]: " << r << endl;
        //float point_in_color_coordinates[3]; // 3D point in color sensor coordinates
        //float point_in_depth_coordinates[3]; // 3D point in depth sensor coordinates
        auto dist = depth.get_distance(static_cast<int>(pixel[0]), static_cast<int>(pixel[1]));
        cout << "dist: " << dist << endl;
        rs2_deproject_pixel_to_point(point_in_color_coordinates, &intr_depth, pixel, dist);
        rs2_transform_point_to_point(point_in_depth_coordinates, &color_extrin_to_depth, point_in_color_coordinates);
        break;
    }
}
-
If you received a z-depth of 2.464 from Method 2 (close to the Viewer) and Method 1 is reporting 0 for Z-depth, then it seems that it would be best to drop Method 1 and use Method 2 instead.
Using the align_to instruction (as Method 1 does) introduces the possibility of inaccuracy caused by the alignment process. So Method 2, which does not use alignment and instead makes use of intrinsics, may provide more reliable results.
Another factor to bear in mind is that higher resolutions provide greater accuracy, whilst lower resolutions such as 640x480 will result in lower accuracy.
-
MartyG, for the third approach with the point cloud,
float u = 0.609375; // 780
float v = 0.488889; // 352
shouldn't these be
float u = 0.5;
float v = 0.5;
instead? Because we need the center pixel of the image. I received exactly the same results as with the first approach:
3D coordinates: (0.0590469,-0.000145956,-0.000172043)
center of the image in color coordinates: (-0,-0,0)
-
The coordinates of the center pixel of an image at a particular resolution will be exactly half of that resolution. So if 1280x720 resolution was being used then the center XY pixel coordinate would be 640, 360, whilst the center of 640x480 would be 320, 240.
In regard to the values of floats u and v, I do not have information about why those particular values were chosen by the Intel RealSense team member who provided the script. However, a RealSense user at the link below who also read #11031 does use 0.5 for u and v in their script.
https://github.com/IntelRealSense/librealsense/issues/12067#issuecomment-1676895454
-
Following some threads, I now have the following piece of code, which seems to give me better results, but still with some inaccuracy in one of the dimensions. First the code, then the results that I received.
rs2::align align_to(RS2_STREAM_COLOR);
rs2::config cfg;
cfg.enable_stream(RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 30);
cfg.enable_stream(RS2_STREAM_COLOR, 1280, 720, RS2_FORMAT_BGRA8, 30);
selection = pipe.start(cfg);
while (true)
{
    auto frames = pipe.wait_for_frames();
    auto aligned_frameset = align_to.process(frames);
    rs2::video_frame color_stream = aligned_frameset.get_color_frame();
    rs2::depth_frame aligned_depth_stream = aligned_frameset.get_depth_frame();
    rs2::video_stream_profile depth_stream_profile = aligned_depth_stream.get_profile().as<rs2::video_stream_profile>();
    const auto depth_intrinsics = depth_stream_profile.get_intrinsics();
    const int w = color_stream.as<rs2::video_frame>().get_width();
    const int h = color_stream.as<rs2::video_frame>().get_height();
    rs2_extrinsics color_extrin_to_depth = color_stream.get_profile().as<rs2::video_stream_profile>().get_extrinsics_to(aligned_depth_stream.get_profile().as<rs2::video_stream_profile>());
    // pixel in color coordinate system
    float u = 0.5;
    float v = 0.5;
    // assume depth and color are the same size after alignment; find the depth pixel in the color coordinate system
    int c = w * u; // center column
    int r = h * v; // center row
    float pixel[2];
    pixel[0] = c;
    pixel[1] = r;
    float dist = aligned_depth_stream.get_distance(static_cast<int>(pixel[0]), static_cast<int>(pixel[1]));
    rs2_deproject_pixel_to_point(point_in_color_coordinates, &depth_intrinsics, pixel, dist);
    rs2_transform_point_to_point(point_in_depth_coordinates, &color_extrin_to_depth, point_in_color_coordinates);
    break;
}
This is the result that I get:
dist: 2.494
center of the image in color coordinates: (-0.0488563,-0.00127967,2.494)
center of the image in depth coordinates: (-0.0488563,-0.00127967,2.494)
And, from the realsense viewer, I get the following: (0.003, -0.004, 2.490)
error in the x-coord: 45.8563 mm,
error in the y-coord: 2.72033 mm,
error in the z-coord: 4 mm
The error in the x direction is still quite big, about 4.5 cm. Can you please comment on that?
Thanks,
-
Are there any further settings that could help improve the accuracy? Also, how do I run/activate the emitter? I read elsewhere that this helps the depth measurements and will probably lead to better estimates.
-
If you are using C++ then you can enable the emitter with the instruction RS2_OPTION_EMITTER_ENABLED. A C++ script for doing so can be found at the link below under the librealsense2 heading.
https://dev.intelrealsense.com/docs/api-how-to#controlling-the-laser
Setting a Visual Preset camera configuration called Medium Density provides a good balance between accuracy and a high amount of detail on the depth image. The link below has C++ code for setting a preset.
https://github.com/IntelRealSense/librealsense/issues/9071#issuecomment-878552273
In the code, change RS2_RS400_VISUAL_PRESET_HIGH_ACCURACY to RS2_RS400_VISUAL_PRESET_MEDIUM_DENSITY to set the Medium Density preset. If High Accuracy is used then it tends to greatly reduce the amount of detail on the depth image.
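The two suggestions above can be combined in a few lines. This is a hedged sketch rather than the exact scripts from the links; it requires a connected D400-series camera, and option support should be checked before setting:

```cpp
#include <librealsense2/rs.hpp>

int main()
{
    rs2::pipeline pipe;
    rs2::pipeline_profile selection = pipe.start();
    rs2::device dev = selection.get_device();
    auto depth_sensor = dev.first<rs2::depth_sensor>();

    // Turn the IR emitter (laser projector) on.
    if (depth_sensor.supports(RS2_OPTION_EMITTER_ENABLED))
        depth_sensor.set_option(RS2_OPTION_EMITTER_ENABLED, 1.f);

    // Select the Medium Density visual preset for a balance of
    // accuracy and depth-image detail.
    if (depth_sensor.supports(RS2_OPTION_VISUAL_PRESET))
        depth_sensor.set_option(RS2_OPTION_VISUAL_PRESET,
                                RS2_RS400_VISUAL_PRESET_MEDIUM_DENSITY);
    pipe.stop();
}
```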
-
MartyG thanks,
I will check these details. Something very strange just happened: I do not seem to have changed anything, but the same code is now returning (0,0,0). I wonder, did I reset something?
-
I have run my program a few times. The error between what my function returns and the image center in realsense-viewer is usually high along the x-coordinate, ranging anywhere between 30 and 60 millimeters. The error on the y-coordinate is minimal, but on the z-coordinate it is anywhere between 4 and 29 mm.
-
In programs that use the instruction get_distance() like yours does, the X coordinate tends to be accurate at the center of the image but become increasingly inaccurate as you move away from the image center towards its outer edges. This can be caused if the incorrect intrinsics are used.
For example, if aligning depth to color then the color intrinsics should be used instead of the depth intrinsics. This is because when depth is aligned to color, the origin of depth becomes the center-line of the RGB sensor.
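A minimal sketch of this point: after aligning depth to color, the aligned depth frame's profile carries the color intrinsics, so deprojecting through it uses the correct model. This assumes a connected camera and is illustrative only:

```cpp
#include <librealsense2/rs.hpp>
#include <librealsense2/rsutil.h>

int main()
{
    rs2::pipeline pipe;
    rs2::align align_to(RS2_STREAM_COLOR);
    pipe.start();

    rs2::frameset frames = align_to.process(pipe.wait_for_frames());
    rs2::depth_frame depth = frames.get_depth_frame();

    // After align-to-color, this profile reports the color intrinsics,
    // which is what deprojection of the aligned depth image requires.
    rs2_intrinsics intr = depth.get_profile()
                               .as<rs2::video_stream_profile>()
                               .get_intrinsics();

    float pixel[2] = { intr.width / 2.f, intr.height / 2.f };
    float dist = depth.get_distance((int)pixel[0], (int)pixel[1]);
    float point[3];
    rs2_deproject_pixel_to_point(point, &intr, pixel, dist);
    pipe.stop();
}
```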