Transform the default perspective view image to an orthographic view image
Hello, I'm new to the computer vision and image processing field and am not very familiar with my Intel RealSense D435 and D435i cameras. I want to transform my perspective view image into a pure orthographic view image, where objects away from the center do not show their side faces. Is this possible? If so, could you please let me know what I need and how I can do this? Thank you very much!
-
Hi Ziheye, may I first ask whether your situation is similar to the one at the link below, where the camera is mounted overhead looking down at people, and only the tops of their heads are captured at the center of the field of view while the sides of their bodies are captured towards the edges of the field of view?
https://github.com/IntelRealSense/librealsense/issues/8044
In that case, it was suggested that a threshold filter be applied to exclude depth information that is further from the camera than a certain maximum distance.
-
Hi MartyG, thanks a lot for your reply! I think this post is helpful. However, my objects are only about 2-3 cm tall, which does not fit the 0.1 m step of the thresholding control. Is there anything else I can do? Thank you very much!
Best
-
MartyG Hi Marty, the three images above were taken after I turned on the High Accuracy preset. The first image is the configuration snapshot, the second is the RGB image, and the third is the depth image. From a rough estimate, the distance between my camera and the bottom of the bin is about 1 meter. As you can see, even at this height there are still some dark side-face shadows near the small cubes. I want to use these perspective view images as my base and obtain a binary thresholded image in which the small cubes' side faces are gone and their x, y positions correspond to real-world x, y positions. Thank you!
-
In the above images the Viewer is in 2D mode. If you left-click on the '3D' option in the top corner of the Viewer window, a 3D pointcloud scan will be generated. If you move the mouse cursor over the image, 3D XYZ real-world coordinates are displayed.
The pointcloud can be exported into a pointcloud data file called a .ply using the 'Export' option at the top of the Viewer window when in 3D mode.
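If you later want to script this step instead of using the Viewer, a minimal Python sketch along these lines (assuming the pyrealsense2 package and a default depth + color configuration) can compute and export a pointcloud:

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()  # default config enables depth and color on a D435/D435i
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    color = frames.get_color_frame()

    pc = rs.pointcloud()
    pc.map_to(color)              # texture the cloud with the color stream
    points = pc.calculate(depth)  # deproject depth pixels into 3D points
    points.export_to_ply("capture.ply", color)
finally:
    pipeline.stop()
```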
Are the sides still visible when in 3D pointcloud mode?
-
MartyG Hi Marty, I got the 3D point cloud rendered, but I don't know how to use code to control and filter it to get my target orthogonal view image. I'll submit two simulated images here shortly so I can better explain what I need. Thank you very much!
-
MartyG Hi Marty, the following image is from a simulation. The left part is the orthogonal view, where only the top faces show up. My workspace is 30 cm x 30 cm, so when I detect an object's center I can easily convert that pixel with a linear function to the real-world x, y position. On the right-hand side is the perspective view, where the side faces show up. If I understand you correctly, I can use depth to filter out the side faces and keep only the top faces first, but I don't know how hard that is or what I need to do (I'd really appreciate any suggestions you can provide). Besides, how can I map a perspective-image pixel to a real-world position? Is that based on the 3D point cloud, or can an RGB-D image alone help me do this? Sorry for bothering you with these questions, and I really appreciate your time and help!
-
In the Post Processing section of the RealSense Viewer's options side-panel, you can use a filter called the Threshold Filter to set a maximum depth distance, which is 4 meters by default on this filter. If the sides of the object were at a distance beyond this maximum, they would be excluded from the depth data, leaving only the top surfaces. This would depend, though, on the objects and the camera always being at the same heights in relation to each other, and on the camera measuring the depth values of the object sides accurately.
The Threshold Filter can also be used outside of the RealSense Viewer in your own RealSense application by programming it into a script. Please let me know which language you are using (C++, Python or C#) if you would like to try this.
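In Python, for example, a minimal sketch of applying the Threshold Filter in a pyrealsense2 script might look like the following (the distance values here are only placeholders to adjust for your setup):

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

# Clip depth outside [min_distance, max_distance], both in meters
threshold = rs.threshold_filter()
threshold.set_option(rs.option.min_distance, 0.15)
threshold.set_option(rs.option.max_distance, 4.0)

try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    filtered = threshold.process(depth)  # depth frame with far pixels removed
finally:
    pipeline.stop()
```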
In the 3D point cloud mode, you do not need to convert 2D pixel coordinates to 3D world coordinates as the Viewer does this for you. When using 2D images, it is possible to convert 2D pixel coordinates to 3D world points in the RealSense SDK. The RealSense Viewer tool does not have this feature built in for 2D images, though it can be programmed into your own RealSense application.
The SDK C++ example program rs-align-advanced provides an example of aligning depth to color so that when the maximum depth range is restricted, the color pixels associated with excluded depth pixel coordinates are removed from the image automatically.
https://github.com/IntelRealSense/librealsense/tree/master/examples/align-advanced
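The same alignment is available in Python through the align processing block; a minimal sketch (assuming both depth and color streams are enabled) would be:

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()  # default config enables depth and color

align = rs.align(rs.stream.color)  # map depth pixels onto the color image

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
aligned_depth = aligned.get_depth_frame()  # now registered to the color frame
color = aligned.get_color_frame()

pipeline.stop()
```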
-
MartyG, Hi Marty. I use Python. I tried the depth threshold filter, but it seems the minimum step is 0.1 m, while my objects are only around 2-3 cm tall. Are there any other ways to convert a perspective view image to an orthogonal view image easily and simply? Thank you a lot for your help!
-
You stated earlier in the discussion that "the distance between my camera and the bottom of the bin is about 1 meter". So the 0.1 m minimum would only be a problem if the objects were closer than 0.1 meters (10 cm) to the camera lenses, as depth is measured from the front glass on the D435 / D435i cameras.
As the objects are 1 meter from the camera at the bottom of the bin, if the height of the camera is fixed then setting the maximum distance in the threshold filter to 1 m instead of the default 4 m, or a slightly smaller value such as 0.99 m, may help to eliminate the floor of the bin and the sides of the objects.
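With a threshold filter object like the one sketched earlier in this thread, that would amount to something like this (the 0.99 value being illustrative for a 1 m camera height):

```python
threshold.set_option(rs.option.max_distance, 0.99)  # clip just above the bin floor
```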
The sides may look more like the top (giving the impression of a totally flat object) if the strength of the light source cast into the bin is increased, as with the Lego brick-sorting neural network project at the link below, which "blasts" light into the brick container below the camera's overhead mounting point in order to eliminate shadow.
-
MartyG Hi Marty, what method should I use to get the extrinsic and intrinsic parameters of my camera with respect to a world coordinate point? Thank you!
-
The link below provides an example of a Python script that first accesses intrinsics and extrinsics and then afterwards uses the SDK instruction rs2_deproject_pixel_to_point() to deproject 2D pixel coordinates to 3D world points.
https://github.com/IntelRealSense/librealsense/issues/1904#issuecomment-398434434
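As a rough Python outline of the same idea (the pixel coordinates here are arbitrary examples):

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()
frames = pipeline.wait_for_frames()
depth = frames.get_depth_frame()

# Intrinsics of the depth stream
intrin = depth.profile.as_video_stream_profile().get_intrinsics()

u, v = 320, 240                  # example pixel in the depth image
dist = depth.get_distance(u, v)  # depth in meters at that pixel
point = rs.rs2_deproject_pixel_to_point(intrin, [u, v], dist)
print(point)  # [X, Y, Z] in meters, in the depth camera's coordinate system

pipeline.stop()
```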
-
MartyG Hi Marty, thank you for this info! May I ask what these extrinsic parameters mean? Are they the relative pose of the camera to a known object, or its pose in the real world? I have a hand-eye robotic system now with the camera facing down at my grasping workspace, but I don't know how to establish the relationship between the camera pose and the robot end-effector pose. I'd really appreciate any suggestions and resources you can provide about this issue. Thanks again!
-
Extrinsics describe the relationship between separate 3D coordinate systems on the camera (for example, between color and depth, or color and left-side infrared). The RealSense SDK's Projection documentation provides information about extrinsics at the link below.
https://dev.intelrealsense.com/docs/projection-in-intel-realsense-sdk-20#extrinsic-camera-parameters
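For reference, these stream-to-stream extrinsics can be queried in Python roughly like this:

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()

depth_profile = profile.get_stream(rs.stream.depth)
color_profile = profile.get_stream(rs.stream.color)

# 3x3 rotation matrix and 3-element translation vector, depth -> color
extrin = depth_profile.get_extrinsics_to(color_profile)
print(extrin.rotation, extrin.translation)

pipeline.stop()
```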
There are links to robot arm hand-eye calibration tools that are compatible with RealSense 400 Series cameras here:
https://support.intelrealsense.com/hc/en-us/community/posts/360051325334/comments/360013640454
-
MartyG Thank you Marty! I'll look into these resources!