Sense hands on a table when a RealSense 415 & projector are fixed above.
Hi - I've got the RealSense 415 working with SDK 2.0 within Unity, which is great and very colourful.
I'm building an interactive table like the picture below, and am hoping to use the RealSense to work out where people's hands are.
What method would you suggest I use to go about this? Currently, using the SDK examples, I can't make it tell the difference between objects at 2.8 m and 3 m from the camera (which is where the hands would be on the table).
Any help much appreciated :)
-Jerry
-
There was a RealSense game application called 'Tanked!' in 2015, in which a RealSense R200 stereo camera (which had a depth sensing range of 4 m) pointed downwards at a sandpit table from a mounting point atop a TV. As the players moved the real-life sand around with their hands, the RealSense camera read the updated topography of the sand and a projector projected imagery onto it. It is not quite the same as your application (the sand is read instead of the hands), but the principle of an overhead camera and projector is similar, which demonstrates the viability of your application.
https://www.youtube.com/watch?v=tz0Aa4qjxeQ
For the 400 Series cameras, the commercial software package Nuitrack SDK may be useful. It can track the left and right wrists (which should be sufficient to approximate hand position) and can be tightly integrated into Unity. It costs $39.99 a year, and has a free trial version so you can test it in Unity to see if it is suitable for your needs.
Here's a tutorial example on setting up skeleton tracking in Unity with Nuitrack SDK.
-
Thanks for the reply, Marty.
I'll try and contact the Tanked! developers Nathan and Alex at Design Mill to see how they went about it.
I contacted Nuitrack last month and they said it was unlikely they could help, as the camera would be fixed above the users and their system works front-facing only. A shame really, as they were quick to reply!
If you can think of any other examples please pass them on, as while a lot of museums seem to have similar interactives to the one I'm trying to build, no-one has documented the process (with RealSense, anyway).
thanks,
Jerry
-
I remember that the creators of Tanked! did a making-of article. The original page is broken but the article is still accessible here:
http://hyd23.rssing.com/chan-12487997/all_p99.html
I'm also reminded of a RealSense-powered display wall called Brixels that was showcased last year.
https://www.intelrealsense.com/brixels-powered-by-intel-realsense-technologies/
I also just remembered about the HP Sprout. It was a desktop computer in the original RealSense generation that had a RealSense camera mounted pointing downward at a flatbed scanning surface. The camera used inside it was the F200, which was the original RealSense camera and the predecessor of the SR300 (both of which can detect hand joints).
https://www.youtube.com/watch?v=H1d1KhFFCUw
Another way of generating 'holographic' images on a display surface, other than projecting from above, is a very old technique called "Pepper's Ghost". Basically, you put the projector underneath a sheet of glass (the display surface) and project up onto the underside of the glass via a mirror. The link below explains its use in a pinball machine.
https://www.libertygames.co.uk/blog/how-do-sterns-ghostbusters-holographic-pinball-targets-work/
Conceivably, you could have a RealSense camera at each end of the table, each pointing at the person at the opposite end, so that the hands of two table users can be detected at the same time. The cameras could be mounted almost horizontally, pointing up at a shallow angle like the projector, and that should solve the problem of not being able to detect joints when the camera points downward.
By adding more cameras around the inside edge of the table to cover more sides of the table, you could increase the number of supported users. Nuitrack can detect up to six skeletons.
-
That article is very good - and thanks for the other links too.
I've had a full day of fiddling with this and have yet to experience an epiphany, so a couple more questions.
I'm using the latest SDK from here, and the examples all work fine.
https://github.com/IntelRealSense/librealsense/tree/master/wrappers/unity
- In the readme it mentions a Step 1 where you use CMake to build dependencies and create a .unitypackage. I was using the unitypackage linked at the top of the same page - does this mean I don't need to go through the CMake process?
- On the Unity Asset Store there are two versions of OpenCV - one is OpenCVforUnity, the other OpenCVplusUnity.
The examples in OpenCVforUnity will work with a regular webcam, but not with the RealSense 415 - I get a "could not connect pins - RenderStream()" error. OpenCVplusUnity works with the RealSense 415, but I can't get it to use the depth option.
- Whichever I use, I cannot seem to access anything previous RealSense developers have used - most of the code snippets I've found reference pxcmStatus & pxcmSenseManager.
Are these elements no longer in the SDK? Should I be ignoring anything I read here, as it's only relevant to the earlier cameras? https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?doc_devguide_introduction.html
OK, thanks for any help. I'm taking the camera home to show the kids because they'll probably know how to deal with it!
cheers,
Jerry
-
If you are using Windows and want to get the RealSense 400 Series camera working in Unity then there is a useful shortcut process that avoids having to do any building with CMake. You just use pre-made files taken out of the binary version of the SDK (the one you download from the SDK's "Releases" page as an .exe file). The link below describes the process.
https://forums.intel.com/s/question/0D70P0000068Yo0SAE
SDK releases can be found here:
https://github.com/IntelRealSense/librealsense/releases/
Installing the SDK's .exe creates a folder of files from which you can retrieve the library files to transfer into Unity using the process described.
OpenCV For Unity is the best option on the basis of reviews and reputation, though it is not inexpensive, as I'm sure you have seen. Having OpenCV For Unity also offers compatibility with Playmaker and with the best features of companion assets such as the Dlib FaceLandmark Detector.
https://assetstore.unity.com/packages/tools/integration/dlib-facelandmark-detector-64314
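On the "could not connect pins - RenderStream()" error: that message tends to come up when something tries to open the camera as an ordinary webcam (e.g. via Unity's WebCamTexture, which the OpenCVforUnity webcam examples use). A way around it is to let the RealSense Unity wrapper own the camera and simply hand its output texture to OpenCV every frame. Please treat the sketch below as a rough outline rather than working code - the namespaces follow recent OpenCV For Unity versions, and how you obtain the texture from RsStreamTextureRenderer is a hookup you would make yourself in the Inspector or in code.

    // Rough sketch: let the RealSense wrapper drive the camera and copy its
    // readable output Texture2D into an OpenCV Mat for further processing.
    // Namespaces differ between OpenCV For Unity versions; these are the newer ones.
    using UnityEngine;
    using OpenCVForUnity.CoreModule;
    using OpenCVForUnity.UnityUtils;

    public class RealSenseToMat : MonoBehaviour
    {
        // Assign the texture that RsStreamTextureRenderer writes into
        // (hypothetical hookup - bind it however suits your scene).
        public Texture2D realSenseTexture;

        private Mat rgbaMat;

        void Update()
        {
            if (realSenseTexture == null) return;

            if (rgbaMat == null)
                rgbaMat = new Mat(realSenseTexture.height, realSenseTexture.width, CvType.CV_8UC4);

            // Copy the current RealSense frame into the Mat; from here you can
            // run whatever OpenCV processing you need on rgbaMat.
            Utils.texture2DToMat(realSenseTexture, rgbaMat);
        }
    }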
Anything that references 'PXC' or 'SenseManager' relates to the old 2016 RealSense SDK, which is compatible with the SR300 camera but not the 400 Series cameras - which is a pity, as the 2016 SDK is superb for hand and face tracking. I spent a couple of years writing detailed step-by-step guides for it.
https://software.intel.com/en-us/forums/realsense/topic/676139
-
Thanks once again for the reply, Marty - I'm currently pestering people in multiple countries in order to solve my problem!
I'm now looking into what occurs in the ProcessFrame(VideoFrame frame) function of the RsStreamTextureRenderer class in the SDK. I'm keen to understand how it generates that VideoFrame information - I can't see a VideoFrame class or struct anywhere. Is it hidden somewhere?
I'd like to see if the depth data can discern between where the table will be (approx. 3 m from the projector) and where the hands are (probably 30 cm in front of the table). When I set the min range to 2 and the max to 4, the hand and the table seem to be the same colour - do you think it will be able to tell the difference, or is it the movement of things that will allow detection?
OK - I think people in Iowa will be waking up in 4 hours, so I'm hoping to poke the developers at DesignMill some more for pointers - their Torch projection system does seem to do exactly what I'm aiming for!
ok thanks again and have a great weekend,
-Jerry
-
The workings of the scripting in the Unity wrapper are mostly outside of my knowledge, unfortunately. The RealSense team members on the GitHub site should be better able to answer that. In basic terms though, the Unity wrapper takes the camera data and applies it to a Unity material that is displayed via a shader, rather than displaying the camera data in its original raw form.
https://github.com/IntelRealSense/librealsense/tree/master/wrappers/unity#images
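If you want to check the raw numbers yourself, the underlying Intel.RealSense C# wrapper (which the Unity package sits on top of) lets you read per-pixel depth in metres directly, without going through the texture/shader path. A minimal sketch of the idea is below - the stream settings are just illustrative, and you may need to disable the wrapper's RsDevice while testing it, as the camera may not like being opened twice at once.

    // Minimal sketch: read per-pixel depth in metres straight from the
    // Intel.RealSense C# wrapper, bypassing the Unity texture/shader path.
    using Intel.RealSense;

    public static class DepthProbe
    {
        public static void PrintCentreDepth()
        {
            using (var pipe = new Pipeline())
            using (var cfg = new Config())
            {
                cfg.EnableStream(Stream.Depth, 640, 480, Format.Z16, 30);
                pipe.Start(cfg);

                using (var frames = pipe.WaitForFrames())
                using (var depth = frames.DepthFrame)
                {
                    // A hand ~30 cm above the table should read noticeably
                    // less than the ~3 m reading of the table surface itself.
                    float metres = depth.GetDistance(depth.Width / 2, depth.Height / 2);
                    UnityEngine.Debug.Log("Centre depth: " + metres.ToString("F3") + " m");
                }
            }
        }
    }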
I suspect that if you have moving objects (the hands) rather than static ones, tracking is probably going to work better. There was a recent discussion on the GitHub about tracking an object as small as a bee. The Chief Technology Officer (CTO) of the RealSense Group at Intel offered some input on that.
https://github.com/IntelRealSense/librealsense/issues/4175#issuecomment-507448389
-
A quick one before the weekend hits, Marty - I see the SR305 is out at the end of July. Will it use the 2016 SDK for the 200/300 series cameras, the one that allows blob data tracking?
I have a terrible feeling the 415 isn't going to be able to do what I want, but I'm going to wait for the GitHub team to decide that when they reply.
-
The RealSense SDK manager, Dorodnic, has said that the SR305 will work with the current RealSense SDK 2.0 but, unfortunately, not with the old 2016 SDK. As finger tracking is in the SR305's feature list on the Click store page, I am not sure what the RealSense team's present-day plans are for making use of its finger tracking. I've flagged up a query about this to Dorodnic on the RealSense GitHub to get his advice.
Have a great weekend!
-
Hi again Marty - I hope you had a good weekend.
I've got OpenCV detecting blob data in the texture created by the RsStreamTextureRenderer, which is a step forward. But I've got issues now.
1 - I can't discount objects picked up by the camera. There's a tripod 50 cm from the board, representing a user's head/shoulders, that I want the blob detection to ignore, but even though it's outside the depth mat's min-max range it still shows up. How could I get the OpenCV algorithm to ignore it?
2 - When an object is just above the board, it's too vague to pick up. The second tripod in the pic (up and to the right) shows this - it is roughly where the user's hand would be. I've changed the depth multiplier to try to make this more obvious to pick up but can't get it to make a difference. Is there another approach I should be taking?
I'm going to keep testing with the blob detection parameters in OpenCV to see if I can solve these issues in the meantime.
-cheers,
Jerry
-
Thanks for sharing your progress!
I thought very carefully about your problem. OpenCV programming is mostly outside of my knowledge, unfortunately. In general though, I would suspect that setting a depth max from within your OpenCV code using "thresholding" of pixel depth might help isolate the tripod from detection.
https://docs.opencv.org/2.4.13.7/doc/tutorials/imgproc/threshold/threshold.html
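To make the idea concrete, roughly what I have in mind is something like the sketch below: build a binary mask from a raw 16-bit depth image that keeps only the band of distances where the table and hands live, and feed that mask to the blob detector. The Mat is assumed to contain raw Z16 depth values in millimetres (rather than the colourised preview texture), and the band values are guesses you would tune for your rig.

    // Rough sketch: band-pass the depth image so anything nearer than ~2.6 m
    // (e.g. the head/shoulder tripod) or farther than ~3.0 m is zeroed out
    // before blob detection. Values and namespaces are assumptions.
    using OpenCVForUnity.CoreModule;

    public static class DepthMask
    {
        // 'depth16' is assumed to be a CV_16UC1 Mat of raw Z16 depth values (mm).
        public static Mat BandPass(Mat depth16, double minMm = 2600, double maxMm = 3000)
        {
            Mat mask = new Mat();
            // Pixels inside [minMm, maxMm] become 255, everything else 0,
            // so the too-close tripod simply disappears from the mask.
            Core.inRange(depth16, new Scalar(minMm), new Scalar(maxMm), mask);
            return mask; // 8-bit mask, ready to hand to the blob detector
        }
    }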
-
Cheers Marty. I'll look into that.
I think the key to solving this will be getting access to that RsStream depth data before it creates the depth texture, so I can discount the too-close data.
Is this data unavailable in the RealSense Unity wrapper? I'd like to email @dorodnic about it - is a direct email to him an OK approach, or should I continue posting on GitHub?
*Also* - how is this person getting their depth data out? Is it because they're using something other than Unity?
https://support.intelrealsense.com/hc/en-us/community/posts/360033348574-Gets-the-depth-of-the-pixel
ok cheers, and thanks for all this excellent help :)
-
Posts on the GitHub are the best way to talk to Dorodnic.
The application in that link looks like a user-created script rather than a Unity program. I cannot tell from the post how they programmed it.
Apparently though, the Unity wrapper has a depth cutoff processing block - that may be worth investigating.
https://github.com/IntelRealSense/librealsense/issues/3273#issuecomment-465110170
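For reference, the SDK also exposes a threshold filter processing block through the C# wrapper that clips depth outside a minimum/maximum distance, which does a similar job on the raw depth frames. A small sketch of that is below - the distances are illustrative guesses, and this is the generic librealsense block rather than necessarily the exact cutoff the Unity wrapper uses internally.

    // Sketch: clip depth frames to a min/max band with the SDK's threshold
    // filter processing block (Intel.RealSense C# wrapper). Distances are guesses.
    using Intel.RealSense;

    public class DepthCutoff
    {
        private readonly ThresholdFilter filter = new ThresholdFilter();

        public DepthCutoff(float minMetres = 2.6f, float maxMetres = 3.0f)
        {
            // Everything nearer than minMetres or farther than maxMetres is discarded.
            filter.Options[Option.MinDistance].Value = minMetres;
            filter.Options[Option.MaxDistance].Value = maxMetres;
        }

        public Frame Apply(DepthFrame depth)
        {
            return filter.Process(depth);
        }
    }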
-
Hi Marty - based on your last post I've switched from 2D to 3D to use that depth cutoff, and I'm running the result into a texture that gets blob-detected.
The cut-off part works, which is good, but there's a heck of a lot of noise generated by the point cloud that's getting picked up by OpenCV's blob detection - is there a good way to reduce this amount of flicker?
I'm going to mess around with the blob parameters a bit to see if I can filter the noise out, but the noise blobs are as large as the area generated by my hand, so it won't be simple, eh.
ok cheers,
-
If the camera is mounted from above then I assume that the depth noise is not caused by your hand being less than the camera's minimum depth sensing range.
Another cause of disruption in the image can be fluorescent lights, such as ceiling strip lights, in the location where you are using the camera.
https://forums.intel.com/s/question/0D70P0000069D3qSAE
Even if the lights are not fluorescent, having the camera closer to the ceiling than normal can cause noise even from ordinary bulb lights if the light-source is strong. I recall there was once a user who tried to scan a workshop of car engines from above with the camera pointing downwards. The area of the image nearest to the lights was covered in large white blobs, with the engines only appearing normal at the parts of them nearest to the floor (i.e. furthest from the light-source).
-
In the case of fluorescent light disruption, the disruption may sometimes be reduced by running the camera's FPS at a speed that is closer to the frequency the light-source is flickering at. For example, if a fluorescent light is operating at 60 Hz, then setting the camera to 60 FPS may minimize noise caused by the light. So you could try changing the camera FPS to see if it negates some of the effects of the projector light.
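If you are configuring the stream in code, the frame rate is just part of the stream profile you request - something like the sketch below, where 848x480 at 60 FPS is one of the D415's supported depth modes (adjust to whatever profile you are actually using). If you are using the Unity wrapper's RsDevice component instead, the same width/height/FPS values should be settable on its profile in the Inspector.

    // Sketch: request the D415 depth stream at 60 FPS so the frame rate sits
    // closer to a 60 Hz light flicker. Mode values are illustrative.
    using Intel.RealSense;

    public static class SixtyFpsDepth
    {
        public static Pipeline StartDepthAt60Fps()
        {
            var cfg = new Config();
            cfg.EnableStream(Stream.Depth, 848, 480, Format.Z16, 60);

            var pipe = new Pipeline();
            pipe.Start(cfg);
            return pipe;
        }
    }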
-
Hi Marty - I'm now using the point cloud data because it can do the depth cutoff that the 2D depth scan can't.
I'm passing the resulting image into OpenCV to get it blob-detected, and the problem for me now is that it can't tell where the hand is - it just reads the blob centre.
The attached pic is the circular table from above with 3 hands coming into view. Do you know a way that I could make it detect the hand here? I'm probably going to raycast toward the table centre until I hit a white spot to discern the touch area, but that won't always work, plus it's already very slow with all the image parsing that OpenCV is doing from the Unity RealSense SDK.
Admittedly, this is probably more of an OpenCV thing, but I just wanted to keep the topic active for anyone in my situation - especially as I have yet to resolve it :)
ok cheers,
-
The problem with blob tracking is that it cannot recognize the hand joints - it just literally sees a large area of blob (such as the flat-ish back of the hand) and reacts to it.
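Building on your raycast idea, one cheap geometric approximation (again, OpenCV is not my specialty, so treat this as a sketch) is to take each blob's contour and pick the contour point nearest the table centre as the approximate touch point. That assumes hands always reach in from the table edge towards the middle; the names follow OpenCV For Unity's Java-style bindings.

    // Sketch: approximate each hand's "touch" point as the contour point of its
    // blob that lies closest to the table centre. Assumes hands enter from the edge.
    using System.Collections.Generic;
    using OpenCVForUnity.CoreModule;
    using OpenCVForUnity.ImgprocModule;

    public static class TouchEstimator
    {
        public static List<Point> EstimateTouchPoints(Mat binaryMask, Point tableCentre)
        {
            var contours = new List<MatOfPoint>();
            var hierarchy = new Mat();
            Imgproc.findContours(binaryMask, contours, hierarchy,
                                 Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

            var touches = new List<Point>();
            foreach (var contour in contours)
            {
                Point best = null;
                double bestDistSq = double.MaxValue;
                foreach (var p in contour.toArray())
                {
                    double dx = p.x - tableCentre.x;
                    double dy = p.y - tableCentre.y;
                    double distSq = dx * dx + dy * dy;
                    if (distSq < bestDistSq) { bestDistSq = distSq; best = p; }
                }
                if (best != null) touches.Add(best);
            }
            return touches;
        }
    }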
RealSense depth camera users who wanted to track transform positions with a robot (e.g. a robot arm picking items from a stock bin) have generated "pose" data using OpenCV. Pose tracking data is normally only available with the T265 tracking camera, but these users found workarounds for their depth cameras.
The T265 tracking camera can also be paired with a depth camera to gain access to very high quality tracking data, though that is an additional expense ($199).
-
Jerry, did you ever find a solution to this?
I'm doing virtually the same thing with the D435. I was looking at using the Cubemos SDK because I could have upwards of 8-10 people around the table with a projector, needing to simulate touch, but I'm hitting some roadblocks thinking about the angle of the camera in relation to the projector and the Unity program.