For my thesis work, one of the important early steps is reconstructing real-world 3D locations of image features based on the stereo data. This isn't actually very complicated, and I got it working this week. Except... just by staring at the coordinates, I couldn't convince myself it was working correctly. So I spent some of today throwing together a visualization just to be sure.

This is a frame from the same test series I posted in segmented form last week.
[Left: SURF feature locations marked on the camera frame. Right: the same features rendered as spheres in the generated POV-Ray scene.]
On the left are the row/column locations of the SURF features extracted from the image. The stereo disparity data for those points is then used to generate an x/y/z location for each feature in the real world, relative to the camera. I took these coordinates and automatically generated a POV-Ray scene file set up to mimic the attributes of the stereo camera. Rendering that scene gives the image on the right. The important thing is that the dots all land in the same locations, which means the 3D reconstruction is valid. And you can clearly see that the 3D positions make sense -- the features from closer parts of the original scene are rendered as being closer to the camera (i.e., they're bigger).
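For reference, here's roughly what that back-projection and scene-generation step looks like. This is only a sketch assuming a rectified pinhole stereo model with focal length f (in pixels), baseline B, and principal point (cx, cy); the function names, the hard-coded camera block, and the output filename are my own placeholders, not the actual thesis code or the real camera parameters.

```python
# Sketch: back-project stereo features into camera-frame 3D points and
# dump them into a minimal POV-Ray scene. Assumes a rectified pinhole
# stereo model; all names and parameters here are illustrative.
import numpy as np

def reconstruct_points(features_uv, disparities, f, B, cx, cy):
    """features_uv: list of (u, v) = (column, row) pixel coordinates.
    disparities: matching disparity values in pixels.
    Returns an (N, 3) array of x/y/z points in the camera frame."""
    pts = []
    for (u, v), d in zip(features_uv, disparities):
        if d <= 0:           # no valid disparity for this feature -> skip it
            continue
        z = f * B / d        # depth along the optical axis
        x = (u - cx) * z / f
        y = (v - cy) * z / f
        pts.append((x, y, z))
    return np.array(pts)

def write_povray_scene(points, path="features.pov", radius=0.02):
    """Emit a minimal POV-Ray scene: a camera at the origin looking down +z
    and one small sphere per reconstructed feature."""
    with open(path, "w") as out:
        out.write("camera { location <0,0,0> look_at <0,0,1> }\n")
        out.write("light_source { <0,0,0> color rgb <1,1,1> }\n")
        for x, y, z in points:
            # POV-Ray's y axis points up while image rows increase downward,
            # so flip y to keep the rendered view right side up.
            out.write(
                f"sphere {{ <{x:.3f},{-y:.3f},{z:.3f}>, {radius} "
                f"pigment {{ color rgb <1,0,0> }} }}\n"
            )
```

Rendering the resulting .pov file with POV-Ray then gives a view of the reconstructed points from the camera's own position, which is what makes the side-by-side comparison meaningful.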
Next step: using the reconstructed 3D positions of the SURF features, calculate the ego-motion of the camera between frames using basic least-squares. Then I can start extracting the motion features from each track and building the classifier.
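For the curious, the "basic least-squares" step I have in mind is the standard closed-form rigid alignment (the Kabsch/Horn SVD solution): given matched 3D feature positions in two consecutive frames, find the rotation and translation that minimize the squared error, then invert that transform to get the camera's motion. A minimal sketch, with hypothetical array names:

```python
# Sketch: least-squares rigid-body fit between matched 3D points from two
# frames. P holds feature positions at frame t, Q the same features at
# frame t+1; both are hypothetical (N, 3) arrays of camera-frame points.
import numpy as np

def fit_rigid_motion(P, Q):
    """Return R (3x3) and t (3,) minimizing sum ||R p_i + t - q_i||^2,
    using the SVD closed form. The camera's ego-motion between the frames
    is the inverse of this transform."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)   # centroids of each point set
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection: force det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

The nice property of the closed-form solution is that it's exact for the least-squares criterion, so there's no iterative fitting to tune before moving on to the per-track motion features.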

