Friday, 24 August 2012

Update + Versioning

I have started using Bitbucket and Mercurial to version and track my project. The repository can be found at:


In the web application, I am still working on the JavaScript and AJAX that update the face as the Kinect application runs.

Saturday, 11 August 2012

Kinect Face Tracking

Lately I've been trying to use the Kinect's own face tracking instead of OpenCV's face detection algorithms. OpenCV didn't work as well with the Kinect as I had hoped, and the Face Tracking Library for Kinect gives a clear-cut example of implementing face tracking with the Kinect device. I now have an implementation that tracks the face and outputs its position to a text file.

The next goal (which I have already done a bit of work on) is to write some JavaScript to update the face and detect states of conversation. Updating the face with AJAX is not complicated; however, resolving speech into conversation states is going to require more thought. Each state should match some clearly defined criteria for conversation, as in the sketch below.
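
To make that concrete, here is a minimal JavaScript sketch of what I mean; the state names and the criteria (greeting words, question words, a silence timeout) are only placeholder assumptions at this stage, not the final design.

// Hypothetical conversation states for the kiosk avatar.
var STATES = { IDLE: 'idle', GREETING: 'greeting', QUESTION: 'question', LISTENING: 'listening' };

var lastInputTime = 0;

// Classify the latest speech/text input into a conversation state.
// The tests below (greeting words, question words, 10 s of silence) are placeholders.
function classifyInput(text) {
    var now = new Date().getTime();
    if (!text) {
        // Nothing said for a while means the user has probably walked away.
        return (now - lastInputTime > 10000) ? STATES.IDLE : STATES.LISTENING;
    }
    lastInputTime = now;
    if (/^(hi|hello|hey)\b/i.test(text)) {
        return STATES.GREETING;
    }
    if (/\?\s*$|^(who|what|where|when|why|how)\b/i.test(text)) {
        return STATES.QUESTION;
    }
    return STATES.LISTENING;
}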

My other objective while I am programming the JavaScript is to make the system navigate for the user when they ask certain questions about LATTE in general, its projects, and its people; a rough sketch of this is below.
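
The navigation itself could be as simple as matching keywords against the kiosk's three sections; the patterns here are just guesses and will need refining.

// Map rough keyword patterns to kiosk sections. The patterns are placeholders.
var NAV_RULES = [
    { pattern: /project|research|working on/i, section: 'Projects' },
    { pattern: /who|people|staff|student/i,    section: 'People' },
    { pattern: /latte|lab|about/i,             section: 'About' }
];

// Return the section a question should navigate to, or null to stay on the current page.
function resolveNavigation(question) {
    for (var i = 0; i < NAV_RULES.length; i++) {
        if (NAV_RULES[i].pattern.test(question)) {
            return NAV_RULES[i].section;
        }
    }
    return null;
}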

Kinect for Windows SDK (Face Tracking): http://msdn.microsoft.com/en-us/library/hh855347.aspx

Thursday, 19 July 2012

Kinect Kiosk Progress Update

I have been working on designing a simple interface for the Kinect kiosk that provides access to information about the lab, its projects and its people. It's separated into several areas, 'About', 'Projects' and 'People', which users can navigate between.

Also, since my last post my computer broke and I got a new one (yay), but I had to reinstall everything and recover all my data. I've been having trouble setting up the development environment to compile my Kinect application again, so that has been the priority.

For my reading, I have been learning a bit about the AIML system, JavaScript, and the Kinect's new face tracking features. At the moment, AIML lets me set answers to keywords or phrases provided by the speech input element. It does not work well when typing in questions, though, because the page only checks for new input every few seconds, as in the sketch below.
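
The check is roughly the loop below; the element id and the askAvatar() call are stand-ins for whatever finally passes the text to SitePal's AIML, and the interval is only illustrative.

var lastSent = '';

// Every few seconds, grab whatever is in the input box and hand it to the
// avatar. askAvatar() is a placeholder for the actual call into SitePal/AIML.
setInterval(function () {
    var box = document.getElementById('speechInput');   // assumed element id
    if (box.value !== '' && box.value !== lastSent) {
        lastSent = box.value;
        askAvatar(lastSent);
    }
}, 3000);

// The problem: someone typing a long question gets cut off mid-sentence,
// because the timer fires before they have finished typing.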

Because the keywords and patterns that AIML picks up on are quite simple, they could also be used to navigate the Kinect Kiosk page by voice. Likewise, the avatar can react to the user talking while they access information via touch.

The face tracking features in the new Kinect SDK release could solve the problems I have been having using OpenCV's face classifiers to track faces.

I will post an update when I get the Kinect application working again.

Tuesday, 22 May 2012

Speech Input to Browser App

Capturing Speech in the Web Browser


Recently I have been looking at integrating audio into the web application, and as I understand it, HTML5 has a very simple speech-to-text input element.

I've found an example here

And the full specification here

Specific Uses

This speech-to-text input element can be used to allow voice input to the Kinect Kiosk.

Users will be able to ask their questions, and the system will interpret their speech and attempt to infer an answer using SitePal's AIML.
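
As far as I understand it, wiring the element up looks something like this; x-webkit-speech and the webkitspeechchange event are Chrome-specific, and askAvatar() again just stands in for handing the recognised text to SitePal's AIML.

// The page contains: <input type="text" id="speechInput" x-webkit-speech>
var speechBox = document.getElementById('speechInput');

// Chrome fires webkitspeechchange once it has finished recognising the speech.
speechBox.addEventListener('webkitspeechchange', function () {
    askAvatar(speechBox.value);   // pass the recognised text on to the avatar
}, false);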

Limitations

The speech-to-text element requires users to press a button to start recording; part of the specification requires that users know they are being recorded.

It also requires Google Chrome at the moment, and is not yet a W3C standard. In other browsers like IE and Firefox, the speech element appears as a normal textbox.
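
A simple way to detect this, as far as I can tell, is to test for the webkitSpeech property on an input element and fall back to plain typing otherwise (the speechHint element here is hypothetical):

// Rough feature test: Chrome exposes a webkitSpeech property on input elements.
function speechInputSupported() {
    return 'webkitSpeech' in document.createElement('input');
}

if (!speechInputSupported()) {
    // In IE/Firefox the element is just a normal textbox, so prompt the user to type.
    document.getElementById('speechHint').innerHTML = 'Type your question below';
}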

Other Work


I've also been making progress on face/motion detection using the Kinect and OpenCV, though the classifiers are still not working.  I'll make another post on my progress with that when it is running, and then I'll provide some sample code.

Sunday, 13 May 2012

Progress with Integration

Since the last post, I have done some work to set up the kiosk with the existing software, and made progress with integration between C++ applications and SitePal's JavaScript API.

I will cover three things in this post:

Setting up Windows 7 as a Kiosk


I found a kiosk mode for the Google Chrome web browser (launched with the --kiosk flag), which lets me display the webpage full screen and prevent users from browsing other websites. One problem, however, is that users can still press Alt + F4 to exit the browser and access the desktop. There are ways to disable Alt + F4 which I may explore, but a temporary solution is to give users only the on-screen keyboard.

Existing Sitepal software

To enable the SitePal software to send and receive HTTP requests for my initial experimentation, I set up a local Apache server. The webpage uses JavaScript, AJAX and HTML5 to support several features. AJAX is used to poll a data source at set intervals and update the SitePal avatar. I am currently using JSON to define the data.
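
The JSON is nothing elaborate; at the moment it is along these lines (the field names are provisional and will probably change):

{
    "face": {
        "detected": true,
        "x": 312,
        "y": 187,
        "timestamp": 1336888205
    }
}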

At the moment, users can type phrases and questions into text boxes. The avatar has a limited knowledge base for questions, which can be improved with the Artificial Intelligence Markup Language (AIML).

For future work, HTML5 allows recording devices on the system to be detected via the web browser, and this may be combined with the Kinect microphone array to feed audio into SitePal.

Integration between C++ and Javascript

A C++ application runs on the local server and captures and processes information from the Kinect sensor. To get this data out to the server and to JavaScript, I write it to a file in JSON format. The JavaScript reads the file and sends an HTTP request to SitePal every 500 milliseconds.
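
A minimal sketch of that polling loop is below; the file name is hypothetical, and sayText() is SitePal's text-to-speech call as I understand their API (text, voice, language, engine), so treat the exact arguments as placeholders.

var FACE_FILE = 'face.json';   // hypothetical path; the file is written by the C++ Kinect application

// Poll the JSON file every 500 ms and react to the latest face data.
setInterval(function () {
    var xhr = new XMLHttpRequest();
    // The timestamp query string stops the browser from caching the file.
    xhr.open('GET', FACE_FILE + '?t=' + new Date().getTime(), true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            var data = JSON.parse(xhr.responseText);
            if (data.face && data.face.detected) {
                // Greet the user through the avatar when a face appears.
                sayText('Hello, can I help you?', 3, 1, 1);
            }
        }
    };
    xhr.send(null);
}, 500);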

Tuesday, 10 April 2012

Kinect Progress

After a bit of fiddling with different ways to interface with the Kinect for this project, I've found OpenCV to be the best C++ library for the task. OpenCV is an open-source library for displaying and manipulating image data. It's relatively lightweight and not as obfuscated as the Windows API, which means I can complete whole tasks much more quickly. I have also chosen not to use OpenGL, as its pipeline adds extra complexity to how images are loaded from the video stream.

A link to the OpenCV site is here http://opencv.willowgarage.com/wiki/

My first implementations capture and display color and depth frames from the device. I have begun implementing OpenCV's face and eye classifiers to detect a face in the video stream; the classifier runs, but it has not detected a face yet and I am unsure why.

A link to a tutorial on implementing OpenCV classifiers can be found here http://opencv.itseez.com/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html

The main functions are CVKinect::init(), which starts the main loop, and the getColor()/getDepth()/getFace() functions. I am using OpenCV 2.3.



#include <stdio.h>
#include <iostream>

#include <cv.hpp>
#include <highgui/highgui.hpp>
#include <objdetect/objdetect.hpp>
#include <imgproc/imgproc.hpp>

#include <NuiApi.h>

#define RGB_WIDTH 640
#define RGB_HEIGHT 480

#define DEPTH_WIDTH 320
#define DEPTH_HEIGHT 240

#define RGB_CHANNELS 3
#define RGBA_CHANNELS 4

class CVKinect
{
public:

CVKinect( );
~CVKinect( );

int getColor( );
int getDepth( );
int getSkeleton( );
int getFace( );

RGBQUAD Nui_ShortToQuad_Depth( unsigned short s );

HRESULT init( CVKinect * );
void uninit( );

private:

cv::String face_cascade_name;// = "haarcascade_frontalface_alt.xml";
cv::String eyes_cascade_name;// = "haarcascade_eye_tree_eyeglasses.xml";
cv::CascadeClassifier face_cascade;
cv::CascadeClassifier eyes_cascade;
cv::RNG rng;


IplImage* rgb;
IplImage* depth;
IplImage* skeleton;

HANDLE rgbStream;
HANDLE rgbEvent;

HANDLE depthStream;
HANDLE depthEvent;

HANDLE skeletonStream;
HANDLE skeletonEvent;
};


using namespace std;
using namespace cv;

unsigned char imageBuffer[DEPTH_WIDTH*DEPTH_HEIGHT*RGB_CHANNELS];

CVKinect myCVKinect;

CVKinect::CVKinect( )
{
rgbEvent = CreateEvent( NULL, TRUE, FALSE, NULL );
rgbStream = 0;

depthEvent = CreateEvent( NULL, TRUE, FALSE, NULL );
depthStream = 0;

skeletonEvent = 0;//CreateEvent( NULL, TRUE, FALSE, NULL );

rng = RNG(12345);
face_cascade_name = "C:\\Program Files\\OpenCV2.3\\data\\haarcascades\\haarcascade_frontalface_alt.xml";
eyes_cascade_name = "C:\\Program Files\\OpenCV2.3\\data\\haarcascades\\haarcascade_eye_tree_eyeglasses.xml";
}

CVKinect::~CVKinect( )
{

}

HRESULT CVKinect::init( CVKinect * myCVKinect )
{
rgb = cvCreateImageHeader(
cvSize( RGB_WIDTH, RGB_HEIGHT ),
IPL_DEPTH_8U,
RGBA_CHANNELS);

depth = cvCreateImageHeader(
cvSize( DEPTH_WIDTH, DEPTH_HEIGHT ),
IPL_DEPTH_8U,
RGB_CHANNELS);

// The Haar cascades must be loaded before detectMultiScale() can find anything.
if ( !face_cascade.load( face_cascade_name ) || !eyes_cascade.load( eyes_cascade_name ) )
{
printf( "Error: could not load the Haar cascade files!\n" );
return E_FAIL;
}

HRESULT hr = NuiInitialize( NUI_INITIALIZE_FLAG_USES_COLOR | NUI_INITIALIZE_FLAG_USES_DEPTH );

if ( FAILED ( hr ) )
{
printf( "Error: The kinect sensor failed to initialize!\n" );
Sleep(1000);
return hr;
}

hr = NuiImageStreamOpen( NUI_IMAGE_TYPE_COLOR,
NUI_IMAGE_RESOLUTION_640x480,
0,
2,
myCVKinect->rgbEvent,
&myCVKinect->rgbStream);

if ( FAILED ( hr ) )
{
printf( "Error: The color stream failed to open!\n" );
return hr;
}

hr = NuiImageStreamOpen( NUI_IMAGE_TYPE_DEPTH,
NUI_IMAGE_RESOLUTION_320x240,
0,
2,
myCVKinect->depthEvent,
&myCVKinect->depthStream );

if ( FAILED ( hr ) )
{
printf( "Error: The depth stream failed to open!\n" );
return hr;
}

while (1)
{
WaitForSingleObject( rgbEvent, INFINITE );
myCVKinect->getFace( );

WaitForSingleObject( depthEvent, INFINITE );
myCVKinect->getDepth( );

int c = cvWaitKey(1);

if ( c == 27 )   // 27 = ESC key
break;
}
return 1;

}

void CVKinect::uninit( )
{
cvReleaseImageHeader(&rgb);
cvReleaseImageHeader(&depth);
cvDestroyWindow("Video");
cvDestroyWindow("Depth");
NuiShutdown( );
}

int CVKinect::getColor( )
{
const NUI_IMAGE_FRAME * imageFrame = NULL;
HRESULT hr = NuiImageStreamGetNextFrame( rgbStream, 0, &imageFrame );

if ( FAILED ( hr ) )
{
return hr;
}

INuiFrameTexture * texture = imageFrame->pFrameTexture;
NUI_LOCKED_RECT lr;
texture->LockRect( 0, &lr, NULL, 0 );

if ( lr.Pitch != 0 )
{
BYTE * buffer = (BYTE *) lr.pBits;
cvSetData( rgb, buffer, lr.Pitch );
}

NuiImageStreamReleaseFrame( rgbStream, imageFrame );

cvShowImage("RGB Image", rgb);

return 1;
}

int CVKinect::getDepth( )
{

const NUI_IMAGE_FRAME * imageFrame = NULL;
HRESULT hr = NuiImageStreamGetNextFrame( depthStream, 0, &imageFrame );

if ( FAILED ( hr ) )
{
return hr;
}

INuiFrameTexture * texture = imageFrame->pFrameTexture;
NUI_LOCKED_RECT lr;
texture->LockRect( 0, &lr, NULL, 0 );

//13bits of depth data, 3 bits of player data
if ( lr.Pitch != 0 )
{
unsigned short * buffer = (unsigned short *) lr.pBits;

for ( unsigned int i = 0; i < DEPTH_WIDTH*DEPTH_HEIGHT; i ++ )
{

RGBQUAD color = Nui_ShortToQuad_Depth( buffer[i] );

imageBuffer[i*RGB_CHANNELS] = 255 - color.rgbRed;
imageBuffer[(i*RGB_CHANNELS)+1] = 255 - color.rgbGreen;
imageBuffer[(i*RGB_CHANNELS)+2] = 255 - color.rgbBlue;

}


cvSetData( depth, imageBuffer, DEPTH_WIDTH*RGB_CHANNELS );

}


NuiImageStreamReleaseFrame( depthStream, imageFrame );
cvShowImage( "Depth Image", depth );

return 1;
}

int CVKinect::getSkeleton( )
{
return 1;
}

int CVKinect::getFace( )
{

const NUI_IMAGE_FRAME * imageFrame = NULL;
HRESULT hr = NuiImageStreamGetNextFrame( rgbStream, 0, &imageFrame );

if ( FAILED ( hr ) )
{
return hr;
}

INuiFrameTexture * texture = imageFrame->pFrameTexture;
NUI_LOCKED_RECT lr;
texture->LockRect( 0, &lr, NULL, 0 );

if ( lr.Pitch != 0 )
{
BYTE * buffer = (BYTE *) lr.pBits;
cvSetData( rgb, buffer, lr.Pitch );
}

NuiImageStreamReleaseFrame( rgbStream, imageFrame );

// Wrap the IplImage in a cv::Mat header (shares the pixel data, no copy)
Mat frame = rgb;
std::vector<cv::Rect> faces;
Mat frame_gray;

cvtColor( frame, frame_gray, CV_RGB2GRAY );
equalizeHist( frame_gray, frame_gray );

//-- Detect faces
face_cascade.detectMultiScale(
frame_gray,
faces,
1.1,
2,
0|CV_HAAR_SCALE_IMAGE,
Size(20, 20)
);

for( int i = 0; i < faces.size(); i++ )
{
Point center( faces[i].x + faces[i].width*0.5, faces[i].y + faces[i].height*0.5 );
ellipse( frame, center, Size( faces[i].width*0.5, faces[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 255 ), 4, 8, 0 );

Mat faceROI = frame_gray( faces[i] );
std::vector<cv::Rect> eyes;

//-- In each face, detect eyes
eyes_cascade.detectMultiScale( faceROI, eyes, 1.1, 2, 0 |CV_HAAR_SCALE_IMAGE, Size(30, 30) );

for( int j = 0; j < eyes.size(); j++ )
{
  Point center( faces[i].x + eyes[j].x + eyes[j].width*0.5, faces[i].y + eyes[j].y + eyes[j].height*0.5 );
  int radius = cvRound( (eyes[j].width + eyes[j].height)*0.25 );
  circle( frame, center, radius, Scalar( 255, 0, 0 ), 4, 8, 0 );
}
}

cvSetData(rgb, frame.data, frame.step);
cvShowImage("Face Image", rgb);

return 1;
}

RGBQUAD CVKinect::Nui_ShortToQuad_Depth( unsigned short s )
{
    unsigned short RealDepth = NuiDepthPixelToDepth(s);
    //USHORT Player    = NuiDepthPixelToPlayerIndex(s);

    // transform 13-bit depth information into an 8-bit intensity appropriate
    // for display (we disregard information in most significant bit)
    // BYTE intensity = (BYTE)~(s >> 4);
unsigned char intensity1;
unsigned char intensity2;
unsigned char intensity3;

intensity1 = (unsigned char)(s >> 8);
intensity2 = (unsigned char)(s >> 9);
intensity3 = (unsigned char)(s >> 10);

    // tint the intensity by dividing by per-player values

    RGBQUAD color;

    color.rgbRed   = intensity1 ;
    color.rgbGreen = intensity2 ;
    color.rgbBlue  = intensity3 ;

    return color;
}

int main(int argc, char* argv[])
{
myCVKinect.init( &myCVKinect );

myCVKinect.uninit( );

return 0;
}





Wednesday, 28 March 2012

User Attention Detection

The project has changed a bit: to enhance the current avatar software (www.sitepal.com), I am going to work on a method to detect the attention of a user at the Kinect Kiosk. This will replace the Kinect mouse cursor.

At the moment, the installed avatar software can respond to user input using an AI system that can be trained to answer certain questions.

My supervisor would like the system to greet users who show some attention to the kiosk and wish to use it. I have done some research into human detection in video, as well as face detection in video. The primary method will be to use feature detection algorithms to detect human presence, and compare this with how we expect a user of the system to appear.

Several detection methods and algorithms can be used, such as knowledge-based, feature-invariant, template-matching and appearance-based face detection, as well as movement detection.[1]

Furthermore, for real-time processing, classifiers can be cascaded: less accurate but faster methods are applied first to rule out regions where faces certainly aren't, and more robust methods are then used to detect the actual faces.[2]


[1] Muhammad Usman Ghani Khan, Atif Saeed, "Human detection in videos", Journal of Theoretical and Applied Information Technology, Volume 5, Issue 2, February 2009, ISSN 1992-8645.
Keywords: Video processing, Computer vision, Human detection, Face recognition


[2] Yun Tie, Ling Guan, "Automatic face detection in video sequences using local normalization and optimal adaptive correlation techniques", Pattern Recognition, Volume 42, Issue 9, September 2009, Pages 1859-1868, ISSN 0031-3203, doi:10.1016/j.patcog.2008.11.026.
(http://www.sciencedirect.com/science/article/pii/S003132030800486X)
Keywords: Local normalization; Optimal adaptive correlation (OAC); AdaBoost algorithm