Tuesday, May 31, 2011

Video annotations made easy

There's a project in the lab that looks at the use of some new technology and how this technology is best applied within a certain context (and perhaps also how people should change their behavior to improve the outcome). What we will end up doing is observing a number of people and making video recordings. Throughout the experiment, these people will be talking to exchange ideas and get all noses pointing in the same direction. At the same time, some of the interaction will occur through the technology itself, which is not easily captured on screen. However, because we deal with the tech directly, we can send events or information over a separate channel, so that it can be superimposed back onto the video, or at least related in time to certain points in the interaction.

I figured that for the purposes of analysis, it would be handy to make annotations on the video stream itself and refer to them later. The idea is that you can call attention to or place markers on the stream, so that particular events are easier to recognize and navigate to afterwards. In essence, it's the same as what YouTube provides, except that we don't want these videos put up there yet, and the duration of a video may well be over an hour.

ELAN is a nice tool I found that has all the characteristics we intend to use. It allows you to import an audio or video stream, which can then be annotated over the entire timeline for different events. As far as the technology events go, I've proposed to overlay those on the original video using a library called OpenCV. What you get is a single composited video that contains all the events of the interaction between people, their audio, and the things they did using the technology, with annotations (in the form of subtitles) added by the experimenters. That way, the output video is a comprehensive record of the entire experiment, which can be replayed in good video players that support subtitles in the SRT format.
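Since the events arrive with timestamps, turning them into SRT subtitles is mostly a matter of formatting. A minimal sketch of that step (the helper names here are illustrative, not part of ELAN or OpenCV):

```cpp
#include <cstdio>
#include <string>

// Format a time offset in milliseconds as an SRT timestamp: HH:MM:SS,mmm
std::string srt_time(long ms) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02ld:%02ld:%02ld,%03ld",
                  ms / 3600000, (ms / 60000) % 60, (ms / 1000) % 60, ms % 1000);
    return buf;
}

// Emit one numbered SRT cue for a single annotation.
std::string srt_entry(int index, long start_ms, long end_ms,
                      const std::string& text) {
    return std::to_string(index) + "\n" +
           srt_time(start_ms) + " --> " + srt_time(end_ms) + "\n" +
           text + "\n\n";
}
```

Calling `srt_entry(1, 61000, 64500, "subject points at screen")` produces a cue running from 00:01:01,000 to 00:01:04,500, and a player that supports SRT picks it up alongside the video.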

Anyway, ELAN can also export to other text formats, including HTML, which also allows you to translate or transcribe entire videos and post a log of what happened somewhere. The only thing it doesn't let you do yet is output the whole thing to some directory as a flash video file of sorts, together with an HTML file plus Javascript that lets you jump to each annotation from there.

Sunday, May 15, 2011

Arduino MAX7456 OSD & APRS

I've just finished up the hardware and software on an Arduino Duemilanove, connected to a MAX7456 OSD chip and implementing an APRS / AX.25 datastream over the audio channel. Video from a PAL or NTSC cam is fed into the OSD chip, which overlays the image with some relevant data variables. These variables then become visible to a human pilot in the form of some kind of HUD, allowing the pilot to make better decisions on throttle settings, landing, coming back home and so on.

The APRS / AX.25 link borrows most of the code for the signal sending from the Trackuino project. With the right filter and decoupling behind it and the right Fast PWM implementation, the signal quality is very impressive indeed (with quality meaning how perfectly the signal approximates sine waves at different frequencies).
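The usual way Fast PWM audio generation works (and the way Trackuino approaches it) is to play back a precomputed sine lookup table through a phase accumulator, so switching between the 1200 Hz and 2200 Hz AFSK tones is just a different phase step. A sketch of the idea; the table size and scaling below are illustrative, not Trackuino's exact values:

```cpp
#include <cmath>
#include <cstdint>

const int TABLE_SIZE = 256;
uint8_t sine_table[TABLE_SIZE];

// Precompute one full sine cycle, scaled to the 8-bit PWM duty range 0..255.
void build_sine_table() {
    for (int i = 0; i < TABLE_SIZE; i++) {
        double s = std::sin(2.0 * M_PI * i / TABLE_SIZE);
        sine_table[i] = (uint8_t)(127.5 + 127.5 * s);
    }
}

// 16-bit phase accumulator: each sample interrupt adds a per-tone step
// and the top 8 bits index the table. A larger step means a higher tone.
uint16_t phase = 0;
uint8_t next_sample(uint16_t step) {
    phase += step;
    return sine_table[phase >> 8];
}
```

The returned value is what would be written to the PWM duty register each timer interrupt; the RC filter on the output pin then turns the duty-cycle steps into the smooth sine wave.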

The OSD used to be driven by a simple loop, where the OSD was temporarily turned off to refresh the video buffer and then turned on again. Needless to say, this resulted in occasional flicker and in characters sometimes appearing in the wrong locations (because the internal generation of VSYNC signals and the write operations were being carried out at the same time).

The current hardware implementation uses INT0 on the Arduino (pin 2 on the Duemilanove), which is pulled up through a 1K Ohm resistor to +5V and connected with a wire to the VSYNC pin on the OSD chip. This alone gets the chip working properly. Interesting points here:
  • I used to refresh the buffer every VSYNC trigger, resulting in no image whatsoever. The OSD now writes new information every x cycles or whenever anything has changed.
  • After every change to the buffer, you should re-enable the display by writing 0x0C to VM0.

The APRS / AX.25 link on audio already seemed to be working, but I couldn't get the data parsed for some reason. I suspected that the tools I was using (multimon / soundmodem) couldn't deal with the data or were expecting different formats. By closely inspecting the incoming audio signal, however, I noticed some strange plateaus in the signal, as if the Arduino stopped writing in the y-direction for a brief moment. It turned out that the VSYNC interrupt was interfering with the AX.25 modem interrupts, so I made sure that only one of these interrupts does its work at any given time, each waiting for the other to finish before starting. This shouldn't cause a big performance problem for receivers downstream.
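That mutual exclusion can be sketched as a simple busy-flag handshake between the two interrupt handlers. The names below are illustrative, not my actual code; on the AVR, reads and writes of a single volatile byte are atomic, which is what makes this safe:

```cpp
// Shared state between the VSYNC ISR and the AFSK modem ISR.
volatile bool modem_busy = false;  // set while an AFSK sample is being emitted
volatile bool osd_dirty  = false;  // set when an OSD refresh had to be deferred

// VSYNC interrupt: only touch the OSD while the modem is idle;
// otherwise postpone the refresh to a later VSYNC.
void vsync_isr() {
    if (modem_busy) {
        osd_dirty = true;        // remember the pending refresh
        return;
    }
    // refresh_osd_buffer();     // safe: modem is not mid-sample here
    osd_dirty = false;
}

// AFSK timer interrupt: mark the modem busy so VSYNC work is deferred
// rather than stretching a sample and distorting the waveform.
void afsk_isr() {
    modem_busy = true;
    // emit_next_pwm_sample();
    modem_busy = false;
}
```

Deferring the OSD write costs at most a frame of display latency, whereas stretching an AFSK sample corrupts the transmitted packet, so the modem gets priority.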

The RC circuit I use to clean up the signal is documented in the config.h file of the Trackuino sources:

//                 8k2         10uF
// Arduino out o--/\/\/\---+---||---o
//                 (R)     |   (Cc)
//                        ===
//                         |  0.1uF (C)
//                         v
This attenuates the 5V pin output to about 500mV peak-to-peak while smoothing it in the process. Together with the Fast PWM implementation, this produces a very nice sine wave indeed.
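The numbers roughly check out as a first-order low-pass: with R = 8.2k and C = 0.1 µF the corner sits near 194 Hz, so the 1200/2200 Hz tones are well above the knee and come out at a few hundred mV, in the ballpark of the measured 500mV. A quick sketch of that first-order approximation (the 10 µF cap only blocks DC and is ignored here):

```cpp
#include <cmath>

// -3 dB corner frequency of a first-order RC low-pass.
double rc_cutoff(double r_ohm, double c_farad) {
    return 1.0 / (2.0 * M_PI * r_ohm * c_farad);
}

// Gain magnitude of the same filter at a given frequency.
double rc_gain(double r_ohm, double c_farad, double freq_hz) {
    double ratio = freq_hz / rc_cutoff(r_ohm, c_farad);
    return 1.0 / std::sqrt(1.0 + ratio * ratio);
}
```

Plugging in 8.2k and 0.1 µF gives a cutoff of about 194 Hz; a 5V peak-to-peak square-ish PWM envelope at 2200 Hz comes out around 0.44V p-p, which matches the observed level reasonably well.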

It is very important that this signal is clean and sine-wave like. The slight delay caused by the VSYNC interrupt meant that, due to CRC checking at the RX end, the frames didn't validate. I caught on to this because, once in a while, I could decipher a single slash '/', but longer strings couldn't be parsed at all.

The output of this signal goes to the mono audio-in of the A/V transmitter on the craft. The audio signal is received by the receiver and converted into a line-out, which is then sampled by the on-board ADC of my USB Hauppauge stick. The laptop can query the digital audio samples from the stick directly and analyze the signal to determine the frequencies. The frequency modulation is converted into a bitstream of 0's and 1's and, eventually, the complete string rematerializes at the receiving end.
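At its core, that frequency-to-bit step is a matter of comparing the signal energy at the two AFSK tones over each bit period; a Goertzel filter is a common way to do this. This is a generic sketch of the technique, not what multimon or soundmodem actually implement internally:

```cpp
#include <cmath>
#include <vector>

// Goertzel algorithm: relative energy of `samples` at `target_hz`
// for a given sample rate.
double goertzel_power(const std::vector<double>& samples,
                      double target_hz, double sample_rate) {
    double k = 2.0 * std::cos(2.0 * M_PI * target_hz / sample_rate);
    double s1 = 0.0, s2 = 0.0;
    for (double x : samples) {
        double s0 = x + k * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - k * s1 * s2;
}

// Classify one bit period of audio: mark (1200 Hz) or space (2200 Hz).
bool is_mark(const std::vector<double>& samples, double sample_rate) {
    return goertzel_power(samples, 1200.0, sample_rate) >
           goertzel_power(samples, 2200.0, sample_rate);
}
```

A real demodulator additionally has to recover bit timing and undo the NRZI encoding AX.25 uses, but the tone decision above is the heart of it.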

As said, there are some utilities for doing this on an Ubuntu computer. I've tried out soundmodem, which gives you a KISS / MKISS interface, but it's probably too complex for the simple purpose I need it for (which is to parse the string out of the data and hand it to some other process). I also found 'multimon', which in AFSK1200 mode does the job very nicely. 'multimon' was written around 1997 and uses the OSS interface on Linux (the old /dev/dsp interface), which is now deprecated.

You can however load a set of ALSA OSS compatibility tools to simulate OSS devices and do the conversion on the CPU if needed. To use multimon on an ALSA system without having to modify any of its internal code:

> aoss multimon -a AFSK1200

This then outputs the data strings to the console.

So there you have it. One single, heavily used Arduino board to generate the OSD video stream and periodically (300ms?) send extra telemetry (to your liking) to the ground station using APRS/AX.25 on the audio channel of the A/V transmitter. It is not a weight-effective way of doing this, because it adds one full Arduino board to the weight, but it does handle all the processing quite nicely. You do need at least an ATmega328P processor, due to the size of the execution image and the RAM the code uses for internal buffers and so on.

Monday, May 09, 2011

HAM license

Well, nothing to do with the picture at the left actually, but I got my HAM license. This basically means that I can, as an amateur and non-commercially, use some otherwise restricted frequency bands for research and other experimentation. One of the reasons to look into this relates to my work/hobby of dealing with UAVs. These require stable control links, where delays in reception or processing of over one second may mean the loss of the craft. It also relates to getting direct video feeds from these aircraft using transmission equipment and sophisticated antennas.

Interestingly, the UAV hobby seems to keep growing, especially recently now that there are more affordable kits around and one can get a craft in the air for under $200. There are also more self-built models in the sky, and people are fooling around with new and old antennas and finding ways to make them easier or less expensive to build.

I don't have my callsign yet. At some point I may acquire some TX/RX equipment, start listening on some frequencies and explore this world a bit further. About the exam: 2 questions wrong out of 40, which is not a bad score at all. One wrong question was about the use of capacitors in a feed line to a loudspeaker; the other was, I think, something about legislation.