Oct 01

Just read an interesting article on AdAge today about a sharp decline in the percentage of users who click on online display ads. Couple this with recent numbers showing a decline in online ad spend and you’ve got yourself a pretty scary picture of the state of content monetization. But is it really?

Display ads that are only semi-related, or entirely unrelated, to the content are not like other links on the page. They usually bring no utility or immediate value to users; they are more like “content hijacking” than “content monetization”. It’s no surprise then that people are fed up with this noise. We do still click on things, to get to more relevant content, so click-through is not dead, just getting smarter.

I think it’s got something to do with the packaging. When we visit web pages we very easily distinguish between “content” and “noise”. The former is what we came for, while the latter is what we have to put up with because we are not willing to pay for the real content. We’re obviously less likely to click on the “noise”, which drives its price down, making it even more difficult for good content providers to survive. The problem is not the underlying content though; it’s the ineffective way of associating the message with a call for action.

One of the main reasons for this, I believe, is the lack of relevance due to weak contextual association. To get effective “participation” from viewers, one has to bring the enabler (primary content) and the call for action (CFA) or ad (secondary content) closer together. There are two dimensions to this. The first one is space: if it’s relevant to the content then couple it with the content; don’t just throw it into the same container (page) and expect the two messages to stick. The second is time: most content is fundamentally linear (text you read through, video you watch through, or audio you listen to) and comprises lots of smaller pieces of information (a sentence, a frame in a video or a segment of audio). It is those pieces, rather than the piece as a whole, that could be monetized, just as hypertext links to relevant, contextual content are followed.

The ultimate goal is to personalize this, to get the right message delivered to the right person at the right time in the right context. Lots of progress is being made toward this goal by my company (Overlay.TV) and others.

Click-through is not dead, it’s just getting smarter.

[Originally posted at http://kishkush.com/2009/10/01/is-click-through-really-dead/]

Aug 13

Why in-video context inference is so difficult

Google AdSense changed the world of online advertising, using standard Information Retrieval techniques to match ad inventory with web pages. This has opened new possibilities for advertisers to get more bang for their buck and send the right message to the right audience. Can the same be done for video? Easier said than done.

Web pages lend themselves nicely to textual analysis. Not only can content be stemmed, counted for term frequency and enhanced semantically, but cues in an HTML document such as headings, page title and metadata also provide a much richer base for context inference.
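To make this concrete, here is a minimal sketch of that kind of page-level analysis. It is only an illustration, not how AdSense or any real system works: the tag weights, the stopword list and the toy suffix stripper (standing in for a real stemmer) are all assumptions of mine.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: terms in the title and headings count more than body text.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0}
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "we"}


def crude_stem(word):
    """Toy suffix stripper, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class WeightedTermCounter(HTMLParser):
    """Counts stemmed terms, weighted by the heaviest enclosing tag."""

    def __init__(self):
        super().__init__()
        self.scores = Counter()
        self._stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop everything up to and including the matching open tag.
            while self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self._stack), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            if word not in STOPWORDS:
                self.scores[crude_stem(word)] += weight


def page_context(html, top_n=3):
    """Return the top_n highest-scoring stemmed terms for a page."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return [term for term, _ in parser.scores.most_common(top_n)]


sample_html = (
    "<html><head><title>Camera reviews</title></head>"
    "<body><h1>Digital cameras</h1>"
    "<p>We review cameras and lenses for travel.</p></body></html>"
)
print(page_context(sample_html, top_n=2))
```

The point is simply that the markup itself tells us which words matter more, which is exactly the kind of cue a video stream does not carry.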

Video, on the other hand, is a much tougher nut to crack. It is unstructured to begin with and has far fewer metadata elements that could be of any use in drawing semantic conclusions. Nevertheless, video is what web users are now consuming more than ever. The number of clips served daily by video sharing and premium content sites is staggering, and so is the ever-increasing time users spend on online entertainment.

It is, therefore, not surprising that more advertising dollars are now being diverted to online media at the expense of traditional TV campaigns. However, the fundamental question remains: can this money be spent more effectively, delivering not only the right message to the right audience, but also at the right time?

Time is the most challenging factor in video advertising. While concepts discussed on a web page are read linearly but can be accessed in parallel (the whole page shows as one piece in the browser), scenes in a video are temporal and can only be consumed sequentially. Not only do we need to wait for content to be streamed to our client before we can scan it, but skimming over the timeline is also not quite the same as scrolling up and down a web page. Add to this the lack of anchors and in-content links, and it becomes clear that navigating between and within videos is very different from navigating hypertext.

So how can advertisers tap into this elusive medium and deliver contextual messaging? Currently, with great difficulty and without much success. One can obviously use the surrounding text of the embedding page, as well as the video title, description and tags, to make some assumptions about what the video is about. To use a web analogy, though, that would be like drawing conclusions from a whole web site rather than from the individual page. Nothing in and around the video can actually tell us much about the particular scenes, or even define when they start and finish.

Better tools are, therefore, needed. If visual data is not of much help, then how about audio? That, indeed, is the flavor of the month in contextual video advertising, with companies using speech-to-text techniques to transform audio-visual data into temporal text signals that drive good old semantic analysis. Alas, verbal information captures only a small fraction of what actually goes on in a video. Consider narration over background visuals in a documentary, or the stylishly rich details of a music video, to see how much information is lost in this translation.
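As a rough sketch of what a “temporal text signal” could look like, assume a speech-to-text step has already produced a list of (start time in seconds, word) pairs. That input format, the 30-second window and the stopword list are assumptions for illustration, not any particular vendor's output.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def keywords_by_window(timed_words, window_seconds=30.0, top_n=2):
    """Group (start_time, word) pairs into fixed windows and rank terms per window."""
    buckets = defaultdict(Counter)
    for start, word in timed_words:
        token = word.lower().strip(".,!?")
        if token and token not in STOPWORDS:
            # Integer index of the window this word falls into.
            buckets[int(start // window_seconds)][token] += 1
    return {
        (idx * window_seconds, (idx + 1) * window_seconds): [
            term for term, _ in counts.most_common(top_n)
        ]
        for idx, counts in sorted(buckets.items())
    }


# Hypothetical transcript of a nature documentary.
transcript = [
    (1.0, "The"), (2.0, "lions"), (3.5, "hunt"), (5.0, "lions"),
    (31.0, "Rain"), (33.0, "fills"), (35.0, "the"), (36.0, "rain"), (38.0, "river"),
]
print(keywords_by_window(transcript, top_n=1))
```

Each window gets its own small bag of words, so an ad could in principle be matched to a stretch of the timeline rather than to the clip as a whole. Whatever was shown but never said still goes unseen.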

There is no silver bullet to automate this just yet. No clever algorithm (trained, unsupervised, adaptive or otherwise) can come close to an average person’s ability to easily describe what they see in a video. Any person, you say? Why not use plenty of those then? By the law of large numbers, they’re bound to describe it reasonably well collectively.
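One simple way to picture the “plenty of people” idea is an agreement threshold: collect free-form tags from several viewers for the same scene and keep only those enough of them used. This is just a sketch; the function name and the 50% threshold are my own assumptions.

```python
from collections import Counter


def consensus_tags(annotations, min_agreement=0.5):
    """Keep tags used by at least `min_agreement` of the annotators."""
    if not annotations:
        return []
    votes = Counter()
    for tags in annotations:
        # Normalize and de-duplicate so each viewer votes once per tag.
        votes.update({tag.strip().lower() for tag in tags})
    threshold = min_agreement * len(annotations)
    return sorted(tag for tag, count in votes.items() if count >= threshold)


# Hypothetical tags from four viewers describing the same scene.
scene_tags = [
    ["beach", "Surfing"],
    ["surfing", "sunset"],
    ["surfing", "beach", "dog"],
    ["ocean", "surfing"],
]
print(consensus_tags(scene_tags))
```

One-off tags drop out, while the ones most viewers agree on survive.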

Stay tuned for the next post on crowd-sourcing context inference in video and why advertisers should care.
