Pandora thinks I am Latino

Have you read the hilarious If TiVo Thinks You Are Gay article? I recently faced a similar issue with Pandora - a music discovery service. On one of my stations Pandora started playing Latino music and never recovered from it. If I remember correctly, I had created a Jethro Tull station to begin with, but rating Santana and the Gipsy Kings highly resulted in lots of Latino recommendations. I enjoyed the change initially, but when I wanted to get back to my mainstream recommendations things got out of hand. I tried rating certain songs unfavorably to steer the system back to where I wanted it, but that didn’t work. My only option was to delete the station and create a new one.

At the time the Wall Street Journal article was written (in 2002) the personal recommendation space was brand new and evolving but my experience was quite recent. The following quote from the 2002 WSJ article is still applicable to most of the recommendation engines:

Many consumers appreciate having computers delve into their hearts and
heads. But some say it gives them the willies, because the machines either
know them too well or make cocksure assumptions about them that are way off
base.

I think one of the problems with most of the sites or products that provide recommendations is the Black Box nature of their recommendation engines. Users have no idea how their behavior is being interpreted by these engines. I would like to know how these recommendation engines are storing and using my preferences.

I have been a registered user of Amazon.com for many years now. I still remember when Amazon started to recommend books to me upon log-in. I wasn’t very impressed with the recommendations made by Amazon’s initial implementation of the recommendation system. But that was my first experience with on-line recommendation systems, which have become very common (and popular) nowadays. Over these years, Amazon must have collected lots of data about my behavior on their site - What am I searching for? Which items am I buying, rating or recommending? They have the same kind of information about millions of other users as well. Amazon is putting this data to good use in their recommendations, which have improved considerably over the years.

Most on-line recommendation systems, including Amazon.com, use some variation of an algorithm called Collaborative Filtering.

Collaborative filtering (CF) is the method of making automatic predictions (filtering) about the interests of a user by collecting taste information from many users (collaborating).
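To make the definition concrete, here is a minimal sketch of the traditional, user-based flavor of CF. The users, albums and ratings are made up for illustration, and cosine similarity over co-rated items is just one common choice of similarity measure; I am not claiming any particular site works this way.

```python
from math import sqrt

# Hypothetical taste data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann":  {"Aqualung": 5, "Abraxas": 4, "Thick as a Brick": 5},
    "bob":  {"Aqualung": 5, "Abraxas": 5, "Thick as a Brick": 4, "Supernatural": 5},
    "cara": {"Aqualung": 1, "Abraxas": 2, "Supernatural": 1},
}

def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = cosine(ratings[user], theirs)
        weighted_sum += weight * theirs[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

prediction = predict("ann", "Supernatural")
```

Here ann has never rated "Supernatural", so the prediction leans on bob and cara, weighted by how closely each of them agrees with ann on the albums they have in common.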

There are three common implementations of recommendation algorithms: Traditional Collaborative Filtering, the Cluster Model and the Search-Based Model. Amazon has implemented another variant called item-to-item collaborative filtering. This algorithm focuses on finding similar items, not similar customers (as done in traditional CF and the Cluster Model). For each of the user’s purchased and rated items, the algorithm finds similar items to generate recommendations.
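The item-to-item idea can be sketched in a few lines. This is only my reading of the description above, not Amazon's actual implementation: the purchase data is hypothetical, and I am assuming cosine similarity between items based on which customers bought them.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: user -> set of items bought
purchases = {
    "u1": {"book_a", "book_b"},
    "u2": {"book_a", "book_b", "book_c"},
    "u3": {"book_b", "book_c"},
    "u4": {"book_d"},
}

def item_similarities(purchases):
    """Build an item -> {similar item: score} table from co-purchases."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims = defaultdict(dict)
    for i in buyers:
        for j in buyers:
            if i == j:
                continue
            common = len(buyers[i] & buyers[j])
            if common:
                # Cosine similarity between the two items' binary buyer vectors
                sims[i][j] = common / sqrt(len(buyers[i]) * len(buyers[j]))
    return sims

def recommend(user, purchases, sims, top_n=3):
    """Score items similar to what the user already owns; skip owned items."""
    owned = purchases[user]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims.get(item, {}).items():
            if other not in owned:
                scores[other] += score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

sims = item_similarities(purchases)
suggestions = recommend("u1", purchases, sims)
```

The similarity table depends only on the items, so it can be computed ahead of time; at recommendation time the work is a lookup per owned item, which is presumably why this approach appeals to a site with millions of customers.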

Recently I chanced upon a recommendation engine called Criteo. According to their site they are the leading experts in real-time personalized recommendation solutions. I haven’t used their recommendation engine so I can’t comment on that claim. Criteo’s recommendation engine appears to be built on one or more implementations of collaborative filtering. I tried out their Movie Recommendation Demo Application to see how their engine works. The demo application looks interesting but depends on the user actively rating movies. Based on the user’s ratings, the application tries to find twin users - probably other users who have rated similar movies before. Interestingly I found the following statement on their site:

To be efficient, content approaches need a complete preliminary configuration of products. Unfortunately, this is barely possible in an open environment. Moreover, results are in general very disappointing in terms of predictive accuracy. For these reasons, content approaches are losing ground on the Internet.

Even though I agree that the content based approach is more complicated to set up, I don’t think it produces disappointing recommendations. In my opinion the converse is probably true.

I am certainly interested in exploring open-source implementations of Collaborative Filtering libraries like Taste (Java-based), Cofi (Java-based) and Vogoo (PHP-based).

Recently Google entered the recommendation space with their new initiative called Personalized Search Engine. This space is getting very interesting day-by-day.

But as I see it, the implementations of Personalisation or Recommendation Systems as they exist today are very fragmented. I have to be a registered user of Amazon.com, Google, Barnes and Noble and Criteo to get my personalized recommendations, and each system interprets me differently. Does my taste in books or movies change when I visit Amazon.com instead of Barnes and Noble?

I would like to maintain the same personality as I participate in the Ubiquitous Web. Wouldn’t it be cool if the web could be treated as one big community where everybody knows who I am (or rather what my preferences are, without knowing me)?
Wouldn’t it be nice if a user could research a particular topic on Google and then visit Amazon.com to get related recommendations on books? Now, I am not expecting the exact same recommendations from all the web sites that I visit (because that won’t be fun, right?), just hoping that I don’t have to define myself on each and every such site.

I guess this was the most interesting news of the morning… I wasn’t surprised by the news, maybe because I was expecting it all along. With the way Google is promoting and experimenting with their AdWords platform, this was the next logical step for them. Google is exploring every possible avenue to tap into advertising revenue. Once again, personalization or collaborative filtering (CF) will be very important for the success of these tests.

“TV is becoming like the Web. You have audience segmentation; users care about relevant messaging; advertisers care about aggregating an audience efficiently and getting measurements on how they’re messaging with you; and inventory owners like to monetize their viewership, even if it’s a small viewership”

Tracking user behavior is going to be the key to serving appropriate ads to the appropriate audience. Even though TV is becoming very interactive these days, it is definitely not as interactive as the Web. Apparently Google is trying to track viewer behavior and use passive filtering to serve ads effectively.

Tracking will be done via set-top boxes, with reporting to include aggregate impressions (with no individual- or household-specific data), as well as data on how long an ad was viewed.

So what’s next?

Google recently launched a new business model called Pay-Per-Action (PPA) for its AdSense products. It sets aside the revenue-generating Pay-Per-Click model in favor of a model that could tackle the click-fraud problem associated with Pay-Per-Click.

The Pay-Per-Click model was popular with on-line advertisers because they were charged only if a user clicked on an ad. This was a huge improvement over traditional banner ads. Now, Google is taking the next step by introducing PPA ads, where advertisers pay only when a user takes a specific action on the advertiser’s site, such as purchasing a product. I am not sure how this is going to affect Google’s ad revenue model.
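A little arithmetic shows why the difference matters for click fraud. Every number below is hypothetical (I don't know Google's actual rates), and I am assuming that fraudulent clicks never lead to a purchase.

```python
def ppc_cost(clicks, cost_per_click):
    """Pay-Per-Click: the advertiser is billed for every click, genuine or not."""
    return clicks * cost_per_click

def ppa_cost(actions, cost_per_action):
    """Pay-Per-Action: the advertiser is billed only for completed actions."""
    return actions * cost_per_action

# Hypothetical campaign figures, chosen only for illustration
genuine_clicks = 1000
fraudulent_clicks = 250   # assumed never to result in a purchase
purchases = 20            # completed actions coming from the genuine clicks
cost_per_click = 0.50
cost_per_action = 25.00

ppc_bill = ppc_cost(genuine_clicks + fraudulent_clicks, cost_per_click)
ppa_bill = ppa_cost(purchases, cost_per_action)
wasted_on_fraud = ppc_cost(fraudulent_clicks, cost_per_click)

print(f"PPC bill: {ppc_bill:.2f} (of which {wasted_on_fraud:.2f} paid for fraudulent clicks)")
print(f"PPA bill: {ppa_bill:.2f}")
```

Under these assumptions the PPA bill does not move no matter how many fraudulent clicks arrive, while the PPC bill grows with each one.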

Initially I wondered how Google was going to successfully track actions taken by users on the advertiser’s site. I have never used Google’s AdSense products, but I think Google Conversion Tracking solves this problem for Google.

So the next question is, what will it take to make this model effective? In my opinion, Google will have to be very clever in targeting the right audience with appropriate ads. Google recently introduced a new service called Personalized Search Engine. It could be that the PPA model and the personalized search engine are designed to work together. If Google can effectively personalize searches for users, it can effectively display the right ads to them. Now, that would make the new PPA model successful and very popular.

In recent years, we have all experienced a Digital Media Explosion on the internet. The dramatic advancement in technology has resulted in superior quality of digital audio, video and other types of media content, as well as (almost) instant delivery of rich content to consumers. The improved infrastructure has brought reliable and fast internet access to the general public all over the world. Apart from content produced by media companies like BBC, CNN, Apple iTunes, Sony Music, FOX etc., contributions by individuals cannot be ignored. WordPress alone has over 800,000 bloggers churning out blogs, YouTube has made home videos popular and publicly available…

It is very easy to get overwhelmed by this mind-boggling amount of information and content available to you 24/7/365. This has resulted in an Information Overload that can easily negate the effectiveness and usefulness of such information. Too much information also results in confusion (can’t understand the information or have doubts about validity) and inaccessibility (don’t know where to find it or don’t know if it exists).

How can I avoid information overload and still get relevant information/content regularly? Enter Personalization, or P13N for short. P13N is probably the answer to the problem created by the exploding digital content on the internet. Can I get the content that I like, or will probably like, delivered to me? And can that happen without me having to search the internet?

Many online service providers have already implemented P13N in some form. The most common examples are Amazon’s Recommendation Engine or the latest push from Google to personalize search results. I applaud the idea and would love to see its effectiveness. What I don’t like in both these cases is that I have to register with them. Where is my anonymity?