For this assignment you will design a hybrid filtering algorithm. You will not implement it, but you will explain your design criteria and provide a filtering algorithm in sufficient technical detail to convince me that it might actually work, including pseudocode.
You may choose to filter:
- Facebook status updates, like the Facebook news feed
- Tweets, like Trending Topics or the many Tweet discovery tools
- The whole web, like Prismatic
- Something else, but ask me first
Your filtering algorithm can draw on the content of the individual items, the user’s data, and other users’ data. The assignment goes like this:
1. List all the information that you have available as input to your filter. If you want to filter Facebook or Twitter, you may pretend that you are either of these companies and have access to all of their data (status updates, tweets, etc.). You may also assume you have a web crawler or a firehose of every RSS feed or whatever you like, but you must be specific and realistic about what data you are operating with.
2. Argue for the design factors that you would like to influence the filtering, in terms of what is desirable to the user, what is desirable to the publisher (e.g. Facebook or Prismatic), and what is desirable socially. Explain as concretely as possible how each of these (probably conflicting) goals might be achieved in software. Since this is a hybrid filter, you can also design social software that asks the user for certain types of information (e.g. likes, votes, ratings) or encourages users to act in certain ways (e.g. following) that generate data for you.
3. Write pseudocode for a function that produces a “top stories” list. This function will be called whenever the user loads your page or opens your app, so it must be fast and frequently updated. You can assume that there are background processes operating on your servers if you like. Your pseudocode does not have to be executable, but it must be specific and unambiguous, such that a good programmer could actually go and implement it. You can assume that you have libraries for classic text analysis and machine learning algorithms.
4. Write up steps 1-3. The result should be no more than five pages. However, you must be specific and plausible. You must be clear about what you are trying to accomplish, what your algorithm is, and why you believe your algorithm meets your design goals (of course it’s impossible to know for sure without testing, but I want something that looks good enough to be worth trying).
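As a rough illustration of the level of specificity step 3 asks for, here is a minimal sketch in Python. It is not a model answer. Everything in it is an assumption made for illustration: the `Item` fields, the three signals (content match, social engagement, global popularity), the weights, and the six-hour half-life are all hypothetical. It also assumes a background process has already computed each signal and scaled it to the range 0 to 1.

```python
import time
from dataclasses import dataclass


@dataclass
class Item:
    # All fields are hypothetical; a background job is assumed to fill them in.
    item_id: str
    posted_at: float          # Unix timestamp in seconds
    topic_match: float        # 0..1, similarity of the item's content to the user's interests
    social_score: float       # 0..1, engagement from accounts the user follows
    global_popularity: float  # 0..1, engagement across all users


# Illustrative weights only; justifying values like these is the job of step 2.
WEIGHTS = {"topic": 0.5, "social": 0.3, "global": 0.2}
HALF_LIFE_HOURS = 6.0  # assumed: an item's score halves every six hours


def score(item, now):
    """Weighted sum of the three signals, discounted exponentially by age."""
    age_hours = max(0.0, (now - item.posted_at) / 3600.0)
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    relevance = (
        WEIGHTS["topic"] * item.topic_match
        + WEIGHTS["social"] * item.social_score
        + WEIGHTS["global"] * item.global_popularity
    )
    return relevance * decay


def top_stories(candidates, k=10, now=None):
    """Return the k highest-scoring candidates, best first."""
    now = time.time() if now is None else now
    ranked = sorted(candidates, key=lambda it: score(it, now), reverse=True)
    return ranked[:k]
```

The per-request work here is only a weighted sum and a sort over a precomputed candidate set, which is one way to keep the function fast. The expensive text analysis and engagement counting are pushed into the assumed background process.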
The assignment is due before class on Monday, October 29.