Issue Details

Key: FLM-45
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Filip Pacanowski
Reporter: Borys Musielak
Votes: 0
Watchers: 0
Filmaster

Recommendation engines test

Created: 05/Apr/09 05:39 PM   Updated: 05/May/10 04:30 PM
Component/s: Recommendations
Affects Version/s: 1.0.4
Fix Version/s: 1.0.4

Time Tracking:
Not Specified

File Attachments:
1. run_test_recommendation_engine.sh (0.6 kB)
2. test_recom.py (1 kB)



Description
Write a script that tests the relevancy of the current recommendation algorithm and can be easily configured to test any new algorithms as they are implemented.

The script should take one parameter: the name of the guess rating field in Rating to be used (for the current algorithm: guess_rating_alg1).
The script should simply compute the average absolute difference between the real rating (the one manually assigned by the user) and the guessed rating, counting only ratings for which a guess has been computed, i.e. sum(abs(rating - guess_rating)) / number_of_ratings_with_guess_rating_computed

This should be possible with a single PSQL query.
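A minimal sketch of the single-query idea follows. It is an illustration only, not the attached test_recom.py: it uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL, and the table name `rating`, its two-column layout and the helper `mean_abs_error` are assumptions made up for the example. Only the field name guess_rating_alg1 comes from the description above.

```python
import re
import sqlite3


def mean_abs_error(conn, guess_field, table="rating"):
    """Average |rating - guess| over rows where a guess has been computed."""
    # Column and table names cannot be bound as query parameters,
    # so check that they are plain identifiers before formatting them in.
    for name in (guess_field, table):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError("invalid identifier: %r" % name)
    query = (
        "SELECT AVG(ABS(rating - {f})) FROM {t} "
        "WHERE rating IS NOT NULL AND {f} IS NOT NULL"
    ).format(f=guess_field, t=table)
    return conn.execute(query).fetchone()[0]


# Toy data (hypothetical schema): the third row has no guess and is skipped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating (rating INTEGER, guess_rating_alg1 REAL)")
conn.executemany(
    "INSERT INTO rating VALUES (?, ?)",
    [(8, 7.0), (5, 6.5), (9, None), (3, 5.0)],
)

print(mean_abs_error(conn, "guess_rating_alg1"))  # prints 1.5
```

Passing a different field name (for example guess_rating_alg2, once that column exists) would evaluate another algorithm with the same query.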

When implementing new algorithms we'll be keeping the old ones and only adding new fields:
- new score fields in RatingComparator (current algorithm: score - to be renamed to score_alg1, new ones to follow the schema)
- new guess rating fields in Rating (guess_rating_alg2, guess_rating_alg3, etc.)

Comments
Filip Pacanowski added a comment - 27/Jun/09 10:27 PM
I've written a simple test that uses the root mean square error (RMSE) as the measure of algorithm accuracy.

Borys Musielak added a comment - 27/Jun/09 10:44 PM
Cool! I have just run it with the following result:

$ python run_test_recommendation_engine.sh
RMSE for alg1: 1.57456966323
RMSE for alg2: 2.03362124647

Now... what does it mean? :>

Filip Pacanowski added a comment - 28/Jun/09 11:19 AM
> Now... what does it mean? :>
Well, it's just the average error of the algorithm. Instead of the suggested arithmetic mean, I used the quadratic mean (root mean square), as that is what is used to rate solutions in the Netflix Prize. A lower value is better, so it also means that alg1 is more accurate than alg2.
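To illustrate the difference between the two measures, here is a small sketch (not the attached test script; the function names and the toy rating pairs are made up for the example). Both data sets have the same mean absolute error, but RMSE penalizes the single large miss more heavily.

```python
import math


def mae(pairs):
    """Arithmetic mean of the absolute errors."""
    errors = [abs(rating - guess) for rating, guess in pairs]
    return sum(errors) / len(errors)


def rmse(pairs):
    """Quadratic mean (root mean square) of the errors."""
    squared = [(rating - guess) ** 2 for rating, guess in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical (real rating, guessed rating) pairs.
steady = [(8, 7), (5, 6), (3, 4), (9, 8)]  # every guess is off by 1
spiky = [(8, 8), (5, 5), (3, 3), (9, 5)]   # one guess is off by 4

print(mae(steady), rmse(steady))  # 1.0 1.0
print(mae(spiky), rmse(spiky))    # 1.0 2.0
```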

Borys Musielak added a comment - 28/Jun/09 04:57 PM
OK, makes sense. Alg2 was based on "normalized" ratings (not really normalized, but that's what we called them) and it looked worse from the beginning.

If you think this task is finished, feel free to resolve it and check the changes into SVN.

Borys Musielak added a comment - 01/Sep/09 10:55 AM
Tested on prod, thus closing the issue.

Borys Musielak added a comment - 28/Apr/10 07:15 PM
Retested based on the current ratings. With more ratings, the old algorithm got better, but the new one by Jakub Tlalka beats it badly:

$ python run_test_recommendation_engine.py
RMSE for alg1: 1.54905903957
RMSE for alg2: 1.40973588554

( checked in as http://bitbucket.org/filmaster/filmaster-test/changeset/3207fbf7bff1 )

Borys Musielak added a comment - 05/May/10 04:30 PM
With the updated version of the new algorithm, the RMSEs look as follows:

RMSE for alg1: ~1.55
RMSE for alg2: ~1.30

Getting better!