International Journal of Advanced Research in Engineering and Technology (IJARET)
ISSN 0976 - 6480 (Print), ISSN 0976 - 6499 (Online)
Volume 5, Issue 1, January (2014), pp. 73-82
© IAEME: www.iaeme.com/ijaret.asp
Journal Impact Factor (2013): 5.8376 (Calculated by GISI), www.jifactor.com
ENHANCING MOVIE RECOMMENDER SYSTEM
Ronak Patel¹, Priyank Thakkar², K Kotecha³

¹ Assistant Professor, CE Department, C. S. Patel Institute of Technology, Changa - 388421, Gujarat, India
² Assistant Professor, CSE Department, Institute of Technology, Nirma University, Ahmedabad - 382 481, Gujarat, India
³ Director, Institute of Technology, Nirma University, Ahmedabad - 382 481, Gujarat, India
ABSTRACT
A recommender system helps customers buy products/items efficiently and, at the same time, benefits the business. It can be built using approaches such as (1) Collaborative Filtering, (2) Content-Based Filtering and (3) Hybrid Filtering. In a collaborative recommender system, the ratings of the most similar users (in user-based collaborative filtering) or items (in item-based collaborative filtering) are used to predict the rating of a new item. In Content-Based Filtering, a user profile is constructed from the content of the items the user liked in the past, and recommendations are made based on the similarity between the user profile and the item profiles. Hybrid Filtering combines the collaborative and content-based approaches. In this paper, we focus on the movie recommendation task. The prediction task is modelled as a classification task in which our aim is to predict whether an item (a movie, in our case) will be liked or disliked by the user. We propose an item-based recommender that combines usage data, tag data and movie-specific data such as genres, star cast and directors to improve the accuracy of the recommender system. We have tested our approach on the hetrec2011-movielens-2k dataset and use Accuracy and F-measure to evaluate the performance of the proposed system.
Key words: Movie Recommender System, Content Based Filtering, Collaborative Filtering, Hybrid
Recommender System.
1. INTRODUCTION
The amount of information about products is increasing at an exponential rate, and the e-commerce industry is growing and becoming more complex. In such an environment, it has become difficult for customers to find the most relevant information about products/items within the tremendous amount of information available. To help their customers choose products/items more efficiently, major e-business
companies are developing recommender systems (RS). Customers benefit by receiving genuinely useful information about the products they are planning to buy, while the business benefits from growth in its sales. Recommender systems emerged in the mid-1990s to filter out irrelevant information and select content that meets users' needs. A recommendation system has been described as "An information filtering technology that produces individualized recommendations as an output or have the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options" [1]. These systems can be used for different purposes in several domains, from offering products to consumers in e-commerce to finding relevant information in research.
Movie businesses such as Netflix [2], IMDB [3] and Hulu [4] also recommend movies to their customers. Although several factors affect the quality of a recommender system, recommendations based on the common viewpoints of users have become increasingly trustworthy and widely used. The recommendation task is often reduced to the problem of estimating the rating a user would give to an unseen item, or of finding a list of items that the user is most likely to enjoy. Movie recommendation remains an open research area with unanswered problems, especially given the growing amount of social networking data. There is a need to systematically fuse different types of data about movies and users from various sources to improve the quality of recommendations. Recommendation systems are categorized as content-based, collaborative or hybrid recommender systems [5].
A content-based recommendation system recommends to a user items similar to the ones the user favoured in the past. However, it suffers from problems such as limited content analysis, overspecialization and the new user problem [5].
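To make the content-based idea concrete, here is a minimal Python sketch. It assumes each item is described by a numeric feature vector (for example, genre indicators), builds the user profile as the mean vector of the items the user liked, and ranks unseen items by cosine similarity to that profile. The function names and toy data are hypothetical; this is a generic illustration, not a method taken from this paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def content_based_rank(item_features, liked, candidates):
    """Rank candidate items by cosine similarity to the user's profile.

    Assumption: the profile is the element-wise mean of the feature vectors
    of the items the user liked in the past.
    """
    dims = len(item_features[liked[0]])
    profile = [sum(item_features[i][d] for i in liked) / len(liked) for d in range(dims)]
    scored = sorted(((cosine(profile, item_features[c]), c) for c in candidates), reverse=True)
    return [c for _, c in scored]

# Hypothetical genre vectors: [action, comedy, drama]
features = {"A": [1, 0, 0], "B": [1, 0, 1], "C": [0, 1, 0], "D": [1, 1, 0]}
print(content_based_rank(features, liked=["A", "B"], candidates=["C", "D"]))  # ['D', 'C']
```

Because the ranking only rewards resemblance to past likes, an item unlike anything the user liked (here, the pure comedy C) scores zero, which is one way the overspecialization problem shows up.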
User-based collaborative filtering (CF) is a technique for producing personalized recommendations by computing the similarity between the current user and other users with similar choices. The current user's choice is thus predicted by gathering choice information from other users with similar preferences: if their choices matched in the past, it is assumed that they will match in the future as well. However, it suffers from problems such as sparsity, the new user problem and the new item problem [5]. In item-based collaborative filtering, the similarity between items is found first; then, to predict the rating of item $i$ by user $u$, the ratings of $u$ for the items most similar to $i$ are used.
Hybrid approaches combine collaborative and content-based methods to overcome certain limitations of the individual techniques. A hybrid recommender can be built in different ways, such as combining separate recommenders, adding content-based characteristics to collaborative models, adding collaborative characteristics to content-based models, and developing a single unifying recommendation model [5].
In this paper, we propose an item-based hybrid recommender that combines usage data, tag data and movie-specific data such as genres, star cast and directors to improve the accuracy of the recommender system.
2. RELATED WORK
Separate collaborative and content-based systems can be implemented and then used together to build a hybrid recommender system. The outputs of the individual recommendation systems are combined linearly in [6], while [7] uses a voting scheme for the same purpose. In [8], additional ratings are calculated using a pure content-based predictor; these ratings are then used to augment the user's rating vector in collaborative filtering. Latent Semantic Indexing is used in [9] to generate a collaborative view of a collection of user profiles. A rule-based classifier using content-based and collaborative characteristics is proposed in [10].
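The two ways of combining separate recommenders mentioned above can be sketched as follows. This assumes every component recommender outputs a predicted rating on the same 0.5 to 5 scale and that a vote counts as "like" when the prediction exceeds 3 (the threshold this paper uses later); the weights and function names are illustrative only and are not the exact schemes of [6] or [7].

```python
def combine_linear(predictions, weights):
    """Weighted linear combination of ratings predicted by several recommenders."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

def combine_vote(predictions, threshold=3.0):
    """Majority vote on like (True) / dislike (False); a tie counts as dislike.

    Assumption: each recommender votes 'like' when its prediction exceeds the threshold.
    """
    likes = sum(1 for p in predictions if p > threshold)
    return likes > len(predictions) / 2

cf, cb = 3.6, 2.8  # hypothetical collaborative and content-based predictions
print(combine_linear([cf, cb], [0.7, 0.3]))  # about 3.36
print(combine_vote([cf, cb, 3.2]))           # True: two of three predict "like"
```

The linear scheme keeps the graded rating, while voting only keeps each recommender's like/dislike decision, so a single confident component cannot outweigh the others.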
The book recommender system proposed by Liang in [11] is built from tag information only.
The authors state that tags can capture the content information of items. However, tags are
sometimes meaningful only to the users who assigned them; they can be ambiguous and can also have many synonyms. The authors proposed to address this problem by expanding the tag set. The Weighted Tag Rating Recommender (WTRR) proposed in [1] extends the Weighted Tag Recommender (WTR) [11]. WTR exploits tag data but does not use rating data or other information available about the items. Since tags may not always capture the true preferences of users, WTRR also uses the actual ratings alongside the tags. One main difference between WTR and WTRR is that, instead of simply counting the number of times a user $u$ has tagged an item with tag $t_x$, WTRR also considers the ratings of the movies that $u$ tagged with $t_x$. We have made two key observations about WTRR: (1) it is a user-based recommender system, and it does not use any information available about items apart from tags and ratings; (2) during prediction, it uses only the ratings of those movies that have also been tagged.

In our approach, we also use item-specific (movie-specific, in our case) information such as the genre, star cast and director of the movie. This information is used alongside ratings and tags to find the similarity between items. During rating prediction, we use all the available ratings rather than only the ratings of movies that have also been tagged.
3. ITEM-BASED COLLABORATIVE FILTERING
The first objective in item-based collaborative filtering is to find the similarity between items. In our implementation of basic item-based collaborative filtering, we use Pearson correlation to find the similarity between items (movies), where an item's profile consists of the ratings given to it by different users. The similarity between items $i$ and $j$ is given by equation (1).
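$$ sim(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}} \qquad (1) $$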
where $U$ is the set of users who have rated both $i$ and $j$, $\bar{R}_i$ is the average rating of item $i$, and $R_{u,i}$ is the rating of item $i$ by user $u$. Once the similarities between items have been calculated, we predict the ratings of unseen items as follows. To predict the rating of user $u$ for an unseen movie $m$, equation (2) is used.
$$ r_{u,m} = \frac{\sum_{v \in N(m)} sim(m,v)\, r_{u,v}}{\sum_{v \in N(m)} |sim(m,v)|} \qquad (2) $$
where $N(m)$ is the ordered set of movies that are most similar to $m$ and have been rated by user $u$. If the predicted rating is greater than 3, we consider that the user will like the movie; otherwise, we consider that the user will dislike it.
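A compact Python sketch of equations (1) and (2) and the like/dislike rule follows. It assumes ratings are stored as a dict of dicts (user to movie to rating) and takes $\bar{R}_i$ as the mean of all ratings of item $i$, which is one reading of "the average rating of item $i$"; the function names and toy data are hypothetical.

```python
import math

def pearson_item_sim(ratings, i, j):
    """Equation (1): Pearson correlation between items i and j.

    ratings maps user -> {item: rating}; U is the set of users who rated both
    items. Assumption: the item mean is taken over all ratings of that item.
    """
    def item_mean(item):
        values = [r[item] for r in ratings.values() if item in r]
        return sum(values) / len(values)

    common = [u for u, r in ratings.items() if i in r and j in r]
    mean_i, mean_j = item_mean(i), item_mean(j)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in common)
    den_i = math.sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in common))
    den_j = math.sqrt(sum((ratings[u][j] - mean_j) ** 2 for u in common))
    return num / (den_i * den_j) if den_i and den_j else 0.0

def predict(ratings, sim, user, m, n):
    """Equation (2): similarity-weighted average of the user's ratings over N(m),
    the n movies most similar to m among those the user has rated."""
    rated = [v for v in ratings[user] if v != m]
    neighbours = sorted(rated, key=lambda v: sim(m, v), reverse=True)[:n]
    den = sum(abs(sim(m, v)) for v in neighbours)
    if den == 0:
        return None  # no neighbour carries any similarity signal
    return sum(sim(m, v) * ratings[user][v] for v in neighbours) / den

def likes(predicted):
    """Like/dislike rule: a predicted rating greater than 3 means 'like'."""
    return predicted is not None and predicted > 3

# Hypothetical toy data: u4 has not rated m1.
ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m3": 2},
    "u3": {"m1": 1, "m2": 2, "m3": 5},
    "u4": {"m2": 4, "m3": 2},
}

def sim(a, b):
    return pearson_item_sim(ratings, a, b)

r = predict(ratings, sim, "u4", "m1", n=1)
print(r, likes(r))  # 4.0 True
```

With `n=2`, the strongly negatively correlated neighbour m3 pulls the prediction below 1 and the outcome flips to "dislike", which shows why the neighbourhood size is worth tuning.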
4. PROPOSED APPROACH
As stated earlier, we have used the hetrec2011-movielens-2k dataset. From this dataset, we first construct a user-movie rating matrix and a user-movie-tag matrix. The user-movie rating matrix stores the ratings given by users to movies, while the user-movie-tag matrix stores the number of times a tag has been assigned to a movie by a user. A third matrix, the user-movie sub-rating matrix, is then constructed from these two matrices; it stores a user's rating of a movie only if that user has both tagged and rated the movie. After preparing these matrices, the steps described below are followed. Our proposed approach is inspired by the work in [1] and [11].
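The three matrices can be held as sparse nested dictionaries, as in the minimal sketch below. The input tuple formats mirror the dataset's (user, tag, movie) tag assignments; reading the sub-rating matrix as "keep a rating only when the same user also tagged that movie" is an assumption, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def build_matrices(ratings, tag_assignments):
    """Build sparse versions of the three matrices described in Section 4.

    ratings:         iterable of (user, movie, rating)
    tag_assignments: iterable of (user, tag, movie)
    """
    rating_matrix = defaultdict(dict)  # user -> movie -> rating
    for user, movie, r in ratings:
        rating_matrix[user][movie] = r

    tag_matrix = defaultdict(Counter)  # (user, movie) -> Counter({tag: times assigned})
    for user, tag, movie in tag_assignments:
        tag_matrix[(user, movie)][tag] += 1

    # Assumption: a rating is kept only if the same user also tagged that movie.
    sub_rating_matrix = defaultdict(dict)
    for user, movies in rating_matrix.items():
        for movie, r in movies.items():
            if (user, movie) in tag_matrix:  # membership test does not create keys
                sub_rating_matrix[user][movie] = r
    return rating_matrix, tag_matrix, sub_rating_matrix

rm, tm, sm = build_matrices(
    [("u1", "m1", 4.0), ("u1", "m2", 2.5), ("u2", "m1", 5.0)],
    [("u1", "funny", "m1"), ("u1", "funny", "m1"), ("u2", "dark", "m2")],
)
print(dict(sm))  # {'u1': {'m1': 4.0}}
```

Nested dictionaries keep only observed entries, which suits a dataset where most user-movie pairs are unrated.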
4.1 Movie’s Tag Profile Generation
All the users who tagged movies in the hetrec2011-movielens-2k dataset [12][3][18][19] are contained in the user set $U = \{u_1, u_2, \dots, u_{|U|}\}$. All the movies in the corpus are contained in the movie set $M = \{m_1, m_2, \dots, m_{|M|}\}$, while all the tags used by the users in $U$ to label movies in $M$ are contained in the tag set $T = \{t_1, t_2, \dots, t_{|T|}\}$. Finally, $R = \{0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5\}$ denotes the set of all possible ratings that users can give. The following steps are performed to construct a movie's tag profile.
• Calculate the relevance of a tag to a movie as a weight.
• Calculate relevance of a tag to a user as a weight.
• Estimate relatedness between two tags using these weights.
• Construct the tag profile of the movie using relatedness.
4.1.1 Relevance of a Tag to a Movie
To find the relevance of a tag to a movie in a way that captures ratings in addition to the tag, [1] proposes equation (3).

$$ w_{m_i}(t_x) = \frac{\sum_{u_j \in U_{m_i,t_x}} r_{u_j,t_x}(m_i)}{\sum_{u_j \in U_{m_i},\, t \in T_{m_i}} r_{u_j,t}(m_i)} \qquad (3) $$

where the numerator is the sum of the ratings $r_{u_j,t_x}(m_i)$ assigned to movie $m_i$ by all the users $u_j$ who used $t_x$ to annotate it, and $U_{m_i,t_x}$ denotes the set of users who used $t_x$ to tag $m_i$. The denominator is the sum of all the ratings from the users who tagged $m_i$. The value of $w_{m_i}(t_x)$ thus captures the true popularity of tag $t_x$ with respect to movie $m_i$.
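Equation (3) can be sketched in Python as below. It assumes $r_{u_j,t}(m_i)$ is simply user $u_j$'s rating of $m_i$, counted once for each distinct tag $t$ the user put on $m_i$, and it skips tag assignments that have no matching rating; the function name and toy data are hypothetical.

```python
def tag_movie_relevance(assignments, ratings, movie):
    """Equation (3): rating-weighted relevance w_m(t) of every tag t applied to `movie`.

    assignments: iterable of (user, tag, movie) tuples
    ratings:     dict user -> {movie: rating}
    """
    numerator = {}
    # Assumption: each distinct (user, tag) pair on the movie contributes that
    # user's rating of the movie once; assignments without a rating are skipped.
    for user, tag, m in set(assignments):
        if m == movie and movie in ratings.get(user, {}):
            numerator[tag] = numerator.get(tag, 0.0) + ratings[user][movie]
    # The denominator sums the same ratings over every tag used on the movie.
    denominator = sum(numerator.values())
    return {tag: s / denominator for tag, s in numerator.items()} if denominator else {}

assignments = [("u1", "funny", "m1"), ("u1", "classic", "m1"),
               ("u2", "funny", "m1"), ("u2", "dark", "m2")]
ratings = {"u1": {"m1": 4.0}, "u2": {"m1": 2.0}}
print(tag_movie_relevance(assignments, ratings, "m1"))  # funny: 0.6, classic: 0.4
```

Unlike a plain tag count, a tag used by a user who rated the movie highly weighs more than the same tag from a user who disliked it.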
4.1.2 Relevance of a Tag to a User
In [1], the relevance of a tag to a user, which signifies how strongly the user feels about the tag, is defined as in equation (4).

$$ w_{u_i}(t_x) = \frac{\sum_{m_j \in M_{u_i,t_x}} r_{u_i,t_x}(m_j)}{\sum_{m_j \in M_{u_i},\, t \in T_{u_i}} r_{u_i,t}(m_j)} \qquad (4) $$

where the numerator is the sum of the ratings that user $u_i$ assigned to the movies $m_j$ the user annotated with $t_x$ (the set $M_{u_i,t_x}$), and the denominator is the sum of the ratings that $u_i$ assigned to all the movies the user tagged, taken over every tag $t \in T_{u_i}$ the user applied.
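Equation (4) mirrors equation (3) with the roles of users and movies swapped. A sketch under the same assumptions (one rating counted per distinct tag assignment, unrated assignments skipped, hypothetical names and data):

```python
def tag_user_relevance(assignments, ratings, user):
    """Equation (4): rating-weighted relevance w_u(t) of every tag t that `user` applied.

    assignments: iterable of (user, tag, movie) tuples
    ratings:     dict user -> {movie: rating}
    """
    numerator = {}
    # Assumption: each distinct (tag, movie) assignment by the user contributes the
    # user's rating of that movie once; assignments without a rating are skipped.
    for u, tag, movie in set(assignments):
        if u == user and movie in ratings.get(user, {}):
            numerator[tag] = numerator.get(tag, 0.0) + ratings[user][movie]
    denominator = sum(numerator.values())  # all of the user's tag assignments
    return {tag: s / denominator for tag, s in numerator.items()} if denominator else {}

assignments = [("u1", "funny", "m1"), ("u1", "funny", "m2"),
               ("u1", "dark", "m2"), ("u2", "funny", "m1")]
ratings = {"u1": {"m1": 4.0, "m2": 5.0}, "u2": {"m1": 2.0}}
print(tag_user_relevance(assignments, ratings, "u1"))  # funny: 9/14, dark: 5/14
```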
4.1.3 Tag Relatedness Metric for the Movie
Given the relevance of tags with respect to users, we can calculate the relatedness of two tags with respect to a movie. The relatedness metric is used to avoid semantic ambiguity while constructing the movie profiles. The relatedness between two tags $t_x$ and $t_y$ is denoted by $c_{m_i}(t_x, t_y)$ and represents the degree of correspondence (or connection) between the tags with respect to movie $m_i$; that is, it measures the similarity between $t_x$ and $t_y$ in the context of $m_i$. It is calculated as in equation (5), which averages the relevance of $t_y$ over the users who tagged $m_i$ with $t_x$.

$$ c_{m_i}(t_x, t_y) = \frac{1}{|U_{m_i,t_x}|} \sum_{u_j \in U_{m_i,t_x}} w_{u_j}(t_y) \qquad (5) $$
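A sketch of equation (5) follows. It takes the per-user tag weights of equation (4) as a precomputed dictionary and assumes that a user who never used $t_y$ contributes a weight of 0; the function name and toy data are hypothetical.

```python
def tag_relatedness(assignments, user_tag_weights, movie, tx, ty):
    """Equation (5): c_m(tx, ty), the mean of w_u(ty) over users u who tagged `movie` with tx.

    assignments:      iterable of (user, tag, movie) tuples
    user_tag_weights: dict user -> {tag: w_u(tag)}, i.e. the output of equation (4)
    """
    users = {u for u, t, m in assignments if m == movie and t == tx}  # U_{m, tx}
    if not users:
        return 0.0
    # Assumption: a user who never used ty contributes w_u(ty) = 0.
    return sum(user_tag_weights.get(u, {}).get(ty, 0.0) for u in users) / len(users)

assignments = [("u1", "funny", "m1"), ("u2", "funny", "m1"), ("u1", "dark", "m2")]
user_w = {"u1": {"funny": 0.6, "dark": 0.4}, "u2": {"funny": 1.0}}
print(tag_relatedness(assignments, user_w, "m1", "funny", "dark"))  # 0.2
```

Here "dark" is only weakly related to "funny" for m1 because just one of the two users who tagged m1 as funny ever uses "dark".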
4.1.4 Movie’s Tag Profile
Taking tag $t_y$ as a representative of movie $m_i$, the weight or relevance of $t_y$ to $m_i$ is calculated as the sum of the relatedness between each tag used for $m_i$ (i.e., $t_x \in T_{m_i}$) and the target tag $t_y$, weighted by the relevance $w_{m_i}(t_x)$. The total relevance weight of $t_y$ for movie $m_i$ is denoted as $W_{m_i}(t_y)$ and is defined in equation (6).

$$ W_{m_i}(t_y) = \sum_{t_x \in T_{m_i}} w_{m_i}(t_x)\, c_{m_i}(t_x, t_y) \qquad (6) $$

Similar to the concept of inverse document frequency (IDF) in information retrieval, a tag's occurrence across all movies must be taken into consideration to measure the general importance of the tag in identifying the movie's topic preferences. We denote the inverse movie frequency of tag $t_y$ as $imf(t_y)$; it is defined in equation (7).

$$ imf(t_y) = \frac{1}{\log\left(e + |M_{t_y}|\right)} \qquad (7) $$

where $|M_{t_y}|$ is the number of movies tagged with $t_y$ and $e$ is Euler's number. It is easy to see that $0 < imf(t_y) \le 1$. The tag profile of each movie is then defined as in equation (8).

$$ M^T_{m_i} = \{\, W_{m_i}(t_y) \cdot imf(t_y) \mid t_y \in T \,\} \qquad (8) $$
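Equations (6) to (8) combine into a few lines. The sketch below assumes the natural logarithm in equation (7) (suggested by the use of $e$), takes $w_{m_i}$ from equation (3) as a dictionary and $c_{m_i}$ from equation (5) as a function, and represents the profile of equation (8) as a tag-to-weight dictionary; all names and numbers are hypothetical.

```python
import math

def imf(count):
    """Equation (7), assuming the natural log: 1 / ln(e + count), which lies in (0, 1]."""
    return 1.0 / math.log(math.e + count)

def movie_tag_profile(w_m, relatedness, tag_movie_counts, all_tags):
    """Equations (6) and (8): the weight W_m(ty) * imf(ty) for every ty in all_tags.

    w_m:              dict tag -> w_m(tag) from equation (3)
    relatedness:      function (tx, ty) -> c_m(tx, ty) from equation (5)
    tag_movie_counts: dict tag -> |M_t|, the number of movies tagged with t
    """
    profile = {}
    for ty in all_tags:
        W = sum(w * relatedness(tx, ty) for tx, w in w_m.items())  # equation (6)
        profile[ty] = W * imf(tag_movie_counts.get(ty, 0))         # equation (8)
    return profile

w_m = {"funny": 0.6, "classic": 0.4}
c_table = {("funny", "funny"): 0.8, ("funny", "classic"): 0.1,
           ("classic", "funny"): 0.3, ("classic", "classic"): 0.9}

def relatedness(tx, ty):
    return c_table.get((tx, ty), 0.0)

print(movie_tag_profile(w_m, relatedness, {"funny": 3, "classic": 1}, ["funny", "classic"]))
```

As with IDF, a tag applied to many movies gets a smaller multiplier, so widespread tags say less about any one movie's topic.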
4.2 Movie’s Rating Profile Generation
The movie's user preference takes the popularity of the movies into consideration; the user-preference similarity between two movies is given in equation (9).

$$ sim^U(m_i, m_j) = \frac{\sum_{u_k \in U_{m_i} \cap U_{m_j}} imf(u_k)}{\sqrt{|U_{m_i}|\,|U_{m_j}|}} \qquad (9) $$

where $|U_{m_i}|$ is the number of users who have tagged movie $m_i$, and $imf(u_k)$ denotes the inverse movie frequency of user $u_k$, defined in equation (10).

$$ imf(u_k) = \frac{1}{\log\left(e + |M_{u_k}|\right)} \qquad (10) $$

where $|M_{u_k}|$ is the number of movies that have been tagged by user $u_k$.
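Equations (9) and (10) can be sketched as follows, assuming the natural logarithm and set-based inputs (which users tagged each movie, and which movies each user tagged); the names and toy data are hypothetical.

```python
import math

def imf_user(n_movies_tagged):
    """Equation (10), assuming the natural log: 1 / ln(e + |M_u|)."""
    return 1.0 / math.log(math.e + n_movies_tagged)

def user_pref_similarity(taggers, movies_tagged_by, mi, mj):
    """Equation (9): sum of imf(u_k) over users who tagged both movies,
    normalised by sqrt(|U_mi| * |U_mj|).

    taggers:          dict movie -> set of users who tagged it (U_m)
    movies_tagged_by: dict user -> set of movies the user tagged (M_u)
    """
    ui, uj = taggers.get(mi, set()), taggers.get(mj, set())
    if not ui or not uj:
        return 0.0
    shared = ui & uj  # U_mi intersected with U_mj
    return sum(imf_user(len(movies_tagged_by[u])) for u in shared) / math.sqrt(len(ui) * len(uj))

taggers = {"m1": {"u1", "u2"}, "m2": {"u2", "u3"}, "m3": {"u4"}}
movies_tagged_by = {"u1": {"m1"}, "u2": {"m1", "m2"}, "u3": {"m2"}, "u4": {"m3"}}
print(user_pref_similarity(taggers, movies_tagged_by, "m1", "m2"))
```

A shared user who tags very many movies contributes less, so overlap through prolific taggers counts for less than overlap through selective ones.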
4.3 Neighborhood Formation
In order to predict a user's rating for an unseen movie $m$, we first find the list of movies similar to $m$. The fundamental idea is to identify, for each movie $m$, an ordered list of the $N$ most similar movies, $M = \{m_1, m_2, \dots, m_N\}$, such that $m \notin M$, $sim(m, m_1)$ is the highest, $sim(m, m_2)$ is the second highest, and so on. The $N$ nearest movies are selected based on the similarity value.
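Neighbourhood formation is a top-$N$ selection. A minimal sketch, assuming any movie-to-movie similarity function (for instance equation (1) or equation (12)) is passed in and that the target movie is excluded from its own neighbourhood; the function name and data are hypothetical.

```python
import heapq

def top_n_neighbours(target, candidates, sim, n):
    """Return the n candidates most similar to `target`, most similar first.

    The target itself is excluded; `sim` is any movie-movie similarity function.
    """
    others = (c for c in candidates if c != target)
    return heapq.nlargest(n, others, key=lambda c: sim(target, c))

scores = {"m2": 0.2, "m3": 0.9, "m4": 0.5}  # hypothetical similarities to m1

def sim(a, b):
    return scores.get(b, 0.0)

print(top_n_neighbours("m1", ["m1", "m2", "m3", "m4"], sim, 2))  # ['m3', 'm4']
```

`heapq.nlargest` avoids sorting the whole corpus when $N$ is much smaller than the number of movies.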
Each movie is encoded with its own topic preferences and user preferences, where topic preferences are captured by tags and user preferences are captured by ratings. The similarity between two
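movies $m_i$ and $m_j$ based on tags is denoted as $sim^T(m_i, m_j)$, where $T$ is the set of tags used to tag movies $m_i$ and $m_j$. We use the Pearson correlation coefficient to measure the similarity between two movies represented by their sets of weighted tags. $sim^T(m_i, m_j)$ is defined in equation (11).

$$ sim^T(m_i, m_j) = \frac{\sum_{t \in T} (w_{t,m_i} - \bar{w}_{m_i})(w_{t,m_j} - \bar{w}_{m_j})}{\sqrt{\sum_{t \in T} (w_{t,m_i} - \bar{w}_{m_i})^2 \sum_{t \in T} (w_{t,m_j} - \bar{w}_{m_j})^2}} \qquad (11) $$

where $w_{t,m_i}$ is the weight of tag $t$ in the tag profile of $m_i$ and $\bar{w}_{m_i}$ is the mean of these weights.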
The similarity between two movies based on user-movie preference, by contrast, is denoted as $sim^U(m_i, m_j)$, where $U$ is the set of all users.

Given the tag and rating profiles of movies $m_i$ and $m_j$, the similarity between these two movies based on both profiles is given by equation (12).

$$ sim(m_i, m_j) = \omega \cdot sim^T(m_i, m_j) + (1 - \omega) \cdot sim^U(m_i, m_j) \qquad (12) $$
$\omega$ is a weighting parameter such that $0 \le \omega \le 1$. It controls the extent of the collaborative dimension of the algorithm: as we decrease $\omega$, the algorithm becomes predominantly collaborative, because the contribution of the movies' user preferences dominates. During the experimental phase, we varied $\omega$ to see its impact on the quality of recommendations. We also computed the similarity between movies from their genre, star-cast and director profiles, and experimented with combinations of these profiles when calculating the similarity between movies. In the star-cast profile of an item, we included only the first five actors of each movie, according to the order in which they appear on the movie's IMDB cast page. Pearson correlation is used throughout to calculate the similarity between movies.
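The sketch below illustrates equations (11) and (12) together with the star-cast restriction. It assumes profiles are sparse feature-to-weight dictionaries, that a feature missing from one movie's profile counts as weight 0 when the Pearson sum runs over the union $T$ of both movies' features, and that a star-cast profile is binary over the first five billed actors; all names and values are hypothetical.

```python
import math

def pearson(p, q):
    """Equation (11)-style Pearson correlation of two sparse profiles (feature -> weight).

    The sum runs over the union of both profiles' features; a feature missing
    from one profile is treated as weight 0 (assumption).
    """
    keys = sorted(set(p) | set(q))
    if not keys:
        return 0.0
    x = [p.get(k, 0.0) for k in keys]
    y = [q.get(k, 0.0) for k in keys]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return num / den if den else 0.0

def combined_similarity(sim_t, sim_u, omega):
    """Equation (12): omega * sim^T + (1 - omega) * sim^U, with 0 <= omega <= 1."""
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    return omega * sim_t + (1.0 - omega) * sim_u

def cast_profile(cast_in_billing_order, k=5):
    """Binary star-cast profile built from the first k billed actors only."""
    return {actor: 1.0 for actor in cast_in_billing_order[:k]}

tags_a = {"funny": 0.6, "classic": 0.4}
tags_b = {"funny": 0.5, "dark": 0.5}
print(combined_similarity(pearson(tags_a, tags_b), 0.3, omega=0.9))
```

Setting `omega=1.0` reduces equation (12) to the pure tag similarity, which corresponds to the $\omega = 1.0$ settings used in the experiments.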
Figure 1. Conceptual Flow of the Proposed Framework
4.4 Rating Prediction Formula
To predict the rating of user $u$ for an unseen movie $m$, we use equation (2). If the predicted rating is greater than 3, we consider that the user will like the movie; otherwise, we consider that the user will dislike it. These steps are summarized in Fig. 1.
5. EMPIRICAL EVALUATION
In this section, we discuss the dataset used, the experimental methodology and the measures used to evaluate our system.
5.1 Dataset
We have used the hetrec2011-movielens-2k dataset [12][3][18][19], dated May 2011, in our experiments. Cantador et al. [12] have made it publicly available. It is based on the original MovieLens10M dataset, published by the GroupLens research group. In this dataset, movies also link to their corresponding web pages on the IMDB website. The dataset contains 2,113 users, 10,197 movies and a total of 13,222 unique tags. These tags fall into 47,957 tag assignment tuples of the form [user, tag, movie]. It also contains 855,598 user ratings ranging from 0.5 to 5.0 in increments of 0.5, giving a total of 10 distinct rating values. There is an average of 405 ratings per user and 85 per movie. There are 20 genre types, 20,809 movie-genre assignments, 4,060 directors and 95,321 actors, with an average of 22 actors per movie. We preprocessed the data to construct the user-movie rating matrix and the user-movie-tag matrix. A third matrix, the user-movie sub-rating matrix, is then constructed from the two matrices; it stores ratings only for movies that have also been tagged. In constructing the star-cast profile of a movie, only actors who have worked in more than 2 movies are considered. This dataset has been used previously in [13][14][15].
5.2 Evaluation Measure
Accuracy and F-measure are used as the evaluation measures in our work. Accuracy is the ratio of the number of correctly classified instances in the test set to the total number of instances in the test set. We treat a user liking a movie as the positive class and a user disliking a movie as the negative class. True positives (TP), false negatives (FN), false positives (FP) and true negatives (TN) are then defined as follows [16].
TP: the number of correct classifications of the positive instances
FN: the number of incorrect classifications of the positive instances
FP: the number of incorrect classifications of the negative instances
TN: the number of correct classifications of the negative instances
Based on these definitions, precision ($p$) and recall ($r$) are defined in equations (13) and (14), respectively.

$$ p = \frac{TP}{TP + FP} \qquad (13) $$

$$ r = \frac{TP}{TP + FN} \qquad (14) $$

The F-measure ($F$) is used to compare classifiers on a single measure and is given by equation (15).

$$ F = \frac{2pr}{p + r} \qquad (15) $$
Precision, recall and F-measure for positive examples are calculated using the above formulas. We also compute the same measures for negative examples and report their weighted average in the results section.
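The evaluation can be sketched as below. The paper does not state how the positive-class and negative-class scores are weighted, so this assumes each class is weighted by its support (the number of actual instances of that class); the function names and labels are hypothetical.

```python
def prf(tp, fp, fn):
    """Equations (13)-(15): precision, recall and F-measure for one class."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(actual, predicted):
    """Accuracy plus support-weighted precision, recall and F over both classes."""
    pairs = list(zip(actual, predicted))
    accuracy = sum(truth == pred for truth, pred in pairs) / len(pairs)
    weighted = [0.0, 0.0, 0.0]
    for cls in (True, False):  # True = likes (positive), False = dislikes (negative)
        tp = sum(truth == cls and pred == cls for truth, pred in pairs)
        fp = sum(truth != cls and pred == cls for truth, pred in pairs)
        fn = sum(truth == cls and pred != cls for truth, pred in pairs)
        support = sum(truth == cls for truth, _ in pairs)
        # Assumption: each class's scores are weighted by its share of actual instances.
        for k, value in enumerate(prf(tp, fp, fn)):
            weighted[k] += value * support / len(pairs)
    return accuracy, tuple(weighted)

actual = [True, True, True, False, False]      # hypothetical ground truth
predicted = [True, True, False, False, True]   # hypothetical predictions
acc, (p, r, f) = evaluate(actual, predicted)
print(acc, f)
```

Weighting by support keeps the averaged score from being dominated by whichever class happens to be smaller in the test fold.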
5.3 Experimental Methodology
To evaluate and compare the outcomes of our experiments, we carried out 5-fold cross-validation for all the experiments. For each experiment, we selected 20 items as the target items; these items are rated and tagged by a minimum of 20 and a maximum of 50 users. The following experiments were carried out.
• Experiment 1: Basic item-based collaborative filtering where similarity between movies is
found using user-movie rating matrix and predictions are made using user-movie rating
matrix.
• Experiment 2: Hybrid filtering where similarity between movies is found using genre profile
of the movies and predictions are made using user-movie rating matrix.
• Experiment 3: Hybrid filtering where similarity between movies is found using genre and star
cast profile of the movies and predictions are made using user-movie rating matrix.
• Experiment 4: Hybrid filtering where similarity between movies is found using genre, star
cast and director profile of the movies and predictions are made using user-movie rating
matrix.
• Experiment 5: Hybrid filtering where similarity between movies is found using Boolean tag
profile of the movies and predictions are made using user-movie rating matrix.
• Experiment 6: Hybrid filtering where similarity between movies is found using bag-of-words
tag profile of the movies and predictions are made using user-movie rating matrix.
• Experiment 7: Hybrid filtering where similarity between movies is found using the term frequency (TF) [17] tag profile of the movies and predictions are made using user-movie rating matrix.
• Experiment 8: Hybrid filtering where similarity between movies is found using the term frequency-inverse document frequency (TF-IDF) [17] tag profile of the movies and predictions are made using user-movie rating matrix.
• Experiment 9: Hybrid filtering where similarity between movies is found by setting $\omega = 0.9$ in equation (12) and predictions are made using user-movie sub-rating matrix.
• Experiment 10: Hybrid filtering where similarity between movies is found by setting $\omega = 0.9$ in equation (12) and predictions are made using user-movie rating matrix.
• Experiment 11: Hybrid filtering where similarity between movies is found by setting $\omega = 1.0$ in equation (12) and predictions are made using user-movie rating matrix.
• Experiment 12: Hybrid filtering where similarity between movies is found by modeling movie profiles as a combination of tag profiles ($\omega = 1.0$ in equation (12)) and ratings, and predictions are made using user-movie rating matrix.
• Experiment 13: Hybrid filtering where similarity between movies is found by modeling movie profiles as a combination of tag profiles ($\omega = 1.0$ in equation (12)) and genre profile, and predictions are made using user-movie rating matrix.
• Experiment 14: Hybrid filtering where similarity between movies is found by modeling
movie profiles as combination of ratings and genre profile and predictions are made using
user-movie rating matrix.
• Experiment 15: Hybrid filtering where we first find similarity between movies using the movies' tag profiles and then using their genre profiles. To compute the final similarity between movies, we combine these two similarities with weights 0.8 and 0.2, respectively.
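Experiments 5 to 8 differ only in how a movie's tag profile is weighted, and Experiment 15 in how two similarities are blended. The sketch below shows one plausible version of each tag weighting. The exact TF and IDF variants of [17] are not given here, so TF normalised by the movie's total tag count and IDF computed as $\log(N/df)$ are assumptions, as are the function names and toy data.

```python
import math
from collections import Counter

def tag_profiles(movie_tags, scheme):
    """Build a tag profile per movie under one of four weighting schemes.

    movie_tags: dict movie -> list of tags applied to it (repeats allowed)
    scheme:     "boolean" | "bow" | "tf" | "tfidf"
    Assumptions: TF = count / total tags on the movie; IDF = ln(N / df).
    """
    n_movies = len(movie_tags)
    df = Counter(t for tags in movie_tags.values() for t in set(tags))  # document frequency
    profiles = {}
    for movie, tags in movie_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        if scheme == "boolean":
            profiles[movie] = {t: 1.0 for t in counts}
        elif scheme == "bow":
            profiles[movie] = {t: float(c) for t, c in counts.items()}
        elif scheme == "tf":
            profiles[movie] = {t: c / total for t, c in counts.items()}
        elif scheme == "tfidf":
            profiles[movie] = {t: (c / total) * math.log(n_movies / df[t]) for t, c in counts.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return profiles

def experiment15_similarity(sim_tag, sim_genre):
    """Experiment 15: fixed 0.8 / 0.2 blend of tag-profile and genre-profile similarity."""
    return 0.8 * sim_tag + 0.2 * sim_genre

movie_tags = {"m1": ["funny", "funny", "classic"], "m2": ["funny"], "m3": ["dark"]}
print(tag_profiles(movie_tags, "tfidf")["m1"])
```

Any of these profiles can then be compared with Pearson correlation, as the paper does throughout.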
6. RESULTS AND DISCUSSION
The performance of the recommender under the various settings discussed in the previous section is shown in Table 1. We experimented with varying neighbourhood sizes, but due to space limitations, for each technique we report results only for the neighbourhood size at which it performed best. The results show that the hybrid recommender system outperforms basic item-based collaborative filtering in all settings except those of experiments 5, 6, 7 and 9 (in terms of F-measure; in terms of accuracy, only experiment 9 falls below the baseline). The approach proposed in [1] uses the user-movie sub-rating matrix to calculate the rating to be predicted, whereas we propose to use the user-movie rating matrix for this purpose. Using the user-movie sub-rating matrix is natural when constructing movie profiles and finding the similarity between movies, but we advocate using the user-movie rating matrix rather than the sub-rating matrix in the rating prediction phase. This allows ratings to be predicted from a larger number of known ratings, which improves performance, as a comparison of experiments 9 and 10 in Table 1 shows.
7. CONCLUSION & FUTURE WORK
We propose an item-based hybrid filtering approach that combines usage, tag and content data of movies. The movie recommendation task is modelled as a classification problem in which our aim is to predict whether the user will like or dislike the movie. The movie recommender system we propose exploits movie-specific data such as genres, star cast and directors in addition to ratings and tags. The results show that combining the right type of data in the right manner when constructing item profiles and calculating item similarity improves the quality of recommendations. They also show that using the user-movie rating matrix rather than the user-movie sub-rating matrix for predictions improves the quality of recommendations. In future, we plan to model the problem in a machine learning framework.
Table 1. Experimental Results

| Experiment | Number of Nearest Neighbours | Accuracy | F-measure |
|---|---|---|---|
| Experiment 1 | 100 | 0.7024 | 0.6880 |
| Experiment 2 | 100 | 0.7198 | 0.7220 |
| Experiment 3 | 500 | 0.7454 | 0.7510 |
| Experiment 4 | 500 | 0.7454 | 0.7510 |
| Experiment 5 | 40 | 0.7101 | 0.6775 |
| Experiment 6 | 40 | 0.7090 | 0.6567 |
| Experiment 7 | 40 | 0.7090 | 0.6567 |
| Experiment 8 | 40 | 0.7465 | 0.6956 |
| Experiment 9 | 5 | 0.6698 | 0.5957 |
| Experiment 10 | 20 | 0.7570 | 0.7384 |
| Experiment 11 | 20 | 0.7570 | 0.7384 |
| Experiment 12 | 100 | 0.7117 | 0.6921 |
| Experiment 13 | 100 | 0.7430 | 0.7258 |
| Experiment 14 | 100 | 0.7198 | 0.7220 |
| Experiment 15 | 10 | 0.7726 | 0.7511 |
REFERENCES
1. Swapnill Nagar, A Hybrid Recommender: User Profiling from Tags/Keywords and Ratings, Master's Thesis, Rajiv Gandhi Technical University, 2012.
2. http://www.netflix.com
3. http://www.imdb.com
4. http://www.hulu.com
5. Alexander Tuzhilin, Gediminas Adomavicius, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, Volume 17(6), pages 734-749, June 2005.
6. M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin, Combining Content-Based and Collaborative Filters in an Online Newspaper, Proc. ACM SIGIR '99 Workshop on Recommender Systems: Algorithms and Evaluation, Aug. 1999.
7. M. Pazzani, A Framework for Collaborative, Content-Based, and Demographic Filtering, Artificial Intelligence Review, pages 393-408, Dec. 1999.
8. P. Melville, R.J. Mooney, and R. Nagarajan, Content-Boosted Collaborative Filtering for Improved Recommendations, Proc. 18th Nat'l Conf. Artificial Intelligence, 2002.
9. I. Soboroff and C. Nicholas, Combining Content and Collaboration in Text Filtering, Proc. Int'l Joint Conf. Artificial Intelligence Workshop: Machine Learning for Information Filtering, Aug. 1999.
10. C. Basu, H. Hirsh, and W. Cohen, Recommendation as Classification: Using Social and Content-Based Information in Recommendation, Recommender Systems: Papers from the 1998 Workshop, Technical Report WS-98-08, AAAI Press, 1998.
11. Huizhi Liang, Yue Xu, Yuefeng Li, Richi Nayak, Gavin Shaw, A Hybrid Recommender Systems Based on Weighted Tags, International Conference on Data Mining (SDM2010), May 2011.
12. I. Cantador, P. Brusilovsky, and T. Kuflik, Second Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), In Proceedings of the Fifth ACM Conference on Recommender Systems, pages 387-388, ACM, 2011.
13. E. Bothos, K. Christidis, D. Apostolou, and G. Mentzas, Information Market Based Recommender Systems Fusion, In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pages 1-8, ACM, 2011.
14. A. Said, E.W. De Luca, B. Kille, B. Jain, I. Micus, and S. Albayrak, Kmule: A Framework for User-Based Comparison of Recommender Algorithms, In Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, ACM, 2012.
15. C. Jones, J. Ghosh, and A. Sharma, Learning Multiple Models for Exploiting Predictive Heterogeneity in Recommender Systems, In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec '11), pages 17-24, New York, NY, USA, ACM, 2011.
16. Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2007.
17. Zdravko Markov, Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure and Usage, Wiley-Interscience, John Wiley & Sons, 2007.
18. http://www.grouplens.org
19. http://www.rottentomatoes.com
20. Paulo J. G. Lisboa, Huda Naji Nawaf and Wesam S. Bhaya, Recommendation System Based on Association Rules Applied to Consistent Behavior Over Time, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 412-421, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
21. Anuj Verma and Kishore Bhamidipati, A Survey of Memory Based Methods for Collaborative Filtering Based Techniques for Online Recommender Systems, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 366-372, ISSN Print: 0976-6367, ISSN Online: 0976-6375.