Social Relation Based Scalable
 Semantic Search Refinement
  Yi Zeng1, Xu Ren1, Yulin Qin1,2, Ning Zhong1,3,
         Zhisheng Huang4, Yan Wang1

1. International WIC Institute, Beijing University of Technology, China
                   2. Carnegie Mellon University, USA
               3. Maebashi Institute of Technology, Japan
            4. Vrije University Amsterdam, the Netherlands




Motivation
•   Vague or incomplete queries over large-scale semantic data
    (how can queries be refined to reduce the size of the result set?).
•   Large-scale semantic data vs. the data most relevant to a specific user.

Diversity for different users in the context of large-scale semantic data:
    User interests → Interests based search refinement
    Network of friends, collaborators, etc. → Search refinement through social relationship
    Combining both → Group interests based search refinement

Social Relations and Social Networks
             • Most social networks follow a power-law distribution.
             • The DBLP coauthor network is created using the FOAF vocabulary.




    Fig. 1: Coauthor number distribution in the SwetoDBLP dataset.
    Fig. 2: Log-log diagram of Fig. 1.

•    Approximate power-law distribution: few authors have many coauthors, while most
     authors have very few.
•    Scalability: even when the number of authors grows rapidly, the coauthor network is
     not hard to rebuild, since most authors have only a few links.
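A minimal Python sketch of how the coauthor-count distribution in Fig. 1 and the slope of its log-log plot in Fig. 2 could be computed. It assumes, hypothetically, that the coauthor data is available as a list of per-paper author lists rather than as FOAF RDF; `coauthor_counts`, `degree_distribution` and `loglog_slope` are illustrative names, not part of SwetoDBLP or any library. A clearly negative fitted slope is consistent with, but does not prove, a power law.

```python
import math
from collections import Counter, defaultdict

def coauthor_counts(papers):
    """Map each author to the number of distinct coauthors.

    `papers` is assumed to be an iterable of author-name lists, a
    hypothetical stand-in for the FOAF-based coauthor network.
    """
    coauthors = defaultdict(set)
    for authors in papers:
        for a in authors:
            coauthors[a].update(b for b in authors if b != a)
    return {a: len(s) for a, s in coauthors.items()}

def degree_distribution(counts):
    """Number of authors having exactly k coauthors, for each k."""
    return Counter(counts.values())

def loglog_slope(distribution):
    """Least-squares slope of log(#authors) against log(k).

    Authors with zero coauthors are skipped because log(0) is undefined.
    """
    points = [(math.log(k), math.log(n)) for k, n in distribution.items() if k > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct non-zero coauthor counts")
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx
```

For example, `loglog_slope(degree_distribution(coauthor_counts(papers)))` reduces the straight-line trend visible in a log-log diagram to a single number.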
Search Refinement through Social Relationship

Table 1: A partial result of the expert finding search task "Artificial Intelligence authors"
(user name: John McCarthy).

Satisfied authors without            Satisfied authors with
social relation refinement           social relation refinement
Carl Kesselman (312)                 Hans W. Guesgen (117) *
Thomas S. Huang (271)                Virginia Dignum (69) *
Edward A. Fox (269)                  John McCarthy (65) *
Lei Wang (250)                       Aaron Sloman (36) *
John Mylopoulos (245)                Carl Kesselman (312)
Ewa Deelman (237)                    Thomas S. Huang (271)
...                                  ...

Diagram: the domain experts dataset and the coauthor network dataset are linked through user
URIs, bridging two separate datasets and helping to refine the expert finding task.
In an enterprise setting, cooperation may be smoother if the experts found already have
some relationship with the employer.
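The reordering in Table 1 can be imitated with a two-key sort. This is a sketch under stated assumptions: authors related to the user in the coauthor network (the entries marked * in Table 1, which include the user) are moved to the front, and within each group the number in parentheses is used in descending order. `Expert`, `refine_by_social_relation` and the `related` set are hypothetical; the slides do not say how relatedness was determined.

```python
from typing import NamedTuple

class Expert(NamedTuple):
    name: str
    count: int  # the number shown in parentheses in Table 1

def refine_by_social_relation(experts, related):
    """Related experts first, each group ordered by descending count.

    `related` is a hypothetical set of names linked to the user in the
    coauthor network (assumed to include the user).
    """
    # False sorts before True, so related experts come first.
    return sorted(experts, key=lambda e: (e.name not in related, -e.count))

experts = [
    Expert("Carl Kesselman", 312), Expert("Thomas S. Huang", 271),
    Expert("Edward A. Fox", 269), Expert("Lei Wang", 250),
    Expert("John Mylopoulos", 245), Expert("Ewa Deelman", 237),
    Expert("Hans W. Guesgen", 117), Expert("Virginia Dignum", 69),
    Expert("John McCarthy", 65), Expert("Aaron Sloman", 36),
]
related = {"Hans W. Guesgen", "Virginia Dignum", "John McCarthy", "Aaron Sloman"}
refined = refine_by_social_relation(experts, related)
```

Under these assumptions the first six entries of `refined` reproduce the right-hand column of Table 1.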
Social Network based Interest Retention
    Models for Search Refinement




Obtaining the Retained Interests
 •  Do retained interests appear more frequently than others?
    (Frequency) Total Interest: $TI(i) = \sum_{j=1}^{n} m(i,j)$
 •  Besides frequency, what else is important for correctly obtaining retained interests?
    The forgetting mechanism in cognitive memory retention
    (exponential function model, power function model) [Anderson, Schooler 1991].

    (Frequency and Recency) Memory Retention:
    $P = A e^{-bT}$;  $P = A T^{-b}$
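The three formulas on this slide can be written down directly. A minimal sketch, assuming `m_row` holds the counts m(i, 1), ..., m(i, n) for one interest and that T is the time elapsed since an observation (its unit is not stated on the slide). The function names are hypothetical; A = 0.855 and b = 1.295 are the power-model values reported later in this deck, reused here only for illustration.

```python
import math

def total_interest(m_row):
    """TI(i) = sum over j = 1..n of m(i, j); m_row lists m(i, 1), ..., m(i, n)."""
    return sum(m_row)

def retention_exponential(T, A, b):
    """Exponential forgetting curve: P = A * e^(-b T)."""
    return A * math.exp(-b * T)

def retention_power(T, A, b):
    """Power forgetting curve: P = A * T^(-b), defined only for T > 0."""
    if T <= 0:
        raise ValueError("the power model needs T > 0")
    return A * T ** (-b)

A, b = 0.855, 1.295  # illustrative values, reused from the power-model fit in this deck
for T in (1, 2, 5, 10):
    print(T, retention_exponential(T, A, b), retention_power(T, A, b))
```

With these values the power curve keeps far more weight at T = 10 than the exponential curve, which is the practical difference between the two forgetting models.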




Pictures from: [Schooler 1993] Schooler, L. J. & Anderson, J. R.: Recency and Context: An
Environmental Analysis of Memory. In Proceedings of the Fifteenth Annual Conference of the
Cognitive Science Society, pp. 889-894, 1993.
Obtaining the Retained Interests
•   (Frequency and Recency) Exponential Model for Interest Retention:
    $EIR(i) = \sum_{j=1}^{n} m(i,j) \times A e^{-b T_{i,j}}$

•   (Frequency and Recency) Power Model for Interest Retention:
    $PIR(i) = \sum_{j=1}^{n} m(i,j) \times A T_{i,j}^{-b}$
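A minimal sketch of EIR and PIR for a single interest i, assuming each observation is a pair (m(i, j), T_{i,j}) and that elapsed time is counted so that T_{i,j} ≥ 1, since the power model is undefined at T = 0. The pair representation, the example data and the function names are hypothetical. The example shows how recency can outweigh raw frequency: an interest seen once recently scores higher under PIR than one seen three times long ago, although TI would rank them the other way round.

```python
import math

def eir(observations, A, b):
    """Exponential Model for Interest Retention:
    EIR(i) = sum_j m(i, j) * A * exp(-b * T_ij)."""
    return sum(m * A * math.exp(-b * T) for m, T in observations)

def pir(observations, A, b):
    """Power Model for Interest Retention:
    PIR(i) = sum_j m(i, j) * A * T_ij ** (-b); every T_ij must be > 0."""
    if any(T <= 0 for _, T in observations):
        raise ValueError("the power model needs T_ij > 0")
    return sum(m * A * T ** (-b) for m, T in observations)

A, b = 0.855, 1.295
recent = [(1, 1)]  # hypothetical: seen once, one time unit ago
old = [(3, 10)]    # hypothetical: seen three times, ten time units ago
print(pir(recent, A, b), pir(old, A, b))
```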




    [Zeng 2009a] Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao.
    Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection.
    In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, coordinated by Jose Quesada
    and Yi Zeng, March 30, 2009.
    [Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong. DBLP-SSE: A DBLP Search Support Engine. In:
    Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE
    Computer Society, Milan, Italy, September 15-18, 2009.
    [Maanen 2009] Leendert van Maanen, Julian N. Marewski. Recommender Systems for Literature
    Selection: A Competition between Decision Making and Memory Models. In: CogSci 2009,
    July 31-August 1, 2009.

Obtaining the Retained Interests
•   To some extent, current interests are related to interest retention. Using the power law
    model with A = 0.855 and b = 1.295, we selected all authors with more than 100
    publications (1,226 persons) and predicted their top 9 interests from 2000 to 2007 using
    interest retention. For 49.54% of this sample, the prediction matched 3 out of the 9
    interests.
•   We analyzed research interest retention for all 615,124 computer scientists in the
    SwetoDBLP dataset and released the "computer scientists' research interest RDF dataset":
    http://www.iwici.org/dblp-sse
    http://wiki.larkc.eu/csri-rdf

Figure 7a: A comparative study of total research interests from 1990 to 2008 and retained
interests in 2009 (based on both the power law and exponential law models).
Figure 7b: Difference in the contribution values of papers published in different years.
Figure 7c: A comparison of total interests and interest retentions of the author
"Ricardo A. Baeza-Yates" (Nov. 2009, from DBLP).
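The evaluation above compares predicted and observed top 9 interests per author. A minimal sketch, assuming per-interest scores have already been computed (for example with PIR) and that a prediction counts as a hit when the two top-9 lists share at least `hits` interests. The slide does not say whether "3 out of 9" means exactly or at least three, so that threshold is an assumption, and all names are hypothetical.

```python
def top_k(scores, k=9):
    """Return the k interests with the highest scores (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [interest for interest, _ in ranked[:k]]

def overlap(predicted, actual):
    """Number of interests shared by two top-k lists."""
    return len(set(predicted) & set(actual))

def share_with_hits(pairs, hits=3):
    """Fraction of (predicted, actual) pairs that share at least `hits` interests."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    good = sum(1 for predicted, actual in pairs if overlap(predicted, actual) >= hits)
    return good / len(pairs)
```

For instance, `top_k({"Web": 7.81, "Search": 5.59, "Retrieval": 3.19}, k=2)` picks "Web" and "Search", using retained-interest scores that appear elsewhere in this deck.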
Retained Interests Search in a Social Environment

[Figure: interest network around Ricardo A. Baeza-Yates and Carlos Castillo, with interest
terms such as Information, Retrieval, Web, Search, Query, Engine, Content, Link, Network,
PageRank, Mining, Analysis, Challenge, Spam, Detection.]

Group Retained Interests:
• Diversity
• Consistency

Group Retained Interest:
$$E(i,p) = \begin{cases} 1 & (i \in RI_p^{top9}) \\ 0 & (i \notin RI_p^{top9}) \end{cases}, \qquad GIR(i) = \sum_{p=1}^{n} E(i,p)$$

Top 9 interests retention of a user and his group interests retention (Ricardo A. Baeza-Yates,
based on the May 2008 version of SwetoDBLP):

Top 9 Retained Interests      Top 9 Group Retained Interests
Web           7.81            Search        35
Search        5.59            Retrieval     30
Retrieval     3.19            Web           28
Information   2.27            Information   26
Query         2.14            System        19
Engine        2.10            Query         18
Mining        1.26            Analysis      14
Challenge     …               Text          …
Analysis      …               Model         …

For the most prolific authors in DBLP (more than 50 publications; 5,161 persons), on average
52.55% of an individual's retained interests are consistent with his/her group retained
interests.
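GIR(i) counts how many group members have interest i in their own top 9 list. A minimal sketch, assuming the group is given as a list of each member's top 9 retained interests (for example, the user's coauthors) and measuring consistency as the share of the individual's top interests that also appear in the group's top list. That consistency definition, the example group and all names are assumptions, since the slide reports the 52.55% figure without its exact formula.

```python
from collections import Counter

def group_retained_interests(member_top9_lists):
    """GIR(i) = sum_p E(i, p), with E(i, p) = 1 if i is in member p's top 9, else 0."""
    gir = Counter()
    for top9 in member_top9_lists:
        gir.update(set(top9))  # set() keeps E(i, p) at 0 or 1 per member
    return gir

def consistency(individual_top, group_top):
    """Share of an individual's top interests that also occur in the group's top list."""
    if not individual_top:
        return 0.0
    return len(set(individual_top) & set(group_top)) / len(individual_top)

members = [["web", "search"], ["search", "query"], ["search", "web"]]  # hypothetical group
gir = group_retained_interests(members)
group_top = [interest for interest, _ in gir.most_common(9)]
```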
Search Refinement by Interests
               from Different Perspectives
•   Vague or incomplete queries may produce too many results for users to wade through.

•   Research interests may be closely related to search tasks.

•   Research interests can be evaluated from various perspectives:
    (1) Total interests;
    (2) Retained interests;
    (3) Coauthor group retained interests.
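One hypothetical way to use any of these interest lists for refinement is to keep only results that mention at least one interest term and to order them by how many they mention. This is an illustrative sketch, not the method evaluated in this study: results are assumed to be plain title strings, matching is whole-word and case-insensitive, and `refine_by_interests` is a hypothetical name.

```python
import re

def refine_by_interests(results, interests, min_matches=1):
    """Filter and re-rank result titles by the number of interest terms they contain.

    Results with fewer than `min_matches` matching terms are dropped, which
    shrinks the result set; ties keep their original order (sorted is stable).
    """
    terms = {t.lower() for t in interests}

    def matches(title):
        words = set(re.findall(r"\w+", title.lower()))
        return len(terms & words)

    kept = [r for r in results if matches(r) >= min_matches]
    return sorted(kept, key=lambda r: -matches(r))
```

Passing the total, retained, or group retained interest list as `interests` corresponds to the three perspectives above.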




Refinement with Retained interests,
     group retained interests

                             8 requests were sent to DBLP authors;
                             7 replied.

                             Participants: 7 DBLP authors
                             • Preference order (100%):
                                 List 2, List 3 > List 1
                             • Preference order (100%):
                                 List 2 ≈ List 3
                             • Preference order (83.3%):
                                 List 2 > List 3 > List 1
                             • Preference order (16.7%):
                                 List 3 > List 2 > List 1




Future Research




Semantic Similarity
                  ---- Obtaining More Accurate Interest Descriptions and
                             Observations of Interest Dynamics

[Figure 14. Consistent interests without consideration of semantic similarity (interest
network around Ricardo A. Baeza-Yates and Carlos Castillo).]

[Figure 15. Consistent interests with consideration of semantic similarity.]

Table. Some examples of semantic similarities based on Normalized Google Distance.

search      retrieval   0.645
search      query       0.552
search      pagerank    0.813
retrieval   query       0.467
retrieval   pagerank    0.293
query       pagerank    0.098
logic       reasoning   0.667
logic       inference   0.606
reasoning   inference   0.909
ontology    OWL         0.805
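The Normalized Google Distance behind this table is defined from page-hit counts: f(x) and f(y) are the numbers of pages containing each term, f(x, y) the number containing both, and N the total number of indexed pages. A minimal sketch of that standard formula follows; the hit counts in the example are made up, and because the slide does not state how distances were converted into the similarity scores shown, the sketch stops at the distance itself.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance:
    (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))).
    Requires 0 < fxy <= min(fx, fy) and max(fx, fy) < n.
    """
    if not (0 < fxy <= min(fx, fy) and max(fx, fy) < n):
        raise ValueError("invalid hit counts")
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))

# Hypothetical hit counts: terms that co-occur often are closer (smaller NGD).
print(ngd(1_000_000, 800_000, 400_000, 10**10))
print(ngd(1_000_000, 800_000, 1_000, 10**10))
```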
Thank You!





Weitere ähnliche Inhalte

Ähnlich wie Social Relation Based Scalable Semantic Search Refinement

Research Interests : Their Dynamics, Structures and Applications in Personali...
Research Interests : Their Dynamics, Structures and Applications in Personali...Research Interests : Their Dynamics, Structures and Applications in Personali...
Research Interests : Their Dynamics, Structures and Applications in Personali...Yi Zeng
 
DBLP-SSE: A DBLP Search Support Engine
DBLP-SSE: A DBLP Search Support EngineDBLP-SSE: A DBLP Search Support Engine
DBLP-SSE: A DBLP Search Support EngineYi Zeng
 
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...Sampath Jayarathna
 
Epistemic networks for Epistemic Commitments
Epistemic networks for Epistemic CommitmentsEpistemic networks for Epistemic Commitments
Epistemic networks for Epistemic CommitmentsSimon Knight
 
Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)Behrang Mehrparvar
 
Toward Personalized Peer-to-Peer Top-k Processing
Toward Personalized Peer-to-Peer Top-k ProcessingToward Personalized Peer-to-Peer Top-k Processing
Toward Personalized Peer-to-Peer Top-k Processingasapteam
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU LIBER Europe
 
Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory acijjournal
 
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...summersocialwebshop
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collectiondnac
 
Extracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication ExchangesExtracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication ExchangesSuvodeep Mazumdar
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceresearchinventy
 

Ähnlich wie Social Relation Based Scalable Semantic Search Refinement (20)

Research Interests : Their Dynamics, Structures and Applications in Personali...
Research Interests : Their Dynamics, Structures and Applications in Personali...Research Interests : Their Dynamics, Structures and Applications in Personali...
Research Interests : Their Dynamics, Structures and Applications in Personali...
 
DBLP-SSE: A DBLP Search Support Engine
DBLP-SSE: A DBLP Search Support EngineDBLP-SSE: A DBLP Search Support Engine
DBLP-SSE: A DBLP Search Support Engine
 
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...The Human, the Eye and the Brain : Unifying Relevance Feedback for  User Mode...
The Human, the Eye and the Brain : Unifying Relevance Feedback for User Mode...
 
Epistemic networks for Epistemic Commitments
Epistemic networks for Epistemic CommitmentsEpistemic networks for Epistemic Commitments
Epistemic networks for Epistemic Commitments
 
Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)Community Analysis of Deep Networks (poster)
Community Analysis of Deep Networks (poster)
 
Toward Personalized Peer-to-Peer Top-k Processing
Toward Personalized Peer-to-Peer Top-k ProcessingToward Personalized Peer-to-Peer Top-k Processing
Toward Personalized Peer-to-Peer Top-k Processing
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Ap26261267
Ap26261267Ap26261267
Ap26261267
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU
 
119 128
119 128119 128
119 128
 
119 128
119 128119 128
119 128
 
Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory Generating domain specific sentiment lexicons using the Web Directory
Generating domain specific sentiment lexicons using the Web Directory
 
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
Jana Diesner, "Words and Networks: Considering the Content of Text Data for N...
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collection
 
02 Network Data Collection (2016)
02 Network Data Collection (2016)02 Network Data Collection (2016)
02 Network Data Collection (2016)
 
PhD Defense
PhD DefensePhD Defense
PhD Defense
 
Extracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication ExchangesExtracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication Exchanges
 
Extracting Semantic
Extracting Semantic Extracting Semantic
Extracting Semantic
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 

Kürzlich hochgeladen

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Kürzlich hochgeladen (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Social Relation Based Scalable Semantic Search Refinement

  • 1. Social Relation Based Scalable Semantic Search Refinement Yi Zeng1, Xu Ren1, Yulin Qin1,2, Ning Zhong1,3, Zhisheng Huang4, Yan Wang1 1. International WIC Institute, Beijing University of Technology, China 2. Carnegie Mellon University, USA 3. Maebashi Institute of Technology, Japan 4. Vrije University Amsterdam, the Netherlands 1
  • 2. Motivation • Vague/incomplete queries over large-scale semantic data (how can queries be refined to reduce the size of the result set?). • Large-scale semantic data vs. the data most relevant to a specific user. Diagram: diversity among users in the context of large-scale semantic data is addressed along two lines. User interests support interest-based search refinement, and a user's network of friends, collaborators, etc. supports search refinement through social relationships; the two combine into group-interest-based search refinement.
  • 3. Social Relations and Social Networks • Most social networks follow a power-law distribution. • The DBLP coauthor network is created using the FOAF vocabulary. [Fig. 1: Coauthor number distribution in the SwetoDBLP dataset. Fig. 2: Log-log diagram of Fig. 1.] • The distribution approximately follows a power law: few authors have many coauthors, and most authors have very few. • Regarding scalability: even when the number of authors grows rapidly, rebuilding the coauthor network is not hard, since most authors have only a few links.
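A minimal sketch of how a coauthor-number distribution like the one in Fig. 1 could be computed and checked on a log-log scale. It assumes papers are given as plain lists of author names, a hypothetical stand-in for the FOAF-based SwetoDBLP data; the least-squares slope is only an informal indicator, not a rigorous power-law test.

```python
import math
from collections import Counter, defaultdict


def coauthor_counts(papers):
    """Map each author to the number of distinct coauthors.

    `papers` is a list of author-name lists (an assumed input format,
    not the SwetoDBLP/FOAF representation itself).
    """
    neighbours = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for author in unique:
            # Single-author papers still register the author, with no links.
            neighbours[author] |= unique - {author}
    return {author: len(co) for author, co in neighbours.items()}


def degree_distribution(counts):
    """Number of authors having exactly k coauthors, for each k."""
    return Counter(counts.values())


def loglog_slope(distribution):
    """Least-squares slope of log(#authors) against log(k), using k >= 1.

    A roughly straight line with a negative slope is the usual informal
    sign of an approximate power law.
    """
    points = [(math.log(k), math.log(n))
              for k, n in distribution.items() if k >= 1 and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct k values")
    xs, ys = zip(*points)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Each new paper only touches the coauthor sets of its own authors, so updating the counts stays cheap when most authors have few links, which is the scalability point made on this slide.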
  • 4. Search Refinement through Social Relationship
    Table 1: A partial result of the expert finding search task "Artificial Intelligence authors" (user name: John McCarthy).
    Satisfied authors without social relation refinement | Satisfied authors with social relation refinement
    Carl Kesselman (312) | Hans W. Guesgen (117) *
    Thomas S. Huang (271) | Virginia Dignum (69) *
    Edward A. Fox (269) | John McCarthy (65) *
    Lei Wang (250) | Aaron Sloman (36) *
    John Mylopoulos (245) | Carl Kesselman (312)
    Ewa Deelman (237) | Thomas S. Huang (271)
    ... | ...
    [Diagram: the domain experts dataset and the coauthor network dataset, linked through user URIs.] Bridging the two separate datasets helps refine the expert finding task.
    In an enterprise setting, if the experts found already have some relationship with the employer, cooperation may be smoother.
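One plausible reading of Table 1 is that authors socially related to the user are promoted ahead of the others, while the remaining authors keep their original order. The sketch below assumes exactly that; the slides do not specify the actual refinement procedure, and the function name and input format are hypothetical.

```python
def refine_by_social_relation(ranked_experts, user, coauthors):
    """Re-rank expert-finding results so that authors related to `user` come first.

    ranked_experts: (name, score) pairs in the original ranking order.
    coauthors: dict mapping an author name to the set of that author's coauthors.
    The user is also counted as related (assumption based on Table 1, where
    John McCarthy appears in his own refined list). sorted() is stable, so
    the original order is kept inside each group.
    """
    related = set(coauthors.get(user, set())) | {user}
    # False (related) sorts before True (unrelated).
    return sorted(ranked_experts, key=lambda pair: pair[0] not in related)
```

Because only the ordering changes, the refinement needs nothing beyond the user's coauthor set, which the power-law shape above suggests is usually small.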
  • 5. Social Network based Interest Retention Models for Search Refinement
  • 6. Obtaining the Retained Interests • Do retained interests appear more frequently than others? (Frequency) Total interest: $TI(i) = \sum_{j=1}^{n} m(i, j)$ • Beyond frequency, what else is important for correctly obtaining retained interests? The forgetting mechanism in cognitive memory retention (exponential function model, power function model) [Anderson, Schooler 1991]. (Frequency and recency) Memory retention: $P = Ae^{-bT}$; $P = AT^{-b}$. Pictures from [Schooler 1993]: L. J. Schooler, J. R. Anderson: Recency and Context: An Environmental Analysis of Memory. In: Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pp. 889-894, 1993.
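To make the frequency vs. recency contrast concrete, here is a minimal sketch of TI and the two forgetting curves. The slide does not define m(i, j); treating it as, for example, the number of occurrences of interest term i in an author's paper titles in period j is an assumption.

```python
import math


def total_interest(m_values):
    """TI(i) = sum over j = 1..n of m(i, j), given the list of m(i, j) values."""
    return sum(m_values)


def exponential_retention(t, a, b):
    """Exponential forgetting curve P = A * e^(-b T)."""
    return a * math.exp(-b * t)


def power_retention(t, a, b):
    """Power forgetting curve P = A * T^(-b); only defined for T > 0."""
    if t <= 0:
        raise ValueError("the power model needs T > 0")
    return a * t ** (-b)
```

With A = b = 1, the power curve gives 0.1 at T = 10 while the exponential curve gives about 4.5e-5, so older occurrences keep noticeably more weight under the power model.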
  • 7. Obtaining the Retained Interests • (Frequency and recency) Exponential model for interest retention: $EIR(i) = \sum_{j=1}^{n} m(i, j) \times A e^{-b T_{i,j}}$ • (Frequency and recency) Power model for interest retention: $PIR(i) = \sum_{j=1}^{n} m(i, j) \times A T_{i,j}^{-b}$
    [Zeng 2009a] Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao: Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection. In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, coordinated by Jose Quesada and Yi Zeng, March 30, 2009.
    [Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong: DBLP-SSE: A DBLP Search Support Engine. In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.
    [Maanen 2009] Leendert van Maanen, Julian N. Marewski: Recommender Systems for Literature Selection: A Competition between Decision Making and Memory Models. CogSci 2009, July 31-August 1, 2009.
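A minimal sketch of the EIR and PIR sums for a single interest. How T_{i,j} is measured is an assumption here: the hypothetical helper `observations_from_years` uses T = current year - publication year + 1, so that the current year has T = 1 and the power model stays defined.

```python
import math


def exponential_interest_retention(observations, a, b):
    """EIR(i) = sum_j m(i, j) * A * e^(-b * T_ij).

    observations: list of (m_ij, t_ij) pairs for a single interest i.
    """
    return sum(m * a * math.exp(-b * t) for m, t in observations)


def power_interest_retention(observations, a, b):
    """PIR(i) = sum_j m(i, j) * A * T_ij^(-b); every T_ij must be > 0."""
    return sum(m * a * t ** (-b) for m, t in observations)


def observations_from_years(counts_by_year, current_year):
    """Hypothetical helper: turn {year: m(i, j)} into (m, T) pairs.

    Assumes T = current_year - year + 1; the slides do not fix this choice.
    """
    return [(m, current_year - year + 1)
            for year, m in counts_by_year.items() if year <= current_year]
```

For example, an interest seen 4 times in 2008 and another seen 4 times in 2000 have the same TI, but with A = 0.855 and b = 1.295 (the power-law parameters used on the next slide) and current year 2008, the recent one has PIR = 3.42 while the older one is far lower. This is the effect of adding recency to frequency.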
  • 8. Obtaining the Retained Interests • To some extent, current interests are related to interest retention. Using the power law model with A = 0.855 and b = 1.295, we selected all the authors whose publication numbers are above 100 (1226 persons) and predicted their top 9 interests from 2000 to 2007 using interest retention. For 49.54% of these samples, interest retention correctly predicts 3 out of the 9 interests. • We analyzed research interest retention for all 615,124 computer scientists in the SwetoDBLP dataset and released the "computer scientists' research interest" RDF dataset: http://www.iwici.org/dblp-sse, http://wiki.larkc.eu/csri-rdf
    [Figure 7a: A comparative study of total research interests from 1990 to 2008 and retained interests in 2009 (based on both the power law and exponential law models). Figure 7b: Difference in the contribution values from papers published in different years. Figure 7c: A comparison of total interests and interest retentions of the author "Ricardo A. Baeza-Yates" (Nov. 2009, from DBLP).]
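The evaluation above compares predicted and observed top-9 interest lists per author. Below is a minimal sketch of that bookkeeping. Reading "predicts 3 out of the 9 interests" as "at least 3 of the 9 predicted interests appear in the observed top 9" is an assumption, and the function names are hypothetical.

```python
def top_k(scores, k=9):
    """The k highest-scoring interest terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def hits(predicted, actual):
    """Number of predicted interests that also appear in the actual list."""
    return len(set(predicted) & set(actual))


def share_with_at_least(hit_counts, threshold=3):
    """Fraction of authors whose prediction matched at least `threshold` interests."""
    return sum(h >= threshold for h in hit_counts) / len(hit_counts)
```

In this sketch, `top_k` would be applied once to retention scores computed from the earlier years (the prediction) and once to the observed interests, and `hits` counts the overlap per author.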
  • 9. Retained Interests in a Social Environment
    [Figure: interest terms associated with Ricardo A. Baeza-Yates and Carlos Castillo, e.g. Search, Web, Information Retrieval, Query, PageRank, Network, Link, Content, Spam, Mining, Engine, Challenge, Analysis, Detection.]
    Group retained interests: • Diversity • Consistency
    Group retained interest: $E(i, p) = \begin{cases} 1 & (i \in RI_p^{\mathrm{top9}}) \\ 0 & (i \notin RI_p^{\mathrm{top9}}) \end{cases}$, $GIR(i) = \sum_{p=1}^{n} E(i, p)$
    Table: Top 9 interests retention of a user and his group interests retention (Ricardo A. Baeza-Yates, based on the May 2008 version of SwetoDBLP).
    Top 9 retained interests | Top 9 group retained interests
    Web 7.81 | Search 35
    Search 5.59 | Retrieval 30
    Retrieval 3.19 | Web 28
    Information 2.27 | Information 26
    Query 2.14 | System 19
    Engine 2.10 | Query 18
    Mining 1.26 | Analysis 14
    Challenge … | Text …
    Analysis … | Model …
    For the most prolific authors in DBLP (publication number > 50; 5161 persons), on average 52.55% of an individual's retained interests are consistent with his/her group retained interests.
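The GIR formula can be sketched directly: each group member contributes 1 to every interest in his/her top-9 retained list. Measuring "consistency" as the share of an individual's top-9 list that also appears in the group's top-9 list is an assumption about how the 52.55% figure was computed.

```python
from collections import Counter


def group_retained_interests(member_top9_lists):
    """GIR(i) = sum over group members p of E(i, p), where E(i, p) = 1 iff
    interest i is in member p's top-9 retained interests."""
    gir = Counter()
    for top9 in member_top9_lists:
        gir.update(set(top9))  # each member contributes at most 1 per interest
    return gir


def consistency(user_top9, group_top9):
    """Share of the user's top-9 interests that are also in the group's top 9."""
    return len(set(user_top9) & set(group_top9)) / len(user_top9)
```

Applied to the two top-9 lists in the Baeza-Yates table, this assumed measure gives 6/9 (Web, Search, Retrieval, Information, Query, Analysis).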
  • 10. Search Refinement by Interests from Different Perspectives • Vague/incomplete queries may produce too many results for users to wade through. • Research interests may be closely related to search tasks. • Research interests can be evaluated from various perspectives: (1) total interests; (2) retained interests; (3) coauthor group retained interests.
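As a hedged illustration of interest-based refinement, the sketch below filters and re-orders result titles by how many of the user's interest terms they contain. Whitespace tokenization of titles and the `min_matches` cut-off are assumptions, not the method from the slides. The same function can be given total interests, retained interests, or coauthor group retained interests to compare the three perspectives.

```python
def refine_results(titles, interests, min_matches=1):
    """Keep result titles that mention at least `min_matches` interest terms,
    ordered by number of matches (ties keep their original order)."""
    interest_terms = {term.lower() for term in interests}
    scored = []
    for title in titles:
        matches = len(set(title.lower().split()) & interest_terms)
        if matches >= min_matches:
            scored.append((matches, title))
    scored.sort(key=lambda pair: -pair[0])  # list.sort is stable
    return [title for _, title in scored]
```

Raising `min_matches` shrinks the result set further, which is the goal stated for vague queries.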
  • 11. Refinement with Retained Interests and Group Retained Interests. 8 requests were sent to DBLP authors; 7 replied. Participants: 7 DBLP authors. • Preference order (100%): List 2, List 3 > List 1 • Preference order (100%): List 2 ≈ List 3 • Preference order (83.3%): List 2 > List 3 > List 1 • Preference order (16.7%): List 3 > List 2 > List 1
  • 13. Semantic Similarity: Obtaining More Accurate Interest Descriptions and Observations of Interest Dynamics
    [Figure 14. Consistent interests without consideration of semantic similarity (interest terms of Ricardo A. Baeza-Yates and Carlos Castillo).]
    [Figure 15. Consistent interests with consideration of semantic similarity (same interest terms).]
    Table: Some examples of semantic similarities based on Normalized Google Distance.
    Term 1 | Term 2 | Similarity
    search | retrieval | 0.645
    search | query | 0.552
    search | pagerank | 0.813
    retrieval | query | 0.467
    retrieval | pagerank | 0.293
    query | pagerank | 0.098
    logic | reasoning | 0.667
    logic | inference | 0.606
    reasoning | inference | 0.909
    ontology | OWL | 0.805
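The similarity scores above are described as based on Normalized Google Distance. Below is a sketch of the standard NGD formula (Cilibrasi and Vitányi), together with a hypothetical similarity-aware consistency check in the spirit of Figures 14 and 15. The slides do not say how NGD values were turned into the similarity scores shown, so the check takes similarity scores as given rather than deriving them from NGD. The threshold value is an assumption.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD(x, y) from hit counts f(x), f(y), f(x, y) and index size N.

    NGD = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))).
    All counts must be positive.
    """
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def consistent_interests(user_interests, group_interests, similarity, threshold):
    """User interests equal to, or at least `threshold`-similar to, a group interest.

    similarity: dict keyed by frozenset({a, b}) with scores such as those in
    the table above; the threshold is an assumed parameter.
    """
    consistent = []
    for u in user_interests:
        if any(u == g or similarity.get(frozenset((u, g)), 0.0) >= threshold
               for g in group_interests):
            consistent.append(u)
    return consistent
```

With the table's scores and a threshold of 0.5, "pagerank" (0.813 with "search") and "query" (0.552 with "search") would count as consistent with a group interested in "search" and "retrieval", while an exact-match-only check would count neither.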